Mathematics Logic & Set Theory
Mathematical Induction
Mathematical Induction is a special way of proving things. It has only 2 steps:
Step 1. Show it is true for the first one (n = 1).
Step 2. Show that if any one is true then the next one is true.
Then all will be true.
How to Do it
Step 1 is usually easy: you just have to prove it is true for n = 1.
Step 2 can often be tricky ... because you may need to use imaginative tricks to make it work!
Example: is 3^n - 1 always a multiple of 2?
Step 1. Show it is true for n = 1: 3^1 - 1 = 2, which is a multiple of 2. So it is true for n = 1.
Step 2. Assume it is true for n = k, i.e. 3^k - 1 is a multiple of 2. Now show it is true for n = k+1:
3^(k+1) is also 3×3^k
And then split 3× into 2× and 1×:
3^(k+1) - 1 = 2×3^k + (3^k - 1)
And each of these is a multiple of 2,
Because:
2×3^k is a multiple of 2 (you are multiplying by 2)
3^k - 1 is a multiple of 2 (we said that in the assumption above)
So:
3^(k+1) - 1 is also a multiple of 2
DONE!
Did you see how we used the 3^k - 1 case as being true, even though we had not proved it? That is OK,
because we are relying on the Domino Effect ...
So we take it as a fact (temporarily) that the "n = k" domino falls (i.e. 3^k - 1 is a multiple of 2), and see if that means
the "n = k+1" domino will also fall.
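As a quick sanity check, the claim can be spot-checked numerically with a short Python sketch (this only tests cases; the induction argument above is what covers every n):

```python
# Check that 3**n - 1 is a multiple of 2 for the first few hundred n.
# A finite check is not a proof -- induction is.
assert all((3**n - 1) % 2 == 0 for n in range(1, 300))
print("3**n - 1 is even for n = 1 .. 299")
```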
Tricks
I said before that you often need to use imaginative tricks.
http://www.cut-the-knot.org/induction.shtml
Mathematical Induction
Mathematical Induction (MI) is an extremely important tool in Mathematics.
First of all you should never confuse MI with Inductive Attitude in Science. The latter is just a
process of establishing general principles from particular cases.
MI is a way of proving math statements for all integers (perhaps excluding a finite number) [1].
Statements proved by math induction all depend on an integer, say, n. For example,
1 + 3 + 5 + ... + (2n - 1) = n^2   (1)
It's convenient to talk about a statement P(n). For (1), P(1) says that 1 = 1^2, which is incidentally true.
P(2) says that 1 + 3 = 2^2, P(3) means that 1 + 3 + 5 = 3^2. And so on. These particular cases are
obtained by substituting specific values 1, 2, 3 for n into P(n).
Assume you want to prove that for some statement P, P(n) is true for all n starting with n =
1. The Principle (or Axiom) of Math Induction states that, to this end, one should accomplish just two
steps:
1. Prove that P(1) is true.
2. Assume that P(k) is true for some k. Derive from here that P(k+1) is true as well.
The idea of MI is that a finite number of steps may be needed to prove an infinite number of
statements P(1), P(2), P(3), ....
Let's prove (1). We already saw that P(1) is true. Assume that, for an arbitrary k, P(k) is also true,
i.e. 1 + 3 + ... + (2k - 1) = k^2. Let's derive P(k+1) from this assumption. We have
1 + 3 + ... + (2k - 1) + (2k + 1) = k^2 + (2k + 1) = (k + 1)^2,
which exactly means that P(k+1) holds. (For 2k + 1 = 2(k + 1) - 1.) Therefore, P(n) is true for all n
starting with 1.
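The identity (1) just proved can be spot-checked in Python (again cases only, not a proof):

```python
# 1 + 3 + ... + (2n - 1) = n**2 for the first hundred values of n.
for n in range(1, 101):
    assert sum(2*k - 1 for k in range(1, n + 1)) == n**2
print("sum of first n odd numbers equals n**2 for n = 1 .. 100")
```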
Intuitively, the inductive (second) step allows one to say: look, P(1) is true and implies P(2).
Therefore P(2) is true. But P(2) implies P(3). Therefore P(3) is true, which implies P(4), and so on.
Math induction is just a shortcut that collapses an infinite number of such steps into the two above.
In Science, inductive attitude would be to check a few first statements, say, P(1), P(2), P(3), P(4),
and then assert that P(n) holds for all n. The inductive step "P(k) implies P(k + 1)" is missing.
Needless to say nothing can be proved this way.
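A classic illustration of why the inductive step cannot be skipped is Euler's polynomial n^2 + n + 41 (an example added here for illustration, not from the original text): it produces primes for n = 0 through 39 and then fails.

```python
# Euler's polynomial n**2 + n + 41 is prime for n = 0..39 but not for n = 40,
# showing that checking many cases proves nothing about all cases.
def is_prime(m):
    if m < 2:
        return False
    return all(m % d for d in range(2, int(m**0.5) + 1))

assert all(is_prime(n*n + n + 41) for n in range(40))
assert not is_prime(40*40 + 40 + 41)   # 1681 = 41 * 41
```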
Remark
1. Often it's impractical to start with n = 1. MI applies with any starting integer n0. The result is
then proved for all n from n0 on.
2. Sometimes, instead of 2., one assumes 2': Assume that P(m) is true for all m with 1 ≤ m ≤ k.
Derive from here that P(k+1) is also true. The two approaches are equivalent, because one
may consider statement Q: Q(n) = P(1) and P(2) and ... and P(n), so that Q(n) is true iff P(1),
P(2), ..., P(n) are all true.
In problem solving, mathematical induction is not only a means of proving an existing formula, but
also a powerful methodology for finding such formulas in the first place. When used in this manner,
MI proves to be an outgrowth of (scientific) inductive reasoning - making conjectures on the basis of a
finite set of observations.
I. Overview
The English word logic comes from the Greek word "logos" usually translated as "word", but with
the implication of an underlying structure or purpose. Hence its use as a synonym for God in the
New Testament Gospel of John. Logic is often defined as the process of "correct" reasoning. A more precise
definition might be the study of the structures of arguments that guarantee correct or true conclusions from
correct or true premises. There are, generally speaking, two "kinds" of logic: deductive and inductive.
Inductive logic is the body of methods used to generate "correct" conclusions based on observation or data. It is
the type of reasoning used in the natural sciences and statistics where general principles are "inferred" from many
particular facts. The use of the methods of inductive logic always carries with it the risk of incorrect
generalizations, so that the validity of this kind of argument is essentially probabilistic in nature. We will consider
this type of reasoning later this semester in the Probability Unit.
Deductive logic is the type of reasoning used in mathematics where we start from general principles and derive
from these principles particular facts and relationships. Deductive logic usually denotes the process of proving
true statements (theorems) within an "axiomatic system". If one accepts the validity of the axiomatic system, one
is "forced" to accept the validity of the derived theorems. Their "truth" is beyond dispute unless the whole
axiomatic system is inconsistent. These notes are primarily concerned with deductive logic.
An axiomatic system consists of four parts:
1. The set of allowed symbols. These are sometimes called the "primitives" or undefined terms of the system.
2. The well-formed formulas. These are sequences of the allowed symbols constructed according to some
allowed rules. Definitions of new symbols are allowed as well-formed formulas of old symbols.
3. The axioms or set of "self-evident" truths of the system. These are well-formed formulas which are taken as
statements of fact which can not be proven within the system. In some sense, the axioms must be "accepted on
faith".
4. The rules of inference. These are rules which allow or license moves from certain well-formed formulas to
other well formed formulas. As with the axioms the rules of inference are accepted as being self-evidently valid.
To some people one of the disturbing aspects of deductive logic is the difference between
the syntactic and semantic content of a conclusion. An argument is syntactically valid if it "follows" the correct
form or syntax of the language. The semantic content of an argument is related to its meaning or interpretation.
By and large deductive logic is concerned with the syntax of an argument and ignores the semantics of the
sentences in the argument. For example, consider the following two arguments.
Argument 1
Argument 2
Both arguments are logically correct and, from the syntactical point of view, identical. Note: the correctness of
the conclusion of Argument 1 does not really depend on any knowledge of geometry or triangles. It is the
structure or syntax of the argument, not its content or interpretation, which forces us to the correct conclusion. An
understanding or interpretation of the meaning of the sentences (i.e., the semantics) is not really required, and
maybe not even desired, in a logical analysis of an argument.
Propositions
Negation
Disjunction
Conjunction
Conditional
Biconditional
Tautologies
Propositions
A proposition is a statement, which in English would be a declarative sentence. Every proposition is either true or
false. This condition is sometimes referred to as a "dichotomy" or an example of "binary logic". This restriction to
only two "truth values" is a source of difficulty in "real life" where ambiguity and "shades of truth" often cloud
our reasoning and decisions. Another problem with this "truth functional" definition of a proposition is the
contextual dependence on space and time. Consider the simple declarative sentence, "It is raining". The truth
value of such a sentence depends on where and when you are! Nevertheless, for any observer the statement is
either true or false once a "satisfactory" definition of "raining" is agreed upon.
A proposition is said to be simple or atomic if it has no connectives or quantifiers and in these notes
will be represented by a single letter of the alphabet such as p or q . The truth-functional structure of
such a statement can be represented by a truth table in which all possible truth values are displayed.
A simple truth table is shown below.
p      q      r
True   True   True
True   True   False
True   False  True
True   False  False
False  True   True
False  True   False
False  False  True
False  False  False
Note: All possible truth values of all three propositions are displayed. Since each of the three propositions has 2
possible truth values, there are 2^3 = 8 rows in the truth table. More generally, if there are n propositions there will
be 2^n rows in the corresponding truth table.
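The 2^n rows can be enumerated mechanically; a minimal Python sketch (an illustration, not part of the original notes):

```python
from itertools import product

# Enumerate every truth assignment for n atomic propositions.
n = 3
rows = list(product([True, False], repeat=n))
assert len(rows) == 2**n   # 8 rows for p, q, r
print(f"{len(rows)} rows for {n} propositions")
```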
Below are the five standard connectives used to form compound propositions from atomic propositions.
1. Negation (NOT)
2. Disjunction (OR)
3. Conjunction (AND)
4. Conditional (IF…THEN…)
5. Biconditional (IF AND ONLY IF)
1. Negation (NOT) transforms a proposition into its opposite truth value, i.e., ~ p is true whenever p is false and ~ p is
false whenever p is true. For example if p is the proposition, "George Washington was born in 1732", then ~ p is the
proposition "George Washington was not born in 1732". The truth table for negation is shown below.
p ~p
True False
False True
A statement like ~ (~ p) is equivalent or can always be replaced by the simpler statement p. For example, the
statement "John is not dishonest" is the negation of "John is not honest" and is equivalent to "John is honest".
2. Disjunction (OR) is sometimes referred to as inclusive OR and is true as long as one of the
"disjuncts" that comprise it is true. For example, the statement in a college course catalogue,
"Students must take a statistics course or a logic course to graduate", would seem to imply that a
student has met the requirement if she/he has taken a statistics course, or has taken a logic course, or
has taken both a statistics and a logic course. In many everyday situations, OR means the exclusive
OR which precludes both disjuncts being true. For example, the statement "Sally can have cake or
ice cream for dessert", might often be interpreted as saying that she can have cake or ice cream but
not both at the same time. The truth table for OR is shown below. Note: p ∨ q is false if and only if
both p and q are false. OR is a commutative operator, i.e., p ∨ q is always the same as q ∨ p.
p      q      p ∨ q
True   True   True
True   False  True
False  True   True
False  False  False
This truth table can be generated in an Excel spreadsheet using the Excel logical (or Boolean) function OR.
3. Conjunction (AND) is used to join two statements ("conjuncts") with the understanding that the
conjunction is true if and only if all conjuncts are true. The truth table for AND is shown
below. Note: p ∧ q is true if and only if both p and q are true. Like the OR operator, AND is
commutative, i.e., p ∧ q is always the same as q ∧ p.
p      q      p ∧ q
True   True   True
True   False  False
False  True   False
False  False  False
This truth table can be generated in an Excel spreadsheet using the Excel logical function AND.
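In place of the Excel OR and AND functions, the same tables can be printed with a short Python sketch (an illustrative substitute for the lost spreadsheet screenshots):

```python
from itertools import product

# Print the OR and AND truth tables side by side, one row per assignment.
print(f"{'p':<6} {'q':<6} {'p OR q':<8} {'p AND q'}")
for p, q in product([True, False], repeat=2):
    print(f"{str(p):<6} {str(q):<6} {str(p or q):<8} {str(p and q)}")
```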
When more than one connective is used in a compound proposition, the order in which we apply them to the
simple propositions in the sentence can cause confusion. For example, is p ∨ q ∧ r to be interpreted as the
disjunction of p with the conjunction of q with r, or as the conjunction with r of the disjunction of p with q? To be
unambiguous, standard order of operation rules (analogous to those used in arithmetic) were developed so that
anyone reading the proposition understands what it says. According to this convention, the above proposition
means the disjunction of p with the conjunction of q with r. This is because ∧ (AND) has higher priority than
∨ (OR), and therefore propositions connected by ∧ are done first. The standard priority list is ~, ∧, ∨, →, ↔.
This means that if no grouping symbols are present, connectives will be applied in this order proceeding from left
to right. In the compound proposition ~p ∧ q, we first negate p then take the conjunction of this with q.
Grouping symbols, ( ), are used to either override the normal order or to make the intended order more
explicit. To negate the conjunction of p with q, we would write ~(p ∧ q). Except for expressions
like ~p in which simple propositions are negated, these notes will make free use of grouping symbols to
make compound statements easier to read and understand.
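Python's Boolean operators happen to follow the same priority convention (not binds tighter than and, which binds tighter than or), so the convention can be checked directly; this sketch is an aside, not part of the notes:

```python
# NOT binds tighter than AND, which binds tighter than OR.
p, q, r = False, True, True
assert (p or q and r) == (p or (q and r))   # AND is applied before OR
assert (not p and q) == ((not p) and q)     # NOT is applied before AND

# Grouping symbols change the meaning: (~p AND q) vs ~(p AND q).
p, q = True, False
assert (not p and q) != (not (p and q))
print("precedence conventions confirmed")
```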
4. Conditional (IF…THEN…) is probably the most common connective in logic and mathematics.
The basic idea is that the truth of the condition expressed in proposition p is sufficient to guarantee the truth of
proposition q. One of the oldest rules of inference which forms the core of Aristotelian logic is the syllogism
consisting of two premises and a conclusion.
Premise: p → q
Premise: p
Conclusion: ∴ q
Here the symbol ∴ is shorthand for therefore. This particular form of a syllogism is sometimes called the Law
of Detachment or in Latin modus ponendo ponens. The core idea of p → q (often "read" as p implies q) is
that if p is true, q must be true. Stated differently, it would be impossible for p to be true and q to be false, i.e., if
q is false then p must be false. A true p never leads to a false q. On the other hand, if p is false it is quite
possible for q to be either true or false (the guarantee is now void). This means that p → q is always true if p
is false (i.e., ~p is true). From these considerations we see that p → q is equivalent to (i.e., always has the
same truth value as) ~p ∨ q. Thus, the statement p → q is always true if ~p is true or if p is true and q is true. The
only way p → q can be false is when p is true and q is false. Since p → q is equivalent to ~p ∨ q,
which is equivalent to ~(~q) ∨ ~p, we have that p → q is equivalent to ~q → ~p.
The statement ~q → ~p is called the contrapositive of p → q and from a logical point of view
always says the same thing as p → q. These facts are summarized below in the following truth
table.
p      q      ~p     ~q     p → q   ~q → ~p
True   True   False  False  True    True
True   False  False  True   False   False
False  True   True   False  True    True
False  False  True   True   True    True
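The equivalences between the conditional, its ~p ∨ q form, and its contrapositive can be verified over all four assignments with a small Python sketch (illustrative only):

```python
from itertools import product

def implies(a, b):
    """Truth-functional conditional: a -> b is false only when a is True and b is False."""
    return (not a) or b

for p, q in product([True, False], repeat=2):
    assert implies(p, q) == ((not p) or q)           # p -> q  is  ~p OR q
    assert implies(p, q) == implies(not q, not p)    # contrapositive ~q -> ~p
print("conditional equivalences hold in all four rows")
```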
The conditional p → q can be read or paraphrased in several equivalent ways:
p is a sufficient condition for q. (If p is true then q is true regardless of anything else.)
q is a necessary condition for p. (If q is not present, i.e., q is false, then p is not present.)
Unless q not p.
Not p unless q.
Not p without q.
To illustrate these connections suppose there were a college that required the passing of a statistics
course for graduation. Let s be the proposition "John passes a statistics course." and g be the
proposition "John graduates." The intent of the rule is clearly embodied in the conditional g → s. If
we know John graduated, then he must have passed statistics. Note the converse, s → g, need not be
true. John could pass the statistics course, but then later drop out of school for other reasons. The
condition g is sufficient for s. Graduating guarantees a statistics course was passed! The condition s
is necessary for g. John will not graduate without passing statistics (~ g without s). Unless John
passes statistics he will not graduate (unless s not g).
The term necessary condition may seem confusing when time ordering and "pre-conditions" are
considered. For example, let w be the proposition "the Packers beat the Vikings" and p the
proposition "the Packers make the playoffs". In the conditional w → p, p is a necessary condition
for w, i.e., for the Packers to beat the Vikings it is necessary that the Packers make the playoffs. This certainly
seems a strange statement if a necessary condition is to be interpreted as a pre-condition in time! It does make
sense logically in the sense that if p is "lacking", i.e., the Packers fail to make the playoffs, then it must be true
that w didn't happen, i.e., the Packers didn't beat the Vikings. Note: the converse need not be true.
Other conditions (the results of other games) could get the Packers into the playoffs even if they lose to the
Vikings!
The conditional connective has bothered many people. By the rules of the → connective, the following are all true!
If George Washington lived on Mars, then Abraham Lincoln was never president. (False → False is True)
If pigs can fly, then Nazi Germany lost World War II. (False → True is True)
If birds can fly, then Madison is the capital of Wisconsin. (True → True is True)
Many people would regard these statements as either nonsense or, at best, examples of a "non-sequitur". There
seems to be no "logical" connection between the "p's" and the "q's". The flying of birds seems irrelevant to the
location of Wisconsin's capital city. Despite such objections, we will continue to use the conditional connective as
it was defined!
5. Biconditional (IF AND ONLY IF) is a statement of equivalence. This means that if p ↔ q is true,
then the truth values of p and q are identical. In some sense p and q are identical, they are just restatements of the
same proposition. Alternatively, p ↔ q means q can be taken as the definition of p or p can be taken as the
definition of q. Any occurrence of p in an argument can be replaced with q with no loss or change in meaning.
p ↔ q means p and q are either both true or both false. p ↔ q means that p is both necessary and
sufficient for q and q is both necessary and sufficient for p. Thus, p ↔ q is equivalent to
(p → q) ∧ (q → p).
p      q      p ↔ q
True   True   True
True   False  False
False  True   False
False  False  True
Actually, we only really need two connectives; for example, conjunction, the conditional, and the biconditional
can all be defined in terms of negation and disjunction:
p ∧ q is equivalent to ~(~p ∨ ~q)
p → q is equivalent to ~p ∨ q
p ↔ q is equivalent to (p → q) ∧ (q → p), which by the two lines above reduces to negations and disjunctions.
As this illustrates, while we don't need the additional connectives, their absence would greatly complicate
compound statements we might want to consider.
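The reduction to negation and disjunction can be checked mechanically; a sketch (the NOT/OR/AND/IMP/IFF names are mine, chosen for illustration):

```python
from itertools import product

# Build every connective from NOT and OR alone, then compare with the builtins.
def NOT(a): return not a
def OR(a, b): return a or b
def AND(a, b): return NOT(OR(NOT(a), NOT(b)))    # DeMorgan
def IMP(a, b): return OR(NOT(a), b)              # definition of implication
def IFF(a, b): return AND(IMP(a, b), IMP(b, a))  # definition of equivalence

for p, q in product([True, False], repeat=2):
    assert AND(p, q) == (p and q)
    assert IMP(p, q) == ((not p) or q)
    assert IFF(p, q) == (p == q)
print("all connectives reduce to NOT and OR")
```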
An additional connective, which is sometimes used, is the Exclusive OR, p ⊕ q, which is false if both disjuncts p
and q are true. It can be defined as follows: p ⊕ q is equivalent to (p ∨ q) ∧ ~(p ∧ q).
p      q      p ⊕ q
True   True   False
True   False  True
False  True   True
False  False  False
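In Python the Exclusive OR of two Booleans is simply inequality, which matches the definition above (an illustrative aside):

```python
# p XOR q is (p OR q) AND NOT (p AND q), which for Booleans is just p != q.
for p in (True, False):
    for q in (True, False):
        assert (p != q) == ((p or q) and not (p and q))
print("XOR matches its definition in all four rows")
```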
Tautologies
A compound proposition whose truth-values are all true is said to be a tautology. The negation of a tautology has all
false truth values and is called a contradiction. A tautology is true regardless of the truth values of its constituent
atomic propositions just as a contradiction is false regardless of the truth values of its propositions.
The compound proposition p ∨ ~p is a tautology. It's also pretty meaningless. Nevertheless, in any argument
we can introduce this statement and sometimes it actually gets us somewhere!
The compound proposition p ∧ ~p is a contradiction. In some sense it is the parent of all lies since it
immediately and without shame purports to tell us something and then denies that same thing!
If the compound propositions p and q always have the same truth values, then the biconditional p ↔ q will
always be a tautology. In sentential logic all theorems are tautologies and all tautologies are either axioms or
theorems. Thus, one can determine if a given proposition is an axiom or theorem by constructing its truth table. If
the proposition is a tautology, it must be an axiom or theorem of sentential logic. This can be illustrated in an
Excel spreadsheet that establishes DeMorgan's Rules for negating conjunctions and disjunctions:
~(p ∧ q) ↔ (~p ∨ ~q) and ~(p ∨ q) ↔ (~p ∧ ~q) are both tautologies.
Note: The use of DeMorgan's Rules demonstrates that the negation of the tautology p ∨ ~p is the
contradiction ~p ∧ p and the negation of the contradiction p ∧ ~p is the tautology ~p ∨ p.
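A brute-force tautology checker makes this concrete; the sketch below (names of my choosing) verifies the excluded middle, the contradiction, and DeMorgan's Rules as biconditionals:

```python
from itertools import product

def tautology(f, n):
    """True iff f evaluates to True on every assignment of its n variables."""
    return all(f(*vals) for vals in product([True, False], repeat=n))

assert tautology(lambda p: p or not p, 1)         # excluded middle
assert not tautology(lambda p: p and not p, 1)    # contradiction: never true
# DeMorgan's Rules, stated as biconditionals, are tautologies:
assert tautology(lambda p, q: (not (p or q)) == (not p and not q), 2)
assert tautology(lambda p, q: (not (p and q)) == (not p or not q), 2)
print("DeMorgan's Rules verified by truth table")
```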
Predicates
Multiple Quantifiers
Unique Existence
Predicates
Premise: All men are mortal.
Premise: Socrates is a man.
Conclusion: Socrates is mortal.
The last statement seems an irrefutable conclusion of the premises, yet the validity of this type of argument lies
beyond the rules of sentential logic. The key of the argument is the quantifier "all" that precedes the first premise.
Before we deal with quantifiers let's consider the arithmetic sentence "x + 1 = 2". Here the letter x is called a
variable since the symbol "x" apart from its position in the alphabet has no standard interpretation as a definite
object. In contrast, the symbols "1", "=", and "2" have specific meanings. The first thing we need to specify for a
variable is its Domain or Universe which is the collection of objects (i.e., a set) from which a given variable
takes its particular values. In "x + 1 = 2" the most reasonable domain is some set of numbers (more on these in
Section V). In the sentence "z was president of the United States in 1955" the domain for z would be a set of
human beings.
In instances like the above two examples the variables x and z are said to be "free", in that any member of the
domain is allowed to be substituted into the sentence. The sentence "x + 1 = 2" could be represented by the
symbol S(x) . The sentence S(3) is the false proposition "3 + 1 = 2", while the sentence S(1) is the true proposition
"1 + 1 = 2". A statement like S(x) with free variables is called a predicate or open sentence. Note: the
resemblance to function notation is deliberate. The idea is that the truth of the proposition S(x) depends on or is a
function of the variable x. Thus, some authors refer to predicates as propositional functions. If you were asked to
determine the truth value of S(x), the question would be meaningless. The statement is sometimes true (when x is
replaced by 1) and sometimes false (when x is not replaced by 1). The truth of S(x) is an open question until a
value for x is specified. Similarly, let P(z) be the predicate "z was president of the United States in 1955".
Then P(Dwight David Eisenhower) is true, P(John F. Kennedy) is false, and P(z) is open.
When we introduce quantifiers like all, every, some, there exist, etc., in front of a predicate, the variables in the
sentence are bound by the quantifier. The two quantifiers we will use are the Universal Quantifier, ∀,
commonly read as "for all" or "for every", and the Existential Quantifier, ∃, commonly read as "there exists …
such that" or "for some". For example, ∃z P(z) is interpreted to mean the sentence "there exists z such that z was
President of the United States in 1955". This sentence is considered closed, not open. In fact, it is certainly a true
statement since Dwight David Eisenhower is just such an individual! If we wanted to be more specific as to the
domain of this statement, we could introduce the additional predicate H(x) "x is a human being" into the closed
sentence ∃z (H(z) ∧ P(z)). Again this is a true statement since substituting Dwight David Eisenhower
for z makes H(z) ∧ P(z) true.
Note: the predicate H(x) was defined using the variable x rather than z. The choice of variable names in the
definition of a predicate or in a quantified closed sentence is usually arbitrary. Such variables are often called
"dummy" variables, since the choice of the name used is immaterial to the interpretation of the sentence. Thus,
the statements ∃z P(z) and ∃x P(x) are equivalent.
The closed sentence ∀z (H(z) ∧ P(z)) is interpreted as "everything is both a human being and president of the
United States in 1955". Clearly, for any domain having more members than Dwight David Eisenhower, this is a
false statement. Using a similar analysis, ∃x S(x) is true since x = 1 does the job, while ∀x S(x) is false since
only x = 1 does the job.
1. The closed sentence ∀x Q(x) is true if and only if Q(x) is true for every value in the domain of x.
2. The closed sentence ∃x Q(x) is true if and only if Q(x) is true for at least one value in the domain of x.
The rules for negating a quantified sentence: ~∀x Q(x) is equivalent to ∃x ~Q(x), and ~∃x Q(x) is equivalent
to ∀x ~Q(x). For example: it's not the case that every US citizen pays taxes if and only if there is at least one
US citizen who doesn't pay taxes.
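Over a finite domain, ∀ behaves like Python's all() and ∃ like any(), so the quantifier rules can be exercised directly (an illustrative sketch using the predicate S(x): x + 1 = 2 from above):

```python
# Finite-domain analogue of quantifiers: all() plays the role of the
# universal quantifier, any() the role of the existential quantifier.
domain = range(-5, 6)
S = lambda x: x + 1 == 2

assert any(S(x) for x in domain)        # "there exists x, S(x)": x = 1 works
assert not all(S(x) for x in domain)    # "for all x, S(x)" is false
# Negation rule: NOT(for all x, S(x))  iff  there exists x with NOT S(x).
assert (not all(S(x) for x in domain)) == any(not S(x) for x in domain)
print("quantifier negation rule holds on this domain")
```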
Multiple Quantifiers
Often we need predicates with more than one variable. Consider the closed sentence "Every team wants to win all
of its games". Let T(x) be the predicate "x is a team", G(x,y) the predicate "y is a game played by x" andW(x,y) the
predicate "x wants to win y". The above sentence can be symbolized as
∀x ∀y [(T(x) ∧ G(x,y)) → W(x,y)]. Note: The sentence ∀y ∀x [(T(x) ∧ G(x,y)) → W(x,y)], which could be rendered as "In every game played
by any team, the team wants to win", is identical in meaning to ∀x ∀y [(T(x) ∧ G(x,y)) → W(x,y)]. In
general, for any predicate Q(x,y), ∀x ∀y Q(x,y) is equivalent to ∀y ∀x Q(x,y).
As a second example, let S(x,y) be the predicate "the sum of x and y is zero". Let the domains of both x and y be
the set of integers {…, -2, -1, 0, 1, 2, …}. Now consider the following four closed sentences.
∀x ∀y S(x,y)     ∀x ∃y S(x,y)     ∃x ∀y S(x,y)     ∃x ∃y S(x,y)
For example, consider the predicate M(x,y) "x is a human being, and y is a human being, and y is the mother of x".
Then the closed sentence ∀x ∃y M(x,y) could be rendered as "Everyone has a biological mother". While this
statement might not be true due to advances in cloning technology, it is at least plausible. The closed
sentence ∃y ∀x M(x,y) is the absurdity "There is a person who is everyone's biological mother".
The rules of negation for multiple quantifiers follow from repeated application of the rules for negating a single
quantifier. The results are stated and paraphrased below.
~∀x ∀y Q(x,y) ↔ ∃x ∃y ~Q(x,y): It's not true that all x, y pairs make Q(x,y) true if and only if you can find at
least one x, y pair that makes Q(x,y) false.
~∀x ∃y Q(x,y) ↔ ∃x ∀y ~Q(x,y): It's not true that for every x you can find a y that makes Q(x,y) true if and only
if there is at least one x value for which all y values make Q(x,y) false.
~∃x ∀y Q(x,y) ↔ ∀x ∃y ~Q(x,y): It's not true that you can find an x value that for every y value makes Q(x,y)
true if and only if for every x value you can find a y value that makes Q(x,y) false.
~∃x ∃y Q(x,y) ↔ ∀x ∀y ~Q(x,y): It's not true that you can find an x, y pair that makes Q(x,y) true if and only if
every x, y pair makes Q(x,y) false.
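Nested quantifiers over a finite domain become nested all()/any() calls, which lets the order-sensitivity and the negation rules be checked concretely (a sketch using the "sum is zero" predicate from above, on a small symmetric domain):

```python
# S(x, y): the sum of x and y is zero, over the finite domain -3..3.
domain = range(-3, 4)
S = lambda x, y: x + y == 0

# "for every x there exists y": every x has its negative in the domain.
assert all(any(S(x, y) for y in domain) for x in domain)
# "there exists y for every x" is false: no single y works for all x.
assert not any(all(S(x, y) for x in domain) for y in domain)
# Negation rule: NOT(exists y, for all x)  iff  for all y, exists x with NOT S.
assert all(any(not S(x, y) for x in domain) for y in domain)
print("multiple-quantifier negation rules hold on this domain")
```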
Unique Existence
A third quantifier often used is unique existence, ∃!. The sentence ∃!x Q(x) is read as "There exists a
unique x such that Q(x)". This means that there is one and only one value in the domain of x that makes the
predicate Q(x) true. Using the notion of equality, i.e., x = y if and only if x and y are the same object, one has the
following equivalence: ∃!x Q(x) is equivalent to ∃x [Q(x) ∧ ∀y (Q(y) → y = x)].
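On a finite domain, unique existence amounts to counting exactly one witness (an illustrative sketch; the helper name is mine):

```python
def exists_unique(Q, domain):
    """Finite-domain version of 'there exists a unique x such that Q(x)'."""
    return sum(1 for x in domain if Q(x)) == 1

assert exists_unique(lambda x: x + 1 == 2, range(-10, 11))       # only x = 1
assert not exists_unique(lambda x: x * x == 1, range(-10, 11))   # x = 1 and x = -1
print("unique existence distinguishes one witness from two")
```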
Formal Proof
Informal Proof
Conditional Proof
Indirect Proof
Mathematical Induction
Formal Proof
A Formal Proof is a derivation of a theorem that consists of a finite sequence of well-formed formulas. Every
sentence in this sequence is either an axiom, an identified premise (a statement of "fact" that is not an axiom of
the formal system), or follows from previous statements by the rules of inference of the system. The only
"allowed moves" in a derivation are those "sanctioned" by the axiomatic system. If we accept the rules and
axioms but nevertheless doubt the conclusions of a proof, the fault must lie in the validity of the premises.
An axiomatic system for sentential and predicate logic is somewhat arbitrary to set up. One scheme might take as
an axiom or rule of inference what another scheme derives as a theorem from a slightly different set of axioms or
rules of inference. In these notes we will designate as A an axiomatic system for sentential and predicate logic
which has the following rules of inference. The statement preceded by the symbol ∴ is the well-formed formula that
follows from earlier well-formed formulas by the stated rule of inference. The ellipsis … is used to stand for
possible steps in the derivation before or between the well-formed formulas required for the inference.
Conjunction: In a derivation if proposition p is a step and another step is proposition q, you may then conclude
the conjunction of p and q.
Simplification: In a derivation if the conjunction of propositions p and q is a step, you may then conclude p.
Addition: In a derivation if proposition p is a step, you may then conclude the disjunction of p and any
proposition q.
Disjunctive Syllogism: In a derivation if the disjunction of propositions p with q is a step and another step is the
negation of p, you may then conclude q.
Excluded Middle Introduction (EMI): For any step in a derivation you may use the disjunction of any
proposition p with the negation of p.
Universal Specification (US): If you have the assertion ∀x Q(x), you may then conclude Q(s) for any s in the
domain of the variable x.
Existential Specification (ES): If you have the assertion ∃x Q(x), you may then conclude Q(s),
where s represents a constant particular object in the domain of the variable x. This means that s is a new
temporary constant symbol, not a variable!
Universal Generalization (UG): If you produce a derivation of Q(x), where x is a free variable representing
any member of a certain domain, you may then conclude ∀x Q(x).
Existential Generalization (EG): If you produce a derivation of Q(s), where s represents a constant particular
member of the domain of the variable x, you may then conclude ∃x Q(x).
Rules of Replacement: In any derivation you may generate a new step by replacing in a previous
step an occurrence of any of the following propositions with its stated equivalent (↔)
proposition. Note: all of these statements, when considered as biconditionals, are tautologies, as can be
verified with the appropriate truth table.
Title                                   Rule
Commutative Property of OR              p ∨ q ↔ q ∨ p
Commutative Property of AND             p ∧ q ↔ q ∧ p
Associative Property of OR              (p ∨ q) ∨ r ↔ p ∨ (q ∨ r)
Associative Property of AND             (p ∧ q) ∧ r ↔ p ∧ (q ∧ r)
Distributive Property of OR             p ∨ (q ∧ r) ↔ (p ∨ q) ∧ (p ∨ r)
Distributive Property of AND            p ∧ (q ∨ r) ↔ (p ∧ q) ∨ (p ∧ r)
Double Negation                         ~(~p) ↔ p (The parentheses are added to avoid confusion.)
Definition of Implication               (p → q) ↔ (~p ∨ q)
Definition of Equivalence               (p ↔ q) ↔ ((p → q) ∧ (q → p))
DeMorgan's Rule: Negation of an OR      ~(p ∨ q) ↔ (~p ∧ ~q)
DeMorgan's Rule: Negation of an AND     ~(p ∧ q) ↔ (~p ∨ ~q)
Title                                   Rule
Negation of a Universal Quantifier      ~∀x Q(x) ↔ ∃x ~Q(x)
Negation of an Existential Quantifier   ~∃x Q(x) ↔ ∀x ~Q(x)
Commutation of Universal Quantifiers    ∀x ∀y Q(x,y) ↔ ∀y ∀x Q(x,y)
Commutation of Existential Quantifiers  ∃x ∃y Q(x,y) ↔ ∃y ∃x Q(x,y)
There are additional rules of inference which are valid for A. These rules can be derived from the initial set of
rules already given for A , so in some sense they are redundant. Nevertheless, they are pretty intuitive and very
useful.
Hypothetical Syllogism: p → q, …, q → r, … ∴ p → r
Constructive Dilemma: (p → q) ∧ (r → s), …, p ∨ r, … ∴ q ∨ s
Destructive Dilemma: (p → q) ∧ (r → s), …, ~q ∨ ~s, … ∴ ~p ∨ ~r
Rules of Replacement:
Transposition: (p → q) ↔ (~q → ~p)
Exportation: ((p ∧ q) → r) ↔ (p → (q → r))
p      q      r      ((p → q) ∧ (q → r)) → (p → r)
True   True   True   True
True   True   False  True
True   False  True   True
True   False  False  True
False  True   True   True
False  True   False  True
False  False  True   True
False  False  False  True
Suppose we wanted to prove ((p → q) ∧ (q → r)) → (p → r) just using the initial set of rules presented
for A (i.e., we won't take Hypothetical Syllogism as a given rule of inference). Such a derivation is given in the
following proof. The propositions are numbered from 1 starting at the first step. After each step is the rule of
inference used to generate that proposition.
Number Proposition Rule Used
… [The 33 numbered steps of the derivation, which proceed by repeated use of EMI, Addition, the
Commutative Property of OR, and the other rules of replacement, ending at step 33 with
((p → q) ∧ (q → r)) → (p → r).]
QED
Here the abbreviation QED at the bottom of the sequence of propositions is a signal that the derivation is
complete. QED stands for Quod Erat Demonstrandum (Latin for "which was to be shown").
This proof, while correct according to the "rules of the game" of the axiomatic system A, is nearly impossible to
understand! Even if you follow each and every step, it is quite easy to lose your way within the tangle of so many
propositions. The overall strategy of the derivation is certainly not apparent. It would seem that so many trees
have obscured the sight of the forest! More people would be convinced of the truth of this proposition by the
eight-row truth table than by the formal proof. It must also be remembered that this is a proof of a rather
elementary and self-evident result. Imagine the length and complexity of a difficult theorem! The situation is very
analogous to programming a computer in its rather restricted machine language. Programs in machine language
tend to be very long and hard to follow. To quote Roger Penrose in Shadows of the Mind: A Search for the
Missing Science of Consciousness, Oxford University Press, 1994, page 72, "Rules can sometimes be a partial
substitute for understanding, but they can never replace it entirely".
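The eight-row truth-table verification of Hypothetical Syllogism can itself be mechanized in a few lines of Python (an illustrative sketch, not part of the original notes):

```python
from itertools import product

implies = lambda a, b: (not a) or b

# ((p -> q) AND (q -> r)) -> (p -> r) is True in every one of the eight rows.
for p, q, r in product([True, False], repeat=3):
    assert implies(implies(p, q) and implies(q, r), implies(p, r))
print("Hypothetical Syllogism verified in all eight rows")
```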
Informal Proof
For all of these reasons strictly formal proofs are rarely used in practice. Instead a "higher level" more "human"
language consisting of ordinary English (or any other "natural" language) and the rules of inference is employed.
Similar steps are combined or skipped but noted. Some of the simpler details are left to the reader. This rather
"loose" style of proof is called "informal proof" and is used throughout mathematics. A "good" informal proof
outlines the main ideas or constructions of the corresponding formal proof but with less detail and more clarity.
Ideally anyone who reads the informal proof could, if required, supply the missing steps and structure of the
corresponding formal proof. As an example here is an informal version of the proof of the Hypothetical Syllogism
proposition done formally above.
By EMI we have p ∨ ~p, q ∨ ~q, and r ∨ ~r, and by Addition these can be extended to the four propositions
p ∨ q ∨ ~p ∨ r,   p ∨ ~r ∨ ~p ∨ r,   ~q ∨ q ∨ ~p ∨ r,   ~q ∨ ~r ∨ ~p ∨ r.
Using the Associative and Commutative Properties of OR these can all be rearranged as follows.
(p ∨ q) ∨ (~p ∨ r),   (p ∨ ~r) ∨ (~p ∨ r),   (~q ∨ q) ∨ (~p ∨ r),   (~q ∨ ~r) ∨ (~p ∨ r)
Forming the conjunction of these four propositions and using the Distributive Property of OR to "factor out"
(~p ∨ r) gives
((p ∧ ~q) ∨ (q ∧ ~r)) ∨ (~p ∨ r).
By Double Negation and DeMorgan's Rule for the negation of a disjunction we have
~(~p ∨ q) ∨ ~(~q ∨ r) ∨ (~p ∨ r), which by the Definition of Implication is
~((p → q) ∧ (q → r)) ∨ (p → r), i.e., ((p → q) ∧ (q → r)) → (p → r). QED
Conditional Proof
Many if not most theorems are stated as conditionals of the form p → q, where the conclusion q follows as an
implication of the premise p. Thus, if we can establish that q follows from p, the conditional must be true. This
observation is the basis for the method of Conditional Proof (CP). Any theorem proved by CP could also be
proven by Formal Proof, so CP need not be assumed as an additional Rule of Inference. On the other hand,
conditional proofs are often very intuitive and easy to understand. Furthermore they illustrate in a very direct way
the logical connection between the premise p and the conclusion q. The form of CP is as follows:
The right arrow before p indicates that this statement is the initial premise of the CP. The initial premise and all
statements up to and including the conclusion q are prefaced by a vertical line meaning these statements are
"under the assumption" of p. The horizontal line "closes off" the steps under the assumption of p.
As an example consider the following conditional proof of the Hypothetical Syllogism proposition.
Note: This proof makes use of a Conditional Proof "nested" within a Conditional Proof! This is
legitimate as long as each nested initial premise is closed off before any preceding initial premises
are closed off. The structure of this proof makes a very convincing demonstration of the validity of
the rule of Hypothetical Syllogism.
Indirect Proof
A special case of Conditional Proof is to assume p and then reach as a contradiction the conjunction
of q and ~ q for some sentence q. This serves to establish that p was not true to begin with. Hence,
we conclude ~ p. This method is attributed to Plato and often goes by the Latin name Reductio ad
absurdum ("reduce to the absurd"). The method of Indirect Proof is related to the reasoning used in
Hypotheses Testing in statistics (an application of Inductive Logic), where one assumes the Null
Hypothesis and then tries to show that it can't be supported by the available empirical evidence. A
formal justification of the method of Indirect Proof is presented below.
A famous example of Indirect Proof is Euclid's theorem that there are infinitely many prime numbers. A prime
number is an integer larger than 1 whose only positive divisors are 1 and itself: it leaves a non-zero remainder
when divided by any integer greater than 1 other than itself.
In this proof we take as given the Fundamental Theorem of Arithmetic: every integer greater than 1 can be
expressed uniquely (apart from a reordering of factors) as a product of its prime factors. If the integer
is itself prime, it is the only factor in the product. For example, 3560 = 2·2·2·5·89, and the only way
3560 can be written as a product of prime factors is to have exactly three factors of 2, one factor of 5,
one factor of 89, and no other factors.
Assume there are a finite number, n, of prime numbers. Call them p1, p2, … , pn. Now consider
the number L = p1·p2·…·pn + 1. This number is larger than all of the existing prime
numbers and is therefore not a prime. All of the n prime numbers when divided into L leave a
remainder of 1. Since L is not itself prime, it has no prime factors, in contradiction to the Fundamental
Theorem of Arithmetic. Therefore, our assumption of a finite number of primes is false. There must
be infinitely many prime numbers.
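Euclid's argument can be illustrated numerically. A small Python sketch (the list of "all" primes is of course a pretend assumption for illustration):

```python
# Pretend the list below were *all* the primes, and form L = p1*p2*...*pn + 1.
from math import prod

primes = [2, 3, 5, 7, 11, 13]
L = prod(primes) + 1                      # L = 30031
remainders = [L % p for p in primes]
print(L, remainders)                      # every remainder is 1
# L is not divisible by any listed prime, yet 30031 = 59 * 509,
# so its prime factors are "new" primes missing from the list.
assert all(r == 1 for r in remainders)
```

Either L is itself prime, or (as here) it factors into primes not on the list; both cases contradict the list being complete.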
Suppose you want to establish ~(∀x)P(x). By the Rule of Replacement for the Negation of a Universal
Specification this is equivalent to showing (∃x)~P(x). We need only find a particular instance in the domain
of x for which P(x) is false. This is often a good strategy when confronted with a universal statement whose
validity you suspect. Maybe it's not true! All you have to do is find a case that doesn't work. For example,
consider the assertion that for all integers n, n > 1, there are no positive integer solutions for x, y, and z to the
equation xⁿ + yⁿ = zⁿ. Since for n = 2, x = 3, y = 4, and z = 5 is a counterexample, the assertion must be false.
For n > 2, however, the statement is true and is known as Fermat's Last Theorem. This famous conjecture
remained unresolved until Andrew Wiles finally proved it in 1994.
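The counterexample strategy is easy to explore by computation. A Python sketch (the search bounds are arbitrary choices for illustration):

```python
# n = 2 has a solution (the 3-4-5 triple), so the "no solutions for n > 1" claim fails.
assert 3**2 + 4**2 == 5**2

# For n = 3, a brute-force search in a small range turns up nothing,
# consistent with (but of course not proving) Fermat's Last Theorem.
solutions_n3 = [(x, y, z)
                for x in range(1, 30)
                for y in range(x, 30)
                for z in range(1, 60)
                if x**3 + y**3 == z**3]
print(solutions_n3)   # []
```

A failed search is only evidence, not proof; a found triple, however, would refute the universal claim outright.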
Mathematical Induction
The principle of Mathematical Induction, despite the word "induction", is a method of Deductive Logic. It is used
to prove sentences of the form (∀n)(n ≥ b → P(n)).
Here the domain of n is the set of Integers ( Z = { … , −2, −1, 0, 1, 2, … } ), b is a specific integer, and P(n) is a
predicate. The principle can be stated as follows:
1. Base Case: P(b) is true. This is usually established by substitution of b into the predicate P(n).
2. Induction Step: For any integer k ≥ b, P(k) → P(k + 1). This is usually established by assuming P(k) (the
Induction Hypothesis) and deriving P(k + 1).
Once we have these two facts we can construct the following "argument".
This proof never stops. It is an infinite string of implications. It seems reasonable to make the assertion that we
have established P(n) for all integers greater than or equal to b. Mathematical Induction is not a rule of inference
of predicate logic. Its validity must be assumed in addition to the rules of logic already presented. Mathematical
Induction is sometimes considered as a rule of "metalogic". This means that it's a rule we can use to prove
properties about axiomatic systems rather than just proving statements within axiomatic systems. Sometimes
Mathematical Induction is pictured as a line of dominoes set up so that each domino falling over causes the next
to fall (this is the Induction Step). Thus, knocking over the first domino (the Base Case) causes all of the
dominoes to fall. The only problem with this visualization is that in Mathematical Induction we've got infinitely
many dominoes which is beyond our experience!
As an example of Mathematical Induction we can prove that the Rule of Replacement known as the Distributive
Property of OR can be extended to a disjunction with a conjunction of n sentences.
Base Case: n = 2 is just the accepted Distributive Property of OR: p ∨ (q1 ∧ q2) ≡ (p ∨ q1) ∧ (p ∨ q2).
Induction
Step: Assume p ∨ (q1 ∧ q2 ∧ … ∧ qn) ≡ (p ∨ q1) ∧ (p ∨ q2) ∧ … ∧ (p ∨ qn) is
true. Now consider p ∨ (q1 ∧ q2 ∧ … ∧ qn ∧ qn+1). By the Associative Property of AND this is
equivalent to p ∨ ((q1 ∧ q2 ∧ … ∧ qn) ∧ qn+1). Using the Distributive Property of OR this is equivalent
to (p ∨ (q1 ∧ q2 ∧ … ∧ qn)) ∧ (p ∨ qn+1). Using the equivalence stated in the Induction Hypothesis
gives the equivalent expression (p ∨ q1) ∧ (p ∨ q2) ∧ … ∧ (p ∨ qn) ∧ (p ∨ qn+1).
As a second example consider the formula for the sum of the cubes of the first n natural numbers. Natural
numbers are the positive integers {1, 2, 3, 4, 5, 6, … } .
The claim is that S(n) = 1³ + 2³ + … + n³ = [n(n + 1)/2]². In sigma notation, S(n) = Σ j³, where the terms start
at j = 1 (the meaning of the expression below the Σ) and end at j = n (the meaning of the expression above the
Σ). S(n) is a function (more about these in Section V) whose input is n and whose output is the stated formula.
This formula can be verified by computation for any specific value of n. For example,
S(3) = 1³ + 2³ + 3³ = 36 = [3(3 + 1)/2]².
Base Case: n = 1. S(1) = 1³ = 1 = [1(1 + 1)/2]².
Induction Step: Assume S(n) = 1³ + 2³ + … + n³ = [n(n + 1)/2]²
is valid.
Consider S(n + 1) = S(n) + (n + 1)³; by the Induction Hypothesis this is equal to
[n(n + 1)/2]² + (n + 1)³ = (n + 1)²[n²/4 + (n + 1)] = (n + 1)²(n² + 4n + 4)/4 = [(n + 1)(n + 2)/2]².
This establishes the Induction Step. Hence the formula is valid for any natural number n.
Mathematical Induction is an indispensable tool of mathematics, but it generally only validates results that we
arrive at by other means. In the above example, Mathematical Induction verifies the formula for S(n) but does not
come up with the formula to begin with!
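In the same spirit, the formula is easy to spot-check by direct computation; a short Python sketch:

```python
# Verify S(n) = 1^3 + 2^3 + ... + n^3 = [n(n+1)/2]^2 by direct computation.
def S(n):
    return sum(j**3 for j in range(1, n + 1))

for n in range(1, 20):
    assert S(n) == (n * (n + 1) // 2) ** 2
print(S(3))   # 36, matching [3(3+1)/2]^2
```

Such a check can only test finitely many cases; the induction proof is what covers all of them.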
In some problems a modified version of Mathematical Induction called "Strong" or "Complete" Mathematical
Induction is used. The version presented above is then called "Weak" or "Ordinary" Mathematical Induction.
Complete Mathematical Induction can be formulated as follows:
1. Base Case: P(b) is true.
2. Induction Step: For any integer k ≥ b, [P(b) ∧ P(b + 1) ∧ … ∧ P(k)] → P(k + 1).
In fact, Complete Mathematical Induction can be derived from Ordinary Mathematical Induction by defining Q(n)
to be the predicate P(b) ∧ P(b + 1) ∧ … ∧ P(n). Certainly Q(n) → P(n), so Ordinary Induction on Q(n)
establishes Complete Induction on P(n). For some theorems the use of Complete Mathematical Induction makes
the argument easier to follow.
Sets of Numbers
Russell's Paradox
Set theory was developed in the second half of the Nineteenth Century. It has its roots in the work
of Georg Cantor, although contributions of others such as Gottlob Frege and Giuseppe Peano were
significant. Ultimately, the goal of Set Theory was to provide a common axiomatic basis for all of mathematics.
In some sense, mathematics could then be reduced to logic. Attempts to provide an axiomatic basis for
mathematics were undertaken by such prominent individuals as Bertrand Russell, Alfred North Whitehead,
and David Hilbert.
A set is a collection of things of any kind. If B is a set we call the "things" in B the elements or
members of B. In symbols, b ∈ B means that b is an element of B. Similarly, for a set B the statement b ∉ B
means that the object b is not in B. Sets themselves are often symbolized by enclosing their elements within
"curly brackets". For example {Jimmy Carter, Ronald Reagan, George Bush, Bill Clinton} represents the set of
US presidents during the years 1980 to 1995. Note: The use of { } , which is also employed as a grouping
symbol, for sets could be confusing, but generally the context makes what is intended clear.
Sets can also be described by a rule or predicate members must satisfy. If P(x) is the predicate "x was a president
of the US during the years 1980 to 1995", then the set listed above could be symbolized as {x | P(x) }. This is
rendered in words as "the set of all x such that P(x)". The "|" stands for the expression "such that", although some
authors prefer ":" to mean the same thing. The use of curly brackets and a rule to specify sets is called set-builder
notation and is used throughout mathematics.
1. Axiom of Abstraction (or Comprehension): Any collection of objects that can either be listed or described by
some predicate constitutes a set. Furthermore, an object is a member of a set if and only if it is one of the objects
listed or satisfies the open predicate describing the set. In symbols, given any predicate P(x), the set {x | P(x)}
exists and (∀y)(y ∈ {x | P(x)} ↔ P(y)).
2. Axiom of Extensionality: Given two sets A and B, A = B if and only if (∀x)(x ∈ A ↔ x ∈ B). In words, two
sets are identical if and only if they have exactly the same elements.
Let A = { Bob Jones, Felix the cat, Mars, 9 } and B = { 9, Mars, Bob Jones, Felix the cat}. By the application of
the axioms A = B, since every member of one set is also in the other set. Thus, the order in which elements are
listed is immaterial in defining a set. A set has no characteristics other than its status as a collection of its
elements.
These two axioms seem very intuitive and self-evident. In fact, they seem almost devoid of content. You might
suspect that they lack sufficient power to prove anything of importance. In fact, these two axioms
are too powerful! You can prove absolutely everything from them because they are self-contradictory! We will
discuss this later when we outline Russell's Paradox. Suffice it to say that the problem lies in the Axiom of
Abstraction's lack of restriction on the predicate P(x). Thus, by the Axiom of Abstraction, we are allowed to
formulate things such as the set of all sets. By making our universe too big we are begging for trouble! In our use
of Naïve set theory we will purposely avoid such constructions.
For a given predicate P(x), consider the set Q = {x | P(x) ∧ ~P(x)}. Suppose for some a, a ∈ Q; then by the
Axiom of Abstraction, P(a) ∧ ~P(a), a contradiction. Hence, by Indirect Proof, we've established that the set Q has no
members, i.e., (∀x)(x ∉ Q). This set is called the empty Set or the null Set and is commonly
symbolized by either empty brackets { } or the symbol ∅. The empty set goes by infinitely many aliases. For
example, the following sets are all different names for the one and only empty set. (Z is the "standard" name for
the set of integers.)
Note: The set {∅} is not the empty set. This is a set of sets which has an element, namely the null set.
Some sets are contained entirely within other sets. For example, the set of women is contained within the set of
human beings. We say that A is a subset of B. In symbols, A ⊆ B ↔ (∀x)(x ∈ A → x ∈ B).
In words, given any two sets A and B, A is a subset of B if and only if every element of A is also an element of B.
Some authors use the symbol ⊂ instead of ⊆ for inclusion. In analogy to the symbols < and ≤, we will use ⊆
for inclusion and will reserve A ⊂ B to mean that A is a proper subset of B. This means that A is contained
in B, but is not all of B. In symbols, if A and B are sets, A ⊂ B ↔ (A ⊆ B ∧ A ≠ B). Since p → q
is always true when p is false and since x ∈ ∅ is always false, for any set A, it is true
that (∀x)(x ∈ ∅ → x ∈ A). Hence, the empty set is a subset of every set, ∅ ⊆ A. From the
notion of subset we can characterize set equality by the following theorem: given two sets A and B,
A = B ↔ (A ⊆ B ∧ B ⊆ A).
Proof: Assume A = B; then by the Axiom of Extensionality every element in A is also in B and every
element in B is also in A, so A ⊆ B and B ⊆ A. So A = B → (A ⊆ B ∧ B ⊆ A).
Now assume A ⊆ B and B ⊆ A. Pick any x. If x is in set A, since A ⊆ B, x is also in B. If x is in set B, since
B ⊆ A, x is also in A. So (∀x)(x ∈ A ↔ x ∈ B). So by the Axiom of Extensionality, A = B. So
(A ⊆ B ∧ B ⊆ A) → A = B.
Consider the set L = {a, b, c}. The full list of the 8 subsets of L is as follows:
∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}.
The set of all possible subsets of a given set A is called the power set of A, in symbols P(A). Thus, for the set L,
P(L) = { ∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c} }.
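The power set is easy to generate by machine; a Python sketch using the standard library (the helper name `power_set` is an illustrative choice):

```python
from itertools import combinations

def power_set(s):
    """Return all subsets of s, from the empty set up to s itself."""
    items = list(s)
    return [set(c) for r in range(len(items) + 1)
                   for c in combinations(items, r)]

L = {'a', 'b', 'c'}
subsets = power_set(L)
print(len(subsets))   # 8, matching the list above (2^3 subsets)
```

A set of n elements always yields 2ⁿ subsets, a fact proved by induction at the end of these notes.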
Given two sets A and B we can define the following binary set operations.
1. The union of A and B : A ∪ B = {x | x ∈ A ∨ x ∈ B}.
2. The intersection of A and B : A ∩ B = {x | x ∈ A ∧ x ∈ B}.
3. The relative complement of A in B : B − A = {x | x ∈ B ∧ x ∉ A}.
The union of two sets is everything in either one of them. It is the "sum total" of the two sets. The intersection of
two sets is the "overlap", i.e., the part of the two sets which is common to both. If the intersection is empty, we
say the two sets are disjoint. This means they have no elements in common. In probability theory two disjoint
events are called mutually exclusive, which means one event happening excludes the possibility of the other
having happened. The relative complement of A in B are the elements in B without the ones also in A. One might
read this as B "minus" A or B "without" A.
For example, if A = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10} and B = {0, 2, 4, 6, 8, 10, 12, 14, 16}, then
A ∪ B = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16}, A ∩ B = {0, 2, 4, 6, 8, 10}, and B − A = {12, 14, 16}.
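Python's built-in set type implements these three operations directly (`|` for union, `&` for intersection, `-` for relative complement), so the example can be checked mechanically:

```python
A = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
B = {0, 2, 4, 6, 8, 10, 12, 14, 16}

print(sorted(A | B))   # union: everything in either set
print(sorted(A & B))   # intersection: [0, 2, 4, 6, 8, 10]
print(sorted(B - A))   # relative complement of A in B: [12, 14, 16]
```

Note that `B - A` reads naturally as B "minus" A, exactly the phrasing used above.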
From the definitions of intersection, union and inclusion, we can immediately establish that for any
sets A and B, A ∩ B ⊆ A, A ⊆ A ∪ B, and A ∩ B ⊆ A ∪ B.
Applying the indicated tautology or contradiction along with the set operation definitions establishes
each of the following set operation identities for any sets A, B, and C.
1. A ⊆ B → A ∩ B = A
2. A ⊆ B → A ∪ B = B
Proof:
1. Assume A ⊆ B. Now consider any x in A. Since A ⊆ B, x is also in B. Therefore x is in A ∩ B. Hence,
A ⊆ A ∩ B. Since A ∩ B ⊆ A, we have A ∩ B = A.
2. Assume A ⊆ B. Now consider any x in A ∪ B. Then x is either in A or B, but since A ⊆ B, if x is in A, it is
also in B. Therefore x is in B.
Hence, A ∪ B ⊆ B. Since B ⊆ A ∪ B, we have A ∪ B = B.
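These two identities can also be spot-checked by randomized testing; a Python sketch (the set sizes and value range are arbitrary choices):

```python
import random

# Draw a random set B, pick a random subset A of it, and check
# that A & B == A and A | B == B whenever A is a subset of B.
for _ in range(1000):
    B = set(random.sample(range(20), 10))
    A = set(random.sample(sorted(B), 5))   # A is a 5-element subset of B
    assert A & B == A
    assert A | B == B
print("identities hold on all random trials")
```

Randomized checks build confidence but, like any finite test, are no substitute for the short proofs above.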
As was noted earlier, allowing the scope of sets to become too large (such as the set of all sets) can lead to
contradictions. This can often be avoided by imagining that all the sets under consideration are subsets of some
fixed universe. Some authors use U for such a set, but we will call it S to avoid confusion with the set operation
of union. S is also the symbol used in probability for the universe of all possible outcomes of an experiment. In
this context S is called the Sample Space. A common usage is to abbreviate A ∩ B by AB. This of course is a
more concise notation. It also reflects the definitions of Boolean Algebra, where multiplication, as indicated by
the juxtaposition of two variables, is defined as set intersection and addition is set union. Working within the
restriction of a universe S allows us to define the complement of a set A, written A′, as the relative complement
of A in S. Since the complement of A is that part of "everything" (i.e., S ) that's not A, we have the interpretation
of complement as negation. In symbols, A′ is "like" ~A.
Assuming that the sets A and B are subsets of S and using the algebra of sets we can establish the following:
(A ∪ B)′ = A′ ∩ B′ ; (A ∩ B)′ = A′ ∪ B′ ; (A′)′ = A
Set operations in a universe S can be viewed graphically by a Venn diagram. Here the rectangle represents S and
any subset of S is drawn as a closed figure within this rectangle. This is illustrated below.
Sets of Numbers
The most basic numbers are the natural numbers ("counting" numbers or positive integers). While
there are ways to construct the natural numbers using sets, such a development is beyond the scope
of these notes. Furthermore, this construction is not intuitive and strikes many people
(mathematicians included) as artificial. Thus, we will consider the natural numbers as undefined and
primitive objects. This approach can also be taken to extremes as seen in the work of Leopold
Kronecker, who declared "God created the integers, all else is the work of man". The set of natural
numbers is often symbolized as either N or Z+. If the ellipsis is understood to represent the same
sequence repeating without end, then N = { 1, 2, 3, 4, 5, 6, 7, … }.
After much resistance zero was eventually accepted as a "valid" number. The set of whole numbers is given by
W = { 0, 1, 2, 3, 4, 5, … }.
Since not every mathematical process (for example, measurement with a ruler) results in an integer answer, we
need to consider numbers that are fractions or ratios of integers. Assuming that the process of division is
understood, we can write such a number as a ratio p/q of integers p and q.
For integers this can be interpreted as repeated subtraction just as multiplication can be interpreted as repeated
addition. Thus, 34 divided by 5 is 6, remainder 4. This means that we can subtract 5 from 34 six times before we
get a remainder of 4 less than 5. Essentially, the computation tells us how many 5's are in
34. There are six 5's in 34 with 4 left over. This is one reason why division by zero is meaningless. How many 0's
are in 34? How many 0's could we subtract from 34 until we get a remainder less than 0? So when we
write p/q, we assume that q is not zero. We therefore define the set of rational numbers as the
set of all permissible ratios of integers. In symbols, the set of rationals is given by
Q = {p/q | p ∈ Z ∧ q ∈ Z ∧ q ≠ 0}. Rational numbers can also be characterized by their decimal expansions:
Q = {x | x has a terminating or repeating decimal representation }. In fact this definition can be shortened to just
the set of repeating decimals. Any terminating decimal is a repeating decimal with a repeating string of 0's. For
example, 3.458 = 3.458000… . One problem with defining rational numbers this way is that the decimal
representation of some rational numbers is not unique. Consider the decimal 0.49999… , where it is understood
that the string of 9's never stops. This is the rational number one half which is also represented by the terminating
decimal 0.5 . This redundancy in representation exists for every rational number that is a terminating decimal. To
understand the equivalence of these forms requires a familiarity with limits and infinite geometric series. These
concepts are beyond the scope of the present notes. However, the following computation may make the result at
least seem plausible. Let x = 0.49999… . Then 10x = 4.9999… , so 10x − x = 4.9999… − 0.49999… = 4.5.
Hence 9x = 4.5 and x = 0.5.
Since we would like our decimal representation of rational numbers to be unique, we will assume that all numbers
that end in an infinite string of 9's have been rewritten as the corresponding terminating decimal.
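To see numerically how the partial sums 0.49, 0.499, 0.4999, … close in on 0.5, here is a small sketch using Python's exact `Fraction` type (this computation is an illustration, not part of the original notes):

```python
from fractions import Fraction

# Partial decimals 0.49, 0.499, 0.4999, ... as exact fractions.
x = Fraction(4, 10)
for k in range(2, 8):
    x += Fraction(9, 10**k)                      # append one more digit 9
    print(float(x), float(Fraction(1, 2) - x))   # gap to 0.5 shrinks tenfold each step
```

The gap to 0.5 is exactly 1/10ᵏ after k digits, which is why the infinite string of 9's leaves no gap at all.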
Still there are processes whose answers cannot be rational. Pythagoras is said to have known that the square root
of 2 is such a number. A number that is not rational is called irrational. This certainly does not mean that the
number is "deranged" or even illogical, it simply means that the number cannot be expressed as a ratio of
integers. Other Greek mathematicians, such as Theodorus, were aware of additional irrational quantities.
The proof that √2 is irrational was one of the first examples of indirect proof. It assumes some familiarity with
arithmetic and is presented below.
Assume that √2 = P/Q, where P and Q are both positive integers without any common factors. If P and Q were
not originally in lowest terms, we could remove as many common factors as necessary until we arrived at a
rational representation of √2 that was in lowest terms. So without any loss of generality, we will assume
that P and Q share no common factors. Squaring both sides gives 2 = P²/Q², so P² = 2Q². Thus P² is even, and
since the square of an odd number is odd, P must itself be even, i.e.,
P = 2M for some natural number M. Substituting this into the expression for P squared
yields 4M² = 2Q², so Q² = 2M². So Q must be even since it has an even square. Hence, 2 divides
both P and Q in contradiction to the assumption that P and Q have no common factors. Therefore, it is impossible
to express √2 as a rational number.
In fact, using the Fundamental Theorem of Arithmetic, we can show that the n'th root of any natural number that
is not a perfect n'th power must be irrational. Let a be a natural number that is not a perfect n'th power. This
means that there is no natural number m with the property that mⁿ = a. Assume, for contradiction, that the n'th
root of a equals P/Q, where P and Q are natural numbers with no common factors. Note that Q > 1, since Q = 1
would make a = Pⁿ, a perfect n'th power. Raising both sides to the n'th power gives Pⁿ = a·Qⁿ. Now, the prime
factors of Pⁿ are the same as the prime factors of P, i.e., p1, p2, … pJ, and the prime factors of Qⁿ are the same
as the prime factors of Q. Only the multiplicity (how often each factor occurs) has changed.
However, Pⁿ = a·Qⁿ, so we have a second factorization of Pⁿ, one which involves the prime factors of Q,
primes that do not divide P. This is a contradiction of the Fundamental Theorem of Arithmetic. So for a not a
perfect n'th power, the n'th root of a is irrational.
By taking the union of the set of rational numbers with the set of irrational numbers we form a bigger set, R,
called the set of real numbers. It is convenient to represent the real numbers geometrically as points on a "number
line". This is illustrated below.
Often we want to work with subsets of real numbers that correspond to intervals on the number line. Two
common examples are the open interval (a, b) = {x | a < x < b} and the closed interval [a, b] = {x | a ≤ x ≤ b}.
The entire real number line can be represented using this notation: R = (−∞, ∞). Here the symbol ∞ is "infinity"
and is not any kind of real number. It is used in this context to mean that the interval has no bounds.
An ordered pair is a pair of objects in which (unlike a set) order makes a difference. We represent ordered pairs
by (a, b) . This use of parentheses reflects an earlier time when the choice of typographical symbols was limited.
Unfortunately, it is the third use of parentheses we've encountered! The earlier uses were as a grouping symbol
and to designate an open interval of real numbers. Reluctance to change notation means that we'll have to rely on
context to distinguish which use of parentheses is intended. In (a, b), a is called the first component and b is called
the second component. The main idea of an ordered pair is that for two ordered pairs to be equal their components
must match.
In fact, the ordered pair (a, b) can be defined as the set of sets { {a}, {a, b} }, which ensures the above property.
A set of ordered pairs is called a relation. In symbols this can be written as follows:
R is a relation ↔ R = {(x, y) | xRy}.
Here, in the second statement we have introduced an abbreviated form of set-builder notation in which we
designate immediately after the curly brackets that this will be a set of ordered pairs. The nature or conditions of
the relation are given by the predicate xRy. For example, let xMy be the statement "y is the mother of x". Then the
relation M is a set whose elements are of the form (child, mother). We might say this relation defines the
"relationship" of motherhood. In fact, this is why the name relation was chosen to designate a set of ordered pairs.
The set of values of the first component of the ordered pairs in a relation is called the domain of the relation and
is designated as Dom(R). The set of values of the second component of the ordered pairs in a relation is called
the range of the relation and is designated as Rng(R). Here, R means the given relation, not the set of real
numbers.
Dom(R) = {x | (∃y)((x, y) ∈ R)}
Rng(R) = {y | (∃x)((x, y) ∈ R)}
In the relation M, Dom(M) is the set of all individuals who were someone's child, while Rng(M) is the set of all
individuals who are mothers. Undoubtedly, the intersection of these two sets is not empty!
Given any relation, we can define its inverse by reversing each ordered pair. The inverse of M, usually designated
by M-1 would be the set of all (mother, child) pairs. In general, we define the inverse of a relation as follows:
R-1 = {(y, x) | (x, y) ∈ R}.
From this it is obvious that Dom(R-1) = Rng(R) and Rng(R-1) = Dom(R). As a second example,
if R were the relation consisting of all (name, phone number) pairs in a community (i.e., the contents of a
"phone book"), then R-1 would be an "inverse phone book" with entries of the form (phone number,
name).
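Representing the "phone book" relation as a set of ordered pairs, the inverse is literally just pair reversal; a Python sketch (the entries are invented for illustration):

```python
# A tiny "phone book" relation: (name, number) pairs (entries are invented).
phone_book = {("Alice", "555-0100"), ("Bob", "555-0111"), ("Carol", "555-0122")}

# The inverse relation reverses every ordered pair: (number, name).
inverse = {(number, name) for (name, number) in phone_book}
print(sorted(inverse))

dom = {name for (name, number) in phone_book}
rng = {number for (name, number) in phone_book}
assert {a for (a, b) in inverse} == rng   # Dom(R^-1) = Rng(R)
assert {b for (a, b) in inverse} == dom   # Rng(R^-1) = Dom(R)
```

The two assertions check the domain/range swap stated above on this small example.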
So far we have allowed the predicate relationship xRy to define the relation and its domain and range.
An alternate approach is to start with two sets A and B and define the relation (and hence the
relationship) as all possible pairings of elements of A with elements of B. This is called the Cartesian
(after René Descartes) cross product of A with B: A × B = {(a, b) | a ∈ A ∧ b ∈ B}.
Thus in general, the Cartesian product is non-commutative, i.e., A × B ≠ B × A. The set commonly
called R × R (or R²) is the set of all ordered pairs of real numbers. Just as R represents all the points on a
line, R × R represents all the points in a plane. Thus R × R is referred to as the Cartesian plane.
The schematic below graphically represents a relation between two sets A and B. From the 10 connections drawn
between the two sets we can list the 10 ordered pairs in R. Thus, each connection represents a "relationship"
between an object in the domain and an object in the range. Note: In a relation, a given element in the domain can
connect to more than one element in the range and vice versa.
A relation R is called reflexive on a set A if and only if (∀x)(x ∈ A → xRx). A relation R is called symmetric
if and only if (∀x)(∀y)(xRy → yRx), and transitive if and only if (∀x)(∀y)(∀z)((xRy ∧ yRz) → xRz).
Note:
1. If R is symmetric, then Dom(R) = Rng(R). To prove this, let x be any element of Dom(R), then there must be
some y in Rng(R) with xRy. Since R is symmetric, we must also have yRx. Hence, x is also an element of Rng(R)
and Dom(R) ⊆ Rng(R). Now let y be any element of Rng(R), then there must be some x in the Dom(R) with xRy.
Again since R is symmetric, we must also have yRx. Hence, y is an element of Dom(R) and
Rng(R) ⊆ Dom(R).
2. If R is symmetric and transitive, then R is reflexive on Dom(R). Since R is symmetric, we have Dom(R) =
Rng(R).
Let x be any element of Dom(R), then there is a y in Dom(R) with (x, y) in R. Since R is symmetric, we have both
xRy and yRx. Since R is transitive, xRy and yRx implies xRx. Hence, for all x in Dom(R), we have xRx.
If a relation is reflexive, symmetric, and transitive, it is called an equivalence relation on the set A. The
prototypical equivalence relation is equality, which obviously has all three properties. A slightly different
example of an equivalence relation is "x weighs the same as y", which of course does not assert that x and y are
the same object. It merely states that when the objects are weighed the measurements are equal. A third example
of an equivalence relation is "x has the same shape as y". This is the geometric equivalence relation called
similarity.
The significance of an equivalence relation on a set A is that it allows us to partition A into disjoint subsets all of
which have the same property. For example, we could partition a set of people into "equal weight" groups, or the
set of all possible triangles into subsets based on similarity (shape).
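Partitioning by an equivalence relation is a common programming task; a Python sketch of the "x weighs the same as y" example (the objects and weights are invented for illustration):

```python
from collections import defaultdict

# Each object is mapped to its weight; "weighs the same as" is the relation.
weights = {"apple": 150, "orange": 150, "melon": 1200, "grape": 5, "plum": 5}

classes = defaultdict(set)
for obj, w in weights.items():
    classes[w].add(obj)        # equal-weight objects land in the same class

print(dict(classes))
# The classes are disjoint and their union recovers the original set.
```

Grouping by a key function like this is exactly a partition into equivalence classes: every object lands in one and only one class.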
Often we are interested in relationships between two sets of objects in which the choice of an object in one set
completely and unambiguously determines the related object in the second set. For example, if a TV is working
properly, the choice of channel number should unambiguously determine one and only one picture! We often
label the first set, that controls the choice of objects in the second set, as the input set or the independent
variable set. The second set is then called the output or dependent variable set. The idea of these terms is that
there is "some freedom" in the choice of the independent variable (like the TV channel), while the output seen
depends on this choice. In such a determined relationship it is customary to state the independent variable as the
first component (it's selected first after all), and the output as the second component. The key restriction is that for
a given input, there must be one and only one (i.e., a unique) output. The input determines the output, so we
should only see one response. Thus, in this kind of relation, an ordered pair with a given first component can only
appear once! Such a relation is called a function.
A second notation for functions is f : A → B. Here A is the domain of f, and the "action" of the function is
to map or transform (the → symbol, which in this context does not mean implication!) elements of this
domain into elements of the set B. The notation of a function "of a set", f (A), means the range of f when the
domain of f is A:
f (A) = Rng(f ) = {f (x) | x ∈ A}. In general, the range is only a subset of B. If the Rng(f ) = B, we
say the function is onto or surjective. To be more precise,
f is onto B if and only if (∀y)(y ∈ B → (∃x)(x ∈ A ∧ f (x) = y)).
In the example f (x) = x², we would write f : R → R; the function is into R, but not onto R.
What makes a function a function is that for every input in the domain there is one and only one output in the
range.
This means a particular first component or input occurs in one and only one ordered pair in the complete set of
ordered pairs that constitutes the function. If in addition, we have that every second component or output occurs
in one and only one ordered pair in the function, we say that the function is 1-1 or injective.
This means that every output in the range is generated once and only once by its own particular input. Hence, if
the outputs match, then the inputs must match. This gives us the following definition:
f is 1-1 if and only if (∀x1)(∀x2)((f (x1) = f (x2)) → x1 = x2).
In the TV example discussed earlier, the function that converts channels into pictures is usually not 1-1. It is quite
possible that the same picture is being shown on different stations. If the president is addressing the nation, I don't
conclude my set is broken just because the same picture pops up on different channels. The picture is still a
function of channel, it's just not a 1-1 function. The function given by f (x) = x² is also not 1-1, since we
get the same output for both a given number and its negative.
One important property of 1-1 functions is that the inverse relation f -1 is also a function. Recall that the
inverse of a relation is set of ordered pairs obtained by switching the order of all pairs in the original
relation. For a function this means switching input with output. The domain of f -1 is the range
of f and the range of f -1 is the domain of f.
Suppose for some x, y, and z that (x, y) and (x, z) are all in f -1. Then (y, x) and (z, x) are all in f , i.e., f (y)
= x and f (z) = x. Since f is 1-1, it must be that y = z. Hence, we have established that each input to f -1
produces a unique output, so that f -1 is a function.
Since f -1 is a function, we can write the relation in two different, but equivalent ways:
y = f (x) if and only if x = f -1(y).
Thus, we have the following equivalences for any 1-1 function f :
f -1(f (x)) = x for every x in Dom(f ),
f (f -1(y)) = y for every y in Rng(f ).
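A finite function can be stored as a Python dict of (input, output) pairs; if it is 1-1, flipping the pairs gives the inverse function, and the two equivalences above can be checked directly (the particular pairs are invented):

```python
f = {1: "a", 2: "b", 3: "c"}            # a 1-1 function as (input, output) pairs

assert len(set(f.values())) == len(f)   # 1-1: no output is repeated
f_inv = {y: x for x, y in f.items()}    # the inverse: switch input with output

for x in f:
    assert f_inv[f[x]] == x             # f^-1(f(x)) = x
for y in f_inv:
    assert f[f_inv[y]] == y             # f(f^-1(y)) = y
print("inverse checks pass")
```

If some output were repeated, the dict comprehension would silently collapse two pairs into one, which is why the 1-1 check comes first.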
Consider the function defined from set A to set B by the four ordered pairs shown below.
This is certainly a function since no element in the domain A is mapped to more than one element in the range.
However, this function is not 1-1 since both x1 and x2 in the domain get mapped to the same element, y5 in the
range. The function is also not onto since there are elements in B, y2, y4, and y6 , which are not in the range of f .
Consider the function defined from set A to set B by the four ordered pairs shown below.
This is a 1-1 function since every element in the domain A connects to one and only one element in the range, and
every element in the range is connected to only one element in the domain. Since the range does not include the
elements y6 and y4 of B, the function is not onto.
Note: Since f is a 1-1 function, an inverse function f -1 exits. Its domain is not B, since f was not onto.
Consider the function defined from set A to set B by the four ordered pairs shown below.
This is a function since every element in the domain A connects to one and only one element in the
range. However, this function is not 1-1 since both x1 and x4 in the domain get mapped to the same
element, y3 in the range. This is an onto function since there are no elements in B which are not in the
range of f .
A function that is both 1-1 (injective) and onto (surjective) is called a bijection.
If f is a bijection, then not only is Rng(f ) = Dom(f -1) = B, but the Rng(f -1) = Dom(f ) = A. In a bijection
each element of A is matched with one and only one element of B, and each element of B is matched with one and
only one element of A. It's like a game of "musical chairs" where there are exactly the same number of chairs as
people. Each person gets a chair and each chair gets a person. The two sets are in a 1-1 correspondence and have
exactly the same number of elements.
Consider the function defined from set A to set B by the four ordered pairs shown below.
This is a 1-1 function since every element in the domain A connects to one and only one element in the range and
every element in the range connects to one and only one element in the domain. This is also an onto function
since there are no elements in B which are not in the range of f . Thus, this function is a bijection.
Note: Since f is a 1-1 function, an inverse function f -1 exists. Since f is an onto function, the domain
of f -1 is B.
An example of a function which is a bijection from any set onto itself is the Identity Function, I(x)
= x . This function is called the identity function because it returns whatever was put in. The output is
identical to the input.
Often the output of one process becomes the input of a new process. To model this mathematically
we need to apply two or more functions in succession. This is called composition and is symbolized
by an "open circle" between the two function names. To be more precise we state the following
definition: given sets A, B, and C, and functions f and g with f : A → B and g : B → C, the
composition of "g with f " is a function g ∘ f : A → C whose output on Dom(f) is characterized
by (g ∘ f)(x) = g(f(x)).
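Composition is easy to model directly. The sketch below uses two small illustrative functions (my own choices, not from the text) to show the rule (g ∘ f)(x) = g(f(x)): the output of f becomes the input of g.

```python
# Two simple numeric functions standing in for f: A -> B and g: B -> C.
f = lambda x: x + 1        # f adds one
g = lambda y: y * y        # g squares its input

def compose(g, f):
    # (g o f)(x) = g(f(x)): feed the output of f into g
    return lambda x: g(f(x))

h = compose(g, f)
print(h(3))  # g(f(3)) = g(4) = 16
```

Note that order matters: compose(f, g) would compute f(g(x)) instead, which is generally a different function.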
As we have seen, two sets A and B have exactly the same number of elements if there exists a
bijection or one-to-one correspondence between them. We formalize this observation with the
following definition. Two sets A and B are said to be cardinally equivalent, in symbols A ≈ B, if and
only if there exists a bijection between them.
A set A is finite if and only if either A is empty or, for some n ∈ N, the set {1, 2, 3, …, n} ≈ A. The
natural number n is called the cardinality of A, or Card(A), and is simply the number of elements of A. If A is
empty, Card(A) = 0.
Note: This definition recalls the most basic of mathematical operations: counting. As we count, we go through {1,
2, 3, … n} setting up a one-to-one correspondence with the objects being counted.
Cardinal equivalence is an equivalence relation. To demonstrate this, we need only find or construct an
appropriate bijection.
Induction Step: Assume that any set of n elements has 2^n subsets. Now consider any set of the form
A = {e1, e2, e3, e4, e5, …, en, en+1}. Let x ⊆ A; then either en+1 ∉ x or en+1 ∈ x. If en+1 ∉ x, then x must be one
of the 2^n subsets of
{e1, e2, e3, e4, e5, …, en}. If en+1 ∈ x, then let y = x − {en+1}. Every element of y must then come from the set
{e1, e2, e3, e4, e5, …, en}, i.e., y must be one of the 2^n subsets of {e1, e2, e3, e4, e5, …, en}. Hence, there are
2^n different choices for x with en+1 ∉ x, and 2^n different choices for x with en+1 ∈ x. Thus, the total number of
choices for x is 2^n + 2^n = 2^(n+1).
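The 2^n count can be verified for small sets. The Python sketch below (an illustration, not part of the proof) generates every subset with itertools and confirms that a set of n elements has exactly 2^n subsets.

```python
from itertools import combinations

def power_set(s):
    # all subsets of s: for each size r from 0 to |s|, take every
    # r-element combination and turn it into a set
    items = list(s)
    return [set(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# A set with n elements has 2^n subsets
for n in range(6):
    assert len(power_set(range(n))) == 2 ** n

print(len(power_set({"e1", "e2", "e3"})))  # 8 = 2^3
```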
What if there is no subset of N which has a bijection to A? In that case we say the set A is infinite and the Card(A)
is not a natural number. Stated differently, a set A is infinite if and only if it's not finite. The set N is an infinite
set. This result may seem intuitively obvious, but it still requires a proof based on the definitions. We could use
Mathematical Induction and indirect proof to show that for every natural number n, there does not exist a
bijection from
{1, 2, 3, …, n } to N . For the sake of brevity, we will not work through this argument. A set A which is cardinally
equivalent to N is called denumerable or countably infinite.
Galileo seems to have been the first person to claim that the set of even numbers E = {2, 4, 6, 8, …}, despite
having "only half" of the elements of N, has the same number of elements as N, i.e., E ≈ N. This result follows
from the fact that the function f(x) = 2x is a bijection from N to E. For any non-zero m, a linear function of the
form f(x) = mx + b is a bijection from N to the infinite arithmetic sequence {b + m, b + 2m, b + 3m, b + 4m, …}.
We therefore conclude that all such sequences are cardinally equivalent to N.
Even infinite geometric sequences like {10, 100, 1000, 10^4, 10^5, 10^6, 10^7, …}, despite their "sparse" appearance,
are cardinally equivalent to N. This is true since, for a ≠ 0 and r > 1, the function f(n) = a·r^n is a
bijection from N to such sets.
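These pairings can be made concrete on a finite prefix of N. The sketch below (illustrative parameter choices m = 3, b = 1 are my own) lists the images of 1, 2, 3, … under the even, arithmetic, and geometric bijections; each natural number is matched with exactly one term of the sequence.

```python
# A finite prefix of N = {1, 2, 3, ...} paired with three sequences:
# f(n) = 2n (the evens), f(n) = m*n + b (arithmetic), g(n) = 10**n (geometric).
N = range(1, 8)
evens      = [2 * n for n in N]        # 2, 4, 6, ...
arithmetic = [3 * n + 1 for n in N]    # 4, 7, 10, ...  (m = 3, b = 1)
geometric  = [10 ** n for n in N]      # 10, 100, 1000, ...

print(evens)          # [2, 4, 6, 8, 10, 12, 14]
print(arithmetic)     # [4, 7, 10, 13, 16, 19, 22]
print(geometric[:4])  # [10, 100, 1000, 10000]
```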
What can we say about infinite sets that include all of the natural numbers? Are they "bigger" than N? The
following bijection from N to Z shows that N ≈ Z.
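One standard bijection of this kind (an assumed reconstruction, since the original formula is not preserved here) sends even n to n/2 and odd n to −(n − 1)/2, so the values alternate 0, 1, −1, 2, −2, …, sweeping out all of Z.

```python
def f(n):
    # pair N = {1, 2, 3, 4, 5, ...} with Z = {0, 1, -1, 2, -2, ...}
    # even n -> n/2, odd n -> -(n - 1)/2  (one common choice)
    return n // 2 if n % 2 == 0 else -(n - 1) // 2

print([f(n) for n in range(1, 8)])  # [0, 1, -1, 2, -2, 3, -3]
```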
Let Q + be the set of positive rational numbers. The following scheme shows that a bijection
from N to Q + exists.
Arrange all the positive rational numbers P/Q in a table with the numerator P labeling the columns of the table
and the denominator Q labeling the rows. Since the desired map from N to Q+ is to be 1-1, it is necessary to
remove all multiple occurrences of the same rational number. Thus, fractions such as 2/2, 3/3, etc. are
"struck off" the table since they equal 1/1. In a like manner, any fraction equivalent to a fraction in an earlier row
is removed. After this process is complete (it will take forever!) we have each element of Q+ listed once and only
once in the table. But the positions of these entries can be indexed by a single natural number by moving
diagonally through the table and "counting off" as we go from each listed entry to the next listed entry. [A slightly
different form of this argument would be to use the movement along the diagonal to "count" all the elements
of N × N (i.e., N × N ≈ N).
We then argue that, since the elements of Q+ can be constructed from an infinite subset of N × N, Q+ ≈ N.
However, this approach anticipates definitions and arguments that we have yet to elaborate.]
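The diagonal walk can be simulated. The illustrative sketch below uses Python's Fraction type (which reduces to lowest terms automatically) to detect and skip duplicates, and lists the first few positive rationals in diagonal order, exactly as in the "counting off" scheme.

```python
from fractions import Fraction

def enumerate_positive_rationals(count):
    # walk the P/Q table along the diagonals P + Q = 2, 3, 4, ...,
    # skipping any fraction whose lowest-terms value was already listed
    seen, out, s = set(), [], 2
    while len(out) < count:
        for p in range(1, s):
            q = s - p
            frac = Fraction(p, q)   # automatically in lowest terms
            if frac not in seen:
                seen.add(frac)
                out.append(frac)
        s += 1
    return out[:count]

# first six entries: 1/1, 1/2, 2/1, 1/3, 3/1, 1/4 (2/2 is struck off)
print(enumerate_positive_rationals(6))
```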
Now we can show that N ≈ Q. The argument is almost identical to the demonstration that N ≈ Z. Let f be the
bijection from N to Q+ shown above. Then the function g, with g(1) = 0, g(2n) = f(n), and g(2n + 1) = −f(n), is a
bijection from N to Q.
Note: The result N ≈ Q should seem rather surprising! Unlike N or Z, where the elements are
separated by "gaps" (for example, there are no integers between 2 and 3), Q has the density property
that between every two rational numbers we can always find another. While this is not a "continuum"
of values, from a measurement point of view we could never tell the difference. Every measurement
is a ratio with respect to some standard. We could never measure an irrational quantity since all
measurements have only a finite number of decimal digits! To even the sharpest eye, marking only
the points on the number line which are in Q would "look" like the "real" thing (the double entendre
is intentional). We wouldn't notice that the irrational numbers were missing!
Cantor designated Card(N) as ℵ₀ (aleph naught, aleph being the first letter of the Hebrew alphabet). Cantor
believed that ℵ₀, while not a number in the ordinary sense of the word, represented the "smallest kind of
infinity". He called ℵ₀ the first transfinite cardinal.
In order to compare cardinalities of various sets, we introduce the following notations for two sets A and B:
A ≼ B if and only if there exists a 1-1 function from A to B, and A ≺ B if and only if A ≼ B but not A ≈ B.
Thus, N ≼ Z, while, as we shall see, N ≺ R.
The idea is that, if a 1-1 map from A to B exists, then B must have at least as many members as A. Furthermore,
if B has at least as many members as A, but a one-to-one correspondence from A to B is impossible, then B must
have "more members" than A.
The similarity of ≼ and ≺ to the ordinary symbols ≤ and < is intentional. For any real numbers a, b, and c, we
have the transitive property that if a < b and b < c, then a < c. For ≼ and ≺ we have the following results,
which seem almost intuitively obvious based on "everyday" notions of size and comparison. With the exception
of property 6, the arguments are pretty straightforward. They soon, in fact, seem identical! Once you have
understood, say, the first seven, you can probably skim the rest. Property 10 is needed when we compare Card(R)
to Card(N).
2. If A ≼ B and B ≼ C, then A ≼ C.
If we can find 1-1 functions f and g with f : A → B and g : B → C, then the composition
of g with f is 1-1 from A to C.
3. If A ⊆ B, then A ≼ B.
The inclusion map i(x) = x is a 1-1 function from A to B.
4. If A ≺ B, then A ≼ B.
Since A ≺ B means that A ≼ B but not A ≈ B, by definition we
have A ≼ B.
5. If A ≈ B and B ≼ C, then A ≼ C.
6. (The Cantor-Schröder-Bernstein, or CSB, Theorem) If A ≼ B and B ≼ A, then A ≈ B.
7. If A ≈ B and B ≺ C, then A ≺ C.
The bijection f from A to B and a 1-1 function g from B to C give that the
composition of g with f is 1-1 from A to C, so A ≼ C. As an indirect proof, suppose now that A ≈ C;
then B ≈ A and A ≈ C give B ≈ C, which contradicts B ≺ C.
8. If A ≺ B and B ≈ C, then A ≺ C.
A 1-1 function f from A to B and the bijection g from B to C give that the
composition of g with f is 1-1 from A to C, so A ≼ C. As an indirect proof, suppose now that A ≈ C;
then A ≈ C and C ≈ B give A ≈ B, which contradicts A ≺ B.
9. If A ≼ B and B ≺ C, then A ≺ C.
By property 2, A ≼ C. As an indirect proof, suppose now that A ≈ C. Then C ≼ A, A ≼ B,
and property 1 imply that C ≼ B; since also B ≼ C, by the CSB Theorem B ≈ C. This contradicts B ≺ C, so it must not be true
that A ≈ C.
10. If A ≺ B and B ≼ C, then A ≺ C.
We are now ready to show that there are uncountable sets "bigger" than N. The following proof is due to Cantor
and presents his famous diagonal argument that N ≺ R. First, consider the function
f(n) = n/(n + 1).
This is certainly a 1-1 function from N to the open interval (0, 1). We will show that no such 1-1 function can be
onto. Let g be any 1-1 function from N to the open interval (0, 1). Let
a1 = g(1), a2 = g(2), a3 = g(3), … designate the elements of the range of g. Each of these a's is a
real number between zero and one. If we agree that all rational numbers that are terminating decimals be
represented only by their repeating string of 0's and not by a repeating string of 9's, the output of g can be
represented uniquely by the decimal expansion of each an. Let an,j be the j'th decimal digit
of an = g(n). Now pick two different decimal digits, say 3 and 5. Define the following function h and
let bn = h(n) for every natural number n: h(n) = 3 if an,n ≠ 3, and h(n) = 5 if an,n = 3.
Then b = 0.b1b2b3… is a real number in the open interval (0, 1) whose decimal expansion consists only of 3's and 5's.
Furthermore, at decimal place n, bn fails to match the n'th decimal digit (an,n) of an.
Thus, for any natural number n, the decimal expansion of b does not match the decimal expansion of an.
Therefore, the number b is not an element of the range of g. Thus, no 1-1 map from N to the open interval (0, 1)
can be onto.
Hence, N ≺ (0, 1). Since the open interval (0, 1) is a subset of R, from property 10 we conclude N ≺ R.
Thus, the set of real numbers is uncountable.
Note: You might think we could solve the problem in the above proof by simply including the derived
number b in the range of the 1-1 function g. This wouldn't solve the problem. Because now we'd be working with
a "new" g, and then we'd construct a "new" b that would not be in the range of this function. There are "just too
many" real numbers to count! Any scheme that tries to count them all is guaranteed to miss some.
Surprisingly, set size as measured with cardinality is very different from geometric measures of length or area. For
any pair of real numbers a and d, with
a < d, the linear function f(x) = a + (d − a)x is a bijection from (0, 1) to the interval (a, d). Thus, (a, d) ≈ (0, 1).
Since (0, 1) is uncountable, so is (a, d). The geometric length of (0, 1) is 1, while the geometric length of (a, d)
is d − a, which ranges from the indescribably small, as a approaches d, to the incomprehensibly large, as d is
chosen arbitrarily far from a. Yet, nevertheless, (0, 1) and (a, d) have the "same number" of elements.
Using some properties of rational functions, we could show that the following function is a bijection from the
open interval (0, 1) to R:
f(x) = (2x − 1) / (x(1 − x)).
A "cleaner looking" bijection from (0, 1) to R involves the inverse tangent function: for example,
g(x) = tan(π(x − 1/2)), whose inverse is g⁻¹(y) = (1/π)·arctan(y) + 1/2.
An even more unbelievable result is that (0, 1) ≈ (0, 1) × (0, 1). The first set is one-dimensional and has no
geometric area, while the second set is two-dimensional and has an area of 1.
To see the cardinal equivalency, consider the function f from (0, 1) to (0, 1) × (0, 1) defined by the following
"coding" scheme. Given any number a in (0, 1), it has a unique decimal expansion (provided we convert all trailing
strings of 9's into terminating decimals). Let a = 0.a1a2a3a4a5a6a7a8…. The output
of f applied to a is the ordered pair (0.a1a3a5a7…, 0.a2a4a6a8…). First, this function is 1-1, since
there is an inverse "decoding" in which we use alternate decimal digits from the first and second components of
the output to reconstruct the decimal expansion of the input. Second, given any ordered pair
(b, d) in (0, 1) × (0, 1), using the decoding scheme allows us to reconstruct the number a in (0, 1) with f(a) =
(b, d), so the function is onto.
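The interleaving code and its decoding can be sketched on finite digit strings (a simplification, since real expansions are infinite): odd-position digits form the first coordinate, even-position digits the second, and decoding weaves them back together.

```python
def encode(digits):
    # split a digit list into (odd positions, even positions)
    return digits[0::2], digits[1::2]

def decode(pair):
    # interleave the two digit lists back into one
    first, second = pair
    out = []
    for i in range(len(first) + len(second)):
        out.append(first[i // 2] if i % 2 == 0 else second[i // 2])
    return out

a = [1, 2, 3, 4, 5, 6]      # stands for the expansion 0.123456...
b, d = encode(a)
print(b, d)                 # [1, 3, 5] [2, 4, 6]
assert decode((b, d)) == a  # decoding recovers the input, so f is 1-1
```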
It might now seem that R is the biggest possible set. Cantor, however, showed how, starting with any set, we can
get a set with larger cardinality. Specifically, for any set A, A ≺ P(A).
Proof. First consider the function T(x) = {x}, which, given an object, forms the set with that object and only that
object as an element (the "singleton" of x). Then for any a in A, T(a) = {a} is a subset of A, so T(a) is in P(A).
Now suppose T(a) = T(b), i.e., {a} = {b}. Since two sets are equal if and only if they have the same elements, we
conclude a = b, so T is 1-1 and A ≼ P(A). Now let f be any 1-1 function from A to P(A), and let
B = {x ∈ A : x ∉ f(x)}, the set of all elements of A which are not members of their own image under f. We show
that B is not in the range of f.
Assume we've found an x in A with f (x) = B. Now, is this x in B? If x is in B, then x must be in A and x is not in
the set given by f (x) = B. That is, since x is in A, if x is in B, then it is not in B! Thus, if x is in B we get a
contradiction. Therefore, it must be the case that x is not in B. However, since x is not in B, x is not in f (x). That
is, we have x in A and x not in f (x), which is precisely the membership criterion for B! Thus, x is both not in B and
in B. Another contradiction! We must therefore conclude that there is no x in A with f (x) = B. Thus, no 1-1 map
from A to P(A) can be onto.
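For a small finite A, the diagonal set B can be computed for every possible function f from A into P(A), confirming by brute force that B is always missed. An illustrative Python sketch:

```python
# For A = {0, 1, 2} and every map f: A -> P(A), check that
# B = {x in A : x not in f(x)} never appears in the range of f.
from itertools import product

A = [0, 1, 2]
subsets = [set(), {0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}]

# there are 8^3 = 512 functions from A into P(A); try them all
for images in product(subsets, repeat=len(A)):
    f = dict(zip(A, images))
    B = {x for x in A if x not in f[x]}
    assert all(f[x] != B for x in A)   # B is missed by every f
print("no f from A onto P(A) exists")
```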
Thus, there is no largest set! By forming the power set we can always make a larger one. In particular,
N ≺ P(N) and R ≺ P(R).
Both of the sets R and P(N) are uncountable. Which is larger? Again Cantor provided the answer.
Theorem: R ≈ P(N).
Proof. In analogy with the decimal representation, every real number b in the closed interval [0, 1] can be
represented by a binary number
b = 0.b1b2b3b4b5b6b7b8…bn…, where each bn is either 0 or 1. As with decimal numbers, we will assume
that any repeating binary number with a trailing string of 1's has been rewritten as a terminating
binary number. Hence, the binary representation of the real numbers in [0, 1] is unique. We will now
generate another coding scheme which codes every subset of N into a binary number in [0, 1].
Let A be any subset of N; then we can construct the function f that maps A to the binary number
0.a1a2a3a4a5a6a7a8…an…, with an = 1 if n is in A, and an = 0 if n is not in A. If A is empty, f(A) = 0, and
if A is N, f(A) = 0.1111… = 1. The function f is certainly 1-1 from P(N) to [0, 1], since if two subsets of N map
to the same binary number, they must contain exactly the same natural numbers. Let b be any real number in (0,
1); then let B be the subset of N with n in B if and only if bn = 1. By construction f(B) = b, so f is an onto
function. So we have established that P(N) ≈ [0, 1]. Since [0, 1] ≈ R, we have P(N) ≈ R.
The symbol c is sometimes used for Card(R). Since Card(P({1, 2, 3, …, n})) = 2^n, the result is often
written symbolically as
c = 2^ℵ₀.
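The subset-to-binary coding can be sketched for subsets of {1, …, k} (a finite truncation of the map in the proof): membership of n contributes the binary digit 2^(−n).

```python
def code(subset, k):
    # code a subset A of {1, ..., k} as the binary number 0.a1a2...ak,
    # where a_n = 1 exactly when n is in A
    return sum(2.0 ** -n for n in range(1, k + 1) if n in subset)

k = 4
print(code(set(), k))         # 0.0     (the empty set)
print(code({1, 2, 3, 4}, k))  # 0.9375 = 0.1111 in binary
print(code({1, 3}, k))        # 0.625  = 0.1010 in binary
```

Distinct subsets always produce distinct codes, mirroring the 1-1 property of f in the proof.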
Cantor made the conjecture that there is no set A whose cardinality is between that of the natural numbers and
that of the real numbers. More precisely, this assertion, known as the Continuum Hypothesis, says that there does
not exist a set A with the property that N ≺ A ≺ R.
Russell's Paradox
In the year 1900, at the dawn of a new century, David Hilbert, a passionate advocate of Cantor's work, addressed
the Second International Congress of Mathematicians in Paris. He presented 23 famous unsolved problems for
consideration. The first problem on the list was to establish the truth or falsehood of the Continuum Hypothesis.
Hilbert presented in his speech what was almost an article of faith among mathematicians, namely, that if a
mathematical proposition is true, it must either be an axiom or theorem of mathematics. According to this view all
true mathematical statements can be proved.
"The great importance of definite problems for the progress of mathematical science in general ... is
undeniable. ... [for] as long as a branch of knowledge supplies a surplus of such problems, it maintains its vitality.
... every mathematician certainly shares ... the conviction that every mathematical problem is necessarily capable of
strict resolution ... we hear within ourselves the constant cry: There is the problem, seek the solution. You can
find it through pure thought..."
In 1902 while Gottlob Frege was preparing to publish his major opus on the foundations of mathematics
representing his life's work, he received a letter from the young English philosopher Bertrand Russell. After
reading this correspondence, Frege wrote in the introduction of his book,
"A scientist can hardly meet with anything more undesirable than to have the foundation give way just as the
work is finished. In this position I was put by a letter from Mr. Bertrand Russell as the work was nearly through
the press."
What had happened? Inspired by the work of Cantor and others, Bertrand Russell considered a construction
reminiscent of the set B in Cantor's proof that
A ≺ P(A). Specifically, he considered the set of all sets that are not elements of themselves, i.e., the set of all
collections that are not self-contained. Let W = {x : x ∉ x}. This set is blatantly self-referential, but
it is not necessarily unreasonable. For example, a set of dogs is in W, for a set of dogs is not a dog! In any case,
the Axiom of Abstraction says we can talk about W. The big question we arrive at now is the following:
is W ∈ W? Well, if W ∈ W, then by the Axiom of Abstraction, W ∉ W. Similarly, if W ∉ W, then
W ∈ W. We get a contradiction either way! There's no way out. Naïve set theory is inherently self-contradictory!
Almost immediately mathematicians and logicians tried to "fix" set theory by various modifications of the Axiom
of Abstraction that seemed to be the source of the trouble. Russell and Whitehead in their book, Principia
Mathematica, developed a "theory of types" which did not even allow the question, "is a set an element of itself ",
to be posed. Other mathematicians developed alternate axiom schemes for set theory that did not allow Russell's
Paradox to occur. The most widely accepted are based on the work of Ernst Zermelo, Abraham Fraenkel,
and Thoralf Skolem. This system of axioms is called ZF, and many mathematicians feel that it (or perhaps it with
the Axiom of Choice added) contains all of the formal methods required for mathematics. Some would even go so
far as to say that a proof of a theorem is valid if and only if it could, at least in principle, be formulated and
proven in ZF.
In 1931 Kurt Gödel completely upset the apple cart! He showed that all of the systems of set theory
developed to avoid Russell's Paradox could not be proven to be consistent. That is, using methods acceptable to
mathematicians, he was able to show that there is no guarantee that there are not other paradoxes, just like
Russell's Paradox, which would render all of these set theories self-contradictory! In fact, the details of his proof
are even more perplexing. For what Gödel really demonstrated was that if these set theories are consistent (no
contradictions possible), then they must be incomplete. This means that there are well-formed formulas in these
axiomatic systems which can neither be proved nor disproved using the rules of these axiomatic systems. For this
reason, Gödel's result is called Gödel's Incompleteness Theorem. A statement that can neither be proved nor
disproved within an axiomatic system is called undecidable. Gödel in essence showed that any axiomatic system
with sufficient "power" to encapsulate the methods of mathematics must of necessity contain undecidable
statements. The situation is again reminiscent of Cantor's theorem that the real numbers in the open interval (0, 1)
are uncountable. Adding the number b developed by the Cantor "diagonal slash" construction to the list of real
numbers does not make the real numbers countable, since now a "new" b could be generated which is not on the
list. In an analogous fashion, adding either an undecidable statement (or its negation) to the list of axioms to make
a "new" axiomatic system, just generates a "new" undecidable statement within this "new" system!
In particular, Gödel's Incompleteness Theorem guarantees that no finite scheme of axioms can prove or disprove
all the well-formed statements involving the natural numbers! There are statements in any axiomatic formulation
of number theory (probably the most basic subject in mathematics) that are undecidable.
Based on the work of Gödel in 1940 and of Paul Cohen in 1963, the Continuum Hypothesis is undecidable within
standard ZF theory! So much for Hilbert's first problem. This constituted a resolution he probably did not imagine
when he first posed the question. There are some very deep, if not downright disturbing, questions here. Is the
Continuum Hypothesis true? What is meant by the truth of any undecidable proposition? What does this tell us
about the truth of any mathematical statement?