Calculus 1 - Spring 2019 Section 2: Jacob Shapiro April 15, 2019
Jacob Shapiro
April 15, 2019
Contents
1 Logistics
1.1 How these lecture notes are meant
5 Functions
5.1 Functions from R → R
5.1.1 Basic functions and their shapes
5.1.2 Trigonometric functions
5.2 Special sets associated with a function
5.3 Construction of new functions
6 Limits
6.1 The notion of a distance on R
6.2 Limits of sequences–functions from N → R
6.3 Limits of functions from R → R
8 Derivatives
8.1 Application: Minima and Maxima
8.2 Convexity and concavity
8.3 Application: Newton’s method
8.4 Application: Linear Approximation
9 Integrals
9.1 The supremum and infimum
9.2 The Darboux integral
9.3 Properties of the integral
1 Logistics
• Instructor: Jacob Shapiro [email protected]
• Course website: http://math.columbia.edu/~shapiro/teaching.html
• TA office hours:
– Mat: Fridays 10am-12pm.
– Donghan: Thursdays 4pm-6pm.
– Ziad: Tuesdays 4pm-6pm.
• Misc. information: Calculus @ Columbia http://www.math.columbia.edu/programs-math/undergraduate-program/calculus-classes/
• Getting help: Your best bet is the office-hours, and then the help room.
If you prefer impersonal communication, you may use the Piazza website
to pose questions (even anonymously, if you’re worried). TAs will monitor
this forum regularly.
Strictly speaking the material of calculus really starts in Section 4 onward
(Section 2 is a philosophical motivation and Section 3 sets up the language and
notation which is the basis of how we think about the various objects we deal
with).
Since calculus is not a proof-oriented class, most of the statements in this
text are not substantiated by a demonstration that explains why they are correct
(i.e., a proof), be it formal or not. Sometimes I chose to ignore this principle
and include the proof in the body of the text anyway, mostly because I felt
the demonstration was not far beyond what would be required of an
average calculus student, and to cater to those readers who want to go a bit
deeper. The reader can easily recognize these proofs because they start
with a “Proof” and are encased in a box. The contents of these proofs are not part
of the curriculum of the class and will not be required for the midterms or final.
reality at all. They are essentially studying the abstract structures that they
themselves have invented.
Imagine that you are playing the popular board game called Monopoly. It
is loosely based on an economic system, but at the end of the day it is a game
with rules that were invented by people and played by people. We could spend
time studying and exploring the various possibilities that arise as one plays the
game of Monopoly. This would be one form of mathematical activity.
One of the main “mechanisms”, so to speak, of math making, is having an
arc to the story: we start with the given structure, and extract out of it certain
constraints that must hold given this structure. This is the basic mechanism of
logic, where for example if we say “this person is a student” and “students attend
lectures” we realize it must be the case that “this person attends lectures”. It is
this process that we will go through again and again, first describing the struc-
tures which we encounter and then “extracting” out of them new constraints.
Math can be done strictly with words (as I’ve been describing it so far) and
indeed this was mostly the approach taken in previous centuries. However, more
and more mathematicians realized that it is more efficient to use abbreviating
graphical symbols to lay down the abstract objects, structures and relations
of math. One writes down these graphical symbols on a piece of paper, on a
blackboard, or increasingly, into a computer, and this is a crucial way in which
we communicate about math nowadays, interlacing these graphical messages
within paragraphs of text which are supposed to introduce the rationale and
heuristics of what is really happening with the graphical symbols.
The graphical symbols are roughly organized as follows:
• A single Latin or Greek letter to denote the objects: a, b, c, · · · , x, y, z.
• Punctuation marks and “mathematical symbols” to denote structures and relations: (), %, ∗, +, /, · · · , <, =.
• Of course we have the numbers themselves, which for our sake can be
thought of again as abstract objects but with honorary special labels:
1, 2, 3, . . . .
The main tool of calculus is the mathematical concept of a limit. The limit
has a stringent abstract definition using the abstract language, but intuitively it
is the end result of an imagined process (i.e. a series of steps) where we specify
the first few steps and imagine continuing the process forever (as we cannot
actually do so) and ask what the end result would be.
2.1 Example. Let us start with 1, then go to 1/2, then 1/3, 1/4, and so on. Now
imagine that we continue taking more and more steps like this. What would be
the end result? The answer is zero, even though zero is never encountered after
any finite number of steps of this activity.
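Example 2.1 can be sketched numerically; a minimal illustration (the rule `f` and the cutoff `10**6` are choices of this sketch, not part of the text):

```python
# The rule generating the sequence 1, 1/2, 1/3, ... of Example 2.1.
def f(n):
    return 1 / n

# The first few steps we can actually write down...
print([f(n) for n in range(1, 5)])  # [1.0, 0.5, 0.3333333333333333, 0.25]

# ...and a far later step is as close to 0 as we like,
# even though no step ever equals 0 exactly.
print(f(10**6))       # 1e-06
print(f(10**6) == 0)  # False
```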
The concept of the limit is at the heart of anything we will do in this class,
and in particular, taking limits of sequences of numbers, where we have given
rules for generating the sequences of numbers (these are called functions).
Let us start now slightly more formally from the beginning.
{ a, b } versus { b, a }
but we declare that these two graphical symbols refer to one and the same
thing. Sometimes it is convenient to use three dots to let the reader know that
a certain number of steps should continue in an obvious fashion. For instance,
it is obvious that { a, b, . . . , f } really means { a, b, c, d, e, f }. Other times the
three dots mean a hypothetical continuation with no end, such as the case of
{ today, tomorrow, the day after tomorrow, . . . } where it is clear that there will
not be a final step to this process (setting aside fundamental questions about
compactness of spacetime and the universe), and that’s OK, since we actually
want to consider also hypothetical procedures. Such hypothetical procedures
are at the heart of limits, which lie at the heart of calculus. We say that such
sets, whose construction is hypothetical, have infinite size.
We often consider sets whose elements are numbers. As numbers for us are
currently just abstract mathematical objects, there should be no hindrance to
considering the set { 1 } if we also consider the set { a }; after all, the graphical
symbols 1 and a are just labels. Using the bracket notation, we agree that there
is no “additional” meaning to the graphical symbol { a, a }, i.e., it merely means
the same thing as { a }.
Since sets themselves are abstract mathematical objects, we can just write
some letter, such as A or X or even a, to refer to one of them, rather than
enumerating its elements every time. Since what we mostly care about when
dealing with sets are their contents, i.e., the list of elements, it is convenient to
also have a graphical symbol to state whether an object (an element) resides in
a set or not. This is denoted via

a ∈ A (a is an element of A) or a ∉ A (a is not an element of A).

Note that using this graphical notation, it is clear that whatever appears to
the right of ∈ or ∉ must be a set.
Since all we know about sets is that they contain things, we can “specify”
a set A by simply enumerating its contents, e.g. by using the curly-bracket
graphical notation. The special way to say that is using the equals symbol =,
as in A = { a, b, c }.
It is also convenient to have graphical notation that builds new sets out of
pre-given ones. For example:
• Union, with the graphical symbol ∪: A ∪ B is the set containing all ele-
ments in either A or B. Example: A = { x, y }, B = { β, γ } gives A ∪ B =
{ x, β, y, γ } (as we said, order doesn’t matter, and we are also not bound
to use the Latin alphabet for labels of abstract objects).
• Set difference with the symbol \: A \ B is the set containing elements in A
which are not in B. Example: A = { 1, 2 }, B = { 2 } gives A\B = { 1 }.
But B \ A = ∅.
• Intersection, with ∩: A ∩ B are all elements that are in both A and B.
Example: A = { 1, 2 }, B = { 2 } gives A ∩ B = { 2 } but A = { 1 } and
B = { 2 } gives A ∩ B = ∅.
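These three operations can be checked directly with Python’s built-in sets (a quick sketch; the labels are arbitrary):

```python
A = {1, 2}
B = {2}

print(A | B)      # union: {1, 2}
print(A - B)      # set difference: {1}
print(B - A)      # set difference the other way: empty, set()
print(A & B)      # intersection: {2}
print({1} & {2})  # disjoint sets intersect to the empty set: set()
```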
Another important relation is not just between objects and sets, but also be-
tween sets and sets. Sometimes we would like to know whether all elements in
one set are also in another set. This is called being a subset and is denoted as
follows:

A ⊆ B .

To verify that two sets A and B are the same (since what defines them is
the list of elements they contain), we must make sure of two things: A ⊆ B and
B ⊆ A. Hence

A = B ⇔ (A ⊆ B) ∧ (B ⊆ A)

(note ∧ is the graphical symbol for the logical ‘and’, and ⇔ stands for ‘means’,
i.e. logical equivalence).
3.1 Example. (A ∩ B) ∩ C = A ∩ (B ∩ C) (that is, the order of taking inter-
section does not change the end-result set). To really see why this is true, let
us proceed step by step. Suppose that x ∈ (A ∩ B) ∩ C. That means x ∈ A ∩ B
and x ∈ C. But x ∈ A ∩ B means x ∈ A and x ∈ B. Hence all together we
learn that the following are true: x ∈ A, x ∈ B and x ∈ C. We would reach the
same conclusion if we assumed that x ∈ A ∩ (B ∩ C). What we have learnt
is that x ∈ A ∩ (B ∩ C) whenever x ∈ (A ∩ B) ∩ C, for any x. This is what we
said (A ∩ B) ∩ C ⊆ A ∩ (B ∩ C) means. This is half of the equality = statement.
The other half proceeds in the same way.
{{ { a, 1 } , { x, 2 } } , { { b, 1 } , { x, 2 } } , { { c, 1 } , { x, 2 } } , . . .
. . . , { { a, 1 } , { y, 2 } } , { { b, 1 } , { y, 2 } } , { { c, 1 } , { y, 2 } } , . . .
. . . , { { a, 1 } , { z, 2 } } , { { b, 1 } , { z, 2 } } , { { c, 1 } , { z, 2 } } , . . . }
The point here is that in addition to forming pairs, we also (arbitrarily) re-
fer to A as the first origin set, hence the 1, and to B as the second origin
set, hence the 2, so that we keep track not only of the objects in each pair
but also of which origin set they’ve come from. In this way,
{ { a, 1 } , { x, 2 } } tells us immediately that a belongs to A and x belongs
to B.
Because it is exhausting to write so many curly brackets, we agree on a
graphical notation: (s, t) means { { s, 1 } , { t, 2 } } for any s ∈ A
and t ∈ B. (s, t) is called an ordered pair. Example: { 1, 2 } × { N, H } =
{ (1, N) , (2, N) , (1, H) , (2, H) }.
Clearly, (s, t) ≠ (t, s), because { { s, 1 } , { t, 2 } } ≠ { { s, 2 } , { t, 1 } }:
we can change orders within the curly brackets as we please, but we can’t
move objects across curly brackets.
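The curly-bracket encoding of ordered pairs can be tried out directly; a sketch using Python’s frozenset (an immutable, hashable set) for the inner sets, where the helper name `pair` is invented for this illustration:

```python
# Encode the ordered pair (s, t) as { { s, 1 }, { t, 2 } },
# tagging each object with the origin set it came from.
def pair(s, t):
    return frozenset({frozenset({s, 1}), frozenset({t, 2})})

# Order across the pair matters...
print(pair("a", "x") == pair("x", "a"))  # False

# ...but order inside each curly bracket does not:
print(frozenset({"a", 1}) == frozenset({1, "a"}))  # True
```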
Another piece of notation that we want to discuss with sets involves their size.
The set { a } contains one object (no matter what a is), so we say its size (i.e.
the number of objects it contains) is 1. Graphically we write two vertical lines
before and after the set in question to refer to its size, i.e.
|{ a }| = 1
|{ a, b }| = 2
...
|{ a, b, . . . , z }| = 26
|{ 1, 2, 3, . . . }| = ∞
3.2 Definition. When a set is of size one, that is, when it has only one element,
we call it a singleton. Any set of the form { a } for any object a is a singleton.
In fact it is possible to turn the picture upside down, so to speak, and define
the numbers 1, 2, 3, . . . not as intrinsic abstract objects (as we have thought
about them so far) but as associated with a hierarchy of sets starting from the
empty one, with a natural association between the number we are naively used
to and the size of the constructed set:
1. Zero is associated with the set ∅, and we have |∅| = 0. We define zero to
be the empty set, making the empty set (rather than zero) the more basic
object and zero a derived object.
2. One is associated with the set { ∅ }. It is a singleton.
3. Two is associated with the set { ∅, { ∅ } }
4. etc.
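This hierarchy can be built mechanically; a sketch with frozensets, following the standard von Neumann construction that the list above describes (each number is the set of all previously constructed numbers; the function name is invented here):

```python
# 0 = {}, 1 = { {} }, 2 = { {}, { {} } }, ...
def von_neumann(n):
    current = frozenset()      # zero is the empty set
    numbers = [current]
    for _ in range(n):
        current = frozenset(numbers)  # next number: set of all numbers so far
        numbers.append(current)
    return current

print(len(von_neumann(0)))  # 0: |∅| = 0
print(len(von_neumann(1)))  # 1: a singleton
print(len(von_neumann(3)))  # 3: the size matches the number
```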
our main way to describe sets (whether they are themselves subsets of other
sets or not doesn’t matter) is to enumerate their contents. For example:
B = { a ∈ A | a is American }
{ 1, 2, 3, 4, 5, . . . }
Note that now that we use the dots, the set actually has an infinite
number of elements. This is fine; in fact this is part of what makes
calculus interesting at all. The set above is called the natural numbers and is
denoted with the special graphical symbol N (blackboard N):
N = { 1, 2, 3, 4, 5, . . . } .
This set is called the integers and is denoted by Z. Finally we would like to
include fractions as well
0, ±1, ±1/2, ±1/3, . . . , ±2, ±2/3, ±2/4, . . . , ±3, ±3/2, ±3/4, ±3/5, . . .

i.e. any number that can be written in the form p/q with p ∈ Z and q ∈ Z,
q ≠ 0. In set-builder notation we would write

{ p/q | p ∈ Z and q ∈ Z, q ≠ 0 }
These are called the rationals and denoted by Q (for quotient). By the way,
rationals are not named such for being extra reasonable. The etymology is from
the word ’ratio’ which is also a quotient. There is also a decimal notation with
finitely many decimal digits after the point or periodic repeating, but let us skip
over that for now.
It turns out that there are certain numbers that exist (for example they
come from geometry or from physics) yet they are not in Q–they are irrational.
All of these numbers (which turn out to be the vast majority of all numbers,
where majority is meant in a certain sense, as we are trying to compare different
notions of infinity) are denoted by R and are called real numbers.
4.2 Remark. We already saw schematically (though not precisely) that there is
a way to build from the empty set all of N. There is also a concrete and precise
way (out of sets and manipulations of them) to construct Z out of N, Q out
of Z and R out of Q. We will not do so in this class as this material belongs
to a field of mathematics called analysis. If you are curious look at [4] under
Dedekind cuts.
4.3 Example. To give an example of certain numbers in R \ Q, consider the
ratio between the circumference of a circle and its diameter. The ancient Greeks
realized a while back that this number cannot be written in the form p/q for some
p, q ∈ Z. To see this fact actually requires some work and preparation.
4.4 Example. The square root of a number is the answer to the question
“what number do we multiply by itself to get what we started with?”. So √4 = 2
because 2 × 2 = 4, i.e. √4 × √4 = 4. Can we express √2 in a simple way too?
Clearly, √1 = 1 because 1 × 1 = 1, so √2 must be somewhere between 1 and 2.
If we take the middle, 1.5 = 3/2, we get 1.5 × 1.5 = 9/4 = 2.25, so 1.5 is already too
much. What about 1.4? 1.4 × 1.4 = 1.96, so that’s already too little! It turns
out that √2 ∉ Q, i.e. √2 ∈ R \ Q. To see this, assume otherwise. Then we have
(p/q) × (p/q) = 2 for some p, q ∈ Z. If both p and q are even, we can divide both by
2 and get the same number, so let us assume we have done that, so that now
2 = p²/q² with p, q integers not both even. This is the same as p² = 2q². That
means that p² is even, i.e. it is of the form 2x for some x ∈ Z. This implies
that p is even, i.e., it is of the form 2y for some y ∈ Z (if p were odd, it would
be of the form 2y + 1, and then p² = (2y + 1)² = 4y² + 4y + 1, which is odd!).
That means that actually p² = 4y², i.e. 4y² = 2q². But then
2y² = q², that is, q² is even, which implies q is even (as before). So both p and
q are even, contradicting our assumption that they are not both even!
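The narrowing-down in Example 4.4 (between 1.4 and 1.5, and so on) can be continued mechanically by bisection; a minimal numerical sketch (the starting bracket and iteration count are choices of this illustration):

```python
# Repeatedly halve the interval around sqrt(2), exactly as in
# the 1.4 / 1.5 trials above.
lo, hi = 1.0, 2.0          # 1*1 < 2 and 2*2 > 2
for _ in range(40):
    mid = (lo + hi) / 2
    if mid * mid > 2:
        hi = mid           # "too much", like 1.5
    else:
        lo = mid           # "too little", like 1.4

print(lo)  # ≈ 1.414213562..., always a rational approximation
```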
So Q has some “holes”, and the purpose of using R is to have a set that
contains everything. Indeed the whole point of calculus is limits, and the whole
point of limits is to continue procedures hypothetically with no end. These
hypothetical procedures are precisely where we may suddenly find ourselves out
of Q.
It is that “everything” set, R, that we geometrically interpret as a continuous
line, i.e. we associate the lack of holes with the concept of a continuum. That
is why in physics, when we think of a continuous time evolution, for instance,
we model the set of possible times as R. When we think of the set of possible
heights a ball could take as it is thrown up in the air, we model that set as R, since
we imagine physical space to be a continuum with no holes, and N, Z and Q
cannot be appropriate to describe the set of all possible physical outcomes. So
you should have in your mind a picture of an infinite straight continuous line
when you think of R.
As we have seen, we also can consider products of sets, and so R × R could
be considered the set of all possible pairs of continuum values, that is, a plane
of continuum. For convenience we write R2 instead of R × R. This set of pairs
should be geometrically pictured as an infinite plane. Physical space, everything
around us, is R3 (at least in Newtonian mechanics).
4.1 Intervals
Sometimes it is convenient to specify subsets of R, which are intervals. Given
any two endpoints a ∈ R and b ∈ R such that a < b, we define the following sets
[a, b] := { x ∈ R | a ≤ x ≤ b }
(a, b) := { x ∈ R | a < x < b }
(a, b] := { x ∈ R | a < x ≤ b }
[a, b) := { x ∈ R | a ≤ x < b }
The first of which is called the closed interval between a and b, the second of
which the open interval between a and b. The last two don’t have special names.
Sometimes it is useful to have the restriction only on one side to obtain a
half-infinite interval, that is, to consider the set of all numbers larger than a for
some a ∈ R. This is achieved in an efficient way via the ∞ symbol as follows
(a, ∞) := { x ∈ R | x > a }
(−∞, a) := { x ∈ R | x < a }
[a, ∞) := { x ∈ R | x ≥ a }
(−∞, a] := { x ∈ R | x ≤ a }
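The defining conditions of these intervals translate directly into chained comparisons; a quick sketch (the helper names are invented for this illustration):

```python
# Membership tests mirroring the interval definitions above.
def in_closed(a, b, x):   # [a, b]
    return a <= x <= b

def in_open(a, b, x):     # (a, b)
    return a < x < b

print(in_closed(0, 1, 1))   # True:  1 is in [0, 1]
print(in_open(0, 1, 1))     # False: 1 is not in (0, 1)
print(in_open(0, 1, 0.5))   # True
```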
5 Functions
Given two sets A and B, we may wish to construct a rule, or a way to map,
objects from A onto objects from B. For instance, if A = { a, b, c } and B =
{ x, y, z } then we may wish to “send” a to x, b to y and c to z. This rule defines
what is called a function, also referred to as a map. We can think of various
other functions from A to B, each one is distinct if it has a different way to map
the objects around. For example, consider the function which “sends” a to z, b
to y and c to x. It is yet another possible way to map the objects of A onto
those of B.
If we write a table of A and B laid out together in perpendicular directions
B ↓;A → a b c
x
y
z
then we may fill in the interior of the table with objects of the product,
A × B:
A×B a b c
x (a, x) (b, x) (c, x)
y (a, y) (b, y) (c, y)
z (a, z) (b, z) (c, z)
Then we can think of the first function we described, i.e. that sending a to
x, b to y and c to z as a way to pair elements of A and B, that is, as a subset of
A × B, namely, the subset { (a, x) , (b, y) , (c, z) }. Looking at the table above,
we can identify the function by coloring in red these pairs of the table
A×B a b c
x (a, x) (b, x) (c, x)
y (a, y) (b, y) (c, y)
z (a, z) (b, z) (c, z)
Similarly, the second function, that mapping a to z, b to y and c to x, can be
associated with the list of pairs { (a, z) , (b, y) , (c, x) }, and on the table of A × B
this subset of pairs, colored red, looks like:
A×B a b c
x (a, x) (b, x) (c, x)
y (a, y) (b, y) (c, y)
z (a, z) (b, z) (c, z)
However, not all sets of pairs constitute a function. The point is that by
considering the concept of functions, we are interested in giving a rule, or a
guide, to go from A to B. That means in particular that this rule should be
unambiguous, so that we don’t get stuck trying to decide. So the following list
of ordered pairs { (a, x) , (b, y) , (c, z) , (a, z) }, represented in the table as
A×B a b c
x (a, x) (b, x) (c, x)
y (a, y) (b, y) (c, y)
z (a, z) (b, z) (c, z)
does not constitute an appropriate function, because it tells us simultane-
ously to send a to x as well as to send it to z. So we don’t know where to send
a. Hence a function should send each object of A to only one place, which means
that the set of pairs encoding the function shouldn’t have two pairs with the
same first component and different second components (e.g. (a, x) and (a, z)).
The converse, however, is perfectly fine. That is, the function { (a, x) , (b, x) , (c, x) },
which sends all of the elements of A to the same spot in B, is a perfectly fine
function.
Finally, we want to make sure that we know where to map any given
element, so if there is some element of the origin set that doesn’t appear as the
first component of some pair in the set of pairs, we won’t know where to map it.
We agree to exclude such scenarios from “appropriate functions”.
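The two requirements just stated (no ambiguity, every element of the origin set covered) can be phrased as a small check on a set of pairs; a sketch (the helper name `is_function` is invented here):

```python
# Decide whether a set of ordered pairs encodes a function from A.
def is_function(pairs, A):
    firsts = [p[0] for p in pairs]
    unambiguous = len(firsts) == len(set(firsts))  # no first component repeats
    total = set(firsts) == set(A)                  # every element of A covered
    return unambiguous and total

A = {"a", "b", "c"}
print(is_function({("a", "x"), ("b", "y"), ("c", "z")}, A))              # True
print(is_function({("a", "x"), ("b", "y"), ("c", "z"), ("a", "z")}, A))  # False: a sent to two places
print(is_function({("a", "x"), ("b", "x")}, A))                          # False: c not covered
print(is_function({("a", "x"), ("b", "x"), ("c", "x")}, A))              # True: constant is fine
```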
Given these considerations, we make the
5.1 Definition. Given two sets A and B, a function f from A to B, written as
f : A → B, is a set of unambiguous rules to associate objects of A with objects
of B, i.e. it is a subset of pairs, i.e. of A × B, such that no two pairs have
the same first component and different second component, and such that all
elements of the origin set are covered as one enumerates all first components of
all pairs. The set A is called the domain of f , the set B is called the co-domain
of f. Sometimes one refers to the graph of f as that subset of A × B which
specifies it.
Beware the discrepancy between the intuitive meaning of the word graph
(we think of a geometric object) and the technical meaning given above (an
abstract subset of pairs). This distinction between the intuitive meanings of
words from our daily lives and their actual technical definitions will come up
again and again in math.
Let us introduce some graphical notation which will be used throughout the
course for functions:
a ↦ x

or

A ∋ a ↦ x ∈ B

f(a) = x

f_a = x
This is mainly used when A = N, or when A is of the form A = T × X
for two sets T and X, and instead of writing f((t, x)) for some t ∈ T
and x ∈ X one writes f_t(x).
5.2 Example. If the domain of a function, A, is empty, i.e. A = ∅, then
there are not many choices (since there is nothing to map) and so it suffices to
write f : ∅ → B (for any B), and there is just one unique function with this
domain. Similarly, if B contains only one element, then there is again nothing
to describe, because we have no choice. E.g. A = N, B = { / }: then we
know what f : A → B does. It merely converts any number into a /. This can
graphically be written as
f (n) = / for any n ∈ N
5.3 Example. If both the domain and co-domain are the same set, f : A → A,
then there is a special function which sends each element to itself. This is known
as the identity function, and is denoted as 1 : A → A for any set A. We have
1 : A → A
a ↦ a
When the domain or codomain are rather large (think infinite), e.g. one
of our special sets of numbers, it sometimes becomes easier to give a formula
for what f does rather than specify one by one how it acts on each different
element, or enumerate a list of pairs (indeed, that would be literally impossible
for infinite sets). Consider the function
f :N → N
which adds 1 to any given number. So 1 ↦ 2, 2 ↦ 3, 3 ↦ 4 and so on. An
easy way to encode that is to use variables, i.e. objects which are placeholders
for elements of a certain set1 . A variable is thus any object we could pick from
a certain set. For example, a variable n in the natural numbers is any choice
n ∈ N. We could have n = 5, n = 200 or n = 6000000 (but not n = −1, since
N contains only strictly positive integers). The point is, it is convenient not to
specify which element it is and work with a generic unspecified element. Once
we have a variable, we can easily write down the action of f as a succession of
algebraic operations, i.e. a formula in that variable:
f (n) = n + 1 for any n ∈ N
This specifies in a formula the same verbal description we gave earlier. The
variable is also called the argument of the function.
5.4 Remark. The most common and efficient way to describe a function is to
write two lines of text:
f : A → B
a ↦ some formula of a
1 We already encountered variables when we discussed set-builder notation
where the first line tells us the label of the function (in this case f ), the domain,
i.e. the origin set A, the codomain, i.e. the destination set B, and the second
line tells us how to map each object of A into B. In this case the second line is
in the form of a formula, but one could just as well list all possible mappings of
elements in A.
We can quickly run into problems with math, just as we would with natural
language. It doesn’t make sense to write “dry rain” even though we can easily
juxtapose the two words together. In the same way, if we try to write down
f : N → N
f(n) = n − 1 for any n ∈ N

we quickly realize this makes no sense! The reason is that for a certain n ∈ N,
namely for 1, if we apply the formula, we actually land outside of N, because
0 ∉ N! That means that the formula-way of describing functions can be dangerous,
that is, it can quickly lead us to write down nonsense. This is a manifestation of
the fact that just because we have a language with rules doesn’t mean that every
combination of any phrase will make sense. We still must be careful, especially
as we build shortcuts.
Here is another example:
f : N → R
n ↦ √n

Note that the same formula would not make sense with R replaced by N or even
Q, as we just learnt (e.g. √3 ∉ Q)!
5.5 Definition. When a function f : A → B has its domain A = N, i.e. the
natural numbers, one often calls that function a sequence and one writes its
argument in subscript notation, i.e.

f_n = √n
We can write many complicated formulas. For example, we can write what
is known as a piecewise formula:
a : R → R
x ↦ x if x ≥ 0, and x ↦ −x if x < 0
this means that before we apply the formula we must verify some conditions.
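The piecewise formula for a translates into a conditional; a minimal sketch:

```python
# The piecewise function a: check the condition, then apply the branch.
def a(x):
    return x if x >= 0 else -x

print(a(3.0))   # 3.0
print(a(-3.0))  # 3.0, the same as Python's built-in abs(-3.0)
```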
Sometimes it is helpful (though usually not unambiguous) to also sketch a
function. A sketch is the graphical arrangement of all possible values it could
take given all possible inputs. We have already seen how to do this in a rather
rudimentary way using the colorings of the graph of a function within the table
of A × B above. A sketch of a function is thus a way to geometrically draw the
graph of the function.
Figure 1: A plot of the graph of the function { 1, 2 } → { 1, 2 } given by
1 ↦ 2, 2 ↦ 1.
{ (x, x²) ∈ R² | x ∈ R }
Since R may be pictured as an infinite line, R×R = R2 , the set of all pairs, should
be pictured as an infinite two-dimensional plane, in which case the graph of a
function f : R → R is a curve in that plane. The fact it is a curve, and nothing
else, is related to the fact that no two pairs have the same first component.
The shape of that curve is what we care about when drawing the graph of the
function, which is the sketch of that function. In the particular case of (x, x²)
for any x ∈ R, the shape of the curve is that of the familiar parabola as in
Figure 2.
5.9 Definition. The function
a : R → R
x ↦ x if x ≥ 0, and x ↦ −x if x < 0
Figure 2: The parabola x ↦ x².
Figure 4: The constant function R ∋ x ↦ c ∈ R for all x ∈ R, for some constant
c ∈ R.
f (x) = c (x ∈ R)
Graphically this function looks like a flat horizontal line at the height c as in
Figure 4.
5.16 Definition. Let a, b ∈ R be given (note this is short-cut notation for
a ∈ R and b ∈ R). Then the linear function f : R → R associated with a and b
is given by
f (x) = ax + b (x ∈ R)
Graphically this function looks like a straight line at an angle. The parameter
b sets the line’s height where it meets the vertical axis, and (when a ≠ 0) the
number −b/a is where it meets the horizontal axis:
f(x) = ax² + bx + c (x ∈ R)

One can of course go on with these to any highest power of x, e.g. f(x) =
x¹⁰⁰, which looks like this:
We know the entire circumference of the whole circle (of radius 1) is 2π, where
π is some special irrational number, equal to about 3.14, which we cannot write
out explicitly. Let us traverse, along the circle, starting from the point (1, 0) on
the plane, an arc of arc-length α, for some 0 ≤ α < 2π, and draw a right triangle
whose base is along the horizontal axis, which has a vertex on the circle after
arc-length α and another vertex at the origin.
The sine function is defined as the height of this triangle (as a function of
α), and the cosine function is defined as the base length of this triangle (as a
function of α). Since we are on a circle, it makes sense to agree that for α > 2π
the sine and cosine functions assume the same values as if we were calculating
them with α − 2π, and similarly for α < 0, so that we get a definition of a
periodic function on the whole of R. Things to note:
1. cos(0) = 1, sin(0) = 0.
2. cos(π/2) = 0, sin(π/2) = 1.
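The periodicity convention and the special values can be checked with the standard library’s trigonometric functions; a quick numerical sketch (small floating-point errors are expected):

```python
import math

alpha = 1.0
# Values repeat after a full turn of 2*pi:
print(math.isclose(math.sin(alpha), math.sin(alpha + 2 * math.pi)))  # True
print(math.isclose(math.cos(alpha), math.cos(alpha - 2 * math.pi)))  # True

# The noted special values:
print(math.cos(0), math.sin(0))  # 1.0 0.0
print(math.sin(math.pi / 2))     # 1.0
print(math.cos(math.pi / 2))     # ~6e-17, i.e. 0 up to rounding
```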
The cosine looks like this:
Note: we will not use the word “range” in this course, as it is ambiguous and
sometimes conflated with either co-domain or image.
5.19 Definition. Let f : A → B be a function between two sets A and B and
let S ⊆ A be a given subset. Then the image of S under f is the following
subset of B:
f (S) := { b ∈ B | There is some a ∈ S such that f (a) = b }
= { f (a) ∈ B | a ∈ S }
Note that in this graphical notation we apply f to a whole set rather than to
an object, and the result is then a set, rather than an object! This notation can
be confusing. Using this notion we can identify
f (A) = im (f )
5.20 Definition. Let f : A → B be a function between two sets A and B and
let S ⊆ B be a given subset. The pre-image of S under f is the following subset
of A:
f⁻¹(S) = { a ∈ A | f(a) ∈ S }

Note the introduction of a new notation: for the pre-image of a function f,
we use the graphical symbol f⁻¹. Again this is a funny notation in the sense
that we plug a set into f⁻¹ and get back a set. Despite the notation, f⁻¹ is
not a function from B to A. Of course, for a function f : A → B we have
f⁻¹(B) = A by definition.
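For finite sets both notions are easy to compute; a sketch (the helper names `image` and `preimage` are invented here):

```python
def image(f, S):
    return {f(a) for a in S}                   # f(S): all values attained on S

def preimage(f, S, domain):
    return {a for a in domain if f(a) in S}    # the pre-image of S under f

D = {-2, -1, 0, 1, 2}
square = lambda x: x * x

print(image(square, D))           # {0, 1, 4}
print(preimage(square, {4}, D))   # the set {-2, 2}: a set goes in, a set comes out
print(preimage(square, {3}, D))   # set(): 3 is not in the image
```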
5.21 Definition. A function f : A → B is called surjective if im (f ) = B. That
means there are no elements of B left “uncovered” by f .
5.22 Definition. A function f : A → B is called injective if

|f⁻¹({ b })| ≤ 1 (b ∈ B)

which means that every point of B gets covered at most once (possibly never) by
f. In other words, no two elements of A get sent to the same element of B; that
is, every destination point that is in the image of f has a unique origin point.
5.23 Definition. A function f : A → B is called bijective if it is surjective and
injective. Bijective functions should be thought of as reversible, because they
don’t lose information.
5.24 Example. The constant function f : R → R, x ↦ 5 for all x ∈ R
is not surjective, since im(f) = { 5 } ≠ R. It is not injective because, while
f⁻¹({ x }) = ∅ for all x ≠ 5 and |∅| = 0, we have f⁻¹({ 5 }) = R and |R| = ∞ > 1.
5.25 Example. The linear function f : R → R, x ↦ 5x for all x ∈ R is
bijective, because

im(f) = { 5x | x ∈ R } = R

and f⁻¹({ x }) = { y ∈ R | 5y = x } = { x/5 }, which is of size one.
What about the absolute value function?
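On a finite domain these properties can be tested mechanically, which gives a way to experiment with the closing question for a restriction of the absolute value; a sketch (helper names invented here):

```python
def is_injective(f, domain):
    values = [f(a) for a in domain]
    return len(values) == len(set(values))   # no value hit twice

def is_surjective(f, domain, codomain):
    return {f(a) for a in domain} == set(codomain)

D = {-2, -1, 0, 1, 2}
print(is_injective(abs, D))              # False: abs(-1) == abs(1)
print(is_surjective(abs, D, {0, 1, 2}))  # True onto {0, 1, 2}
print(is_injective(lambda x: 5 * x, D))  # True, as in Example 5.25
```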
5.3 Construction of new functions
5.26 Definition. Given two functions f : A → B and g : B → C, we define
their composition, denoted g ◦ f, as a new function A → C given by the
formula

(g ◦ f)(a) := g(f(a)) (a ∈ A)

which first applies f, and then g (considered as rules), all together passing
through B but ultimately producing a route (i.e. a function) from A to C. We
can also compose a function with itself, if its co-domain is equal to its domain: if
f : A → A then

f^n := f ◦ f ◦ · · · ◦ f (n times)

for any n ∈ N.
5.27 Example. If f : R → R is given by x ↦ sin(x), then f ◦ f : R → R is the
function given by the formula x ↦ sin(sin(x)) for any x ∈ R (we don’t ask
what that means geometrically).
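Composition is just nesting of rules; a minimal sketch (the helper name `compose` is invented here):

```python
import math

def compose(g, f):
    return lambda a: g(f(a))   # first apply f, then g

ff = compose(math.sin, math.sin)   # f ∘ f from Example 5.27
print(ff(0.0))                     # sin(sin(0)) = 0.0

add_one = lambda n: n + 1
print(compose(add_one, add_one)(1))  # 3
```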
5.28 Definition. A function f : A → B is called left-invertible iff there is some
other function g : B → A such that g ◦ f = 1 where 1 : A → A is the identity
function discussed above. Conversely, f is called right-invertible iff there is some
other function h : B → A such that f ◦ h = 1 where 1 : B → B is the identity
function. If f is both left and right invertible we call it invertible, and then the
left and right inverse are equal and unique g = h, in which case we denote that
inverse by f −1 = g = h (not to be confused with the pre-image notation, and
also not to be confused as an algebraic operation–we are not dividing anything
by anything else, this is merely graphical notation), so that by definition
f ◦ f⁻¹ = 1B
f⁻¹ ◦ f = 1A
These last two equations are interesting, because they tell us that functions
themselves (rather than objects, numbers, or sets) are equal. But since we have
a precise way to think of functions as sets themselves, this is perfectly fine. Also
we use the short-hand notation of 1A to denote the (unique) identity function
A → A for any set A.
What kind of relationship is there between left or right invertibility and
injectivity or surjectivity?
5.29 Definition. Given any function f : R → R, we can quickly define a new
function by an algebraic formula on f itself. For example, the function f + 3
has the formula

(f + 3)(x) = f(x) + 3    (x ∈ R)

Sometimes these shortcuts don’t always make sense and one has to be careful,
for example, with 1/f. Other times the notation itself becomes ambiguous: for
example, f² could either mean f(f(x)) for any x ∈ R or it could mean (f(x))²
for any x ∈ R. So in such cases one has to write out in words what one means.
Another possible confusion is with f⁻¹. Usually it means either the pre-image
or the (unique) inverse of a function, as defined above, if it exists. It usually
does not mean the function

x ↦ 1/f(x)    (x ∈ R)

Similarly, given two functions f, g : R → R, their sum f + g is the function

x ↦ f(x) + g(x)    (x ∈ R)

5.30 Definition. Given a function f : A → B and a subset X ⊆ A, the
restriction of f to X, denoted f |X : X → B, is defined by

f |X (a) := f(a)    (a ∈ X)

So f |X and f have the same formula, but the former is restricted to act on a
smaller subset. This is sometimes a useful notion when considering the proper-
ties of functions, some of which may only hold on a subset but not on the whole
domain.
5.31 Example. Pick any number a ∈ R which satisfies a > 1.
Consider the function expa : R → R which is given by

expa(x) := a^x    (x ∈ R)

It turns out that even if x ∈ R\Q one can proceed, via a limit procedure that
makes the sketch of expa look smooth when plotted on R: using the basic fact
that any element x ∈ R\Q has elements y ∈ Q arbitrarily close to it, we define,
intuitively, expa(x) as expa(y) (which we know how to compute) for y ∈ Q
arbitrarily close to x.
As defined, expa : R → R is not surjective, and hence not bijective. Indeed,
it is always larger than zero. That is, we have

im(expa) = (0, ∞)

So we change the definition by modifying the co-domain to be (0, ∞):

expa : R → (0, ∞)

defined by the same formula as before, and get a surjective function.
Actually expa is also injective. Indeed, we can verify this by showing that if
expa(x) = expa(y) for some x, y ∈ R, then x = y (in HW1 you learn this is
one possible criterion for injectivity). So assume a^x = a^y. Divide both sides
of the equation by a^y to get a^x · (1/a^y) = 1. The basic rules of exponentiation
now imply that a^(x−y) = 1. However, exponentiating a number strictly larger
than 1 yields 1 only at the power zero, so that x − y = 0 necessarily. That
means x = y and hence expa is indeed injective. Since expa : R → (0, ∞)
is injective and surjective, i.e. bijective, you learn in HW1 that it has a
unique inverse expa⁻¹ : (0, ∞) → R. This inverse is called the logarithm with
base a, and is denoted by loga : (0, ∞) → R.
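A quick numerical sanity check (our own sketch, not part of the notes; we assume Python's math.log(y, a) computes the base-a logarithm) that loga and expa undo each other:

```python
import math

# log_a inverts exp_a on (0, inf), and exp_a inverts log_a on R (numerically).
a = 3.0
for x in [-2.0, 0.0, 1.5]:
    y = a ** x                                   # exp_a(x), always in (0, inf)
    assert abs(math.log(y, a) - x) < 1e-9        # log_a(exp_a(x)) = x
for y in [0.1, 1.0, 7.0]:
    assert abs(a ** math.log(y, a) - y) < 1e-9   # exp_a(log_a(y)) = y
print("log_a and exp_a invert each other (numerically)")
```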
5.32 Exercise. The cos and sin functions, when defined from R → R, are
neither injective nor surjective. However, one may modify both domain and
co-domain to make them bijective. How?
6 Limits
At the heart of calculus is the notion of a limit. The limit is a way to consider
a hypothetical process that cannot actually be carried out but whose result still
may have meaning. We have already encountered such hypothetical processes
when we first considered the set
N ≡ { 1, 2, 3, . . . }
where the dots mean the hypothetical process of continuing the list with no end.
Since this is not technically possible, this is merely a hypothetical notion. And
yet it is useful for us to collect together in one set all possible natural numbers,
which really just means that whatever large number one can think of, it is part
of N.
Yet another example that we already encountered was the hypothetical result
of a process of enlisting fractions with increasing denominators, i.e., the sequence

1, 1/2, 1/3, 1/4, 1/5, . . .
Figure 5: A plot of the graph of N ∋ n ↦ 1/n ∈ R.
which has no end. Since it has no end, the final result of this process is merely
hypothetical. And yet intuitively it is clear that the end result will be zero,
which really just means, whatever small number you can think of, one can find
a step in this process which will be smaller than that given number.
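This “whatever small number you name” game can be made concrete. The sketch below is ours, not from the notes; the choice N = ⌊1/δ⌋ + 1 is just one valid threshold, and exact rational arithmetic is used to avoid floating-point edge cases:

```python
from fractions import Fraction

def threshold(delta):
    # one valid threshold: N = floor(1/delta) + 1 guarantees 1/n < delta
    # for every n >= N, since then n >= N > 1/delta
    return int(1 / delta) + 1

for delta in [Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)]:
    N = threshold(delta)
    assert all(Fraction(1, n) < delta for n in range(N, N + 1000))
    print(delta, "->", N)
```

Whatever δ > 0 is named, the threshold answers it; that is exactly what “the end result is zero” means.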
One way to see that the triangle inequality d(x, y) ≤ d(x, z) + d(z, y) really
holds is to divide the analysis into cases. The easiest case is
that all three numbers are different and obey x < z < y. Then we have

d(x, y) ≡ |x − y|
        = y − x                  (definition of the absolute value, since y > x)
        = (y − z) + (z − x)
        = |y − z| + |z − x|      (definition of the absolute value, since y > z and z > x)

The other cases proceed similarly.
6.1 Claim. For any α > 0, we have d(x, y) < α if and only if both x − y < α
and y − x < α. Indeed, from the definition of the absolute value, we know that
d(x, y) ≡ |x − y| is equal to x − y if x > y and to y − x if y > x. Hence, either
x > y, |x − y| = x − y, and then x − y < α; or x < y, |x − y| = −(x − y), and
then because α > 0 and x − y < 0 we get x − y < 0 < α, or just x − y < α.
This shows that the first inequality holds. The second one proceeds similarly,
and the converse direction follows since |x − y| equals one of x − y and y − x.
6.2 Claim. The distance function is translation invariant. That is: d (x, y) =
d (x − z, y − z) for any three numbers x, y, z ∈ R. To see this, we write out the
definition
d (x, y) = |x − y|
= |x − y + z − z|
= |x − z − (y − z)|
= d (x − z, y − z)
6.3 Claim. (The reverse triangle inequality) For any x, y ∈ R we have
||x| − |y|| ≤ |x − y|. To see this, write

|x| = |x − y + y|
    ≤ |x − y| + |y|    (regular triangle inequality)
so we have
|x| − |y| ≤ |x − y|
By symmetry (running the same argument after having exchanged x and y) we
also have
|y| − |x| ≤ |x − y|
which is equivalent to (by multiplying the inequality by minus one):
|x| − |y| ≥ − |x − y|
so we conclude by Claim 6.1 that
||x| − |y|| ≤ |x − y|
which is what we wanted to show.
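A brute-force numerical check of the reverse triangle inequality on a grid of sample points (our sketch, not a proof; the grid and tolerance are arbitrary):

```python
# check | |x| - |y| | <= |x - y| on a grid of points in [-5, 5]
samples = [k / 10 for k in range(-50, 51)]
for x in samples:
    for y in samples:
        assert abs(abs(x) - abs(y)) <= abs(x - y) + 1e-12
print("reverse triangle inequality holds on all samples")
```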
1. The sequence “converges” to some number c ∈ R as we plug in larger and
larger arguments, as was the case in the first example.
2. The sequence keeps jumping back and forth no matter how far we go.
3. The sequence keeps growing with no bound–it diverges.
6.4 Definition. A sequence a : N → R is said to converge to some L ∈ R iff
for any δ > 0 there is some threshold Nδ ∈ N such that for all n ≥ Nδ,

d(a(n), L) < δ
That is, the distance between a (n) and the number L becomes as small
as one wants–one merely has to go far enough into the sequence, and how far
depends on how small the distance we ask for. Another way to say this is that
a (n) converges to L as n → ∞. The point about this concept is that the
distance between a (n) and L becomes smaller and smaller and smaller. If the
distance is “small”, but remains fixed as we enlarge n, the notion does not apply.
6.5 Remark. It is not possible that a(n) converges to L as n → ∞ and also
a(n) converges to L′ as n → ∞ if L ≠ L′.
Proof. Let δ > 0, and let N be the threshold of distance δ for the convergence
of a to L and N′ that for the convergence to L′. Then for all n ≥ max(N, N′),

d(L, L′) ≤ d(L, a(n)) + d(a(n), L′) < δ + δ = 2δ

That means that the distance between L and L′ can be made arbitrarily
small, that is, they are equal.
6.6 Definition. A sequence a : N → R is said to diverge to ∞ (respectively to
−∞) iff for any M > 0 there is some threshold NM ∈ N such that for all n ≥ NM,

a(n) ≥ M    (respectively a(n) ≤ −M)
6.7 Definition. A sequence a : N → R is said to have no limit (one says the
limit does not exist), if there is no L ∈ R to which it converges, and it does not
go to either ∞ or −∞.
Different notations for this are:
1. The limit notation: If L is a limit of a, then we write
limn→∞ a(n) = L
2. or sometimes
lim a = L
3. or sometimes
a (n) → L (n → ∞)
4. and when it is not important what L is, but only that there is some L like
that, we write that lim a exists.
5. When a goes to infinity, we write
limn→∞ a(n) = ∞
6.10 Example. Take N ∋ n ↦ √(n+1) − √n ∈ R. Let us write out the first
few elements of this sequence (I used a computer to give approximate values of
the square roots):
0.41, 0.31, 0.26, 0.23, 0.21, 0.19, . . .
it seems to be going down, but does it converge to zero? The answer is yes,
again because as n is very large, the difference between n + 1 and n becomes
insignificant (essentially because n is much larger than 1!) So we try to calculate
d(√(n+1) − √n, 0) = |√(n+1) − √n|
   (the square root is monotone increasing, so this is positive)
 = √(n+1) − √n
   (use the identity a − b = (a² − b²)/(a + b))
 = ((n+1) − n)/(√(n+1) + √n)
 = 1/(√(n+1) + √n)
   (use √(n+1) + √n ≥ 2√n for any n)
 ≤ 1/(2√n)
and we get the same story (we can make this arbitrarily small by taking n
arbitrarily large).
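Numerically one can confirm both the factorization and the bound (a sketch of ours, not a proof; the tolerance is arbitrary):

```python
import math

# sqrt(n+1) - sqrt(n) equals 1/(sqrt(n+1)+sqrt(n)) and is at most 1/(2 sqrt(n))
for n in [1, 10, 100, 10_000, 1_000_000]:
    d = math.sqrt(n + 1) - math.sqrt(n)
    assert abs(d - 1 / (math.sqrt(n + 1) + math.sqrt(n))) < 1e-9
    assert d <= 1 / (2 * math.sqrt(n))
print("factorization and bound verified on samples")
```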
6.11 Example. Consider the sequence N 3 n 7→ n ∈ R. Can we show it goes
to infinity? Trivially, because for any big number that we can choose, M ∈ R,
there is some N ∈ N such that all n ≥ N will obey n ≥ M . In particular, take
N to be the first integer larger than M .
6.12 Claim. If a : N → R and b : N → R are two sequences which are equal
except for a finite number of elements, then their limit behavior is identical.
That is, if
a (n) = b (n)
for all n ≥ N , for some N ∈ N, then lim a = lim b if this exists (that is, either
both limits exist and converge to the same finite number, or both limits do not
exist, or both limits diverge to infinity or minus infinity).
Proof. Assume for simplicity that lim a exists and converges to a finite number
(the other cases being similar). Then we want to show lim b exists and equals
lim a. To that end, for any δ > 0 let Na(δ) ∈ N be the threshold of a such
that if n ≥ Na(δ) then

d(a(n), lim a) < δ

Then if we pick n ≥ max({ N, Na(δ) }) we have both d(a(n), lim a) < δ and
a(n) = b(n), which implies

d(b(n), lim a) = d(a(n), lim a) < δ

so b converges to lim a as well.
6.13 Claim. (Algebra of limits) If a, b are two sequences N → R which both
have finite limits, then lim(a + b) = lim a + lim b and (lim a)(lim b) = lim(ab).
Also, if lim b ≠ 0, then lim(a/b) = (lim a)/(lim b), with the understanding
that a/b is a sequence that might be defined only after a finite number of terms.
Proof. Let us assume that both a and b have finite limits L1 and L2, and let
us take the thresholds N1(δ), N2(δ) ∈ N for each of these limits. That means
that given any δ > 0, if n ≥ N1(δ) then d(a(n), L1) ≤ δ, and if n ≥ N2(δ)
then d(b(n), L2) ≤ δ.
Let us define N(δ) := max({ N1(δ), N2(δ) }) (i.e. the largest of the two
thresholds, so that if n ≥ N(δ) then automatically both n ≥ N1(δ) and
n ≥ N2(δ)). Then for n ≥ N(δ/2) we have, using Claim 6.2,

d(a(n) + b(n), L1 + L2) ≤ d(a(n), L1) + d(b(n), L2) ≤ δ/2 + δ/2 = δ

which proves the first statement about the sum. To get the product, we use
Remark 5.10 together with the triangle inequality. Set M := max({ |L1|, |L2| })
and, given δ > 0, let δ′ := √(M² + δ) − M > 0. For n beyond both thresholds
N1(δ′) and N2(δ′) we get

d(a(n)b(n), L1L2) ≤ |L1| d(b(n), L2) + |L2| d(a(n), L1) + d(a(n), L1) d(b(n), L2)
                  ≤ 2Mδ′ + (δ′)²
                  = (√(M² + δ) − M)(√(M² + δ) + M)
                  = (M² + δ) − M²
                  = δ

which proves the statement about the product.
For the quotient, since a/b = a · (1/b), by the product rule it suffices to show
that 1/b(n) → 1/L2 as n → ∞. To that end,

d(1/b(n), 1/L2) = |1/b(n) − 1/L2|
                = |L2 − b(n)| / (|b(n)| |L2|)
                = d(L2, b(n)) / (|b(n)| |L2|)
Now, we know that b(n) → L2 as n → ∞ and L2 ≠ 0 by hypothesis. That
means that beyond some threshold we have d(b(n), L2) < |L2|/2, and hence
|b(n)| ≥ |L2|/2 > 0. Therefore, beyond that threshold,

d(1/b(n), 1/L2) ≤ 2 d(L2, b(n)) / |L2|²

If we now also take n beyond the threshold N2(δ|L2|²/2) for any given δ > 0,
we can conclude that

d(1/b(n), 1/L2) ≤ δ
This way of making the proof by assuming that b(n) ≠ 0 for all n ∈ N also
tells us how to proceed in the other case. Indeed, we have just shown that
due to lim b ≠ 0, there is a certain threshold above which b(n) ≠ 0. So even
if b vanishes somewhere in the beginning, Claim 6.12 shows it doesn’t matter.
6.14 Claim. (The squeeze theorem) Let a, b, c : N → R be three sequences
with a(n) ≤ b(n) ≤ c(n) for all n ∈ N, and assume that lim a and lim c both
exist and are equal. Then lim b exists and equals them.
Proof. For convenience let l := lim a = lim c. We want to show that for any
δ > 0 there is a threshold beyond which

|b(n) − l| < δ

However, we have a → l and c → l, so for any δ > 0 we can find N large
enough such that if n ≥ N then |a(n) − l| and |c(n) − l| are both smaller than
δ, which by Claim 6.1 is equivalent to l − δ < a(n) < l + δ and l − δ < c(n) < l + δ.
So we find b(n) ≤ c(n) < l + δ and b(n) ≥ a(n) > l − δ, which means that
|b(n) − l| < δ for all n ≥ N (the same threshold for both a and c). Since δ > 0
was arbitrary we are finished.
6.15 Remark. If we have two sequences a, b : N → R such that a(n) < b(n)
for any n ∈ N and such that both limits exist, we can “take the limit of the
inequality” and the inequality will still hold (though it stops being strict):

lim a ≤ lim b

To see this, consider the sequence c := b − a, which is strictly positive. If we
had lim c < 0, then |c(n) − lim c| < δ for any δ > 0 for n large, and if we pick
for example δ := −(1/2) lim c > 0, we get c(n) < (1/2) lim c < 0, i.e. c(n) is
strictly negative, which cannot be. Hence lim c ≥ 0, i.e. lim a ≤ lim b.
The other cases follow easier reasoning.
6.16 Claim. We have the following special sequences, where α, p ∈ R and p > 0:
1. If a : N → R is given by a(n) := n^(−p) then lim a = limn→∞ n^(−p) = 0.
2. If a : N → R is given by a(n) := p^(1/n) then lim a = limn→∞ p^(1/n) = 1.
3. If a : N → R is given by a(n) := n^(1/n) then lim a = limn→∞ n^(1/n) = 1.
4. If a : N → R is given by a(n) := n^α / (1+p)^n then lim a = limn→∞ n^α / (1+p)^n = 0.
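These four limits can be probed numerically (a sketch of ours, not part of the notes; the sample values p = 2 and α = 3 are arbitrary choices, and since floats overflow for huge (1+p)^n the last case keeps n moderate):

```python
n = 100_000
print(n ** -2.0)        # n^{-p} with p = 2: close to 0
print(2.0 ** (1 / n))   # p^{1/n} with p = 2: close to 1
print(n ** (1 / n))     # n^{1/n}: close to 1

n = 200                  # 3.0 ** n overflows floats for much larger n
print(n ** 3 / 3.0 ** n) # n^alpha / (1+p)^n with alpha = 3, p = 2: close to 0
```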
6.17 Claim. (Monotone convergence) Let a : N → R be monotone increasing, i.e.

a(n + 1) ≥ a(n)    (n ∈ N)

Then either a diverges to infinity, or lim a exists and is finite.
Proof. If a is not bounded (as in Definition 5.11) then this fits the definition of
a sequence that diverges to infinity, Definition 6.6. So assume otherwise, that
a is bounded by some constant M ≥ 0, and consider the set of numbers

im(a) = { a(n) | n ∈ N }

which is bounded by M from above and by a(1) from below due to the
monotonicity assumption. It is a fact that any bounded subset S ⊆ R has
what is called a least upper bound, denoted by sup(S), which is the smallest
possible upper bound on it. That is, it is an upper bound, and it is the
smallest in the set of all upper bounds. We will show that lim a exists by
showing that lim a = sup(im(a)) in this case.
First we need what is called the approximation property of the supremum.
It says the following: for any bounded set S ⊆ R and for any ε > 0, there
is some element sε ∈ S such that sup(S) − ε < sε. Indeed, assume otherwise.
Then there is some ε0 > 0 such that for all s ∈ S, sup(S) − ε0 ≥ s. But
then sup(S) − ε0 is an upper bound on S, and since ε0 > 0, sup(S) − ε0 <
sup(S), so that sup(S) is not the least upper bound. Hence we have reached
a contradiction.
Using the approximation property of the supremum, let us return now
to the question of existence of lim a and its equality to sup(im(a)). Let
δ > 0 be given. Then we know by the approximation property that there is
some nδ ∈ N such that sup(im(a)) − δ < a(nδ). Due to the monotonicity
assumption this implies that for all n ≥ nδ we have

sup(im(a)) − δ < a(nδ) ≤ a(n)

But also, from the fact that sup(im(a)) is an upper bound on im(a) it follows
that for any n ∈ N,

a(n) ≤ sup(im(a)) < sup(im(a)) + δ

Hence d(a(n), sup(im(a))) < δ for all n ≥ nδ, which is precisely the statement
that lim a = sup(im(a)).
Hence it is clear that now, for functions whose domain is R or a subset of it,
we need to measure the distance in the domain as well, and not just let the
argument go to infinity (which, in the language of the previous section, meant
all n above a certain threshold N). This gives us the following table of options
for the limit of a function f : R → R:
1. Probe the function at some point x ∈ R (which might not lie inside its
domain strictly speaking).
2. Probe the function at +∞ (this was the only thing which has an analogue
for sequences).
3. Probe the function at −∞.
6.22 Definition. Let f : R → R. We say that limx→∞ f (x) exists and is equal
to some L < ∞ iff for any δ > 0 there is some Mδ > 0 such that if x > Mδ then
d (f (x) , L) < δ.
6.23 Definition. Let f : R → R. We say that limx→∞ f (x) diverges to ∞ iff
for all M > 0 there is some N > 0 such that if x > N then f (x) > M .
6.24 Remark. Similar definitions could be phrased concerning −∞, either in the
domain or in the co-domain of f .
6.25 Definition. (Limit point of a subset of R) Let A ⊆ R. The point l ∈ R
is called a limit point of A iff for any ε > 0 there is some a ∈ A\{ l } such that
d(a, l) < ε. We denote by A̅ (called the closure of A) the union of A together
with the set of all its limit points.
6.26 Example. If A = { 1, 2, 3 } then A has no limit points, since we cannot
get arbitrarily close to any point from within A, as it is discrete.
6.27 Example. If A = (0, 1), then 1 is a limit point of A, even though 1 ∉ A
itself. 0 is also a limit point, as well as any number in the interior of the
interval, a ∈ (0, 1).
6.28 Example. If A = (0, 1) ∪ { 2 }, the set of limit points of A is [0, 1]. In
particular, 2 is not a limit point of A. Then A̅ = [0, 1] ∪ { 2 }.
More often than not, when we talk about limit points, it will be applied
when we take a set A = (a, b) which is an interval and then we want to talk
about a or b as limit points of A.
6.29 Definition. Let f : A → R with A ⊆ R. Let x0 ∈ A̅ be a limit point of
A. Then we say that limx→x0 f(x) exists and is equal to some L ∈ R iff for any
ε > 0 there is some δε > 0 such that for any x ∈ A such that d (x, x0 ) < δε we
have d (f (x) , L) < ε.
6.30 Definition. Let f : A → R with A ⊆ R. Let x0 ∈ A̅ be a limit point
of A. Then we say that limx→x0 f(x) diverges to infinity iff for any M > 0
there is some δM > 0 such that for any x ∈ A such that d (x, x0 ) < δM we have
f (x) ≥ M .
This concept is extremely similar to the limit of a sequence. The only dif-
ference is that now we have a slightly different criterion of what “approaching”
means: we need to make the distance approached in the domain small as well.
6.31 Remark. The laws of limits of sequences derived in Claim 6.13, Claim 6.14,
and Claim 6.17 also hold for limits of functions, and we don’t repeat them in
this context.
6.32 Example. Consider the limit limx→0 sin(x)/x. This is related to a special
function called the sinc function (see its sketch in Figure 7). Strictly speaking
we define sinc : R → R via the formula sinc(x) := sin(x)/x for x ≠ 0, and
sinc(0) := 1.
Figure 7: A sketch of the sinc function.
The key inequality is

cos(x) ≤ sin(x)/x ≤ 1    (for small x ≠ 0)

Once we have this inequality, we simply employ the squeeze theorem, Claim 6.14,
since limx→0 cos(x) = 1 (the proof of this fact is similar to Example 6.33).
To prove the inequality we take the reciprocal:

1 ≤ x/sin(x) ≤ 1/cos(x)

and multiply by (1/2) sin(x) to get

(1/2) sin(x) ≤ (x/2π)·π ≤ (1/2) sin(x)/cos(x).
Now picture a right triangle inscribed in the unit circle as in the (omitted)
figure, with O the center, A and C on the circle, and the angle x at O. The
sector of the circle with vertices OCA has area (x/2π) · π = x/2. The area
of the triangle with vertices OCA is (1/2) sin(x) × 1 = (1/2) sin(x). The area
of the larger triangle with vertices OBC can be found using the law of sines
applied to the triangle AOD in the figure, and works out to (1/2) sin(x)/cos(x).
Since the three regions are nested, comparing their areas explains the inequality
(1/2) sin(x) ≤ (x/2π)·π ≤ (1/2) sin(x)/cos(x), and we’re finished.
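The inequality just proved can be sampled numerically (our sketch, not part of the notes; sample points and the tiny probe value are arbitrary):

```python
import math

# cos(x) <= sin(x)/x <= 1 for small x != 0, from either side of 0
for x in [0.5, 0.1, 0.01, -0.01, -0.1]:
    s = math.sin(x) / x
    assert math.cos(x) <= s <= 1.0
print(math.sin(1e-8) / 1e-8)  # essentially 1, as the squeeze predicts
```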
6.33 Example. limx→π/2 cos(x) = 0. We already know that cos(π/2) = 0
from the geometric picture. The question is rather whether we can quantify
that as x → π/2 we really have cos(x) → 0 (later on we will see this is the
definition of continuity of cos at π/2). The answer is yes: Given any ε > 0, we
want

d(cos(x), 0) ≡ |cos(x)| < ε

to hold for all x ∈ R such that d(x, π/2) < δε (that is, δε > 0 is the ε-dependent
threshold we seek). The way we can prove this is by using the connection
between cos and sin. Indeed, cos(x) = −sin(x − π/2) for all x ∈ R. Hence we
need to study sin when we plug in small values of the argument. Looking at
the geometric picture though, |sin(t)| is always smaller than the arc, whose
length is |t|. Thus

|cos(x)| = |sin(x − π/2)| ≤ |x − π/2| < δε

so that choosing δε := ε does the job.
6.34 Example. Consider limx→y (x^n − y^n)/(x − y) for some fixed y ∈ R and
n ∈ N. Using the identity

x^n − y^n = (x − y) Σ_{k=0}^{n−1} x^(n−k−1) y^k

we get

limx→y (x^n − y^n)/(x − y) = limx→y Σ_{k=0}^{n−1} x^(n−k−1) y^k

Now we can use Claim 6.13 to find

limx→y (x^n − y^n)/(x − y) = Σ_{k=0}^{n−1} y^(n−k−1) y^k
                           = Σ_{k=0}^{n−1} y^(n−1)
                           = n y^(n−1)
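A numerical illustration of ours, with the arbitrary choices n = 5 and y = 2, where the limit should be 5 · 2⁴ = 80:

```python
# the difference quotient (x^n - y^n)/(x - y) approaches n*y^(n-1) as x -> y
n, y = 5, 2.0
for h in [1e-1, 1e-3, 1e-6]:
    x = y + h
    print((x ** n - y ** n) / (x - y))  # approaches 80 as h shrinks
print(n * y ** (n - 1))                 # the claimed limit: 80.0
```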
6.35 Example. Consider the limit limε→0 (√(x+ε) − √x)/ε for fixed x > 0.
As we already saw in Example 6.10, we may factorize

√(x+ε) − √x = ((x+ε) − x)/(√(x+ε) + √x) = ε/(√(x+ε) + √x)

and hence

(√(x+ε) − √x)/ε = 1/(√(x+ε) + √x)

which means, using Claim 6.13, we only have to evaluate limε→0 (√(x+ε) + √x) =
limε→0 √(x+ε) + limε→0 √x. Now

limε→0 √(x+ε) = √x

We conclude

limε→0 (√(x+ε) − √x)/ε = 1/(2√x).
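Again a numerical sketch of ours, at the arbitrary sample point x = 4 where the limit should be 1/(2√4) = 0.25:

```python
import math

# (sqrt(x+e) - sqrt(x))/e approaches 1/(2 sqrt(x)) as e -> 0
x = 4.0
for e in [1e-2, 1e-4, 1e-6]:
    print((math.sqrt(x + e) - math.sqrt(x)) / e)  # approaches 0.25
print(1 / (2 * math.sqrt(x)))                     # the limit: 0.25
```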
Claim. (Composition of limits) Let f : A → B and g : B → R, let a0 be a
limit point of A, and suppose that f(a) → L1 as a → a0 and g(b) → L2 as
b → L1. Then (g ◦ f)(a) → L2 as a → a0.
Proof. This proof is extremely similar to that of Claim 6.43.
Since f (a) → L1 as a → a0 we have, for any ε > 0 some δ1 (ε) > 0 such
that if a ∈ A obeys d (a, a0 ) < δ1 (ε) then d (f (a) , L1 ) < ε.
Since g (b) → L2 as b → L1 , we have for any ε > 0 some δ2 (ε) > 0 such
that if b ∈ B obeys d (b, L1 ) < δ2 (ε) then d (g (b) , L2 ) < ε.
Hence for any ε > 0, if a ∈ A obeys d (a, a0 ) < δ1 (δ2 (ε)), we have
d (f (a) , L1 ) < δ2 (ε) so that d (g (f (a)) , L2 ) < ε. But this is precisely what
it means that (g ◦ f ) (a) → L2 as a → a0 .
6.38 Example. Consider limx→0 sin(3x)/x. Compare this to Example 6.32. We
have

sin(3x)/x = 3 sin(3x)/(3x)

If we define a new function f : x ↦ 3x for all x, and recall sinc : x ↦ sin(x)/x for
x ≠ 0, then we have sin(3x)/(3x) = (sinc ◦ f)(x) for x ≠ 0. But we know that
f(x) → 0 as x → 0 (trivially), so that (sinc ◦ f)(x) → 1 as x → 0, since
sinc(x) → 1 as x → 0. We conclude

limx→0 sin(3x)/x = 3

based on Example 6.32.
Example 6.21 pushes us to generalize our definition of limits to one sided
limits:
6.39 Definition. Let f : A → R and x0 ∈ A̅ a limit point of A. Then the
left-sided limit of f at x0 exists and is equal to L, which is denoted by

limx→x0⁻ f(x) = L

iff for any ε > 0 there is some δ > 0 such that for all x ∈ A with 0 < x0 − x < δ
we have d(f(x), L) < ε.
The right-sided limit of f at x0 exists and is equal to L, which is denoted by

limx→x0⁺ f(x) = L

iff for any ε > 0 there is some δ > 0 such that for all x ∈ A with 0 < x − x0 < δ
we have d(f(x), L) < ε.
6.40 Remark. Due to Claim 6.1, we can say that limx→x0 f (x) exists if and only
if both one-sided limits exist and are equal to each other.
6.41 Example. Going back to Example 6.21, where f(x) := 1 for x ≥ 0 and
f(x) := 0 for x < 0, we have
lim f (x) = 0
x→0−
and
lim f (x) = 1
x→0+
and indeed since the two limits are not equal we do not have that limx→0 f (x)
exists!
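One can watch the two one-sided limits disagree by probing f along sequences approaching 0 from either side (a sketch of ours, not part of the notes):

```python
# the step function of Example 6.41, probed from each side of 0
f = lambda x: 1 if x >= 0 else 0
left  = [f(-(10.0 ** -k)) for k in range(1, 6)]  # x -> 0 from the left
right = [f(10.0 ** -k) for k in range(1, 6)]     # x -> 0 from the right
print(left)   # [0, 0, 0, 0, 0]
print(right)  # [1, 1, 1, 1, 1]
```

The left probes settle on 0 and the right probes on 1, so the two-sided limit at 0 cannot exist.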
6.42 Example. If f : (0, 1) → R then there is no point to ask about the two-
sided limits at the end points 0 or 1, since the “other side” is not part of the
domain.
Actually there is a relationship between limits of sequences and limits of
functions!
6.43 Claim. Let f : A → R and x0 ∈ A be a limit point of A. Then limx→x0 f (x) =
L for some L ∈ R if and only if for any sequence a : N → A which converges to
x0, the new sequence f ◦ a : N → R converges to L.
Proof. Assume first that for any sequence a : N → A converging to x0 the
sequence f ◦ a converges to L. We want to show that

limx→x0 f(x) = L.

Hence let δ > 0 be given. We seek some ε > 0 such that if x ∈ A is such that
d(x, x0) < ε then d(f(x), L) < δ. Assume the contrary, i.e. assume the limit
does not converge to L. Then there is some δ0 > 0 such that for each ε > 0
there is some xε ∈ A with d(xε, x0) < ε yet d(f(xε), L) > δ0.
So pick ε along a sequence such as n ↦ 1/n. Hence there is some δ0 > 0
such that for each n ∈ N, there is some a(n) ∈ A with d(a(n), x0) < 1/n yet
d(f(a(n)), L) > δ0. But that means that a(n) → x0 yet (f ◦ a)(n) does not
converge to L, contrary to the assumption. Thus we arrive at a contradiction.
Assume conversely that limx→x0 f (x) = L for some L ∈ R and let a : N →
A be any sequence converging to x0 . We want to show that f ◦ a : N → R
converges to L. We know by assumption that for any ε > 0: (1) there is some
δε > 0 such that if x ∈ A is such that d (x, x0 ) < δε then d (f (x) , L) < ε; (2)
there is some Nε ∈ N such that if n ≥ Nε then d (a (n) , x0 ) < ε. Then for
n ≥ Nδε ,
d (a (n) , x0 ) < δε
so that
d (f (a (n)) , L) < ε
so once we know f(a) and continuity of f at a, we get a pretty good idea of
what f is near a. Coincidentally, this also tells us something about the
meaning of δε: it is the size of the neighborhood around a on which we get
estimates of size ε on f(a + s).
As a general rule of thumb, any function which can be written as a sequence
of algebraic manipulations (e.g. f(x) = 5x + 3 − 8 + x^100) is continuous
wherever it is defined (e.g. x ↦ 1/x is not continuous at zero simply because
it is not defined at zero).
More complicated functions, such as cos : R → R, sin : R → R, expa : R →
(0, ∞), loga : (0, ∞) → R have to be examined and in principle their continuity
should not be taken for granted (though these ones listed turn out to be indeed
continuous).
Any function defined using the piecewise notation should be highly suspicious
in terms of its continuity.
7.5 Example. Going back to Example 6.21, it is clear that f there is continuous
on the whole of R except for the point zero, where it is not continuous.
7.6 Example. The function R ∋ x ↦ x² ∈ R is continuous. Indeed limy→x y² =
x² for any x ∈ R. To see this, we calculate

d(y², x²) ≡ |y² − x²|
          = |(y − x + x)² − x²|
          = |(y − x)² + 2x(y − x)|
          ≤ |y − x|² + 2|x||y − x|

which can be made smaller than any given ε > 0 by taking d(y, x) = |y − x|
small enough, so limy→x y² = x² indeed.
Actually we first started studying limits of functions R → R and then we
introduced the concept of continuity. But now that we have continuity and are
familiar with a few functions which are continuous, we may go back and use
this in order to calculate limits. Indeed, we have:
7.7 Claim. Let A, B ⊆ R be two given subsets. If f : A → R is continuous
at some limit point a0 ∈ A, and g : B → A has a limit at some limit point
b0 ∈ B which equals limb→b0 g(b) = a0, then we can “push” the limit through a
continuous function:

limb→b0 f(g(b)) = f(a0)                (continuity of f at a0)
               = f(limb→b0 g(b)).      (hypothesis on g)
7.8 Remark. Coincidentally this also shows us that the composition of contin-
uous functions is a continuous function:

limx→x0 f(g(x)) = f(limx→x0 g(x)) = f(g(x0)).
7.9 Example. In one of the homework exercises we had to evaluate the limit

limx→0 2^(2^x) = ?

Using the fact that y ↦ 2^y is continuous, we can now “push” the limit inside
twice to get that this limit exists and equals 2^(2^0) = 2^1 = 2.
One of the important consequences of continuity is the
7.10 Theorem. (Intermediate Value Theorem) Let f : [a, b] → R be continuous
and pick some c ∈ [f (a) , f (b)] (if f (b) < f (a) then reverse the order of the
interval). Then there is some x ∈ [a, b] such that f (x) = c, that is, f takes all
values in between the values at its end points.
Proof. This ultimately relates back to the fact that the image of an interval
under a continuous map is again an interval. Since this fact requires the
topological notion of connectedness, we shall not prove it here.
7.11 Example. Suppose we are looking for a solution x ∈ R of the equation
x²¹ − 3x + 1 = 0. Since there is this really large power of 21 we have no hope
for a closed-form solution (such as the formula for the quadratic equation).
However, we know that the function R ∋ x ↦ x²¹ − 3x + 1 ∈ R is continuous (it
is just some basic arithmetic operations). Furthermore, if we plug in x = −1 we
get

(−1)²¹ − 3(−1) + 1 = −1 + 3 + 1 = 3

and if we plug in x = +1 we get

(1)²¹ − 3 + 1 = −1

Since 0 ∈ [−1, 3], somewhere between x = −1 and x = +1 the continuous
function must pass through zero, that is, there is a solution (one or more) to
the equation (though we still have no idea what it is).
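The Intermediate Value Theorem is exactly what makes the bisection method work in practice: repeatedly halve a bracket on which the continuous function changes sign. A sketch of ours (the bracket [−1, 1] and the iteration count are arbitrary choices):

```python
def f(x):
    return x ** 21 - 3 * x + 1

lo, hi = -1.0, 1.0           # f(lo) = 3 > 0 > -1 = f(hi)
for _ in range(60):           # halve the bracket 60 times
    mid = (lo + hi) / 2
    if f(mid) > 0:
        lo = mid              # sign change still between mid and hi
    else:
        hi = mid
print(hi, abs(f(hi)) < 1e-9)  # an approximate root, near 1/3
```

The IVT guarantees a root stays trapped inside the shrinking bracket at every step.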
7.12 Claim. Consider any quantity (temperature, pressure, elevation, carbon
dioxide concentration) along a great circle around the world, under the simpli-
fying assumption that it varies continuously. Then there always exist two
antipodal points which share the same value of that quantity.
Proof. The proof of this theorem is again related to notions of topology and
is thus outside the scope of our studies. It is related to “boundedness” and
“compactness” and the fact that continuity preserves these concepts.
7.15 Corollary. Any continuous function f : [a, b] → R is bounded (as in
Definition 5.11).
8 Derivatives
8.1 Definition. The derivative of f : A → R at x ∈ A, denoted by f 0 (x), is
defined as the limit
f (x + ε) − f (x)
f 0 (x) := lim
ε→0 ε
if it exists. If it does, then f is called differentiable at x. If f is differentiable on
the whole of A, then this defines now a new function f 0 : A → R whose formula
is A 3 x 7→ f 0 (x). This function is well-defined due to limits being unique, if
they exist. If f is differentiable only on a subset of A, say, B ⊆ A, then f 0 , as
a function, is only defined on B.
Sometimes different notations are used for the derivative; the most common
one is the Leibniz notation

f′(x) = (d/dx) f(x)
The problem with this notation (and why we will not be using it) is that it
forces you to commit to give a name to the independent variable (x in this
case), further conflating the function f (a rule on all numbers) with the number
f (x) (f evaluated at the point x). This confusion between f and f (x), or
between f 0 and f 0 (x), we try to avoid. We prefer to think of the derivative as a
function itself regardless of the name of its argument, so that we prefer to write
f 0 with no mentioning of the name of the argument x.
There is, however, some benefit (and also danger) in the Leibniz notation,
since it helps one remember what the derivative actually is: it is the limit of a
quotient of the difference of the values of the function at near-by points divided
by the distance between the nearby points. One should think of d as “Delta” (the
Greek letter ∆) which stands for difference or change. Hence we are calculating
the quotient “difference in f ” by “difference in x”. Indeed oftentimes one sees
the notation
lim_{∆x→0} ∆f/∆x
The danger with this notation is that it (sometimes) makes one take the quotient
too literally and forget that there is also a limit involved.
Another possible notation for the derivative is with the symbol ∂, which
stands usually in math for partial derivative if there are several variables on
which a function depends. However since for us most functions are of one
variable there is no distinction. The way one uses this notation is as
∂f = f′

or as

∂x f = f′
if one wishes to make explicit with respect to which variable the differentiation
is happening (which in this case is x), which is sometimes useful, especially if we
want to refer to a function via the formula defining it (i.e. when the domain and
codomain are implicitly obvious). Then one writes conveniently, for instance
∂ x^n = n x^(n−1)
8.2 Remark. Clearly this notion only makes sense if x ∈ A is a limit point of A.
8.3 Example. The derivative of any constant function is the constant zero
function.
Proof. The constant function f (no matter what the constant is) will always
have (f(x + ε) − f(x))/ε = 0/ε = 0, so that the limit is always zero, no matter
which x we plug in.
8.5 Remark. Suppose that the limit defining f′(x)
exists and is finite. Unpacking what the limit actually means, we get that for
any a > 0, there is some b_a > 0 such that if |ε| < b_a then

|(1/ε)(f(x + ε) − f(x)) − f′(x)| < a
so we get even more information about the function near by, namely, how and
in which direction it changes with ε; cf Claim 7.4.
8.6 Remark. Another, geometric, interpretation of the derivative is as the slope
of the function at a certain point. The slope is related to the angle of the
straight line which is tangent to the function at the given point. Recall that a
straight line is a function of the form

R ∋ x ↦ ax + b

where a, b ∈ R are the parameters that define the straight line; a is called its
slope and it is related to the angle that the straight line forms with the horizontal
axis. Indeed, a is the tangent of that angle α: a = tan(α). Hence the derivative
gives us the angle of the straight line which is tangent to the function at that
point, i.e., its slope at that point.
8.7 Remark. Yet another interpretation of the derivative is as instantaneous
rate of change of the function, at the given point. What that means is, given
any point, how quickly does the function increase (if its derivative is positive) or
decrease (if its derivative is negative) at a given point, which is of course related
to its slope at that point. In physics, if f : R → R denotes the function that gives
a particle’s position at each instant of time, then f′ would correspond to
its instantaneous velocity. We will see this in Claim 8.49.
8.8 Example. The derivative of the absolute value R 3 x 7→ |x| does not exist
at zero, but otherwise exists everywhere else.
f′(0) = limε→0 (|0 + ε| − |0|)/ε
      = limε→0 |ε|/ε
Now if ε > 0 we get 1 and if ε < 0 we get −1, i.e. the left-sided limit is −1
and the right-sided limit is +1, so that the double-sided limit does not exist and
hence the function is not differentiable at zero. Anywhere else, e.g. if x > 0,
|x + ε| − |x|
f 0 (x) = lim
ε→0 ε
(x > 0 so |x| = x; For |ε| < x, |x + ε| = x + ε)
x+ε−x
= lim
ε→0 ε
= 1
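One way to see this numerically: the following short sketch (an illustration we add here; the helper `diff_quotient` is ours, not from the notes) evaluates the difference quotient of |x| on either side of zero, and away from zero.

```python
# Difference quotients of f(x) = |x|: the two one-sided quotients at zero
# disagree, so f'(0) does not exist; at x = 1 the quotient settles near 1.

def diff_quotient(f, x, eps):
    """The difference quotient (f(x + eps) - f(x)) / eps."""
    return (f(x + eps) - f(x)) / eps

f = abs

right = diff_quotient(f, 0.0, 1e-8)   # eps > 0 gives +1
left = diff_quotient(f, 0.0, -1e-8)   # eps < 0 gives -1
at_one = diff_quotient(f, 1.0, 1e-8)  # away from zero: close to 1
```
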
Proof. This was actually a problem on the midterm. Let us see how it works.
Pick any x ∈ R. Then we calculate
\[ \sin'(x) \equiv \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(\sin(x+\varepsilon)-\sin(x)\right) \]
We cannot evaluate this limit directly because it is of the indeterminate form
\(\frac{0}{0}\). So we use \(\sin(a)-\sin(b) = 2\sin\left(\frac{a-b}{2}\right)\cos\left(\frac{a+b}{2}\right)\) to get
\[ \sin(x+\varepsilon)-\sin(x) = 2\cos\left(x+\frac{\varepsilon}{2}\right)\sin\left(\frac{\varepsilon}{2}\right) \]
so that
\[ \sin'(x) = \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\,2\cos\left(x+\frac{\varepsilon}{2}\right)\sin\left(\frac{\varepsilon}{2}\right) = \lim_{\varepsilon\to0}\frac{\sin\left(\frac{\varepsilon}{2}\right)}{\frac{\varepsilon}{2}}\cos\left(x+\frac{\varepsilon}{2}\right) \]
(Use algebraic laws of limits)
\[ = \left(\lim_{\varepsilon\to0}\frac{\sin\left(\frac{\varepsilon}{2}\right)}{\frac{\varepsilon}{2}}\right)\left(\lim_{\varepsilon\to0}\cos\left(x+\frac{\varepsilon}{2}\right)\right) \]
(Use the definition \(\operatorname{sinc}(a) \equiv \frac{\sin(a)}{a}\))
\[ = \left(\lim_{\varepsilon\to0}\operatorname{sinc}\left(\frac{\varepsilon}{2}\right)\right)\left(\lim_{\varepsilon\to0}\cos\left(x+\frac{\varepsilon}{2}\right)\right) \]
Now both sinc and cos are continuous functions, so we may push the limit
through. Recall the limit of sinc from Example 6.32: \(\lim_{a\to0}\operatorname{sinc}(a) = 1\).
Thus we find \(\sin'(x) = 1\cdot\cos(x) = \cos(x)\).
Since x ∈ R was arbitrary, we conclude sin' = cos.
A similar computation works for cos: using the analogous difference-of-cosines identity and the
trick of identifying a sinc as in the example above, we conclude that the limit
converges to −sin(x), i.e. cos' = −sin.
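These two derivative formulas can be sanity-checked numerically (an illustration we add; the symmetric difference quotient is our own helper, not a construction from the notes):

```python
import math

# Symmetric difference quotients of sin and cos should match cos and -sin.

def derivative(f, x, eps=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

for x in [0.0, 0.5, 1.0, 2.0, -1.3]:
    assert abs(derivative(math.sin, x) - math.cos(x)) < 1e-8
    assert abs(derivative(math.cos, x) + math.sin(x)) < 1e-8
```
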
8.11 Claim. The derivative is linear. That means that if f and g are two
functions which are differentiable at some x ∈ R and α, β ∈ R, then the new
function
\[ \alpha f + \beta g \]
is differentiable at x with derivative equal to αf'(x) + βg'(x), i.e., we can write
\[ (\alpha f + \beta g)' = \alpha f' + \beta g' \]
Proof. We have
\[ (\alpha f+\beta g)'(x) \equiv \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(\alpha f(x+\varepsilon)+\beta g(x+\varepsilon)-\alpha f(x)-\beta g(x)\right) \]
\[ = \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(\alpha f(x+\varepsilon)-\alpha f(x)+\beta g(x+\varepsilon)-\beta g(x)\right) \]
\[ = \lim_{\varepsilon\to0}\left[\frac{1}{\varepsilon}\left(\alpha f(x+\varepsilon)-\alpha f(x)\right)+\frac{1}{\varepsilon}\left(\beta g(x+\varepsilon)-\beta g(x)\right)\right] \]
(Use algebra of limits)
\[ = \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(\alpha f(x+\varepsilon)-\alpha f(x)\right)+\lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(\beta g(x+\varepsilon)-\beta g(x)\right) = \alpha f'(x)+\beta g'(x) . \]
8.12 Theorem. If f is differentiable at some x ∈ R then f is continuous at
x.
Proof. Continuity at x means
\[ \lim_{\varepsilon\to0}f(x+\varepsilon) = f(x) \]
i.e. that \(\lim_{\varepsilon\to0}\left(f(x+\varepsilon)-f(x)\right) = 0\). Indeed,
\[ \lim_{\varepsilon\to0}\left(f(x+\varepsilon)-f(x)\right) = \lim_{\varepsilon\to0}\varepsilon\cdot\frac{1}{\varepsilon}\left(f(x+\varepsilon)-f(x)\right) \]
(Algebra of limits)
\[ = \left(\lim_{\varepsilon\to0}\varepsilon\right)\left(\lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(f(x+\varepsilon)-f(x)\right)\right) = 0\cdot f'(x) = 0 . \]
8.13 Example. It is clear that the converse is false, namely, continuity does
not imply differentiability. The prime counterexample is Example 8.8: the
absolute value is not differentiable at zero, yet it is continuous at zero.
8.14 Claim. The derivative obeys the so-called Leibniz rule for products. That
means that if f and g are two functions differentiable at some x ∈ R then the
new function fg (the product) is also differentiable at x and its derivative is
equal to
\[ (fg)'(x) = f'(x)g(x)+f(x)g'(x) \]
Proof. We have
\[ (fg)'(x) \equiv \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(f(x+\varepsilon)g(x+\varepsilon)-f(x)g(x)\right) \]
\[ = \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(f(x+\varepsilon)g(x+\varepsilon)-f(x+\varepsilon)g(x)+f(x+\varepsilon)g(x)-f(x)g(x)\right) \]
\[ = \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(f(x+\varepsilon)\left(g(x+\varepsilon)-g(x)\right)+\left(f(x+\varepsilon)-f(x)\right)g(x)\right) \]
(Algebra of limits)
\[ = \lim_{\varepsilon\to0}f(x+\varepsilon)\frac{1}{\varepsilon}\left(g(x+\varepsilon)-g(x)\right)+\lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(f(x+\varepsilon)-f(x)\right)g(x) \]
Now we use again the algebra of limits, noting that because f is differentiable
at x, it is also continuous at x (as proven in Theorem 8.12), so that
\(\lim_{\varepsilon\to0}f(x+\varepsilon) = f(x)\). We find
\[ (fg)'(x) = f(x)\lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(g(x+\varepsilon)-g(x)\right)+\lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(f(x+\varepsilon)-f(x)\right)g(x) = f(x)g'(x)+f'(x)g(x) . \]
8.15 Example. The most important example of the product rule is when one
of the functions is a constant: let f(x) = cg(x) for some c ∈ R, for all x ∈ R,
where g is a given function. Then
\[ f' = (cg)' = c'g + cg' \]
But c is just a constant, so c' = 0 as we saw in Example 8.3, and we find f' = cg'.
8.16 Claim. Actually we already saw that if f(x) ≡ xⁿ for some n ∈ N then
f'(x) = nx^{n−1}. Indeed, this was precisely Example 6.34! Actually this rule
works for any α ∈ R on (0, ∞) and not just n ∈ N: if f(x) = x^α for all
x ∈ (0, ∞) then f'(x) = αx^{α−1} for all x ∈ (0, ∞).
8.17 Example. For instance, if \(f(x) := \frac{1}{x^2}\) for all x ∈ R \ { 0 }, then since
\(\frac{1}{x^2} = x^{-2}\), we have \(f'(x) = -2x^{-3} = -2\cdot\frac{1}{x^3}\). Of course, since the function f is
not defined at zero, it is not differentiable there!
8.18 Example. Another example: if \(f(x) := \sqrt{x}\) for all x ≥ 0, then since
\(\sqrt{x} = x^{\frac{1}{2}}\) we have
\[ f'(x) = \frac{1}{2}x^{\frac{1}{2}-1} = \frac{1}{2}x^{-\frac{1}{2}} = \frac{1}{2}\cdot\frac{1}{x^{\frac{1}{2}}} = \frac{1}{2\sqrt{x}} . \]
We will see the proof for general α ∈ R (i.e. not just n ∈ N) further below, once
we understand the derivatives of log and exp.
8.19 Claim. For any a > 1, the logarithm function is differentiable and \(\log_a'(x) = \frac{1}{x}\log_a(e)\),
where e ≈ 2.718 is the natural base of the logarithm as in Definition 10.2. In particular, \(\log_e'(x) = \frac{1}{x}\).
Proof. We have
\[ \log_a'(x) \equiv \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(\log_a(x+\varepsilon)-\log_a(x)\right) \]
(Use logarithm laws)
\[ = \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\log_a\left(\frac{x+\varepsilon}{x}\right) = \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\log_a\left(1+\frac{\varepsilon}{x}\right) \]
(Replace \(y := \frac{x}{\varepsilon}\))
\[ = \lim_{y\to\infty}\frac{y}{x}\log_a\left(1+\frac{1}{y}\right) \]
(Use logarithm laws)
\[ = \lim_{y\to\infty}\frac{1}{x}\log_a\left(\left(1+\frac{1}{y}\right)^y\right) = \frac{1}{x}\lim_{y\to\infty}\log_a\left(\left(1+\frac{1}{y}\right)^y\right) \]
(Use continuity of log)
\[ = \frac{1}{x}\log_a\left(\lim_{y\to\infty}\left(1+\frac{1}{y}\right)^y\right) = \frac{1}{x}\log_a(e) \]
where in the last step we used Definition 10.2.
8.20 Claim. For any a > 1, the exponential function is differentiable and its
derivative is equal to \(\exp_a' = \log_e(a)\exp_a\) (recall e from Definition 10.2). In
particular, since \(\log_e(e) = 1\) we find that
\[ \exp_e' = \exp_e . \]
so we would be finished if we could show that \(\lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(\exp_a(\varepsilon)-1\right) = \log_e(a)\). Let us rewrite
\[ \lim_{\varepsilon\to0}\frac{\exp_a(\varepsilon)-1}{\varepsilon} = \lim_{\varepsilon\to0}\frac{\exp_a(\varepsilon)-1}{\log_a\left(\exp_a(\varepsilon)\right)} = \lim_{\varepsilon\to0}\frac{1}{\frac{1}{\exp_a(\varepsilon)-1}\log_a\left(\exp_a(\varepsilon)-1+1\right)} \]
(Use logarithm laws)
\[ = \lim_{\varepsilon\to0}\frac{1}{\log_a\left(\left(1+\exp_a(\varepsilon)-1\right)^{\frac{1}{\exp_a(\varepsilon)-1}}\right)} \]
(Use continuity of \(\alpha\mapsto\frac{1}{\log_a(\alpha)}\) to push the limit through)
\[ = \frac{1}{\log_a\left(\lim_{\varepsilon\to0}\left(1+\exp_a(\varepsilon)-1\right)^{\frac{1}{\exp_a(\varepsilon)-1}}\right)} \]
Now we use the fact that \(\exp_a(\varepsilon)\to1\) as ε → 0, since \(\exp_a\) is continuous at
zero. Hence if \(g(\varepsilon) := \exp_a(\varepsilon)-1\) and \(f(\alpha) := (1+\alpha)^{\frac{1}{\alpha}}\), what we have is the
limit \(\lim_{\varepsilon\to0}(f\circ g)(\varepsilon)\), and we have already learnt in Claim 6.36 that since
the limit of g at zero exists and equals zero, this limit equals \(\lim_{\alpha\to0}f(\alpha)\), so
we find
\[ \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(\exp_a(\varepsilon)-1\right) = \frac{1}{\log_a\left(\lim_{\alpha\to0}(1+\alpha)^{\frac{1}{\alpha}}\right)} \]
or more succinctly
\[ (g\circ f)' = (g'\circ f)\,f' \]
Proof. We have
\[ (g\circ f)'(x) \equiv \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(g(f(x+\varepsilon))-g(f(x))\right) \]
(Rewrite the same thing)
\[ = \lim_{\varepsilon\to0}\frac{g(f(x+\varepsilon))-g(f(x))}{f(x+\varepsilon)-f(x)}\cdot\frac{1}{\varepsilon}\left(f(x+\varepsilon)-f(x)\right) \]
(Limit of products equals product of limits, if both exist, and use the fact that f is differentiable at x)
\[ = \left(\lim_{\varepsilon\to0}\frac{g(f(x+\varepsilon))-g(f(x))}{f(x+\varepsilon)-f(x)}\right)f'(x) \]
If we define
\[ q(y) := \begin{cases} \frac{g(y)-g(f(x))}{y-f(x)} & y\in\mathbb{R}\setminus\{f(x)\} \\ g'(f(x)) & y = f(x) \end{cases} \]
then the remaining limit equals
\[ \lim_{\varepsilon\to0}(q\circ f)(x+\varepsilon) = \lim_{y\to f(x)}q(y) = \lim_{y\to f(x)}\frac{g(y)-g(f(x))}{y-f(x)} = g'(f(x)) . \]
8.22 Example. Let us try to evaluate the derivative of exp₂ ∘ sin. This function
is given by the formula \(\mathbb{R}\ni x\mapsto 2^{\sin(x)}\). We have by Claim 8.21 that
\[ (\exp_2\circ\sin)' = (\exp_2'\circ\sin)\sin' \]
Now we know from Example 8.9 that sin' = cos and from Claim 8.20 we know
that \(\exp_2' = \log_e(2)\exp_2\). Hence
\[ (\exp_2\circ\sin)' = \log_e(2)\,(\exp_2\circ\sin)\cos . \]
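This chain-rule computation can be checked against a difference quotient (a numerical illustration we add; the helper names `h` and `h_prime` are ours):

```python
import math

# Check: d/dx 2^{sin x} = ln(2) * 2^{sin x} * cos(x), via symmetric quotients.

def h(x):
    return 2.0 ** math.sin(x)

def h_prime(x):
    return math.log(2.0) * 2.0 ** math.sin(x) * math.cos(x)

eps = 1e-6
for x in [0.0, 0.7, 1.5, -2.0]:
    numeric = (h(x + eps) - h(x - eps)) / (2 * eps)
    assert abs(numeric - h_prime(x)) < 1e-7
```
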
8.23 Example. Recall the so-called hyperbolic functions, which
are analogous to the trigonometric functions (with domain and codomain R):
the hyperbolic sine is
\[ \sinh(x) \equiv \frac{e^x-e^{-x}}{2} \]
the hyperbolic cosine is
\[ \cosh(x) \equiv \frac{e^x+e^{-x}}{2} \]
and the hyperbolic tangent is
\[ \tanh(x) \equiv \frac{\sinh(x)}{\cosh(x)} \]
Hence we can immediately calculate their derivatives and get (with m(x) ≡ −x
for all x ∈ R)
\[ \sinh' = \frac{1}{2}\left(\exp'-(\exp\circ m)'\right) = \frac{1}{2}\left(\exp-(\exp\circ m)\,m'\right) \]
(Use m' = −1)
\[ = \frac{1}{2}\left(\exp+(\exp\circ m)\right) \equiv \cosh \]
and
\[ \cosh' = \frac{1}{2}\left(\exp'+(\exp\circ m)'\right) = \frac{1}{2}\left(\exp+(\exp\circ m)\,m'\right) \]
(Use m' = −1)
\[ = \frac{1}{2}\left(\exp-(\exp\circ m)\right) \equiv \sinh \]
Compare this with Example 8.9 and Example 8.10 (i.e. note the lack of a minus sign
on cosh').
8.24 Example. Now that we have the composition, logarithm and exponential
derivatives, let us prove Example 6.34 or Claim 8.16 for powers which are not
necessarily natural numbers. So let f (x) := xα for any α ∈ R and x ∈ (0, ∞).
We want to show that f 0 (x) = αxα−1 .
Since x > 0, it is valid to rewrite y = exp (log (y)) for any y > 0, since log is
the inverse of exp. So
\[ f(x) = x^\alpha = \exp\left(\log\left(x^\alpha\right)\right) = \exp\left(\alpha\log(x)\right) \]
and hence, by the chain rule together with the derivatives of exp and log,
\[ f'(x) = \exp\left(\alpha\log(x)\right)\cdot\frac{\alpha}{x} = x^\alpha\cdot\frac{\alpha}{x} = \alpha x^{\alpha-1} . \]
Now if x < 0 then xα does not necessarily make sense, because we have no
prescription to take a root of a negative number.
8.25 Example. Continuing the example above, if \(\alpha = -\frac{1}{3}\), then \(x^\alpha = \frac{1}{\sqrt[3]{x}}\), and
when x < 0, we could write \(\sqrt[3]{x} = \sqrt[3]{-|x|}\). Now because taking roots is
multiplicative, we have
\[ \sqrt[3]{-|x|} = \sqrt[3]{-1}\,\sqrt[3]{|x|} \]
and \(\sqrt[3]{-1} = -1\) (since \((-1)^3 = -1\)), and \(\sqrt[3]{|x|}\) we know how to handle. More
formally, let f : R → R be defined by
\[ f(x) := \sqrt[3]{x} = \begin{cases} \sqrt[3]{x} & x > 0 \\ 0 & x = 0 \\ -\sqrt[3]{|x|} & x < 0 \end{cases} \]
At zero the situation is more delicate and the definition of f' from the limit has
to be employed:
\[ f'(0) \equiv \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(f(\varepsilon)-f(0)\right) = \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\begin{cases}\sqrt[3]{\varepsilon} & \varepsilon>0 \\ -\sqrt[3]{-\varepsilon} & \varepsilon<0\end{cases} = \lim_{\varepsilon\to0}\begin{cases}\varepsilon^{-\frac{2}{3}} & \varepsilon>0 \\ |\varepsilon|^{-\frac{2}{3}} & \varepsilon<0\end{cases} = \lim_{\varepsilon\to0}|\varepsilon|^{-\frac{2}{3}} \]
This last limit does not exist: it diverges to +∞. Hence f is not differentiable
at zero. The geometric meaning of this ∞ is that the slope of the tangent is
actually vertical! \(\tan\left(\frac{\pi}{2}\right) = \infty\).
8.26 Example. The derivative of f : R \ { 0 } → R whose formula is \(f(x) = \frac{1}{x}\)
is equal to \(f'(x) = -\frac{1}{x^2}\).
Proof. We could either use the rule Claim 8.16 or we can appeal directly to
the definition:
\[ f'(x) = \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(\frac{1}{x+\varepsilon}-\frac{1}{x}\right) = \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\cdot\frac{x-x-\varepsilon}{x(x+\varepsilon)} = \lim_{\varepsilon\to0}\frac{-1}{x^2+\varepsilon x} \]
(Use continuity of \(x\mapsto\frac{1}{x}\))
\[ = -\frac{1}{x^2} . \]
At x = 0, we must revert to the definition:
\[ f'(0) \equiv \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(f(\varepsilon)-f(0)\right) = \lim_{\varepsilon\to0}\varepsilon\sin\left(\frac{1}{\varepsilon}\right) \]
This limit tends to zero, by the squeeze theorem for example (since im(sin) =
[−1, 1]). Hence f'(0) exists and equals zero. At x ≠ 0 we get
\[ f'(x) = 2x\sin\left(\frac{1}{x}\right)+x^2\cos\left(\frac{1}{x}\right)\cdot\left(-\frac{1}{x^2}\right) = 2x\sin\left(\frac{1}{x}\right)-\cos\left(\frac{1}{x}\right) \]
However, f' as a function itself is not continuous at zero, since we do not have
\[ \lim_{\varepsilon\to0}f'(\varepsilon) = 0 \]
Indeed, the left hand side does not even exist, since \(\cos\left(\frac{1}{\varepsilon}\right)\) does not converge as ε → 0.
whereas again at zero special care must be taken:
\[ f''(0) = \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(f'(\varepsilon)-f'(0)\right) = \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(2\varepsilon\sin\left(\frac{1}{\varepsilon}\right)-\cos\left(\frac{1}{\varepsilon}\right)\right) = \lim_{\varepsilon\to0}\left(2\sin\left(\frac{1}{\varepsilon}\right)-\frac{1}{\varepsilon}\cos\left(\frac{1}{\varepsilon}\right)\right) \]
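Numerically (a sketch we add for illustration, taking the function of this example to be f(x) = x² sin(1/x) with f(0) = 0, consistent with the derivative formula computed above):

```python
import math

# f(x) = x^2 sin(1/x), f(0) = 0: differentiable at 0 with f'(0) = 0, yet
# f'(x) = 2x sin(1/x) - cos(1/x) oscillates without settling as x -> 0.

def f(x):
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

def f_prime(x):
    return 2 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

# The difference quotient at 0 is eps * sin(1/eps), squeezed to 0:
quot = abs(f(1e-9) / 1e-9)

# f' keeps attaining values near +1 and -1 arbitrarily close to 0:
near_plus = f_prime(1.0 / (math.pi * 1001))   # cos(1/x) = cos(1001*pi) = -1
near_minus = f_prime(1.0 / (math.pi * 1000))  # cos(1/x) = cos(1000*pi) = +1
```
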
Proof. Let us apply the various rules we have in order to figure this out. Firstly,
let us define the function \(r(x) := \frac{1}{x}\) for all x ≠ 0. This function takes a
number and gives its reciprocal. Then we may write \(\frac{1}{g} = r\circ g\), and so
\[ \frac{f}{g} = f\,(r\circ g) \]
Hence we have \(r'\circ g = -\frac{1}{g^2}\). We learn that
\[ \left(\frac{f}{g}\right)' = \frac{f'}{g}-\frac{fg'}{g^2} = \frac{f'g-fg'}{g^2} \]
which is the result we were looking for.
8.30 Example. Let us try to evaluate tan'. We know that \(\tan\equiv\frac{\sin}{\cos}\). Hence,
using Claim 8.29,
\[ \tan' = \left(\frac{\sin}{\cos}\right)' = \frac{\sin'\cos-\sin\cos'}{\cos^2} \]
Now we use the rules Example 8.9 and Example 8.10 to find
\[ \tan' = \frac{\cos^2+\sin^2}{\cos^2} \]
However, cos² + sin² ≡ 1 (make a drawing if you need to, but this is the Pythagorean
theorem). Hence
\[ \tan' = \frac{1}{\cos^2} \]
This last expression, \(\frac{1}{\cos}\), is sometimes called the secant.
Similarly, we have
\[ \tanh' = \left(\frac{\sinh}{\cosh}\right)' = \frac{\sinh'\cosh-\sinh\cosh'}{\cosh^2} = \frac{\cosh^2-\sinh^2}{\cosh^2} \]
Now there is a similar identity for the hyperbolic functions that says that
cosh² − sinh² = 1 (you can verify this directly), so that
\[ \tanh' = \frac{1}{\cosh^2} \]
8.31 Theorem. (The Hospital rule; L'Hôpital's rule) If f : A → R and g :
A → R are both differentiable, and if for some limit point a of A (which is
not necessarily inside of A!) we have \(\lim_{x\to a}f(x) = \lim_{x\to a}g(x) = 0\) or (both)
±∞, and g'(x) ≠ 0 for all x ∈ A, and \(\lim_{x\to a}\frac{f'(x)}{g'(x)}\) exists, then
\[ \lim_{x\to a}\frac{f(x)}{g(x)} = \lim_{x\to a}\frac{f'(x)}{g'(x)} \]
8.32 Example. The requirement that \(\lim_{x\to a}\frac{f'(x)}{g'(x)}\) exists is crucial. Consider
f(x) = x + sin(x) and g(x) = x at a = ∞. Then
\[ f'(x) = 1+\cos(x) \]
and
\[ g'(x) = 1 \]
so that \(\frac{f'(x)}{g'(x)} = \frac{1+\cos(x)}{1} = 1+\cos(x)\). The limit here does not exist as x → ∞.
But we can work with the original quotient to get
\[ \lim_{x\to\infty}\frac{x+\sin(x)}{x} = \lim_{x\to\infty}\left(1+\frac{\sin(x)}{x}\right) = 1+\underbrace{\lim_{x\to\infty}\operatorname{sinc}(x)}_{=0} = 1 \]
which exists!
8.33 Example. Consider the limit \(\lim_{x\to0}\frac{\exp(x)-1}{x^2+x}\). Since exp(0) = 1, we have
the indeterminate form \(\frac{0}{0}\). But proceeding with L'Hôpital's rule, we get
\[ \lim_{x\to0}\frac{\exp(x)-1}{x^2+x} = \lim_{x\to0}\frac{\exp(x)}{2x+1} = \frac{1}{1} = 1 . \]
\[ (1_A)' = 1 \]
i.e. the constant function which always equals 1. On the other hand, if we
differentiate the LHS of the equation \(f^{-1}\circ f = 1_A\), using Claim 8.21 we find
\[ \left(f^{-1}\circ f\right)' = \left(\left(f^{-1}\right)'\circ f\right)f' \]
So we find
\[ 1 = \left(\left(f^{-1}\right)'\circ f\right)f' \]
or
\[ \frac{1}{f'} = \left(f^{-1}\right)'\circ f . \]
If we now apply f⁻¹ to both sides of the equation from the right (using
f ∘ f⁻¹ = 1_B), we get
\[ \frac{1}{f'}\circ f^{-1} = \left(f^{-1}\right)' \]
which is the result we were looking for.
8.35 Example. Let us use this rule in order to find the derivative of arcsin.
Recall that arcsin is defined as the inverse of sin. Since sin is in general not
invertible, we must restrict its domain and codomain in order to really get an
honest inverse. If we re-define sin as
\[ \sin : \left[-\frac{\pi}{2},\frac{\pi}{2}\right]\to[-1,1] \]
then it is both injective and surjective, and so it is indeed invertible, and we
denote its inverse by arcsin (whatever it is, we do not really know how to get a
formula for it...). We only know that
\[ (\sin\circ\arcsin)(x) = x \qquad (x\in[-1,1]) \]
\[ (\arcsin\circ\sin)(x) = x \qquad \left(x\in\left[-\frac{\pi}{2},\frac{\pi}{2}\right]\right) \]
Using these two relations it is enough to find arcsin', even though we still have
no idea what the formula for arcsin is!
\[ \arcsin' = \frac{1}{\sin'}\circ\arcsin = \frac{1}{\cos}\circ\arcsin \]
You might say this is useless, since we still don't have a formula for arcsin.
However, cos acting on arcsin is something we can figure out, since we can
rewrite \(\cos = \sqrt{1-\sin^2}\) (from the Pythagorean theorem sin² + cos² = 1, and
using the fact that on \(\left[-\frac{\pi}{2},\frac{\pi}{2}\right]\) the sin is always increasing, so there sin' is
positive and cos = sin'), so that
\[ \cos(\arcsin(x)) = \sqrt{1-\left(\sin(\arcsin(x))\right)^2} = \sqrt{1-x^2} \]
We find that
\[ \arcsin'(x) = \frac{1}{\sqrt{1-x^2}} \]
which is quite remarkable since we still have no formula for what arcsin is! The
sign may be found by working out when arcsin is increasing vs. decreasing.
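A quick numerical check of this formula (our own addition; `math.asin` stands in for arcsin):

```python
import math

# Check: arcsin'(x) = 1 / sqrt(1 - x^2), via symmetric difference quotients.

eps = 1e-6
for x in [0.0, 0.3, -0.5, 0.9]:
    numeric = (math.asin(x + eps) - math.asin(x - eps)) / (2 * eps)
    exact = 1.0 / math.sqrt(1.0 - x * x)
    assert abs(numeric - exact) < 1e-5
```
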
The same rule also recovers Example 8.18: for the square function f(x) := x² on [0, ∞),
whose inverse is f⁻¹ = √, we get (with r the reciprocal function from before)
\[ \left(f^{-1}\right)' = \frac{1}{f'}\circ f^{-1} = \frac{1}{2}\,r\circ f^{-1} \]
i.e. \(\left(f^{-1}\right)'(x) = \frac{1}{2}\cdot\frac{1}{\sqrt{x}}\).
Figure 8: Local and global extrema (source: Wikipedia).
since both numerator and denominator are negative. Hence
\[ f'(x) \ge \frac{1}{\varepsilon}\left(f(x+\varepsilon)-f(x)\right)-\eta \ge 0-\eta = -\eta \]
Since η was arbitrary with the constraint, this means that f'(x) ≥ 0. Similarly,
if 0 < ε < δ_η, then x < x + ε < x + δ_η, so that again by the assumption
of the maximum, f(x + ε) ≤ f(x), and so now
\[ \frac{1}{\varepsilon}\left(f(x+\varepsilon)-f(x)\right) \le 0 \]
from which we learn that
\[ f'(x) \le \frac{1}{\varepsilon}\left(f(x+\varepsilon)-f(x)\right)+\eta \le 0+\eta = \eta \]
since f(t) < f(a), this means y ∈ (a, b) necessarily, and again, f'(y) = 0 by
Theorem 8.42.
In either case, f'(y) = 0 for some y ∈ (a, b) (and not y = a or y = b).
8.46 Example. If differentiability fails somewhere in the middle then the conclusion
of Rolle's theorem fails. For instance, take f(x) = |x|. We know this
isn't differentiable at x = 0, and indeed if f's domain is [−1, 1] then there is no
point in (−1, 1) at which the derivative is zero, even though f(−1) = f(1) = 1.
We actually already saw that
\[ f'(x) = \begin{cases} 1 & x>0 \\ -1 & x<0 \\ \text{undefined} & x=0 \end{cases} \]
at which f' is zero in the interior of (−1, 1), and that point is zero, as can be seen
from the formula.
8.48 Theorem. (The mean value theorem) If f : [a, b] → R is continuous, and
differentiable at least on (a, b), then there is some point x ∈ (a, b) such that
\[ f'(x) = \frac{f(b)-f(a)}{b-a} \]
Proof. Consider the auxiliary function \(h(x) := f(x)-\frac{f(b)-f(a)}{b-a}(x-a)\), which
satisfies h(a) = f(a) = h(b) by direct calculation. But now by Theorem 8.45 there must be some c ∈ (a, b)
for which h'(c) = 0. Note that this means
\[ f'(c) = \frac{f(b)-f(a)}{b-a} \]
as desired.
Proof. Assume first that f'(x) ≥ 0 for all x ∈ [a, b]. Let t, s ∈ [a, b] be such that
t < s. To show that f is increasing means to show that
\[ f(t) \le f(s) . \]
Using Theorem 8.48 on [t, s], we learn that there must be some ξ ∈ (t, s) such
that
\[ f'(\xi) = \frac{f(t)-f(s)}{t-s} . \]
Figure 10: The mean value theorem.
However, since by assumption f'(x) ≥ 0 for all x ∈ [a, b], and (t, s) ⊆ [a, b],
we must have f'(ξ) ≥ 0. Also, we know t − s < 0, so that f(t) − f(s) = f'(ξ)(t − s) ≤ 0, i.e. f(t) ≤ f(s).
Figure 3 shows that zero is a global minimum for the function, but it is clearly
not a stationary point since f is not differentiable there. The global maxima
are the boundaries −1 and 1, again, not stationary points.
8.53 Claim. If f : R → R is twice differentiable at some x ∈ R and f'(x) = 0,
then:
1. If f''(x) > 0 then x is a point of local minimum.
2. If f''(x) < 0 then x is a point of local maximum.
If f''(x) = 0 the test is not informative.
Figure 11: (Wikipedia) Convex function, because the purple line is above the
black line.
Proof. Using Claim 8.49, we see that if f''(x) > 0, then f' is strictly increasing
near x, which means that it must change from negative to positive, so
that using Corollary 8.50 we learn that x is a local minimum for f.
Conversely, if f''(x) < 0, then by Claim 8.49 f' is strictly decreasing near
x, so it changes from positive to negative, so that again by Corollary 8.50 it
follows that x is a local maximum for f.
If f''(x) = 0, then we have found a stationary point for f', which gives no
further information about how f' changes between increasing and decreasing
near x, i.e., nothing about the nature of the stationary point of f.
f is called strictly convex iff for any x, y ∈ [a, b] and for any t ∈ [0, 1] we have
we see that it is merely the restriction of f onto the interval [x, y], re-parametrized
so as to traverse that interval in unit length (i.e. with the variable t ∈ [0, 1] instead
of [x, y]), so it goes between (0, f(x)) and (1, f(y)). On the other hand,
corresponds to the straight line between the points (0, f (x)) ∈ R2 and (1, f (y)) ∈
R2 with slope f (y) − f (x). Hence, the requirement of convexity is that the
graph of the function between any two points always lies below the straight line
between the two points, as in Figure 11.
8.56 Claim. f : [a, b] → R is convex if and only if for any s, t, u ∈ [a, b] such
that
\[ a < s < t < u < b \]
we have
\[ \frac{f(t)-f(s)}{t-s} \le \frac{f(u)-f(t)}{u-t} . \]
Proof. Assume f is convex. Define \(\lambda := \frac{t-s}{u-s}\). Since s < t < u holds, λ ∈
(0, 1). Also, \(1-\lambda = \frac{u-s-t+s}{u-s} = \frac{u-t}{u-s}\). Finally, note that
\[ 1-\lambda = \frac{u-t}{u-s} \iff (1-\lambda)(u-s) = u-t \iff -s-\lambda(u-s) = -t \iff t = (1-\lambda)s+\lambda u . \]
which is in turn equivalent to
Proof. Let x, y ∈ [a, b] be such that x < y, and pick also some ε > 0 and δ > 0.
Assume f is convex. Then, by (two applications of) Claim 8.56, on x, x + ε, y
and on x + ε, y, y + δ, we get:
f'' is non-negative.
Connecting this with Claim 8.53, we can now see that if a function is
strictly convex and has a stationary point, then that stationary point is necessarily
a minimum!
8.59 Claim. Let f : [a, b] → R be differentiable and convex. Then f' is continuous.
\[ x_{n+1} := x_n-\frac{f(x_n)}{f'(x_n)} \qquad (n\in\mathbb{N}) . \]
Geometrically, the point x_{n+1} is chosen so that it is where the straight line with
slope f'(x_n) passing through (x_n, f(x_n)) intercepts the horizontal axis:
\[ \frac{f(x_n)-0}{x_n-x_{n+1}} = f'(x_n) . \]
lim xn = ξ.
n→∞
Proof. First let us prove that x_n ≥ ξ. We know already that x₁ > ξ. Assume
that x_n > ξ for all n ≤ m, for some m ∈ N. Check x_{m+1}: apply the MVT
(Theorem 8.48) on f between ξ and x_m to get some c ∈ (ξ, x_m) such that
\[ \frac{f(x_m)-\overbrace{f(\xi)}^{=0}}{x_m-\xi} = f'(c) . \]
Since c < x_m and f' is increasing, f'(c) ≤ f'(x_m), so that
\[ \frac{f(x_m)}{x_m-\xi} \le f'(x_m) \iff f(x_m) \le f'(x_m)(x_m-\xi) \iff \frac{f(x_m)}{f'(x_m)}-x_m \le -\xi \iff x_{m+1} \ge \xi . \]
This in turn implies (by the knowledge that ξ is the unique zero point of f,
and that f is increasing) that f(x_n) > 0 for all n. We also know f'(x_n) ≥ δ
for all n. Hence
\[ x_{n+1}-x_n = \frac{-f(x_n)}{f'(x_n)} < 0 . \]
But now Claim 6.17 implies that \(\lim_{n\to\infty}x_n\) exists and equals ξ.
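The iteration defined above can be sketched in a few lines (a minimal illustration; the choice f(x) = x² − 2 with starting point x₁ = 2 is ours, picked so that f is increasing and convex on the relevant interval with unique positive zero ξ = √2 < x₁):

```python
# Newton's method: x_{n+1} = x_n - f(x_n) / f'(x_n).

def newton(f, f_prime, x, steps):
    """Run the Newton iteration a fixed number of times from x."""
    for _ in range(steps):
        x = x - f(x) / f_prime(x)
    return x

root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 2.0, 6)
# The iterates decrease monotonically toward sqrt(2) = 1.41421356...
```

The quadratic convergence promised by Claim 8.62 is visible in practice: each step roughly doubles the number of correct digits.
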
8.62 Claim. [[4] Exercise 5.25 (d)] One can prove (though this is beyond the
scope of this class since it uses Taylor's theorem) that
\[ 0 \le x_n-\xi \le \frac{2\delta}{M}\left[\frac{M}{2\delta}(x_1-\xi)\right]^{2^{n-1}} \qquad (n\in\mathbb{N}) \]
so that we get an upper bound on how far our approximation x_n is from the
true value ξ at any given step n ∈ N.
f (x + ε) ≈ f (x) + εf 0 (x) .
The precise meaning of the symbol ≈ was given in Remark 8.5, see also HW7Q2.1.
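For instance (a numerical illustration we add, not from the notes), taking f = √ at x = 100 with ε = 1, so that f'(x) = 1/(2√x):

```python
import math

# Linear approximation: f(x + eps) ~ f(x) + eps * f'(x), for f = sqrt.

x, eps = 100.0, 1.0
approx = math.sqrt(x) + eps / (2.0 * math.sqrt(x))  # 10 + 1/20 = 10.05
exact = math.sqrt(x + eps)                          # sqrt(101) = 10.049875...
error = abs(exact - approx)                         # roughly 1.2e-4
```
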
9 Integrals
9.1 The supremum and infimum
The following notions of supremum and infimum have been discussed a bit in the
proof of Claim 6.17 (where also an approximation property has been presented)
but we repeat them here since they are essential for the definition of the integral.
9.1 Definition. Let S ⊆ R. Then S is called bounded from above iff there is
some M ∈ R such that for all s ∈ S, s ≤ M . M is then called an upper bound
on S (note that many upper bounds exist once one exists).
9.2 Example. S = R is not bounded from above. S = (0, 1) is bounded from above:
1 is an upper bound, but so are 2, 3, etc.
9.3 Remark. Sets bounded from below, and lower bounds, are defined similarly.
Another name for the supremum is least upper bound. One then writes α =
sup (S).
9.5 Definition. Let S ⊆ R be a set bounded from below. Then an infimum on
S is a number α ∈ R such that the following two conditions hold:
1. α is a lower bound on S, in the sense above.
Figure 12: The integral’s geometric interpretation.
Since the curve of the function may be very complicated, we want to devise a way
to understand very general functions instead of restricting ourselves to simple
shapes (like triangles and squares). The way we do it is via approximation by
many small rectangles, and this is rigorously defined as follows.
9.8 Definition. (The Darboux integral) Let f : [a, b] → R be a bounded function
(as in Definition 5.11). We define its upper Darboux sum as the limit
\[ \lim_{N\to\infty}\overline{S}^b_a(f,N) \]
with
\[ \overline{S}^b_a(f,N) := \frac{b-a}{N}\sum_{n=0}^{N-1}\sup\left\{f(x)\in\mathbb{R}\,\middle|\,x\in\left[a+n\frac{b-a}{N},\,a+(n+1)\frac{b-a}{N}\right]\right\} \]
(if the limit exists at all) and its lower Darboux sum as
\[ \lim_{N\to\infty}\underline{S}^b_a(f,N) \]
with
\[ \underline{S}^b_a(f,N) := \frac{b-a}{N}\sum_{n=0}^{N-1}\inf\left\{f(x)\in\mathbb{R}\,\middle|\,x\in\left[a+n\frac{b-a}{N},\,a+(n+1)\frac{b-a}{N}\right]\right\} \]
(again, if the limit exists). If these two limits exist and are equal, i.e., if
\[ \lim_{N\to\infty}\overline{S}^b_a(f,N) = \lim_{N\to\infty}\underline{S}^b_a(f,N) \]
then f is called integrable on [a, b], and the common value of these limits is called its
integral on [a, b], denoted by
\[ \int_a^b f \]
or sometimes by
\[ \int_a^b f(x)\,dx \]
which is much more common and natural than the cumbersome
\[ \int_a^b\left(x\mapsto f(x)\right) \equiv \int_a^b f(x)\,dx . \]
Note that since f is bounded, each of the sets appearing above
is bounded (from above and below) for each n, and so it necessarily has a
supremum and an infimum.
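The definition can be transcribed almost literally (an illustration we add, not part of the notes; the exact sup/inf over each sub-interval is replaced by a max/min over a dense sample, which happens to be exact for the monotone f tried below):

```python
# Upper and lower Darboux sums of f over [a, b] with N equal sub-intervals.

def darboux_sums(f, a, b, n_intervals, samples=200):
    width = (b - a) / n_intervals
    upper = lower = 0.0
    for n in range(n_intervals):
        left = a + n * width
        pts = [f(left + k * width / samples) for k in range(samples + 1)]
        upper += width * max(pts)   # stands in for the supremum
        lower += width * min(pts)   # stands in for the infimum
    return upper, lower

# For f(x) = x on [0, 1] both sums squeeze toward the integral 1/2:
up, lo = darboux_sums(lambda x: x, 0.0, 1.0, 1000)
```
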
9.9 Remark. Another common name for this construction is the Riemann in-
tegral, which is defined in a slightly different way, but the end result can be
proven to be equivalent to our Darboux integral.
9.10 Example. The simplest example is the integral of the constant function.
Indeed let
f : [a, b] → R
be given by
f (x) = c
Figure 13: The two Darboux sum limits approximating the integral.
and the infimum to
\[ f\left(a+n\frac{b-a}{N}\right) = \alpha\left(a+n\frac{b-a}{N}\right)+\beta \]
Now
\[ \lim_{N\to\infty}\frac{\sum_{n=0}^{N-1}n}{N^2} = \lim_{N\to\infty}\frac{\sum_{n=1}^{N}(n-1)}{N^2} = \lim_{N\to\infty}\frac{\sum_{n=1}^{N}n-\sum_{n=1}^{N}1}{N^2} = \lim_{N\to\infty}\frac{\frac{1}{2}N(N+1)-N}{N^2} = \frac{1}{2} \]
so we indeed get the same result, as
\[ \alpha(b-a)a+\frac{1}{2}\alpha(b-a)^2 = \alpha(b-a)\left(a+\frac{1}{2}(b-a)\right) = \frac{1}{2}\alpha(b-a)(b+a) = \frac{\alpha}{2}\left(b^2-a^2\right) . \]
In conclusion, the two Darboux sum limits are equal, so that f is integrable and we
conclude
\[ \int_a^b f \equiv \int_a^b(\alpha x+\beta)\,dx = \frac{1}{2}\alpha\left(b^2-a^2\right)+\beta(b-a) . \]
This makes perfect sense thinking about the meaning of the integral geometrically,
as it is precisely the area of the trapezoid defined by the straight line
f.
9.12 Example. Here is an example of a function which is not integrable. Define
f : [a, b] → R by the formula
\[ f(x) := \begin{cases} 1 & x\in\mathbb{Q} \\ 0 & x\notin\mathbb{Q} \end{cases} \]
Then f keeps jumping between zero and 1. The lower Darboux sum will be
zero, but the upper Darboux sum will be b − a, and so we'll get that
\[ b-a \ne 0 \]
hence the two limits both do exist, but are not equal, and so the function is not
integrable.
9.13 Claim. For any N ∈ N, we have
\[ \underline{S}^b_a(f,N) \le \overline{S}^b_a(f,N) . \]
Proof. The infimum of a set is always smaller than or equal to the supremum of that
same set.
9.14 Claim. The upper Darboux sums define a monotone decreasing subsequence
in N and the lower Darboux sums define a monotone increasing subsequence in
N:
\[ \underline{S}^b_a(f,N) \le \underline{S}^b_a(f,2N) \qquad (N\in\mathbb{N}) \]
\[ \overline{S}^b_a(f,N) \ge \overline{S}^b_a(f,2N) \qquad (N\in\mathbb{N}) . \]
Proof. Let us consider just the upper sums (the lower sums follow a similar
argument). To make the notation a bit shorter and the argument clearer,
let us without loss of generality assume that a = 0 and b = 1 (otherwise
one may rescale the function afterwards). Then we have, writing \(\overline{f}_I := \sup\left\{f(x)\,\middle|\,x\in I\right\}\) for an interval I,
\[ \overline{S}^1_0(f,N) \equiv \frac{1}{N}\sum_{n=0}^{N-1}\overline{f}_{\left[\frac{n}{N},\frac{n+1}{N}\right]} \]
where [0, 1] is divided into N sub-intervals with boundary points
\[ 0,\ \frac{1}{N},\ \frac{2}{N},\ \frac{3}{N},\ \ldots,\ \frac{N-1}{N},\ 1 . \]
For \(\overline{S}^1_0(f,2N)\), we divide [0, 1] into 2N intervals, each of length \(\frac{1}{2N}\), so this
is actually a subdivision of the previous one, since now the boundary points
of the sub-intervals are
\[ 0,\ \frac{1}{2N},\ \frac{1}{N},\ \frac{3}{2N},\ \frac{2}{N},\ \ldots,\ 1 . \]
Thus we can rewrite \(\overline{S}^1_0(f,N)\):
\[ \overline{S}^1_0(f,N) = \frac{1}{N}\sum_{n=0}^{N-1}\overline{f}_{\left[\frac{n}{N},\frac{n+1}{N}\right]} = \frac{1}{2N}\sum_{n=0}^{N-1}\overline{f}_{\left[\frac{n}{N},\frac{n}{N}+\frac{1}{2N}\right]\cup\left[\frac{n}{N}+\frac{1}{2N},\frac{n+1}{N}\right]}+\frac{1}{2N}\sum_{n=0}^{N-1}\overline{f}_{\left[\frac{n}{N},\frac{n}{N}+\frac{1}{2N}\right]\cup\left[\frac{n}{N}+\frac{1}{2N},\frac{n+1}{N}\right]} \]
(sup(A ∪ B) ≥ sup(A))
\[ \ge \frac{1}{2N}\sum_{n=0}^{N-1}\overline{f}_{\left[\frac{n}{N},\frac{n}{N}+\frac{1}{2N}\right]}+\frac{1}{2N}\sum_{n=0}^{N-1}\overline{f}_{\left[\frac{n}{N}+\frac{1}{2N},\frac{n+1}{N}\right]} = \frac{1}{2N}\sum_{n=0}^{N-1}\overline{f}_{\left[\frac{2n}{2N},\frac{2n+1}{2N}\right]}+\frac{1}{2N}\sum_{n=0}^{N-1}\overline{f}_{\left[\frac{2n+1}{2N},\frac{2n+2}{2N}\right]} \]
\[ = \frac{1}{2N}\sum_{n=0}^{2N-1}\overline{f}_{\left[\frac{n}{2N},\frac{n+1}{2N}\right]} \equiv \overline{S}^1_0(f,2N) \]
Proof. In order to prove this, we need the notions of lim inf and lim sup which
we have not yet introduced.
9.19 Theorem. A function f : [a, b] → R is integrable if it is monotone (increasing
or decreasing).
Proof. Assume f is increasing (the decreasing case is analogous). Then on each
sub-interval the supremum is attained at the right endpoint,
\[ \sup\left\{f(x)\in\mathbb{R}\,\middle|\,x\in\left[a+n\frac{b-a}{N},\,a+(n+1)\frac{b-a}{N}\right]\right\} = f\left(a+(n+1)\frac{b-a}{N}\right) \]
and
\[ \inf\left\{f(x)\in\mathbb{R}\,\middle|\,x\in\left[a+n\frac{b-a}{N},\,a+(n+1)\frac{b-a}{N}\right]\right\} = f\left(a+n\frac{b-a}{N}\right) \]
Hence we have
\[ \overline{S}^b_a(f,N)-\underline{S}^b_a(f,N) = \frac{b-a}{N}\sum_{n=0}^{N-1}\left[f\left(a+(n+1)\frac{b-a}{N}\right)-f\left(a+n\frac{b-a}{N}\right)\right] \]
\[ = \frac{b-a}{N}\left[\sum_{n=0}^{N-1}f\left(a+(n+1)\frac{b-a}{N}\right)-\sum_{n=0}^{N-1}f\left(a+n\frac{b-a}{N}\right)\right] = \frac{b-a}{N}\left[\sum_{n=1}^{N}f\left(a+n\frac{b-a}{N}\right)-\sum_{n=0}^{N-1}f\left(a+n\frac{b-a}{N}\right)\right] \]
\[ = \frac{b-a}{N}\left[f\left(a+N\frac{b-a}{N}\right)+\sum_{n=1}^{N-1}f\left(a+n\frac{b-a}{N}\right)-f(a)-\sum_{n=1}^{N-1}f\left(a+n\frac{b-a}{N}\right)\right] = \frac{b-a}{N}\left(f(b)-f(a)\right) \]
But this can be made arbitrarily small, so that by Theorem 9.15 we conclude
that f is integrable.
Figure 14: The function \(x\mapsto\sin\left(\frac{1}{x}\right)\) is integrable.
9.24 Theorem. (Monotonicity) Let f : [a, b] → R and g : [a, b] → R be two
integrable functions. If f(x) ≤ g(x) for all x ∈ [a, b] then
\[ \int_a^b f \le \int_a^b g . \]
9.25 Theorem. Let f : [a, b] → R be integrable and c ∈ (a, b). Then f|_{[a,c]} :
[a, c] → R and f|_{[c,b]} : [c, b] → R are integrable, and
\[ \int_a^c f+\int_c^b f = \int_a^b f . \]
A good way to remember this result is that if A, B ⊆ R are two sets with no
intersection, then
\[ \int_{A\cup B}f = \int_A f+\int_B f , \]
Proof. For continuity, let us check that
\[ F(x) \stackrel{?}{=} \lim_{\varepsilon\to0}F(x+\varepsilon) = \lim_{\varepsilon\to0}\int_a^{x+\varepsilon}f . \]
By Theorem 9.25 this amounts to showing that \(\lim_{\varepsilon\to0}\int_x^{x+\varepsilon}f = 0\). Since f
is bounded, say by M, we have
\[ -\varepsilon M \le \int_x^{x+\varepsilon}f \le \varepsilon M \]
and so by the squeeze theorem (Claim 6.14) the limit is zero. We learn that F
is indeed continuous.
Let us now further assume that f is continuous at some x ∈ [a, b], and
verify that F is differentiable at that same x. We have
\[ F'(x) \stackrel{?}{=} \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(F(x+\varepsilon)-F(x)\right) = \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\left(\int_a^{x+\varepsilon}f-\int_a^{x}f\right) \]
so that
\[ F'(x) \stackrel{?}{=} \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\int_x^{x+\varepsilon}f . \]
Since f is continuous at x, for any e > 0 there is some d_e > 0 such that if
y ∈ [a, b] is such that |y − x| < d_e then |f(y) − f(x)| < e. So when |ε| < d_e,
we have for any y ∈ [x, x + ε] that |x − y| < d_e and so
\[ f(x)-e \le f(y) \le f(x)+e . \]
Integrating this over [x, x + ε]: the outer integrals are of constant functions (in y), so that, upon dividing
by ε, we learn
\[ f(x)-e \le \frac{1}{\varepsilon}\int_x^{x+\varepsilon}f(y)\,dy \le f(x)+e \]
which is equivalent to \(\left|\frac{1}{\varepsilon}\int_x^{x+\varepsilon}f(y)\,dy-f(x)\right| \le e\). Since e > 0 was arbitrary,
we learn that \(\lim_{\varepsilon\to0}\frac{1}{\varepsilon}\int_x^{x+\varepsilon}f\) exists and equals f(x), i.e.
\[ F'(x) = f(x) . \]
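A numerical sketch of this theorem (our own illustration; the midpoint-rule integrator `F` is an assumption standing in for the exact integral): with f = cos and a = 0, F should stay close to sin, and the difference quotient of F should recover cos.

```python
import math

# F(x) = integral of cos from 0 to x, via a fine midpoint-rule Riemann sum.

def F(x, n=100000):
    h = x / n
    return sum(math.cos((k + 0.5) * h) for k in range(n)) * h

x = 1.0
assert abs(F(x) - math.sin(x)) < 1e-9   # F ~ sin

eps = 1e-3
quotient = (F(x + eps) - F(x - eps)) / (2 * eps)  # ~ F'(1) = cos(1)
```
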
Proof. We know by Definition 9.8 that \(\int_a^b f'\) is going to be approximated from
below by \(\underline{S}^b_a(f',N)\) and from above by \(\overline{S}^b_a(f',N)\) for each finite N, and that
these approximations become better as N grows larger. Consider the upper
approximation,
\[ \overline{S}^b_a(f',N) \equiv \frac{b-a}{N}\sum_{n=0}^{N-1}\sup\left\{f'(x)\in\mathbb{R}\,\middle|\,x\in\left[a+n\frac{b-a}{N},\,a+(n+1)\frac{b-a}{N}\right]\right\} . \]
By the mean value theorem (Theorem 8.48), on each sub-interval there is some
point x_n with
\[ f'(x_n) = \frac{f\left(a+(n+1)\frac{b-a}{N}\right)-f\left(a+n\frac{b-a}{N}\right)}{a+(n+1)\frac{b-a}{N}-a-n\frac{b-a}{N}} = \frac{f\left(a+(n+1)\frac{b-a}{N}\right)-f\left(a+n\frac{b-a}{N}\right)}{\frac{b-a}{N}} \]
When we form the corresponding sum, it telescopes and only the first and last terms survive:
\[ \frac{b-a}{N}\sum_{n=0}^{N-1}\frac{f\left(a+(n+1)\frac{b-a}{N}\right)-f\left(a+n\frac{b-a}{N}\right)}{\frac{b-a}{N}} = \sum_{n=0}^{N-1}\left[f\left(a+(n+1)\frac{b-a}{N}\right)-f\left(a+n\frac{b-a}{N}\right)\right] \]
\[ = f\left(a+\frac{b-a}{N}\right)-f(a)+f\left(a+2\frac{b-a}{N}\right)-f\left(a+\frac{b-a}{N}\right)+\cdots+f\left(a+N\frac{b-a}{N}\right)-f\left(a+(N-1)\frac{b-a}{N}\right) \]
\[ = f(b)-f(a) . \]
Since each f'(x_n) lies between the infimum and the supremum of f' on its
sub-interval, when we sum up,
\[ \underline{S}^b_a(f',N) \le \frac{b-a}{N}\sum_{n=0}^{N-1}f'(x_n) \le \overline{S}^b_a(f',N) \]
But we just learnt above that the inner term is independent of N (due to the
telescoping) and simply equals f(b) − f(a); hence
\[ \underline{S}^b_a(f',N) \le f(b)-f(a) \le \overline{S}^b_a(f',N) \]
which, upon taking N → ∞, is equivalent to
\[ \int_a^b f' = f(b)-f(a) . \]
9.33 Remark. Theorem 9.32, which is a culmination of our entire effort in this
course, says that differentiation is the right inverse of integration, i.e.
\[ \textstyle\int\circ\,\partial = 1_{\text{functions}} . \]
9.34 Remark. Theorem 9.32 will be our main hammer, or work horse, to "solve"
integrals, rather than the explicit definition (Definition 9.8), which is explicitly
worked out in Example 9.11 (i.e. one almost never follows the procedure in Example
9.11, which is really presented here more for illustration than as an actual
computational tool). Using Theorem 9.32, we learn that if we can rewrite a
function as the derivative of another function, then we can immediately integrate
it. This is easier said than done: for many, many functions one
can prove integrability (by brute force), yet one still cannot write down an explicit
formula for the result of the integral.
At any rate, this tells us immediately the following rules, by essentially
undoing Section 8:
1. The power rule from Claim 8.16:
\[ \int_a^b x^\alpha\,dx = \left.\frac{x^{\alpha+1}}{\alpha+1}\right|_a^b \qquad (\alpha\ne-1) \]
(Note indeed this formula would not make sense for α = −1 due to the α + 1
in the denominator.)
2. The reciprocal, from Claim 8.19:
\[ \int_a^b\frac{1}{x}\,dx = \int_a^b\log' = \log|_a^b . \]
and
\[ \int_a^b\cosh = \int_a^b\sinh' = \sinh|_a^b . \]
and
\[ \int_a^b\frac{1}{\cosh^2} = \int_a^b\tanh' = \tanh|_a^b . \]
Proof. Define F : [a, b] → R by \(F(x) := \int_a^x f\). Then by Theorem 9.30, since
f is continuous, F' = f. Let us calculate, using Theorem 9.32,
\[ \int_A^B(f\circ\varphi)\varphi' = \int_A^B(F'\circ\varphi)\varphi' = \int_A^B(F\circ\varphi)' \]
(Use the fundamental theorem of calc.)
\[ = F(\varphi(B))-F(\varphi(A)) = \int_{\varphi(A)}^{\varphi(B)}F' = \int_{\varphi(A)}^{\varphi(B)}f . \]
\[ g : [0,2]\to\mathbb{R},\qquad x\mapsto 2x\sqrt{x^2+1} . \]
We are interested in
\[ \int_0^2 g \equiv \int_0^2 2x\sqrt{x^2+1}\,dx . \]
To this end, let us try to use the change of variables theorem above. Define
φ : [0, 2] → [1, 5] by
\[ \varphi(x) := x^2+1 \]
Then φ is continuously differentiable, and φ'(x) = 2x, which is a linear function,
which we already saw was integrable (Example 9.11). Also note that
\[ g(x) = \varphi'(x)\sqrt{\varphi(x)} \]
and so applying the change of variables theorem with f : [1, 5] → R defined via
\(f(x) := \sqrt{x}\) for all x ∈ [1, 5] we find that
\[ \int_0^2 2x\sqrt{x^2+1}\,dx = \int_1^5\sqrt{x}\,dx . \]
The point being, it is much easier to integrate \(x\mapsto\sqrt{x}\) than \(x\mapsto 2x\sqrt{x^2+1}\).
The key here was to define φ and to observe that \(g = \varphi'\,(\sqrt{\cdot}\circ\varphi)\).
Now using Remark 9.34 we find that
\[ \int_1^5\sqrt{x}\,dx = \left.\frac{2}{3}x^{\frac{3}{2}}\right|_1^5 = \frac{2}{3}\left(5^{\frac{3}{2}}-1\right) . \]
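Both sides of the substitution can be checked against brute-force Riemann sums (a numerical illustration we add; the midpoint-rule helper is ours):

```python
import math

# Check: int_0^2 2x*sqrt(x^2+1) dx = int_1^5 sqrt(u) du = (2/3)(5^(3/2) - 1).

def riemann(f, a, b, n=200000):
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h  # midpoint rule

lhs = riemann(lambda x: 2 * x * math.sqrt(x * x + 1), 0.0, 2.0)
rhs = riemann(math.sqrt, 1.0, 5.0)
closed_form = (2.0 / 3.0) * (5.0 ** 1.5 - 1.0)
```
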
9.37 Example. Let us consider the integral
\[ \int_0^4\frac{x}{\left(1+x^2\right)^2}\,dx . \]
9.38 Example. Consider
\[ \int_0^4\frac{1}{1+x^2}\,dx . \]
Define φ(x) := tan(x). Then \(\varphi'(x) = \frac{1}{\cos(x)^2}\), and so if \(f(x) := \frac{1}{1+x^2}\) then
change of variables says
\[ \int_0^4\frac{1}{1+x^2}\,dx = \int_0^4 f = \int_{\varphi(\varphi^{-1}(0))}^{\varphi(\varphi^{-1}(4))}f = \int_{\varphi^{-1}(0)}^{\varphi^{-1}(4)}(f\circ\varphi)\varphi' = \int_{\varphi^{-1}(0)}^{\varphi^{-1}(4)}\frac{1}{1+\tan^2}\cdot\frac{1}{\cos^2} . \]
But now, \(\frac{1}{1+\tan^2} = \frac{1}{1+\frac{\sin^2}{\cos^2}} = \frac{\cos^2}{\cos^2+\sin^2} = \cos^2\), and hence we find
\[ \frac{1}{1+\tan^2}\cdot\frac{1}{\cos^2} = 1 . \]
Since \(\int_c^d 1 = d-c\), we have
\[ \int_0^4\frac{1}{1+x^2}\,dx = \varphi^{-1}(4)-\varphi^{-1}(0) \]
9.39 Remark. By the way, Example 9.38 raises an interesting point: since
\[ \lim_{x\to\frac{\pi}{2}}\tan(x) = \lim_{x\to\frac{\pi}{2}}\frac{\sin(x)}{\cos(x)} = \infty \]
the inverse arctan is defined on all of R. Hence it makes sense to think of
\[ \int_a^b\frac{1}{1+x^2}\,dx = \arctan(b)-\arctan(a) \]
with a → −∞ and b → +∞, and the result is called an improper integral. I.e.,
\[ \lim_{a\to-\infty}\lim_{b\to\infty}\int_a^b\frac{1}{1+x^2}\,dx = \lim_{b\to\infty}\arctan(b)-\lim_{a\to-\infty}\arctan(a) = \frac{\pi}{2}-\left(-\frac{\pi}{2}\right) = \pi . \]
We learn that
\[ \int_a^b f'g+\int_a^b fg' = fg|_a^b . \]
9.42 Example. Consider the function R ∋ x ↦ x sin(x) ∈ R. Let us define
f(x) := x (with the same domains and co-domains). Then
\[ \int_a^b x\sin(x)\,dx = \int_a^b f\sin = -\int_a^b f\cos' \]
(integration by parts)
\[ = -f\cos|_a^b+\int_a^b f'\cos \]
(Use f' = 1)
\[ = -f\cos|_a^b+\int_a^b\cos = -f\cos|_a^b+\int_a^b\sin' \]
(Use fundamental thm. of calc.)
\[ = \left.\left(-f\cos+\sin\right)\right|_a^b . \]
For this last integral we do a change of variables with φ(x) := 1 − x², so that
φ'(x) = −2x, and hence with \(f(x) := \frac{1}{\sqrt{x}}\) we find
\[ \int_a^b\frac{x}{\sqrt{1-x^2}}\,dx = \int_a^b-\frac{1}{2}\varphi'(x)f(\varphi(x))\,dx = -\frac{1}{2}\int_{\varphi(a)}^{\varphi(b)}f = -\frac{1}{2}\left.2\sqrt{x}\right|_{\varphi(a)}^{\varphi(b)} . \]
9.46 Example. [Courant] Consider \(\int_a^b e^{\alpha x}\sin(\beta x)\,dx\). In this example repeated
integration by parts will result in an algebraic equation:
\[ \int_a^b e^{\alpha x}\sin(\beta x)\,dx = -\int_a^b e^{\alpha x}\left(x\mapsto\frac{1}{\beta}\cos(\beta x)\right)'dx \]
\[ = -\left.\frac{1}{\beta}e^{\alpha x}\cos(\beta x)\right|_a^b+\int_a^b\alpha e^{\alpha x}\frac{1}{\beta}\cos(\beta x)\,dx \]
\[ = -\left.\frac{1}{\beta}e^{\alpha x}\cos(\beta x)\right|_a^b+\frac{\alpha}{\beta}\int_a^b e^{\alpha x}\left(x\mapsto\frac{1}{\beta}\sin(\beta x)\right)'dx \]
\[ = -\left.e^{\alpha x}\frac{1}{\beta}\cos(\beta x)\right|_a^b+\frac{\alpha}{\beta}\left(\left.e^{\alpha x}\frac{1}{\beta}\sin(\beta x)\right|_a^b-\frac{\alpha}{\beta}\int_a^b e^{\alpha x}\sin(\beta x)\,dx\right) \]
\[ = \left.\left(-\frac{1}{\beta}e^{\alpha x}\cos(\beta x)+\frac{\alpha}{\beta^2}e^{\alpha x}\sin(\beta x)\right)\right|_a^b-\frac{\alpha^2}{\beta^2}\int_a^b e^{\alpha x}\sin(\beta x)\,dx . \]
We solve this for \(\int_a^b e^{\alpha x}\sin(\beta x)\,dx\) to get
\[ \int_a^b e^{\alpha x}\sin(\beta x)\,dx = \frac{1}{1+\frac{\alpha^2}{\beta^2}}\left.\left(-\frac{1}{\beta}e^{\alpha x}\cos(\beta x)+\frac{\alpha}{\beta^2}e^{\alpha x}\sin(\beta x)\right)\right|_a^b = \frac{1}{\beta^2+\alpha^2}\left.e^{\alpha x}\left[\alpha\sin(\beta x)-\beta\cos(\beta x)\right]\right|_a^b . \]
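The closed form just derived can be checked against a brute-force Riemann sum (a numerical illustration we add; the particular α, β, a, b are our own choices):

```python
import math

# Check: int_a^b e^{alpha x} sin(beta x) dx
#        = [ e^{alpha x} (alpha sin(beta x) - beta cos(beta x))
#            / (alpha^2 + beta^2) ]_a^b

def antiderivative(x, alpha, beta):
    return (math.exp(alpha * x)
            * (alpha * math.sin(beta * x) - beta * math.cos(beta * x))
            / (alpha ** 2 + beta ** 2))

def riemann(f, a, b, n=200000):
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h  # midpoint rule

alpha, beta, a, b = 0.5, 3.0, 0.0, 2.0
closed = antiderivative(b, alpha, beta) - antiderivative(a, alpha, beta)
numeric = riemann(lambda x: math.exp(alpha * x) * math.sin(beta * x), a, b)
```
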
9.47 Remark. We summarize that our main tools to evaluate integrals are to combine
Remark 9.34 together with Theorem 9.23, Theorem 9.35 and Theorem 9.41.
This is not a lot, and indeed most functions which can be integrated don't
admit an explicit formula for their integral.
10 Important functions
10.1 The trigonometric functions
Recall the definitions and properties of sin, cos, tan, cot etcetera discussed in
Section 5.1.2.
• The tangent function is defined as the quotient tan ≡ sin/cos whenever cosine is non-zero (so one must restrict its domain of definition).
• The cotangent function is defined as the quotient cot ≡ cos/sin whenever sine is non-zero (so one must restrict its domain of definition).
• These two definitions mean that one must mainly keep in mind sin and cos, whereas properties of tan and cot may be inferred from the quotient definitions.
• There is a special (irrational) number, denoted by π, equal to approximately 3.1415. Geometrically it is the ratio of a circle's circumference to its diameter. It is also a convenient way to measure arc lengths on the circle of radius 1 for that reason: an arc-length of 2π is the entire circle, π is half the circle, π/2 is one-quarter of it, etc. Naturally the trigonometric functions, which relate to arc-lengths of the unit circle, have special values corresponding to special multiples of π:
  – sin(nπ) = 0 for all n ∈ Z.
  – cos(nπ) = (−1)^n for all n ∈ Z.
  – sin(nπ − π/2) = −(−1)^n for all n ∈ Z.
• The sine and cosine are related by a shift of the angle by π/2:
\[
\sin(x) = \cos\left(x - \frac{\pi}{2}\right) \qquad (x \in \mathbb{R})
\]
• Their image is [−1, 1] and they are continuous throughout their domain R. They do not have limits at ±∞ as they keep oscillating.
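The special values and the shift identity above are easy to spot-check in Python (a sketch; the sample points and tolerances are arbitrary choices):

```python
import math

# sin(n*pi) = 0 and cos(n*pi) = (-1)^n for integer n:
for n in range(-3, 4):
    assert abs(math.sin(n * math.pi)) < 1e-12
    assert abs(math.cos(n * math.pi) - (-1) ** n) < 1e-12

# The shift identity sin(x) = cos(x - pi/2) at a few sample points:
for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    assert abs(math.sin(x) - math.cos(x - math.pi / 2)) < 1e-12

print("all identities hold numerically")
```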
2. log_a(1) = 0.
3. log_a(a) = 1.
4. lim_{x→0⁺} log_a(x) = −∞.
10. exp_a is defined on R, and has as its image (0, ∞) (so it is always strictly positive). It is strictly monotone increasing. It is a continuous function.
11. exp_a(0) = 1.
12. lim_{x→−∞} exp_a(x) = 0.
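These properties of log_a and exp_a can be illustrated numerically (a sketch; the base a = 3 and the sample grid are arbitrary choices):

```python
import math

a = 3.0  # arbitrary base a > 1
log_a = lambda x: math.log(x, a)  # logarithm base a
exp_a = lambda x: a ** x

assert abs(log_a(1)) < 1e-12       # log_a(1) = 0
assert abs(log_a(a) - 1) < 1e-12   # log_a(a) = 1
assert exp_a(0) == 1               # exp_a(0) = 1

# exp_a is strictly positive and strictly increasing on a sample grid:
xs = [-5 + 0.5 * k for k in range(21)]
ys = [exp_a(x) for x in xs]
assert all(y > 0 for y in ys)
assert all(ys[i] < ys[i + 1] for i in range(len(ys) - 1))

# log_a(x) decreases without bound as x -> 0+:
print(log_a(1e-3), log_a(1e-6), log_a(1e-9))
```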
Proof. We have by the binomial theorem (see below)
\begin{align*}
\left(1+\frac{1}{n}\right)^n
&= \sum_{j=0}^{n} \frac{n!}{j!\,(n-j)!}\,\frac{1}{n^j} \\
&= \sum_{j=0}^{n} \frac{n(n-1)\cdots(n-j+1)\,(n-j)!}{j!\,(n-j)!}\,\frac{1}{n^j} \\
&= \sum_{j=0}^{n} \frac{1}{j!}\,\frac{n(n-1)\cdots(n-j+1)}{n^j} \\
&= \sum_{j=0}^{n} \frac{1}{j!}\left(1-\frac{1}{n}\right)\cdots\left(1-\frac{j-1}{n}\right) \\
&= 1 + 1 + \frac{1}{2!}\left(1-\frac{1}{n}\right) + \frac{1}{3!}\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right) + \cdots + \frac{1}{n!}\left(1-\frac{1}{n}\right)\cdots\left(1-\frac{n-1}{n}\right).
\end{align*}
So that
\begin{align*}
&\left(1+\frac{1}{n}\right)^n - \left(1+\frac{1}{n+1}\right)^{n+1} \\
&\quad= \sum_{j=0}^{n} \frac{1}{j!}\left(1-\frac{1}{n}\right)\cdots\left(1-\frac{j-1}{n}\right) - \sum_{j=0}^{n+1} \frac{1}{j!}\left(1-\frac{1}{n+1}\right)\cdots\left(1-\frac{j-1}{n+1}\right) \\
&\quad= \sum_{j=0}^{n} \frac{1}{j!}\left[\left(1-\frac{1}{n}\right)\cdots\left(1-\frac{j-1}{n}\right) - \left(1-\frac{1}{n+1}\right)\cdots\left(1-\frac{j-1}{n+1}\right)\right] \\
&\quad\qquad - \frac{1}{(n+1)!}\left(1-\frac{1}{n+1}\right)\cdots\left(1-\frac{n}{n+1}\right).
\end{align*}
Using \(\frac{1}{n+1} \le \frac{1}{n}\), so that each factor obeys \(1-\frac{k}{n+1} \ge 1-\frac{k}{n}\), this is
\begin{align*}
&\le \sum_{j=0}^{n} \frac{1}{j!}\left[\left(1-\frac{1}{n}\right)\cdots\left(1-\frac{j-1}{n}\right) - \left(1-\frac{1}{n}\right)\cdots\left(1-\frac{j-1}{n}\right)\right] \\
&\quad\qquad - \frac{1}{(n+1)!}\left(1-\frac{1}{n+1}\right)\cdots\left(1-\frac{n}{n+1}\right) \\
&= -\frac{1}{(n+1)!}\left(1-\frac{1}{n+1}\right)\cdots\left(1-\frac{n}{n+1}\right) \\
&\le 0\,.
\end{align*}
This shows that the sequence \(\mathbb{N} \ni n \mapsto \left(1+\frac{1}{n}\right)^n\) is increasing.
It is also bounded, since
\begin{align*}
\left(1+\frac{1}{n}\right)^n
&= \sum_{j=0}^{n} \frac{1}{j!}\left(1-\frac{1}{n}\right)\cdots\left(1-\frac{j-1}{n}\right) \\
&\le \sum_{j=0}^{n} \frac{1}{j!} && \left(\text{each factor } 1-\tfrac{k}{n} \le 1\right) \\
&\le 1 + \sum_{j=0}^{n} \frac{1}{2^j}\,. && \left(j! \ge 2^{j-1} \text{ for all } j \ge 1\right)
\end{align*}
The latter sequence, \(\mathbb{N} \ni n \mapsto 1 + \sum_{j=0}^{n} \frac{1}{2^j}\), actually converges to 3. Being monotone increasing and bounded, \(\mathbb{N} \ni n \mapsto \left(1+\frac{1}{n}\right)^n\) converges by Claim 6.17. Since it is strictly increasing and its first term equals 2, we find that
\[
\lim_{n\to\infty} \left(1+\frac{1}{n}\right)^n \text{ exists and lies in } (2,3)\,.
\]
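The two facts just proved, monotonicity and boundedness of n ↦ (1 + 1/n)^n, are visible numerically (a sketch; the range of n is an arbitrary choice):

```python
# The sequence n -> (1 + 1/n)^n is increasing and trapped in [2, 3):
seq = [(1 + 1 / n) ** n for n in range(1, 1001)]
assert all(seq[i] < seq[i + 1] for i in range(len(seq) - 1))
assert seq[0] == 2 and seq[-1] < 3
print(seq[0], seq[9], seq[-1])  # creeping up toward e = 2.718...
```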
10.2 Definition. We define \(e := \lim_{n\to\infty}\left(1+\frac{1}{n}\right)^n\) and call it the natural base for the logarithm. It turns out that e ≈ 2.718 and is irrational. When we don't write the subscript a for exp_a or log_a, we mean a = e. The reason why it is called the natural base for the logarithm is that the derivative of the logarithm is given by
\[
\log'(x) = \frac{1}{x}
\]
and also that
Proof. Let us first assume that x > 1. Then we may apply the mean value theorem Theorem 8.48 to get that there must be some y ∈ (1, x) such that
\[
\log(x) - \log(1) = \log'(y)\,(x-1) = \frac{1}{y}\,(x-1)
\]
and log(1) = 0, so we find
\[
\frac{1}{y} = \frac{\log(x)}{x-1}\,.
\]
Since y ∈ (1, x), y > 1, so \(\frac{1}{y} < 1\), and so we find
\[
\frac{\log(x)}{x-1} < 1\,.
\]
Moving the denominator to the other side of the inequality we get the result. When x < 1, we have again by an application of Theorem 8.48 that there is some y ∈ (x, 1) such that
\[
\frac{-\log(x)}{1-x} = \frac{1}{y} > 1
\]
and we find the same result. If x = 1, then log(x) = 0 and x − 1 = 0, so the inequality is actually an equality.
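The inequality established here (which, from the proof, reads log(x) ≤ x − 1 for x > 0, with equality only at x = 1) is easy to spot-check (a sketch; the sample points are arbitrary):

```python
import math

for x in [0.01, 0.5, 0.99, 1.0, 1.01, 2.0, 10.0, 1000.0]:
    assert math.log(x) <= x - 1  # log(x) <= x - 1 on (0, infinity)
assert math.log(1.0) == 0.0 == 1.0 - 1  # equality at x = 1
print("log(x) <= x - 1 verified on samples")
```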
10.4 Special functions
10.4.1 The sinc function
The sinc function, which is discussed in Example 6.32, is defined as
\[
\mathbb{R} \ni x \mapsto \operatorname{sinc}(x) :=
\begin{cases}
1 & x = 0 \\
\frac{\sin(x)}{x} & x \neq 0
\end{cases}
\in \mathbb{R}\,.
\]
The example cited shows that this function is continuous at the origin. It converges to zero at ±∞.
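A small Python sketch of the sinc function's behavior near the origin and at infinity (the sample points and tolerances are arbitrary choices):

```python
import math

def sinc(x):
    """sinc as defined above: 1 at the origin, sin(x)/x elsewhere."""
    return 1.0 if x == 0 else math.sin(x) / x

# Continuity at 0: values approach sinc(0) = 1 (indeed |sin(x)/x - 1| <= x^2/6) ...
for x in [0.1, 0.01, 0.001]:
    assert abs(sinc(x) - 1) < x ** 2
# ... and decay at infinity: |sinc(x)| <= 1/|x|.
assert abs(sinc(1000.0)) <= 1e-3
print(sinc(0), sinc(0.001), sinc(1000.0))
```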
\[
a^2 - b^2 = (a-b)(a+b)
\]
11.3 Claim. Recall if ax² + bx + c = 0 for some a, b, c ∈ R (a ≠ 0) then the two solutions x₁, x₂ ∈ R can be written as
\[
x_{1,2} = \frac{1}{2a}\left(-b \pm \sqrt{b^2-4ac}\right)
\]
given that b² − 4ac > 0. If b² − 4ac = 0 then there is only one solution. If b² − 4ac < 0 then there are no solutions. If there are two solutions, then we may factorize
\[
ax^2 + bx + c = a\left(x - \frac{1}{2a}\left(-b - \sqrt{b^2-4ac}\right)\right)\left(x - \frac{1}{2a}\left(-b + \sqrt{b^2-4ac}\right)\right).
\]
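The case split on the discriminant translates directly into code (a sketch; `quadratic_roots` is a hypothetical helper and the sample polynomial x² − 3x + 2 = (x − 1)(x − 2) is an arbitrary choice):

```python
import math

def quadratic_roots(a, b, c):
    """Real solutions of a*x^2 + b*x + c = 0 (a != 0), following the case split above."""
    disc = b ** 2 - 4 * a * c
    if disc < 0:
        return []                  # no real solutions
    if disc == 0:
        return [-b / (2 * a)]      # exactly one solution
    r = math.sqrt(disc)
    return [(-b - r) / (2 * a), (-b + r) / (2 * a)]

a, b, c = 1.0, -3.0, 2.0
x1, x2 = quadratic_roots(a, b, c)
print(x1, x2)  # 1.0 2.0
# The factorization a*(x - x1)*(x - x2) reproduces the polynomial at sample points:
for x in [-2.0, 0.0, 0.5, 3.0]:
    assert abs(a * (x - x1) * (x - x2) - (a * x ** 2 + b * x + c)) < 1e-9
```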
11.2 Inequalities
• If a < b and a, b > 0 then 1/a > 1/b.
• |x| < ε if and only if −ε < x < ε.
12 Dictionary of graphical symbols and acronyms
12.1 Graphical symbols
1. The equivalent symbol, ≡, means the following: if a and b are any two
objects, then a ≡ b means we have agreed, at some earlier point, that
a and b are two different labels for one and the same thing. Example:
{ a, a } ≡ { a }.
2. The definition symbol, :=, means the following: if a and b are any two
objects, then a := b means that right now, through this very equation, we
are agreeing that a is a new label for the pre-existing object b. The main
difference to ≡ is about when this agreement happens: The ≡ symbol is
a reminder about our previous conventions, whereas the := symbol is an
event of establishing a new convention. Both should be contrasted with the equal sign =, which merely says that two things (turn out to) be equal; whether they are equal by convention or not is not specified.
3. ∞ means the size of a set whose number of elements is unbounded.
4. x ↦ |x| means the absolute value function, i.e.
\[
|x| \equiv \begin{cases} x & x \ge 0 \\ -x & x < 0\,. \end{cases}
\]
min ({ 1, 2, 100 }) = 1
n! ≡ n (n − 1) (n − 2) . . . 2
so that
\begin{align*}
2! &= 2 \\
3! &= 3 \cdot 2 = 6 \\
4! &= 4 \cdot 3 \cdot 2 = 24 \\
5! &= 5 \cdot 4 \cdot 3 \cdot 2 = 120 \\
&\ \ \vdots
\end{align*}
12.2 Acronyms
1. s.t. means “such that”.
2. w.r.t. means “with respect to”.
3. l.h.s. means “left hand side”, usually of an equation. r.h.s. means “right
hand side”.
4. iff means “if and only if”, which is a relationship between two statements,
meaning the first implies the second and the second implies the first.
5. “The origin” means either 0 ∈ R or (0, 0) ∈ R2 .
References
[1] Tom M. Apostol. Calculus, Vol. 1: One-Variable Calculus, with an Intro-
duction to Linear Algebra. Wiley, 1991.
[2] Richard Courant and Fritz John. Introduction to Calculus and Analysis,
Vol. 1 (Classics in Mathematics). Springer, 1998.
[3] Paul R. Halmos. Naive Set Theory. Martino Fine Books, 2011.