Differentiation - in A Nutshell: The Fundamentals
We discuss differentiation in a nutshell: we give a rundown of the basic definition, the fundamental laws, and a few other fundamental derivatives that are good to know, and we derive the trigonometric derivatives.
At the end we present a collection of problems, ranging from accessible to challenging. A large portion of
the first section is based on “evan explains differentiation in 20 minutes with no rigor whatsoever,” which I
recommend watching.
1 The Fundamentals
I suspect that most people reading this handout already know the limit definition of the derivative, which I am absolutely not interested in. Instead, we provide two more intuitive and fundamental definitions for those who feel that they are just pushing symbols. (Both of these definitions are secretly the same idea.) The first is for those who cannot understand what a derivative means at all, and the second is the one we will push throughout to understand what calculus is really about.
Actually, I am going to state this explicitly, because this is the result we’re building up to anyway: calculus is about approximating functions (and perturbations of functions at a point) with polynomials.
Tangent Line. The derivative of a function at a point is just the slope of the tangent line at that point.
Derivatives are Approximations. Derivatives are a way to approximate a function near a particular point.
Neither of these means anything without an example, so we provide the prototypical example of $f(x) = x^2$.
Example. Find the derivative of $f(x) = x^2$ at $x = 2$ (and describe what this means).
Solution: Consider the line tangent to the curve at the point $(2, 4)$. Because it is tangent there, it is a linear approximation of $f(x)$ at points close to $(2, 4)$. In particular, this means that
$$f(2 + \epsilon) \approx 4 + \underbrace{\text{slope}}_{\text{derivative of } f \text{ at } 2} \times \epsilon.$$
We denote the derivative of $f$ at $2$ as $f'(2)$. To compute it, we can just expand $(2 + \epsilon)^2 = 4 + 4\epsilon + \epsilon^2$; approximating this as a linear function of $\epsilon$ gives $4 + 4\epsilon$. Thus the derivative is $4$.
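As a sanity check of this computation, here is a minimal Python sketch (standard library only) that expands $f(a + \epsilon)$ for a polynomial $f$ using the binomial theorem and reads off the $\epsilon$ coefficient. The coefficient-list representation and the helper name `shift_coefficients` are our own illustrative choices, not anything from the handout.

```python
from math import comb

def shift_coefficients(coeffs, a):
    """Return the coefficients of f(a + eps) as a polynomial in eps.

    coeffs[k] is the coefficient of x**k in f, so [0, 0, 1] means f(x) = x**2.
    Each term c * (a + eps)**k is expanded with the binomial theorem.
    """
    result = [0] * len(coeffs)
    for k, c in enumerate(coeffs):
        for j in range(k + 1):
            # contribution c * C(k, j) * a**(k - j) * eps**j
            result[j] += c * comb(k, j) * a ** (k - j)
    return result

# f(x) = x^2 at a = 2: (2 + eps)^2 = 4 + 4*eps + 1*eps^2
print(shift_coefficients([0, 0, 1], 2))  # [4, 4, 1]
```

Index 1 of the returned list is the coefficient of $\epsilon$, which is $f'(a)$; index 0 is $f(a)$.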
This mindset will completely explain the Power Rule and higher order derivatives (such as second order
derivatives). But first, we should define the order of a function. Violent abuses of notation will follow.
Order. We say that a function of $\epsilon$ is $O(\epsilon^k)$ if the smallest power of $\epsilon$ appearing in any of its terms is $k$.
Now we can be more specific when we say $f(2 + \epsilon) \approx 4 + 4\epsilon$: what we really mean is that $f(2 + \epsilon) = 4 + 4\epsilon + O(\epsilon^2)$, where we truthfully couldn't care less what the $O(\epsilon^2)$ part entails.
Derivative of a Function. We denote the derivative of $f$ at an arbitrary point $x$ as $f'(x)$, expressed in terms of $x$. It satisfies the equation
$$f(x + \epsilon) = f(x) + f'(x)\,\epsilon + O(\epsilon^2).$$
This fundamentally means that $f'(x)$ is the coefficient of the $\epsilon$ term of $f(x + \epsilon)$.
We take $f(x) = x^2$ again as an example.
Solution: Note that $f(x + \epsilon) = (x + \epsilon)^2 = x^2 + 2x\epsilon + \epsilon^2$. The $\epsilon^2$ term is $O(\epsilon^2)$, and the coefficient of the $\epsilon$ term is $2x$, so $f'(x) = 2x$.
Now we go further: what’s the derivative of $f(x) = x^n$ in general?
Power Rule. If $f(x) = x^n$, then $f'(x) = nx^{n-1}$.
Proof. Note that $f(x + \epsilon) = x^n + nx^{n-1}\epsilon + O(\epsilon^2)$, so the derivative is the coefficient of $\epsilon$, namely $nx^{n-1}$.
Keep in mind that this holds for non-integer $n$ as well because of the Extended Binomial Theorem (as $\binom{n}{1}$ is always $n$).
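To see the Power Rule at work for a non-integer exponent, the following Python sketch (standard library only; `difference_quotient` and `power` are hypothetical helper names) compares the slope $\frac{f(x + \epsilon) - f(x)}{\epsilon}$ for $f(x) = x^{1/2}$ at $x = 4$ with the prediction $nx^{n-1}$. It is a floating-point illustration under these assumptions, not a proof.

```python
def difference_quotient(f, x, eps):
    """Slope of the secant line through (x, f(x)) and (x + eps, f(x + eps))."""
    return (f(x + eps) - f(x)) / eps

n = 0.5          # non-integer exponent: f(x) = x**0.5 = sqrt(x)
x = 4.0

def power(t):
    return t ** n

predicted = n * x ** (n - 1)   # Power Rule: 0.5 * 4**(-0.5) = 0.25

for eps in (1e-1, 1e-3, 1e-5):
    # The gap shrinks roughly in proportion to eps, as the O(eps^2) term predicts.
    print(eps, difference_quotient(power, x, eps) - predicted)
```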
1.3 Higher-Order Derivatives
If the first derivative is the linear approximation (the coefficient of the $\epsilon$ term) of $f(x + \epsilon)$, then it intuitively follows that the second derivative gives the quadratic approximation of $f(x + \epsilon)$; in other words, it is determined by the coefficient of the $\epsilon^2$ term (that coefficient is $\frac{f''(x)}{2!}$, as we will see). Note that each higher-order approximation just reduces the error in the previous one. So if you keep adding higher-order approximations, you should be able to recover the function itself, and this is the idea behind Taylor Series.
This is the entire definition of higher-order derivatives. Even with non-polynomial functions, there is usually¹ a Taylor Series, because we can just keep approximating.
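nth Derivative. We denote the $n$th derivative, as defined by the Taylor Series
$$f(x + \epsilon) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x)}{k!}\,\epsilon^k = f(x) + f'(x)\,\epsilon + \frac{f''(x)}{2!}\,\epsilon^2 + \cdots,$$
as $f^{(n)}(x)$; that is, $f^{(n)}(x)$ is $n!$ times the coefficient of the $\epsilon^n$ term. We denote the result of taking the derivative of $y = f(x)$ $n$ times as $\frac{d^n y}{dx^n}$, or equivalently $\frac{d^n}{dx^n} f(x)$.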
We now prove that the $n$th derivative (as defined by the Taylor Series) is obtained by taking the derivative $n$ times, which shows that the two notions are consistent.
nth Derivative results from taking the derivative n times. Given a function $y = f(x)$,
$$f^{(n)}(x) = \frac{d^n}{dx^n} f(x),$$
where the derivative of $f$ is taken $n$ times.
This proof relies on the fact that derivatives are additive and multiplicative under scalar multiplication, which we will not formally prove until later. An astute reader will note that the factorial denominators are not strictly necessary to sum up approximations, and that we in fact contrive the definition of Taylor Series to create this consistency in the definitions.
Proof. Since derivatives are additive and multiplicative under scalar multiplication, and the functions we consider can be written as power series, we need only prove this for power functions $f(x) = x^c$.
We note that by the Power Rule, $f'(x) = cx^{c-1}$. Applying this recursively yields
$$\frac{d^n}{dx^n} f(x) = c(c-1)(c-2)\cdots(c-(n-1))\,x^{c-n}.$$
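Now, expanding $f(x + \epsilon) = (x + \epsilon)^c$, we get the coefficient of the $\epsilon^n$ term to be
$$\binom{c}{n} x^{c-n}.$$
By our definition, this coefficient equals $\frac{f^{(n)}(x)}{n!}$. Since
$$\frac{c(c-1)(c-2)\cdots(c-(n-1))\,x^{c-n}}{n!} = \binom{c}{n} x^{c-n},$$
we get $f^{(n)}(x) = c(c-1)(c-2)\cdots(c-(n-1))\,x^{c-n} = \frac{d^n}{dx^n} f(x)$, which is the desired result.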
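The identity behind this proof can be spot-checked for integer exponents. The Python sketch below (standard library only; the helper names are ours) applies the Power Rule $n$ times to $x^c$ and compares the resulting coefficient with $n!\binom{c}{n}$, using `math.comb` for the binomial coefficient. It only covers non-negative integers $c$, since `math.comb` does not accept non-integer arguments.

```python
from math import comb, factorial

def differentiate_monomial(coef, exp):
    """Power Rule on coef * x**exp: returns (coef * exp, exp - 1)."""
    return coef * exp, exp - 1

def nth_derivative_coefficient(c, n):
    """Coefficient of d^n/dx^n x**c, found by applying the Power Rule n times."""
    coef, exp = 1, c
    for _ in range(n):
        coef, exp = differentiate_monomial(coef, exp)
    return coef  # equals c(c-1)...(c-(n-1)); the remaining power of x is c - n

# Compare with n! times the eps^n coefficient of (x + eps)^c, i.e. n! * C(c, n).
for c in range(8):
    for n in range(c + 2):
        assert nth_derivative_coefficient(c, n) == factorial(n) * comb(c, n)
print("Power Rule applied n times matches n! * C(c, n)")
```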
¹ For our purposes, we are only going to look at functions with Taylor Series, but keep in mind that there are weird functions that do not have one, for example functions that are not differentiable at any point (which do exist).
Thus the notation is interchangeable, and more crucially, so are the definitions.
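Maclaurin Series. If you want to approximate $f(\epsilon)$ with a polynomial, plug $x = 0$ into the Taylor Series to get
$$f(\epsilon) = f(0) + f'(0)\,\epsilon + \frac{f''(0)}{2!}\,\epsilon^2 + \cdots = \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!}\,\epsilon^k.$$
The intuition is the same as before: $\epsilon$ slightly perturbs the function at a point (here $0$), and we approximate the result using the derivatives at that point.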
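As an illustration, here is a minimal Python sketch (standard library only) of Maclaurin partial sums for $e^x$. It assumes the fact, stated later in this handout, that the derivative of $e^x$ is itself, so every $f^{(k)}(0)$ equals $1$; the function name `maclaurin_exp` is our own.

```python
import math

def maclaurin_exp(eps, terms):
    """Partial sum of the Maclaurin series of e^x evaluated at x = eps.

    Every derivative of e^x is e^x, so f^(k)(0) = 1 and the k-th term is eps**k / k!.
    """
    return sum(eps ** k / math.factorial(k) for k in range(terms))

for terms in (1, 2, 3, 5, 10):
    approx = maclaurin_exp(0.5, terms)
    print(terms, approx, abs(approx - math.exp(0.5)))
```

The printed errors shrink quickly as more terms are added, which is the "keep approximating" idea in action.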
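L’Hôpital’s Rule. Suppose $f(0) = g(0) = 0$, let $k$ be the smallest number such that $g^{(k)}(0) \neq 0$, and suppose that $f^{(j)}(0) = 0$ for all $j < k$ as well. Then, by the Maclaurin Series,
$$f(x) = \frac{f^{(k)}(0)\,x^k}{k!} + O(x^{k+1}), \qquad g(x) = \frac{g^{(k)}(0)\,x^k}{k!} + O(x^{k+1}),$$
so, dividing the numerator and the denominator by $\frac{x^k}{k!}$,
$$\lim_{x \to 0} \frac{f(x)}{g(x)} = \lim_{x \to 0} \frac{f^{(k)}(0) + O(x)}{g^{(k)}(0) + O(x)} = \frac{f^{(k)}(0)}{g^{(k)}(0)}.$$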
A proof with the limit definition of derivatives is also quite straightforward. We do not prove the more general rule, which says that in the indeterminate forms $\frac{0}{0}$ and $\frac{\infty}{\infty}$,
$$\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}$$
whenever the limit on the right exists, but you still do need to know this.
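A quick numerical illustration, as a Python sketch with an example of our own choosing: take $f(x) = e^x - 1 - x$ and $g(x) = x^2$. Assuming the derivative of $e^x$ is itself, $f(0) = f'(0) = 0$, $g(0) = g'(0) = 0$, $f''(0) = 1$, and $g''(0) = 2$, so $k = 2$ and the rule predicts the limit $\frac{1}{2}$. The code evaluates the ratio at shrinking $x$, using `math.expm1` to compute $e^x - 1$ accurately for small $x$.

```python
import math

def f(x):
    # e^x - 1 - x, computed with expm1 to avoid cancellation for small x
    return math.expm1(x) - x

def g(x):
    return x * x

# Here k = 2, f''(0) = 1 and g''(0) = 2, so the rule predicts f''(0) / g''(0) = 1/2.
for x in (1e-1, 1e-2, 1e-3):
    print(x, f(x) / g(x))
```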
Exercise. Find $\displaystyle\lim_{x \to 0} \frac{\sin(2x)}{x + x^2}$.
Tangent slope and approximation give the same derivative. Say that $f(x + \epsilon) = f(x) + f'(x)\,\epsilon + O(\epsilon^2)$. Then the line $y - f(a) = f'(a)(x - a)$ is tangent to $f$ at $(a, f(a))$.
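Proof. Note that
$$f'(x) = \lim_{\epsilon \to 0} \frac{f(x + \epsilon) - f(x)}{\epsilon}$$
by the limit definition, which gives the slope of the tangent line. Now substituting $f(x + \epsilon) = f(x) + f'(x)\,\epsilon + O(\epsilon^2)$ into the right-hand side gives
$$\lim_{\epsilon \to 0} \frac{f'(x)\,\epsilon + O(\epsilon^2)}{\epsilon} = \lim_{\epsilon \to 0} \bigl(f'(x) + O(\epsilon)\bigr) = f'(x).$$
Since $\epsilon$ approaches $0$ and $O(\epsilon^2)$ is smaller than linear, the coefficient of $\epsilon$ is exactly the slope of the tangent line, so the two definitions are consistent.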
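The following Python sketch (standard library only; the example $f(x) = x^3$ at $a = 2$ is our own choice) measures how far $f$ is from its tangent line $y = f(a) + f'(a)(x - a)$. Dividing the error by $\epsilon$ should give something that tends to $0$, while dividing by $\epsilon^2$ should stay bounded, matching the $O(\epsilon^2)$ claim.

```python
def f(x):
    return x ** 3

def f_prime(x):
    return 3 * x ** 2   # Power Rule

a = 2.0

def tangent(x):
    # the line y - f(a) = f'(a)(x - a), solved for y
    return f(a) + f_prime(a) * (x - a)

for eps in (1e-1, 1e-2, 1e-3):
    error = f(a + eps) - tangent(a + eps)
    # error / eps -> 0 (better than linear); error / eps^2 stays near f''(a)/2 = 6
    print(eps, error / eps, error / eps ** 2)
```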
1.5 Summary
We give a summary here because the fundamental ideas are quite advanced/non-standard; we do not
continue this for other sections because formula sheets already exist.
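• Derivatives are approximations.
• The first derivative is the linear approximation, the second derivative is the quadratic approximation, and so on.
• The Taylor Series is defined as
$$f(x + \epsilon) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x)}{k!}\,\epsilon^k.$$
• L’Hôpital’s Rule: if $f(a) = g(a) = 0$ and $g'(a) \neq 0$, then
$$\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{f'(a)}{g'(a)}.$$
• This can be proved with Maclaurin Series or the limit definition of a derivative.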
2 Laws of Differentiation
I will say the following explicitly: everything is based on the chain and product rules (except for additivity and scalar multiplicativity, which are just obvious).
Derivatives are Additive and Scalar Multiplicative. For any functions $f, g$ (where $(f + g)(x)$ denotes the function $f(x) + g(x)$), and given scalars $a, b$,
$$(af + bg)'(x) = af'(x) + bg'(x).$$
Proof. Consider the functions as Taylor Series, which are (infinite) polynomials, and note that the coefficient of the $n$th degree term of a polynomial is additive and scalar multiplicative for all $n$.ᵃ
ᵃ This is also why $(fg)' \neq f'g'$: it should not be too hard to think of two polynomials $f$ and $g$ such that the $x$ coefficient of $fg$ is not the product of the $x$ coefficients of $f$ and $g$.
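Product Rule. For differentiable functions $f$ and $g$, $(fg)'(x) = f'(x)g(x) + f(x)g'(x)$.
Proof 1 (Limit). By the limit definition,
$$(fg)'(a) = \lim_{x \to a} \frac{f(x)g(x) - f(a)g(a)}{x - a}.$$
To see why the product rule must be true, we add and subtract $\frac{f(a)g(x)}{x - a}$ inside the above limit:
$$\begin{aligned}
(fg)'(a) &= \lim_{x \to a} \frac{f(x)g(x) - f(a)g(a)}{x - a} \\
&= \lim_{x \to a} \frac{f(x)g(x) - f(a)g(x) + f(a)g(x) - f(a)g(a)}{x - a} \\
&= \lim_{x \to a} \left( \frac{f(x) - f(a)}{x - a} \cdot g(x) + f(a) \cdot \frac{g(x) - g(a)}{x - a} \right) \\
&= \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \cdot \lim_{x \to a} g(x) + \lim_{x \to a} f(a) \cdot \lim_{x \to a} \frac{g(x) - g(a)}{x - a} \\
&= f'(a)g(a) + f(a)g'(a),
\end{aligned}$$
where $\lim_{x \to a} g(x) = g(a)$ because a differentiable function is continuous.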
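Proof 2 (Taylor). Note $f(x + \epsilon) = f(x) + f'(x)\,\epsilon + O(\epsilon^2)$ and $g(x + \epsilon) = g(x) + g'(x)\,\epsilon + O(\epsilon^2)$, so
$$f(x + \epsilon)\,g(x + \epsilon) = f(x)g(x) + \bigl(f'(x)g(x) + f(x)g'(x)\bigr)\,\epsilon + O(\epsilon^2).$$
By the definition of the derivative, $(f(x)g(x))'$ is the coefficient of the $\epsilon$ term, or $f'(x)g(x) + f(x)g'(x)$, as desired.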
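Proof 2 can be replayed mechanically for polynomials. This Python sketch (standard library only; `shift_coefficients`, `multiply`, and the example polynomials are our own illustrative choices) multiplies the expansions of $f(a + \epsilon)$ and $g(a + \epsilon)$ and compares the $\epsilon$ coefficient of the product with $f'(a)g(a) + f(a)g'(a)$.

```python
from math import comb

def shift_coefficients(coeffs, a):
    """Coefficients (in eps) of f(a + eps), where coeffs[k] multiplies x**k."""
    result = [0] * len(coeffs)
    for k, c in enumerate(coeffs):
        for j in range(k + 1):
            result[j] += c * comb(k, j) * a ** (k - j)
    return result

def multiply(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

f = [1, 2, 0, 1]    # f(x) = 1 + 2x + x^3
g = [0, 3, 1]       # g(x) = 3x + x^2
a = 2

fa, fpa = shift_coefficients(f, a)[:2]   # f(a), f'(a)
ga, gpa = shift_coefficients(g, a)[:2]   # g(a), g'(a)

# eps coefficient of f(a + eps) * g(a + eps), i.e. (fg)'(a)
product_derivative = multiply(shift_coefficients(f, a), shift_coefficients(g, a))[1]
print(product_derivative, fpa * ga + fa * gpa, fpa * gpa)
```

For these polynomials the first two numbers agree, while $f'(a)g'(a)$ is different, which also illustrates why $(fg)' \neq f'g'$.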
Proof. By the definition of the derivative as a limit, we want the derivative of the composed function $f(g(x))$ at $x = a$ to be
$$\lim_{x \to a} \frac{f(g(x)) - f(g(a))}{x - a}.$$
But why do we multiply the derivative of $f$ with respect to $g(x)$ by the derivative of $g(x)$ when differentiating a composition of functions? That is the question we address in this proof.
To start, the definition of the derivative as a limit gives the instantaneous rate of change of $f$ at $g(a)$:
$$f'(g(a)) = \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)}.$$
Note that the denominator is $g(x) - g(a)$ because we must measure the rate of change of $f(g(x))$ with respect to the change in its input (i.e., $g(x)$, since it is the input of $f$). Multiplying the above limit by the derivative of $g$ written as a limit, we get
$$f'(g(a)) \cdot g'(a) = \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \cdot \lim_{x \to a} \frac{g(x) - g(a)}{x - a} = \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{x - a},$$
and the last expression is exactly what we want. (This argument assumes $g(x) \neq g(a)$ for $x$ near $a$; the general case takes a slightly more careful argument.) To complete this proof, we summarize the above findings as
$$\left.\frac{d}{dx} f(g(x))\right|_{x = a} = f'(g(a)) \cdot g'(a).$$
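A numerical spot check of the Chain Rule, as a Python sketch under our own assumptions: the example $f(u) = u^3$, $g(x) = x^2 + 1$ at $a = 1$ is arbitrary, and the symmetric difference quotient `central_difference` (a helper we define, with step `eps`) stands in for the exact derivative of the composition.

```python
def central_difference(h, x, eps=1e-5):
    """Symmetric difference quotient approximating h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)

def f(u):          # outer function
    return u ** 3

def f_prime(u):
    return 3 * u ** 2

def g(x):          # inner function
    return x ** 2 + 1

def g_prime(x):
    return 2 * x

a = 1.0
numeric = central_difference(lambda x: f(g(x)), a)
chain_rule = f_prime(g(a)) * g_prime(a)   # 3 * 2^2 * 2 = 24
print(numeric, chain_rule)
```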
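Reciprocal Rule. If $f(x) \neq 0$, then
$$\left(\frac{1}{f(x)}\right)' = -\frac{f'(x)}{f(x)^2}.$$
Proof. This is just a consequence of the chain rule, since the inner function is $f$, the outer function is $\frac{1}{x}$, and the derivative of $\frac{1}{x}$ is $-\frac{1}{x^2}$; hence $\left(\frac{1}{f(x)}\right)' = -\frac{1}{f(x)^2} \cdot f'(x)$.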
² Colloquially, a lemma is an intermediate result proved in order to prove a theorem (the main result).
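Quotient Rule. If $g(x) \neq 0$, then
$$\left(\frac{f(x)}{g(x)}\right)' = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}.$$
Proof. Note that
$$\left(f(x) \cdot \frac{1}{g(x)}\right)' = f'(x) \cdot \frac{1}{g(x)} + f(x) \cdot \frac{-g'(x)}{g(x)^2}$$
by the product and reciprocal rules, which simplifies to
$$\frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2},$$
as desired.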
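Similarly, here is a Python sketch (standard library only) that compares the Quotient Rule with a symmetric difference quotient for the hypothetical example $f(x) = x^2 + 1$, $g(x) = x^3 + 2$ at $x = 2$, where the rule gives $\frac{4 \cdot 10 - 5 \cdot 12}{10^2} = -0.2$. The helper `central_difference` is our own.

```python
def central_difference(func, x, h=1e-5):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def f(x):
    return x ** 2 + 1

def g(x):
    return x ** 3 + 2

x = 2.0
f_val, f_der = f(x), 2 * x          # f(2) = 5, f'(2) = 4
g_val, g_der = g(x), 3 * x ** 2     # g(2) = 10, g'(2) = 12
quotient_rule = (f_der * g_val - f_val * g_der) / g_val ** 2
numeric = central_difference(lambda t: f(t) / g(t), x)
print(quotient_rule, numeric)
```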
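Example. Find the slope of the line tangent to the curve $xy^2 + y^3 + xy + x + y = 8$ at $(2, 1)$.
Solution: We differentiate both sides with respect to $x$ and then apply the Chain Rule to every term involving $y$.³ Differentiating (using the product rule on $xy^2$ and $xy$) gives⁴
$$\left(x \frac{d}{dx}(y^2) + y^2\right) + \left(\frac{d}{dx}(y^3)\right) + (xy' + y) + (1) + (y') = 0.$$
By the Chain Rule, this is equal to
$$(2xyy' + y^2) + (3y^2y') + (xy' + y) + (1) + (y') = 0.$$
Plugging in $(x, y) = (2, 1)$ gives $(4y' + 1) + (3y') + (2y' + 1) + (1) + (y') = 0$, or $10y' + 3 = 0$. Thus $y' = -\frac{3}{10}$, which is our answer.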
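To double-check the answer numerically, the Python sketch below (standard library only) solves the curve's equation for $y$ near $1$ at $x = 2 \pm h$ using Newton's method, a technique we swap in purely for this check, and then takes a symmetric difference quotient. The names `F`, `F_y`, and `solve_y` are our own.

```python
def F(x, y):
    """Left side minus right side of x*y^2 + y^3 + x*y + x + y = 8."""
    return x * y ** 2 + y ** 3 + x * y + x + y - 8

def F_y(x, y):
    # partial derivative of F with respect to y, used by Newton's method
    return 2 * x * y + 3 * y ** 2 + x + 1

def solve_y(x, y0=1.0, steps=50):
    """Find y with F(x, y) = 0 near y0 using Newton's method."""
    y = y0
    for _ in range(steps):
        y -= F(x, y) / F_y(x, y)
    return y

h = 1e-5
slope = (solve_y(2 + h) - solve_y(2 - h)) / (2 * h)
print(slope)   # close to -3/10
```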
Many important/convenient results in calculus are proved using implicit differentiation; we present a
couple of these results here.
³ We do this because we want to solve for $y' = \frac{d}{dx}y$, not $\frac{d}{dx}(y^2)$. We can apply the Chain Rule to $y$ because we can represent $y$ as $f(x)$; if this doesn’t make sense to you, replace every $y$ with an $f(x)$ until it does.
⁴ Parentheses are placed around the derivative of each term for clarity.
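Inverse Function Rule. Given an invertible function $f$,
$$\left(f^{-1}(x)\right)' = \frac{1}{f'(f^{-1}(x))},$$
as long as $f'(f^{-1}(x)) \neq 0$.
Proof. Let $y = f^{-1}(x)$, so that
$$f(y) = x.$$
Differentiating both sides with respect to $x$ (using the Chain Rule on the left) gives
$$f'(y)\,y' = 1 \quad\Longrightarrow\quad y' = \frac{1}{f'(y)}.$$
Note that $y = f^{-1}(x)$, so substituting gives us
$$\left(f^{-1}(x)\right)' = \frac{1}{f'(f^{-1}(x))},$$
as desired.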
Make sure you understand that intuitively, the derivative of the inverse is just the reciprocal of the
derivative at the corresponding point.
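The Inverse Function Rule can be checked numerically, as a Python sketch under our own choices: below, $f(x) = x^3 + x$ (strictly increasing, so invertible), $f^{-1}$ is computed by bisection, and a symmetric difference quotient of $f^{-1}$ at $x = 10$ is compared with $\frac{1}{f'(f^{-1}(10))} = \frac{1}{f'(2)} = \frac{1}{13}$. The helper names are hypothetical.

```python
def f(x):
    return x ** 3 + x          # strictly increasing, hence invertible

def f_prime(x):
    return 3 * x ** 2 + 1

def f_inverse(y, lo=-100.0, hi=100.0):
    """Invert f by bisection (valid because f is increasing on [lo, hi])."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x = 10.0
eps = 1e-5
numeric = (f_inverse(x + eps) - f_inverse(x - eps)) / (2 * eps)
rule = 1 / f_prime(f_inverse(x))     # f^{-1}(10) = 2, so this is about 1/13
print(numeric, rule)
```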
If you want basic practice with implicit differentiation, all you have to do is make up some sort of
reasonably small polynomial. Therefore, we’ll be presenting harder exercises that aren’t so mindless (in
particular, no polynomials).
Exercise (AoPS Calculus, 3.6.3). Find $\frac{dy}{dx}$ if $x^2 + y = \ln(y^2 - 1)$.
Exercise (AoPS Calculus, 3.6.4). Find the slope of the tangent line to the curve $x\sin(x + y) = y\cos(x - y)$ at the point $\left(0, \frac{\pi}{2}\right)$.
$$\lim_{x \to 0} \frac{\sin x}{x} = 1.$$
Proof. Consider a unit circle with an arc of angle $x$ (in radians), and drop the altitude from one endpoint of the arc to the radius through the other endpoint. Note that $\sin x$ is the length of this altitude, and also note that $x$ is the arclength. As $x$ approaches $0$, the arc gets smaller and less curved, so the altitude becomes a better and better approximation of the arclength.
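$$\lim_{x \to 0} \frac{1 - \cos x}{x} = 0.$$
Proof. Refer to the same setup as above. Note that the altitude splits the radius into two pieces, of lengths $\cos x$ and $1 - \cos x$. As $x$ approaches $0$, the altitude $\sin x$ approaches the arc $x$, and the remaining piece $1 - \cos x$ shrinks much faster than $x$, so it becomes negligible compared to $x$. (To make this precise, write $\frac{1 - \cos x}{x} = \frac{\sin x}{x} \cdot \frac{\sin x}{1 + \cos x}$, which tends to $1 \cdot 0 = 0$.)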
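Proof. We use the limit definition of the derivative, the angle addition formulas, and the two limits above:
$$\begin{aligned}
\frac{d}{dx}\sin(x) &= \lim_{\epsilon \to 0} \frac{\sin(x + \epsilon) - \sin(x)}{\epsilon} \\
&= \lim_{\epsilon \to 0} \frac{\sin(x)\cos(\epsilon) + \cos(x)\sin(\epsilon) - \sin(x)}{\epsilon} \\
&= \lim_{\epsilon \to 0} \left( \cos(x) \cdot \frac{\sin(\epsilon)}{\epsilon} - \sin(x) \cdot \frac{1 - \cos(\epsilon)}{\epsilon} \right) \\
&= \cos(x) \cdot 1 - \sin(x) \cdot 0 = \cos(x).
\end{aligned}$$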
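The two limits and the resulting derivative can be observed numerically. This Python sketch (standard library only; `sin_difference_quotient` is our own helper name and $x = 0.7$ is arbitrary) prints $\frac{\sin h}{h}$ and $\frac{1 - \cos h}{h}$ for shrinking $h$, then compares the difference quotient of $\sin$ with $\cos x$.

```python
import math

for h in (1e-1, 1e-2, 1e-3):
    sin_ratio = math.sin(h) / h            # tends to 1
    cos_ratio = (1 - math.cos(h)) / h      # tends to 0
    print(h, sin_ratio, cos_ratio)

def sin_difference_quotient(x, h):
    """(sin(x + h) - sin x) / h, the expression inside the limit definition."""
    return (math.sin(x + h) - math.sin(x)) / h

x = 0.7
print(sin_difference_quotient(x, 1e-6), math.cos(x))
```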
Proof. We just piggyback on the $\sin$ proof. Note that $\cos x = \sin\left(x + \frac{\pi}{2}\right)$, so by the Chain Rule the derivative of $\cos x$ is $\cos\left(x + \frac{\pi}{2}\right) = \sin(x + \pi) = -\sin x$.
Exercise (Periodic Derivatives). If $f(x) = \sin x$, find $f'(x)$, $f''(x)$, $f'''(x)$, and $f''''(x)$. Do the same for $f(x) = \cos x$.
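By the Quotient Rule and the identity $\cos^2(x) + \sin^2(x) = 1$,
$$\begin{aligned}
\frac{d}{dx}\tan(x) &= \frac{d}{dx}\left(\frac{\sin(x)}{\cos(x)}\right) \\
&= \frac{\cos(x)\cos(x) - \sin(x)(-\sin(x))}{\cos^2(x)} \\
&= \frac{\cos^2(x) + \sin^2(x)}{\cos^2(x)} \\
&= \frac{1}{\cos^2(x)} \\
&= \sec^2(x).
\end{aligned}$$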
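A numerical spot check of the cosine and tangent derivatives, as a Python sketch using a symmetric difference quotient (our own helper `central_difference`) at the arbitrary point $x = 0.4$.

```python
import math

def central_difference(func, x, h=1e-5):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 0.4
print(central_difference(math.cos, x), -math.sin(x))          # d/dx cos x = -sin x
print(central_difference(math.tan, x), 1 / math.cos(x) ** 2)  # d/dx tan x = sec^2 x
```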
Exercise (Derivatives of Reciprocal Functions). Given how the trigonometric derivatives for sin, cos,
and tan were derived, determine and prove the derivatives of csc, sec, and cot.
Exercise. Find the Maclaurin Series of 𝑥 cos 𝑥.
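Proof. Let $y = \arcsin x$ and note that this implies $\sin y = x$. Differentiating with respect to $x$ gives
$$\begin{aligned}
\cos y \,\frac{dy}{dx} &= 1 \\
\frac{dy}{dx} &= \frac{1}{\cos y} \\
\frac{dy}{dx} &= \frac{1}{\sqrt{1 - x^2}},
\end{aligned}$$
where the last step uses $\cos y = \sqrt{1 - \sin^2 y} = \sqrt{1 - x^2}$, which holds because $y \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$, so $\cos y \geq 0$.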
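Proof. Let $y = \arccos x$ and note that this implies $\cos y = x$. Differentiating with respect to $x$ gives
$$\begin{aligned}
-\sin y \,\frac{dy}{dx} &= 1 \\
\frac{dy}{dx} &= -\frac{1}{\sin y} \\
\frac{dy}{dx} &= -\frac{1}{\sqrt{1 - x^2}},
\end{aligned}$$
where the last step uses $\sin y = \sqrt{1 - \cos^2 y} = \sqrt{1 - x^2}$, which holds because $y \in [0, \pi]$, so $\sin y \geq 0$.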
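Proof. Let $y = \arctan x$ and note that this implies $\tan y = x$. Differentiating with respect to $x$ gives
$$\begin{aligned}
\sec^2 y \,\frac{dy}{dx} &= 1 \\
\frac{dy}{dx} &= \frac{1}{\sec^2 y} \\
\frac{dy}{dx} &= \frac{1}{1 + x^2},
\end{aligned}$$
where the last step uses $\sec^2 y = 1 + \tan^2 y = 1 + x^2$.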
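These three formulas can be spot-checked against the standard library's `math.asin`, `math.acos`, and `math.atan`. The Python sketch below compares a symmetric difference quotient (our own helper `central_difference`) with each formula at the arbitrary point $x = 0.3$.

```python
import math

def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 0.3
checks = [
    (math.asin, 1 / math.sqrt(1 - x ** 2)),
    (math.acos, -1 / math.sqrt(1 - x ** 2)),
    (math.atan, 1 / (1 + x ** 2)),
]
for func, formula in checks:
    print(func.__name__, central_difference(func, x), formula)
```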
The other three functions (the reciprocal functions) are left to the reader. You can either implicitly
differentiate from the start or just use the quotient rule on sin, cos, and tan – both will work.
Exercise (Derivative of Inverse of Reciprocal Trigonometric Functions). Find the derivative of arccsc 𝑥,
arcsec 𝑥, and arccot 𝑥.
3.3 Exponential and Logarithmic Functions
If this is your first time learning calculus, “What is e?” is mandatory reading.
Here’s a short summary of the facts about 𝑒 you absolutely have to know.
Facts about e.
Now we find and prove the derivative of ln 𝑥. It follows straight from the Inverse Function Rule, so try to
do it on your own for a little bit.
Proof. We put this straight into the Inverse Function Rule with $f(x) = e^x$. Note that
$$(\ln x)' \overset{\text{a}}{=} \frac{1}{e^{\ln x}} = \frac{1}{x},$$
as desired.
ᵃ Remember that the derivative of $e^x$ is itself.
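A brief numerical check of $(\ln x)' = \frac{1}{x}$, as a Python sketch (standard library only; `central_difference` is our own helper): it compares a symmetric difference quotient of `math.log` with $\frac{1}{x}$, and also evaluates the Inverse Function Rule expression $\frac{1}{e^{\ln x}}$ directly.

```python
import math

def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

for x in (0.5, 1.0, 3.0):
    # Inverse Function Rule with f = exp: 1 / exp(ln x), which should equal 1 / x
    rule = 1 / math.exp(math.log(x))
    print(x, central_difference(math.log, x), rule, 1 / x)
```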
4 Problems
As a disclaimer, the problems below also include miscellaneous differentiation-related topics that are not covered in this handout, such as limit problems and maximization/minimization problems.
Minimum is [32 p]. Problems denoted with n are required. (They still count towards the point total.)
[3p] (MIT OCW) Show that $g(h) = \frac{f(a + h) - f(a)}{h}$ has a removable discontinuity at $h = 0$, given that $f'(a)$ exists.
[3p] (HMMT) Determine the real number $a$ having the property that $f(a) = a$ is a relative minimum of $f(x) = x^4 - x^3 - x^2 + ax + 1$.
[4 n] (HMMT) Compute $\displaystyle\lim_{x \to 0} \frac{e^{x\cos x} - 1 - x}{\sin(x^2)}$.
[4p] (MAST Diagnostic 2020/C10) Find the maximum value of $k$ such that $(x + 1)^4 \geq kx^3$ for all $x$.
[2p] (Extension of C10) Find the range of values of $k$ such that $(x + 1)^4 \geq kx^3$ for all $x$.
[6 n] (Leibniz Rule) Given two $n$-times differentiable functions $f, g$, prove that
$$(fg)^{(n)}(x) = \sum_{k=0}^{n} \binom{n}{k} f^{(k)}(x)\,g^{(n-k)}(x).$$
[13p] (Hong Kong TST 2021/1/1) Find, with proof, all real triples (𝑎, 𝑏, 𝑐) satisfying