
Basic Calculation in Statistics

쁘띠유

Department of Statistics, SNU

2025 01 02

To study Mathematical Statistics effectively, it is essential to have a solid understanding of basic mathematical concepts, such as integration, differentiation, and related calculations. However, many students overlook the importance of these foundational skills. This lecture note is designed to provide you with practical knowledge and exercises to strengthen your basic calculation skills, which are crucial for mastering mathematical statistics.

Structure of the Lecture


The lecture consists of a preliminary section followed by four parts:

1. Indicator Function in Statistics

2. Product Notation (Π)

3. Integration and Derivatives of Vectors

4. Taylor Series and Optimization

Preliminary: Mathematical Notation

This section introduces the fundamental mathematical notations and their meanings.
These notations will be used throughout the lecture notes to ensure clarity and
precision in presenting mathematical ideas.

Symbols and Their Definitions

Defining Symbol

Caution: these symbols must be distinguished from the ordinary equality sign =.

• :=: Denotes definition using a colon and an equals sign. For example, a := b means that a is defined to be equal to b.

• ≝ (an equals sign with "def" written above it): Denotes definition explicitly. For example, in the context of a theorem, f (x) ≝ x² means that f (x) is defined as x².

• ≡: Denotes equivalence or definition using three parallel bars. For example,


a ≡ b can mean that a is defined as b or that a is equivalent to b in a given
context.

Commonality: :=, ≝, and ≡ can all be used to express definitions in mathematics.
Difference:

• :=: Most commonly used to introduce new symbols or concepts. For example, g(x) := sin(x) + cos(x).

• ≝: Explicitly highlights that the relation is a formal definition, such as in f (x) ≝ x².

• ≡: Can also be used in contexts beyond definitions, such as logical equivalence (P ≡ Q) or modular arithmetic (a ≡ b (mod n)).

Logical Statements

Quantifiers

• ∀: Universal quantifier.

– Example 1: ∀x ∈ {1, 2, 3}, x > 0 means that all elements in the set
{1, 2, 3} are greater than 0.

– Example 2: ∀y ∈ {Alice, Bob, Carol}, y is a student means that everyone


in the set {Alice, Bob, Carol} is a student.

– Example 3: ∀x ∈ {a, b, c}, x ̸= d means that none of the elements in the


set {a, b, c} are d.

• ∃: Existential quantifier.

– Example 1: ∃x ∈ R, x = 2 means there exists a real number x such that


x = 2.

– Example 2: If 쁘띠유 is a graduate student in the classroom, then ∃x ∈ Classroom, x is a graduate student, meaning there exists at least one graduate student in the classroom.

– Example 3: ∃x ∈ {1, 2, 3, 4}, x > 3 means there exists at least one element
in the set {1, 2, 3, 4} that is greater than 3.

Logical Connectives

• ∧: Logical AND. For example, P ∧ Q means that both P and Q are true; e.g.,
2 > 1 ∧ 3 > 2.

• ∨: Logical OR. For example, P ∨ Q means that either P or Q (or both) are
true; e.g., 2 > 3 ∨ 4 > 3.

• ¬: Logical NOT. For example, ¬P means that P is not true; e.g., if P is 2 > 3,
then ¬P is 2 ≤ 3.

• ⇐⇒ : Represents a bidirectional implication. For example, P ⇐⇒ Q means
that P is true if and only if Q is true.

• =⇒ : Denotes implication. For example, P =⇒ Q means that if P is true,


then Q must also be true.

Application of Logical Symbols

Below, I provide some theorems about mathematical logic. You do not have to understand them all in depth; however, you should remember all of the theorems and examples, especially Example 0.5 (Negation of Complete Statistics).

Theorem 0.1 (Negation of a Universal Quantifier (∀))


The statement:
∀x ∈ R, P (x) (For all x, P (x) is true.)

Its negation is:


¬ (∀x ∈ R, P (x)) ≡ ∃x ∈ R, ¬P (x)

Example 0.1
Consider the statement: ”All real numbers are positive.”

∀x ∈ R, x > 0

Its negation is: ”There exists a real number that is not positive.”

∃x ∈ R, x ≤ 0

Theorem 0.2 (Negation of an Existential Quantifier (∃))


The statement:

∃x ∈ R, P (x) (There exists an x such that P (x) is true.)

Its negation is:
¬ (∃x ∈ R, P (x)) ≡ ∀x ∈ R, ¬P (x)
Example 0.2
Consider the statement: ”There exists a real number greater than 10.”

∃x ∈ R, x > 10

Its negation is: ”For all real numbers, none is greater than 10.”

∀x ∈ R, x ≤ 10

Theorem 0.3 (Negation of a Conditional (P =⇒ Q))


The statement:
P =⇒ Q (If P , then Q.)

Its negation is:


¬(P =⇒ Q) ≡ P ∧ ¬Q
Example 0.3
Consider the statement: ”If it rains, the ground is wet.”

Rain =⇒ Wet ground

Its negation is: ”It rains, but the ground is not wet.”

Rain ∧ ¬(Wet ground)

Theorem 0.4 (Combined: Universal Quantifier and Conditional)


The statement:
∀x ∈ R, P (x) =⇒ Q(x)

Its negation is:

¬ (∀x ∈ R, P (x) =⇒ Q(x)) ≡ ∃x ∈ R, P (x) ∧ ¬Q(x)

Example 0.4
Consider the statement: ”For all real numbers x, if x > 0, then x2 > 0.”

∀x ∈ R, (x > 0) =⇒ (x2 > 0)

Its negation is: ”There exists a real number x such that x > 0 and x2 ≤ 0.”

∃x ∈ R, (x > 0) ∧ (x2 ≤ 0)
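These negation rules can be checked mechanically on a finite set of test points. The following minimal Python sketch (the sample set and the predicates P, Q are illustrative choices, not part of the original notes) verifies that ¬(∀x, P(x) ⟹ Q(x)) holds exactly when some x satisfies P(x) ∧ ¬Q(x):

# Check the negation rule of Theorem 0.4 on a finite sample of real numbers.
xs = [-2.0, -0.5, 0.0, 0.5, 3.0]

P = lambda x: x > 0          # hypothesis P(x)
Q = lambda x: x ** 2 > 0     # conclusion Q(x)

forall_stmt = all((not P(x)) or Q(x) for x in xs)            # ∀x, P(x) ⟹ Q(x)
exists_counterexample = any(P(x) and not Q(x) for x in xs)   # ∃x, P(x) ∧ ¬Q(x)

# The negation of the universal statement is equivalent to the existential one.
assert (not forall_stmt) == exists_counterexample
print(forall_stmt, exists_counterexample)   # True False for this sample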

Example 0.5 (Negation of Complete Statistics)


A statistic T (X) is complete for a family of distributions {f (x; θ) : θ ∈ Θ} if for
every function g, the condition:

E[g(T (X))] = 0 for all θ ∈ Θ

implies g(T (x)) = 0 .


Its negation is:

∃g :function, E[g(T (X))] = 0 for all θ ∈ Θ, but g(T (x)) ̸= 0.

Special Symbols

Numbers and Set Notations

• N: The set of natural numbers {1, 2, 3, . . . }.

• Z: The set of integers {. . . , −2, −1, 0, 1, 2, . . . }.

• Q: The set of rational numbers. For example, 1/2 ∈ Q.

• R: The set of real numbers. For example, 2 ∈ R.

• R+ : Represents the set of positive real numbers. For example, R+ = {x ∈ R |


x > 0}.

• R− : Represents the set of negative real numbers. For example, R− = {x ∈ R |
x < 0}.

• C: The set of complex numbers. For example, 2 + 3i ∈ C.

• Intervals:

– Open Interval: An open interval (a, b) is the set of all real numbers x
such that a < x < b. The endpoints a and b are not included in the
interval.
(a, b) = {x ∈ R | a < x < b}.

Example: (1, 3) includes all real numbers x such that 1 < x < 3.

– Closed Interval: A closed interval [a, b] is the set of all real numbers x
such that a ≤ x ≤ b. The endpoints a and b are included in the interval.

[a, b] = {x ∈ R | a ≤ x ≤ b}.

Example: [1, 3] includes all real numbers x such that 1 ≤ x ≤ 3.

– Half-Open Interval:

∗ Left-Open Interval: A half-open interval (a, b] includes all real


numbers x such that a < x ≤ b. The left endpoint a is not included,
but the right endpoint b is included.

(a, b] = {x ∈ R | a < x ≤ b}.

Example: (1, 3] includes all real numbers x such that 1 < x ≤ 3.

∗ Right-Open Interval: A half-open interval [a, b) includes all real


numbers x such that a ≤ x < b. The left endpoint a is included, but
the right endpoint b is not included.

[a, b) = {x ∈ R | a ≤ x < b}.

Example: [1, 3) includes all real numbers x such that 1 ≤ x < 3.

• ×: Denotes the Cartesian product of two sets.

– {1, 2}×{3, 4}: The Cartesian product of the sets {1, 2} and {3, 4} consists
of all ordered pairs (x, y), where x ∈ {1, 2} and y ∈ {3, 4}:

{1, 2} × {3, 4} = {(1, 3), (1, 4), (2, 3), (2, 4)}.

[Figure: the four points (1, 3), (1, 4), (2, 3), (2, 4) of {1, 2} × {3, 4} plotted in the plane.]

– [1, 2] × [3, 4]: The Cartesian product of the intervals [1, 2] and [3, 4]
consists of all ordered pairs (x, y), where x ∈ [1, 2] and y ∈ [3, 4]:

[1, 2] × [3, 4] = {(x, y) | x ∈ [1, 2], y ∈ [3, 4]}.

[Figure: the rectangle [1, 2] × [3, 4] shown as a shaded region in the plane.]

• min and max:

– min: Denotes the minimum of a set or sequence. For example, min(3, 1, 4) =


1, meaning that the smallest value in the set {3, 1, 4} is 1. For ordered
data, the minimum value is often denoted as X(1) .

– max: Denotes the maximum of a set or sequence. For example, max(3, 1, 4) =
4, meaning that the largest value in the set {3, 1, 4} is 4. For ordered data,
the maximum value is often denoted as X(n) , where n is the total number
of elements in the set.

• e: Represents the base of the natural logarithm. For example, e ≈ 2.71828.

• log: Represents the natural logarithm. For example, log(e) = 1.

Limits and Directional Notations

• Left-hand Limit (limx→a− f (x)) and Equivalent Notations:

– limx→a− f (x)( f (a− )): The left-hand limit, where x approaches a from
values smaller than a (x < a).

– limx↑a f (x)(limx↗a f (x)): The left-hand limit with the added condition
that x approaches a in a monotonically increasing manner.

• Right-hand Limit (limx→a+ f (x)) and Equivalent Notations:

– limx→a+ f (x)(f (a+ )): The right-hand limit, where x approaches a from
values greater than a (x > a).

– limx↓a f (x)(limx↘a f (x)): The right-hand limit with the added condition
that x approaches a in a monotonically decreasing manner.

• Example Function:

f (x) =
  x², if x < 2,
  5,  if x ≥ 2.

– limx→2− f (x) = f (2− ) = 4.

– limx→2+ f (x) = f (2+ ) = 5.

Remark
– f (2− ) can also be written with the increasing-arrow notation limx↗2 f (x), since x < 2 and x increases towards 2.

– f (2+ ), by contrast, cannot be written with the increasing-arrow notation limx↗2 f (x): x > 2 is already implied in the definition of f (2+ ), so x can only approach 2 by decreasing, and the appropriate notation is limx↘2 f (x).

[Figure: Visualization of the function f (x), with f (x) = x² for x < 2 and f (x) = 5 for x ≥ 2, showing the distinct left-hand limit f (2− ) = 4 and right-hand limit f (2+ ) = 5 at x = 2.]

Point Set Representations and Similar Theorems

• Singleton Representation:


{x} = ⋂_{n=1}^∞ (x − 1/n, x].

This representation shows that the singleton set containing x can be expressed as the infinite intersection of the nested intervals (x − 1/n, x], which shrink to the single point x.

• Closed Interval as Infinite Intersection:

[a, b] = ⋂_{n=1}^∞ (a − 1/n, b + 1/n).

This representation shows that a closed interval [a, b] can be expressed as the intersection of a sequence of open intervals that shrink to [a, b].

• Open Interval as Infinite Union:

(a, b) = ⋃_{n=1}^∞ [a + 1/n, b − 1/n].

The open interval (a, b) is expressed as the union of a sequence of closed intervals that expand to cover (a, b) as n → ∞.

• Single Point as Nested Closed Intervals:

{x} = ⋂_{n=1}^∞ [x − 1/n, x + 1/n].

This representation shows that the singleton set containing x can also be expressed as the intersection of nested closed intervals around x.

Statistical Symbols and Concepts

• ⊥( ⊥⊥): Denotes independence. For example, X ⊥ Y means that the random


variables X and Y are independent.

• ̸⊥: Denotes dependence (not independent). For example, X ̸⊥ Y means that


X and Y are not independent.

• ⊥⊥| Z: Denotes conditional independence. For example, X ⊥⊥ Y | Z means


that X and Y are independent given Z.

• i.i.d.: Independent and identically distributed. A set of random variables


X1 , X2 , . . . , Xn is said to be i.i.d. if they are independent of each other and follow the same probability distribution.

• ∼: In statistics, ∼ means ”is distributed as.” Examples include:

– X ∼ Normal(µ, σ 2 ): The random variable X is normally distributed with


mean µ and variance σ 2 .

– X ∼ F (x): The random variable X follows the cumulative distribution


function F (x).

– X ∼ f (x): The random variable X follows the probability density func-


tion f (x).

• Mutually (Pairwise) Disjoint: A collection of subsets {A1 , A2 , . . . , An } is


called mutually disjoint if:

Ai ∩ Aj = ∅ for all i ̸= j.

– Example: Let A1 = {1, 2}, A2 = {3, 4}, A3 = {5, 6}. These subsets are
mutually disjoint because:

A1 ∩ A2 = ∅, A2 ∩ A3 = ∅, A1 ∩ A3 = ∅.

– Visualization: [Figure: three disjoint boxes A1 = {1, 2}, A2 = {3, 4}, A3 = {5, 6}, with no overlap.]

• Partition: A collection of subsets {A1 , A2 , . . . , An } of a set S is called a partition of S if:

A1 ∪ A2 ∪ · · · ∪ An = S
(The subsets cover the entire set S) and

Ai ∩ Aj = ∅ for all i ̸= j

(The subsets are mutually disjoint).

– Example: Let S = {1, 2, 3, 4, 5, 6}. A partition of S is A1 = {1, 2}, A2 =


{3, 4}, A3 = {5, 6}, since:

A1 ∪ A2 ∪ A3 = S and Ai ∩ Aj = ∅ for i ̸= j.

– Visualization: [Figure: the set S drawn as a rectangle divided into the three blocks A1 , A2 , A3 .]

1 Indicator Function in Statistics
Definition 1.1 (Indicator Function)
The indicator function IA (x) of a set A is defined by



IA (x) =
  1, if x ∈ A,
  0, if x ∉ A.

Example 1.1 (Indicator for a Closed Interval)


Let A = [a, b]. Then


1
 if a ≤ x ≤ b,
I[a,b] (x) =

0
 otherwise.

Example 1.2 (Indicator for a PDF)


If X is uniform on [m, n], then

fX (x) = (1 / (n − m)) I[m,n] (x).

Example 1.3 (Piecewise Function via Indicator)


f (x) =
  x², if 0 < x < 3,
  0,  otherwise,
=⇒ f (x) = x² I(0,3) (x).

Example 1.4 (Another Piecewise Function)


f (x) =
  x²,      if 0 < x < 3,
  −x + 12, if 3 ≤ x < 12,
  0,       otherwise,
=⇒ f (x) = x² I(0,3) (x) + (−x + 12) I[3,12) (x).

Theorem 1.1 (Properties of the Indicator Function)


For sets A, B and random variable X:

1. IA∪B (x) = IA (x) + IB (x) − IA∩B (x).

2. IAc (x) = 1 − IA (x).

3. IA∩B (x) = IA (x) IB (x).

Example 1.5 (Indicator of Observations)


Let X1 , . . . , Xn be random variables. Then

∏_{i=1}^n I(Xi < a) = I(X1 < a, . . . , Xn < a) = I(max{X1 , . . . , Xn } < a),

∏_{i=1}^n I(Xi > a) = I(X1 > a, . . . , Xn > a) = I(min{X1 , . . . , Xn } > a).

Remark
The notation I(X1 < a, . . . , Xn < a) represents the indicator function of the intersection of events:

I(X1 < a, . . . , Xn < a) = I( ⋂_{i=1}^n {Xi < a} ).

Similarly, I(X1 > a, . . . , Xn > a) represents:

I(X1 > a, . . . , Xn > a) = I( ⋂_{i=1}^n {Xi > a} ).
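The identities of Example 1.5 are easy to confirm by simulation. A minimal NumPy sketch (the sample size, dimension, and threshold a are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))   # 10,000 samples of (X1, ..., X5)
a = 0.5

# Product of indicators versus indicator of the maximum / minimum.
prod_lt = np.prod(X < a, axis=1)            # prod_i I(Xi < a)
max_lt = (X.max(axis=1) < a).astype(int)    # I(max_i Xi < a)

prod_gt = np.prod(X > a, axis=1)            # prod_i I(Xi > a)
min_gt = (X.min(axis=1) > a).astype(int)    # I(min_i Xi > a)

assert np.array_equal(prod_lt, max_lt)
assert np.array_equal(prod_gt, min_gt)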

Theorem 1.2 (Derivative with an Indicator)


If f is differentiable and IA is the indicator of A, then for x ∉ ∂A,

(d/dx) [ f (x) IA (x) ] = f ′ (x) IA (x).

Remark (Boundary Effects)


At x ∈ ∂A, the derivative may involve terms like δ∂A (x).

Theorem 1.3 (Integral with an Indicator)
If f is integrable, then

∫_C f (x) IA (x) dx = ∫_{C ∩ A} f (x) dx.

Example 1.6 (Integral Over Intervals)


For any integrable function f (x),

∫_a^b f (x) I(c,d) (x) dx =
  ∫_{max(a,c)}^{min(b,d)} f (x) dx, if the intervals (a, b) and (c, d) overlap,
  0, otherwise.

Theorem 1.4 (Bernoulli Distribution of an Indicator)


Let X be a random variable and A an event. The indicator variable IA (X) is defined
as:

IA (X) =
  1, if X ∈ A,
  0, otherwise.

IA (X) follows a Bernoulli distribution with parameter p = P (X ∈ A). Its expectation


is:
E[IA (X)] = p.

Example 1.7
Consider a coin flip experiment where N represents the number of heads in one
repetition, and A denotes the event that the coin lands heads. N is defined as:


N =
  1, if the coin lands heads,
  0, if the coin lands tails.

If the probability of heads is p = 0.5, then N follows a Bernoulli distribution with


parameter p = 0.5, and its expectation is:

E[N ] = 0.5.

Theorem 1.5 (Counting Elements Having Property P via Indicators)
Let X1 , X2 , . . . , Xn be random variables taking values in some set X , and let P ⊆ X
represent a certain property (or subset) of interest.
Define

S = ∑_{i=1}^n IP (Xi ),

where

IP (Xi ) =
  1, if Xi ∈ P,
  0, otherwise.

Then S is a random variable that counts how many of the Xi ’s belong to P . In particular,

E[S] = E[ ∑_{i=1}^n IP (Xi ) ] = ∑_{i=1}^n E[IP (Xi )] = ∑_{i=1}^n P (Xi ∈ P ).

Example 1.8
Suppose we have n coin flips, denoted by X1 , X2 , . . . , Xn . Let the property P be the
event {coin lands heads}. Define

S = ∑_{i=1}^n I{heads} (Xi ).

Then S is the total number of heads in n coin flips. If each coin flip has probability p of landing heads (independently), then

E[S] = ∑_{i=1}^n E[I{heads} (Xi )] = ∑_{i=1}^n P (heads) = ∑_{i=1}^n p = np.

Thus, S follows a Binomial distribution with parameters (n, p), and its expectation
is np.
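A quick simulation illustrates Theorem 1.5 and Example 1.8: summing indicators counts the heads, and the average count is close to np. This is a minimal sketch with arbitrarily chosen n and p, assuming NumPy is available:

import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 20, 0.3, 100_000

flips = rng.random((reps, n)) < p   # I{heads}(X_i) for each flip
S = flips.sum(axis=1)               # S = sum_i I{heads}(X_i), one count per repetition

print(S.mean(), n * p)              # empirical mean of S vs. theoretical E[S] = np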

2 Product Notation (Π)
Definition 2.1 (Product Notation)
For a1 , . . . , an ,
∏_{i=1}^n ai = a1 a2 · · · an .

If n = 0, the product is 1 (by convention).


Theorem 2.1 (Properties of ∏)

1. ∏_{i=1}^n c = c^n .

2. ∏_{i=1}^n b^{e_i} = b^{∑_{i=1}^n e_i} .

3. ∏_{i=1}^n (ai bi ) = ( ∏_{i=1}^n ai ) ( ∏_{i=1}^n bi ).

Example 2.1 (Product of Powers)


∏_{i=1}^3 2^i = 2^{1+2+3} = 2^6 = 64.

Example 2.2 (Product of Constants)

∏_{i=1}^3 2 = 2^3 = 8.

Example 2.3 (Likelihood with Normal Density)


Let X1 , . . . , Xn be i.i.d. N (µ, σ²). Then

f (x) = (1 / √(2π σ²)) exp( −(x − µ)² / (2σ²) ),

and the likelihood is the product of the densities:

L(µ, σ²) = ∏_{i=1}^n f (Xi ) = (1 / √(2π σ²))^n exp( −∑_{i=1}^n (Xi − µ)² / (2σ²) ).
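In practice the product in L(µ, σ²) is usually evaluated on the log scale to avoid numerical underflow for large n. The following sketch (simulated data and assumed parameter values, not from the original notes) compares the direct product of densities with the equivalent sum of log-densities, assuming SciPy is available:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=2.0, size=50)   # i.i.d. sample

mu, sigma2 = 1.0, 4.0
L = np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))        # prod_i f(X_i)
logL = np.sum(norm.logpdf(x, loc=mu, scale=np.sqrt(sigma2)))   # sum_i log f(X_i)

print(np.log(L), logL)   # the two agree up to floating-point error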

3 Integration and Derivatives of Vectors

3.1 Integration: Fubini’s Theorem


Definition 3.1 (Double Integral)
For a function f (x, y) integrable on [a, b] × [c, d],

∫_a^b ∫_c^d f (x, y) dy dx = ∫_c^d ∫_a^b f (x, y) dx dy.

Example 3.1 (Double Integral Computation)


Let f (x, y) = x + y on [0, 1] × [0, 2]. Then

∫_0^1 ∫_0^2 (x + y) dy dx = ∫_0^1 [ xy + y²/2 ]_{y=0}^{y=2} dx = ∫_0^1 (2x + 2) dx = 3.

Remark
By Fubini's theorem, we can often simplify a double integral by changing the order of integration in x and y.
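The order-of-integration claim can be verified symbolically. A minimal SymPy sketch using the function from Example 3.1:

import sympy as sp

x, y = sp.symbols('x y')
f = x + y

# Integrate in both orders over [0, 1] x [0, 2]; Fubini's theorem says they agree.
y_first = sp.integrate(sp.integrate(f, (y, 0, 2)), (x, 0, 1))
x_first = sp.integrate(sp.integrate(f, (x, 0, 1)), (y, 0, 2))

print(y_first, x_first)   # both equal 3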

Example 3.2 (Joint Distribution)


Let the joint density function be:

fX,Y (x, y) = λ² x e^{−λx(1+y)} ,  x ≥ 0, y ≥ 0, λ > 0.

Verify that:

∫_0^∞ ∫_0^∞ fX,Y (x, y) dy dx = 1.

Proof) Substitute fX,Y (x, y):

∫_0^∞ ∫_0^∞ λ² x e^{−λx(1+y)} dy dx.

Factor e^{−λx}, which depends only on x, out of the inner integral:

∫_0^∞ ∫_0^∞ λ² x e^{−λx(1+y)} dy dx = ∫_0^∞ λ² x e^{−λx} ( ∫_0^∞ e^{−λxy} dy ) dx.

The inner integral is:

∫_0^∞ e^{−λxy} dy.

Perform a substitution: let u = λxy, so du = λx dy, or equivalently dy = du/(λx). Update the limits:

• When y = 0, u = 0,

• When y → ∞, u → ∞.

The inner integral becomes:

∫_0^∞ e^{−λxy} dy = ∫_0^∞ e^{−u} (1/(λx)) du = (1/(λx)) ∫_0^∞ e^{−u} du.

The integral of e^{−u} over [0, ∞) is 1, so:

∫_0^∞ e^{−λxy} dy = 1/(λx).

Substitute this back into the double integral:

∫_0^∞ ∫_0^∞ λ² x e^{−λx(1+y)} dy dx = ∫_0^∞ λ² x e^{−λx} · (1/(λx)) dx.

Simplify:

∫_0^∞ ∫_0^∞ fX,Y (x, y) dy dx = ∫_0^∞ λ e^{−λx} dx.

Now compute the outer integral:

∫_0^∞ λ e^{−λx} dx = [ −e^{−λx} ]_0^∞ = 1.

Thus, the double integral equals 1, and fX,Y (x, y) is a valid joint probability density function.
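The same normalization can be checked numerically. A minimal sketch using SciPy's dblquad (λ = 2 is an arbitrary choice; dblquad expects the integrand as a function of the inner variable y first):

import numpy as np
from scipy.integrate import dblquad

lam = 2.0
f = lambda y, x: lam**2 * x * np.exp(-lam * x * (1.0 + y))   # f(y, x) for dblquad

# Outer integral over x in [0, inf), inner integral over y in [0, inf).
total, err = dblquad(f, 0, np.inf, lambda x: 0, lambda x: np.inf)
print(total)   # ≈ 1.0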

3.2 Derivatives of Vectors
Definition 3.2 (Vectors)
A vector x in Rn is an ordered n-tuple of real numbers:

x = (x1 , x2 , . . . , xn ).

We often write it in column form:

 
x = (x1 , x2 , . . . , xn )^T , with the entries x1 , . . . , xn stacked vertically as a column.

Remark
In many contexts, especially in calculus and linear algebra, an n-dimensional vector
is conventionally treated as a column vector (i.e., written vertically).

Definition 3.3 (Gradient and Jacobian)


1. Gradient: Let g : R^n → R. Its gradient is the column vector of partial derivatives:

∇g(x) = ( ∂g/∂x1 , ∂g/∂x2 , . . . , ∂g/∂xn )^T .

2. Jacobian Matrix: For a function y : R^n → R^m with components y = (y1 , . . . , ym ), its Jacobian matrix is the m × n matrix of partial derivatives:

Jy (x) = dy/dx =
  [ ∂y1/∂x1   · · ·   ∂y1/∂xn ]
  [    ⋮        ⋱        ⋮    ]
  [ ∂ym/∂x1   · · ·   ∂ym/∂xn ]

Remark
The term “Jacobian” may refer either to the Jacobian matrix or the Jacobian de-
terminant. In this text, unless otherwise specified, we use “Jacobian” to mean the
Jacobian matrix.
Definition 3.4 (Hessian Matrix)
Let g : Rn → R be twice differentiable. Then the Hessian matrix of g, denoted by
∇2 g(x) or Hg (x), is defined as the matrix of all second-order partial derivatives:

∇²g(x) =
  [ ∂²g/∂x1²      ∂²g/∂x1∂x2    · · ·   ∂²g/∂x1∂xn ]
  [ ∂²g/∂x2∂x1    ∂²g/∂x2²      · · ·   ∂²g/∂x2∂xn ]
  [     ⋮              ⋮          ⋱          ⋮     ]
  [ ∂²g/∂xn∂x1    ∂²g/∂xn∂x2    · · ·   ∂²g/∂xn²   ]

Remark
Whereas the gradient is an n-dimensional column vector, the Hessian is an n × n
matrix. It captures how the gradient itself changes with respect to x, thus providing
curvature information of the function.

Example 3.3 (Elementary Jacobian Example)



Let x = (x1 , x2 )^T and define y(x) = ( x1² + x2 , sin x2 ). Then

y1 = x1² + x2 ,   y2 = sin(x2 ).

Hence

∂y/∂x =
  [ ∂(x1² + x2)/∂x1   ∂(x1² + x2)/∂x2 ]   =   [ 2x1   1      ]
  [ ∂(sin x2)/∂x1     ∂(sin x2)/∂x2   ]       [ 0     cos x2 ]

Example 3.4 (Statistical Jacobian: Transforming Parameters)


Consider a normal distribution with parameters (µ, σ), and let θ1 = µ, θ2 = log σ. Then σ = e^{θ2}. The Jacobian of (µ, σ) with respect to (θ1 , θ2 ) is

  [ ∂µ/∂θ1   ∂µ/∂θ2 ]   =   [ 1   0       ]
  [ ∂σ/∂θ1   ∂σ/∂θ2 ]       [ 0   e^{θ2}  ]

Example 3.5 (2D Hessian Example)


Consider the function g : R² → R given by

g(x1 , x2 ) = x1² + 3 x1 x2 + 2 x2².

• Gradient (1st-order partials):

∇g(x1 , x2 ) = ( ∂g/∂x1 , ∂g/∂x2 )^T = ( 2x1 + 3x2 , 3x1 + 4x2 )^T .

• Hessian (2nd-order partials):

∇²g(x1 , x2 ) =
  [ ∂²g/∂x1²     ∂²g/∂x1∂x2 ]   =   [ 2   3 ]
  [ ∂²g/∂x2∂x1   ∂²g/∂x2²   ]       [ 3   4 ]

Notice the Hessian is a constant 2 × 2 matrix in this example. To determine whether


a critical point is a local minimum, maximum, or saddle, one often checks whether
the Hessian is positive definite, negative definite, or indefinite at that point.
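These derivatives can be reproduced symbolically. A minimal SymPy sketch for the function in Example 3.5 (the eigenvalue check at the end is one way to carry out the definiteness test mentioned above):

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
g = x1**2 + 3*x1*x2 + 2*x2**2

grad = [sp.diff(g, v) for v in (x1, x2)]   # gradient components
H = sp.hessian(g, (x1, x2))                # Hessian matrix

print(grad)            # [2*x1 + 3*x2, 3*x1 + 4*x2]
print(H)               # Matrix([[2, 3], [3, 4]])
# Eigenvalues are 3 ± sqrt(10); one is negative, so this Hessian is indefinite
# and the critical point at the origin is a saddle point.
print(H.eigenvals())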

Example 3.6 (MLE Gradient)


Consider i.i.d. X1 , . . . , Xn ∼ N (µ, σ²). Let θ = (µ, σ²). The log-likelihood is

ℓ(θ) = ℓ(µ, σ²) = −(n/2) ln(2π) − (n/2) ln(σ²) − (1/(2σ²)) ∑_{i=1}^n (Xi − µ)².

We use the gradient ∇ℓ(θ) = ( ∂ℓ/∂µ , ∂ℓ/∂σ² ). Then

∂ℓ/∂µ = (1/σ²) ∑_{i=1}^n (Xi − µ),   ∂ℓ/∂σ² = −n/(2σ²) + (1/(2(σ²)²)) ∑_{i=1}^n (Xi − µ)².

Setting these derivatives to zero,

∂ℓ/∂µ = 0 =⇒ ∑_{i=1}^n (Xi − µ) = 0 =⇒ µ = (1/n) ∑_{i=1}^n Xi ,

∂ℓ/∂σ² = 0 =⇒ −n/(2σ²) + (1/(2(σ²)²)) ∑_{i=1}^n (Xi − µ)² = 0 =⇒ σ² = (1/n) ∑_{i=1}^n (Xi − µ)².

Thus the MLEs are µ̂ = (1/n) ∑_{i=1}^n Xi and σ̂² = (1/n) ∑_{i=1}^n (Xi − µ̂)², as derived above.
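The closed-form MLE can be checked against a numerical optimizer. A minimal sketch on simulated data (the true values µ = 1, σ² = 4 and the sample size are arbitrary choices), assuming SciPy is available:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
X = rng.normal(loc=1.0, scale=2.0, size=500)

def neg_loglik(theta):
    mu, sigma2 = theta
    if sigma2 <= 0:
        return np.inf   # keep the optimizer inside the valid parameter region
    return 0.5 * len(X) * np.log(2 * np.pi * sigma2) + np.sum((X - mu) ** 2) / (2 * sigma2)

res = minimize(neg_loglik, x0=[0.0, 1.0], method="Nelder-Mead")

print(res.x)                                    # numerical MLE of (mu, sigma^2)
print(X.mean(), ((X - X.mean()) ** 2).mean())   # closed-form mu_hat, sigma^2_hat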

4 Taylor Series and Optimization

4.1 Taylor Series and Maclaurin Series


Theorem 4.1 (Taylor Series)
Let f (x) be a function that is infinitely differentiable at a point x0 .
I. Original Series:


f (x) = ∑_{n=0}^∞ [ f^(n)(x0) / n! ] (x − x0)^n ,

where f^(n)(x0) denotes the n-th derivative of f (x) evaluated at x = x0 .

II. Truncated Series with Remainder: If the series is truncated at the N-th term, the function can be exactly expressed as:

f (x) = ∑_{n=0}^N [ f^(n)(x0) / n! ] (x − x0)^n + RN (x),

where the remainder term RN (x) is:

RN (x) = [ f^(N+1)(x*) / (N + 1)! ] (x − x0)^{N+1} ,

for some x∗ in the interval between x0 and x.

Theorem 4.2 (Maclaurin Series)


The Maclaurin series is a special case of the Taylor series where x0 = 0.
I. Original Series:

f (x) = ∑_{n=0}^∞ [ f^(n)(0) / n! ] x^n .

II. Truncated Series with Remainder: If the series is truncated at the N-th term, the function can be exactly expressed as:

f (x) = ∑_{n=0}^N [ f^(n)(0) / n! ] x^n + RN (x),

where the remainder term RN (x) is:

RN (x) = [ f^(N+1)(x*) / (N + 1)! ] x^{N+1} ,

for some x∗ in the interval between 0 and x.


Remark (Comparison of Taylor and Maclaurin Series)
The Taylor series provides a local approximation of a function around any point x0 ,
while the Maclaurin series is a special case that approximates around x0 = 0. Both
series can be expressed as a truncated sum with a remainder term RN (x), which
quantifies the error in the approximation.
Corollary 1 (Examples of Maclaurin Series)
The following are the Maclaurin series expansions for common functions:

1. e^x :

e^x = ∑_{n=0}^∞ x^n / n! = 1 + x + x²/2! + x³/3! + · · ·

2. 1/(1 − x) (for |x| < 1):

1/(1 − x) = ∑_{n=0}^∞ x^n = 1 + x + x² + x³ + · · ·

3. 1/(1 + x) (for |x| < 1):

1/(1 + x) = ∑_{n=0}^∞ (−1)^n x^n = 1 − x + x² − x³ + · · ·

4. −ln(1 − x) (for |x| < 1):

−ln(1 − x) = ∑_{n=1}^∞ x^n / n = x + x²/2 + x³/3 + · · ·

5. ln(1 + x) (for |x| < 1):

ln(1 + x) = ∑_{n=1}^∞ (−1)^{n+1} x^n / n = x − x²/2 + x³/3 − · · ·

6. sin x:

sin x = ∑_{n=0}^∞ (−1)^n x^{2n+1} / (2n + 1)! = x − x³/3! + x⁵/5! − · · ·

7. cos x:

cos x = ∑_{n=0}^∞ (−1)^n x^{2n} / (2n)! = 1 − x²/2! + x⁴/4! − · · ·
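These expansions are easy to verify numerically by comparing a truncated sum with the exact function value. A minimal sketch for the sin x series (the truncation order N = 5 is an arbitrary choice):

import math

def sin_maclaurin(x, N=5):
    # Truncated series: sum_{n=0}^{N} (-1)^n x^(2n+1) / (2n+1)!
    return sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(N + 1))

for x in (0.1, 0.5, 1.0):
    print(x, sin_maclaurin(x), math.sin(x))   # truncated series vs. math.sin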

Remark (Moments)
The k-th moment of a random variable X is defined as:

µk = E[X k ],

where k = 1, 2, 3, . . .. The first moment is the mean (µ1 = E[X]), and the second
central moment is the variance (E[(X − E[X])2 ]).

Example 4.1 (Approximating an Integral Using Maclaurin Series)


Evaluate the integral:

I = ∫_0^{0.5} e^{−x²} dx

using the Maclaurin series for e^{−x²}.

Step 1: Write the Maclaurin series for e^x. The Maclaurin series for e^x is:

e^x = ∑_{n=0}^∞ x^n / n! .

Substitute −x² for x:

e^{−x²} = ∑_{n=0}^∞ (−x²)^n / n! = ∑_{n=0}^∞ (−1)^n x^{2n} / n! .

Step 2: Approximate the integral by truncating the series. Truncate the series to the first three terms:

e^{−x²} ≈ 1 − x² + x⁴/2.

Step 3: Integrate term by term.

I ≈ ∫_0^{0.5} ( 1 − x² + x⁴/2 ) dx.

Compute each term:

∫_0^{0.5} 1 dx = [x]_0^{0.5} = 0.5,

∫_0^{0.5} x² dx = [x³/3]_0^{0.5} = (0.5)³/3 = 0.125/3 ≈ 0.04167,

∫_0^{0.5} x⁴/2 dx = [x⁵/10]_0^{0.5} = (0.5)⁵/10 = 0.03125/10 = 0.003125.

Add these results:

I ≈ 0.5 − 0.04167 + 0.003125 = 0.461455.

Step 4: Compare with the exact value. Using numerical integration, the exact value of ∫_0^{0.5} e^{−x²} dx is approximately 0.46128, showing that the Maclaurin approximation is highly accurate.
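The computation in Example 4.1 can be reproduced numerically: integrate the three-term truncation in closed form and compare it with an adaptive quadrature of the exact integrand. A minimal sketch assuming SciPy is available:

import numpy as np
from scipy.integrate import quad

b = 0.5

# Term-by-term integral of the truncation 1 - x^2 + x^4/2 over [0, b].
series_approx = b - b**3 / 3 + b**5 / 10

exact, err = quad(lambda x: np.exp(-x**2), 0, b)

print(series_approx)   # ≈ 0.461458
print(exact)           # ≈ 0.461281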

Example 4.2 (Finding a Root Using Maclaurin Series)


Approximate a root of the equation:

sin x = 0.5

using the Maclaurin series for sin x.


Step 1: Write the Maclaurin series for sin x.

sin x = ∑_{n=0}^∞ (−1)^n x^{2n+1} / (2n + 1)! = x − x³/3! + x⁵/5! − · · ·

Step 2: Set sin x ≈ 0.5. Truncate the series to the first two terms:

sin x ≈ x − x³/6.

Solve:

x − x³/6 = 0.5,  i.e.,  x = 0.5 + x³/6.

Step 3: Solve iteratively. Let x0 = 0.5 (initial guess). Substitute into the equation:

x1 = 0.5 + (0.5)³/6 = 0.52083.

Repeat:

x2 = 0.5 + (0.52083)³/6 ≈ 0.52355.

After two iterations, the root is approximately:

x ≈ 0.52355.

Step 4: Compare with the exact value. The exact value of the root is x = arcsin(0.5) = π/6 ≈ 0.523598, showing that the Maclaurin approximation is highly accurate.
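The iteration of Example 4.2 is a fixed-point scheme x ← 0.5 + x³/6 for the truncated equation. A minimal Python sketch that runs a few iterations and compares the result with the exact root π/6:

import math

x = 0.5                      # initial guess x0
for k in range(1, 4):
    x = 0.5 + x**3 / 6       # fixed-point update for the truncated equation x - x^3/6 = 0.5
    print(k, x)

# The iteration converges to the root of the *truncated* equation,
# which is close to (but not exactly equal to) the true root of sin x = 0.5.
print(math.asin(0.5), math.pi / 6)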

Example 4.3 (Relation Between MGF and Moments)


The k-th moment of a random variable X can be obtained by differentiating its MGF MX (t) k times and evaluating the result at t = 0:

µk = d^k/dt^k MX (t) |_{t=0} .

Proof Using Maclaurin Series (Corollary 1):

Let MX (t) = E[e^{tX}]. Expanding e^{tX} using its Maclaurin series:

e^{tX} = ∑_{n=0}^∞ (tX)^n / n! .

Taking the expectation:

MX (t) = E[ e^{tX} ] = E[ ∑_{n=0}^∞ (tX)^n / n! ].

By linearity of expectation:

MX (t) = ∑_{n=0}^∞ (t^n / n!) E[X^n].

Thus, the MGF can be expressed as:

MX (t) = ∑_{n=0}^∞ µn t^n / n! ,

where µn = E[X^n] is the n-th moment of X.

To find the k-th moment:

µk = E[X^k] = d^k/dt^k MX (t) |_{t=0} .

Differentiating MX (t) k times:

d^k/dt^k MX (t) = d^k/dt^k ∑_{n=0}^∞ µn t^n / n! .

By the properties of differentiation:

d^k/dt^k ∑_{n=0}^∞ µn t^n / n! = ∑_{n=k}^∞ (µn / n!) · n(n − 1) · · · (n − k + 1) · t^{n−k} .

At t = 0, all terms vanish except for n = k, leaving:

d^k/dt^k MX (t) |_{t=0} = (µk / k!) · k! = µk .

Conclusion: The k-th moment µk is obtained by differentiating the MGF MX (t) k times and setting t = 0:

µk = d^k/dt^k MX (t) |_{t=0} .
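The moment formula can be checked symbolically for a concrete distribution. A minimal SymPy sketch using the Exponential(1) distribution, whose MGF is 1/(1 − t) for t < 1 and whose k-th moment is k!:

import sympy as sp

t = sp.symbols('t')
M = 1 / (1 - t)   # MGF of an Exponential(1) random variable (valid for t < 1)

for k in range(1, 5):
    mu_k = sp.diff(M, t, k).subs(t, 0)   # k-th derivative of the MGF at t = 0
    print(k, mu_k, sp.factorial(k))      # matches E[X^k] = k! for Exponential(1)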

4.2 Optimization
Definition 4.1 (Critical Point)
A point x∗ in the domain of a differentiable function f (x) is called a critical point
if:
∇f (x∗ ) = 0.

Alternatively, x∗ is a critical point if f ′ (x∗ ) = 0 (in one dimension).


Remark
Critical points are candidates for local minima, local maxima, or saddle points. Fur-
ther analysis, such as the first or second derivative test, is required to classify them.
Definition 4.2 (Convex Function)
A function f (x) is called convex on a domain D ⊆ Rn if, for all x, y ∈ D and
λ ∈ [0, 1], it satisfies:

f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y).

Theorem 4.3 (Second Derivative Condition for Convexity)


Let f (x) be twice differentiable on D ⊆ Rn .

1. In one dimension (n = 1): f (x) is convex if and only if:

f ′′ (x) ≥ 0 for all x ∈ D.

2. In higher dimensions (n > 1): f (x) is convex if and only if the Hessian matrix
∇2 f (x) is positive semidefinite for all x ∈ D, i.e.:

z ⊤ ∇2 f (x)z ≥ 0 for all z ∈ Rn .

Definition 4.3 (Strictly Convex Function)


A function f (x) is called strictly convex on a domain D ⊆ Rn if, for all x, y ∈ D
with x ̸= y and λ ∈ (0, 1), it satisfies:

f (λx + (1 − λ)y) < λf (x) + (1 − λ)f (y).

Theorem 4.4 (Second Derivative Condition for Strict Convexity)
Let f (x) be twice differentiable on D ⊆ Rn .

1. In one dimension (n = 1): if f ′′ (x) > 0 for all x ∈ D, then f (x) is strictly convex. (The converse does not hold in general; for example, f (x) = x⁴ is strictly convex even though f ′′ (0) = 0.)

2. In higher dimensions (n > 1): if the Hessian matrix ∇²f (x) is positive definite for all x ∈ D, i.e.,

z ⊤ ∇²f (x) z > 0 for all nonzero z ∈ Rn ,

then f (x) is strictly convex.

Remark
The second derivative (or Hessian matrix) provides a powerful tool for determining the convexity or strict convexity of a function. For scalar functions, convexity corresponds to f ′′ (x) ≥ 0, while f ′′ (x) > 0 guarantees strict convexity.
Theorem 4.5 (Convex Function)
Let f (x) be a differentiable convex function. If ∇f (x∗ ) = 0, then x∗ is a global
minimum of f (x).
Remark
For convex functions, the graph opens upward (it is bowl-shaped), and any stationary point where ∇f (x) = 0 is guaranteed to be a global minimum.
Theorem 4.6 (Strictly Convex Function)
Let f (x) be a differentiable strictly convex function. If ∇f (x∗ ) = 0, then x∗ is the
unique global minimum of f (x).
Remark
Strict convexity ensures that the function's graph is strictly bowl-shaped (it opens upward with no flat regions), making the global minimum unique.
Theorem 4.7 (Coercivity)
Let f (x) be a continuous, coercive function, meaning f (x) → ∞ as ∥x∥ → ∞. Then f (x) has at least one global minimum.

Remark
Coercivity guarantees the existence of a global minimum even when the domain is
unbounded. It describes functions that ”grow large enough” at the boundaries.

Theorem 4.8 (Global Minimum via First Derivative)


Let f (x) be a differentiable function on R. Suppose f ′ (x∗ ) = 0 at some critical point
x∗ . Additionally, assume that:

1. f ′ (x) < 0 for all x < x∗ , and

2. f ′ (x) > 0 for all x > x∗ .

Then x∗ is the unique global minimum of f (x).

Remark
This theorem allows one to identify a global minimum without requiring the function to be convex. However, as stated it only holds on the one-dimensional Euclidean space R.

Example 4.4 (Application of Global Minimum via First Derivative)


Consider the function f (x) = x⁴ − 4x² + 3.

1. Find critical points:

f ′ (x) = 4x³ − 8x = 4x(x² − 2).

Critical points: x = 0, ±√2.

2. Analyze f ′ (x):
- For x < −√2, f ′ (x) < 0.
- For −√2 < x < 0, f ′ (x) > 0.
- For 0 < x < √2, f ′ (x) < 0.
- For x > √2, f ′ (x) > 0.

3. Apply the first-derivative analysis:
- At x = ±√2, f ′ (x) changes sign from negative to positive, indicating local minima.
- At x = 0, f ′ (x) changes sign from positive to negative, so x = 0 is a local maximum.

4. Evaluate f (x):

f (−√2) = f (√2) = −1,  f (0) = 3.

5. Conclusion: Since f (x) → ∞ as x → ±∞ and f (−√2) = f (√2) = −1 is the smallest critical value, the points x = ±√2 are global minima.
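The conclusion of Example 4.4 is easy to confirm numerically by evaluating f on a dense grid (a minimal NumPy sketch; the grid range and resolution are arbitrary choices):

import numpy as np

f = lambda x: x**4 - 4*x**2 + 3

xs = np.linspace(-3, 3, 600_001)
vals = f(xs)

print(xs[np.argmin(vals)], vals.min())             # one minimizer ≈ -1.4142 (= -sqrt(2)), minimum ≈ -1
print(f(np.sqrt(2.0)), f(-np.sqrt(2.0)), f(0.0))   # -1, -1, 3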

Summary Table: Convexity and Global Minimum

Type                                      | Unique global minimum? | Key feature
Convex, with a critical point             | No                     | ∇f (x∗ ) = 0 =⇒ global minimum
Convex and coercive                       | No                     | f (x) → ∞ as ∥x∥ → ∞ guarantees existence
Strictly convex, with a critical point    | Yes                    | ∇f (x∗ ) = 0 =⇒ unique global minimum
First-derivative condition (Theorem 4.8)  | Yes                    | f ′ (x) < 0 for x < x∗ and f ′ (x) > 0 for x > x∗
