
Basic Calculation in Statistics

쁘띠유

Department of Statistics, SNU

2025 01 02

To study Mathematical Statistics effectively, it is essential to have a solid understanding of basic mathematical concepts, such as integration, differentiation, and related calculations. However, many students overlook the importance of these foundational skills. This lecture note is designed to provide you with practical knowledge and exercises to strengthen your basic calculation skills, which are crucial for mastering mathematical statistics.

Structure of the Lecture


The lecture consists of a preliminary section followed by four parts:

1. Indicator Function in Statistics

2. Product Notation (Π)

3. Integration and Derivatives of Vectors

4. Taylor Series and Optimization

Preliminary: Mathematical Notation

This section introduces the fundamental mathematical notations and their meanings.
These notations will be used throughout the lecture notes to ensure clarity and
precision in presenting mathematical ideas.

Symbols and Their Definitions

Defining Symbol

Caution: these symbols must be distinguished from the ordinary equality sign =.

• :=: Denotes definition using a colon and an equals sign. For example, a := b means that a is defined to be equal to b.

• ≝ (an equals sign with "def" written above it): Denotes definition explicitly. For example, in the context of a theorem, f (x) ≝ x² means that f (x) is defined as x².

• ≡: Denotes equivalence or definition using three parallel bars. For example,


a ≡ b can mean that a is defined as b or that a is equivalent to b in a given
context.

Commonality: :=, ≝, and ≡ can all be used to express definitions in mathematics.
Difference:

• :=: Most commonly used to introduce new symbols or concepts. For example, g(x) := sin(x) + cos(x).

• ≝: Explicitly highlights that the relation is a formal definition, such as in f (x) ≝ x².

• ≡: Can also be used in contexts beyond definitions, such as logical equivalence (P ≡ Q) or modular arithmetic (a ≡ b (mod n)).

Logical Statements

Quantifiers

• ∀: Universal quantifier.

– Example 1: ∀x ∈ {1, 2, 3}, x > 0 means that all elements in the set
{1, 2, 3} are greater than 0.

– Example 2: ∀y ∈ {Alice, Bob, Carol}, y is a student means that everyone


in the set {Alice, Bob, Carol} is a student.

– Example 3: ∀x ∈ {a, b, c}, x ̸= d means that none of the elements in the


set {a, b, c} are d.

• ∃: Existential quantifier.

– Example 1: ∃x ∈ R, x = 2 means there exists a real number x such that


x = 2.

– Example 2: If 쁘띠유 is a graduate student in the classroom, then ∃x ∈ Classroom, x is a graduate student, meaning there exists at least one graduate student in the classroom.

– Example 3: ∃x ∈ {1, 2, 3, 4}, x > 3 means there exists at least one element
in the set {1, 2, 3, 4} that is greater than 3.

Logical Connectives

• ∧: Logical AND. For example, P ∧ Q means that both P and Q are true; e.g.,
2 > 1 ∧ 3 > 2.

• ∨: Logical OR. For example, P ∨ Q means that either P or Q (or both) are
true; e.g., 2 > 3 ∨ 4 > 3.

• ¬: Logical NOT. For example, ¬P means that P is not true; e.g., if P is 2 > 3,
then ¬P is 2 ≤ 3.

• ⇐⇒ : Represents a bidirectional implication. For example, P ⇐⇒ Q means
that P is true if and only if Q is true.

• =⇒ : Denotes implication. For example, P =⇒ Q means that if P is true,


then Q must also be true.

Application of Logical Symbols

Below, I provide some theorems about mathematical logic. You do not have to understand them all in depth; however, you should remember all of the theorems and examples, especially Example 0.5 (Negation of Complete Statistics).

Theorem 0.1 (Negation of a Universal Quantifier (∀))


The statement:
∀x ∈ R, P (x) (For all x, P (x) is true.)

Its negation is:


¬ (∀x ∈ R, P (x)) ≡ ∃x ∈ R, ¬P (x)

Example 0.1
Consider the statement: ”All real numbers are positive.”

∀x ∈ R, x > 0

Its negation is: ”There exists a real number that is not positive.”

∃x ∈ R, x ≤ 0

Theorem 0.2 (Negation of an Existential Quantifier (∃))


The statement:

∃x ∈ R, P (x) (There exists an x such that P (x) is true.)

Its negation is:
¬ (∃x ∈ R, P (x)) ≡ ∀x ∈ R, ¬P (x)
Example 0.2
Consider the statement: ”There exists a real number greater than 10.”

∃x ∈ R, x > 10

Its negation is: ”For all real numbers, none is greater than 10.”

∀x ∈ R, x ≤ 10

Theorem 0.3 (Negation of a Conditional (P =⇒ Q))


The statement:
P =⇒ Q (If P , then Q.)

Its negation is:


¬(P =⇒ Q) ≡ P ∧ ¬Q
Example 0.3
Consider the statement: ”If it rains, the ground is wet.”

Rain =⇒ Wet ground

Its negation is: ”It rains, but the ground is not wet.”

Rain ∧ ¬(Wet ground)

Theorem 0.4 (Combined: Universal Quantifier and Conditional)


The statement:
∀x ∈ R, P (x) =⇒ Q(x)

Its negation is:

¬ (∀x ∈ R, P (x) =⇒ Q(x)) ≡ ∃x ∈ R, P (x) ∧ ¬Q(x)

Example 0.4
Consider the statement: ”For all real numbers x, if x > 0, then x2 > 0.”

∀x ∈ R, (x > 0) =⇒ (x2 > 0)

Its negation is: ”There exists a real number x such that x > 0 and x2 ≤ 0.”

∃x ∈ R, (x > 0) ∧ (x2 ≤ 0)
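These negation rules can be checked mechanically on a finite set of test points. The following minimal Python sketch (the sample set and the predicates P, Q are illustrative choices, not part of the original notes) verifies that ¬(∀x, P(x) ⟹ Q(x)) holds exactly when some x satisfies P(x) ∧ ¬Q(x):

# Check the negation rule of Theorem 0.4 on a finite sample of real numbers.
xs = [-2.0, -0.5, 0.0, 0.5, 3.0]

P = lambda x: x > 0          # hypothesis P(x)
Q = lambda x: x ** 2 > 0     # conclusion Q(x)

forall_stmt = all((not P(x)) or Q(x) for x in xs)            # ∀x, P(x) ⟹ Q(x)
exists_counterexample = any(P(x) and not Q(x) for x in xs)   # ∃x, P(x) ∧ ¬Q(x)

# The negation of the universal statement is equivalent to the existential one.
assert (not forall_stmt) == exists_counterexample
print(forall_stmt, exists_counterexample)   # True False for this sample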

Example 0.5 (Negation of Complete Statistics)


A statistic T (X) is complete for a family of distributions {f (x; θ) : θ ∈ Θ} if for
every function g, the condition:

E[g(T (X))] = 0 for all θ ∈ Θ

implies g(T (x)) = 0 .


Its negation is:

∃g :function, E[g(T (X))] = 0 for all θ ∈ Θ, but g(T (x)) ̸= 0.

Special Symbols

Numbers and Set Notations

• N: The set of natural numbers {1, 2, 3, . . . }.

• Z: The set of integers {. . . , −2, −1, 0, 1, 2, . . . }.

• Q: The set of rational numbers. For example, 1/2 ∈ Q.

• R: The set of real numbers. For example, 2 ∈ R.

• R+ : Represents the set of positive real numbers. For example, R+ = {x ∈ R |


x > 0}.

• R− : Represents the set of negative real numbers. For example, R− = {x ∈ R |
x < 0}.

• C: The set of complex numbers. For example, 2 + 3i ∈ C.

• Intervals:

– Open Interval: An open interval (a, b) is the set of all real numbers x
such that a < x < b. The endpoints a and b are not included in the
interval.
(a, b) = {x ∈ R | a < x < b}.

Example: (1, 3) includes all real numbers x such that 1 < x < 3.

– Closed Interval: A closed interval [a, b] is the set of all real numbers x
such that a ≤ x ≤ b. The endpoints a and b are included in the interval.

[a, b] = {x ∈ R | a ≤ x ≤ b}.

Example: [1, 3] includes all real numbers x such that 1 ≤ x ≤ 3.

– Half-Open Interval:

∗ Left-Open Interval: A half-open interval (a, b] includes all real


numbers x such that a < x ≤ b. The left endpoint a is not included,
but the right endpoint b is included.

(a, b] = {x ∈ R | a < x ≤ b}.

Example: (1, 3] includes all real numbers x such that 1 < x ≤ 3.

∗ Right-Open Interval: A half-open interval [a, b) includes all real


numbers x such that a ≤ x < b. The left endpoint a is included, but
the right endpoint b is not included.

[a, b) = {x ∈ R | a ≤ x < b}.

Example: [1, 3) includes all real numbers x such that 1 ≤ x < 3.

• ×: Denotes the Cartesian product of two sets.

– {1, 2}×{3, 4}: The Cartesian product of the sets {1, 2} and {3, 4} consists
of all ordered pairs (x, y), where x ∈ {1, 2} and y ∈ {3, 4}:

{1, 2} × {3, 4} = {(1, 3), (1, 4), (2, 3), (2, 4)}.

[Figure: the four points (1, 3), (1, 4), (2, 3), (2, 4) of {1, 2} × {3, 4} plotted in the plane.]

– [1, 2] × [3, 4]: The Cartesian product of the intervals [1, 2] and [3, 4]
consists of all ordered pairs (x, y), where x ∈ [1, 2] and y ∈ [3, 4]:

[1, 2] × [3, 4] = {(x, y) | x ∈ [1, 2], y ∈ [3, 4]}.

[Figure: the rectangle [1, 2] × [3, 4] shown as a shaded region in the plane.]

• min and max:

– min: Denotes the minimum of a set or sequence. For example, min(3, 1, 4) =


1, meaning that the smallest value in the set {3, 1, 4} is 1. For ordered
data, the minimum value is often denoted as X(1) .

– max: Denotes the maximum of a set or sequence. For example, max(3, 1, 4) =
4, meaning that the largest value in the set {3, 1, 4} is 4. For ordered data,
the maximum value is often denoted as X(n) , where n is the total number
of elements in the set.

• e: Represents the base of the natural logarithm. For example, e ≈ 2.71828.

• log: Represents the natural logarithm. For example, log(e) = 1.

Limits and Directional Notations

• Left-hand Limit (limx→a− f (x)) and Equivalent Notations:

– limx→a− f (x)( f (a− )): The left-hand limit, where x approaches a from
values smaller than a (x < a).

– limx↑a f (x)(limx↗a f (x)): The left-hand limit with the added condition
that x approaches a in a monotonically increasing manner.

• Right-hand Limit (limx→a+ f (x)) and Equivalent Notations:

– limx→a+ f (x)(f (a+ )): The right-hand limit, where x approaches a from
values greater than a (x > a).

– limx↓a f (x)(limx↘a f (x)): The right-hand limit with the added condition
that x approaches a in a monotonically decreasing manner.

• Example Function:

f (x) =
  x², if x < 2,
  5,  if x ≥ 2.

– limx→2− f (x) = f (2− ) = 4.

– limx→2+ f (x) = f (2+ ) = 5.

Remark
– f (2− ) can also be written with the increasing-arrow notation limx↗2 f (x), since x < 2 and x increases towards 2.

– f (2+ ), by contrast, cannot be written with the increasing-arrow notation limx↗2 f (x): x > 2 is already implied in the definition of f (2+ ), so x can only approach 2 by decreasing, and the appropriate notation is limx↘2 f (x).

[Figure: Visualization of the function f (x), with f (x) = x² for x < 2 and f (x) = 5 for x ≥ 2, showing the distinct left-hand limit f (2− ) = 4 and right-hand limit f (2+ ) = 5 at x = 2.]

Point Set Representations and Similar Theorems

• Singleton Representation:


{x} = ⋂_{n=1}^∞ (x − 1/n, x].

This representation shows that the singleton set containing x can be expressed as the infinite intersection of the nested intervals (x − 1/n, x], which shrink to the single point x.

• Closed Interval as Infinite Intersection:

[a, b] = ⋂_{n=1}^∞ (a − 1/n, b + 1/n).

This representation shows that a closed interval [a, b] can be expressed as the intersection of a sequence of open intervals that shrink to [a, b].

• Open Interval as Infinite Union:

(a, b) = ⋃_{n=1}^∞ [a + 1/n, b − 1/n].

The open interval (a, b) is expressed as the union of a sequence of closed intervals that expand to cover (a, b) as n → ∞.

• Single Point as Nested Closed Intervals:

{x} = ⋂_{n=1}^∞ [x − 1/n, x + 1/n].

This representation shows that the singleton set containing x can also be expressed as the intersection of nested closed intervals around x.

Statistical Symbols and Concepts

• ⊥( ⊥⊥): Denotes independence. For example, X ⊥ Y means that the random


variables X and Y are independent.

• ̸⊥: Denotes dependence (not independent). For example, X ̸⊥ Y means that


X and Y are not independent.

• ⊥⊥| Z: Denotes conditional independence. For example, X ⊥⊥ Y | Z means


that X and Y are independent given Z.

• i.i.d.: Independent and identically distributed. A set of random variables


X1 , X2 , . . . , Xn is said to be i.i.d. if they are independent of each other and follow the same probability distribution.

• ∼: In statistics, ∼ means ”is distributed as.” Examples include:

– X ∼ Normal(µ, σ 2 ): The random variable X is normally distributed with


mean µ and variance σ 2 .

– X ∼ F (x): The random variable X follows the cumulative distribution


function F (x).

– X ∼ f (x): The random variable X follows the probability density func-


tion f (x).

• Mutually (Pairwise) Disjoint: A collection of subsets {A1 , A2 , . . . , An } is


called mutually disjoint if:

Ai ∩ Aj = ∅ for all i ̸= j.

– Example: Let A1 = {1, 2}, A2 = {3, 4}, A3 = {5, 6}. These subsets are
mutually disjoint because:

A1 ∩ A2 = ∅, A2 ∩ A3 = ∅, A1 ∩ A3 = ∅.

– Visualization: [Figure: three disjoint boxes A1 = {1, 2}, A2 = {3, 4}, A3 = {5, 6}, with no overlap.]

• Partition: A collection of subsets {A1 , A2 , . . . , An } of a set S is called a partition of S if:

A1 ∪ A2 ∪ · · · ∪ An = S
(The subsets cover the entire set S) and

Ai ∩ Aj = ∅ for all i ̸= j

(The subsets are mutually disjoint).

– Example: Let S = {1, 2, 3, 4, 5, 6}. A partition of S is A1 = {1, 2}, A2 =


{3, 4}, A3 = {5, 6}, since:

A1 ∪ A2 ∪ A3 = S and Ai ∩ Aj = ∅ for i ̸= j.

– Visualization: [Figure: the set S drawn as a rectangle divided into the three blocks A1 , A2 , A3 .]

1 Indicator Function in Statistics
Definition 1.1 (Indicator Function)
The indicator function IA (x) of a set A is defined by



IA (x) =
  1, if x ∈ A,
  0, if x ∉ A.

Example 1.1 (Indicator for a Closed Interval)


Let A = [a, b]. Then


1
 if a ≤ x ≤ b,
I[a,b] (x) =

0
 otherwise.

Example 1.2 (Indicator for a PDF)


If X is uniform on [m, n], then

fX (x) = (1 / (n − m)) I[m,n] (x).

Example 1.3 (Piecewise Function via Indicator)


f (x) =
  x², if 0 < x < 3,
  0,  otherwise,
=⇒ f (x) = x² I(0,3) (x).

Example 1.4 (Another Piecewise Function)


f (x) =
  x²,      if 0 < x < 3,
  −x + 12, if 3 ≤ x < 12,
  0,       otherwise,
=⇒ f (x) = x² I(0,3) (x) + (−x + 12) I[3,12) (x).

Theorem 1.1 (Properties of the Indicator Function)


For sets A, B and random variable X:

1. IA∪B (x) = IA (x) + IB (x) − IA∩B (x).

2. IAc (x) = 1 − IA (x).

3. IA∩B (x) = IA (x) IB (x).

Example 1.5 (Indicator of Observations)


Let X1 , . . . , Xn be random variables. Then

∏_{i=1}^n I(Xi < a) = I(X1 < a, . . . , Xn < a) = I(max{X1 , . . . , Xn } < a),

∏_{i=1}^n I(Xi > a) = I(X1 > a, . . . , Xn > a) = I(min{X1 , . . . , Xn } > a).

Remark
The notation I(X1 < a, . . . , Xn < a) represents the indicator function of the intersection of events:

I(X1 < a, . . . , Xn < a) = I( ⋂_{i=1}^n {Xi < a} ).

Similarly, I(X1 > a, . . . , Xn > a) represents:

I(X1 > a, . . . , Xn > a) = I( ⋂_{i=1}^n {Xi > a} ).
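The identities of Example 1.5 are easy to confirm by simulation. A minimal NumPy sketch (the sample size, dimension, and threshold a are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))   # 10,000 samples of (X1, ..., X5)
a = 0.5

# Product of indicators versus indicator of the maximum / minimum.
prod_lt = np.prod(X < a, axis=1)            # prod_i I(Xi < a)
max_lt = (X.max(axis=1) < a).astype(int)    # I(max_i Xi < a)

prod_gt = np.prod(X > a, axis=1)            # prod_i I(Xi > a)
min_gt = (X.min(axis=1) > a).astype(int)    # I(min_i Xi > a)

assert np.array_equal(prod_lt, max_lt)
assert np.array_equal(prod_gt, min_gt)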

Theorem 1.2 (Derivative with an Indicator)


If f is differentiable and IA is the indicator of A, then for x ∉ ∂A,

(d/dx) [ f (x) IA (x) ] = f ′ (x) IA (x).

Remark (Boundary Effects)


At x ∈ ∂A, the derivative may involve terms like δ∂A (x).

Theorem 1.3 (Integral with an Indicator)
If f is integrable, then

∫_C f (x) IA (x) dx = ∫_{C ∩ A} f (x) dx.

Example 1.6 (Integral Over Intervals)


For any integrable function f (x),

∫_a^b f (x) I(c,d) (x) dx =
  ∫_{max(a,c)}^{min(b,d)} f (x) dx, if the intervals (a, b) and (c, d) overlap,
  0, otherwise.

Theorem 1.4 (Bernoulli Distribution of an Indicator)


Let X be a random variable and A an event. The indicator variable IA (X) is defined
as:

IA (X) =
  1, if X ∈ A,
  0, otherwise.

IA (X) follows a Bernoulli distribution with parameter p = P (X ∈ A). Its expectation


is:
E[IA (X)] = p.

Example 1.7
Consider a coin flip experiment where N represents the number of heads in one
repetition, and A denotes the event that the coin lands heads. N is defined as:


N =
  1, if the coin lands heads,
  0, if the coin lands tails.

If the probability of heads is p = 0.5, then N follows a Bernoulli distribution with


parameter p = 0.5, and its expectation is:

E[N ] = 0.5.

Theorem 1.5 (Counting Elements Having Property P via Indicators)
Let X1 , X2 , . . . , Xn be random variables taking values in some set X , and let P ⊆ X
represent a certain property (or subset) of interest.
Define

S = ∑_{i=1}^n IP (Xi ),

where

IP (Xi ) =
  1, if Xi ∈ P,
  0, otherwise.

Then S is a random variable that counts how many of the Xi ’s belong to P . In particular,

E[S] = E[ ∑_{i=1}^n IP (Xi ) ] = ∑_{i=1}^n E[IP (Xi )] = ∑_{i=1}^n P (Xi ∈ P ).

Example 1.8
Suppose we have n coin flips, denoted by X1 , X2 , . . . , Xn . Let the property P be the
event {coin lands heads}. Define

S = ∑_{i=1}^n I{heads} (Xi ).

Then S is the total number of heads in n coin flips. If each coin flip has probability p of landing heads (independently), then

E[S] = ∑_{i=1}^n E[I{heads} (Xi )] = ∑_{i=1}^n P (heads) = ∑_{i=1}^n p = np.

Thus, S follows a Binomial distribution with parameters (n, p), and its expectation
is np.
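A quick simulation illustrates Theorem 1.5 and Example 1.8: summing indicators counts the heads, and the average count is close to np. This is a minimal sketch with arbitrarily chosen n and p, assuming NumPy is available:

import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 20, 0.3, 100_000

flips = rng.random((reps, n)) < p   # I{heads}(X_i) for each flip
S = flips.sum(axis=1)               # S = sum_i I{heads}(X_i), one count per repetition

print(S.mean(), n * p)              # empirical mean of S vs. theoretical E[S] = np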

2 Product Notation (Π)
Definition 2.1 (Product Notation)
For a1 , . . . , an ,
∏_{i=1}^n ai = a1 a2 · · · an .

If n = 0, the product is 1 (by convention).


Theorem 2.1 (Properties of ∏)

1. ∏_{i=1}^n c = c^n .

2. ∏_{i=1}^n b^{e_i} = b^{∑_{i=1}^n e_i} .

3. ∏_{i=1}^n (ai bi ) = ( ∏_{i=1}^n ai ) ( ∏_{i=1}^n bi ).

Example 2.1 (Product of Powers)


∏_{i=1}^3 2^i = 2^{1+2+3} = 2^6 = 64.

Example 2.2 (Product of Constants)

∏_{i=1}^3 2 = 2^3 = 8.

Example 2.3 (Likelihood with Normal Density)


Let X1 , . . . , Xn be i.i.d. N (µ, σ²). Then

f (x) = (1 / √(2π σ²)) exp( −(x − µ)² / (2σ²) ),

and the likelihood is the product of the densities:

L(µ, σ²) = ∏_{i=1}^n f (Xi ) = (1 / √(2π σ²))^n exp( −∑_{i=1}^n (Xi − µ)² / (2σ²) ).
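In practice the product in L(µ, σ²) is usually evaluated on the log scale to avoid numerical underflow for large n. The following sketch (simulated data and assumed parameter values, not from the original notes) compares the direct product of densities with the equivalent sum of log-densities, assuming SciPy is available:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=2.0, size=50)   # i.i.d. sample

mu, sigma2 = 1.0, 4.0
L = np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))        # prod_i f(X_i)
logL = np.sum(norm.logpdf(x, loc=mu, scale=np.sqrt(sigma2)))   # sum_i log f(X_i)

print(np.log(L), logL)   # the two agree up to floating-point error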

3 Integration and Derivatives of Vectors

3.1 Integration: Fubini’s Theorem


Definition 3.1 (Double Integral)
For a function f (x, y) integrable on [a, b] × [c, d],

∫_a^b ∫_c^d f (x, y) dy dx = ∫_c^d ∫_a^b f (x, y) dx dy.

Example 3.1 (Double Integral Computation)


Let f (x, y) = x + y on [0, 1] × [0, 2]. Then

∫_0^1 ∫_0^2 (x + y) dy dx = ∫_0^1 [ xy + y²/2 ]_{y=0}^{y=2} dx = ∫_0^1 (2x + 2) dx = 3.

Remark
By Fubini's theorem, we can often simplify a double integral by changing the order of integration in x and y.
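The order-of-integration claim can be verified symbolically. A minimal SymPy sketch using the function from Example 3.1:

import sympy as sp

x, y = sp.symbols('x y')
f = x + y

# Integrate in both orders over [0, 1] x [0, 2]; Fubini's theorem says they agree.
y_first = sp.integrate(sp.integrate(f, (y, 0, 2)), (x, 0, 1))
x_first = sp.integrate(sp.integrate(f, (x, 0, 1)), (y, 0, 2))

print(y_first, x_first)   # both equal 3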

Example 3.2 (Joint Distribution)


Let the joint density function be:

fX,Y (x, y) = λ² x e^{−λx(1+y)} ,  x ≥ 0, y ≥ 0, λ > 0.

Verify that:

∫_0^∞ ∫_0^∞ fX,Y (x, y) dy dx = 1.

Proof) Substitute fX,Y (x, y):

∫_0^∞ ∫_0^∞ λ² x e^{−λx(1+y)} dy dx.

Factor e^{−λx}, which depends only on x, out of the inner integral:

∫_0^∞ ∫_0^∞ λ² x e^{−λx(1+y)} dy dx = ∫_0^∞ λ² x e^{−λx} ( ∫_0^∞ e^{−λxy} dy ) dx.

The inner integral is:

∫_0^∞ e^{−λxy} dy.

Perform a substitution: let u = λxy, so du = λx dy, or equivalently dy = du/(λx). Update the limits:

• When y = 0, u = 0,

• When y → ∞, u → ∞.

The inner integral becomes:

∫_0^∞ e^{−λxy} dy = ∫_0^∞ e^{−u} (1/(λx)) du = (1/(λx)) ∫_0^∞ e^{−u} du.

The integral of e^{−u} over [0, ∞) is 1, so:

∫_0^∞ e^{−λxy} dy = 1/(λx).

Substitute this back into the double integral:

∫_0^∞ ∫_0^∞ λ² x e^{−λx(1+y)} dy dx = ∫_0^∞ λ² x e^{−λx} · (1/(λx)) dx.

Simplify:

∫_0^∞ ∫_0^∞ fX,Y (x, y) dy dx = ∫_0^∞ λ e^{−λx} dx.

Now compute the outer integral:

∫_0^∞ λ e^{−λx} dx = [ −e^{−λx} ]_0^∞ = 1.

Thus, the double integral equals 1, and fX,Y (x, y) is a valid joint probability density function.
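The same normalization can be checked numerically. A minimal sketch using SciPy's dblquad (λ = 2 is an arbitrary choice; dblquad expects the integrand as a function of the inner variable y first):

import numpy as np
from scipy.integrate import dblquad

lam = 2.0
f = lambda y, x: lam**2 * x * np.exp(-lam * x * (1.0 + y))   # f(y, x) for dblquad

# Outer integral over x in [0, inf), inner integral over y in [0, inf).
total, err = dblquad(f, 0, np.inf, lambda x: 0, lambda x: np.inf)
print(total)   # ≈ 1.0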

3.2 Derivatives of Vectors
Definition 3.2 (Vectors)
A vector x in Rn is an ordered n-tuple of real numbers:

x = (x1 , x2 , . . . , xn ).

We often write it in column form:

 
x = (x1 , x2 , . . . , xn )^T , with the entries x1 , . . . , xn stacked vertically as a column.

Remark
In many contexts, especially in calculus and linear algebra, an n-dimensional vector
is conventionally treated as a column vector (i.e., written vertically).

Definition 3.3 (Gradient and Jacobian)


1. Gradient: Let g : R^n → R. Its gradient is the column vector of partial derivatives:

∇g(x) = ( ∂g/∂x1 , ∂g/∂x2 , . . . , ∂g/∂xn )^T .

2. Jacobian Matrix: For a function y : R^n → R^m with components y = (y1 , . . . , ym ), its Jacobian matrix is the m × n matrix of partial derivatives:

Jy (x) = dy/dx =
  [ ∂y1/∂x1   · · ·   ∂y1/∂xn ]
  [    ⋮        ⋱        ⋮    ]
  [ ∂ym/∂x1   · · ·   ∂ym/∂xn ]

Remark
The term “Jacobian” may refer either to the Jacobian matrix or the Jacobian de-
terminant. In this text, unless otherwise specified, we use “Jacobian” to mean the
Jacobian matrix.
Definition 3.4 (Hessian Matrix)
Let g : Rn → R be twice differentiable. Then the Hessian matrix of g, denoted by
∇2 g(x) or Hg (x), is defined as the matrix of all second-order partial derivatives:

∇²g(x) =
  [ ∂²g/∂x1²      ∂²g/∂x1∂x2    · · ·   ∂²g/∂x1∂xn ]
  [ ∂²g/∂x2∂x1    ∂²g/∂x2²      · · ·   ∂²g/∂x2∂xn ]
  [     ⋮              ⋮          ⋱          ⋮     ]
  [ ∂²g/∂xn∂x1    ∂²g/∂xn∂x2    · · ·   ∂²g/∂xn²   ]

Remark
Whereas the gradient is an n-dimensional column vector, the Hessian is an n × n
matrix. It captures how the gradient itself changes with respect to x, thus providing
curvature information of the function.

Example 3.3 (Elementary Jacobian Example)



Let x = (x1 , x2 )^T and define y(x) = ( x1² + x2 , sin x2 ). Then

y1 = x1² + x2 ,   y2 = sin(x2 ).

Hence

∂y/∂x =
  [ ∂(x1² + x2)/∂x1   ∂(x1² + x2)/∂x2 ]   =   [ 2x1   1      ]
  [ ∂(sin x2)/∂x1     ∂(sin x2)/∂x2   ]       [ 0     cos x2 ]

Example 3.4 (Statistical Jacobian: Transforming Parameters)


Consider a normal distribution with parameters (µ, σ), and let θ1 = µ, θ2 = log σ. Then σ = e^{θ2}. The Jacobian of (µ, σ) with respect to (θ1 , θ2 ) is

  [ ∂µ/∂θ1   ∂µ/∂θ2 ]   =   [ 1   0       ]
  [ ∂σ/∂θ1   ∂σ/∂θ2 ]       [ 0   e^{θ2}  ]

Example 3.5 (2D Hessian Example)


Consider the function g : R² → R given by

g(x1 , x2 ) = x1² + 3 x1 x2 + 2 x2².

• Gradient (1st-order partials):

∇g(x1 , x2 ) = ( ∂g/∂x1 , ∂g/∂x2 )^T = ( 2x1 + 3x2 , 3x1 + 4x2 )^T .

• Hessian (2nd-order partials):

∇²g(x1 , x2 ) =
  [ ∂²g/∂x1²     ∂²g/∂x1∂x2 ]   =   [ 2   3 ]
  [ ∂²g/∂x2∂x1   ∂²g/∂x2²   ]       [ 3   4 ]

Notice the Hessian is a constant 2 × 2 matrix in this example. To determine whether


a critical point is a local minimum, maximum, or saddle, one often checks whether
the Hessian is positive definite, negative definite, or indefinite at that point.
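These derivatives can be reproduced symbolically. A minimal SymPy sketch for the function in Example 3.5 (the eigenvalue check at the end is one way to carry out the definiteness test mentioned above):

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
g = x1**2 + 3*x1*x2 + 2*x2**2

grad = [sp.diff(g, v) for v in (x1, x2)]   # gradient components
H = sp.hessian(g, (x1, x2))                # Hessian matrix

print(grad)            # [2*x1 + 3*x2, 3*x1 + 4*x2]
print(H)               # Matrix([[2, 3], [3, 4]])
# Eigenvalues are 3 ± sqrt(10); one is negative, so this Hessian is indefinite
# and the critical point at the origin is a saddle point.
print(H.eigenvals())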

Example 3.6 (MLE Gradient)


Consider i.i.d. X1 , . . . , Xn ∼ N (µ, σ²). Let θ = (µ, σ²). The log-likelihood is

ℓ(θ) = ℓ(µ, σ²) = −(n/2) ln(2π) − (n/2) ln(σ²) − (1/(2σ²)) ∑_{i=1}^n (Xi − µ)².

We use the gradient ∇ℓ(θ) = ( ∂ℓ/∂µ , ∂ℓ/∂σ² ). Then

∂ℓ/∂µ = (1/σ²) ∑_{i=1}^n (Xi − µ),   ∂ℓ/∂σ² = −n/(2σ²) + (1/(2(σ²)²)) ∑_{i=1}^n (Xi − µ)².

Setting these derivatives to zero,

∂ℓ/∂µ = 0 =⇒ ∑_{i=1}^n (Xi − µ) = 0 =⇒ µ = (1/n) ∑_{i=1}^n Xi ,

∂ℓ/∂σ² = 0 =⇒ −n/(2σ²) + (1/(2(σ²)²)) ∑_{i=1}^n (Xi − µ)² = 0 =⇒ σ² = (1/n) ∑_{i=1}^n (Xi − µ)².

Thus the MLEs are µ̂ = (1/n) ∑_{i=1}^n Xi and σ̂² = (1/n) ∑_{i=1}^n (Xi − µ̂)², as derived above.
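The closed-form MLE can be checked against a numerical optimizer. A minimal sketch on simulated data (the true values µ = 1, σ² = 4 and the sample size are arbitrary choices), assuming SciPy is available:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
X = rng.normal(loc=1.0, scale=2.0, size=500)

def neg_loglik(theta):
    mu, sigma2 = theta
    if sigma2 <= 0:
        return np.inf   # keep the optimizer inside the valid parameter region
    return 0.5 * len(X) * np.log(2 * np.pi * sigma2) + np.sum((X - mu) ** 2) / (2 * sigma2)

res = minimize(neg_loglik, x0=[0.0, 1.0], method="Nelder-Mead")

print(res.x)                                    # numerical MLE of (mu, sigma^2)
print(X.mean(), ((X - X.mean()) ** 2).mean())   # closed-form mu_hat, sigma^2_hat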

4 Taylor Series and Optimization

4.1 Taylor Series and Maclaurin Series


Theorem 4.1 (Taylor Series)
Let f (x) be a function that is infinitely differentiable at a point x0 .
I. Original Series:


f (x) = ∑_{n=0}^∞ [ f^(n)(x0) / n! ] (x − x0)^n ,

where f^(n)(x0) denotes the n-th derivative of f (x) evaluated at x = x0 .

II. Truncated Series with Remainder: If the series is truncated at the N-th term, the function can be exactly expressed as:

f (x) = ∑_{n=0}^N [ f^(n)(x0) / n! ] (x − x0)^n + RN (x),

where the remainder term RN (x) is:

RN (x) = [ f^(N+1)(x*) / (N + 1)! ] (x − x0)^{N+1} ,

for some x∗ in the interval between x0 and x.

Theorem 4.2 (Maclaurin Series)


The Maclaurin series is a special case of the Taylor series where x0 = 0.
I. Original Series:

f (x) = ∑_{n=0}^∞ [ f^(n)(0) / n! ] x^n .

II. Truncated Series with Remainder: If the series is truncated at the N-th term, the function can be exactly expressed as:

f (x) = ∑_{n=0}^N [ f^(n)(0) / n! ] x^n + RN (x),

where the remainder term RN (x) is:

RN (x) = [ f^(N+1)(x*) / (N + 1)! ] x^{N+1} ,

for some x∗ in the interval between 0 and x.


Remark (Comparison of Taylor and Maclaurin Series)
The Taylor series provides a local approximation of a function around any point x0 ,
while the Maclaurin series is a special case that approximates around x0 = 0. Both
series can be expressed as a truncated sum with a remainder term RN (x), which
quantifies the error in the approximation.
Corollary 1 (Examples of Maclaurin Series)
The following are the Maclaurin series expansions for common functions:

1. e^x :

e^x = ∑_{n=0}^∞ x^n / n! = 1 + x + x²/2! + x³/3! + · · ·

2. 1/(1 − x) (for |x| < 1):

1/(1 − x) = ∑_{n=0}^∞ x^n = 1 + x + x² + x³ + · · ·

3. 1/(1 + x) (for |x| < 1):

1/(1 + x) = ∑_{n=0}^∞ (−1)^n x^n = 1 − x + x² − x³ + · · ·

4. −ln(1 − x) (for |x| < 1):

−ln(1 − x) = ∑_{n=1}^∞ x^n / n = x + x²/2 + x³/3 + · · ·

5. ln(1 + x) (for |x| < 1):

ln(1 + x) = ∑_{n=1}^∞ (−1)^{n+1} x^n / n = x − x²/2 + x³/3 − · · ·

6. sin x:

sin x = ∑_{n=0}^∞ (−1)^n x^{2n+1} / (2n + 1)! = x − x³/3! + x⁵/5! − · · ·

7. cos x:

cos x = ∑_{n=0}^∞ (−1)^n x^{2n} / (2n)! = 1 − x²/2! + x⁴/4! − · · ·
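These expansions are easy to verify numerically by comparing a truncated sum with the exact function value. A minimal sketch for the sin x series (the truncation order N = 5 is an arbitrary choice):

import math

def sin_maclaurin(x, N=5):
    # Truncated series: sum_{n=0}^{N} (-1)^n x^(2n+1) / (2n+1)!
    return sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(N + 1))

for x in (0.1, 0.5, 1.0):
    print(x, sin_maclaurin(x), math.sin(x))   # truncated series vs. math.sin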

Remark (Moments)
The k-th moment of a random variable X is defined as:

µk = E[X k ],

where k = 1, 2, 3, . . .. The first moment is the mean (µ1 = E[X]), and the second
central moment is the variance (E[(X − E[X])2 ]).

Example 4.1 (Approximating an Integral Using Maclaurin Series)


Evaluate the integral:

I = ∫_0^{0.5} e^{−x²} dx

using the Maclaurin series for e^{−x²}.

Step 1: Write the Maclaurin series for e^x. The Maclaurin series for e^x is:

e^x = ∑_{n=0}^∞ x^n / n! .

Substitute −x² for x:

e^{−x²} = ∑_{n=0}^∞ (−x²)^n / n! = ∑_{n=0}^∞ (−1)^n x^{2n} / n! .

Step 2: Approximate the integral by truncating the series. Truncate the series to the first three terms:

e^{−x²} ≈ 1 − x² + x⁴/2.

Step 3: Integrate term by term.

I ≈ ∫_0^{0.5} ( 1 − x² + x⁴/2 ) dx.

Compute each term:

∫_0^{0.5} 1 dx = [x]_0^{0.5} = 0.5,

∫_0^{0.5} x² dx = [x³/3]_0^{0.5} = (0.5)³/3 = 0.125/3 ≈ 0.04167,

∫_0^{0.5} x⁴/2 dx = [x⁵/10]_0^{0.5} = (0.5)⁵/10 = 0.03125/10 = 0.003125.

Add these results:

I ≈ 0.5 − 0.04167 + 0.003125 = 0.461455.

Step 4: Compare with the exact value. Using numerical integration, the exact value of ∫_0^{0.5} e^{−x²} dx is approximately 0.46128, showing that the Maclaurin approximation is highly accurate.
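The computation in Example 4.1 can be reproduced numerically: integrate the three-term truncation in closed form and compare it with an adaptive quadrature of the exact integrand. A minimal sketch assuming SciPy is available:

import numpy as np
from scipy.integrate import quad

b = 0.5

# Term-by-term integral of the truncation 1 - x^2 + x^4/2 over [0, b].
series_approx = b - b**3 / 3 + b**5 / 10

exact, err = quad(lambda x: np.exp(-x**2), 0, b)

print(series_approx)   # ≈ 0.461458
print(exact)           # ≈ 0.461281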

Example 4.2 (Finding a Root Using Maclaurin Series)


Approximate a root of the equation:

sin x = 0.5

using the Maclaurin series for sin x.


Step 1: Write the Maclaurin series for sin x.

sin x = ∑_{n=0}^∞ (−1)^n x^{2n+1} / (2n + 1)! = x − x³/3! + x⁵/5! − · · ·

Step 2: Set sin x ≈ 0.5. Truncate the series to the first two terms:

sin x ≈ x − x³/6.

Solve:

x − x³/6 = 0.5,  i.e.,  x = 0.5 + x³/6.

Step 3: Solve iteratively. Let x0 = 0.5 (initial guess). Substitute into the equation:

x1 = 0.5 + (0.5)³/6 = 0.52083.

Repeat:

x2 = 0.5 + (0.52083)³/6 ≈ 0.52355.

After two iterations, the root is approximately:

x ≈ 0.52355.

Step 4: Compare with the exact value. The exact value of the root is x = arcsin(0.5) = π/6 ≈ 0.523598, showing that the Maclaurin approximation is highly accurate.
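The iteration of Example 4.2 is a fixed-point scheme x ← 0.5 + x³/6 for the truncated equation. A minimal Python sketch that runs a few iterations and compares the result with the exact root π/6:

import math

x = 0.5                      # initial guess x0
for k in range(1, 4):
    x = 0.5 + x**3 / 6       # fixed-point update for the truncated equation x - x^3/6 = 0.5
    print(k, x)

# The iteration converges to the root of the *truncated* equation,
# which is close to (but not exactly equal to) the true root of sin x = 0.5.
print(math.asin(0.5), math.pi / 6)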

Example 4.3 (Relation Between MGF and Moments)


The k-th moment of a random variable X can be obtained by differentiating its MGF MX (t) k times and evaluating the result at t = 0:

µk = d^k/dt^k MX (t) |_{t=0} .

Proof Using Maclaurin Series (Corollary 1):

Let MX (t) = E[e^{tX}]. Expanding e^{tX} using its Maclaurin series:

e^{tX} = ∑_{n=0}^∞ (tX)^n / n! .

Taking the expectation:

MX (t) = E[ e^{tX} ] = E[ ∑_{n=0}^∞ (tX)^n / n! ].

By linearity of expectation:

MX (t) = ∑_{n=0}^∞ (t^n / n!) E[X^n].

Thus, the MGF can be expressed as:

MX (t) = ∑_{n=0}^∞ µn t^n / n! ,

where µn = E[X^n] is the n-th moment of X.

To find the k-th moment:

µk = E[X^k] = d^k/dt^k MX (t) |_{t=0} .

Differentiating MX (t) k times:

d^k/dt^k MX (t) = d^k/dt^k ∑_{n=0}^∞ µn t^n / n! .

By the properties of differentiation:

d^k/dt^k ∑_{n=0}^∞ µn t^n / n! = ∑_{n=k}^∞ (µn / n!) · n(n − 1) · · · (n − k + 1) · t^{n−k} .

At t = 0, all terms vanish except for n = k, leaving:

d^k/dt^k MX (t) |_{t=0} = (µk / k!) · k! = µk .

Conclusion: The k-th moment µk is obtained by differentiating the MGF MX (t) k times and setting t = 0:

µk = d^k/dt^k MX (t) |_{t=0} .
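The moment formula can be checked symbolically for a concrete distribution. A minimal SymPy sketch using the Exponential(1) distribution, whose MGF is 1/(1 − t) for t < 1 and whose k-th moment is k!:

import sympy as sp

t = sp.symbols('t')
M = 1 / (1 - t)   # MGF of an Exponential(1) random variable (valid for t < 1)

for k in range(1, 5):
    mu_k = sp.diff(M, t, k).subs(t, 0)   # k-th derivative of the MGF at t = 0
    print(k, mu_k, sp.factorial(k))      # matches E[X^k] = k! for Exponential(1)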

4.2 Optimization
Definition 4.1 (Critical Point)
A point x∗ in the domain of a differentiable function f (x) is called a critical point
if:
∇f (x∗ ) = 0.

Alternatively, x∗ is a critical point if f ′ (x∗ ) = 0 (in one dimension).


Remark
Critical points are candidates for local minima, local maxima, or saddle points. Fur-
ther analysis, such as the first or second derivative test, is required to classify them.
Definition 4.2 (Convex Function)
A function f (x) is called convex on a domain D ⊆ Rn if, for all x, y ∈ D and
λ ∈ [0, 1], it satisfies:

f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y).

Theorem 4.3 (Second Derivative Condition for Convexity)


Let f (x) be twice differentiable on D ⊆ Rn .

1. In one dimension (n = 1): f (x) is convex if and only if:

f ′′ (x) ≥ 0 for all x ∈ D.

2. In higher dimensions (n > 1): f (x) is convex if and only if the Hessian matrix
∇2 f (x) is positive semidefinite for all x ∈ D, i.e.:

z ⊤ ∇2 f (x)z ≥ 0 for all z ∈ Rn .

Definition 4.3 (Strictly Convex Function)


A function f (x) is called strictly convex on a domain D ⊆ Rn if, for all x, y ∈ D
with x ̸= y and λ ∈ (0, 1), it satisfies:

f (λx + (1 − λ)y) < λf (x) + (1 − λ)f (y).

Theorem 4.4 (Second Derivative Condition for Strict Convexity)
Let f (x) be twice differentiable on D ⊆ Rn .

1. In one dimension (n = 1): if f ′′ (x) > 0 for all x ∈ D, then f (x) is strictly convex. (The converse does not hold in general; for example, f (x) = x⁴ is strictly convex even though f ′′ (0) = 0.)

2. In higher dimensions (n > 1): if the Hessian matrix ∇²f (x) is positive definite for all x ∈ D, i.e.,

z ⊤ ∇²f (x) z > 0 for all nonzero z ∈ Rn ,

then f (x) is strictly convex.

Remark
The second derivative (or Hessian matrix) provides a powerful tool for determining the convexity or strict convexity of a function. For scalar functions, convexity corresponds to f ′′ (x) ≥ 0, while f ′′ (x) > 0 guarantees strict convexity.
Theorem 4.5 (Convex Function)
Let f (x) be a differentiable convex function. If ∇f (x∗ ) = 0, then x∗ is a global
minimum of f (x).
Remark
For convex functions, the graph opens upward (it is bowl-shaped), and any stationary point where ∇f (x) = 0 is guaranteed to be a global minimum.
Theorem 4.6 (Strictly Convex Function)
Let f (x) be a differentiable strictly convex function. If ∇f (x∗ ) = 0, then x∗ is the
unique global minimum of f (x).
Remark
Strict convexity ensures that the function's graph is strictly bowl-shaped (it opens upward with no flat regions), making the global minimum unique.
Theorem 4.7 (Coercivity)
Let f (x) be a continuous, coercive function, meaning f (x) → ∞ as ∥x∥ → ∞. Then f (x) has at least one global minimum.

Remark
Coercivity guarantees the existence of a global minimum even when the domain is
unbounded. It describes functions that ”grow large enough” at the boundaries.

Theorem 4.8 (Global Minimum via First Derivative)


Let f (x) be a differentiable function on R. Suppose f ′ (x∗ ) = 0 at some critical point
x∗ . Additionally, assume that:

1. f ′ (x) < 0 for all x < x∗ , and

2. f ′ (x) > 0 for all x > x∗ .

Then x∗ is the unique global minimum of f (x).

Remark
This theorem allows one to identify a global minimum without requiring the function to be convex. However, as stated it only holds on the one-dimensional Euclidean space R.

Example 4.4 (Application of Global Minimum via First Derivative)


Consider the function f (x) = x⁴ − 4x² + 3.

1. Find critical points:

f ′ (x) = 4x³ − 8x = 4x(x² − 2).

Critical points: x = 0, ±√2.

2. Analyze f ′ (x):
- For x < −√2, f ′ (x) < 0.
- For −√2 < x < 0, f ′ (x) > 0.
- For 0 < x < √2, f ′ (x) < 0.
- For x > √2, f ′ (x) > 0.

3. Apply the first-derivative analysis:
- At x = ±√2, f ′ (x) changes sign from negative to positive, indicating local minima.
- At x = 0, f ′ (x) changes sign from positive to negative, so x = 0 is a local maximum.

4. Evaluate f (x):

f (−√2) = f (√2) = −1,  f (0) = 3.

5. Conclusion: Since f (x) → ∞ as x → ±∞ and f (−√2) = f (√2) = −1 is the smallest critical value, the points x = ±√2 are global minima.
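The conclusion of Example 4.4 is easy to confirm numerically by evaluating f on a dense grid (a minimal NumPy sketch; the grid range and resolution are arbitrary choices):

import numpy as np

f = lambda x: x**4 - 4*x**2 + 3

xs = np.linspace(-3, 3, 600_001)
vals = f(xs)

print(xs[np.argmin(vals)], vals.min())             # one minimizer ≈ -1.4142 (= -sqrt(2)), minimum ≈ -1
print(f(np.sqrt(2.0)), f(-np.sqrt(2.0)), f(0.0))   # -1, -1, 3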

Summary Table: Convexity and Global Minimum

Type                                      | Unique global minimum? | Key feature
Convex, with a critical point             | No                     | ∇f (x∗ ) = 0 =⇒ global minimum
Convex and coercive                       | No                     | f (x) → ∞ as ∥x∥ → ∞ guarantees existence
Strictly convex, with a critical point    | Yes                    | ∇f (x∗ ) = 0 =⇒ unique global minimum
First-derivative condition (Theorem 4.8)  | Yes                    | f ′ (x) < 0 for x < x∗ and f ′ (x) > 0 for x > x∗
