Basic Calculation in Statistics Ver2.1
쁘띠유
2025-01-02
Preliminary: Mathematical Notation
This section introduces the fundamental mathematical notations and their meanings.
These notations will be used throughout the lecture notes to ensure clarity and
precision in presenting mathematical ideas.
Defining Symbols
• := : Denotes definition using a colon and an equals sign. For example, a := b means
that a is defined to be equal to b.
• ≝ (an equals sign with “def” written above it): Denotes definition explicitly. For
example, in the context of a theorem, f(x) ≝ x² means that f(x) is defined as x².
Commonality: :=, ≝, and ≡ can all be used to express definitions in mathematics.
Difference:
• := : Most commonly used to introduce new symbols or concepts. For example,
g(x) := sin(x) + cos(x).
• ≝ : Explicitly highlights that the relation is a formal definition, such as in
f(x) ≝ x².
Logical Statements
Quantifiers
• ∀: Universal quantifier.
– Example 1: ∀x ∈ {1, 2, 3}, x > 0 means that all elements in the set
{1, 2, 3} are greater than 0.
• ∃: Existential quantifier.
– Example 3: ∃x ∈ {1, 2, 3, 4}, x > 3 means there exists at least one element
in the set {1, 2, 3, 4} that is greater than 3.
Logical Connectives
• ∧: Logical AND. For example, P ∧ Q means that both P and Q are true; e.g.,
2 > 1 ∧ 3 > 2.
• ∨: Logical OR. For example, P ∨ Q means that either P or Q (or both) are
true; e.g., 2 > 3 ∨ 4 > 3.
• ¬: Logical NOT. For example, ¬P means that P is not true; e.g., if P is 2 > 3,
then ¬P is 2 ≤ 3.
• ⇐⇒ : Represents a bidirectional implication. For example, P ⇐⇒ Q means
that P is true if and only if Q is true.
Below, I will provide some theorems about mathematical logic. You do not have
to understand them all. However, you should remember all the theorems and
examples, especially the example of negating quantified statements, which is used
later for the negation of complete statistics.
Example 0.1
Consider the statement: ”All real numbers are positive.”
∀x ∈ R, x > 0
Its negation is: ”There exists a real number that is not positive.”
∃x ∈ R, x ≤ 0
In general, negation flips the quantifier:
¬(∀x ∈ R, P(x)) ≡ ∃x ∈ R, ¬P(x),   ¬(∃x ∈ R, P(x)) ≡ ∀x ∈ R, ¬P(x).
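Over a finite set, these negation rules can be checked directly with Python's `all` and `any` (a small illustrative sketch; the set and property are arbitrary choices):

```python
# Check the quantifier-negation rules over a finite set using all() and any().
S = [1, 2, 3, 4]
P = lambda x: x > 3  # the property P(x)

# ¬(∃x, P(x)) ≡ ∀x, ¬P(x)
lhs_exists = not any(P(x) for x in S)
rhs_exists = all(not P(x) for x in S)

# ¬(∀x, P(x)) ≡ ∃x, ¬P(x)
lhs_forall = not all(P(x) for x in S)
rhs_forall = any(not P(x) for x in S)

print(lhs_exists == rhs_exists, lhs_forall == rhs_forall)  # True True
```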
Example 0.2
Consider the statement: ”There exists a real number greater than 10.”
∃x ∈ R, x > 10
Its negation is: ”For all real numbers, none is greater than 10.”
∀x ∈ R, x ≤ 10
Example 0.3
Consider the statement: ”If it rains, then the ground is wet.” (P =⇒ Q)
Its negation is: ”It rains, but the ground is not wet.” (P ∧ ¬Q)
Example 0.4
Consider the statement: ”For all real numbers x, if x > 0, then x2 > 0.”
Its negation is: ”There exists a real number x such that x > 0 and x2 ≤ 0.”
∃x ∈ R, (x > 0) ∧ (x2 ≤ 0)
Special Symbols
• R: The set of real numbers. For example, √2 ∈ R.
• R−: The set of negative real numbers, R− = {x ∈ R | x < 0}.
• Intervals:
– Open Interval: An open interval (a, b) is the set of all real numbers x
such that a < x < b. The endpoints a and b are not included in the
interval.
(a, b) = {x ∈ R | a < x < b}.
Example: (1, 3) includes all real numbers x such that 1 < x < 3.
– Closed Interval: A closed interval [a, b] is the set of all real numbers x
such that a ≤ x ≤ b. The endpoints a and b are included in the interval.
[a, b] = {x ∈ R | a ≤ x ≤ b}.
– Half-Open Interval: A half-open interval [a, b) includes a but excludes b:
[a, b) = {x ∈ R | a ≤ x < b}.
Example: [1, 3) includes all real numbers x such that 1 ≤ x < 3.
– {1, 2}×{3, 4}: The Cartesian product of the sets {1, 2} and {3, 4} consists
of all ordered pairs (x, y), where x ∈ {1, 2} and y ∈ {3, 4}:
{1, 2} × {3, 4} = {(1, 3), (1, 4), (2, 3), (2, 4)}.
(Figure: the four points (1, 3), (1, 4), (2, 3), (2, 4) plotted in the xy-plane.)
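The finite Cartesian product above can be reproduced with `itertools.product` (a minimal sketch):

```python
from itertools import product

A = {1, 2}
B = {3, 4}
cart = sorted(product(A, B))  # all ordered pairs (x, y) with x in A and y in B
print(cart)  # [(1, 3), (1, 4), (2, 3), (2, 4)]
```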
– [1, 2] × [3, 4]: The Cartesian product of the intervals [1, 2] and [3, 4]
consists of all ordered pairs (x, y), where x ∈ [1, 2] and y ∈ [3, 4]:
(Figure: the filled rectangle [1, 2] × [3, 4] in the xy-plane.)
– max: Denotes the maximum of a set or sequence. For example, max(3, 1, 4) =
4, meaning that the largest value in the set {3, 1, 4} is 4. For ordered data,
the maximum value is often denoted as X(n) , where n is the total number
of elements in the set.
– limx→a− f(x) (also written f(a−)): The left-hand limit, where x approaches a
from values smaller than a (x < a).
– limx↑a f(x) (or limx↗a f(x)): The left-hand limit with the added condition
that x approaches a in a monotonically increasing manner.
– limx→a+ f(x) (also written f(a+)): The right-hand limit, where x approaches a
from values greater than a (x > a).
– limx↓a f(x) (or limx↘a f(x)): The right-hand limit with the added condition
that x approaches a in a monotonically decreasing manner.
• Example Function:
f(x) = x² if x < 2,  and  f(x) = 5 if x ≥ 2.
Remark
– f(2−) can also be represented using the increasing-arrow notation limx↗2 f(x),
since x < 2 and x increases towards 2.
(Figure: graph of f, with the branch f(x) = x² for x < 2 approaching f(2−) = 4,
and the branch f(x) = 5 for x ≥ 2 giving f(2+) = 5.)
• Singleton Representation:
{x} = ⋂_{n=1}^{∞} (x − 1/n, x].
This representation shows that the singleton set containing x can be expressed
as the infinite intersection of the nested intervals (x − 1/n, x], which shrink to
the single point x.
• Closed Interval as Infinite Intersection:
[a, b] = ⋂_{n=1}^{∞} (a − 1/n, b + 1/n).
This representation shows that a closed interval [a, b] can be expressed as the
intersection of a sequence of open intervals that converge to [a, b].
• Open Interval as Infinite Union:
(a, b) = ⋃_{n=1}^{∞} [a + 1/n, b − 1/n].
• Singleton as Intersection of Closed Intervals:
{x} = ⋂_{n=1}^{∞} [x − 1/n, x + 1/n].
This representation shows that the singleton set containing x can also be
expressed as the intersection of nested closed intervals around x.
• Mutually Disjoint Sets: Subsets A1, A2, . . . are mutually disjoint if
Ai ∩ Aj = ∅ for all i ≠ j.
– Example: Let A1 = {1, 2}, A2 = {3, 4}, A3 = {5, 6}. These subsets are
mutually disjoint because:
A1 ∩ A2 = ∅, A2 ∩ A3 = ∅, A1 ∩ A3 = ∅.
– Visualization: three disjoint boxes, A1 = {1, 2}, A2 = {3, 4}, A3 = {5, 6}.
• Partition: Subsets A1, A2, . . . , An form a partition of S if they cover the entire
set S and are mutually disjoint:
A1 ∪ A2 ∪ · · · ∪ An = S and Ai ∩ Aj = ∅ for i ≠ j.
– Visualization: (Figure: the set S divided into adjacent regions A1, A2, A3.)
1 Indicator Function in Statistics
Definition 1.1 (Indicator Function)
The indicator function IA(x) of a set A is defined by
IA(x) = 1 if x ∈ A,  and  IA(x) = 0 if x ∉ A.
For example, the Uniform(m, n) density can be written compactly as
fX(x) = (1/(n − m)) I[m,n](x).
f(x) = x² for 0 < x < 3, and 0 otherwise  =⇒  f(x) = x² I(0,3)(x).
f(x) = x² for 0 < x < 3, f(x) = −x + 12 for 3 ≤ x < 12, and 0 otherwise
=⇒  f(x) = x² I(0,3)(x) + (−x + 12) I[3,12)(x).
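A piecewise density written with indicators translates directly into code. The sketch below (the helper name `indicator` is my own) implements an interval indicator and the two-piece function above:

```python
def indicator(x, a, b, closed_left=False, closed_right=False):
    """Indicator of the interval from a to b, open at both ends by default."""
    left = (x >= a) if closed_left else (x > a)
    right = (x <= b) if closed_right else (x < b)
    return 1 if (left and right) else 0

# f(x) = x^2 * I_(0,3)(x) + (-x + 12) * I_[3,12)(x)
def f(x):
    return x**2 * indicator(x, 0, 3) + (-x + 12) * indicator(x, 3, 12, closed_left=True)

print(f(2), f(3), f(5), f(-1))  # 4 9 7 0
```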
1. IA∪B (x) = IA (x) + IB (x) − IA∩B (x).
2. ∏_{i=1}^{n} I(Xi < a) = I(X1 < a, . . . , Xn < a) = I(max{X1, . . . , Xn} < a).
3. ∏_{i=1}^{n} I(Xi > a) = I(X1 > a, . . . , Xn > a) = I(min{X1, . . . , Xn} > a).
Remark
The notation I(X1 < a, . . . , Xn < a) represents the indicator function for the
intersection of events:
I(X1 < a, . . . , Xn < a) = I(⋂_{i=1}^{n} {Xi < a}),
I(X1 > a, . . . , Xn > a) = I(⋂_{i=1}^{n} {Xi > a}).
For differentiable f, at interior points of A,
(d/dx) [f(x) IA(x)] = f′(x) IA(x).
Theorem 1.3 (Integral with an Indicator)
If f is integrable, then
If f is integrable, then
∫_C f(x) IA(x) dx = ∫_{C∩A} f(x) dx.
In particular, for an interval (c, d),
∫_a^b f(x) I(c,d)(x) dx = ∫_{max(a,c)}^{min(b,d)} f(x) dx if the intervals overlap,
and 0 otherwise.
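The overlap formula can be sanity-checked numerically; a midpoint-rule sketch (the test function, interval endpoints, and grid size are arbitrary choices):

```python
# Check: ∫_a^b f(x) I_(c,d)(x) dx equals the integral over the overlap [max(a,c), min(b,d)].
def riemann(g, lo, hi, n=100_000):
    """Midpoint-rule approximation of ∫_lo^hi g(x) dx."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: x**2
a, b, c, d = 0.0, 2.0, 1.0, 5.0  # the overlap is [1, 2]

lhs = riemann(lambda x: f(x) if c < x < d else 0.0, a, b)
rhs = riemann(f, max(a, c), min(b, d))
print(lhs, rhs)  # both ≈ 7/3 ≈ 2.3333
```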
Example 1.7
Consider a single flip of a fair coin, where N represents the number of heads in one
repetition, and A denotes the event that the coin lands heads. N is defined as:
N = 1 if the coin lands heads, and N = 0 if the coin lands tails.
Then N is the indicator of A, so E[N] = P(A) = 0.5.
Theorem 1.5 (Counting Elements Having Property P via Indicators)
Let X1 , X2 , . . . , Xn be random variables taking values in some set X , and let P ⊆ X
represent a certain property (or subset) of interest.
Define
S = ∑_{i=1}^{n} IP(Xi),
where IP(Xi) = 1 if Xi ∈ P and 0 otherwise. Then
E[S] = E[∑_{i=1}^{n} IP(Xi)] = ∑_{i=1}^{n} E[IP(Xi)] = ∑_{i=1}^{n} P(Xi ∈ P).
Example 1.8
Suppose we have n coin flips, denoted by X1 , X2 , . . . , Xn . Let the property P be the
event {coin lands heads}. Define
S = ∑_{i=1}^{n} I{heads}(Xi).
Then S is the total number of heads in n coin flips. If each coin flip has probability
p of landing heads (independently), then
E[S] = ∑_{i=1}^{n} E[I{heads}(Xi)] = ∑_{i=1}^{n} P(heads) = ∑_{i=1}^{n} p = np.
Thus, S follows a Binomial distribution with parameters (n, p), and its expectation
is np.
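A quick Monte Carlo check of E[S] = np (the parameters, seed, and number of trials are arbitrary choices):

```python
import random

random.seed(0)
n, p, trials = 20, 0.3, 20_000

# Average the count of heads S = Σ I{heads}(X_i) over many independent experiments.
avg = sum(
    sum(1 for _ in range(n) if random.random() < p)  # one realization of S
    for _ in range(trials)
) / trials
print(avg, n * p)  # avg ≈ 6.0
```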
2 Product Notation (Π)
Definition 2.1 (Product Notation)
For a1 , . . . , an ,
∏_{i=1}^{n} ai = a1 a2 · · · an.
2. ∏_{i=1}^{n} b^{ei} = b^{∑_i ei}.
3. ∏_{i=1}^{n} (ai bi) = (∏_{i=1}^{n} ai)(∏_{i=1}^{n} bi).
For example, the N(µ, σ²) density is
f(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)),
and the likelihood of an i.i.d. sample X1, . . . , Xn is
L(µ, σ²) = (1/√(2πσ²))ⁿ exp(−∑_{i=1}^{n} (Xi − µ)²/(2σ²)).
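In practice one works with the log of this likelihood to avoid numerical underflow of the product; a minimal sketch (the sample values and parameters are made up):

```python
import math

def normal_log_likelihood(xs, mu, sigma2):
    """log L(mu, sigma^2) for an i.i.d. normal sample."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma2))

xs = [1.2, 0.8, 1.5, 0.9, 1.1]
# The product of the individual densities agrees with exp(log-likelihood).
product_of_densities = math.prod(
    math.exp(-(x - 1.0) ** 2 / (2 * 0.25)) / math.sqrt(2 * math.pi * 0.25)
    for x in xs
)
log_lik = normal_log_likelihood(xs, 1.0, 0.25)
print(abs(math.log(product_of_densities) - log_lik))  # ≈ 0
```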
3 Integration and Derivatives of Vectors
3.1 Integration
For suitable f (Fubini's theorem),
∫_a^b ∫_c^d f(x, y) dy dx = ∫_c^d ∫_a^b f(x, y) dx dy.
Example:
∫_0^1 ∫_0^2 (x + y) dy dx = ∫_0^1 [xy + y²/2]_{y=0}^{y=2} dx = ∫_0^1 (2x + 2) dx = 3.
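A numeric check of the example with a 2-D midpoint sum (the grid size is an arbitrary choice):

```python
# ∫_0^1 ∫_0^2 (x + y) dy dx via a two-dimensional midpoint rule.
def double_riemann(f, ax, bx, ay, by, n=200):
    hx, hy = (bx - ax) / n, (by - ay) / n
    return sum(
        f(ax + (i + 0.5) * hx, ay + (j + 0.5) * hy)
        for i in range(n)
        for j in range(n)
    ) * hx * hy

val = double_riemann(lambda x, y: x + y, 0.0, 1.0, 0.0, 2.0)
print(val)  # ≈ 3.0
```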
Remark
We can often calculate a double integral more easily by changing the order of
integration in x and y.
Verify that:
∫_0^∞ ∫_0^∞ fX,Y(x, y) dy dx = 1,  where fX,Y(x, y) = λ² x e^{−λx(1+y)}.
Factor the integrand:
∫_0^∞ ∫_0^∞ λ² x e^{−λx(1+y)} dy dx = ∫_0^∞ ∫_0^∞ λ² x e^{−λx} e^{−λxy} dy dx.
The inner integral is:
∫_0^∞ e^{−λxy} dy.
Perform a substitution: let u = λxy, so du = λx dy, or equivalently dy = du/(λx).
Update the limits:
• When y = 0, u = 0,
• When y → ∞, u → ∞.
Then
∫_0^∞ e^{−λxy} dy = ∫_0^∞ e^{−u} (1/(λx)) du = (1/(λx)) ∫_0^∞ e^{−u} du = 1/(λx).
Substituting back,
∫_0^∞ ∫_0^∞ λ² x e^{−λx(1+y)} dy dx = ∫_0^∞ λ² x e^{−λx} · (1/(λx)) dx.
Simplify:
∫_0^∞ ∫_0^∞ fX,Y(x, y) dy dx = ∫_0^∞ λ e^{−λx} dx = [−e^{−λx}]_0^∞ = 1.
Thus, the double integral equals 1, and fX,Y(x, y) is a valid joint probability
density function.
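The same conclusion can be checked numerically by truncating the infinite ranges (the truncation points and grid size are arbitrary choices; the truncated y-tail leaves out roughly 1/(1 + y_max) of the mass, so the result is only approximately 1):

```python
import math

lam = 2.0
f = lambda x, y: lam**2 * x * math.exp(-lam * x * (1 + y))

def midpoint_2d(f, x_max, y_max, n=500):
    """Midpoint-rule approximation of ∫_0^x_max ∫_0^y_max f dy dx."""
    hx, hy = x_max / n, y_max / n
    return sum(
        f((i + 0.5) * hx, (j + 0.5) * hy)
        for i in range(n)
        for j in range(n)
    ) * hx * hy

total = midpoint_2d(f, x_max=15.0, y_max=80.0)
print(total)  # ≈ 1, up to truncation error
```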
3.2 Derivatives of Vectors
Definition 3.2 (Vectors)
A vector x in Rn is an ordered n-tuple of real numbers:
x = (x1, x2, . . . , xn),
written in matrix form as the column vector x = (x1, x2, . . . , xn)ᵀ.
Remark
In many contexts, especially in calculus and linear algebra, an n-dimensional vector
is conventionally treated as a column vector (i.e., written vertically).
Definition 3.3 (Jacobian Matrix)
Let y = (y1, . . . , ym) be a differentiable function of x = (x1, . . . , xn). The Jacobian
matrix of y with respect to x is the m × n matrix of first-order partial derivatives
Jy(x) = dy/dx = [∂yi/∂xj],  i = 1, . . . , m (rows),  j = 1, . . . , n (columns).
Remark
The term “Jacobian” may refer either to the Jacobian matrix or the Jacobian de-
terminant. In this text, unless otherwise specified, we use “Jacobian” to mean the
Jacobian matrix.
Definition 3.4 (Hessian Matrix)
Let g : Rn → R be twice differentiable. Then the Hessian matrix of g, denoted by
∇2 g(x) or Hg (x), is defined as the matrix of all second-order partial derivatives:
∇²g(x) = [∂²g/(∂xi ∂xj)],  i, j = 1, . . . , n,
an n × n matrix whose (i, j) entry is ∂²g/(∂xi ∂xj); the diagonal entries are ∂²g/∂xi².
Remark
Whereas the gradient is an n-dimensional column vector, the Hessian is an n × n
matrix. It captures how the gradient itself changes with respect to x, thus providing
curvature information of the function.
Example: let y1 = x1² + x2 and y2 = sin(x2). Hence
∂y/∂x = [[∂y1/∂x1, ∂y1/∂x2], [∂y2/∂x1, ∂y2/∂x2]] = [[2x1, 1], [0, cos x2]].
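Analytic Jacobians can be checked against finite differences; a small sketch (the step size and test point are arbitrary choices):

```python
import math

# F(x1, x2) = (x1^2 + x2, sin(x2)); analytic Jacobian [[2*x1, 1], [0, cos(x2)]].
def F(x1, x2):
    return (x1**2 + x2, math.sin(x2))

def numeric_jacobian(F, x1, x2, h=1e-6):
    """Forward-difference approximation of the 2x2 Jacobian of F at (x1, x2)."""
    f0 = F(x1, x2)
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j, (dx1, dx2) in enumerate([(h, 0.0), (0.0, h)]):
        f1 = F(x1 + dx1, x2 + dx2)
        for i in range(2):
            J[i][j] = (f1[i] - f0[i]) / h  # ≈ ∂F_i/∂x_j
    return J

J = numeric_jacobian(F, 1.0, 0.5)
print(J)  # ≈ [[2.0, 1.0], [0.0, cos(0.5)]]
```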
Example: let µ = θ1 and σ = e^{θ2}. The Jacobian of (µ, σ) with respect to (θ1, θ2) is
[[∂µ/∂θ1, ∂µ/∂θ2], [∂σ/∂θ1, ∂σ/∂θ2]] = [[1, 0], [0, e^{θ2}]].
Example: for g(x1, x2) = x1² + 3x1x2 + 2x2²,
∇g(x1, x2) = [∂g/∂x1, ∂g/∂x2]ᵀ = [2x1 + 3x2, 3x1 + 4x2]ᵀ,
∇²g(x1, x2) = [[∂²g/∂x1², ∂²g/∂x1∂x2], [∂²g/∂x2∂x1, ∂²g/∂x2²]] = [[2, 3], [3, 4]].
Example: for an i.i.d. N(µ, σ²) sample X1, . . . , Xn, the log-likelihood is
ℓ(θ) = ℓ(µ, σ²) = −(n/2) ln(2π) − (n/2) ln(σ²) − (1/(2σ²)) ∑_{i=1}^{n} (Xi − µ)².
We use the gradient ∇ℓ(θ) = (∂ℓ/∂µ, ∂ℓ/∂σ²). Then
∂ℓ/∂µ = (1/σ²) ∑_{i=1}^{n} (Xi − µ),
∂ℓ/∂σ² = −n/(2σ²) + (1/(2(σ²)²)) ∑_{i=1}^{n} (Xi − µ)².
Setting the partial derivatives to zero:
∂ℓ/∂µ = 0 =⇒ ∑_{i=1}^{n} (Xi − µ) = 0 =⇒ µ̂ = (1/n) ∑_{i=1}^{n} Xi,
∂ℓ/∂σ² = 0 =⇒ −n/(2σ²) + (1/(2(σ²)²)) ∑_{i=1}^{n} (Xi − µ)² = 0
=⇒ σ̂² = (1/n) ∑_{i=1}^{n} (Xi − µ̂)².
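The closed-form MLEs are easy to compute for a concrete sample; a minimal sketch (the data values are made up):

```python
# MLE for a normal sample: mu_hat is the sample mean; sigma2_hat divides by n (not n - 1).
xs = [2.1, 1.9, 2.4, 2.0, 1.6]
n = len(xs)

mu_hat = sum(xs) / n
sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
print(mu_hat, sigma2_hat)  # mean ≈ 2.0, variance ≈ 0.068
```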
4 Taylor Series and Optimization
4.1 Taylor Series
I. Infinite Series: the Taylor series of f about x0 is
f(x) = ∑_{n=0}^{∞} (f⁽ⁿ⁾(x0)/n!) (x − x0)ⁿ.
Truncating at the N-th term,
f(x) = ∑_{n=0}^{N} (f⁽ⁿ⁾(x0)/n!) (x − x0)ⁿ + RN(x),
RN(x) = (f⁽ᴺ⁺¹⁾(x*)/(N + 1)!) (x − x0)^{N+1},
for some x* between x0 and x.
II. Truncated Series with Remainder: If the series is truncated at the N -th
term, the function can be exactly expressed as:
f(x) = ∑_{n=0}^{N} (f⁽ⁿ⁾(0)/n!) xⁿ + RN(x),
where the remainder term RN(x) is:
RN(x) = (f⁽ᴺ⁺¹⁾(x*)/(N + 1)!) x^{N+1},
for some x* between 0 and x.
1. eˣ:
eˣ = ∑_{n=0}^{∞} xⁿ/n! = 1 + x + x²/2! + x³/3! + · · ·
2. 1/(1 − x) (for |x| < 1):
1/(1 − x) = ∑_{n=0}^{∞} xⁿ = 1 + x + x² + x³ + · · ·
3. 1/(1 + x) (for |x| < 1):
1/(1 + x) = ∑_{n=0}^{∞} (−1)ⁿ xⁿ = 1 − x + x² − x³ + · · ·
4. −ln(1 − x) (for |x| < 1):
−ln(1 − x) = ∑_{n=1}^{∞} xⁿ/n = x + x²/2 + x³/3 + · · ·
5. ln(1 + x) (for |x| < 1):
ln(1 + x) = ∑_{n=1}^{∞} (−1)ⁿ⁺¹ xⁿ/n = x − x²/2 + x³/3 − · · ·
6. sin x:
sin x = ∑_{n=0}^{∞} (−1)ⁿ x^{2n+1}/(2n + 1)! = x − x³/3! + x⁵/5! − · · ·
7. cos x:
cos x = ∑_{n=0}^{∞} (−1)ⁿ x^{2n}/(2n)! = 1 − x²/2! + x⁴/4! − · · ·
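Partial sums of these series converge quickly on moderate inputs; a quick check against `math.sin` and `math.cos` (the evaluation point and number of terms are arbitrary choices):

```python
import math

def maclaurin_sin(x, terms=10):
    """Partial sum of the Maclaurin series for sin x."""
    return sum((-1)**n * x**(2*n + 1) / math.factorial(2*n + 1) for n in range(terms))

def maclaurin_cos(x, terms=10):
    """Partial sum of the Maclaurin series for cos x."""
    return sum((-1)**n * x**(2*n) / math.factorial(2*n) for n in range(terms))

x = 0.7
print(maclaurin_sin(x) - math.sin(x), maclaurin_cos(x) - math.cos(x))  # both ≈ 0
```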
Remark (Moments)
The k-th moment of a random variable X is defined as:
µk = E[X k ],
where k = 1, 2, 3, . . .. The first moment is the mean (µ1 = E[X]), and the second
central moment is the variance (E[(X − E[X])2 ]).
Example: approximate I = ∫_0^{0.5} e^{−x²} dx using the Maclaurin series for e^{−x²}.
Step 1: Write the Maclaurin series for eˣ. The Maclaurin series for eˣ is:
eˣ = ∑_{n=0}^{∞} xⁿ/n!.
Step 2: Substitute −x² for x:
e^{−x²} = ∑_{n=0}^{∞} (−x²)ⁿ/n! = ∑_{n=0}^{∞} (−1)ⁿ x^{2n}/n!.
Keeping the first three terms,
e^{−x²} ≈ 1 − x² + x⁴/2.
Step 3: Integrate term by term.
I ≈ ∫_0^{0.5} (1 − x² + x⁴/2) dx.
∫_0^{0.5} x² dx = [x³/3]_0^{0.5} = (0.5)³/3 = 0.125/3 ≈ 0.04167,
∫_0^{0.5} (x⁴/2) dx = [x⁵/10]_0^{0.5} = (0.5)⁵/10 = 0.03125/10 = 0.003125.
Hence
I ≈ 0.5 − 0.04167 + 0.003125 ≈ 0.46146.
Step 4: Compare with the exact value. Using numerical integration, the
exact value of ∫_0^{0.5} e^{−x²} dx is approximately 0.46128, showing the Maclaurin
approximation is highly accurate.
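The steps above can be reproduced numerically (the grid size for the reference integral is an arbitrary choice):

```python
import math

# Three-term Maclaurin approximation of I = ∫_0^0.5 e^{-x^2} dx ...
approx = 0.5 - 0.5**3 / 3 + 0.5**5 / 10

# ... compared with a fine midpoint-rule evaluation of the same integral.
n = 50_000
h = 0.5 / n
numeric = sum(math.exp(-(((i + 0.5) * h) ** 2)) for i in range(n)) * h
print(approx, numeric)  # ≈ 0.46146 vs ≈ 0.46128
```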
Example: solve sin x = 0.5 approximately using the Maclaurin series.
Step 1: Write the Maclaurin series for sin x:
sin x = ∑_{n=0}^{∞} (−1)ⁿ x^{2n+1}/(2n + 1)! = x − x³/3! + x⁵/5! − · · ·
Step 2: Truncate the series to the first two terms:
sin x ≈ x − x³/6.
Solve:
x − x³/6 = 0.5.
Step 3: Solve iteratively via xₖ₊₁ = 0.5 + xₖ³/6. Let x0 = 0.5 (initial guess).
Substitute into the equation:
x1 = 0.5 + (0.5)³/6 ≈ 0.52083.
Repeat:
x2 = 0.5 + (0.52083)³/6 ≈ 0.52355,
and further iterations converge to x ≈ 0.5240.
Step 4: Compare with the exact value. The exact root is x = arcsin(0.5) =
π/6 ≈ 0.523599, so the two-term Maclaurin approximation is accurate to about
4 × 10⁻⁴.
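The iteration is a standard fixed-point scheme and converges quickly, since the map x ↦ 0.5 + x³/6 is a contraction near the root; a minimal sketch:

```python
import math

# Fixed-point iteration x_{k+1} = 0.5 + x_k^3 / 6 for the truncated equation x - x^3/6 = 0.5.
x = 0.5
for _ in range(20):
    x = 0.5 + x**3 / 6
print(x, math.asin(0.5))  # ≈ 0.52398 vs ≈ 0.52360
```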
The moments of X can be read off from its moment generating function:
µk = (dᵏ/dtᵏ) MX(t) |_{t=0}.
To see why, start from the series
e^{tX} = ∑_{n=0}^{∞} (tX)ⁿ/n!,
so that
MX(t) = E[e^{tX}] = E[∑_{n=0}^{∞} (tX)ⁿ/n!].
By linearity of expectation:
MX(t) = ∑_{n=0}^{∞} (tⁿ/n!) E[Xⁿ] = ∑_{n=0}^{∞} µn tⁿ/n!,
where µn = E[Xⁿ]. Hence
µk = E[Xᵏ] = (dᵏ/dtᵏ) MX(t) |_{t=0}.
Indeed, differentiating the series k times term by term,
(dᵏ/dtᵏ) MX(t) = (dᵏ/dtᵏ) ∑_{n=0}^{∞} µn tⁿ/n!
= ∑_{n=k}^{∞} (µn/n!) · n(n − 1) · · · (n − k + 1) · t^{n−k}.
Evaluating at t = 0, only the n = k term survives:
(dᵏ/dtᵏ) MX(t) |_{t=0} = (µk/k!) · k! = µk.
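The differentiate-at-zero recipe can be checked numerically for a simple case. For X ~ Bernoulli(p), M_X(t) = 1 − p + p eᵗ, and every moment E[Xᵏ] equals p; a sketch using central finite differences (the step size is an arbitrary choice):

```python
import math

p = 0.3
M = lambda t: (1 - p) + p * math.exp(t)  # MGF of a Bernoulli(p) random variable

h = 1e-5
m1 = (M(h) - M(-h)) / (2 * h)            # central difference ≈ M'(0) = E[X]
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2  # second difference ≈ M''(0) = E[X^2]
print(m1, m2)  # both ≈ p = 0.3
```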
4.2 Optimization
Definition 4.1 (Critical Point)
A point x∗ in the domain of a differentiable function f (x) is called a critical point
if:
∇f (x∗ ) = 0.
Let f(x) be twice differentiable on D ⊆ Rⁿ.
1. In one dimension (n = 1): f(x) is convex if and only if f″(x) ≥ 0 for all x ∈ D.
2. In higher dimensions (n > 1): f(x) is convex if and only if the Hessian matrix
∇²f(x) is positive semidefinite for all x ∈ D, i.e.:
zᵀ ∇²f(x) z ≥ 0 for all z ∈ Rⁿ.
Theorem 4.4 (Second Derivative Condition for Strict Convexity)
Let f (x) be twice differentiable on D ⊆ Rn .
1. In one dimension (n = 1): if f″(x) > 0 for all x ∈ D, then f(x) is strictly convex.
2. In higher dimensions (n > 1): if the Hessian matrix ∇²f(x) is positive definite
for all x ∈ D, i.e.:
zᵀ ∇²f(x) z > 0 for all z ≠ 0,
then f(x) is strictly convex. (The converse does not hold in general: f(x) = x⁴ is
strictly convex, yet f″(0) = 0.)
Remark
The second derivative (or Hessian matrix) provides a powerful tool for determin-
ing the convexity or strict convexity of a function. For scalar functions, convexity
corresponds to f ′′ (x) ≥ 0, while strict convexity corresponds to f ′′ (x) > 0.
Theorem 4.5 (Convex Function)
Let f (x) be a differentiable convex function. If ∇f (x∗ ) = 0, then x∗ is a global
minimum of f (x).
Remark
For convex functions, the graph ”opens upward” (bowl-shaped), and any stationary
point where ∇f(x) = 0 is guaranteed to be a global minimum.
Theorem 4.6 (Strictly Convex Function)
Let f (x) be a differentiable strictly convex function. If ∇f (x∗ ) = 0, then x∗ is the
unique global minimum of f (x).
Remark
Strict convexity ensures that the function’s graph curves strictly upward, making
the global minimum unique.
Theorem 4.7 (Coercivity)
Let f (x) be a coercive function, meaning f (x) → ∞ as ∥x∥ → ∞. Then, f (x) has
at least one global minimum.
Remark
Coercivity guarantees the existence of a global minimum even when the domain is
unbounded. It describes functions that ”grow large enough” at the boundaries.
Remark
This approach allows for identifying global minima without requiring the function
to be convex. However, the first-derivative sign test used below only holds on the
one-dimensional Euclidean space R.
Example: let f be a coercive, differentiable function whose only critical points are
x = 0, ±√2.
1. Critical points: x = 0, ±√2.
2. Analyze f′(x): for x < −√2, f′(x) < 0; for −√2 < x < 0, f′(x) > 0; for
0 < x < √2, f′(x) < 0; for x > √2, f′(x) > 0.
3. Apply the theorem: at x = ±√2, f′(x) changes sign from negative to positive,
indicating local minima. At x = 0, f′(x) changes sign from positive to negative,
so x = 0 is a local maximum.
4. Evaluate f(x):
f(−√2) = f(√2) = −3,  f(0) = 3.
5. Conclusion: since f(x) → ∞ as x → ±∞, and f(−√2) = f(√2) = −3, the
points x = ±√2 are global minima.
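For concreteness, f(x) = (3/2)x⁴ − 6x² + 3 is one (hypothetical) function matching these critical points and values; a grid-search sketch confirms its global minima:

```python
# Hypothetical f consistent with the example: f'(x) = 6x^3 - 12x = 0 at x = 0, ±√2,
# with f(±√2) = -3 and f(0) = 3.
f = lambda x: 1.5 * x**4 - 6 * x**2 + 3

xs = [i / 1000 for i in range(-3000, 3001)]  # grid on [-3, 3]
fmin = min(f(x) for x in xs)
argmins = [x for x in xs if abs(f(x) - fmin) < 1e-9]
print(fmin, argmins)  # ≈ -3, attained near x ≈ ±1.414
```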
Summary: for a strictly convex function with a critical point, ∇f(x*) = 0 =⇒ x*
is the unique global minimum.