Math Notation

The document is a draft introduction to artificial neural networks and deep learning written by Sebastian Raschka. It contains a mathematical notation reference section that defines commonly used notation in areas such as sets, sequences, functions, linear algebra, calculus, probability and statistics, and logic that will be used throughout the book. The book will be available online and the GitHub repository contains supporting code examples.


SEBASTIAN RASCHKA

Introduction to
Artificial Neural Networks
and Deep Learning
with Applications in Python

DRAFT
Last updated: May 25, 2018

This book will be available at http://leanpub.com/ann-and-deeplearning.

Please visit https://github.com/rasbt/deep-learning-book for more information, supporting material, and code examples.


© 2016-2018 Sebastian Raschka
Contents

A Mathematical Notation Reference

A.1 Sets and Intervals
A.2 Sequences
A.3 Functions
A.4 Linear Algebra
A.5 Calculus
A.6 Probability and Statistics
A.7 Numbers
A.8 Approximation
A.9 Logic

Website

Please visit the GitHub repository to download the code examples accompanying this book and other supplementary material.
If you like the content, please consider supporting the work by buying a copy of the book on Leanpub. Also, I would appreciate hearing your opinion and feedback about the book, and if you have any questions about the contents, please don’t hesitate to get in touch with me via [email protected]. Happy learning!

Sebastian Raschka

About the Author

Sebastian Raschka received his doctorate from Michigan State University, where he developed novel computational methods in the field of computational biology. In summer 2018, he joined the University of Wisconsin–Madison as Assistant Professor of Statistics. Among other things, his research activities include the development of new deep learning architectures to solve problems in the field of biometrics. Among his other works is his book "Python Machine Learning," a bestselling title at Packt and on Amazon.com, which received the ACM Best of Computing award in 2016 and was translated into many different languages, including German, Korean, Italian, traditional Chinese, simplified Chinese, Russian, Polish, and Japanese.
Sebastian is also an avid open-source contributor and likes to contribute to the scientific Python ecosystem in his free time. If you would like to find out more about what Sebastian is currently up to or would like to get in touch, you can find his personal website at https://sebastianraschka.com.

Acknowledgements

I would like to give my special thanks to the readers who provided feedback, caught various typos and errors, and offered suggestions for clarifying my writing.

• Appendix A: Artem Sobolev, Ryan Sun

• Appendix B: Brett Miller, Ryan Sun

• Appendix D: Marcel Blattner, Ignacio Campabadal, Ryan Sun, Denis Parra Santander

• Appendix F: Guillermo Monecchi, Ged Ridgway, Ryan Sun, Patric Hindenberger

• Appendix H: Brett Miller, Ryan Sun, Nicolas Palopoli, Kevin Zakka

Appendix A

Mathematical Notation Reference

This appendix provides a brief overview of the mathematical notation used throughout this book. The following appendices describe most of the corresponding concepts in more detail, and additional information is provided in the context of the applications in the main chapters.


A.1 Sets and Intervals


ℤ set of integers, {..., −2, −1, 0, 1, 2, ...}

ℕ set of natural numbers, {0, 1, 2, 3, ...}

ℕ⁺ set of natural numbers excluding zero, {1, 2, 3, ...}

ℝ set of real numbers

∈ element-of symbol; for example, x ∈ A translates to "x is an element of set A"

∉ not-an-element-of symbol

∅ null set, empty set

A∪B union of two sets, A and B

A∩B intersection of two sets, A and B

A⊆B A is a subset of B or included in B

A∆B symmetric difference between two sets A and B

|A| cardinality of a set A (number of elements in a set A)

(a, b) open interval from a to b, excluding a and b

[a, b] closed interval from a to b, including a and b

[a, b) half-open interval from a to b, including a but not b

(a, b] half-open interval from a to b, including b but not a
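
The set notation above maps fairly directly onto Python's built-in `set` type. The following is a minimal sketch; the example sets `A` and `B` and the helper `in_half_open` are hypothetical names chosen for illustration only.

```python
# Hypothetical example sets, chosen only for illustration
A = {1, 2, 3, 4}
B = {3, 4, 5}

union = A | B              # A ∪ B
intersection = A & B       # A ∩ B
sym_diff = A ^ B           # A ∆ B (elements in exactly one of the two sets)
is_subset = {3, 4} <= B    # {3, 4} ⊆ B
cardinality = len(A)       # |A|
not_member = 7 not in A    # 7 ∉ A


def in_half_open(x, a, b):
    """Return True if x lies in the half-open interval [a, b)."""
    return a <= x < b
```

Python's `range(a, b)` follows the same half-open convention: it includes `a` but not `b`.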


A.2 Sequences
$\sum_{i=1}^{n} x_i$   summation of an indexed variable $x_i$, defined as $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$

$\prod_{i=1}^{n} x_i$   product over an indexed variable $x_i$, defined as $\prod_{i=1}^{n} x_i = x_1 \cdot x_2 \cdot \ldots \cdot x_n$
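
As a small illustration, the summation and product over an indexed variable can be computed with Python's standard library. This sketch assumes Python 3.8 or newer (for `math.prod`); the list `x` is a made-up example.

```python
import math

# Hypothetical example values x_1, x_2, x_3
x = [2, 3, 4]

total = sum(x)          # x_1 + x_2 + ... + x_n
product = math.prod(x)  # x_1 * x_2 * ... * x_n
```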

A.3 Functions
f :A→B function f with domain A and codomain B

$(g \circ f)(x)$   composition of two functions $g$ and $f$; alternative form: $g[f(x)]$

$f^{-1}(x)$   inverse of a function $f$, such that $f(y) = x$ if $y = f^{-1}(x)$

$|x|$   absolute value of $x$; for example, $|-2| = 2$

$\log_b$   base-$b$ logarithm

$\log$   natural logarithm (base-$e$ logarithm)

$n!$   $n$-factorial, where $0! = 1$ and $n! = n(n-1)(n-2) \cdots 2 \cdot 1$ for $n > 0$

$\binom{n}{k}$   binomial coefficient ("n choose k"); $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ for $0 \le k \le n$

arg max f (x) the x value that makes f (x) as large as possible

arg min f (x) the x value that makes f (x) as small as possible
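
The function notation above can be tried out in a few lines of Python. This is only a sketch: the functions `f`, `f_inv`, `g`, and `h` and the candidate list are hypothetical examples, and arg max / arg min are taken over a finite list of candidates rather than over a continuous domain.

```python
import math


def compose(g, f):
    """Return the composition g ∘ f, i.e. the function x -> g(f(x))."""
    return lambda x: g(f(x))


def f(x):
    return 2 * x + 1      # hypothetical example function


def f_inv(x):
    return (x - 1) / 2    # inverse of f: f(f_inv(x)) = x


def g(x):
    return x ** 2


g_after_f = compose(g, f)

# arg max and arg min over a finite set of candidate x values
candidates = [-2, -1, 0, 1, 2, 3]


def h(x):
    return -(x - 1) ** 2  # hypothetical function with its maximum at x = 1


arg_max = max(candidates, key=h)
arg_min = min(candidates, key=h)

# Binomial coefficient via the factorial definition
n, k = 5, 2
n_choose_k = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
```

In Python, `math.log(x)` is the natural logarithm, matching the convention used here, and `math.log(x, b)` is the base-b logarithm.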


A.4 Linear Algebra


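$x$   scalar (lower-case italics notation)

$\mathbf{x}$   column vector (lower-case bold notation) or $n \times 1$-matrix

$\mathbf{a} \cdot \mathbf{b}$   dot product of two vectors, $\mathbf{a}$ and $\mathbf{b}$; if $\mathbf{a}$ and $\mathbf{b}$ are $n \times 1$-matrices, also written as $\mathbf{a}^T \mathbf{b}$;
$$\mathbf{a} \cdot \mathbf{b} = \mathbf{a}^T \mathbf{b} = \sum_i a_i b_i = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$$

$\mathbf{X}$   $m \times n$-matrix (upper-case bold notation)

$X$   3D-tensor (upper-case italics notation)

$\mathbb{R}^n$   real coordinate space; its elements are written as column vectors with length $n$,
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

$\mathbf{x}^T$   transpose of an $n \times 1$-matrix,
$$\mathbf{x}^T = \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}^T$$

$\|\mathbf{x}\|_p$   $L^p$ norm, vector $p$-norm,
$$\|\mathbf{x}\|_p = \left( |x_1|^p + |x_2|^p + \cdots + |x_n|^p \right)^{1/p}$$

$\|\mathbf{x}\|_\infty$   $L^\infty$ norm, max norm; largest absolute value among the elements of a vector,
$$\|\mathbf{x}\|_\infty = \max_i |x_i|$$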
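
To make the dot product and the vector norms concrete, here is a minimal pure-Python sketch. The helper names `dot`, `p_norm`, and `max_norm` and the example vectors are assumptions made for illustration; in practice a numerical library would normally be used instead.

```python
# Hypothetical example vectors, stored as plain Python lists
a = [1.0, 2.0, 3.0]
b = [4.0, -5.0, 6.0]


def dot(u, v):
    """Dot product: sum of the element-wise products u_i * v_i."""
    return sum(ui * vi for ui, vi in zip(u, v))


def p_norm(x, p):
    """Vector p-norm: (|x_1|^p + ... + |x_n|^p)^(1/p)."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


def max_norm(x):
    """Max norm: largest absolute value among the elements of x."""
    return max(abs(xi) for xi in x)
```

For example, `p_norm(x, 2)` corresponds to the Euclidean length of `x`, and `p_norm(x, 1)` to the sum of the absolute values of its elements.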


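$\|\mathbf{x}\|$   vector norm, $L^2$-norm, $\|\mathbf{x}\| = \|\mathbf{x}\|_2$,
$$\|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$

$\mathbf{A}_{i,:}$   $i$th row of matrix $\mathbf{A}$

$\mathbf{A}_{:,j}$   $j$th column of matrix $\mathbf{A}$

$\mathbf{A}^T$   transpose of a matrix; matrix element $A_{i,j}$ becomes $A^T_{j,i}$, for example,
$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}^T = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}$$

$\mathbf{I}_n$   $n \times n$ identity matrix, for example,
$$\mathbf{I}_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

$\mathbf{A}^{-1}$   inverse of a matrix $\mathbf{A}$, such that $\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$

$\mathrm{tr}\,\mathbf{A}$   trace of a matrix $\mathbf{A}$ (sum of the diagonal elements),
$$\mathrm{tr}\,\mathbf{A} = \sum_{i=1}^{n} A_{i,i}$$

$\det \mathbf{A}$   determinant of a matrix $\mathbf{A}$

$\mathrm{diag}(a_1, a_2, ..., a_n)$   diagonal matrix; a matrix whose diagonal elements have the values $a_1, a_2, ..., a_n$ and whose other elements are all zero

$\mathbf{A} \odot \mathbf{B}$   Hadamard product, element-wise matrix multiplication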
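
The matrix notation above can be illustrated with nested Python lists, where each inner list is one row. This is a sketch under that assumption; the helper names are hypothetical. Note that Python indices start at 0, whereas the mathematical notation starts counting at 1.

```python
# A 3 x 2 example matrix (same values as in the transpose example)
A = [[1, 2],
     [3, 4],
     [5, 6]]


def transpose(M):
    """Swap rows and columns: element (i, j) moves to (j, i)."""
    return [list(col) for col in zip(*M)]


def identity(n):
    """n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def trace(M):
    """Sum of the diagonal elements of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


def hadamard(M, N):
    """Element-wise product of two matrices with the same shape."""
    return [[m * n for m, n in zip(row_m, row_n)] for row_m, row_n in zip(M, N)]


def matmul(M, N):
    """Ordinary matrix product, used here to check an inverse."""
    return [[sum(m * n for m, n in zip(row, col)) for col in zip(*N)] for row in M]


first_row = A[0]                      # row 1 of A
second_col = [row[1] for row in A]    # column 2 of A
```

As a quick check of the inverse definition, multiplying the diagonal matrix diag(2, 4) by diag(0.5, 0.25) with `matmul` gives the 2 × 2 identity matrix.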


A.5 Calculus
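$\lim_{x \to a} f(x)$   limit of $f(x)$ as $x$ approaches $a$

$\lim_{x \to a^-} f(x)$   limit of $f(x)$ as $x$ approaches $a$ from the left

$\lim_{x \to a^+} f(x)$   limit of $f(x)$ as $x$ approaches $a$ from the right

$\frac{df}{dx}$   derivative of $f$

$\frac{d^n f}{dx^n}$   $n$-th derivative of $f$

$\frac{\partial f}{\partial x}$   partial derivative of $f(x, y, ...)$ with respect to variable $x$, where $x$ is a scalar

$\nabla f$   gradient of a function $f: \mathbb{R}^n \to \mathbb{R}$,
$$\nabla f(x_1, x_2, ..., x_n) = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}$$

$\Delta f$   Laplacian of a function $f: \mathbb{R}^n \to \mathbb{R}$,
$$\Delta f = \sum_{i=1}^{n} \frac{\partial^2 f}{\partial x_i^2}$$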
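
The gradient collects one partial derivative per input variable. One way to get a feel for this is to approximate each partial derivative numerically. The sketch below uses central finite differences, which is an illustrative technique chosen here and not something prescribed by the notation; the function `f` and the step size `h` are assumptions.

```python
def numerical_gradient(f, x, h=1e-5):
    """Approximate the gradient of f at point x with central differences.

    Each entry i approximates the partial derivative of f with respect
    to x_i while all other variables are held fixed.
    """
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def f(x):
    """Hypothetical example: f(x_1, x_2) = x_1^2 + 3 * x_2."""
    return x[0] ** 2 + 3 * x[1]


# The analytical gradient is [2 * x_1, 3], i.e. [4, 3] at the point (2, 1)
approx_grad = numerical_gradient(f, [2.0, 1.0])
```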


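$Hf$   Hessian of a function $f: \mathbb{R}^n \to \mathbb{R}$,
$$Hf = \begin{bmatrix}
\frac{\partial^2 f}{\partial x_1 \partial x_1} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2 \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_n \partial x_n}
\end{bmatrix}$$

$\frac{\partial f_j}{\partial x_i}$   partial derivative of component function $f_j$ with respect to the variable $x_i$, where $f: \mathbb{R}^n \to \mathbb{R}^m$, such that
$$f(\mathbf{x}) = \begin{bmatrix} f_1(\mathbf{x}) \\ f_2(\mathbf{x}) \\ \vdots \\ f_m(\mathbf{x}) \end{bmatrix}, \qquad \frac{\partial f}{\partial x_i} = \begin{bmatrix} \frac{\partial f_1}{\partial x_i} \\ \frac{\partial f_2}{\partial x_i} \\ \vdots \\ \frac{\partial f_m}{\partial x_i} \end{bmatrix}$$

$Df$   Jacobian matrix of $f$,
$$Df = \begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \dots & \frac{\partial f_1}{\partial x_n} \\
\frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \dots & \frac{\partial f_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \dots & \frac{\partial f_m}{\partial x_n}
\end{bmatrix}$$

$\int f(x)\,dx$   indefinite integral of $f$ (where $f$ is the derivative of $F$) with $f: \mathbb{R} \to \mathbb{R}$

$\int_a^b f(x)\,dx$   definite integral of $f$ (where $f$ is the derivative of $F$) from $a$ to $b$, with $f: \mathbb{R} \to \mathbb{R}$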
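
The relationship between $f$ and its antiderivative $F$ in the integral entries can be checked numerically: the definite integral from $a$ to $b$ should agree with $F(b) - F(a)$. The following sketch approximates the integral with the midpoint rule, which is simply one convenient numerical method assumed for this illustration; the functions `f` and `F` are made-up examples.

```python
def midpoint_integral(f, a, b, n=10_000):
    """Approximate the definite integral of f over [a, b].

    The interval is split into n slices of width h, and f is
    evaluated at the midpoint of each slice.
    """
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def f(x):
    return 3 * x ** 2   # hypothetical integrand


def F(x):
    return x ** 3       # antiderivative of f, since dF/dx = f


approx = midpoint_integral(f, 0.0, 2.0)
exact = F(2.0) - F(0.0)
```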


A.6 Probability and Statistics


$P(A \cap B)$   probability that events $A$ and $B$ both occur

$P(A \cup B)$   probability that event $A$ or $B$ occurs

$P(A \mid B)$   conditional probability of $A$ given $B$

$E(X), \mu_X$   expected value (mean) of a random variable $X$;
$E(X) = \sum_{i=1} p_i x_i$ for a discrete random variable $X$ with values $x_1, x_2, \dots$ and probabilities $p_1, p_2, \dots$;
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$ for a continuous random variable with probability density function $f(x)$

$\bar{X}$   sample average of numerical data $X_1, ..., X_n$,
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

$\mathrm{var}(X), \sigma_X^2$   variance of a random variable $X$,
$$\mathrm{var}(X) = E\left[(X - \mu_X)^2\right] = E(X^2) - E(X)^2$$

$s_X^2$   sample variance of numerical data $X_1, ..., X_n$,
$$s_X^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
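
As a worked example of these definitions, the sketch below computes the expected value and variance of a fair six-sided die (a hypothetical example distribution) and the sample average and sample variance of a small made-up data set. Exact fractions are used for the die so that the two forms of the variance can be compared without rounding error.

```python
import statistics
from fractions import Fraction

# Fair six-sided die: values x_i with probabilities p_i = 1/6
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

expected = sum(p * x for p, x in zip(probs, values))            # E(X)
second_moment = sum(p * x ** 2 for p, x in zip(probs, values))  # E(X^2)
var_shortcut = second_moment - expected ** 2                    # E(X^2) - E(X)^2
var_definition = sum(p * (x - expected) ** 2 for p, x in zip(probs, values))

# Made-up numerical data X_1, ..., X_n
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
sample_mean = sum(data) / n
sample_var = sum((x - sample_mean) ** 2 for x in data) / n      # divides by n

# statistics.pvariance also divides by n, matching the formula above;
# statistics.variance divides by n - 1 instead.
matches_stdlib = statistics.pvariance(data) == sample_var
```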


$\mathrm{std}(X), \sigma_X$   standard deviation of a random variable, square root of the variance

$s_X$   sample standard deviation, the square root of the sample variance $s_X^2$

$\mathrm{cov}(X, Y)$   covariance of two random variables $X$ and $Y$,
$$\mathrm{cov}(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y)$$

$s_{XY}$   sample covariance of numerical data $X_1, ..., X_n$ and $Y_1, ..., Y_n$,
$$s_{XY} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$

$\mathrm{corr}(X, Y)$   correlation coefficient of two random variables $X$ and $Y$,
$$\mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$$

$H(X)$   entropy of a random variable $X$;
discrete: $H(X) = -\sum_x P(X = x) \log_b P(X = x)$
continuous: $H(X) = -\int_{-\infty}^{\infty} f(x) \log_b f(x)\,dx$

PMF   probability mass function of a discrete random variable, $f(x) = P(X = x)$

CDF   cumulative distribution function of a continuous random variable, $F(x) = P(X \le x)$

PDF   probability density function of a continuous random variable, $P(X \in [a, b]) = \int_a^b f(x)\,dx$

$X \sim D$   random variable $X$ has a distribution $D$

$\hat{\theta}$   estimator of a parameter $\theta$

$\mathcal{N}(x, \mu, \sigma^2)$   normal (Gaussian) distribution over $x$ with mean $\mu$ and variance $\sigma^2$
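
The sample covariance, the correlation coefficient, and the discrete entropy can be sketched in a few lines of Python. The helper names and data are hypothetical. Two assumptions are worth stating: the correlation below is computed from sample quantities (dividing by n, as in the sample formulas above) rather than from the true distribution, and terms with zero probability are skipped in the entropy, following the common convention that 0 · log 0 = 0.

```python
import math


def sample_cov(xs, ys):
    """Sample covariance, dividing by n as in the formula for s_XY."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def sample_corr(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    std_x = math.sqrt(sample_cov(xs, xs))  # cov(X, X) is the variance of X
    std_y = math.sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (std_x * std_y)


def entropy(probs, base=2):
    """Discrete entropy: -sum of P(X = x) * log_b P(X = x)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


# Made-up data in which y is exactly twice x
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
cov_xy = sample_cov(xs, ys)
corr_xy = sample_corr(xs, ys)
fair_coin_entropy = entropy([0.5, 0.5])  # in bits, since base = 2
```

Because `ys` is a positive multiple of `xs`, the correlation coefficient comes out as 1 up to floating-point rounding.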


A.7 Numbers
e Euler’s number, mathematical constant approximated by 2.71828

π "pi", mathematical constant approximated by 3.14159

∞ infinity symbol

$1.234 \times 10^5$   scientific notation for 123,400; also written as 1.234E05

<   less than sign; for example, x < 10 means that x is smaller than 10

≪   much less than sign

>   greater than sign; for example, x > 10 means that x is larger than 10

≫   much greater than sign
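
For reference, the constants and the scientific notation above have direct counterparts in Python's standard library. This is a small sketch; the variable names are chosen only for illustration.

```python
import math

euler = math.e        # Euler's number
pi = math.pi          # the constant pi
infinity = math.inf   # floating-point representation of infinity

# Scientific notation: 1.234e5 is Python's way of writing 1.234 × 10^5
value = 1.234e5
formatted = f"{value:.3E}"   # format with three decimals in E-notation
```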

A.8 Approximation
≈   approximate equality; for instance, e ≈ 2.71828 is an approximation of Euler’s number

$f(x) \sim g(x)$   symbol to assert that the ratio of two functions approaches 1:
$\lim_{x \to 0} \frac{f(x)}{g(x)} = 1$, if $x$ is small;
$\lim_{x \to \infty} \frac{f(x)}{g(x)} = 1$, if $x$ is large

f(x) ∝ g(x)   the two functions f(x) and g(x) are proportional to each other

$T(n) \in O(n^2)$   big-O notation; the running time $T(n)$ of an algorithm is asymptotically bounded by $n^2$, that is, the algorithm has a time complexity on the order of $n^2$
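
The statement that the ratio of two functions approaches 1 can be observed numerically. In this sketch, `f` and `g` are hypothetical functions chosen so that the lower-order term of `f` becomes negligible for large n; the tolerance used for the approximate-equality check is likewise an assumption.

```python
import math


def f(n):
    return n ** 2 + 3 * n   # hypothetical function with a lower-order term


def g(n):
    return n ** 2           # dominant term of f


# The ratio f(n) / g(n) = 1 + 3 / n gets closer to 1 as n grows
ratios = [f(n) / g(n) for n in (10, 1_000, 100_000)]

# Approximate equality with an explicit absolute tolerance
e_is_close = math.isclose(math.e, 2.71828, abs_tol=1e-5)
```

The same example also illustrates the big-O entry: since f(n) / g(n) stays bounded as n grows, f(n) is in O(n²).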


A.9 Logic
⇒   implication operator;
for example, A ⇒ B translates to "A implies B" or "if A then B" (or "A only if B")

⇔   equivalence operator (if and only if, iff);
for example, A ⇔ B translates to "A if and only if B" or "if A then B and if B then A"

∧   logical conjunction, and;
for example, A ∧ B means "A and B"

∨   logical (inclusive) disjunction, or;
for example, A ∨ B means "A or B"

¬   negation, not;
for example, ¬A means "not A"; if A is true then ¬A is false, and vice versa

∀   universal quantifier, means "for all";
for example, "∀x ∈ ℝ, x > 1" translates to "for all real numbers x, x is greater than one"

∃   existential quantifier, means "there exists";
for example, "∃x ∈ A, f(x)" translates to "there is an element x in set A for which the predicate f(x) holds true"
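
Most of the logic symbols have direct counterparts in Python. The sketch below is one possible mapping; the helpers `implies` and `iff` are hypothetical names (Python has no built-in implication operator), and the quantifiers are evaluated over a small finite set, since `all` and `any` cannot range over all real numbers.

```python
def implies(a, b):
    """A ⇒ B is false only when A is true and B is false."""
    return (not a) or b


def iff(a, b):
    """A ⇔ B holds when A and B have the same truth value."""
    return implies(a, b) and implies(b, a)


conjunction = True and False   # A ∧ B
disjunction = True or False    # A ∨ B (inclusive)
negation = not True            # ¬A

# Quantifiers over a finite, made-up set A
A = {1, 2, 3}
for_all_positive = all(x > 0 for x in A)      # ∀x ∈ A, x > 0
exists_even = any(x % 2 == 0 for x in A)      # ∃x ∈ A, x is even
exists_large = any(x > 10 for x in A)         # ∃x ∈ A, x > 10
```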

Abbreviations and Terms

CNN [Convolutional Neural Network]
