Example Notation for Deep Learning

Ian Goodfellow
Yoshua Bengio
Aaron Courville
Contents

Notation

1 Commentary
1.1 Examples

Bibliography

Index
Notation

This section provides a concise reference describing notation used throughout this
document. If you are unfamiliar with any of the corresponding mathematical
concepts, Goodfellow et al. (2016) describe most of these ideas in chapters 2–4.

Numbers and Arrays


a            A scalar (integer or real)
a            A vector
A            A matrix
A            A tensor
I_n          Identity matrix with n rows and n columns
I            Identity matrix with dimensionality implied by context
e^(i)        Standard basis vector [0, . . . , 0, 1, 0, . . . , 0] with a 1 at position i
diag(a)      A square, diagonal matrix with diagonal entries given by a
a            A scalar random variable
a            A vector-valued random variable
A            A matrix-valued random variable
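
In math_commands.tex, symbols like these are produced by short macros. A hedged sketch of what using them might look like; the exact macro names (a one-letter type prefix such as v for vector, m for matrix, t for tensor) are assumptions about the file's conventions and should be verified against it:

    % Assumed macro names from math_commands.tex -- verify against the file.
    The vector $\va$, the matrix $\mA$, and the tensor $\tA$
    appear in the product $\mA \va$; $\mI$ denotes the identity matrix.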


Sets and Graphs


A                   A set
R                   The set of real numbers
{0, 1}              The set containing 0 and 1
{0, 1, . . . , n}   The set of all integers between 0 and n
[a, b]              The real interval including a and b
(a, b]              The real interval excluding a but including b
A \ B               Set subtraction, i.e., the set containing the elements of A that are not in B
G                   A graph
Pa_G(x_i)           The parents of x_i in G

Indexing
a_i          Element i of vector a, with indexing starting at 1
a_{-i}       All elements of vector a except for element i
A_{i,j}      Element i, j of matrix A
A_{i,:}      Row i of matrix A
A_{:,i}      Column i of matrix A
A_{i,j,k}    Element (i, j, k) of a 3-D tensor A
A_{:,:,i}    2-D slice of a 3-D tensor
a_i          Element i of the random vector a

Linear Algebra Operations


A^⊤          Transpose of matrix A
A^+          Moore-Penrose pseudoinverse of A
A ⊙ B        Element-wise (Hadamard) product of A and B
det(A)       Determinant of A


Calculus
dy/dx                    Derivative of y with respect to x
∂y/∂x                    Partial derivative of y with respect to x
∇_x y                    Gradient of y with respect to x
∇_X y                    Matrix derivatives of y with respect to X
∇_X y                    Tensor containing derivatives of y with respect to X
∂f/∂x                    Jacobian matrix J ∈ R^(m×n) of f : R^n → R^m
∇²_x f(x) or H(f)(x)     The Hessian matrix of f at input point x
∫ f(x) dx                Definite integral over the entire domain of x
∫_S f(x) dx              Definite integral with respect to x over the set S
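
Without the helper macros, a few of these expressions can be typeset in plain LaTeX as follows; this is a sketch, not the exact source of the table:

    % Plain-LaTeX renderings of some calculus entries
    % (requires amsmath, amssymb, and bm).
    \[
    \frac{dy}{dx}, \quad \frac{\partial y}{\partial x}, \quad
    \nabla_{\bm{x}} y, \quad \nabla_{\bm{x}}^2 f(\bm{x}), \quad
    \int f(x)\,dx, \quad \int_{\mathbb{S}} f(x)\,dx
    \]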

Probability and Information Theory


a ⊥ b                      The random variables a and b are independent
a ⊥ b | c                  They are conditionally independent given c
P(a)                       A probability distribution over a discrete variable
p(a)                       A probability distribution over a continuous variable, or over a variable whose type has not been specified
a ∼ P                      Random variable a has distribution P
E_{x∼P}[f(x)] or E f(x)    Expectation of f(x) with respect to P(x)
Var(f(x))                  Variance of f(x) under P(x)
Cov(f(x), g(x))            Covariance of f(x) and g(x) under P(x)
H(x)                       Shannon entropy of the random variable x
D_KL(P ‖ Q)                Kullback-Leibler divergence of P and Q
N(x; µ, Σ)                 Gaussian distribution over x with mean µ and covariance Σ
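
A couple of these entries in plain LaTeX (a sketch; math_commands.tex may well define shorthands for them):

    \[
    D_{\mathrm{KL}}(P \,\|\, Q), \qquad
    \mathcal{N}(\bm{x}; \bm{\mu}, \bm{\Sigma}), \qquad
    \mathbb{E}_{x \sim P}[f(x)]
    \]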


Functions
f : A → B      The function f with domain A and range B
f ◦ g          Composition of the functions f and g
f(x; θ)        A function of x parametrized by θ. (Sometimes we write f(x) and omit the argument θ to lighten notation)
log x          Natural logarithm of x
σ(x)           Logistic sigmoid, 1 / (1 + exp(−x))
ζ(x)           Softplus, log(1 + exp(x))
||x||_p        L^p norm of x
||x||          L^2 norm of x
x^+            Positive part of x, i.e., max(0, x)
1_condition    is 1 if the condition is true, 0 otherwise
Sometimes we use a function f whose argument is a scalar but apply it to a vector, matrix, or tensor: f(x), f(X), or f(X). This denotes the application of f to the array element-wise. For example, if C = σ(X), then C_{i,j,k} = σ(X_{i,j,k}) for all valid values of i, j, and k.

Datasets and Distributions


p_data           The data generating distribution
p̂_data           The empirical distribution defined by the training set
X                A set of training examples
x^(i)            The i-th example (input) from a dataset
y^(i) or y^(i)   The target associated with x^(i) for supervised learning
X                The m × n matrix with input example x^(i) in row X_{i,:}

Chapter 1

Commentary

This document shows how to use the accompanying files and offers some commentary on them. The files are math_commands.tex and notation.tex. The file math_commands.tex includes several useful LaTeX macros, and notation.tex defines a notation page that can be placed at the front of any publication.
We developed these files while writing Goodfellow et al. (2016). We release
these files for anyone to use freely, in order to help establish some standard notation
in the deep learning community.
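
A minimal sketch of how a publication might pull these files in; the document class and package list are assumptions, and the \input paths assume the two files sit next to the main .tex file:

    % Preamble sketch -- package choices are assumptions; math_commands.tex
    % uses bold math symbols, so bm (or similar) is likely needed.
    \documentclass{article}
    \usepackage{amsmath,amssymb,bm}
    \input{math_commands}   % defines the notation macros
    \begin{document}
    \input{notation}        % typesets the notation page at the front
    % ... body of the publication ...
    \end{document}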

1.1 Examples
We include this section as an example of some LaTeX commands and the macros
we created for the book.
Citations that support a sentence without actually being used in the sentence
should appear at the end of the sentence, using the citep command:

Inventors have long dreamed of creating machines that think. This desire dates back to at least the time of ancient Greece. The mythical figures Pygmalion, Daedalus, and Hephaestus may all be interpreted as legendary inventors, and Galatea, Talos, and Pandora may all be regarded as artificial life (Ovid and Martin, 2004; Sparkes, 1996; Tandy, 1997).
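
The end of that passage, sketched in source form with natbib's citep; the citation keys are hypothetical placeholders for your .bib entries:

    % Citation keys are hypothetical -- match them to your bibliography.
    ... and Galatea, Talos, and Pandora may all be regarded as
    artificial life \citep{ovid2004, sparkes1996, tandy1997}.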

When the authors of a document or the document itself are a noun in the
sentence, use the citet command:

Mitchell (1997) provides a succinct definition of machine learning: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
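
With citet the author names become part of the sentence; again the citation key is a hypothetical placeholder:

    \citet{mitchell1997} provides a succinct definition of machine
    learning: ``A computer program is said to learn from experience
    $E$ \ldots''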

When introducing a new term, use the newterm macro to highlight it. If
there is a corresponding acronym, put the acronym in parentheses afterward. If
your document includes an index, also use the index command.

Today, artificial intelligence (AI) is a thriving field with many practical applications and active research topics.
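
A sketch of the source for that sentence, assuming newterm takes the term as its single argument:

    % newterm's signature is assumed from its described behavior.
    Today, \newterm{artificial intelligence} (AI) is a thriving field
    with many practical applications and active research topics.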

Sometimes you may want to make many entries in the index that all point to a
canonical index entry:

One of the simplest and most common kinds of parameter norm penalty is the squared L^2 parameter norm penalty, commonly known as weight decay. In other academic communities, L^2 regularization is also known as ridge regression or Tikhonov regularization.
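
Standard makeidx syntax supports such redirections with the |see modifier; a sketch of how the redirecting entries might be written:

    % |see creates an index entry that points to the canonical one.
    ... is also known as ridge regression\index{Ridge regression|see{weight decay}}
    or Tikhonov regularization\index{Tikhonov regularization|see{weight decay}}.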

To refer to a figure, use either figref or Figref, depending on whether you want to capitalize the resulting word in the sentence.

See figure 1.1 for an example of how to include graphics in your document. Figure 1.1 shows how to include graphics in your document.
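
A sketch of those two sentences in source form; the label fig:venn is a hypothetical placeholder:

    See \figref{fig:venn} for an example of how to include graphics
    in your document. \Figref{fig:venn} shows how to include graphics
    in your document.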

Similarly, you can refer to different sections of the book using partref, Partref,
secref, Secref, etc.

You are currently reading section 1.1.
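
And likewise for sections; the label sec:examples is a hypothetical placeholder:

    You are currently reading \secref{sec:examples}.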

Acknowledgments
We thank Catherine Olsson and Úlfar Erlingsson for proofreading and review of
this manuscript.


[Figure 1.1 is a nested Venn diagram: AI contains machine learning, which contains representation learning, which contains deep learning. Examples at each level: knowledge bases (AI), logistic regression (machine learning), shallow autoencoders (representation learning), and MLPs (deep learning).]
Figure 1.1: An example of a figure. The figure is a PDF displayed without being rescaled within LaTeX. The PDF was created at the right size to fit on the page, with the fonts at the size they should be displayed. The fonts in the figure are from the Computer Modern family so they match the fonts used by LaTeX.

Bibliography

Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.

Mitchell, T. M. (1997). Machine Learning. McGraw-Hill, New York.

Ovid and Martin, C. (2004). Metamorphoses. W.W. Norton.

Sparkes, B. (1996). The Red and the Black: Studies in Greek Pottery. Routledge.

Tandy, D. W. (1997). Works and Days: A Translation and Commentary for the Social
Sciences. University of California Press.

Index

Artificial intelligence
Conditional independence
Covariance
Derivative
Determinant
Element-wise product, see Hadamard product
Graph
Hadamard product
Hessian matrix
Independence
Integral
Jacobian matrix
Kullback-Leibler divergence
Matrix
Norm
Ridge regression, see weight decay
Scalar
Set
Shannon entropy
Sigmoid
Softplus
Tensor
Tikhonov regularization, see weight decay
Transpose
Variance
Vector
Weight decay
