Fundamentals of Computing: Leonid A. Levin
Acknowledgments. I am grateful to the University of California at Berkeley, its MacKey Professorship fund
and Manuel Blum, who made it possible for me to teach this course. The opportunity to attend lectures of M. Blum
and Richard Karp, and many ideas of my colleagues at BU and MIT, benefited these lectures greatly. I am also
grateful to the California Institute of Technology for a semester with a light teaching load in a stimulating environment,
which enabled me to rewrite the students' notes. NSF grants \#DCR-8304498, DCR-8607492, CCR-9015276 also supported
the work. And most of all I am grateful to the students [see 6.3] who not only originally wrote these notes,
but also influenced the lectures a lot by providing very intelligent reactions and criticism.
Contents

I Basics
1 Deterministic Models; Polynomial Time \& Church's Thesis
  1.1 Rigid Models
  1.2 Pointer Machines
  1.3 Simulation

II Mysteries
4 Nondeterminism; Inverting Functions; Reductions
  4.1 An Example of a Narrow Computation: Inverting a Function
  4.2 Complexity of NP Problems
  4.3 An NP-Complete Problem: Tiling
5 Probability in Computing
  5.1 A Monte-Carlo Primality Tester
  5.2 Randomized Algorithms and Random Inputs
  5.3 Arithmetization: One-Player Games with Randomized Transition
6 Randomness
  6.1 Randomness and Complexity
  6.2 Pseudo-randomness
  6.3 Cryptography
References
Copyright © 2025 by the author. Last revised: February 2, 2025.
Part I
Basics
1 Deterministic Models; Polynomial Time \& Church's Thesis
Sections 1, 2 study deterministic computations. Non-deterministic aspects of computations (inputs, interac-
tion, errors, randomization, etc.) are crucial and challenging in advanced theory and practice. Defining them
as an extension of deterministic computations is simple. The latter, however, while conceptually simpler,
require elaborate models for definition. These models may be sophisticated if we need a precise measure-
ment of all required resources. However, if we only need to define what is computable and get a very rough
magnitude of the needed resources, all reasonable models turn out to be equivalent, even to the simplest ones. We
will pay significant attention to this surprising and important fact. The simplest models are most useful for
proving negative results, and the strongest ones for positive results.
We start with terminology common to all models, gradually making it more specific to those we actually
study. We represent computations as graphs: the edges reflect various relations between nodes (events).
Nodes and edges have attributes: labels, states, colors, parameters, etc. (affecting the computation or its analysis).
Causal edges run from each event to all events essential for its occurrence or attributes. They form a directed
acyclic graph (though cycles may be added artificially to mark the external input parts of the computation).
We will study only synchronous computations. Their nodes have a time parameter. It reflects logical
steps, not necessarily a precise value of any physical clock. Causal edges only span short (typically, \leq 3
moments) time intervals. One event among the causes of a node is called its parent. Pointer edges connect
the parent of each event to all its other possible causes and reflect connections that allow simultaneous
events to interact and have a joint effect. Pointers with the same source have different labels. The (labeled)
subgraph of events/edges at a given time is an instant memory configuration of the model.
Each non-terminal configuration has active nodes/edges around which it may change. The models with
only a small active area at any step of the computation are sequential. Others are called parallel.
Growth Rates (typically expressed as functions of bit length n = \| x, y\| of input/output x/y):
O, \Omega: f(n) = O(g(n))^1 \Leftrightarrow g(n) = \Omega(f(n)) \Leftrightarrow \sup_n f(n)/g(n) < \infty.
o, \omega: f(n) = o(g(n)) \Leftrightarrow g(n) = \omega(f(n)) \Leftrightarrow \lim_{n\to\infty} f(n)/g(n) = 0.
\Theta: f(n) = \Theta(g(n)) \Leftrightarrow (f(n) = O(g(n)) and g(n) = O(f(n))).
Here are a few examples of frequently appearing growth rates: negligible (\log n)^{O(1)}; moderate n^{\Theta(1)}
(called polynomial or P, as in P-time); infeasible: 2^{n^{\Omega(1)}}, also n! = (n/e)^n \sqrt{\pi(2n+1/3)} + \varepsilon/n, \varepsilon \in [0, .1].^2
The reason for ruling out exponential (and neglecting logarithmic) rates is that the known Universe is
too small to accommodate exponents. Its radius is about 46.5 giga-light-years \sim 2^{204} Planck units. A system
of \gg R^{1.5} atoms packed in a radius of R Planck units collapses rapidly, be it Universe-sized or a neutron star. So
the number of atoms is < 2^{306} \ll 4^{4^4} \ll 5!!.
^1 This is a customary but somewhat misleading notation. The clear notations would be like f(n) \in O(g(n)).
^2 A rougher estimate follows by computing \ln n! = t\ln(t/e)\big|_{t=1.5}^{n+.5} + O(1), using that |\sum_{i=2}^n g(i) - \int_{1.5}^{n+.5} g(t)\,dt|
is bounded by the total variation v of g'/8. So for monotone g'(t) = \ln'(t) = 1/t the O(1) is < v < 1/12.
An example: the Game of Life (GL). GL is a plane grid of cells, each holding a 1-bit state (dead/alive)
and pointers to the 8 adjacent cells. A cell keeps its state (dead or alive) if the number i of its live neighbors is 2.
It becomes (or stays) alive if i=3. In all other cases it dies (of overcrowding or loneliness) or stays dead.
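For concreteness, here is a minimal Python sketch of one GL step; the set-of-live-cells representation and the glider below are illustrative choices, not part of the model:

    # One step of the Game of Life; a configuration is the set of live cells.
    from collections import Counter

    def gl_step(live):
        # Count the live neighbors of every cell adjacent to some live cell.
        counts = Counter((x + dx, y + dy) for (x, y) in live
                         for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                         if (dx, dy) != (0, 0))
        # i = 3: becomes (or stays) alive; i = 2: keeps its state; else dead.
        return {c for c, i in counts.items() if i == 3 or (i == 2 and c in live)}

    glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
    print(gl_step(glider))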
A simulation of a machine M1 by M2 is a correspondence between memory configurations of M1 and
M2 which is preserved during the computation (possibly with some time dilation). Such constructions show
that the computation of M1 on any input x can be performed by M2 as well. GL can simulate any CA (see
a sketch of an ingenious proof in the last section of [Berlekamp, Conway, Guy 82]) in this formal sense:
We fix space and time periods a, b. Cell (i, j) of GL is mapped to cell (\lfloor i/a\rfloor, \lfloor j/a\rfloor) of CA M (com-
pressing a \times a blocks). We represent cell states of M by states of a \times a blocks of GL. This correspondence
is preserved after any number t of steps of M and bt steps of GL, regardless of the starting configuration.
Exercise. Design a machine of each model (TM, CA, KM, PPM) which determines if an input string x
has the form ww, w \in \{a, b\}^*. Analyze time (depth) and space. KM/PPM take input x in the form of colors
of edges in a chain of nodes, with the root linked to both ends. The PPM nodes also have pointers to the root.
Below are hints for TM, SPM, CA. The space is O(\|x\|) in all three cases.
Turing and Pointer Machines. The TM first finds the middle of ww by capitalizing the letters at both ends
one by one. Then it compares the two halves letter by letter, lowering their case. The complexity is T(x) =
O(\|x\|^2). The SPM acts similarly, except that the root keeps and updates pointers to the borders between
the upper- and lower-case substrings. This allows constant-time access to these borders. So, T(x) = O(\|x\|).
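A hedged Python sketch of this strategy (ordinary string indexing stands in for the TM head motions, so only the logic, not the O(\|x\|^2) cost, is reproduced):

    # Find the middle by 'capitalizing' one letter at each end per pass,
    # then compare the two halves letter by letter.
    def is_ww(x):
        lo, hi = 0, len(x) - 1        # frontiers of still-lowercase letters
        while lo < hi:                # one capitalization at each end
            lo, hi = lo + 1, hi - 1
        if len(x) % 2:                # odd length cannot have the form ww
            return False
        mid = len(x) // 2
        return all(x[i] == x[mid + i] for i in range(mid))

    print(is_ww("abab"), is_ww("abba"))  # True False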
Cellular Automata. The computation starts with the leftmost cell sending two signals to the right. Upon
reaching the end, the first signal turns back. The second signal propagates three times slower, so they meet in the
middle of ww and disappear. While alive, the second signal copies the input field i of each cell into a special
field c. The c symbols will try to move right whenever the next cell's c field is blank. So the chain of these
symbols alternating with blanks will start moving right from the middle of ww. Upon reaching the end they
will push the blanks out and pack themselves back into a copy of the left half of ww shifted right. When a c
symbol does not have a blank at the right to move to, it compares itself with the i field of the same cell. If
they differ, a signal is generated which halts all activity and rejects x. If all comparisons are successful, the
last c generates the accepting signal. The depth is: T (x) = O(\| x\| ).
1.3 Simulation
We have considered several models of computation. We will now see how the simplest of them -- the Turing
Machine -- can simulate all others: these powerful machines can compute no more functions than the TM.
The Church-Turing Thesis is a generalization of this conclusion: TMs can compute every function computable
in any thinkable physical model of computation. This is not a math theorem because the notion of model
is not formally specified. But the long history of studying ways to design real and ideal computing devices
makes it very convincing. Moreover, this Thesis has a stronger Polynomial Time version which bounds
the volume of computation required by the TM simulation by a polynomial of the volume used by the other
models. Both forms of the Thesis play a significant role in the foundations of Computer Science.
PKM Simulation of PPM. For convenience, we assume all PPM nodes have pointers to the root. A PPM
configuration is represented in the PKM with extra colors l, r, u used in a u-colored binary tree added to each
node X, so that all (unlimited in number) PPM pointers to X are reconnected to its leaves, and inverses,
colored l, r, are added to all pointers. The number of pointers increases at most 4 times. To simulate the PPM, X
gets a binary name formed by the l, r colors on its path through the root tree, and broadcasts it down
its own tree. For the pulling stage, X extends its tree to double depth and merges (with combined colors) its
own pointers to nodes with identical names. Then X re-colors its pointers as the PPM program requires and
rebalances its tree. This simulation of a PPM step takes polylogarithmic parallel time.
TM Simulation of PKM. We assume the PKM keeps a constant degree and a roughly balanced root
tree (to yield short node names as described above). The TM tape reflects the PKM configuration as the list of all
pointers, sorted by source name, then by color. The TM's transition table reflects the PKM program. To
simulate the PKM's pulling stage, the TM creates a copy of each pointer and sorts the copies by their sinks. Now each
pointer, located at its source, has a copy near its sink. So both components of 2-pointer paths are nearby: the
special double-colored pointers can be created and moved to their sources by re-sorting on the source names.
The re-coloring stage is straightforward, as all relevant pointers having the same source are located together.
Once the root has no active pointers, the Turing machine stops and its tape represents the PKM output. If a
PPM computes a function f(x) in t(x) steps, using s(x) nodes, the simulating TM uses space S = O(s \log s)
(O(\log s) bits for each of O(s) pointers) and time T = O(S^2)t, as TM sorting takes quadratic time.
Squaring matters! TM cannot outperform Bubble Sort. Is its quadratic overhead a big deal? In a short time all
silicon gates on your PC run, say, X = 10^{23} \sim 2^{2^{6.25}} clock cycles combined. Silicon parameters double almost annually.
Decades may bring micron-thin things that can sail sunlight in space in clouds of great computing and physical (light
beam) power. Centuries may turn them into a Dyson Sphere enveloping the solar system. Still, the power of such
an ultimate computer is limited by the number of photons the Sun emits per second: Y \sim 2^{2^{7.25}} = X^2. Giga-years may
turn much of the known universe into a computer, but its might is still limited by its total entropy 2^{2^{8.25}} = Y^2.
Faster PPM Simulations. Parallel Bubble-Sort on CA or Merge-Sort on sequential FCM take nearly
linear time. Parallel FCM can do much better [Ofman 65]. It represents and updates pointer graphs as the
above TM. All steps are straightforward to do locally in parallel polylog time except sorting of pointers. We
need to create a fixed connection sorting network. Sophisticated networks sort arbitrary arrays of n integers
in O(log n) parallel steps. We need only a simpler polylog method. Merge-Sort splits an array of two or more
entries in two halves and sorts each recursively. Batcher-Merge combines two sorted lists in O(log n) steps.
Batcher Merge. A bitonic cycle is the combination of two sorted arrays (one may be shorter),
connected by max-to-max and min-to-min entries. Entries in a contiguous half (high-half) of the cycle
are \ge all entries in the other (low) half. Each half (with its ends connected) forms a bitonic cycle itself.
The Shuffle Exchange graph links nodes in a 2^k-node array to their flips and shifts. The flip flips the
highest bit of a node's address; the shift cycle-shifts that bit to the end, or switches the 0^k address with 1^k.
We merge-sort two sorted arrays given as a bitonic cycle on such a graph as follows.
Comparing each entry with its flip (half-a-cycle away), and switching if wrongly ordered, fits the high and
low halves into respectively the first and last halves of the array. (This rotates the array, and then its left
and right halves.) We do so for each half recursively (decrementing k via graph's shift edges).
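Here is a hedged Python sketch of Batcher's merge in its array form (rather than on the shuffle-exchange graph): the two sorted inputs are joined into a bitonic sequence, each entry is compared with its flip half-a-block away, and both halves are merged recursively; power-of-2 sizes are assumed for simplicity.

    def bitonic_merge(a):
        n = len(a)                    # n is a power of 2
        if n == 1:
            return a
        half = n // 2
        for i in range(half):         # compare each entry with its 'flip'
            if a[i] > a[i + half]:
                a[i], a[i + half] = a[i + half], a[i]
        return bitonic_merge(a[:half]) + bitonic_merge(a[half:])

    def merge(x, y):                  # x, y sorted; reversing y makes x+y bitonic
        return bitonic_merge(x + y[::-1])

    print(merge([1, 4, 6, 9], [2, 3, 5, 7]))

Each recursion level corresponds to one parallel compare-exchange round on the graph; the sequential loop above only simulates it, giving the O(\log n) parallel steps of the text.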
Goedel's Theorem
There is no complete function among the total computable ones, as this class is closed under negation.
So the universal in R function u (and u_2 = (u mod 2)) has no total computable extensions.
Formal proof systems are computable functions A(P) which check if P is an acceptable proof and output
the proven statement. \vdash s means s = A(P) for some P. A is rich iff it allows computable translations
s_x of statements "u_2(x) = 0", provable whenever true, and refutable (\vdash \neg s_x) whenever u_2(x) = 1. A is
consistent iff at most one of any such pair s_x, \neg s_x is provable, and complete iff at least one of them
always (even when u(x) diverges) is. Rich consistent and complete formal systems cannot exist, since they
would provide an obvious total extension u_A of u_2 (by exhaustive search for P to prove or refute s_x). This
is the famous Goedel's Theorem -- one of the shocking surprises of 20th century science. (Here A is any
extension of the formal Peano Arithmetic; we skip the details of its formalization and proof of richness.)^3
Computable Functions. Another byproduct is that the Halting (of u(x)) Problem would yield a total
extension of u and, thus, is not computable. This is the source of many other uncomputability results.
Another source is an elegant Fixed Point Theorem by S. Kleene: any total computable transformation
A of programs (prefixes) maps some program into an equivalent one. Indeed, the complete/universal u(ps)
intersects the computable u(A(p)s). This implies, e.g., Rice's theorem: the only computable invariant (i.e. the
same on programs computing the same functions) property of programs is constant (exercise).
Computable (partial and total) functions are also called recursive (due to an alternative definition).
Their ranges (and, equivalently, domains) are called (computably) enumerable or c.e. sets. A c.e. set with
a c.e. complement is called computable (as is its yes/no characteristic function) or decidable. A function
is computable iff its graph is c.e. A c.e. graph of a total function is computable. Each infinite c.e. set is the
range of an injective total computable function ("enumerating" it, hence the name c.e.).
We can reduce the membership problem of a set A to that of a set B by finding a computable function f
s.t. x \in A \Leftrightarrow f(x) \in B. Then A is called m- (or many-to-1-) reducible to B. A more complex Turing
reduction is given by an algorithm which, starting from input x, interacts with B by generating strings s and
receiving answers to s \in? B questions. Eventually it stops and tells if x \in A. c.e. sets (like the Halting Problem)
to which all c.e. sets can be m-reduced are called c.e.-complete. One can show a set c.e.-complete (and,
thus, undecidable) by reducing the Halting Problem to it. Thus Yu. Matiyasevich proved c.e.-completeness of
the Diophantine Equations Problem: given a multivariate polynomial of degree 4 with integer coefficients, determine
if it has integer roots. The above (and related) concepts and facts are broadly used in the Theory of Algorithms
and should be learned from any standard text, e.g., [Rogers 67].
^3 A closer look at this proof reveals another famous Goedel theorem: Consistency C of A (expressible in A as divergence of
the search for contradictions) is itself an example of an unprovable \neg s_x. Indeed, u_2 intersects 1 - u_A for some prefix a. C implies
that u_A extends u_2 and, thus, u_2(a), u_A(a) both diverge. So, C \Rightarrow \neg s_a. This proof can be formalized in Peano Arithmetic, thus
\vdash C \Rightarrow \vdash \neg s_a. But \vdash \neg s_a implies u_A(a) converges, so \vdash C contradicts C: Consistency of A is provable in A if and only if false!
Definition: A function f(x) is constructible if it can be computed in volume V(x) = O(f(x)).
Here are two examples: 2^{\|x\|} is constructible, as V(x) = O(\|x\| \log \|x\|) \ll 2^{\|x\|}.
Yet, 2^{\|x\|} + h(x), where h(x) is 0 or 1 depending on whether U(x) halts within 3^{\|x\|} steps, is not.
Compression Theorem [Rabin 59]. For any constructible function f, there exists a function P_f such
that for all functions t, the following two statements are equivalent:
1. There exists an algorithm A such that A(x) computes P_f(x) in volume t(x) for all inputs x.
2. t is constructible and f(x) = O(t(x)).
Proof. Let the t-bounded Kolmogorov Complexity K_t(i|x) of i given x be the length of the shortest pro-
gram p for the Universal Multi-Head Turing Machine transforming x into i with < t volume of computation.
Let P_f(x) be the smallest i with 2K_t(i|x) > \log(f(x)/t) for all t. P_f is computed in volume f by generating
all i of low complexity, sorting them, and taking the first missing. It satisfies the Theorem, since computing
i = P_f(x) faster would violate the complexity bound defining it. (Some extra effort can make P Boolean.)
Speed-up Theorem [Blum 67]. There exists a total computable predicate P such that for any algorithm
computing P(x) in volume t(x), there exists another algorithm doing it in volume O(\log t(x)).
Though stated here for exponential speed-up, this theorem remains true with log replaced by any computable
unbounded monotone function. In other words, there is not even a nearly optimal algorithm to compute P.
The general case. So, the complexity of some predicates P cannot be characterized by a single con-
structible function f, as in the Compression Theorem. However, the Compression Theorem remains true (with
a harder proof) if the requirement that f is constructible is dropped (replaced with being computable).^4
In this form it is general enough that every computable predicate (or function) P satisfies the statement
of the theorem with an appropriate computable function f. There is no contradiction with Blum's Speed-up,
since the complexity f (not constructible itself) cannot be reached. See a review in [Seiferas, Meyer 95].
^4 The proof stands if constructibility of f is weakened to being semi-constructible, i.e. one with an algorithm A(n, x) running
in volume O(n) and such that A(n, x) = f(x) if n > f(x). The sets of programs t whose volumes (where finite) satisfy either
(1) or (2) of the Theorem (for computable P, f) are in \Sigma^0_2 (i.e. defined with 2 quantifiers). Both generate monotone classes of
constructible functions closed under \min(t_1, t_2)/2. Then any such class is shown to be the \Omega(f) for some semi-constructible f.
Theorem. Each position of any full information game has a winning strategy for one side.
(This theorem [Neumann, Morgenstern 44] fails for games with partial information: either player
may lose if his strategy is known to the adversary. E.g.: 1. Blackjack (21); 2. Each player picks a bit;
their equality determines the winner.) The game can be solved by playing all strategies against each other.
There are 2^n positions of length n, (2^n)^{2^n} = 2^{n 2^n} strategies, and 2^{n 2^{n+1}} pairs of them. For a 5-bit game
that is 2^{320}. The proof of this Theorem gives a much faster (but still exponential time!) strategy.
Proof. Make the graph of all \le \|x\|-bit positions and moves; set V = 0; reset V = v on T.
Repeat until idle: if V(x) = 0, set V(x) = a(x) \sup_m \{a(x) V(r(x, m))\}.
The procedure stops with V^{-1}(0) empty, since positions |r(x, m)| < |x| keep decreasing in our games.^5
Games may be categorized by the difficulty of computing r. We will consider only r computable in
linear space O(\|x\|). Then the 2^{2\|x\|} possible moves can be computed in exponential time, say 2^{3\|x\|}.
The algorithm tries each move in each step. Thus, its total running time is 2^{3\|x\|+1}: extremely slow
(2^{313} for a 13-byte game) but still much faster than the previous (double exponential) algorithm.
Exercise: the Match Game. Consider 3 boxes with 3 matches each: !!! !!! !!! .
The players alternate turns, taking any positive number of matches from a single box. One cannot leave the
table empty. Use the above algorithm to evaluate all positions and list the evaluations after each of its cycles.
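A hedged Python sketch of such an evaluation by backward induction (positions are sorted triples; the "cannot leave the table empty" rule is read here as: a player with no legal move loses):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def wins(pos):                    # True iff the player to move wins
        for i in range(3):
            for take in range(1, pos[i] + 1):
                nxt = list(pos); nxt[i] -= take
                if sum(nxt) == 0:     # leaving the table empty is illegal
                    continue
                if not wins(tuple(sorted(nxt))):
                    return True       # some move reaches a losing position
        return False                  # every legal move lets the opponent win

    print(wins((3, 3, 3)))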
Exercise: Modify the chess game by giving one side the right to make (if it chooses to) an extra
move out of turn during the first 10 moves. Prove that this side has a non-losing strategy.
^5 Our examples will assure "< |x|" by implicitly prepending non-terminal configurations with a counter of remaining steps.
Strategy: If u(x) does indeed halt within 2^{\|x\|} steps, then the initial configuration is true to the compu-
tation of u(x). Then L has an obvious (though hard to compute) winning strategy: just tell truly (and thus
always consistently) what actually happens in the computation. S will lose when t=1 and cannot decrease
any more. If the initial configuration is a lie, S can force L to lie all the way down to t = 1. How?
If the upper box A of a legal configuration is false, then the lower boxes B cannot all be true, since the
rules of u determine A uniquely from them. If S correctly points out a false B and brings it to the top on his
move, then L is forced to keep on lying. At time t=1 the lie is exposed: the configuration doesn't match the
actual input string x, i.e. is illegal.
Solving this game amounts to deciding correctness of the initial configuration, i.e. u(x) halting in 2^{\|x\|}
steps: impossible in time o(2^{\|x\|}). This Halting Game is artificial, but it still has a BHP flavor, though it does not
refer to exponents. We now reduce it to a nicer game (Linear Chess) to prove that one exponentially hard, too.
Space-Time Trade-off. Deterministic linear space computations are games where any position has at
most one (and easily computable) move. We know no general superlinear lower bound or subexponential
upper bound for time required to determine their outcome. This is a big open problem.
Recall that on a parallel machine: time is the number of steps until the last processor halts; space is the
amount of memory used; volume is the combined number of steps of all processors. "Small" will refer to
values bounded by a polynomial of the input length; "large" to exponential. Let us call computations narrow
if either time or space is polynomial, and compact if both (and, thus, volume too) are. An open question:
do all exponential volume algorithms (e.g., one solving Linear Chess) allow an equivalent narrow computation?
[Figure: narrow computations in the time-space plane: small time with large space, or small space with large time.]
Alternatively, can every narrow computation be converted into a compact one? This is
equivalent to the existence of a P-time algorithm for solving any fast game, i.e. a game with a P-time
transition rule and a move counter limiting the number of moves to a polynomial. The sec. 3.1 algorithm
can be implemented in parallel P-time for such games. The converse also holds, similarly to the Halting Game.
[Stockmeyer, Meyer 73] solve compact games in P-space: with moves M \subset \{0,1\}^*, run depth-first search on the
tree of all plays -- sequences of moves. On exiting each node, it is marked as the active player's win if
some move leads to a child so marked; else as his opponent's. Children's marks are then erased. Conversely,
compact games can simulate any P-space algorithm. Player A declares the result of the space-k, time-2^k
computation. If he lies, player B asks him to declare the memory state in the middle of that time interval,
and so by a k-step binary search catches A's lie on a mismatch of states at two adjacent times. This has
some flavor of trade-offs such as saving time at the expense of space in dynamic programming.
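A minimal Python rendition of this depth-first evaluation (the toy game, its moves function, and the move counter t are illustrative stand-ins; memory is O(t) for the search stack, as in the P-space bound):

    def active_player_wins(pos, moves, t):
        if t == 0:
            return False              # move counter exhausted: active player loses
        return any(not active_player_wins(nxt, moves, t - 1)
                   for nxt in moves(pos))

    # Toy game: positions are numbers, a move subtracts 1 or 2, 0 is terminal.
    print(active_player_wins(4, lambda n: [n - d for d in (1, 2) if n >= d], 10))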
Thus, fast games (i.e. compact alternating computations) correspond to narrow deterministic computa-
tions; general games (i.e. narrow alternating computations) correspond to large deterministic ones.
Part II
Mysteries
We now enter Terra Incognita by extending deterministic computations with tools like random choices, non-
deterministic guesses, etc., whose power is completely unknown. Yet many fascinating discoveries have been
made there, at which we will now take a glimpse.
1. Linear Programming: Given an integer n \times m matrix A and vector b, find a rational vector x with Ax < b.
Note: if n and the entries in A have \le k bits and x exists, then an O(nk)-bit x exists, too.
Solution: Dantzig's Simplex algorithm finds x quickly for many A.
Some A, however, take exponential time. After long frustrating efforts, a worst-case
P-time Ellipsoid Algorithm was finally found in [Yudin and A.S. Nemirovsky 76].
2. Primality test: Determine whether a given integer p has a factor.
Solution: A bad (exponential time) way is to try all 2^{\|p\|} possible integer factors of p.
More sophisticated algorithms, however, run fast (see section 5.1).
3. Graph Isomorphism Problem: Are two given graphs G_1, G_2 isomorphic?
I.e., can the vertices of G_1 be re-numbered so that it becomes equal to G_2?
Solution: Checking all n! enumerations of vertices is impractical
(for n = 100, their number exceeds the number of atoms in the known Universe).
[Luks 80] found an O(n^d)-step algorithm, where d is the degree. This is P-time for d = O(1).
4. Independent Edges (Matching):
Find a given number of independent (i.e., not sharing nodes) edges in a given graph.
Solution: Max flow algorithm solves a bipartite graph case.
The general case is solved with a more sophisticated algorithm by J. Edmonds.
Many other problems have been battled for decades or centuries and no P-time solution has been found.
Even modifications of the previous four examples have no known answers.
Padding Argument. First, we need to reduce it to some "standard" NP problem. An obvious candidate
is the problem "Is there w: U(v, w)?", where U is the universal Turing Machine simulating F(x, w) for
v = px. But U does not run in P-time, so we must restrict U to a u which stops within some P-time limit.
How can this fixed degree limit suffice to simulate any polynomial (even of higher degree) time? Let
the TM u(v, w) for v = 00...01px simulate \|v\|^2 steps of U(px, w) = F(x, w). If the padding of 0's in v is
sufficiently long, u will have enough time to simulate F, even though u runs in quadratic time, while F's
time limit may be, say, cubic (in a shorter "un-padded" string). So the NP problem F(x, w) is reduced to
u(v, w) by mapping instances x into f(x) = 0...01px = v, with \|v\| determined by the time limit for F.
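A toy Python illustration of this padding map (the names and the cubic time limit are hypothetical, chosen only to show the length bookkeeping):

    # Pad instance x of F (program p, time limit m^3, m = |px|) so that the
    # quadratic-time u has enough time: |v|^2 >= m^3, i.e. |v| >= m^1.5.
    def pad(p, x):
        m = len(p + x)
        target = int(m ** 1.5) + 1
        return "0" * max(0, target - m - 1) + "1" + p + x

    v = pad("PROG", "input")
    print(len(v), len(v) ** 2 >= len("PROGinput") ** 3)   # 28 True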
Notice that the program p for F is fixed. So, if some NP problem cannot be solved in P-time, then neither can
the problem \exists? w: u(v, w). Equivalently, if the latter IS solvable in P-time, then so is any search problem.
We do not know which of these alternatives is true. It remains to reduce the search problem u to Tiling.
Proof. As the input v and the guessed solution w are the same in both the right and the wrong tables,
the first 2 rows agree. The actual computation starts on the third row. Obviously, in the first mismatching
row a transition of some cell from the previous row is wrong. This is visible from the states in both rows of
this cell and the cell it points to, resulting in an impossible combination of four cells sharing a corner.
For a given x, the existence of w satisfying F(x, w) is equivalent to the existence of
a table with the prescribed first row, no halting state, and permissible patterns of each
four adjacent squares (cells). Converting the table into the Tiling Problem:
the cells in the table are separated by "---", the tiles by "...". Cut each cell into 4 parts
by a vertical and a horizontal line through its center and copy the cell's content in each
part. Combine into a tile each four parts sharing a corner of 4 cells. If these cells are
permissible in the table, then so is the respective tile.
[Diagram: a tile assembled from the adjacent quarters (u, v; v, x) of four neighboring cells.]
So, any P-time algorithm extending a given first row to the whole table of matching tiles from a given
set could be used to solve any NP problem by converting it to Tiling as shown.
Exercise: Find a polynomial time algorithm for n \times log n Tiling Problem.
5 Probability in Computing
5.1 A Monte-Carlo Primality Tester
The factoring problem seems very hard. But testing whether a number has factors turns out to be much easier than
finding them. It also helps if we supply the computer with a coin-flipping device. See: [Rabin 80, Miller 76,
Solovay, Strassen 77]. We now consider a Monte Carlo algorithm, i.e. one that with high probability rejects
any composite number, but never a prime.
Residue Arithmetic. p| x means p divides x. x \equiv y (mod p) means p| (x - y). y = (x mod p) denotes the
residue of x when divided by p, i.e. x \equiv y \in [0, p - 1]. Residues can be added, multiplied and subtracted with
the result put back in the range [0, p - 1] via shifting by an appropriate multiple of p. E.g., - x means p - x
for residues mod p. We use \pm x to mean either x or - x.
The Euclidean Algorithm finds gcd(x, y) -- the greatest (and divisible by any other) common divisor of
x and y: gcd(x, 0) = x; gcd(x, y) = gcd(y, (x mod y)) for y > 0. By induction, g = gcd(x, y) = A*x - B*y,
where the integers A = (g/x mod y) and B = (g/y mod x) are produced as a byproduct of Euclid's Algorithm. This
allows division (mod p) by any r coprime with p (i.e. gcd(r, p) = 1), and the operations +, -, *, / obey all the usual
arithmetical laws. We will need to compute (x^q mod p) in polynomial time. We cannot do q > 2^{\|q\|-1} multipli-
cations. Instead we compute all the numbers x_i = (x_{i-1}^2 mod p) = (x^{2^i} mod p), i < \|q\|. Then we represent q in
binary, i.e. as a sum of powers of 2, and multiply (mod p) the needed x_i's.
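Both tools in a short Python sketch (the recursion returns a, b with a*x + b*y = g, a signed variant of the A, B above):

    def ext_gcd(x, y):                # returns (g, a, b) with a*x + b*y = g
        if y == 0:
            return x, 1, 0
        g, a, b = ext_gcd(y, x % y)
        return g, b, a - (x // y) * b

    def power_mod(x, q, p):           # (x^q mod p) in ~2||q|| multiplications
        r = 1
        while q:
            if q & 1:                 # binary digit of q: multiply in this x_i
                r = r * x % p
            x = x * x % p             # x_i = x_{i-1}^2 mod p
            q >>= 1
        return r

    print(ext_gcd(240, 46))           # (2, -9, 47): -9*240 + 47*46 = 2
    print(power_mod(3, 10**6, 10**9 + 7))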
Fermat Test. The Little Fermat Theorem for every prime p and x \in [1, p-1] says: x^{p-1} \equiv 1 (mod p).
Indeed, the sequence (x*i mod p), i = 1, ..., p-1, is a permutation of 1, ..., p-1. So, 1 \equiv (\prod_{i<p}(x*i))/(p-1)! \equiv x^{p-1} (mod p).
This test rejects typical composite p. Other composite p can be actually factored by the following tests.
Square Root Test. For each y and prime p, x^2 \equiv y (mod p) has at most one pair of solutions \pm x.
Proof. Let x, x' be two solutions: y \equiv x^2 \equiv x'^2 (mod p). Then x^2 - x'^2 = (x - x')(x + x') \equiv 0 (mod p).
So, p divides (x - x')(x + x') and, if prime, must divide either (x - x') or (x + x').
(Thus either x \equiv x' or x \equiv -x'.) Otherwise p is composite, and gcd(p, x + x') actually gives its factor.
Miller-Rabin Test T(x, p) completes the Fermat Test: it factors a composite p, given a d that
kills Z*_p (i.e. x^d \equiv gcd(x, p)^d (mod p) for all x) and a random choice of x. For prime p, d = p - 1.
Let d = 2^k q, with odd q. T sets x_0 = (x^q mod p), x_i = (x_{i-1}^2 mod p) = (x^{2^i q} mod p), i \le k. If x_k \ne 1 then
gcd(x, p) \ne 1 factors p (if d killed x, else the Fermat test rejects p = d+1). If x_0 = 1, or one of the x_i is -1, T gives up
for this x. Otherwise x_i \notin \{\pm 1\} for some i < k, while (x_i^2 mod p) = x_{i+1} = 1, and the Square Root Test factors p.
First, for each odd composite p, we show that T succeeds with some x coprime with p. If p = a^j, j > 1,
then x = (1 + p/a) works for the Fermat Test: (1 + p/a)^{p-1} = 1 + (p/a)(p-1) + (p/a)^2 (p-1)(p-2)/2 + ... \equiv 1 - p/a \not\equiv 1
(mod p), since p | (p/a)^2. Otherwise p = ab, gcd(a, b) = 1 < a < b. Take the greatest i such that x_i \not\equiv 1 for some x
coprime with p. It exists: (-1)^q \equiv -1 for odd q. So, (x_i)^2 \equiv 1 \not\equiv x_i (mod p). (Or i = k, so the Fermat test works.)
Then x' = 1 + b(1/b mod a)(x - 1) \equiv 1 \equiv x'_i (mod b), while x'_i \equiv x_i (mod a). So, either x_i or x'_i is \not\equiv \pm 1 (mod p).
Now, T(y, p) succeeds with most y, as it does with x (or x'): the function y \mapsto xy is 1-1 and T cannot
fail with both y and xy. This test can be repeated for many randomly chosen y. Each time T fails, we are
twice more sure that p is prime. The probability of 300 failures on a composite p is < 2^{-300}; its inverse
exceeds the number of atoms in the known Universe.
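A hedged Python rendition of the test (reporting only prime/composite; the factoring byproduct described above is omitted for brevity):

    import random

    def probably_prime(p, rounds=30):
        if p < 4:
            return p > 1
        if p % 2 == 0:
            return False
        q, k = p - 1, 0
        while q % 2 == 0:
            q, k = q // 2, k + 1      # p - 1 = 2^k * q with odd q
        for _ in range(rounds):
            x = pow(random.randrange(2, p - 1), q, p)   # x_0 = x^q mod p
            if x in (1, p - 1):
                continue              # the test gives up on this witness
            for _ in range(k - 1):
                x = x * x % p         # x_i = x_{i-1}^2 mod p
                if x == p - 1:
                    break
            else:
                return False          # Fermat failure or nontrivial sqrt of 1
        return True                   # a composite escapes w.p. < 2^-rounds

    print(probably_prime(2**61 - 1), probably_prime(561))  # True False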
Random Inputs to Deterministic Algorithms are analyzed similarly to algorithms that flip coins them-
selves, and the two should not be confused. Consider an example: someone is interested in knowing whether
or not certain graphs contain Hamiltonian Cycles. He offers graphs and pays $100 if we show either that the
graph has, or that it has not, a Hamiltonian Cycle. The Hamiltonian Cycle problem is NP-complete, so it should be
very hard for some, but not necessarily for most, graphs. In fact, if our patron chooses the graphs uniformly,
a fast algorithm can earn us the $100 most of the time! Let all graphs have n nodes and, say, d < ln n/2
mean degree, and be equally likely. Then we can use the following (deterministic) algorithm:
Output "No Hamiltonian Cycles" and collect the $100 if the graph has an isolated node. Otherwise, pass on
that graph and the money. Now, how often do we get our $100? The probability that a given node A of the
graph is isolated is (1 - 1/n)^{dn} > (1 - O(1/n))/\sqrt{n}. Thus, the probability that none of the n nodes is isolated
(and we lose our $100) is O((1 - 1/\sqrt{n})^n) = O(1)/e^{\sqrt{n}} and vanishes fast. Similar calculations can be made
whenever r = lim(d/ln n) < 1. If r > 1, other fast algorithms can actually find a Hamiltonian Cycle.
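The strategy in Python (the graph model and parameters are illustrative; adj maps each node to its neighbor set):

    import random

    def bet(adj):                     # answer only when an isolated node settles it
        if any(not nbrs for nbrs in adj):
            return "No Hamiltonian Cycles"   # collect the $100
        return "pass"                        # rare case: pass graph and money on

    n, d = 1000, 3.0                  # d < ln(n)/2 ~ 3.45
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if random.random() < d / n:      # sparse uniform random graph
                adj[u].add(v); adj[v].add(u)
    print(bet(adj))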
See: [Johnson 84, Karp 76, Gurevich 85]. See also [Levin Venkatesan 18] for a proof that another graph
problem is NP-complete even on average. How do this HC algorithm and the above primality test differ?
• The primality algorithm works for all instances. It tosses the coin itself and can repeat it for a more
reliable answer. The HC algorithms only work for most instances (with isolated nodes or generic HC).
• In the HC algorithms, we must trust the customer to follow the presumed random procedure.
If he cheats and produces rare graphs often, the analysis breaks down.
Symmetry Breaking. Randomness comes into Computer Science in many other ways besides those we
considered. Here is a simple example: avoiding conflicts for shared resources.
Dining Philosophers. They sit at a circular table. Between each pair is either a knife or a fork,
alternating. The problem is, neighboring diners must share the utensils, so they cannot eat at the same time. How
can the philosophers complete the dinner, given that all of them must act in the same way without any
central organizer? Trying to grab the knives and forks at once may turn them into fighting philosophers.
Instead they could each flip a coin, and sit still if it comes up heads, otherwise try to grab the utensils.
If two diners try to grab the same utensil, neither succeeds. If they repeat this procedure enough times,
most likely each philosopher will eventually get both a knife and a fork without interference.
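A toy round-based Python simulation of this protocol (synchronous rounds and the 1/2 coin bias are simplifying assumptions):

    import random

    def dine(n=5, rounds=1000):
        hungry = set(range(n))
        for _ in range(rounds):
            claims = {}                          # utensil -> its claimants
            for p in [q for q in hungry if random.random() < 0.5]:
                for u in (p, (p + 1) % n):       # utensil i sits left of diner i
                    claims.setdefault(u, []).append(p)
            for p in list(hungry):
                # p eats iff he is the unique claimant of both utensils
                if claims.get(p) == [p] and claims.get((p + 1) % n) == [p]:
                    hungry.discard(p)
            if not hungry:
                return True                      # everyone has eaten
        return False

    print(dine())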
We have no time to actually analyze this and many other scenarios where randomness is crucial.
Instead we will take a look at the concept of Randomness itself.
This reduction of Section 3 games yields a hierarchy of Arthur-Merlin game powers, i.e. the types of
computations that have reductions to V_c(x) of such games and back. One-player games with a randomized
transition rule r running in space linear in the size of the initial configuration are equivalent to exponential
time deterministic computations. If instead the running time T of r, combined over all steps, is limited by a
polynomial, then the games are equivalent to polynomial space deterministic computations.
An interesting twist comes in one-move games with polylog T, too tiny to examine the initial configuration
x and Merlin's move m. Not only is this obstacle removed, but equivalence to NP is achieved with
a little care. Namely, x is given in an error-correcting code, and r is given O(log \|x\|) coin-flips and random
access to the digits of x, m. Then the membership proof m is reliably verified by the randomized r.
See [Holographic proof] for details and references.
6 Randomness
6.1 Randomness and Complexity
Intuitively, a random sequence is one that has the same properties as a sequence of coin flips. But this
definition leaves open the question: what are these properties? Kolmogorov resolved the problem with a new
definition of random sequences: those with no description noticeably shorter than their full length. See a survey
and history in [Kolmogorov, V.A. Uspenskii 87, Li, Vitanyi 19].
Kolmogorov Complexity K_A(x|y) of the string x given y is the length of the shortest program p which
lets algorithm A transform y into x: min\{\|p\| : A(p, y) = x\}. There exists a Universal Algorithm U such
that K_U(x) \le K_A(x) + O(1) for every algorithm A. This constant O(1) is bounded by the length of the
program U needs to simulate A. We abbreviate K_U(x|y) as K(x|y), or K(x) for empty y.
An example: for A: x \mapsto x, K_A(x) = \|x\|, so K(x) < K_A(x) + O(1) < \|x\| + O(1).
Can we compute K(x) by trying all programs p, \|p\| < \|x\| + O(1), to find the shortest one generating x?
This does not work because some programs diverge, and the halting problem is unsolvable.
In fact, no algorithm can compute K, or even any of its lower bounds except O(1).
Consider the Berry Paradox expressed in the phrase: "The smallest integer which cannot
be uniquely and clearly defined by an English phrase of less than two hundred characters."
There are < 128^{200} English phrases of < 200 characters. So there must be integers not expressible
by such phrases, and a smallest one among them. But isn't it described by the above phrase?
A similar argument proves that K is not computable. Suppose an algorithm L(x) \ne O(1) computes a lower
bound for K(x). We can use it to compute f(n) that finds x with n < L(x) \le K(x); but K(f(n)) < K_f(f(n)) + O(1)
and K_f(f(n)) \le \|n\|, so n < K(f(n)) < \|n\| + O(1) = \log O(n) \ll n: a contradiction.
So, K and its non-constant lower bounds are not computable.
An important application of Kolmogorov Complexity measures the Mutual Information:
I(x : y) = K(x) + K(y) - K(x, y). It has many uses which we cannot consider here.
Deficiency of Randomness
Some upper bounds of K(x) are close in some important cases. One such case is of x generated at random.
Define its rarity for the uniform distribution on \{0,1\}^n as d(x) = n - K(x|n) \ge -O(1).
What is the probability of d(x) > i for uniformly random n-bit x? There are 2^n strings x of length n.
If d(x) > i, then K(x|n) < n - i. There are < 2^{n-i} programs of such length, generating < 2^{n-i} strings.
So, the probability of such strings is < 2^{n-i}/2^n = 2^{-i} (regardless of n)! Even for n = 1,000,000,
the probability of d(x) > 300 is absolutely negligible (provided x was indeed generated by fair coin flips).
Small rarity implies all other enumerable properties of random strings. Indeed, let such a property "x \notin P"
have negligible probability, and let S_n be the number of n-bit strings violating P, so s_n = \log(S_n) is small.
To generate x, we need only the algorithm enumerating these strings and the s_n-bit position of x in that enumeration.
Then the rarity d(x) > n - (s_n + O(1)) is large. Each x violating P will thus also violate the "small rarity"
requirement. In particular, small rarity implies unpredictability of bits of random strings: a short al-
gorithm with a high prediction rate would assure large d(x). However, randomness can only be refuted,
never confirmed: we saw that K and its lower bounds are not computable.
Rectification of Distributions. We rarely have a source of randomness with a precisely known distribution.
But there are very efficient ways to convert "roughly" random sources into perfect ones. Assume we have
such a sequence with a weird unknown distribution. We only know that its long enough (m-bit) segments have
min-entropy > k + i, i.e. probability < 1/2^{k+i}, given all previous bits. (Without such m we would not know
a segment needed to extract even one not fully predictable bit.) No relation is required between n, m, i, k,
but useful are small m, i, k and huge n = o(2^k/i). We can fold X into an n \times m matrix. We also need a small
m \times i matrix Z, independent of X and really uniformly random (or random Toeplitz, i.e. with the restriction
Z_{a+1,b+1} = Z_{a,b}). Then the n \times i product XZ has uniform distribution with accuracy O(\sqrt{ni/2^k}). This
follows from [Goldreich, Levin 89], which uses earlier ideas of U. and V. Vazirani.
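A small Python sketch of this extractor (parameters and the stand-in source are purely illustrative; a real weak source replaces the getrandbits call):

    import random

    def toeplitz(m, i):               # Z_{a+1,b+1} = Z_{a,b}: one random diagonal band
        diag = [random.getrandbits(1) for _ in range(m + i - 1)]
        return [[diag[a - b + i - 1] for b in range(i)] for a in range(m)]

    def extract(X, Z, m, i):          # fold X into rows; output the n x i product XZ mod 2
        rows = [X[r * m:(r + 1) * m] for r in range(len(X) // m)]
        return [[sum(row[a] * Z[a][b] for a in range(m)) % 2 for b in range(i)]
                for row in rows]

    m, i = 16, 4
    X = [random.getrandbits(1) for _ in range(10 * m)]   # stand-in weak source
    print(extract(X, toeplitz(m, i), m, i))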
6.2 Pseudo-randomness
The above definition of randomness is very robust, if not practical. Truly random generators are rarely used in
computing. The problem is not that making a true random generator is impossible: we just saw efficient ways
to perfect the distributions of biased random sources. The reason lies in the many extra benefits provided by
pseudorandom generators. E.g., when experimenting with, debugging, or using a program, one often needs to
repeat the exact same sequence. With a truly random generator, one actually has to record all its outcomes:
long and costly. The alternative is to generate pseudo-random strings from a short seed. Such methods
were justified in [Blum, Micali 84, Yao 82]:
First, take any one-way permutation Fn (x) (see sec. 6.3) with a hard-core bit (see below) Bp (x) which
is easy to compute from x, p, but infeasible to guess from p, n, Fn (x) with any noticeable correlation.
Then take a random seed of three k-bit parts x0 , p, n and Repeat: (Si \leftarrow Bp (xi ); xi+1 \leftarrow Fn (xi ); i\leftarrow i+1).
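A toy Python sketch of this generator, instantiated (as one possible choice) with Rabin's squaring function of sec. 6.3 as F and the inner-product bit of the Hard Core paragraph below as B; the tiny modulus is for illustration only:

    def inner_bit(x, p):              # B_p(x) = (x . p) mod 2 over the bits
        return bin(x & p).count("1") % 2

    def pseudo_random(x0, p, n, bits):
        x, out = x0, []
        for _ in range(bits):
            out.append(inner_bit(x, p))   # S_i = B_p(x_i)
            x = x * x % n                 # x_{i+1} = F_n(x_i)
        return out

    n = 191 * 199                     # a toy Blum number (191, 199 are 3 mod 4)
    x0 = 1234 * 1234 % n              # seed taken to be a quadratic residue
    print(pseudo_random(x0, 0b1011010011, n, 20))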
We will see how distinguishing outputs S of this generator from strings of coin flips would imply the
ability to invert F. This is infeasible if F is one-way. But if P = NP (a famous open problem), no one-way F,
and thus no pseudorandom generator, could exist.
By Kolmogorov's standards, pseudo-random strings are not random: let G be the generator, s be the
seed, G(s) = S, and \|S\| \gg k = \|s\|. Then K(S) \le O(1) + k \ll \|S\|, thus violating Kolmogorov's definition.
We can distinguish between truly random and pseudo-random strings by simply trying all short seeds.
However, this takes time exponential in the seed length. Realistically, pseudo-random strings will be as good
as truly random ones if they can't be distinguished in feasible time. Such generators we call perfect.
Theorem: [Yao 82] Let G(s) = S \in \{0,1\}^n run in time t_G. Let a probabilistic algorithm A in expected
(over internal coin flips) time t_A accept G(s) and truly random strings with probabilities differing by d.
Then, for random i, one can use A to guess S_i from S_{i+1}, S_{i+2}, ... in time t_A + t_G with correlation d/O(n).
Proof. Let r_i be the probability that A accepts S = G(s) modified by replacing its first i digits
with truly random bits. Then r_0 is the probability of accepting G(s) and must differ by d from
the probability r_n of accepting a random string. Then r_{i-1} - r_i = d/n for randomly chosen i.
Let R_0 and R_1 be the probabilities of accepting r0x and r1x for x = S_{i+1}, S_{i+2}, ..., and random (i-1)-bit r.
Then (R_1 + R_0)/2 averages to r_i, while R_{S_i} = R_0 + (R_1 - R_0) S_i averages to r_{i-1}, and
(R_1 - R_0)(S_i - 1/2) to r_{i-1} - r_i = d/n. So, R_1 - R_0 has the stated correlation with S_i.
If the above generator were not perfect, one could guess S_i from the sequence S_{i+1}, S_{i+2}, ...
with a polynomial (in 1/\|s\|) correlation. But S_{i+1}, S_{i+2}, ... can be produced from p, n, x_{i+1}.
So, one could guess B_p(x_i) from p, n, F(x_i) with correlation d/n, which cannot be done for hard-core B.
Hard Core. The key to constructing a pseudorandom generator is finding a hard core for a one-way F.
The following B is hard-core for any one-way F, e.g., for Rabin's OWF in sec. 6.3.
[Knuth 97] has more details and references.
Let B_p(x) = (x \cdot p) = (\sum_i x_i p_i mod 2). [Goldreich, Levin 89] converts any method g of guessing B_p(x)
from p, n, F(x) with correlation \varepsilon into an algorithm of finding x, i.e. inverting F (slower than g by a factor of \varepsilon^{-2}).
Proof. (Simplified with some ideas of Charles Rackoff.) Take k = \|x\| = \|y\|, j = \log(2k/\varepsilon^2), v_i = 0^i 1 0^{k-i}.
Let B_p(x) = (x \cdot p) and b(x, p) = (-1)^{B_p(x)}. Assume, for y = F_n(x), g(y, p, w) \in \{\pm 1\} guesses B_p(x) with
correlation \sum_p 2^{-\|p\|} b(x, p) g_p > \varepsilon, where g_p abbreviates g(y, p, w), since w, y are fixed throughout the proof.
(-1)^{(x \cdot p)} g_p averaged over > 2k/\varepsilon^2 random pairwise independent p deviates from its mean (over all p) by
< \varepsilon (and so is > 0) with probability > 1 - 1/2k. The same holds for (-1)^{(x \cdot [p+v_i])} g_{p+v_i} = (-1)^{(x \cdot p)} g_{p+v_i} (-1)^{x_i}.
Take a random k \times j binary matrix P. The vectors Pr, r \in \{0,1\}^j \setminus \{0^j\}, are pairwise independent. So, for
a fraction \ge 1 - 1/2k of P, sign(\sum_r (-1)^{x P r} g_{P r + v_i}) = (-1)^{x_i}. We could thus find x_i for all i with probability
> 1/2 if we knew z = xP. But z is short: we can try all its 2^j possible values and check y = F_n(x) for each!
So the inverter, for a random P and all i, r, computes G_i(r) = g_{P r + v_i}. It uses Fast Fourier on G_i to
compute h_i(z) = \sum_r b(z, r) G_i(r). The sign of h_i(z) is the i-th bit for the z-th member of the output list.
6.3 Cryptography
Rabin's One-way Function. Pick random prime numbers p, q, \|p\| = \|q\|, with the two last bits = 1, i.e. with
odd (p-1)(q-1)/4. Then n = pq is called a Blum number. Its length should make factoring infeasible.
Let Q_n = (Z*_n)^2 be the set of squares, i.e. quadratic residues (all residues are assumed (mod n)).
Lemma. Let n = pq be a Blum number, F: x \mapsto x^2 \in Q_n. Then (1) F is a permutation on Q_n,
and (2) the ability to invert F on random x is equivalent to that of factoring n.
Proof. (1) t = (p-1)(q-1)/4 is odd, so u = (t+1)/2 is an integer. Let x = F(z). Both p-1 and q-1 divide 2t.
So, by Fermat's little theorem, both p, q (and, thus, n) divide x^t - 1 \equiv z^{2t} - 1. Then F(x)^u \equiv x^{2u} = x x^t \equiv x.
(2) The above y^u inverts F. Conversely, let F(A(y)) = y for a fraction \varepsilon of y \in Q_n.
Each y \in Q_n has x, x' \ne \pm x with F(x) = F(x') = y, both with an equal chance to be chosen at random.
If F(x) generates y while A(y) = x', the Square Root Test (5.1) has both x, x' for factoring n.
Such one-way permutations, called "trap-door", have many applications; we look at cryptography below.
Picking random primes is easy: they have density 1/O(\|p\|). Indeed, one can see that \binom{2n}{n} is divisible by
every prime p \in [n, 2n], but by no prime p \in [2n/3, n] or prime power p^i > 2n. So, (\log \binom{2n}{n})/\log n = 2n/\log n - O(1)
is an upper bound on the number of primes in [n, 2n] and a lower bound on that in [1, 2n] (and in [3n, 6n],
as a simple calculation shows). And fast VLSI circuits exist to multiply long numbers and check primality.
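So rejection sampling works; a sketch in Python, reusing the probably_prime tester from the sec. 5.1 sketch above:

    import random

    def random_prime(k):              # density 1/O(k): expected O(k) draws
        while True:
            p = random.getrandbits(k) | (1 << (k - 1)) | 1   # k-bit odd number
            if probably_prime(p):
                return p

    print(random_prime(64))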
Public Key Encryption. A perfect way to encrypt a message m is to add it mod 2 bit by bit to a random
string S of the same length k. The resulting encryption m \oplus S has the same uniform probability distribution,
no matter what m is. So it is useless to an adversary who wants to learn something about m without
knowing S. A disadvantage is that the communicating parties must share a secret S as large as all messages
to be exchanged, combined. Public Key Cryptosystems use two keys. One key is needed to encrypt the
messages and may be completely disclosed to the public. The decryption key must still be kept secret, but
need not be sent to the encrypting party. The same keys may be used repeatedly for many messages.
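A minimal sketch of the pad described above (the message and names are mine, for illustration):

    import secrets

    m = b"attack at dawn"                            # the message
    S = secrets.token_bytes(len(m))                  # shared secret pad, as long as m
    c = bytes(a ^ b for a, b in zip(m, S))           # m xor S: uniform, whatever m is
    assert bytes(a ^ b for a, b in zip(c, S)) == m   # the same S decrypts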
Such a cryptosystem can be obtained [Blum, Goldwasser 82] by replacing the above random S with the pseudo-
random S_i = (s_i \cdot x); s_{i+1} = (s_i^2 mod n). Here a Blum number n = pq is chosen by the Decryptor and is
public, but p, q are kept secret. The Encryptor chooses x \in Z_2^{\|n\|}, s_0 \in Z_n at random and sends x, s_k, m \oplus S.
Assuming factoring is intractable for the adversary, S should be indistinguishable from random strings (even
with known x, s_k). Then this scheme is as secure as if S were random. The Decryptor knows p, q and can
compute u, t (see above) and v = (u^{k-1} mod t). So, he can find s_1 = (s_k^v mod n), and then S and m.
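Here is a toy run of the whole scheme (all names mine; the parameters are unrealistically small, while a real n must be large enough to defeat factoring):

    import math, random

    p, q = 499, 547                       # toy Blum primes, both = 3 (mod 4)
    n = p * q
    t = (p - 1) * (q - 1) // 4
    u = (t + 1) // 2
    L = n.bit_length()

    def dot(a, b):                        # inner product mod 2
        return sum(ai & bi for ai, bi in zip(a, b)) & 1

    def bits(s):                          # s as an L-bit vector
        return [(s >> i) & 1 for i in range(L)]

    m = [1, 0, 1, 1, 0, 1, 0, 0]          # message bits; k = len(m)
    k = len(m)

    # Encryptor: random x, s_0; pad S_i = (s_i . x), s_{i+1} = s_i^2 mod n.
    x = [random.randrange(2) for _ in range(L)]
    s = random.randrange(1, n)
    while math.gcd(s, n) != 1:            # s_0 must be a unit so that s_1 lands in Q_n
        s = random.randrange(1, n)
    S = []
    for _ in range(k):
        s = s * s % n
        S.append(dot(bits(s), x))
    cipher = [mi ^ Si for mi, Si in zip(m, S)]   # sends x, s (now = s_k), cipher

    # Decryptor: knows p, q, hence t, u; recovers s_1 = s_k^v, then the pad.
    v = pow(u, k - 1, t)
    si = pow(s, v, n)                     # s_1
    S2 = [dot(bits(si), x)]
    for _ in range(k - 1):
        si = si * si % n
        S2.append(dot(bits(si), x))
    assert [ci ^ Si for ci, Si in zip(cipher, S2)] == m
    print("decrypted correctly")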
Another use of the intractability of factoring is digital signatures [Rivest, Shamir, Adleman 78, Rabin 79].
Strings x can be released as authorizations of y = (x^2 mod n). Verifying x is easy, but the ability to forge
it for generic y is equivalent to that of factoring n.
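In the same toy setting (parameters as above, unrealistically small; the sample value is mine): only the signer, knowing p and q, can extract the Q_n square root of a given y, while anyone can verify it by squaring.

    p, q = 499, 547
    n = p * q
    u = ((p - 1) * (q - 1) // 4 + 1) // 2
    y = pow(31337, 2, n)                  # the string to authorize, forced into Q_n
    sig = pow(y, u, n)                    # signing needs the secret p, q (via u)
    assert pow(sig, 2, n) == y            # verification uses only the public n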
Go On!
You noticed that most of our burning questions are still open. Take them on!
Start with reading recent results (FOCS/STOC is a good source). See where you can improve them.
Start writing, first notes just for your friends, then the real papers. Here is a little writing advice:
A well written paper has clear components: skeleton, muscles, etc.
The skeleton is an acyclic digraph of basic definitions and statements, with cross-references.
The meat consists of proofs (muscles), each separately verifiable by competent graduate students having to
read no other parts but the statements and definitions cited. Intuitive comments, examples, and other comfort
items are fat and skin: a lack or an excess of them will not make the paper pretty. Proper scholarly references constitute
clothing; no paper should ever appear in public without it! Trains of thought which led to the discovery are
blood and guts: keep them hidden. Metaphors for other vital parts, like open problems, I skip out of modesty.
Writing Contributions. Section 1 was originally prepared by Elena Temin, Yong Gao and Imre Kifor (BU),
others by Berkeley students: 2.3 by Mark Sullivan, 3.1 by Eric Herrmann and Elena Eliashberg, 3.2 by Wayne Fenton
and Peter Van Roy, 3.3 by Carl Ludewig, Sean Flynn, and Francois Dumas, 4.1 by Jeff Makaiwi, Brian Jones and
Carl Ludewig, 4.2 by David Leech and Peter Van Roy, 4.3 by Johnny and Siu-Ling Chan, 5.2 by Deborah Kordon,
6.1 by Carl Ludewig, 6.2 by Sean Flynn, Francois Dumas, Eric Herrmann, 6.3 by Brian Jones.
References
[Levin 91] L.Levin. Fundamentals of Computing: a Cheat-List. SIGACT News; Education Forum.
Special 100-th issue, 27(3):89-110, 1996. Errata: ibid. 28(2):80. Earlier version: ibid. 22(1), 1991.
[Kleinberg, Tardos 06] Jon Kleinberg, Eva Tardos. Algorithm design. 2006. Pearson Education.
[Knuth 97] Donald E. Knuth. The Art of Computer Programming. Vol. 1-3. Addison-Wesley, 3rd ed., 1997.
Sec. 3.5.F of v. 2, new to the 3rd ed., is also on pp. 10, 29-36 of
https://www-cs-faculty.stanford.edu/~knuth/err2-2e.ps.gz
[Feller 68] William Feller. An Introduction to Probability Theory and Its Applications. Wiley \& Sons, 1968.
[Lang 93] S.Lang. Algebra. 3rd ed. 1993, Addison-Wesley.
[Rogers 67] H. Rogers, Jr. Theory of Recursive Functions and Effective Computability. McGraw-Hill, 1967.
[] References for section 1:
[Barzdin', Kalnin's 74] Ja.M. Barzdin', Ja.Ja. Kalnin's. A Universal Automaton with Variable Structure.
Automatic Control and Computing Sciences. 8(2):6-12, 1974.
[Berlekamp, Conway, Guy 82] E.R.Berlekamp, J.H.Conway, R.K.Guy. Winning Ways. Sec.25. 1982.
[Kolmogorov, Uspenskii 58] A.N. Kolmogorov, V.A. Uspenskii. On the Definition of an Algorithm.
Uspekhi Mat. Nauk 13:3-28, 1958; AMS Transl. 2nd ser. 29:217-245, 1963.
[Schoenhage 80] A. Schoenhage. Storage Modification Machines. SIAM J. on Computing 9(3):490-508, 1980.
[Ofman 65] Yu. Ofman. A Universal Automaton. Trans. of the Moscow Math. Soc., pp.200-215, 1965.
[] Section 2:
[Blum 67] M. Blum. A machine-independent theory of the complexity of recursive functions. JACM 14, 1967.
[Davis 65] M. Davis, ed. The Undecidable. Hewlett, N.Y. Raven Press, 1965.
(The reprints of the original papers of K.Goedel, A.Turing, A.Church, E.Post and others).
[Ikeno 58] Shinichi Ikeno. A 6-symbol 10-state Universal Turing Machine.
Proceedings, Institute of Electrical Communications, Tokyo, 1958.
[Seiferas, Meyer 95] Joel I. Seiferas, Albert R. Meyer. Characterization of Realizable Space Complexities.
Annals of Pure and Applied Logic 73:171-190, 1995.
[Rabin 59] M.O. Rabin. Speed of computation of functions and classification of recursive sets. Third Con-
vention of Sci.Soc. Israel, 1959, 1-2. Abst.: Bull. of the Research Council of Israel, 8F:69-70, 1959.
[Tseitin 56] G.S. Tseitin. Talk: seminar on math. logic, Moscow university, 11/14, 11/21, 1956.
Also pp. 44-45 in: S.A. Yanovskaya, Math. Logic and Foundations of Math.,
Math. in the USSR for 40 Years, 1:13-120, 1959, Moscow, Fizmatgiz, (in Russian).
[] Section 3:
[Neumann, Morgenstern 44] J. v.Neumann, O. Morgenstern. Theory of Games and Economic Behavior.
Princeton Univ. Press, 1944.
[Stockmeyer, Meyer 73] L.Stockmeyer, A.Meyer. Word problems requiring exponential time. STOC-1973
[Chandra, Kozen, Stockmeyer 81] Ashok K. Chandra, Dexter C. Kozen, Larry J. Stockmeyer. Alternation.
J. ACM, 28(1):114-133, 1981.
[Robson 83, 84] J.M. Robson. N by N checkers is EXPTIME-complete. SIAM J. Comput 13(2), 1984.
Also: The complexity of Go. Proc. 1983 IFIP World Computer Congress, p. 413-417.
[Fraenkel, Lichtenstein 81] A.S. Fraenkel, D. Lichtenstein. Computing a perfect strategy for n \times n chess
requires time exponential in n. J. Combin. Theory (Ser. A) 31:199-214. ICALP-1981.
[] Section 4:
[Savitch 70] W.J. Savitch. Relationships between nondeterministic and deterministic tape complexities.
J. Comput. Syst. Sci. 4:177-190, 1970.
[Yudin, Nemirovsky 76] D.B. Yudin, A.S. Nemirovsky. Informational Complexity and Effective
Methods for Solving Convex Extremum Problems. Economica i Mat. Metody 12(2):128-142;
transl. MatEcon 13:3-25, 1976.
[Luks 80] E.M. Luks: Isomorphism of Graphs of Bounded Valence Can Be Tested in Polynomial Time.
FOCS-1980.
[Garey, Johnson 79] M.R.Garey, D.S.Johnson. Computers and Intractability. W.H.Freeman \& Co. 1979.
[Trakhtenbrot 84] B.A.Trakhtenbrot. A survey of Russian approaches to Perebor (brute-force search)
algorithms. Annals of the History of Computing, 6(4):384-400, 1984.
[] Section 5:
[Rabin 80] M.O.Rabin. Probabilistic Algorithms for Testing Primality. J. Number Theory, 12: 128-138, 1980.
[Miller 76] G.L.Miller. Riemann's Hypothesis and tests for Primality. J. Comp. Sys. Sci. 13(3):300-317, 1976.
[Solovay, Strassen 77] R. Solovay, V. Strassen. A fast Monte-Carlo test for primality. SIComp 6:84-85, 1977.
[Karp 86] R. Karp. Combinatorics, Complexity and Randomness. (Turing Award Lecture)
Communication of the ACM, 29(2):98-109, 1986.
[Johnson 84] David S. Johnson. The NP-Completeness Column. J. of Algorithms 5:284-299, 1984.
[Karp 76] R. Karp. The probabilistic analysis of some combinatorial search algorithms.
Algorithms and Complexity. (J.F.Traub, ed.) pp. 1-19. Academic Press, NY 1976.
[Gurevich 85] Y. Gurevich, Average Case Complexity. Internat. Symp. on Information Theory, IEEE, 1985.
[Levin, Venkatesan 18] Leonid A. Levin, Ramarathnam Venkatesan. An average case NP-complete graph
coloring problem. Combinatorics, Probability, and Computing, 27(5), 2018. https://arxiv.org/abs/cs/0112001
[Shamir 90] A. Shamir. IP = PSPACE. JACM, 39(4):869-877, 1992.
[Fortnow, Lund 93] Lance Fortnow, Carsten Lund. Interactive proof systems and alternating time---space
complexity. Theor.Comp.Sci. 113(1):55-73, 1993. https://doi.org/10.1016/0304-3975(93)90210-K
[Holographic proof] Holographic proof. The Encyclopedia of Mathematics, Supplement II, Hazewinkel, M.
(Ed.), Kluwer, 2000. https://encyclopediaofmath.org/wiki/Holographic_proof
[] Section 6:
[Kolmogorov, Uspenskii 87] A.N. Kolmogorov, V.A. Uspenskii. Algorithms and Randomness. Theoria
Veroyatnostey i ee Primeneniya = Theory of Probability and its Applications, 3(32):389-412, 1987.
[Li, Vitanyi 19] M. Li, P.M.B. Vitanyi. Introduction to Kolmogorov Complexity and its Applications.
Springer Verlag, New York, 2019.
[Blum, Micali 84] M.Blum, S.Micali. How to generate Cryptographically Strong Sequences. SICOMP,
13, 1984.
[Yao 82] A. C. Yao. Theory and Applications of Trapdoor Functions. FOCS-1982.
[Goldreich, Levin 89] O.Goldreich, L.Levin. A Hard-Core Predicate for all One-Way Functions. STOC-1989.
[Rivest, Shamir, Adleman 78] R.Rivest, A.Shamir, L.Adleman. A Method for Obtaining Digital Signature
and Public-Key Cryptosystems. Comm. ACM, 21:120-126, 1978.
[Blum, Goldwasser 82] M. Blum, S. Goldwasser.
An Efficient Probabilistic Encryption Scheme Hiding All Partial Information. Crypto-1982.
[Rabin 79] M. Rabin. Digitalized Signatures as Intractable as Factorization. MIT/LCS/TR-212, 1979.