Lecture 1: Course Intro and Hashing
1 Difference Between Grad and Undergrad Algorithms
Algorithms are integral to computer science and every computer scientist (even as an
undergrad) has designed several algorithms. So has many a physicist, electrical engineer,
mathematician, etc. This course is meant to be your one-stop shop that teaches you how
to design a variety of algorithms. The operative word is “variety.” In other words, you will
avoid the blinders that one often sees in domain experts. A Bayesian needs priors on the
data before he can design algorithms; an optimization expert wishes to cast all problems
as mathematical optimization; a systems designer has never seen any problem that cannot
be solved by hashing. (OK, mostly kidding but the joke does reflect truth to some degree.)
These and more domain-specific ideas make an appearance in our course, but we will learn
to not be wedded to any single approach.
The primary skill you will learn in this course is how to analyse algorithms: prove their
correctness, bound their running time, and establish any other relevant properties. Learning
to analyse a variety of algorithms (designed by others) will let you design better algorithms later in life.
I will try to fill the course with beautiful algorithms. Be prepared for frequent rose-smelling
stops, in other words.
The changing graph. In undergrad algorithms the graph is given and arbitrary (worst-
case). In grad algorithms we are willing to look at where the graph came from (social
network, computer vision etc.) since those properties may be germane to designing a good
algorithm. (This is not a radical idea of course but we will see that formulating good graph
models is not easy. This is why you see a lot of heuristic work in practice, without any
mathematical proofs of correctness.)
Changing data structures: In undergrad algorithms the data structures were simple
and often designed to hold data generated by other algorithms. A stack allows you to hold
vertices during depth-first search traversal of a graph, or instances of a recursive call to a
procedure. A heap is useful for sorting and searching.
But in the newer applications, data often comes from sources we don’t control. Thus it
may be noisy, or inexact, or both. It may be high-dimensional. Something like a heap
will then not suffice, and we need more advanced data structures.
We will encounter the “curse of dimensionality,” which constrains algorithm design for
high-dimensional data.
Type of analysis: In undergrad algorithms the algorithms were often exact and worked on
all (i.e., worst-case) inputs. In grad algorithms we are willing to relax these requirements.
Advanced Algorithm Design: Hashing
Lectured by Prof. Moses Charikar
Transcribed by Linpeng Tang∗
Feb 2nd, 2013
1 Preliminaries
In hashing, we want to store a subset S of a large universe U (U can be very
large; say |U| = 2^32, the set of all 32-bit integers), where |S| = m is relatively
small. For each x ∈ U, we want to support 3 operations:
• insert(x). Insert x into S.
• delete(x). Delete x from S.
• query(x). Check whether x ∈ S.
[Figure: a hash function h maps elements of the universe U into a table of n locations.]
A hash table can support all these 3 operations. We design a hash function

h : U −→ {0, 1, . . . , n − 1}    (1.1)

such that x ∈ U is placed in T[h(x)], where T is a table of size n.
Since |U| ≫ n, multiple elements can be mapped into the same location in
T, and we deal with these collisions by constructing a linked list at each location
in the table.
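To make this concrete, here is a minimal sketch of a chaining hash table in Python; the modular hash function and the fixed table size are illustrative assumptions, not prescribed by the lecture.

class ChainedHashTable:
    """Table of n locations; collisions resolved by a list per location."""

    def __init__(self, n):
        self.n = n
        self.table = [[] for _ in range(n)]  # T[i] holds the chain at location i

    def _h(self, x):
        # Stand-in hash function h : U -> {0, ..., n-1}.
        return hash(x) % self.n

    def insert(self, x):
        chain = self.table[self._h(x)]
        if x not in chain:
            chain.append(x)

    def delete(self, x):
        chain = self.table[self._h(x)]
        if x in chain:
            chain.remove(x)

    def query(self, x):
        return x in self.table[self._h(x)]

For example, t = ChainedHashTable(8); t.insert(42); t.query(42) returns True.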
One natural question to ask is: how long is the linked list at each location?
We make two kinds of assumptions:
∗ [email protected]
1. Assume the input is random.

2. Assume the hash function is random.
Assumption 1 may not be valid for many applications, since the input might
be correlated.
For Assumption 2, we construct a set of hash functions H, choose a random
function h ∈ H, and hope that, on average over this choice, we will achieve
good performance.
2 Hash Functions
Say we have a family of hash functions H, and for each h ∈ H, h : U −→ [n]¹.
What do we mean by saying these functions are random?
For any distinct x_1, x_2, . . . , x_m ∈ S (x_i ≠ x_j when i ≠ j), and any
a_1, a_2, . . . , a_m ∈ [n], ideally a random H should satisfy:

• Pr_{h∈H}[h(x_1) = a_1] = 1/n.

• Pr_{h∈H}[h(x_1) = a_1 ∧ h(x_2) = a_2] = 1/n^2. Pairwise independence.

• Pr_{h∈H}[h(x_1) = a_1 ∧ h(x_2) = a_2 ∧ · · · ∧ h(x_k) = a_k] = 1/n^k. k-wise independence.

• Pr_{h∈H}[h(x_1) = a_1 ∧ h(x_2) = a_2 ∧ · · · ∧ h(x_m) = a_m] = 1/n^m. Full independence
(note that here m = |U|). In this case we have n^m possible h (we store
h(x) for each x ∈ U), so we need m log n bits to represent each hash
function. Since m is usually very large, this is not practical.
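To see where the m log n bits come from: a fully random function can only be represented by explicitly storing an independent random value per key. The following Python sketch (the lazy dictionary is an implementation convenience, not from the lecture) makes this cost visible.

import random

class FullyRandomFunction:
    """A truly random h : U -> [n]: one stored value per key ever queried."""

    def __init__(self, n):
        self.n = n
        self.values = {}  # explicit table of h(x), log n bits per entry

    def __call__(self, x):
        # Fix h(x) to a uniform value in [n] the first time x is seen.
        if x not in self.values:
            self.values[x] = random.randrange(self.n)
        return self.values[x]

Representing h on all of U takes |U| log n bits, which is exactly why full independence is impractical.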
For any x, let L_x be the length of the linked list containing x; then L_x is just
the number of elements with the same hash value as x. Let the random variable

I_y = 1 if h(y) = h(x), and 0 otherwise.    (2.1)

So L_x = 1 + Σ_{y≠x} I_y, and

E[L_x] = 1 + Σ_{y≠x} E[I_y] = 1 + (m − 1)/n    (2.2)
Note that we don’t need full independence to prove this property, and pairwise
independence would actually suffice.
¹ We use [n] to denote the set {0, 1, . . . , n − 1}.
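A quick empirical check of (2.2), using Python's built-in PRNG as a stand-in for a random hash function (the parameters below are arbitrary):

import random

def avg_list_length(m, n, trials=1000):
    """Estimate E[L_x] by hashing m elements into n slots at random."""
    total = 0
    for _ in range(trials):
        h = [random.randrange(n) for _ in range(m)]   # h(x_1), ..., h(x_m)
        total += sum(1 for v in h if v == h[0])       # list length of x_1
    return total / trials

print(avg_list_length(100, 100))  # should be close to 1 + 99/100 = 1.99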
3 2-Universal Hash Families
Definition 3.1 (Carter-Wegman). A family H of hash functions is 2-universal if
for any x ≠ y ∈ U,

Pr_{h∈H}[h(x) = h(y)] ≤ 1/n    (3.1)

Note that this property is even weaker than pairwise independence.
We can design 2-universal hash families in the following way. Choose a prime
p ∈ {|U|, . . . , 2|U|}, and let

f_{a,b}(x) = ax + b mod p, for a ∈ {1, . . . , p − 1} and b ∈ {0, . . . , p − 1},    (3.2)

and let

h_{a,b}(x) = f_{a,b}(x) mod n    (3.3)
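A minimal Python sketch of this family; the particular prime P below (just above 2^32) is an illustrative choice for p, not specified in the lecture.

import random

P = 4294967311  # a prime slightly above 2^32, playing the role of p

def random_hash(n):
    """Draw h_{a,b}(x) = ((a*x + b) mod p) mod n from the family."""
    a = random.randrange(1, P)  # a in {1, ..., p-1}
    b = random.randrange(P)     # b in {0, ..., p-1}
    return lambda x: ((a * x + b) % P) % n

h = random_hash(100)
print(h(42), h(43))  # two table locations in [100]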
For x ≠ y, the pair (f_{a,b}(x), f_{a,b}(y)) ranges uniformly over pairs (s, t) ∈ [p]^2
with s ≠ t as we vary (a, b), so

Pr_{h∈H}[h(x) = h(y)] = (1/(p(p − 1))) · Σ_{s≠t} δ(s ≡ t mod n)
                      ≤ (1/(p(p − 1))) · p(p − 1)/n    (3.10)
                      = 1/n    (3.11)

where δ(E) is 1 if the event E holds and 0 otherwise. Inequality (3.10) follows because for each
s ∈ [p], we have at most (p − 1)/n different t such that s ≠ t and s ≡ t
(mod n).
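A quick empirical check of Definition 3.1, reusing random_hash from the sketch above (the inputs 42, 43 and the trial count are arbitrary):

def collision_rate(x, y, n, trials=100000):
    """Estimate Pr over (a, b) that h_{a,b}(x) == h_{a,b}(y)."""
    hits = 0
    for _ in range(trials):
        h = random_hash(n)
        if h(x) == h(y):
            hits += 1
    return hits / trials

print(collision_rate(42, 43, 100))  # should be at most about 1/100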
Can we design a collision-free hash table then? Say we have m elements,
and the hash table is of size n. Since for any x_1 ≠ x_2, Pr_h[h(x_1) = h(x_2)] ≤ 1/n,
the expected total number of collisions is just

E[Σ_{x_1≠x_2} δ(h(x_1) = h(x_2))] = Σ_{x_1≠x_2} Pr[h(x_1) = h(x_2)] ≤ (m choose 2) · 1/n    (3.12)

In particular, if n = m^2 then the expected number of collisions is less than 1/2,
so a random h from a 2-universal family is collision-free with probability at least 1/2.
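A sketch of this idea in Python, again reusing random_hash: with a table of size m^2, a few retries find a collision-free function (integer keys assumed).

def find_collision_free(S):
    """Resample h until no two keys of S collide (expects O(1) retries)."""
    n = len(S) ** 2  # table size m^2 makes expected collisions < 1/2
    while True:
        h = random_hash(n)
        if len({h(x) for x in S}) == len(S):  # all locations distinct
            return h, n

h, n = find_collision_free([3, 1, 4, 15, 9, 2, 6])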
[Figure: two-level hashing. In the top-level table with locations 0, . . . , n − 1,
a location i holding s_i elements is resolved by a second-level table of s_i^2 locations.]
4 Load Balance
In the load balancing problem, we can imagine that we are trying to put balls into
bins. If we have n balls and n bins, and we put each ball into a uniformly random bin,
then for a given i,

Pr[bin_i gets at least k balls] ≤ (n choose k) · 1/n^k ≤ 1/k!    (4.1)

(the second inequality uses (n choose k) ≤ n^k / k!).
By Stirling’s formula,

k! ∼ √(2πk) · (k/e)^k    (4.2)

If we choose k = O(log n / log log n), we can make 1/k! ≤ 1/n^2, since then
log k! = Θ(k log k) = Θ(log n) with a constant we can take to be at least 2. Then

Pr[∃ a bin with ≥ k balls] ≤ n · 1/n^2 = 1/n    (4.3)
So with probability larger than 1 − 1/n,²

max load ≤ O(log n / log log n)    (4.4)
Note that if we look at 2 random bins when each new ball comes in and put
the ball into the bin with fewer balls, we can achieve a maximum load of
O(log log n), which is a huge improvement.
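An empirical comparison of the two schemes, as a minimal Python sketch (n is an arbitrary illustrative size):

import random

def max_load_one_choice(n):
    """Throw n balls into n uniformly random bins; return the maximum load."""
    bins = [0] * n
    for _ in range(n):
        bins[random.randrange(n)] += 1
    return max(bins)

def max_load_two_choices(n):
    """For each ball, sample 2 random bins and use the currently emptier one."""
    bins = [0] * n
    for _ in range(n):
        i, j = random.randrange(n), random.randrange(n)
        bins[i if bins[i] <= bins[j] else j] += 1
    return max(bins)

n = 100000
print(max_load_one_choice(n))   # on the order of log n / log log n
print(max_load_two_choices(n))  # on the order of log log n, much smaller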
² This can easily be improved to 1 − 1/n^c for any constant c.