17-Matrix Sketching
Note to other teachers and users of these slides: We would be delighted if you found our material useful for giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.org
3/1/22 Jure Leskovec & Mina Ghashami, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu
¡ Think of the data as a matrix A ∈ ℝ^(n×d) containing n row vectors in ℝ^d, where typically n ≫ d
¡ A rank-k approximation to A is a smaller matrix B of rank k such that B approximates A
¡ We saw that SVD computes the best rank-k
approximation to A
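As a quick, illustrative NumPy check of this claim (not from the slides), truncating the SVD to the top k singular values gives the best rank-k approximation, and its Frobenius error equals the energy in the discarded singular values:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 50))        # a toy data matrix, n >> d
k = 10

U, S, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]   # best rank-k approximation (Eckart-Young)

print(np.linalg.norm(A - A_k, "fro"))  # sqrt of the sum of squared tail singular values
print(np.sqrt(np.sum(S[k:] ** 2)))     # same number
```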
¡ Can we compute a rank-k approximation in the streaming setting?
¡ Every element of the stream is a row vector of fixed dimension d.
¡ Streaming data, such as time-series data:
§ E-commerce purchases
§ Traffic sensors
§ Activity logs
¡ B is a sketch of a streaming matrix A iff:
§ B is of a fixed small size that fits in memory
§ At any point in the stream, B approximates A
¡ Almost all matrix sketching methods in the streaming setting fall into one of these categories:
1. Row-sampling based
2. Random-projection and hashing based
3. Iterative sketching
¡ An intuitive way to define the “importance” of an item is the weight associated with it, e.g.
§ File records → weight = size of the file
§ IP addresses → weight = number of requests the IP address makes
¡ Why is it necessary to sample important items?
§ Consider a set of weighted items S = {(a1, w1), (a2, w2), ··· , (an, wn)} that we want to summarize with a small & representative sample.
¡ This is achievable with a sample set of size one!
§ We sample an item (aj, wj) with an arbitrary fixed probability pj,
§ and rescale its weight to wj/pj, so that the total weight WS is preserved in expectation (see the sketch below).
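A tiny Python illustration of this point (the data and function names are mine, not from the slides): a single sampled item, rescaled by the inverse of its sampling probability, is an unbiased estimate of the total weight, and sampling proportionally to weight, i.e. favoring important items, removes the variance entirely:

```python
import random

def one_item_estimate(items, probs):
    """Sample ONE item j with probability probs[j] and return w_j / probs[j],
    an unbiased estimate of the total weight W_S for any fixed probs > 0."""
    j = random.choices(range(len(items)), weights=probs, k=1)[0]
    return items[j][1] / probs[j]

items = [("a1", 5.0), ("a2", 1.0), ("a3", 3.0), ("a4", 11.0)]
W = sum(w for _, w in items)

# uniform sampling: unbiased, but individual estimates are noisy
uniform = [1 / len(items)] * len(items)
print(W, sum(one_item_estimate(items, uniform) for _ in range(10000)) / 10000)

# weight-proportional sampling: every single estimate equals W exactly
proportional = [w / W for _, w in items]
print(W, one_item_estimate(items, proportional))
```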
¡ Row sampling based on L2 norm:
§ Sample row ai with probability pi = ‖ai‖² / ‖A‖F²
§ Rescale each sampled row by 1/√(ℓ·pi) before placing it in B
§ We can show that E[‖B‖F²] = ‖A‖F²
§ And it can be shown that if we sample enough rows, BᵀB closely approximates AᵀA (a small demonstration follows below)
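A minimal NumPy sketch of this scheme (function names are mine; the 1/√(ℓ·pi) rescaling is as reconstructed above):

```python
import numpy as np

def l2_row_sample(A, l, seed=0):
    """Sample l rows with probability p_i proportional to ||a_i||^2 and
    rescale each kept row by 1/sqrt(l * p_i), so that E[||B||_F^2] = ||A||_F^2
    and B^T B approximates A^T A."""
    rng = np.random.default_rng(seed)
    p = (A ** 2).sum(axis=1) / (A ** 2).sum()          # p_i = ||a_i||^2 / ||A||_F^2
    idx = rng.choice(A.shape[0], size=l, replace=True, p=p)
    return A[idx] / np.sqrt(l * p[idx])[:, None]

A = np.random.default_rng(1).normal(size=(10000, 20))
B = l2_row_sample(A, l=500)
print(np.linalg.norm(A, "fro") ** 2, np.linalg.norm(B, "fro") ** 2)          # close
print(np.linalg.norm(A.T @ A - B.T @ B, 2) / np.linalg.norm(A, "fro") ** 2)  # small
```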
¡ Row sampling based on L2 norm:
§ CUR method: samples rows/columns with probability proportional to the squared norm of each row/column (illustrated below)
§ Error guarantee: if we sample c = O(k log k / ε²) columns and r = O(k log k / ε²) rows, then with high probability ‖A − CUR‖F ≤ ‖A − Ak‖F + ε ‖A‖F
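For illustration only, here is a simplified, non-streaming CUR sketch: it samples columns and rows by squared norm and fills the middle matrix with U = C⁺ A R⁺, which is the best U in Frobenius norm for the chosen C and R. This differs from the lecture's construction (which builds U from the pseudoinverse of the row/column intersection block and does not need a second pass over A); all names are mine.

```python
import numpy as np

def cur_approx(A, c, r, seed=0):
    """Simplified CUR: sample c columns and r rows with probability
    proportional to their squared norms, then set U = pinv(C) @ A @ pinv(R)
    (the Frobenius-optimal middle matrix for the chosen C and R)."""
    rng = np.random.default_rng(seed)
    col_p = (A ** 2).sum(axis=0) / (A ** 2).sum()
    row_p = (A ** 2).sum(axis=1) / (A ** 2).sum()
    C = A[:, rng.choice(A.shape[1], size=c, replace=False, p=col_p)]
    R = A[rng.choice(A.shape[0], size=r, replace=False, p=row_p), :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R

rng = np.random.default_rng(1)
A = rng.normal(size=(500, 15)) @ rng.normal(size=(15, 40))   # rank-15 data
C, U, R = cur_approx(A, c=30, r=120)
print(np.linalg.norm(A - C @ U @ R, "fro") / np.linalg.norm(A, "fro"))  # tiny: A is low rank
```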
¡ Key idea: if points in a vector space are
projected onto a randomly selected subspace
of suitably high dimension, then the distances
between points are approximately preserved
¡ A simpler construction for S ∈ ℝ^(ℓ×n) is:
§ to have entries as independent random variables with the standard normal distribution:
§ S = (1/√ℓ) · [matrix with entries drawn from N(0,1)]
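A minimal NumPy sketch of this construction (assuming, as reconstructed above, that S ∈ ℝ^(ℓ×n) is applied on the left so B = S A; names are mine). The point is that the sketch can be maintained over a stream of rows, one rank-one update per row:

```python
import numpy as np

def random_projection_sketch(row_stream, d, l, seed=0):
    """Maintain B = S A for a stream of rows a_i, where S has i.i.d. N(0,1)
    entries scaled by 1/sqrt(l). The column of S that multiplies row a_i is
    drawn when a_i arrives, so the full matrix A is never stored."""
    rng = np.random.default_rng(seed)
    B = np.zeros((l, d))
    for a in row_stream:
        s = rng.normal(size=l) / np.sqrt(l)   # fresh column of S for this row
        B += np.outer(s, a)                   # rank-one update: B = S A so far
    return B

rng = np.random.default_rng(1)
A = rng.normal(size=(5000, 30))
B = random_projection_sketch(iter(A), d=30, l=400)
# B^T B approximates A^T A; the relative error shrinks as l grows
print(np.linalg.norm(A.T @ A - B.T @ B, 2) / np.linalg.norm(A.T @ A, 2))
```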
¡ Another construction for S ∈ ℝ^(ℓ×n) is:
§ S = (1/√ℓ) · [matrix with entries as independent ±1 random variables]
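The ±1 construction drops into the same scheme; a brief non-streaming version (names are mine) is shown below. Sign entries are cheaper to generate and store than Gaussian ones, with essentially the same behavior:

```python
import numpy as np

def sign_projection_sketch(A, l, seed=0):
    """Sketch B = S A where S has independent +/-1 entries scaled by 1/sqrt(l)."""
    rng = np.random.default_rng(seed)
    S = rng.choice([-1.0, 1.0], size=(l, A.shape[0])) / np.sqrt(l)
    return S @ A   # an l x d sketch of the n x d matrix A
```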
¡ Depending on the JLT construction, we achieve different error bounds.
¡ Computationally efficient
¡ Sufficiently accurate in practice
¡ A great pre-processing step in applications
¡ Use a matrix S that contains exactly one ±1 entry per column (see the sketch below)
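A minimal sketch of this construction (my own CountSketch-style rendering; names are mine). Because each column of S has a single ±1, applying S to the stream amounts to hashing each incoming row into one of ℓ buckets and adding or subtracting it there, an O(d) update per row:

```python
import numpy as np

def sparse_sign_sketch(row_stream, d, l, seed=0):
    """Maintain B = S A where S has exactly one nonzero (+1 or -1) per column."""
    rng = np.random.default_rng(seed)
    B = np.zeros((l, d))
    for a in row_stream:
        bucket = rng.integers(l)                     # which row of S is nonzero in this column
        sign = 1.0 if rng.random() < 0.5 else -1.0   # the +/-1 value of that entry
        B[bucket] += sign * a
    return B
```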
¡ The state-of-the-art method in this group is called “Frequent Directions”
¡ Suppose there is a stream of items drawn from a universal set of d items, and we want to find the frequency f(i) of each item i
¡ If we keep d counters, we can count the frequency of every item exactly...
§ But that does not scale when d is huge (IP addresses, queries, ...)
¡ Let’s keep 𝑙 counters where 𝑙 ≪ 𝑑
¡ If a new item arrives in the stream that is
already in the counters, we add 1 to its count
¡ If the new item is not in the counters and we
have space, we create a counter for it and set
it to 1
¡ But what if we don’t have space for it?
¡ Let 𝛿 be the median counter value at time t
¡ Decrease every counter by 𝛿 (or set it to zero if it is less than 𝛿)
¡ Now we have space for the new item, so we continue...
¡ At any point in the stream, the approximate count f′(i) of an item is the value of its counter (or 0 if it currently has no counter)
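A minimal Python rendering of the counter scheme just described, i.e. the shrink-by-the-median variant (names and the toy check are mine):

```python
import random
import statistics
from collections import Counter

def misra_gries_median(stream, l):
    """Keep at most l counters: increment an existing counter, open a new one
    if there is room, otherwise subtract the median counter value from every
    counter (dropping those that fall to zero or below) and then admit the
    new item. Counts are only ever underestimated."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < l:
            counters[x] = 1
        else:
            delta = statistics.median(counters.values())
            counters = {k: v - delta for k, v in counters.items() if v > delta}
            counters[x] = 1
    return counters

# toy check of the guarantee f(i) - 2n/l <= f'(i) <= f(i)
stream = ["a"] * 400 + ["b"] * 250 + [f"x{i}" for i in range(350)]
random.shuffle(stream)
approx = misra_gries_median(stream, l=10)    # n = 1000, so 2n/l = 200
exact = Counter(stream)
for item in ["a", "b", "x0"]:
    print(item, exact[item], round(approx.get(item, 0), 1))
```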
¡ This method undercounts: 0 ≤ f′(i) ≤ f(i)
¡ Each shrink step t decreases every counter by at most δt, so f′(i) ≥ f(i) − ∑t δt
¡ Misra-Gries produces a non-zero approximate frequency f′(i) for every item whose true frequency f(i) is higher than 2n/ℓ
¡ f(i) − 2n/ℓ ≤ f′(i)
¡ Frequent Directions keeps a sketch B of ℓ rows; whenever B fills up, it computes the SVD B = U S Vᵀ and shrinks the singular values by the squared (ℓ/2)-th one (a sketch implementation follows the guarantee below):
S′ ← [ √(S1² − Sℓ/2²), √(S2² − Sℓ/2²), … , 0, … , 0 ]
¡ Similar to the frequent-items case, this method has the following error guarantee:
‖AᵀA − BᵀB‖2 ≤ (2/ℓ) ‖A‖F²
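A compact Python rendering of the method as described above (my own implementation sketch; names are mine). It inserts rows into free rows of B and applies the shrinking step whenever B fills up; the toy check compares the observed error with the 2/ℓ bound:

```python
import numpy as np

def frequent_directions(row_stream, d, l):
    """Frequent Directions sketch with l rows: when B is full, compute its SVD,
    shrink the singular values by the squared (l/2)-th one, and rebuild B, which
    zeroes out at least half of its rows and makes room for new data."""
    B = np.zeros((l, d))
    next_zero = 0
    for a in row_stream:
        if next_zero == l:                               # sketch is full: shrink
            U, S, Vt = np.linalg.svd(B, full_matrices=False)
            delta = S[l // 2] ** 2
            S = np.sqrt(np.maximum(S ** 2 - delta, 0.0)) # the shrink step from the slide
            B = np.zeros((l, d))
            B[: len(S)] = np.diag(S) @ Vt
            next_zero = int(np.count_nonzero(S))         # first zero row is free again
        B[next_zero] = a
        next_zero += 1
    return B

rng = np.random.default_rng(0)
A = rng.normal(size=(2000, 10)) @ rng.normal(size=(10, 50))  # rank-10 data, d = 50
l = 20
B = frequent_directions(iter(A), d=50, l=l)
err = np.linalg.norm(A.T @ A - B.T @ B, 2)
print(err, 2 * np.linalg.norm(A, "fro") ** 2 / l)            # observed error vs. the 2/l bound
```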
¡ Matrix Sketching in Streams:
§ Row sampling methods
§ CUR
§ L2 norm based sampling
§ Random projection methods
§ Johnson Lindenstrauss Transform (JLT)
§ Different ways to construct a JLT matrix
§ Iterative sketching methods
§ Misra-Gries algorithm for frequent items
§ Frequent Directions method (state of the art)