17-Matrix Sketching

Note to other teachers and users of these slides: We would be delighted if you found our material useful for giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.org

CS246: Mining Massive Datasets

Jure Leskovec, Stanford University
Mina Ghashami, Amazon
http://cs246.stanford.edu
¡ In many applications, we can represent data as a matrix: e.g., text analysis, recommendation systems

¡ Think of the data as A ∈ ℝ^{n×d}, containing n row vectors in ℝ^d, with typically n ≫ d

¡ Some examples of typical web-scale data:

¡ A rank-k approximation to A computes a smaller matrix B of rank k such that B approximates A

¡ B is so much smaller than A that it fits in memory

¡ rank(B) ≪ rank(A)
§ If A is a document-term matrix with 10 billion documents and 1 million words, A ∈ ℝ^{10^10 × 10^6}, then B would probably be B ∈ ℝ^{1000 × 10^6}
¡ The error between A and B is small:
§ The covariance error ‖AᵀA − BᵀB‖₂ is small
§ The projection error ‖A − Π_B A‖_{2,F} is small
§ Π_B A := the projection of the rows of A onto the subspace of B
§ If B = USVᵀ, then projecting onto the subspace of B is given by VVᵀ
§ Therefore Π_B A = AVVᵀ
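To make these error measures concrete, here is a minimal numpy sketch (not from the slides; the synthetic data and names are illustrative) that builds a rank-k sketch B from the truncated SVD and evaluates both the covariance error and the projection error:

```python
import numpy as np

# Illustrative setup: near rank-k data A of size n x d
rng = np.random.default_rng(0)
n, d, k = 1000, 50, 10
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))  # rank-k part
A += 0.01 * rng.standard_normal((n, d))                        # plus noise

# Rank-k sketch B = S V^T from the truncated SVD (B is only k x d)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = np.diag(s[:k]) @ Vt[:k]

# Covariance error: ||A^T A - B^T B||_2 (spectral norm)
print(np.linalg.norm(A.T @ A - B.T @ B, 2))

# Projection error: Pi_B A = A V V^T projects rows of A onto B's row space
V = Vt[:k].T
print(np.linalg.norm(A - A @ V @ V.T, "fro"))
```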
¡ We saw that SVD computes the best rank-k approximation A_k to A

¡ We compare the error of other algorithms to ‖A − A_k‖, as it is the smallest achievable error

¡ SVD requires O(nd²) time and O(nd) space

¡ Not applicable in streaming or distributed settings
¡ Not efficient for sparse matrices
¡ Can we compute a rank-k approximation in the streaming setting?
¡ Every element of the stream is a row vector of fixed dimension d
§ We’d like to process A in one pass, using a small amount of memory (sublinear in n)
¡ Streaming data arises in any time-series setting:
§ E-commerce purchases
§ Traffic sensors
§ Activity logs

¡ We cannot store the entire dataset
¡ Many data analysis tasks rely on obtaining a low-rank approximation:
§ Dimension reduction
§ Anomaly detection
§ Data denoising
§ Clustering
§ Recommendation systems
¡ B is a sketch of a streaming matrix A iff:
§ B is of a fixed small size that fits in memory
§ At any point in the stream, B approximates A
¡ Almost every matrix sketching method in the streaming setting falls into one of these categories:
1. Row sampling based
2. Random projection and hashing based
3. Iterative sketching

¡ They all compute a significantly smaller sketch matrix B such that A ≈ B or AᵀA ≈ BᵀB
¡ Row sampling methods select a subset of “important” rows of the original matrix A
§ Sampling is done w.r.t. a well-defined probability distribution
§ Often sampling is done with replacement

¡ They show that the sampled matrix B is a good approximation to the original one

¡ Methods differ in how they define the notion of “importance”
They construct the sketch B by:
¡ assigning a probability pᵢ to each row aᵢ
¡ sampling ℓ rows from A to construct B
¡ rescaling B appropriately to make it unbiased
¡ An intuitive way to define the “importance” of an item is the weight associated with the item, e.g.
§ file records → weight = size of the file
§ IP addresses → weight = number of times the IP address makes a request

¡ Why is it necessary to sample important items?
§ Consider a set of weighted items S = {(a₁, w₁), (a₂, w₂), …, (a_n, w_n)} that we want to summarize with a small and representative sample
§ We define a representative sample as one that estimates the total weight of S (i.e., W_S = Σᵢ₌₁ⁿ wᵢ) in expectation
¡ This is achievable with a sample set of size one!
§ sample any item (aⱼ, wⱼ) with an arbitrary fixed probability p,
§ and rescale it to have weight W_S/p

¡ This sample set has total weight W_S in expectation, since E[weight] = p · (W_S/p) = W_S
§ but it has a large variance too
§ To lower the variance, it is necessary to allow heavy items (i.e., important items) to be sampled with higher probability
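A tiny simulation (illustrative, not from the slides) of this size-one sample: the estimator is unbiased, but its variance is large when p ignores the weights:

```python
import numpy as np

# Keep one item with fixed probability p and rescale its weight to W_S/p;
# otherwise the sample is empty (weight 0). E[estimate] = p*(W_S/p) = W_S.
rng = np.random.default_rng(0)
W_S, p, trials = 105.0, 0.1, 100_000
est = np.where(rng.random(trials) < p, W_S / p, 0.0)
print(est.mean(), est.std())  # mean ~ W_S = 105, std ~ W_S*sqrt(1/p - 1) = 315
```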
¡ Row sampling based on the L2 norm:
§ Sample row aᵢ with probability pᵢ = ‖aᵢ‖² / ‖A‖_F²
§ Rescale the sampled rows of B by 1/√(ℓ·pᵢ)
§ We can show that E[‖B‖_F²] = ‖A‖_F²
§ And it is proved that if we sample ℓ = O(1/ε²) rows, then ‖AᵀA − BᵀB‖_F ≤ ε‖A‖_F²
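A minimal numpy sketch of L2-norm row sampling under these definitions (function and parameter names are illustrative):

```python
import numpy as np

def l2_row_sample(A, ell, seed=0):
    """Sample ell rows of A with p_i = ||a_i||^2 / ||A||_F^2, rescaled."""
    rng = np.random.default_rng(seed)
    p = (A ** 2).sum(axis=1) / (A ** 2).sum()          # L2-norm probabilities
    idx = rng.choice(A.shape[0], size=ell, replace=True, p=p)
    return A[idx] / np.sqrt(ell * p[idx])[:, None]     # unbiased: E[B^T B] = A^T A

A = np.random.default_rng(1).standard_normal((10_000, 30))
B = l2_row_sample(A, ell=500)                          # 500 x 30 sketch
print(np.linalg.norm(A.T @ A - B.T @ B, 2))            # small covariance error
```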
¡ Row sampling based on the L2 norm:
§ The CUR method samples rows/columns with probability proportional to the squared norms of the rows/columns

§ Error guarantee: if we sample c = O(k log k / ε²) columns and r = O(k log k / ε²) rows, then
‖A − CUR‖_F ≤ ‖A − A_k‖_F + ε‖A‖_F
with probability ≥ 98%
+ Easy interpretation of the basis
• since the basis vectors are actual rows/columns

+ Suitable for sparse data
• since the basis vectors are actual rows/columns

− Duplicate columns and rows
• columns of large norm will be sampled multiple times
¡ Key idea: if points in a vector space are projected onto a randomly selected subspace of suitably high dimension, then the distances between points are approximately preserved

¡ Johnson-Lindenstrauss Transform (JLT): d datapoints in any dimension (ℝⁿ, for n ≫ d) can be embedded into a space of roughly log d dimensions such that their pairwise distances are preserved to some extent
We define the JLT more precisely:
¡ A random matrix S ∈ ℝ^{r×n} has the JLT property if for all vectors v, v′ ∈ ℝⁿ:
‖Sv − Sv′‖² = (1 ± ε)‖v − v′‖²
with probability at least 1 − δ

¡ There are many ways to construct a matrix S that preserves pairwise distances
§ All such matrices are said to have the Johnson-Lindenstrauss Transform (JLT) property
One simple construction of S:

¡ Pick the matrix S ∈ ℝ^{r×n} as an orthogonal projection onto a random r-dimensional subspace of ℝⁿ, with r = O(ε⁻² log d)
§ Rows of S are orthogonal vectors

¡ Then for any matrix A ∈ ℝ^{n×d}, SA preserves the pairwise distances between the d datapoints in A
¡ A simpler construction for S ∈ ℝ^{r×n} is to take the entries as independent random variables with the standard normal distribution:

S = (1/√r) · [matrix with entries drawn from N(0,1)]
¡ Another construction for S ∈ ℝ^{r×n} is:

S = (1/√r) · [matrix with entries as independent ±1 random variables]

This is computationally simpler to construct.
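A quick numpy illustration (not from the slides; sizes are illustrative) of the ±1 construction, checking that a pairwise distance among the d column datapoints survives the projection:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 10_000, 20, 400

A = rng.standard_normal((n, d))                          # d datapoints in R^n
S = rng.choice([-1.0, 1.0], size=(r, n)) / np.sqrt(r)    # +/-1 JLT matrix
# Gaussian variant: S = rng.standard_normal((r, n)) / np.sqrt(r)
SA = S @ A                                               # sketch of size r x d

orig = np.linalg.norm(A[:, 0] - A[:, 1])                 # distance in R^n
proj = np.linalg.norm(SA[:, 0] - SA[:, 1])               # distance in R^r
print(orig, proj)                                        # proj = (1 ± eps) * orig
```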
¡ Random projection methods use a JLT matrix S ∈ ℝ^{r×n}
¡ They construct the sketch as B = SA ∈ ℝ^{r×d}
§ this projects the datapoints from the high-dimensional space ℝⁿ onto a lower-dimensional subspace ℝʳ
¡ The resulting error bounds are shown next
¡ Depending on the JLT construction, we achieve different error bounds:

§ If S ∈ ℝ^{r×n} has iid zero-mean ±1 entries and r = O(k/ε + k log k), then the projection error satisfies
‖A − π_B(A)‖_F ≤ (1 + ε)‖A − A_k‖_F
¡ Computationally efficient
¡ Sufficiently accurate in practice
¡ A great pre-processing step in applications

¡ Data-oblivious: their computation involves only a random matrix S
§ Compare to row sampling methods, which need to access the data to form a sketch
¡ Hashing methods use a matrix S ∈ ℝ^{r×n} that contains exactly one ±1 entry per column; the rest of the entries are zero

¡ To build S, use two hash functions:
§ h: [n] → [r], and g: [n] → {−1, +1}
§ column i of S has the value g(i) in row h(i)

¡ Very efficient for sparse matrices A
§ can be applied in O(nnz(A)) operations
§ nnz(A) = number of non-zeros of A
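A minimal sketch of this hashing construction (names are illustrative). S is never materialized: row i of A is added into row h(i) of the sketch with sign g(i), so the whole computation costs O(nnz(A)):

```python
import numpy as np

def hash_sketch(A, r, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    h = rng.integers(0, r, size=n)          # h: [n] -> [r]
    g = rng.choice([-1.0, 1.0], size=n)     # g: [n] -> {-1, +1}
    B = np.zeros((r, d))
    for i in range(n):                      # for sparse A, visit non-zero rows only
        B[h[i]] += g[i] * A[i]              # implicit B = S A
    return B

A = np.random.default_rng(1).standard_normal((5_000, 40))
B = hash_sketch(A, r=500)                   # 500 x 40 sketch
```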
¡ Iterative sketching methods work over a stream A = ⟨a₁, a₂, …, a_n⟩
¡ Each aᵢ is read once, processed quickly, and not read again
¡ Only a small amount of memory is available
¡ The state-of-the-art method in this group is called “Frequent Directions”

¡ It is based on the Misra-Gries algorithm for finding frequent items in a data stream

¡ We first see how the Misra-Gries algorithm for finding frequent items works
§ Then we extend it to matrices
¡ Suppose there is a stream of items drawn from a universal set of d items, and we want to find the frequency f(i) of each item
¡ If we keep d counters, we can count the frequency of every item...
§ But that is not good enough when d is huge (IP addresses, queries, ...)
¡ Let’s keep ℓ counters, where ℓ ≪ d
¡ If a new item arrives in the stream that is
already in the counters, we add 1 to its count

¡ If the new item is not in the counters and we
have space, we create a counter for it and set
it to 1

¡ But what if we don’t have space for it?

¡ Let 𝛿 be the median counter at time t

¡ Decrease all counters by δ (or set a counter to zero if it is less than δ)
¡ Now we have space for the new item, so we continue...
¡ At any time in the stream, the approximate counts for the items are what we have kept so far
¡ This method undercounts: 0 ≤ f′(i) ≤ f(i)

¡ Each decrement step decreases a counter by at most δₜ, so
f′(i) ≥ f(i) − Σₜ δₜ

¡ At any point after seeing n elements of the stream: Σₜ δₜ ≤ 2n/ℓ
§ each decrement step removes at least (ℓ/2)·δₜ from the total count, and the total count never exceeds n

¡ The error guarantee: 0 ≤ f(i) − f′(i) ≤ 2n/ℓ
¡ Misra-Gries produces a non-zero approximate frequency f′(i) for every item whose true frequency f(i) is higher than 2n/ℓ, since
f(i) − 2n/ℓ ≤ f′(i)

¡ Example: to find items that appear more than 20% of the time, i.e., f(i) > n/5, take ℓ = 10 counters and run the Misra-Gries algorithm
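A minimal Python sketch of the Misra-Gries variant described above (median-based decrement; names are illustrative):

```python
def misra_gries(stream, ell):
    """Approximate item frequencies with at most ell counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1              # known item: increment
        elif len(counters) < ell:
            counters[item] = 1               # free slot: new counter
        else:
            # full: subtract the median counter value from every counter,
            # dropping those that fall to zero; this frees >= ell/2 slots
            delta = sorted(counters.values())[len(counters) // 2]
            counters = {k: v - delta for k, v in counters.items() if v > delta}
            counters[item] = 1
    return counters                          # f(i) - 2n/ell <= f'(i) <= f(i)

stream = ["a"] * 50 + ["b"] * 30 + list("cdefghij") * 2
print(misra_gries(stream, ell=10))           # heavy items "a" and "b" survive
```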
¡ Let’s extend it to vectors and matrices

¡ Stream items are row vectors in d dimensions

¡ At any time n in the stream, they form a tall matrix A ∈ ℝ^{n×d}

¡ The goal is to find the most frequent directions of A
Frequent Directions maintains a sketch B ∈ ℝ^{ℓ×d} over the stream:
¡ Insert each incoming row aᵢ into an empty (all-zero) row of B
¡ When B has no empty rows, compute the SVD B = USVᵀ, with singular values S₁ ≥ S₂ ≥ … ≥ S_ℓ
¡ Shrink the singular values by the squared median singular value S²_{ℓ/2}:
S′ ← [√(S₁² − S²_{ℓ/2}), √(S₂² − S²_{ℓ/2}), …, 0, …, 0]
¡ Set B ← S′Vᵀ; at least half of the rows of B become zero, making room for the next rows of the stream
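A minimal numpy sketch of Frequent Directions as described above (parameter names are illustrative):

```python
import numpy as np

def frequent_directions(rows, ell, d):
    """Maintain an ell x d sketch B of the streamed rows."""
    B = np.zeros((ell, d))
    for a in rows:
        zero = np.where(~B.any(axis=1))[0]
        if len(zero) == 0:                    # B is full: shrink via SVD
            U, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[ell // 2] ** 2          # squared median singular value
            s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = s[:, None] * Vt               # at least half the rows are zero now
            zero = np.where(~B.any(axis=1))[0]
        B[zero[0]] = a                        # insert the incoming row
    return B

A = np.random.default_rng(0).standard_normal((2_000, 30))
B = frequent_directions(A, ell=20, d=30)
print(np.linalg.norm(A.T @ A - B.T @ B, 2),    # observed covariance error
      2 * np.linalg.norm(A, "fro") ** 2 / 20)  # bound (2/ell) * ||A||_F^2
```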
¡ Similar to the frequent items case, this method has the following error guarantee:

‖AᵀA − BᵀB‖₂ ≤ (2/ℓ) ‖A‖_F²

¡ More accurate error bounds exist, e.g. for the projection error:

‖A − π_B(A)‖_F² ≤ (1 + ε) ‖A − A_k‖_F²
¡ Matrix Sketching in Streams:
§ Row sampling methods
§ CUR
§ L2 norm based sampling
§ Random projection methods
§ Johnson Lindenstrauss Transform (JLT)
§ Different ways to construct a JLT matrix
§ Iterative sketching methods
§ Misra-Gries algorithm for frequent items
§ Frequent Directions method (state of the art)
