Introduction
We hope you find this material useful for giving your own lectures. Feel free to use these slides verbatim, or to
modify them to fit your own needs. If you make use of a significant portion of these slides
in your own lecture, please include this message, or a link to our web site: http://www.mmds.org
- Predictive methods
  - Use some variables to predict unknown or future values of other variables
  - Example: Recommender systems
[Figure: map of course topics: locality-sensitive hashing, dimensionality reduction, PageRank, SimRank, spam detection, filtering data streams, queries on streams, SVM, perceptron, kNN, recommender systems, duplicate document detection]
[Figure: file chunks (C0, C1, C2, C3, C5, D0, D1, …) stored with replicas spread across chunk servers 1 to N]
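The chunk-server layout above stores each chunk on more than one server, so losing a single machine does not lose data. Here is a minimal sketch of that placement idea. It assumes a fixed replication factor of 2 and simple round-robin placement, which are illustrative choices and not the policy of any particular file system. `place_replicas` is a hypothetical helper.

```python
from collections import defaultdict


def place_replicas(chunks, num_servers, replication=2):
    """Map each chunk server (0-based id) to the list of chunks it stores.

    Illustrative round-robin policy: chunk i goes to servers
    i, i+1, ..., i+replication-1 (mod num_servers). These are distinct
    as long as replication <= num_servers.
    """
    if not 1 <= replication <= num_servers:
        raise ValueError("need 1 <= replication <= num_servers")
    placement = defaultdict(list)
    for i, chunk in enumerate(chunks):
        for r in range(replication):
            placement[(i + r) % num_servers].append(chunk)
    return dict(placement)


chunks = ["C0", "C1", "C2", "C3", "C5", "D0", "D1"]
for server, stored in sorted(place_replicas(chunks, num_servers=4).items()):
    print(f"chunk server {server + 1}: {stored}")
```

Because consecutive server ids are used, no server ever holds two copies of the same chunk. A real system would also weigh rack placement and free space.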
Group by key: collect all pairs with the same key (hash merge, shuffle, sort, partition)
Reduce: collect all values belonging to the key and output the result
[Figure: input data → mappers → group by key → reducers → output]
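The group-by-key step can be sketched on one machine as follows. Each pair is routed to a reducer by hashing its key, and each reducer then sees its keys in sorted order. This assumes a hypothetical `partition` function based on `zlib.crc32`. It is chosen because it is deterministic across processes, unlike Python's salted built-in `hash` for strings. The shuffle is simulated with in-memory dictionaries instead of network transfer.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    # Deterministic hash of the key picks the reducer, so every pair
    # with the same key lands on the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def group_by_key(pairs, num_reducers):
    """Shuffle (key, value) pairs to reducers; each reducer sees its keys sorted."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        # Hash merge: append the value to its key's list in the chosen bucket.
        buckets[partition(key, num_reducers)][key].append(value)
    # Sort within each partition, as a reducer would receive its keys in order.
    return [sorted(bucket.items()) for bucket in buckets]


pairs = [("the", 1), ("crew", 1), ("the", 1), ("space", 1), ("crew", 1), ("the", 1)]
for reducer_id, groups in enumerate(group_by_key(pairs, num_reducers=2)):
    for key, values in groups:
        print(f"reducer {reducer_id}: {key} -> {values}")
```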
[Figure: word-count example on a big document ("The crew of the space shuttle Endeavor recently returned to Earth as ambassadors, harbingers of a new era of space exploration. …"). The data is read sequentially (only sequential reads). Map emits (key, value) pairs such as (The, 1), (crew, 1), (of, 1), (the, 1), (space, 1), (shuttle, 1), (Endeavor, 1), (recently, 1). Group by key collects pairs with the same key, e.g. (crew, 1) (crew, 1). Reduce outputs totals such as (crew, 2), (space, 1), (the, 3), (shuttle, 1), (recently, 1).]
1/7/20 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 42
def map_fn(key, value):
    # key: document name; value: text of the document
    for w in value.split():  # words = whitespace-separated tokens
        yield (w, 1)  # yield plays the role of emit(w, 1)

def reduce_fn(key, values):
    # key: a word; values: an iterator over counts
    result = 0
    for v in values:
        result += v
    yield (key, result)  # emit(key, result)
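To see the whole word-count job run end to end on a single machine, here is a sketch that stands in for the framework. The map step emits (word, 1) pairs. Group by key is done by sorting, so that equal keys become adjacent, using `itertools.groupby`. The reduce step sums each run. The `docs` dictionary (document name to text) and the `word_count` name are assumptions for illustration. As in the example, matching is case-sensitive, so "The" and "the" are separate keys.

```python
from itertools import groupby
from operator import itemgetter


def word_count(docs):
    """Count words across documents: map, sort-based group by key, reduce."""
    # Map: emit (word, 1) for every whitespace-separated token of every document.
    pairs = [(word, 1) for text in docs.values() for word in text.split()]
    # Group by key: sorting brings all pairs with the same key next to each other.
    pairs.sort(key=itemgetter(0))
    # Reduce: sum the counts in each run of equal keys.
    return {word: sum(count for _, count in group)
            for word, group in groupby(pairs, key=itemgetter(0))}


docs = {
    "doc1": "the crew of the space shuttle",
    "doc2": "the crew",
}
print(word_count(docs))
# {'crew': 2, 'of': 1, 'shuttle': 1, 'space': 1, 'the': 3}
```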
[Figure: Spark DAG of RDDs (C, D, E, F) combined by groupBy and join, split into stages (Stage 1 shown)]
- Other examples:
  - Link analysis and graph processing
  - Machine learning algorithms
Natural join R ⋈ S:
R (A, B): (a1, b1), (a2, b1), (a3, b2), (a4, b3)
S (B, C): (b2, c1), (b2, c2), (b3, c3)
R ⋈ S (A, C): (a3, c1), (a3, c2), (a4, c3)
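One common way to compute R(A, B) ⋈ S(B, C) with MapReduce is to key every tuple by the shared attribute B. Map sends each R tuple (a, b) to key b tagged with "R", and each S tuple (b, c) to key b tagged with "S". Each reducer then pairs every R-value with every S-value for its b. The sketch below runs this on the example relations in a single process. `map_join`, `reduce_join`, and `natural_join` are hypothetical names, and the shuffle is simulated with a dictionary.

```python
from collections import defaultdict


def map_join(relation, tup):
    # Key each tuple by the shared attribute B and tag it with its relation.
    if relation == "R":
        a, b = tup
        yield b, ("R", a)
    else:
        b, c = tup
        yield b, ("S", c)


def reduce_join(b, tagged):
    # Pair every R-value with every S-value that shares the same B.
    r_side = [x for tag, x in tagged if tag == "R"]
    s_side = [x for tag, x in tagged if tag == "S"]
    for a in r_side:
        for c in s_side:
            yield a, b, c


def natural_join(R, S):
    groups = defaultdict(list)  # simulated shuffle: key b -> tagged values
    for name, rel in (("R", R), ("S", S)):
        for tup in rel:
            for key, val in map_join(name, tup):
                groups[key].append(val)
    return [t for b, vals in groups.items() for t in reduce_join(b, vals)]


R = [("a1", "b1"), ("a2", "b1"), ("a3", "b2"), ("a4", "b3")]
S = [("b2", "c1"), ("b2", "c2"), ("b3", "c3")]
print(natural_join(R, S))
# [('a3', 'b2', 'c1'), ('a3', 'b2', 'c2'), ('a4', 'b3', 'c3')]
```

Dropping the B column from each output tuple gives the (A, C) pairs listed for R ⋈ S. Tuples with b1 produce nothing because no S tuple has B = b1.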