NIE‐PDB: Advanced Database Systems

http://www.ksi.mff.cuni.cz/~svoboda/courses/241-NIE-PDB/

Lecture 6

MapReduce, Apache Hadoop

Martin Svoboda

29. 10. 2024

Charles University, Faculty of Mathematics and Physics

Czech Technical University in Prague, Faculty of Information Technology
Lecture Outline
MapReduce
• Programming model and implementation
• Motivation, principles, details, …
Apache Hadoop
• HDFS – Hadoop Distributed File System
• MapReduce

NIE‐PDB: Advanced Database Systems | Lecture 6: MapReduce, Apache Hadoop | 29. 10. 2024 2
Programming Models
What is a programming model?
• Abstraction of an underlying computer system
Describes a logical view of the provided functionality
Offers a public interface, resources or other constructs
Allows for the expression of algorithms and data structures
Conceals physical reality of the internal implementation
Allows us to work at a (much) higher level of abstraction
• The point is
how the intended user thinks in order to solve their tasks
and not necessarily how the system actually works

Programming Models
Examples
• Traditional von Neumann model
Architecture of a physical computer with several components
such as a central processing unit (CPU), arithmetic‐logic unit
(ALU), processor registers, program counter, memory unit, etc.
Execution of a stream of instructions
• Java Virtual Machine (JVM)
• …
Do not confuse programming models with
• Programming paradigms (procedural, functional, logic, modular,
object‐oriented, recursive, generic, data‐driven, parallel, …)
• Programming languages (Java, C++, …)

Parallel Programming Models
Process interaction
Mechanisms of mutual communication of parallel processes
• Shared memory – shared global address space, asynchronous read
and write access, synchronization primitives
• Message passing
• Implicit interaction
Problem decomposition
Ways of problem decomposition into tasks executed in parallel
• Task parallelism – different tasks over the same data
• Data parallelism – the same task over different data
• Implicit parallelism

MapReduce
MapReduce Framework
What is MapReduce?
• Programming model + implementation
• Developed by Google, published in 2004 (journal version in 2008)

Google:
A simple and powerful interface that enables automatic
parallelization and distribution of large‐scale computations,
combined with an implementation of this interface that
achieves high performance on large clusters of commodity
PCs.

History and Motivation
Google PageRank problem (2003)
• How to rank tens of billions of web pages by their importance
… efficiently in a reasonable amount of time
… when data is scattered across thousands of computers
… data files can be enormous (terabytes or more)
… data files are updated only occasionally (just appended)
… sending the data between compute nodes is expensive
… hardware failures are the rule rather than the exception
• Centralized index structure was no longer sufficient
• Solution
Google File System – a distributed file system
MapReduce – a programming model

MapReduce Framework
MapReduce programming model
• Cluster of commodity personal computers (nodes)
Each running a host operating system, mutually interconnected
within a network, communication based on IP addresses, …
• Data is distributed among the nodes
• Tasks executed in parallel across the nodes
Classification
• Process interaction: message passing
• Problem decomposition: data parallelism

Basic Idea
Divide‐and‐conquer paradigm
• Breaks down a given problem into simpler sub‐problems
• Solutions of the sub‐problems are then combined together
Two core functions
• Map function
Generates a set of intermediate key‐value pairs
• Reduce function
Reduces values associated with a given intermediate key
And that’s all!

Basic Idea
And that’s really all!
It means...
• We only need to implement Map and Reduce functions
• Everything else such as
input data distribution,
scheduling of execution tasks,
monitoring of computation progress,
inter‐machine communication,
handling of machine failures,
…
is managed automatically by the framework! This is because MapReduce is
a programming model together with its implementation.

Model Description
Map function
• Input: input key‐value pair = input record
• Output: list of intermediate key‐value pairs
Usually from a different domain
Keys do not have to be unique
Duplicate pairs are permitted
• (key, value) → list of (key, value)
Reduce function
• Input: intermediate key + list of (all) values for this key
• Output: possibly smaller list of values for this key
Usually from the same domain
• (key, list of values) → (key, list of values)

Example: Word Frequency
/**
* Map function
* @param key Document identifier
* @param value Document contents
*/
map(String key, String value) {
foreach word w in value: emit(w, 1);
}

/**
* Reduce function
* @param key Particular word
* @param values List of count values generated for this word
*/
reduce(String key, Iterator values) {
int result = 0;
foreach v in values: result += v;
emit(key, result);
}
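A minimal runnable Java translation of the Word Frequency pseudocode above, under stated assumptions: it runs in a single process, models `emit` as a `java.util.function.BiConsumer` callback, and treats a word as any run of non-whitespace characters. The class name and the `countWords` driver are hypothetical; a real framework such as Hadoop supplies its own interfaces and performs the grouping step across the cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Single-process sketch of the Word Frequency example (not Hadoop's API):
// emit(...) is modelled as a BiConsumer callback supplied by the caller.
class WordFrequencySketch {

    // Map function: key = document identifier, value = document contents.
    static void map(String key, String value, BiConsumer<String, Integer> emit) {
        // Assumption: words are maximal runs of non-whitespace characters.
        for (String w : value.split("\\s+")) {
            if (!w.isEmpty()) {
                emit.accept(w, 1);
            }
        }
    }

    // Reduce function: key = particular word, values = counts emitted for it.
    static void reduce(String key, Iterable<Integer> values, BiConsumer<String, Integer> emit) {
        int result = 0;
        for (int v : values) {
            result += v;
        }
        emit.accept(key, result);
    }

    // Hypothetical in-memory driver: map every document, group by word, reduce.
    static Map<String, Integer> countWords(Map<String, String> documents) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        documents.forEach((id, text) ->
            map(id, text, (w, one) -> groups.computeIfAbsent(w, k -> new ArrayList<>()).add(one)));
        Map<String, Integer> output = new TreeMap<>();
        groups.forEach((w, counts) -> reduce(w, counts, output::put));
        return output;
    }

    public static void main(String[] args) {
        Map<String, String> docs = Map.of("d1", "to be or not to be", "d2", "to do");
        System.out.println(countWords(docs)); // {be=2, do=1, not=1, or=1, to=3}
    }
}
```

The `TreeMap` keeps the output sorted by word regardless of the order in which documents are processed.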

Logical Phases
Mapping phase
• Map function is executed for each input record
• Intermediate key‐value pairs are emitted
Shuffling phase
• Intermediate key‐value pairs are grouped and sorted
according to the keys
Reducing phase
• Reduce function is executed for each intermediate key
• Output key‐value pairs are generated
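The shuffling phase can be sketched as grouping the intermediate pairs by key into a sorted map. This assumes all pairs fit into the memory of one process, which a real framework does not assume (it sorts and merges data across workers and local disks); `ShuffleSketch` and `shuffle` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of the shuffling phase only, assuming all intermediate pairs fit in memory.
class ShuffleSketch {

    // Groups intermediate (key, value) pairs by key; the TreeMap keeps keys sorted.
    static <K extends Comparable<K>, V> SortedMap<K, List<V>> shuffle(List<Map.Entry<K, V>> pairs) {
        SortedMap<K, List<V>> groups = new TreeMap<>();
        for (Map.Entry<K, V> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> emitted = List.of(
            Map.entry("to", 1), Map.entry("be", 1), Map.entry("to", 1), Map.entry("or", 1));
        System.out.println(shuffle(emitted)); // {be=[1], or=[1], to=[1, 1]}
    }
}
```

The Reduce function is then called once per entry of the returned map, in key order.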

Cluster Architecture
Master‐slave architecture
• Two types of nodes, each with two basic roles
• Master
Manages the execution of MapReduce jobs
– Schedules individual Map / Reduce tasks to idle workers
– …
Maintains metadata about input / output files
– These are stored in the underlying distributed file system
• Slaves (workers)
Physically store the actual data contents of files
– Files are divided into smaller parts called splits
– Each split is stored by one or more particular workers
Accept and execute assigned Map / Reduce tasks

MapReduce Job Submission
Submission of MapReduce jobs
• Jobs can only be submitted to the master node
• Client provides the following:
Implementation of (not only) Map and Reduce functions
Description of input file (or even files)
Description of output directory
Localization of input files
• Master determines locations of all involved splits
I.e. workers containing these splits are resolved

Input Splits Localization

Map Task Assignment
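Once the master knows which workers store each split, it assigns one Map task per split. The sketch below illustrates the locality preference described for Map task execution: an idle worker that already stores the split is chosen if one exists, otherwise any idle worker is used. The split-location map, the worker names and the fallback policy are illustrative assumptions, not a specific scheduler's algorithm.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical master-side helper: pick a worker for one Map task (one split).
// Assumption: the master knows which workers store each split and which are idle.
class MapTaskAssignmentSketch {

    static Optional<String> assign(String split,
                                   Map<String, List<String>> splitLocations,
                                   Set<String> idleWorkers) {
        // Prefer an idle worker that already stores the split (no network transfer).
        for (String w : splitLocations.getOrDefault(split, List.of())) {
            if (idleWorkers.contains(w)) {
                return Optional.of(w);
            }
        }
        // Otherwise fall back to any idle worker; it would read the split remotely.
        return idleWorkers.stream().findFirst();
    }

    public static void main(String[] args) {
        Map<String, List<String>> locations = Map.of("split-1", List.of("w1", "w2"));
        System.out.println(assign("split-1", locations, Set.of("w2", "w3"))); // Optional[w2]
    }
}
```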

Map Task Execution
Map Task = processing of 1 split by 1 worker
• Assigned by the master to an idle worker that (preferably)
already contains (physically stores) the given split
Individual steps…
• Input reader is used to parse contents of the split
I.e. input records are generated by the input reader
• Map function is applied on each input record
Intermediate key‐value pairs are emitted
• These pairs are stored locally and organized into regions
Either in the system memory,
or flushed to a local hard drive when necessary
Partition function is used to determine the intended region
– Intermediate keys (not values) are used for this purpose
– E.g. hash of the key modulo the overall number of reducers
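The partition function mentioned above (hash of the key modulo the number of reducers) can be sketched as follows. `Math.floorMod` keeps the region index non-negative even when a hash code is negative. The sketch assumes every worker computes the same `hashCode` for a given key; this holds for `String`, whose hash algorithm is fixed by its Javadoc specification, but not necessarily for arbitrary key objects.

```java
// Sketch of a hash-based partition function: region = hash(key) mod R,
// where R is the number of reducers. Only the key is used, never the value.
class PartitionSketch {

    static int partition(Object key, int numReducers) {
        // floorMod keeps the result in [0, numReducers) even for negative hash codes.
        return Math.floorMod(key.hashCode(), numReducers);
    }

    public static void main(String[] args) {
        int r = 4;
        for (String key : new String[] {"to", "be", "or", "not"}) {
            System.out.println(key + " -> region " + partition(key, r));
        }
    }
}
```

Because the value is ignored, all pairs sharing a key land in the same region and therefore reach the same Reducer.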
Input Parsing
Parsing phase
• Each split is parsed so that input records are retrieved
(i.e. input key‐value pairs are obtained)
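One possible input reader for plain text, sketched below, turns each line of a split into one input record keyed by the line's character offset within the split. This record format is an assumption chosen for illustration; the sketch ignores records that cross split boundaries and `\r\n` line endings, which a real reader would have to handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of a line-oriented input reader: each non-empty line of a split becomes
// one input record (key = starting character offset, value = line text).
class LineReaderSketch {

    static List<Map.Entry<Long, String>> parse(String split) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        // The -1 limit keeps trailing empty strings so offsets stay accurate.
        for (String line : split.split("\n", -1)) {
            if (!line.isEmpty()) {
                records.add(Map.entry(offset, line));
            }
            offset += line.length() + 1; // +1 for the consumed '\n'
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(parse("to be\nor not\n")); // [0=to be, 6=or not]
    }
}
```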

Map Phase

Map Task Confirmation

Reduce Task Assignment

Reduce Task Execution
Reduce Task = reduction of selected key‐value pairs by 1 worker
• Goal: processing of all emitted intermediate key‐value pairs
belonging to a particular region
Individual steps…
• Intermediate key‐value pairs are first acquired
All relevant mapping workers are addressed
Data of corresponding regions are transferred (remote read)
• Once downloaded, they are locally merged
I.e. sorted and grouped based on keys
• Reduce function is applied on each intermediate key
• Output key‐value pairs are emitted and stored (output writer)
Note that each worker produces its own separate output file
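The Reduce task steps above (acquire each mapper's region, merge, group by key, reduce) can be sketched as a k-way merge. The sketch assumes each mapper's region arrives already sorted by key and fits in memory, and it uses word-count summation as the Reduce function; `ReduceMergeSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Sketch of the reduce-side merge: each mapper's region is assumed to be
// already sorted by key; the reducer merges them (k-way merge) and groups values.
class ReduceMergeSketch {

    // Current head of one mapper's sorted region, plus an iterator over the rest.
    private static final class Cursor {
        Map.Entry<String, Integer> head;
        final Iterator<Map.Entry<String, Integer>> rest;
        Cursor(Iterator<Map.Entry<String, Integer>> it) { rest = it; head = it.next(); }
        boolean advance() {
            if (rest.hasNext()) { head = rest.next(); return true; }
            return false;
        }
    }

    static Map<String, List<Integer>> mergeAndGroup(List<List<Map.Entry<String, Integer>>> regions) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.head.getKey().compareTo(b.head.getKey()));
        for (List<Map.Entry<String, Integer>> region : regions) {
            if (!region.isEmpty()) heap.add(new Cursor(region.iterator()));
        }
        // Keys leave the heap in sorted order, so insertion order = sorted key order.
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            grouped.computeIfAbsent(c.head.getKey(), k -> new ArrayList<>()).add(c.head.getValue());
            if (c.advance()) heap.add(c);
        }
        return grouped;
    }

    // Reduce step for word count: sum the values of each group.
    static Map<String, Integer> reduceAll(Map<String, List<Integer>> grouped) {
        Map<String, Integer> out = new LinkedHashMap<>();
        grouped.forEach((k, vs) -> out.put(k, vs.stream().mapToInt(Integer::intValue).sum()));
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> fromMapper1 = List.of(Map.entry("be", 2), Map.entry("to", 1));
        List<Map.Entry<String, Integer>> fromMapper2 = List.of(Map.entry("or", 1), Map.entry("to", 2));
        System.out.println(reduceAll(mergeAndGroup(List.of(fromMapper1, fromMapper2)))); // {be=2, or=1, to=3}
    }
}
```

A merge is used rather than a full re-sort because each region is already sorted on the mapper side.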

Region Data Retrieval

Reduce Phase

Reduce Task Confirmation

MapReduce Job Termination

Combine Function
Optional Combine function
• Objective
Decrease the amount of intermediate data
i.e. decrease the amount of data that needs to be
transferred from Mappers to Reducers
• Analogous purpose and implementation to Reduce function
• Executed locally by Mappers
• However, only applicable when the reduction is…
Commutative
Associative
Idempotent: f(f(x)) = f(x)
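For the word frequency example, a Combine function can reuse the Reduce logic locally on one Mapper's output, so fewer pairs leave the node. The sketch below is a single-process illustration with hypothetical names; it shows the drop in the number of pairs, not an actual network transfer. Summing partial sums gives the same totals, so the Reducer stays unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a word-count combiner run locally on one Mapper's output
// before the intermediate pairs would be transferred to Reducers.
class CombinerSketch {

    // Local map output: one (word, 1) pair per word occurrence.
    static List<Map.Entry<String, Integer>> mapOutput(String text) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : text.split("\\s+")) {
            if (!w.isEmpty()) pairs.add(Map.entry(w, 1));
        }
        return pairs;
    }

    // Combine: same logic as the word-count Reduce, applied to local pairs only.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> partial = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            partial.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        partial.forEach((k, v) -> out.add(Map.entry(k, v)));
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> raw = mapOutput("to be or not to be");
        List<Map.Entry<String, Integer>> combined = combine(raw);
        System.out.println(raw.size() + " pairs before, " + combined.size() + " after: " + combined);
        // 6 pairs before, 4 after: [be=2, not=1, or=1, to=2]
    }
}
```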

Improved Map Phase

Improved Reduce Phase

Functions Overview
Input reader
• Parses a given input split and prepares input records
Map function
Partition function
• Determines a particular Reducer for a given intermediate key
Compare function
• Mutually compares two intermediate keys
Combine function
Reduce function
Output writer
• Writes the output of a given Reducer

Advanced Aspects
Counters
• Allow tracking the progress of a MapReduce job in real time
Predefined counters
– E.g. numbers of launched / finished Map / Reduce tasks,
parsed input key‐value pairs, …
Custom counters (user‐defined)
– Can be associated with any action that a Map or Reduce
function does
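A counter registry in the spirit described above could look as follows. This is a hypothetical sketch, not the counters API of Hadoop or any other framework: counters are named `LongAdder`s that tasks may increment concurrently and that can be read at any moment to observe progress. The counter names are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical counter registry: named counters that tasks can increment
// concurrently and that can be read at any time to track job progress.
class CountersSketch {
    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();

    void increment(String name, long delta) {
        counters.computeIfAbsent(name, k -> new LongAdder()).add(delta);
    }

    long get(String name) {
        LongAdder a = counters.get(name);
        return a == null ? 0 : a.sum();
    }

    // Point-in-time view of all counters, sorted by name.
    Map<String, Long> snapshot() {
        Map<String, Long> s = new TreeMap<>();
        counters.forEach((k, v) -> s.put(k, v.sum()));
        return s;
    }

    public static void main(String[] args) {
        CountersSketch c = new CountersSketch();
        String[] records = {"alpha", "", "beta"};
        for (String r : records) {
            c.increment("PARSED_RECORDS", 1);                // predefined-style counter
            if (r.isEmpty()) c.increment("EMPTY_RECORDS", 1); // custom (user-defined) counter
        }
        System.out.println(c.snapshot()); // {EMPTY_RECORDS=1, PARSED_RECORDS=3}
    }
}
```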

Advanced Aspects
Fault tolerance
• When a large number of nodes process a large amount of data
⇒ fault tolerance is necessary
Worker failure
• Master periodically pings every worker; if no response is received in
a certain amount of time, master marks the worker as failed
• All its tasks are reset back to their initial idle state and become
eligible for rescheduling on other workers
Master failure
• Strategy A – periodic checkpoints are created; if master fails,
a new copy can then be started
• Strategy B – master failure is considered to be highly unlikely;
users simply resubmit unsuccessful jobs
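The worker-failure handling above can be sketched as a timeout check that the master runs periodically over the last ping responses. Timestamps are passed in explicitly so the logic is deterministic; the timeout value, the data structures and the class name are assumptions. Following the slide, all tasks of a failed worker are reset to idle. Master failure is not modelled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the master's worker-failure handling: a worker whose last ping
// response is older than the timeout is marked as failed, and all its tasks
// are reset to IDLE so they become eligible for rescheduling elsewhere.
class FailureDetectorSketch {
    enum TaskState { IDLE, IN_PROGRESS, COMPLETED }

    final long timeoutMillis;
    final Map<String, Long> lastResponse = new HashMap<>();   // worker -> time of last ping reply
    final Map<String, String> taskWorker = new HashMap<>();   // task -> assigned worker
    final Map<String, TaskState> taskState = new HashMap<>(); // task -> current state
    final Set<String> failed = new HashSet<>();

    FailureDetectorSketch(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    void pingResponse(String worker, long now) { lastResponse.put(worker, now); }

    void assign(String task, String worker) {
        taskWorker.put(task, worker);
        taskState.put(task, TaskState.IN_PROGRESS);
    }

    // Called periodically by the master with the current time.
    void check(long now) {
        for (Map.Entry<String, Long> e : lastResponse.entrySet()) {
            if (now - e.getValue() > timeoutMillis) failed.add(e.getKey());
        }
        for (Map.Entry<String, String> t : taskWorker.entrySet()) {
            if (failed.contains(t.getValue())) taskState.put(t.getKey(), TaskState.IDLE);
        }
    }

    public static void main(String[] args) {
        FailureDetectorSketch m = new FailureDetectorSketch(10_000);
        m.pingResponse("w1", 0);
        m.pingResponse("w2", 0);
        m.assign("map-1", "w1");
        m.assign("map-2", "w2");
        m.pingResponse("w2", 12_000); // w1 stays silent
        m.check(15_000);
        System.out.println(m.failed + " " + m.taskState.get("map-1") + " " + m.taskState.get("map-2"));
        // [w1] IDLE IN_PROGRESS
    }
}
```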
Advanced Aspects
Stragglers
• Straggler = node that takes an unusually long time to complete
a task it was assigned
• Solution
When a MapReduce job is close to completion, the master
schedules backup executions of the remaining in‐progress tasks
A given task is considered to be completed whenever either
the primary or the backup execution completes
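Backup executions for stragglers can be sketched with `CompletableFuture`: the same task is started twice and the first result wins, matching the rule that a task counts as completed when either execution completes. Task bodies are simulated with `Thread.sleep` and the delays are illustrative; deciding when a job is "close to completion" is not modelled.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Sketch of backup (speculative) execution: the primary and a backup copy of
// the same task run concurrently; whichever finishes first provides the result.
class BackupTaskSketch {

    // Simulated task that takes the given time and then returns its name.
    static Supplier<String> simulatedTask(String name, long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return name;
        };
    }

    static String runWithBackup(Supplier<String> primary, Supplier<String> backup) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> p = CompletableFuture.supplyAsync(primary, pool);
            CompletableFuture<String> b = CompletableFuture.supplyAsync(backup, pool);
            // First finisher wins; the slower result is ignored.
            return p.applyToEither(b, result -> result).join();
        } finally {
            pool.shutdownNow(); // interrupt the execution that is still running
        }
    }

    public static void main(String[] args) {
        String winner = runWithBackup(simulatedTask("primary (straggler)", 2000),
                                      simulatedTask("backup", 100));
        System.out.println("completed by: " + winner);
    }
}
```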

Additional Examples
URL access frequency
• Input: HTTP server access logs
• Map: parses a log, emits (accessed URL, 1) pairs
• Reduce: computes and emits the sum of the associated values
• Output: overall number of accesses to a given URL
Inverted index
• Input: text documents containing words
• Map: parses a document, emits (word, document ID) pairs
• Reduce: emits all the associated document IDs, sorted
• Output: list of documents containing a given word
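The inverted index example can be run in memory as sketched below. Assumptions: words are maximal runs of letters compared case-insensitively, and duplicate document IDs (from a word occurring several times in one document) are removed in Reduce, whereas the slide only requires the IDs to be sorted. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory sketch of the inverted index example:
// Map emits (word, document ID); Reduce emits the sorted document IDs per word.
class InvertedIndexSketch {

    static List<Map.Entry<String, String>> map(String docId, String text) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!w.isEmpty()) out.add(Map.entry(w, docId));
        }
        return out;
    }

    static List<String> reduce(String word, List<String> docIds) {
        return new ArrayList<>(new TreeSet<>(docIds)); // sorted, duplicates removed
    }

    // Hypothetical driver: map all documents, group by word, reduce each group.
    static Map<String, List<String>> build(Map<String, String> docs) {
        Map<String, List<String>> groups = new TreeMap<>();
        docs.forEach((id, text) -> {
            for (Map.Entry<String, String> p : map(id, text)) {
                groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
            }
        });
        Map<String, List<String>> index = new TreeMap<>();
        groups.forEach((w, ids) -> index.put(w, reduce(w, ids)));
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = Map.of("d2", "MapReduce at Google", "d1", "Google File System");
        System.out.println(build(docs).get("google")); // [d1, d2]
    }
}
```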

Additional Examples
Distributed sort
• Input: records to be sorted according to a specific criterion
• Map: extracts the sorting key, emits (key, record) pairs
• Reduce: emits the associated records unchanged
Reverse web‐link graph
• Input: web pages with <a href="…">…</a> tags
• Map: emits (target URL, current document URL) pairs
• Reduce: emits the associated source URLs unchanged
• Output: list of URLs of web pages targeting a given one

Additional Examples
Reverse web‐link graph
/**
* Map function
* @param key Source web page URL
* @param value HTML contents of this web page
*/
map(String key, String value) {
foreach <a> tag t in value: emit(t.href, key);
}

/**
* Reduce function
* @param key URL of a particular web page
* @param values List of URLs of web pages targeting this one
*/
reduce(String key, Iterator values) {
emit(key, values);
}
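A runnable counterpart of the reverse web-link graph pseudocode above is sketched below. A regular expression stands in for a real HTML parser, which is a simplifying assumption: only double-quoted `href` attributes inside `<a>` tags are recognised, and relative URLs are not resolved. Because Reduce emits the values unchanged, the hypothetical driver simply returns the grouped source URLs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Runnable sketch of the reverse web-link graph example.
// Assumption: a regex (double-quoted href only) replaces real HTML parsing.
class ReverseLinksSketch {
    private static final Pattern HREF =
        Pattern.compile("<a\\s[^>]*href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Map: for each <a> tag in the page, emit (target URL, source URL).
    static List<Map.Entry<String, String>> map(String sourceUrl, String html) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            out.add(Map.entry(m.group(1), sourceUrl));
        }
        return out;
    }

    // Driver: group by target URL; Reduce is the identity, so groups are the output.
    static Map<String, List<String>> run(Map<String, String> pages) {
        Map<String, List<String>> grouped = new TreeMap<>();
        pages.forEach((url, html) -> {
            for (Map.Entry<String, String> p : map(url, html)) {
                grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
            }
        });
        return grouped;
    }

    public static void main(String[] args) {
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("http://a.example/", "<a href=\"http://c.example/\">C</a>");
        pages.put("http://b.example/",
                  "<a href=\"http://c.example/\">C</a> <a href=\"http://a.example/\">A</a>");
        System.out.println(run(pages));
    }
}
```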

Use Cases: General Patterns
Counting, summing, aggregation
• When the overall number of occurrences of certain items or a
different aggregate function should be calculated
Collating, grouping
• When all items belonging to a certain group should be found,
collected together or processed in another way
Filtering, querying, parsing, validation
• When all items satisfying a certain condition should be found,
transformed or processed in another way
Sorting
• When items should be processed in a particular order with respect
to a certain ordering criterion

Use Cases: Real‐World Problems
Just a few real‐world examples…
• Risk modeling, customer churn
• Recommendation engine, customer preferences
• Advertisement targeting, trade surveillance
• Fraudulent activity threats, security breaches detection
• Hardware or sensor network failure prediction
• Search quality analysis
• …

Source: http://www.cloudera.com/

Lecture Conclusion
MapReduce criticism
• MapReduce is a step backwards
Does not use database schema
Does not use index structures
Does not support advanced query languages
Does not support transactions, integrity constraints, views, …
Does not support data mining, business intelligence, …
• MapReduce is not novel
Ideas are more than 20 years old and have been superseded
Message Passing Interface (MPI), Reduce‐Scatter
The end of MapReduce?
