0% found this document useful (0 votes)
33 views15 pages

Presentation: Hadoop Technology

The document provides an overview of Hadoop technology including its architecture, components, and MapReduce framework. It describes how Hadoop uses a distributed file system to store large datasets across clusters and nodes. It also explains the typical workflow of how data is loaded and stored in HDFS, including how the NameNode manages block placement. Finally, it gives a high-level description of the MapReduce programming model and provides an example of how it can be used to count word frequencies.

Uploaded by

Rahul Singh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
33 views15 pages

Presentation: Hadoop Technology

The document provides an overview of Hadoop technology including its architecture, components, and MapReduce framework. It describes how Hadoop uses a distributed file system to store large datasets across clusters and nodes. It also explains the typical workflow of how data is loaded and stored in HDFS, including how the NameNode manages block placement. Finally, it gives a high-level description of the MapReduce programming model and provides an example of how it can be used to count word frequencies.

Uploaded by

Rahul Singh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 15

Presentation

on

HADOOP TECHNOLOGY
Submitted by

Rahul Singh
Roll NO 1503314918,MCA
Under the Guidance of
Rama Chaudhary,
(Assistant Professor)
Raj Kumar Goel Institute of Technology
Contents

Hadoop Introduction
Famous Hadoop Users
Hadoop Architecture
Hadoop Cluster
Hadoop Cluster Components
HDFS Architecture
MapReduce Overview
MapReduce Word Count
HADHOOP

Distributed file system

Traditional hierarchical file organization

Single namespace for the entire cluster

Write-once-read-many access model

Aware of the network topology


Famous Hadoop users
Hadoop Architecture
Hadoop cluster
A Small Hadoop Cluster Include a single master &
multiple worker nodes
Master node:
Data Node Slave node:
Job Tracker Data Node
Task Tracker Task Tracke
Name Node
Hadoop cluster components:
HDFS Architecture
Hadoop- Typical Workflow in HDFS

How Sample.txt gets loaded into


the Hadoop Cluster?

Client machine does this step and


loads the Sample.txt into cluster. It
breaks the sample.
txt into smaller chunks which are
known as "Blocks" in Hadoop
context. Client put these blocks on
different machines (data nodes)
throughout the cluster.
Next, how does the Client knows
that to which data nodes load
the blocks?
Now NameNode comes into
picture. The NameNode used its
Rack Awareness intelligence to
decide on which DataNode to
provide. For each of the data block
(in this case Block-A, Block-B and
Block-C), Client contacts
NameNode and in response
NameNode sends an ordered list of
3 DataNodes.

For example in response to Block-


A request, Node Name may send
DataNode-2, DataNode-3 and
DataNode-4.
Who does the block replication?
MapReduce Overview

A method for distributing computation across multiple nodes


Each node processes the data that is stored at that node
The Mapper

Reads data as key/value pairs


Outputs zero or more key/value
pairs
The Reducer
Called once for each unique key
Gets a list of all values associated with a key
as input
The reducer outputs zero or more final
key/value pairs
MapReduce: Word Count

You might also like