
The Solution For Big Data Hadoop

Hadoop is an open-source software framework for storing and processing big data in a distributed computing environment. It allows for the distributed processing of large data sets across clusters of computers using simple programming models. Hadoop consists of Hadoop Common, HDFS for distributed storage, and MapReduce for distributed processing of large data sets across clusters of nodes. HDFS stores data across clusters of machines as blocks and provides high throughput access to application data. MapReduce allows distributed processing of large data sets across clusters of nodes and helps to parallelize the job workload. Hadoop is widely adopted by companies like Amazon, Facebook, Adobe, and eBay for its ability to efficiently store and process massive amounts of data in a distributed manner.

Uploaded by Amritranjan Das


The solution for Big data

HADOOP

J. Sai Krishna and G. Sravya Lahari
2nd B.Tech (CSE)
K.O.R.M College of Engineering
Kadapa

Contents
1. Data trends in storing data
2. Big data problems in the IT industry
3. Introduction to Hadoop
4. HDFS (Hadoop Distributed File System)
5. MapReduce
6. Prominent users of Hadoop
7. Conclusion

Data trends in storing data

What is data? Any real-world symbol (character, numeric, or special character), or a group of them, is said to be data. It may be visual, audio, textual, etc.

Big data
What is big data? In IT, it is a collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications.
As of 2012, the limits on the size of data sets that were feasible to process in a reasonable time were on the order of exabytes of data.

BIG DATA and problems with it
Every day, about 0.5 petabytes of updates are made to Facebook, including 40 million photos.
Every day, enough video is uploaded to YouTube to be watched continuously for a year.
Large data sets cause limitations in many areas, including meteorology, genomics, complex physics simulations, and biological and environmental research.
They also affect Internet search, finance, and business informatics.
The challenges include capture, retrieval, storage, search, sharing, analysis, and visualization.

THEN WHAT COULD BE THE SOLUTION FOR BIG DATA?

HADOOP

What is Hadoop?
It is an open-source software framework written in Java.
The Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

The project includes these modules:
Hadoop Common
Hadoop Distributed File System (HDFS)
Hadoop MapReduce

1. Hadoop Common
It provides access to the file systems supported by Hadoop.
The Hadoop Common package contains the JAR files and scripts needed to start Hadoop.
The package also provides source code, documentation, and a contribution section that includes projects from the Hadoop community (Avro, Cassandra, Chukwa, HBase, Hive, Mahout, Pig, ZooKeeper).

2. Hadoop Distributed File System (HDFS):
Hadoop uses HDFS, a distributed file system based on GFS (Google File System), as its shared file system.
The HDFS architecture divides files into large blocks (~64 MB) that are distributed across data servers. The block size is configurable; 64 MB was the default in early releases, and Hadoop 2 raised it to 128 MB.
It has a NameNode and DataNodes.
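The block layout described above can be illustrated with a short calculation. This is a sketch, not HDFS code: `split_into_blocks`, `MB` and `BLOCK_SIZE` are hypothetical names, and the 64 MB value simply follows the figure on this slide. Every block is full-sized except possibly the last one, which holds only the remaining bytes.

```python
# Sketch: cutting a file into fixed-size, HDFS-style blocks.
# BLOCK_SIZE follows the ~64 MB figure on the slide (configurable in HDFS).
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        # The final block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


if __name__ == "__main__":
    # A 200 MB file: three full 64 MB blocks plus one 8 MB block.
    for offset, length in split_into_blocks(200 * MB):
        print(f"offset={offset // MB:>4} MB  length={length // MB:>3} MB")
```

Because each block is an independent unit, different blocks of the same file can be stored on, and later processed by, different machines.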

What does HDFS contain?

With federation, HDFS consists of multiple independent namenodes (namespaces); the namenodes are federated and do not need to coordinate with each other.

The datanodes are used as common storage for blocks by all the namenodes.
Each datanode registers with all the namenodes in the cluster.
Datanodes send periodic heartbeats and block reports, and handle commands from the namenodes.
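The federated layout can be pictured with a toy, in-process model. Everything in it (`NameNode`, `DataNode`, `register_with_all`, `tick`) is a hypothetical name chosen for this sketch; real datanodes talk to namenodes over RPC on timers, and the blocks belonging to each namespace are kept in a separate block pool on the shared datanodes.

```python
# Toy model of HDFS federation: independent namenodes share one set of
# datanodes. Names are hypothetical; real HDFS uses RPC and timers.
from dataclasses import dataclass, field


@dataclass
class NameNode:
    namespace: str
    block_reports: dict[str, set[str]] = field(default_factory=dict)  # datanode -> blocks
    heartbeats: dict[str, int] = field(default_factory=dict)          # datanode -> count
    pending_commands: dict[str, list[str]] = field(default_factory=dict)

    def register(self, datanode_id: str) -> None:
        self.block_reports.setdefault(datanode_id, set())
        self.heartbeats.setdefault(datanode_id, 0)
        self.pending_commands.setdefault(datanode_id, [])

    def heartbeat(self, datanode_id: str) -> list[str]:
        # Record liveness and hand back any commands queued for this datanode.
        self.heartbeats[datanode_id] += 1
        commands, self.pending_commands[datanode_id] = self.pending_commands[datanode_id], []
        return commands

    def block_report(self, datanode_id: str, blocks: set[str]) -> None:
        self.block_reports[datanode_id] = set(blocks)


@dataclass
class DataNode:
    node_id: str
    blocks: dict[str, set[str]] = field(default_factory=dict)  # namespace -> block pool
    received_commands: list[str] = field(default_factory=list)

    def register_with_all(self, namenodes: list[NameNode]) -> None:
        # Each datanode registers with every namenode in the cluster.
        for nn in namenodes:
            nn.register(self.node_id)
            self.blocks.setdefault(nn.namespace, set())

    def tick(self, namenodes: list[NameNode]) -> None:
        # One periodic cycle: heartbeat plus block report to every namenode.
        for nn in namenodes:
            self.received_commands.extend(nn.heartbeat(self.node_id))
            nn.block_report(self.node_id, self.blocks[nn.namespace])


if __name__ == "__main__":
    namenodes = [NameNode("ns1"), NameNode("ns2")]
    datanodes = [DataNode("dn1"), DataNode("dn2")]
    for dn in datanodes:
        dn.register_with_all(namenodes)
    datanodes[0].blocks["ns1"].add("blk_1")
    for dn in datanodes:
        dn.tick(namenodes)
    print(namenodes[0].block_reports)
```

Each namenode only ever sees the blocks of its own namespace, while every datanode serves all of them.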

Structure of Hadoop system:

MASTER NODE

Master node
Keeps track of the namespace and metadata about items.
Keeps track of MapReduce jobs in the system.
Hadoop is currently configured with centurion064 as the master node.
Hadoop is installed locally on each system.
The installed location is /localtmp/hadoop/hadoop0.15.3

SLAVE NODES

Slave nodes
Manage blocks of data sent from the master node.
In GFS terminology, these are the chunkservers.
Currently, centurion060 and centurion064 are the two slave nodes being used.
Slave nodes store their data in /localtmp/hadoop/hadoop-dfs (this directory is created automatically by the DFS).
Once you use the DFS, relative paths are resolved from /user/{your user id}.
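The master's bookkeeping (which blocks make up a file, and which slave nodes hold each block) can be sketched as two lookup tables. This is an illustration under stated assumptions: `MasterMetadata`, `add_file` and `where_is` are hypothetical names, the node names are taken from the slides, and the round-robin placement is a simplification; real HDFS uses a rack-aware replica placement policy.

```python
# Hypothetical sketch of the master node's metadata: the namespace maps a
# file to its block ids, and each block id maps to the slaves holding it.
# Round-robin placement is a simplification, not HDFS's real policy.
from itertools import cycle

SLAVES = ["centurion060", "centurion064"]  # slave nodes named on the slides


class MasterMetadata:
    def __init__(self, slaves: list[str], replication: int = 2) -> None:
        if not 1 <= replication <= len(slaves):
            raise ValueError("replication must be between 1 and the number of slaves")
        self.slaves = slaves
        self.replication = replication
        self.namespace: dict[str, list[str]] = {}  # path -> block ids
        self.locations: dict[str, list[str]] = {}  # block id -> slave nodes
        self._start = cycle(range(len(slaves)))
        self._next_block_id = 0

    def add_file(self, path: str, num_blocks: int) -> list[str]:
        block_ids = []
        for _ in range(num_blocks):
            block_id = f"blk_{self._next_block_id}"
            self._next_block_id += 1
            start = next(self._start)
            # Choose `replication` distinct slaves, beginning at a rotating index.
            self.locations[block_id] = [
                self.slaves[(start + i) % len(self.slaves)] for i in range(self.replication)
            ]
            block_ids.append(block_id)
        self.namespace[path] = block_ids
        return block_ids

    def where_is(self, path: str) -> dict[str, list[str]]:
        # What a client asks the master before reading from the slaves directly.
        return {b: self.locations[b] for b in self.namespace[path]}


if __name__ == "__main__":
    meta = MasterMetadata(SLAVES, replication=1)
    meta.add_file("/data/input.txt", 3)
    print(meta.where_is("/data/input.txt"))
```

Only metadata lives on the master; the block contents themselves stay on the slave nodes.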

Advantages and Limitations of HDFS

Advantages:
Reduces network traffic, because job scheduling takes the location of the data into account.
Files can be accessed through the native Java API or from a language of the user's choice (C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml).

Limitations:
It cannot be directly mounted by an existing operating system.
It requires a UNIX or Linux system.

3. Hadoop MAPREDUCE SYSTEM

MAP AND REDUCE METHODS USAGE

Map function

Reduce function

Run this program as a MapReduce job

WORD COUNT OVER A GIVEN SET OF STRINGS
Input: "We love India", "We play tennis"
Map: (We, 1) (love, 1) (India, 1) (We, 1) (play, 1) (tennis, 1)
Reduce: (We, 2) (love, 1) (India, 1) (play, 1) (tennis, 1)
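The word count above can be reproduced in a single process to show what each step does. This is a sketch, not Hadoop's API: `map_fn`, `shuffle`, `reduce_fn` and `word_count` are hypothetical names, and Hadoop itself would run many map and reduce tasks on different nodes. Words are matched case-sensitively.

```python
# Word count in the MapReduce style, run locally in one process.
# Function names are hypothetical; this is not the Hadoop API.
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in the input line.
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_fn(word: str, counts: list[int]) -> tuple[str, int]:
    # Sum the 1s emitted for this word.
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_fn(line))
    return dict(reduce_fn(w, c) for w, c in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["We love India", "We play tennis"]))
    # {'We': 2, 'love': 1, 'India': 1, 'play': 1, 'tennis': 1}
```

The map and reduce functions never see more than one line or one key at a time, which is what lets the framework spread them across machines.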

MAPREDUCE WITH NO REDUCE TASKS
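With zero reduce tasks there is no shuffle or sort: each map task's output is written out directly as that task's result. In Hadoop's Java API a job is made map-only by setting the number of reduce tasks to zero, e.g. `job.setNumReduceTasks(0)`. The Python below only simulates the idea; `run_map_only` and `to_upper` are hypothetical names, and the output keys are loosely modeled on Hadoop's `part-m-NNNNN` file naming.

```python
# Simulated map-only job: one output "file" per map task, no reduce phase.
# Names are hypothetical; file naming is loosely modeled on part-m-NNNNN.
from typing import Callable, Iterable, Iterator


def run_map_only(
    splits: list[list[str]],
    map_fn: Callable[[str], Iterable[tuple[str, str]]],
) -> dict[str, list[tuple[str, str]]]:
    outputs = {}
    for task_id, split in enumerate(splits):
        # No partitioning or sorting: records keep the order they were emitted in.
        records = [pair for line in split for pair in map_fn(line)]
        outputs[f"part-m-{task_id:05d}"] = records
    return outputs


def to_upper(line: str) -> Iterator[tuple[str, str]]:
    # A simple transform, the kind of work map-only jobs are used for.
    yield line, line.upper()


if __name__ == "__main__":
    result = run_map_only([["We love India"], ["We play tennis"]], to_upper)
    for name, records in result.items():
        print(name, records)
```

Map-only jobs suit per-record work such as filtering or format conversion, where no values need to be combined across keys.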

MAPREDUCE WITH TWO REDUCE TASKS - AUTOMATIC PARALLEL EXECUTION IN MAPREDUCE
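With two reduce tasks, every key is routed to exactly one reducer, so the reducers can run in parallel without sharing data. The sketch follows the idea of Hadoop's default hash partitioning (hash of the key modulo the number of reducers), but `zlib.crc32` is a stand-in chosen because it is stable across runs; Hadoop uses the key's `hashCode()`, so real assignments will differ. `partition`, `reduce_task` and `run_reduce_tasks` are hypothetical names.

```python
# Sketch: splitting map output between two reduce tasks and running them
# in parallel. crc32 stands in for Hadoop's key.hashCode().
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_REDUCERS = 2  # two reduce tasks, as on the slide


def partition(key: str, num_reducers: int = NUM_REDUCERS) -> int:
    # hash(key) mod number of reducers.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_task(pairs: list[tuple[str, int]]) -> dict[str, int]:
    # Each reduce task sums the counts for the keys it was given.
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def run_reduce_tasks(
    map_output: list[tuple[str, int]], num_reducers: int = NUM_REDUCERS
) -> list[dict[str, int]]:
    # Route every (key, value) pair to exactly one reduce task.
    buckets = [[] for _ in range(num_reducers)]
    for key, value in map_output:
        buckets[partition(key, num_reducers)].append((key, value))
    # The reduce tasks share nothing, so they can run at the same time.
    with ThreadPoolExecutor(max_workers=num_reducers) as pool:
        return list(pool.map(reduce_task, buckets))


if __name__ == "__main__":
    map_output = [("We", 1), ("love", 1), ("India", 1), ("We", 1), ("play", 1), ("tennis", 1)]
    for r, result in enumerate(run_reduce_tasks(map_output)):
        print(f"reducer {r}: {result}")
```

Because both occurrences of "We" hash to the same reducer, that reducer alone produces the final count for "We".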

MapReduce - lifecycle

Input splits → Map function (map phase) → Reduce function (reduce phase)

Shuffle and sort in MapReduce with multiple reduce tasks
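Shuffle and sort can be sketched as two halves: on the map side, each task partitions its output per reducer and sorts each partition by key; on the reduce side, each reducer merges the sorted runs it fetches from all map tasks and groups the values by key. The function names are hypothetical, and `zlib.crc32` again stands in for the key's `hashCode()`.

```python
# Sketch of shuffle and sort with multiple reduce tasks.
# Map side: partition + sort. Reduce side: merge sorted runs + group by key.
import heapq
import zlib
from itertools import groupby
from operator import itemgetter

by_key = itemgetter(0)


def partition(key: str, num_reducers: int) -> int:
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_side(map_output: list[tuple[str, int]], num_reducers: int) -> list[list[tuple[str, int]]]:
    # Split one map task's output by reducer, then sort each part by key.
    parts = [[] for _ in range(num_reducers)]
    for key, value in map_output:
        parts[partition(key, num_reducers)].append((key, value))
    return [sorted(p, key=by_key) for p in parts]


def reduce_side(sorted_runs: list[list[tuple[str, int]]]) -> list[tuple[str, list[int]]]:
    # Merge the already-sorted runs, then group consecutive equal keys.
    merged = heapq.merge(*sorted_runs, key=by_key)
    return [(key, [v for _, v in group]) for key, group in groupby(merged, key=by_key)]


def shuffle_and_sort(map_outputs: list[list[tuple[str, int]]], num_reducers: int = 2):
    per_map = [map_side(out, num_reducers) for out in map_outputs]
    # Reducer r fetches partition r from every map task.
    return [reduce_side([parts[r] for parts in per_map]) for r in range(num_reducers)]


if __name__ == "__main__":
    outputs = shuffle_and_sort([
        [("We", 1), ("love", 1), ("India", 1)],   # map task 1
        [("We", 1), ("play", 1), ("tennis", 1)],  # map task 2
    ])
    for r, groups in enumerate(outputs):
        print(f"reducer {r} input: {groups}")
```

Merging pre-sorted runs avoids re-sorting all data on each reducer, and grouping only needs to look at neighbouring records.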

Prominent users of HADOOP

Amazon: 100 nodes
Facebook: two clusters of 8000 and 3000 nodes
Adobe: 80-node system
eBay: 532-node cluster
Yahoo!: cluster of about 4500 nodes
IIIT Hyderabad: 30-node cluster

Achievements
March 2011 - Apache Hadoop takes the top prize at the Media Guardian Innovation Awards
July 2008 - Hadoop wins the Terabyte Sort Benchmark

Conclusion:
It eases the challenges of capture, storage, search, sharing, analysis, and visualization of big data.
A huge amount of data can be stored, and large computations can be done, on a single cluster with safety and security at low cost.
Big data and big data solutions are among the burning issues in the present IT industry, so working on them will surely make you more valuable to it.

Thank you
Any queries?
