3.4 Introduction to Hadoop System

The document discusses the challenges of big data, focusing on volume, variety, and velocity, and presents Hadoop as a solution for managing these issues. Hadoop offers advantages such as low cost, scalability, and inherent data protection through features like Replication Factor and MapReduce programming. It emphasizes Hadoop's capabilities for massive data storage and faster data processing across multiple nodes.


Big Data Challenges

• Volume, Variety and Velocity

• VOLUME – "How to store terabytes of ever-mounting data?"
• VARIETY – "How to handle structured, semi-structured and unstructured data?"
• VELOCITY – "How to manage data that is generated at very high speed?"
Why Hadoop?

• Key consideration
• Hadoop can handle
✓ Massive amounts of data
✓ Different kinds of data
✓ Processing at a fast pace

• Advantages
✓ Low cost – open source
✓ Computing power – many nodes can be used for computation
✓ Scalability – simply add nodes to the system
✓ Storage flexibility – can store unstructured data easily
✓ Inherent data protection – protects against hardware failures


Distributed Computing Challenges

• Problems and Solutions

• Storage of huge amounts of data
✓ More systems mean more failures
✓ How do we retrieve the data stored on a failed node?
✓ Hadoop solves this with the Replication Factor (RF)
✓ RF is the number of copies of a given data item / data block stored across the network (a minimal sketch of setting it follows below)
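
To make the Replication Factor concrete, here is a minimal sketch using Hadoop's Java FileSystem API to read and change the RF of a single file. The path /data/sample.txt is a hypothetical example; a cluster-wide default would normally be set through the dfs.replication property in hdfs-site.xml.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationFactorDemo {
    public static void main(String[] args) throws Exception {
        // Load the cluster configuration (core-site.xml / hdfs-site.xml on the classpath)
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical file already stored in HDFS
        Path file = new Path("/data/sample.txt");

        // Read the current replication factor of the file
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Current RF: " + status.getReplication());

        // Ask HDFS to keep 3 copies of every block of this file,
        // so the data survives the failure of any one or two nodes
        fs.setReplication(file, (short) 3);

        fs.close();
    }
}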

• Processing the huge amount of data
✓ Data is spread across systems; how do we process it quickly?
✓ The challenge is to integrate data from different machines before processing
✓ Hadoop solves this with MapReduce programming
✓ MapReduce is a programming model that processes huge amounts of data in parallel, quickly (see the word-count sketch below)
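
As a concrete illustration of the MapReduce model, here is a minimal word-count sketch using Hadoop's Java MapReduce API. The map phase runs in parallel on the nodes holding the data blocks, and the reduce phase integrates the partial results; the input and output paths are supplied on the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: runs on each node, emitting (word, 1) for every word in its input split
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: integrates the partial counts for each word from all mappers
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation on each node
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

A run such as "hadoop jar wordcount.jar WordCount /input /output" (with hypothetical paths) ships the map tasks to the nodes that hold the input blocks, so the computation moves to the data rather than the data moving to the computation.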
What is Hadoop?

• Key Aspects
• Two Tasks
✓ Massive Data Storage
▪ Stores huge amounts of data across several nodes
▪ Uses low-cost commodity storage (see the storage sketch after this slide)
✓ Faster Data Processing
▪ Provides everything needed for developing data-processing applications
▪ Computation is done in parallel on several nodes at the same time
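
To make "massive data storage" concrete, below is a minimal sketch of storing a local file into HDFS through the Java FileSystem API; HDFS then splits the file into blocks and spreads replicated copies across the nodes automatically. Both paths are hypothetical examples.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsStoreDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Copy a local file into the distributed file system.
        // HDFS splits it into blocks (128 MB by default) and replicates
        // each block across several commodity nodes.
        fs.copyFromLocalFile(new Path("/tmp/local-data.csv"),
                             new Path("/user/demo/data.csv"));

        // Confirm how many copies of each block the cluster will keep
        System.out.println("Stored with RF = " +
                fs.getFileStatus(new Path("/user/demo/data.csv")).getReplication());
        fs.close();
    }
}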
Hadoop Ecosystem
Hadoop High Level Architecture
