3.4 Introduction To HADOOP System
This document discusses the challenges of big data, focusing on volume, variety, and velocity, and presents Hadoop as a solution for managing them. Hadoop offers advantages such as low cost, scalability, and inherent data protection, through features like the replication factor and MapReduce programming, and it provides massive data storage together with fast, parallel data processing across many nodes.
Big Data Challenges
• Volume, Variety and Velocity
  ✓ "How to store terabytes of mounting data?" (VOLUME)
  ✓ "How to handle structured, semi-structured and unstructured data?" (VARIETY)
  ✓ "How to manage data that is generated at very high speed?" (VELOCITY)

Why Hadoop?

• Key consideration – Hadoop can handle:
  ✓ massive amounts of data
  ✓ different kinds of data
  ✓ at high speed
• Advantages
  ✓ Low cost – open source
  ✓ Computing power – many nodes can be used for computation
  ✓ Scalability – simply add nodes to the system
  ✓ Storage flexibility – can store unstructured data easily
  ✓ Inherent data protection – protects against hardware failures
Distributed Computing Challenges
• Problems and Solutions
• Storage of a huge amount of data
  ✓ More systems mean more failures
  ✓ How do we retrieve data stored on a failed node?
  ✓ Hadoop solves this with the Replication Factor (RF) – the number of copies of a given data item / data block stored across the network (see the sketch below)
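As a concrete illustration, here is a minimal sketch of inspecting and changing the replication factor of a single HDFS file through Hadoop's Java FileSystem API. The file path /data/logs/app.log is a placeholder assumption; the cluster-wide default replication factor comes from the dfs.replication property in hdfs-site.xml.

```java
// Minimal sketch: assumes a running HDFS cluster and the Hadoop client
// libraries on the classpath. The path below is hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/logs/app.log"); // hypothetical file

        // Raise the replication factor of this file to 3, so three copies
        // of each of its blocks are kept on different DataNodes.
        fs.setReplication(file, (short) 3);

        // Read back the replication factor HDFS now reports for the file.
        short rf = fs.getFileStatus(file).getReplication();
        System.out.println("Replication factor of " + file + ": " + rf);

        fs.close();
    }
}
```

If a node holding one replica fails, the NameNode notices the lost block and schedules re-replication from a surviving copy, which is how RF protects against hardware failures.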
• Processing the huge amount of data
  ✓ Data is spread across systems; how do we process it quickly?
  ✓ The challenge is to integrate data from different machines before processing
  ✓ Hadoop solves this with MapReduce programming – a programming model that processes huge amounts of data on many nodes at the same time (see the sketch below)
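The canonical MapReduce example is counting word occurrences across many input files. The condensed sketch below follows the standard Hadoop WordCount pattern: mappers run in parallel on the nodes holding the input blocks and emit (word, 1) pairs, and reducers sum the partial counts; input and output paths are supplied on the command line.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (word, 1) for every token in the input line.
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            // Sum the partial counts produced by all mappers for this word.
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // pre-aggregate on each node
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because the map tasks are shipped to the nodes where the data blocks already live, the data is processed in place rather than being pulled to one machine first, which is what makes the model fast at scale.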
What is Hadoop?

• Key Aspects – Two Tasks
  ✓ Massive Data Storage (see the sketch below)
    ◦ Stores huge amounts of data across several nodes
    ◦ Uses low-cost commodity storage
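To make the storage side concrete, here is a minimal sketch of writing a file into HDFS through the Java FileSystem API; the NameNode URI hdfs://namenode:9000 and the output path are placeholder assumptions. HDFS splits the written stream into blocks, spreads them across DataNodes, and replicates each block according to the replication factor.

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder address

        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out =
                     fs.create(new Path("/user/demo/hello.txt"))) {
            // HDFS chunks this stream into blocks (128 MB by default) and
            // replicates each block across several commodity DataNodes.
            out.write("Hello, HDFS!\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```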
  ✓ Faster Data Processing
    ◦ Provides everything needed to develop data processing applications
    ◦ Computation is done in parallel on several nodes at the same time

Hadoop Ecosystem (diagram slide)

Hadoop High Level Architecture (diagram slide)