Big Data Technologies

Uploaded by nour.salameh03
Copyright © All Rights Reserved

MapReduce
• Category: Data Analytics
• Description: A programming framework for processing and generating large datasets. It splits tasks into a mapping phase (processing) and a reducing phase (aggregating).
• Main Function: Parallel processing and analysis of large datasets.
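The map and reduce phases described above can be illustrated with the classic word-count example. This is a minimal single-process Python sketch, not Hadoop's Java API: the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical, and the map calls run one after another here instead of in parallel across a cluster. The shuffle step stands in for the grouping by key that the framework performs between the two phases.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group every value emitted for the same key, sorted by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: aggregate all values that share one key."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    # On a cluster, map tasks would run in parallel on separate input splits;
    # here they run sequentially in a single process.
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data at scale"]))
    # {'at': 1, 'big': 2, 'data': 2, 'ideas': 1, 'scale': 1}
```

Because each map call only sees one record and each reduce call only sees one key, both phases can be spread across many machines without sharing state.
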
Hadoop
• Category: Data Storage & Data Management
• Description: An open-source framework that uses HDFS (Hadoop Distributed File System) for storing and managing big data.
• Main Function: Distributed storage and management of large-scale data.
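HDFS stores a file by cutting it into fixed-size blocks and replicating each block on several DataNodes. The sketch below assumes the Hadoop 2.x/3.x defaults (128 MB blocks via `dfs.blocksize`, replication factor 3 via `dfs.replication`) and uses a deliberately simplified round-robin placement. The helper functions and DataNode names are hypothetical; real HDFS lets the NameNode choose replica locations with a rack-aware policy.

```python
import math

# Hadoop 2.x/3.x defaults for dfs.blocksize and dfs.replication.
BLOCK_SIZE_MB = 128
REPLICATION = 3


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> list[int]:
    """Return the size of each block; only the last block may be smaller."""
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    sizes = [block_size_mb] * n_blocks
    remainder = file_size_mb % block_size_mb
    if n_blocks and remainder:
        sizes[-1] = remainder
    return sizes


def place_replicas(n_blocks: int, datanodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct DataNodes.

    Simplified round-robin placement (hypothetical); real HDFS is rack-aware.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(n_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300)  # a hypothetical 300 MB file
    print(blocks)  # [128, 128, 44]
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

Replicating every block means the loss of a single DataNode does not lose data, which is what makes the storage "distributed" rather than merely spread out.
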
Subprojects for Hadoop
• Data Ingestion: Flume, Sqoop, Avro
• Data Storage: HBase, Avro
• Data Management: Oozie, Ambari, HCatalog
• Data Analytics: Hive, Pig, Mahout


1-Flume
• Category: Data Ingestion
• Description: A distributed service for collecting and transferring log data to Hadoop.
• Main Function: Real-time ingestion of large volumes of data.
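A Flume agent moves events from a source, through a channel that buffers them, to a sink that writes them onward (for example to HDFS). Flume itself is a Java service configured through properties files; the Python model below only mimics that source, channel, sink flow. The class and function names are hypothetical, `MemoryChannel` is loosely modeled on Flume's capacity-limited memory channel, and the `storage` list is a stand-in for an HDFS sink.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    """A Flume-style event: a byte payload (body) plus string headers."""
    body: bytes
    headers: dict[str, str] = field(default_factory=dict)


class MemoryChannel:
    """Bounded in-memory buffer between a source and a sink (hypothetical model)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._queue: deque[Event] = deque()

    def put(self, event: Event) -> bool:
        if len(self._queue) >= self.capacity:
            return False  # channel full: event rejected; a real source would retry
        self._queue.append(event)
        return True

    def take(self) -> Optional[Event]:
        return self._queue.popleft() if self._queue else None


def log_source(lines: list[str], channel: MemoryChannel) -> int:
    """Source: wrap each log line in an event and offer it to the channel."""
    accepted = 0
    for line in lines:
        if channel.put(Event(body=line.encode("utf-8"), headers={"type": "log"})):
            accepted += 1
    return accepted


def drain_to_storage(channel: MemoryChannel, storage: list[str]) -> int:
    """Sink: take events from the channel and append them to storage."""
    written = 0
    while (event := channel.take()) is not None:
        storage.append(event.body.decode("utf-8"))
        written += 1
    return written


if __name__ == "__main__":
    channel = MemoryChannel(capacity=2)
    print(log_source(["GET /a", "GET /b", "GET /c"], channel))  # 2 (third event rejected)
    hdfs_stand_in: list[str] = []
    print(drain_to_storage(channel, hdfs_stand_in), hdfs_stand_in)  # 2 ['GET /a', 'GET /b']
```

The channel decouples the rate at which logs arrive from the rate at which they are written, which is why Flume can ingest bursts of data in real time.
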
2-Sqoop
• Category: Data Ingestion
• Description: A tool for transferring data between Hadoop and relational databases.
• Main Function: Importing/exporting data between Hadoop and RDBMS.
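Sqoop parallelizes an import by splitting the table on a column (the primary key unless `--split-by` names another), handing each value range to a separate map task (4 by default), and by default writing the rows out as comma-delimited text files. The sketch below imitates that idea with Python's built-in `sqlite3` standing in for the RDBMS. The helper names and the exact range arithmetic are assumptions and do not reproduce Sqoop's own splitter.

```python
import sqlite3


def compute_splits(lo: int, hi: int, num_mappers: int) -> list[tuple[int, int]]:
    """Divide the inclusive range [lo, hi] into at most num_mappers contiguous ranges."""
    size = -(-(hi - lo + 1) // num_mappers)  # ceiling division
    return [(start, min(start + size - 1, hi)) for start in range(lo, hi + 1, size)]


def import_table(conn: sqlite3.Connection, table: str, split_col: str,
                 num_mappers: int = 4) -> list[list[str]]:
    """Return one 'part file' (a list of comma-delimited lines) per split.

    Table and column names are interpolated into SQL, so they must be trusted.
    """
    lo, hi = conn.execute(f"SELECT MIN({split_col}), MAX({split_col}) FROM {table}").fetchone()
    if lo is None:  # empty table: nothing to import
        return []
    parts = []
    for start, end in compute_splits(lo, hi, num_mappers):
        # In Sqoop, each range would be read by a separate map task.
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {split_col} BETWEEN ? AND ? ORDER BY {split_col}",
            (start, end),
        ).fetchall()
        parts.append([",".join(str(value) for value in row) for row in rows])
    return parts


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in range(1, 11)])
    for n, part in enumerate(import_table(conn, "users", "id")):
        print(f"part-m-{n:05d}:", part)  # Hadoop-style part-file naming
    conn.close()
```

Splitting on an indexed key keeps each mapper's query cheap and lets the ranges be read from the database concurrently.
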
3-Avro
• Category: Data Storage & Data Ingestion
• Description: A framework for data serialization and exchange.
• Main Function: Efficient serialization and exchange of data across systems.
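Avro serializes records against a schema written in JSON, and its compact binary encoding leaves field names out of the data entirely. The hand-written encoder below follows the binary encoding rules in the Avro specification for `int`/`long` (zig-zag, then variable-length bytes) and `string` (a length prefix followed by UTF-8 bytes). It supports only those types, the `User` schema is a hypothetical example, and in practice an Avro library would do this work.

```python
def zigzag(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0->0, -1->1, 1->2, -2->3."""
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    """Avro int/long: zig-zag, then 7 bits per byte, high bit set on all but the last byte."""
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: the UTF-8 byte length as a long, followed by the bytes themselves."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


# Hypothetical schema; real Avro schemas are JSON documents shaped like this.
USER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [{"name": "name", "type": "string"}, {"name": "age", "type": "int"}],
}


def encode_record(schema: dict, record: dict) -> bytes:
    """A record is its fields' encodings concatenated in schema order."""
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] == "string":
            out += encode_string(value)
        elif field["type"] in ("int", "long"):
            out += encode_long(value)
        else:
            raise NotImplementedError(f"type {field['type']!r} not covered by this sketch")
    return bytes(out)


if __name__ == "__main__":
    print(encode_record(USER_SCHEMA, {"name": "foo", "age": 64}).hex(" "))
    # 06 66 6f 6f 80 01
```

The record `{"name": "foo", "age": 64}` takes only six bytes because the reader is expected to know the schema, which is what makes Avro efficient for exchanging data between systems.
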
4-HBase
• Category: Data Storage
• Description: A NoSQL database built on top of Hadoop.
• Main Function: Storage and retrieval of semi-structured and unstructured data.
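HBase keeps rows sorted by row key; each cell is addressed by `family:qualifier` and can hold several timestamped versions, which suits sparse, semi-structured data. The in-memory toy below models only that addressing scheme and a sorted range scan with an exclusive stop row. It is not the HBase client API (the Java client uses classes such as `Put`, `Get`, and `Scan`), and the class and method names are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Optional


class ToyHBaseTable:
    """Hypothetical in-memory model: row key -> 'family:qualifier' -> versions."""

    def __init__(self, families: list[str]) -> None:
        # Column families are fixed when the table is created; qualifiers are not.
        self.families = set(families)
        self._rows: dict[str, dict[str, list[tuple[int, str]]]] = {}
        self._sorted_keys: list[str] = []

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        if row not in self._rows:
            bisect.insort(self._sorted_keys, row)  # keep row keys in sorted order
            self._rows[row] = defaultdict(list)
        versions = self._rows[row][column]
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest version first

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the newest version of a cell, or None if it does not exist."""
        versions = self._rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start: str, stop: str) -> list[str]:
        """Row keys in [start, stop): a contiguous slice because keys are sorted."""
        lo = bisect.bisect_left(self._sorted_keys, start)
        hi = bisect.bisect_left(self._sorted_keys, stop)
        return self._sorted_keys[lo:hi]


if __name__ == "__main__":
    table = ToyHBaseTable(["info"])
    table.put("user#002", "info:name", "Bob", ts=1)
    table.put("user#001", "info:name", "Ann", ts=1)
    table.put("user#001", "info:name", "Anna", ts=2)
    print(table.get("user#001", "info:name"))  # Anna
    print(table.scan("user#000", "user#002"))  # ['user#001']
```

Rows need not share the same qualifiers, so unstructured or irregular records fit without a fixed schema beyond the column families.
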
5-Oozie
• Category: Data Management
• Description: A workflow scheduler for managing Hadoop tasks.
• Main Function: Scheduling and coordination of Hadoop processing jobs.
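An Oozie workflow (defined in a `workflow.xml` file) is a directed acyclic graph of actions such as MapReduce, Pig, Hive, or Sqoop jobs, where fork/join nodes let independent actions run in parallel. The Python sketch below uses the standard library's `graphlib` (Python 3.9+) to derive that kind of ordering from a dependency map. The action names are hypothetical, and Oozie itself does not run Python; this only illustrates the scheduling logic.

```python
from graphlib import TopologicalSorter


def execution_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group actions into waves; actions in one wave depend only on earlier waves.

    Raises graphlib.CycleError if the dependencies contain a cycle, since a
    workflow must be a DAG.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # independent actions: could run in parallel
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    workflow = {  # hypothetical actions: name -> actions it waits for
        "sqoop_import": set(),
        "flume_logs_ready": set(),
        "hive_aggregate": {"sqoop_import", "flume_logs_ready"},
        "export_report": {"hive_aggregate"},
    }
    print(execution_waves(workflow))
    # [['flume_logs_ready', 'sqoop_import'], ['hive_aggregate'], ['export_report']]
```

The first wave corresponds to a fork, and `hive_aggregate` starting only after both finish corresponds to a join.
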
6-Ambari
• Category: Data Management
• Description: A management platform for provisioning, monitoring, and managing Hadoop clusters.
• Main Function: Simplifies the management of Hadoop clusters.
7-HCatalog
• Category: Data Management
• Description: A table and metadata management tool for Hadoop.
• Main Function: Organizing and managing data tables and metadata.
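HCatalog exposes the Hive metastore's table definitions (schema, storage location, file format, partitions) so that tools such as Pig and MapReduce can read a table by name instead of hard-coding file paths. The toy registry below illustrates that idea and the Hive-style `key=value` partition directory layout. The class names, the `web_logs` table, and the `/warehouse/...` path are hypothetical and are not HCatalog's API.

```python
from dataclasses import dataclass, field


@dataclass
class TableMeta:
    name: str
    columns: list[tuple[str, str]]   # (column name, type)
    location: str                    # base directory of the table's files
    storage_format: str              # e.g. "ORC" or "TEXTFILE"
    partition_keys: list[str] = field(default_factory=list)
    partitions: set[tuple[str, ...]] = field(default_factory=set)


class ToyCatalog:
    """Hypothetical metadata registry shared by several processing tools."""

    def __init__(self) -> None:
        self._tables: dict[str, TableMeta] = {}

    def create_table(self, meta: TableMeta) -> None:
        if meta.name in self._tables:
            raise ValueError(f"table {meta.name!r} already exists")
        self._tables[meta.name] = meta

    def schema(self, table: str) -> list[str]:
        return [name for name, _type in self._tables[table].columns]

    def partition_path(self, table: str, values: dict[str, str]) -> str:
        # Hive-style layout: <location>/<key1>=<value1>/<key2>=<value2>...
        meta = self._tables[table]
        parts = [f"{key}={values[key]}" for key in meta.partition_keys]
        return "/".join([meta.location.rstrip("/"), *parts])

    def add_partition(self, table: str, values: dict[str, str]) -> str:
        meta = self._tables[table]
        meta.partitions.add(tuple(values[key] for key in meta.partition_keys))
        return self.partition_path(table, values)


if __name__ == "__main__":
    catalog = ToyCatalog()
    catalog.create_table(TableMeta("web_logs", [("ip", "string"), ("url", "string")],
                                   "/warehouse/web_logs", "ORC", ["dt"]))
    print(catalog.schema("web_logs"))  # ['ip', 'url']
    print(catalog.add_partition("web_logs", {"dt": "2024-01-01"}))
    # /warehouse/web_logs/dt=2024-01-01
```

Keeping this metadata in one shared place means a table can be moved or re-partitioned without editing every script that reads it.
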
8-Hive
• Category: Data Analytics
• Description: A data warehouse tool providing SQL-like querying (HiveQL) on data stored in Hadoop.
• Main Function: Enables users to perform queries and analytics on Hadoop data using SQL-like statements.
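Hive turns a declarative query into distributed jobs (MapReduce, or Tez/Spark in newer setups). The sketch below runs a simple HiveQL-style `GROUP BY` with Python's built-in `sqlite3` standing in for Hive, then computes the same result as explicit map, shuffle, and reduce steps to show roughly what the query compiles into. The `page_views` table and its rows are hypothetical, and Hive's actual query plans include optimizations not shown here.

```python
import sqlite3
from collections import defaultdict
from contextlib import closing

# A HiveQL-style aggregation; this statement is also valid SQL for sqlite3.
QUERY = "SELECT page, COUNT(*) AS hits FROM page_views GROUP BY page ORDER BY page"


def run_sql(rows: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Run the query declaratively (sqlite3 stands in for Hive here)."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE page_views (user_id TEXT, page TEXT)")
        conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()


def run_as_mapreduce(rows: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """The same GROUP BY expressed as the map and reduce steps it roughly compiles into."""
    mapped = [(page, 1) for _user, page in rows]      # map: emit (page, 1)
    grouped: dict[str, list[int]] = defaultdict(list)
    for page, one in mapped:                          # shuffle: group by key
        grouped[page].append(one)
    return sorted((page, sum(ones)) for page, ones in grouped.items())  # reduce: count


if __name__ == "__main__":
    views = [("u1", "/home"), ("u2", "/home"), ("u1", "/cart")]
    print(run_sql(views))           # [('/cart', 1), ('/home', 2)]
    print(run_as_mapreduce(views))  # [('/cart', 1), ('/home', 2)]
```

The one-line query and the three explicit steps produce the same answer, which is the convenience Hive offers: users write the former and the engine generates the latter.
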
9-Pig
• Category: Data Analytics
• Description: A platform with its own scripting language (Pig Latin) for analyzing large datasets.
• Main Function: Flexible data analysis and processing.
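A Pig Latin script is a sequence of named relations, each produced by an operator such as `LOAD`, `FILTER`, `GROUP`, or `FOREACH ... GENERATE`, which Pig compiles into MapReduce jobs. The Python sketch below mirrors a short, hypothetical script (shown in the comments) step by step on an in-memory list. The field names, the 1000-byte threshold, and the helper functions are illustrative assumptions, and the final result is sorted only to make the output deterministic.

```python
from collections import defaultdict

# Hypothetical Pig Latin script that the functions below imitate:
#   logs    = LOAD 'access_logs' USING PigStorage(',') AS (user:chararray, bytes:int);
#   big     = FILTER logs BY bytes > 1000;
#   by_user = GROUP big BY user;
#   totals  = FOREACH by_user GENERATE group, SUM(big.bytes);


def load(lines: list[str]) -> list[tuple[str, int]]:
    """LOAD with a comma delimiter and a (user, bytes) schema."""
    return [(user, int(nbytes)) for user, nbytes in (line.split(",") for line in lines)]


def filter_big(logs: list[tuple[str, int]], threshold: int = 1000) -> list[tuple[str, int]]:
    """FILTER logs BY bytes > threshold."""
    return [t for t in logs if t[1] > threshold]


def group_by_user(rel: list[tuple[str, int]]) -> dict[str, list[tuple[str, int]]]:
    """GROUP ... BY user: one bag of tuples per key."""
    bags: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for t in rel:
        bags[t[0]].append(t)
    return dict(bags)


def sum_bytes(bags: dict[str, list[tuple[str, int]]]) -> list[tuple[str, int]]:
    """FOREACH ... GENERATE group, SUM(big.bytes)."""
    return sorted((user, sum(nbytes for _u, nbytes in bag)) for user, bag in bags.items())


if __name__ == "__main__":
    raw = ["alice,500", "bob,2000", "alice,1500", "bob,3000", "carol,100"]
    print(sum_bytes(group_by_user(filter_big(load(raw)))))
    # [('alice', 1500), ('bob', 5000)]
```

Each step produces a new relation from the previous one, which is what makes Pig Latin a dataflow language rather than a single declarative query like SQL.
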
10-Mahout
• Category: Data Analytics
• Description: A library for scalable machine learning built on Hadoop.
• Main Function: Scalable machine learning on large datasets, such as clustering, classification, and recommendation.
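One of the algorithms Mahout provided at scale is k-means clustering, whose iterations fit the MapReduce pattern: assigning points to their nearest centroid is a map-like step, and recomputing each centroid as the mean of its points is a reduce-like step. The single-machine Python sketch below implements plain Lloyd's k-means to show that loop. It is not Mahout's API, and the first-k-points initialization and iteration limit are simplifying assumptions.

```python
import math


def kmeans(points: list[tuple[float, ...]], k: int,
           max_iterations: int = 20) -> list[tuple[float, ...]]:
    """Plain Lloyd's k-means; returns the final centroids."""
    centroids = [tuple(map(float, p)) for p in points[:k]]  # naive init: first k points
    for _ in range(max_iterations):
        # Assignment step (map-like): send each point to its nearest centroid.
        clusters: list[list[tuple[float, ...]]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step (reduce-like): each centroid becomes the mean of its points.
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members
            else centroids[i]  # keep the old centroid if a cluster ends up empty
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(data, k=2))  # [(0.0, 0.5), (10.0, 10.5)]
```

On a cluster, the assignment step can run independently on each data split and only the per-cluster sums need to be combined, which is why k-means scales well in this setting.
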
