Big Data Technologies
MapReduce
• Category: Data Analytics
• Description: A programming framework for processing and generating large datasets. It splits tasks into mapping (processing) and reducing (aggregating) phases.
• Main Function: Parallel processing and analysis of large datasets.
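The two phases can be illustrated with a word count. This is a minimal single-machine sketch in plain Python, not the Hadoop MapReduce API; the function names (map_phase, shuffle, reduce_phase) and the sample input are assumptions made for illustration. On a real cluster the framework runs many map and reduce tasks in parallel and performs the grouping step itself.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group all emitted values by key (the framework does this between phases)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: aggregate the values collected for one key."""
    return key, sum(values)

def word_count(documents):
    # Each document stands in for one input split processed by one mapper.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

splits = ["big data big ideas", "data moves fast"]  # hypothetical input
counts = word_count(splits)
print(counts)
```

Each call to map_phase depends only on its own split, which is why the mapping work can be spread across machines.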
Hadoop
• Category: Data Storage & Data Management
• Description: An open-source framework that
uses HDFS (Hadoop Distributed File System) for
storing and managing big data.
• Main Function: Distributed storage and
management of large-scale data.
Subprojects for Hadoop
1-Flume
• Category: Data Ingestion
• Description: A distributed service for collecting and transferring log data to Hadoop.
• Main Function: Real-time ingestion of large volumes of data.
2-Sqoop
• Category: Data Ingestion
• Description: A tool for transferring data between Hadoop and relational databases.
• Main Function: Importing/exporting data between Hadoop and RDBMS.
3-Avro
• Category: Data Storage & Data Ingestion
• Description: A framework for data serialization and exchange.
• Main Function: Efficient serialization and exchange of data across systems.
4-HBase
• Category: Data Storage
• Description: A NoSQL database built on top of Hadoop.
• Main Function: Storage and retrieval of semi-structured and unstructured data.
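To show what storing semi-structured data looks like in this kind of database, here is a toy in-memory model in Python. It is not the HBase client API: the class TinyWideColumnTable, its methods, and the sample rows are hypothetical. It only mirrors the general idea that each row is addressed by a row key and holds values under "family:qualifier" columns, so different rows may carry different columns.

```python
from collections import defaultdict

class TinyWideColumnTable:
    """Toy model: row key -> "family:qualifier" column -> value."""

    def __init__(self, column_families):
        # Column families are fixed up front; qualifiers are free-form.
        self.column_families = set(column_families)
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        family, _, _qualifier = column.partition(":")
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][column] = value

    def get(self, row_key):
        # Return a copy so callers cannot mutate the stored row.
        return dict(self.rows.get(row_key, {}))

table = TinyWideColumnTable(["info", "activity"])
table.put("user#1", "info:name", "Ada")
table.put("user#1", "activity:last_login", "2024-01-01")
table.put("user#2", "info:name", "Lin")  # this row has no activity columns

print(table.get("user#1"))
print(table.get("user#2"))
```

The sketch leaves out cell versioning, persistence, and distribution across servers.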
5-Oozie
• Category: Data Management
• Description: A workflow scheduler for managing Hadoop tasks.
• Main Function: Scheduling and coordination of Hadoop processing jobs.
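The idea of coordinating dependent jobs can be sketched with Python's standard graphlib module (Python 3.9+). This is a conceptual illustration only, not how Oozie workflows are written (Oozie uses its own workflow definitions); the job names and the run_workflow helper are assumptions.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs that must finish first.
workflow = {
    "ingest_logs": set(),
    "clean_logs": {"ingest_logs"},
    "aggregate": {"clean_logs"},
    "export_report": {"aggregate"},
}

def run_workflow(dependencies, actions):
    """Run every job after all of its prerequisites; return the execution order."""
    order = []
    for job in TopologicalSorter(dependencies).static_order():
        actions[job]()  # a real scheduler would submit a cluster job here
        order.append(job)
    return order

# Placeholder actions that only print the job name.
actions = {name: (lambda name=name: print(f"running {name}")) for name in workflow}
executed = run_workflow(workflow, actions)
print(executed)
```

A production scheduler also handles time-based triggers, retries, and failure paths, which this sketch omits.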
6-Hive
• Category: Data Analytics
• Description: A data warehouse tool providing SQL-like querying on data stored in Hadoop.
• Main Function: Enables users to perform queries and analytics on Hadoop data using SQL.
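The style of SQL analytics described above can be illustrated with a small aggregate query. In this sketch Python's built-in sqlite3 module stands in for a Hadoop-backed SQL engine, purely as an assumption for illustration; the page_views table and its rows are hypothetical, and a real deployment would run a similar query over files stored in Hadoop.

```python
import sqlite3

# In-memory database used only as a stand-in for data stored in Hadoop.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("home", 1), ("home", 2), ("docs", 1), ("home", 3)],
)

# Count views per page, most viewed first.
rows = conn.execute(
    "SELECT page, COUNT(*) AS views "
    "FROM page_views GROUP BY page ORDER BY views DESC"
).fetchall()
print(rows)
conn.close()
```

The point is that the analyst writes a declarative query and leaves the execution plan to the engine.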
9-Pig
• Category: Data Analytics
• Description: A platform with its own scripting language (Pig Latin) for analyzing large data sets.
• Main Function: Flexible data analysis and processing.
10-Mahout
• Category: Data Analytics
• Description: A library for scalable machine learning built on Hadoop.
• Main Function: Building scalable machine learning models, such as clustering, classification, and recommendation, on Hadoop data.