
Hadoop

Hadoop primarily uses the MapReduce programming model for processing and
managing large-scale data. MapReduce enables large datasets to be processed in
parallel across a distributed cluster of computers.
Key Components of MapReduce:
Map Function: Processes input data and produces intermediate key-value pairs.
This is the "mapping" step where data is distributed and processed in parallel.
Reduce Function: Takes the intermediate key-value pairs produced by the map
function, aggregates them, and produces the final output. This is the "reducing"
step, which combines the results of the map step.
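As an illustration, the following is a minimal sketch of these two functions for the classic word-count job, written against Hadoop's Java MapReduce API. The class and variable names (TokenizerMapper, IntSumReducer) are illustrative, and each class would normally live in its own .java file.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map step: emit an intermediate (word, 1) pair for every word in the input split.
class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);        // intermediate key-value pair
        }
    }
}

// Reduce step: the values arriving here are already grouped by key (the word);
// summing them produces the final count for that word.
class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        result.set(sum);
        context.write(word, result);         // final output pair
    }
}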
How It Works:
Mapping: The input dataset is divided into smaller, manageable chunks, which are
then processed independently by the map tasks in parallel across the distributed
environment.
Shuffling and Sorting: The output from the map tasks is shuffled and sorted based
on the keys to ensure that all values associated with the same key are grouped
together.
Reducing: The reduce tasks take the grouped data and perform the desired
computation, such as aggregation, filtering, or summarization, to produce the final
result.
Hadoop's MapReduce framework is highly scalable and designed to handle very
large datasets, making it ideal for big data applications.
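For completeness, below is a hedged sketch of the driver class that wires the earlier mapper and reducer sketches into a job. It assumes the TokenizerMapper and IntSumReducer classes shown above and takes input and output paths as command-line arguments; the shuffle-and-sort phase between the map and reduce steps is handled automatically by the framework and needs no user code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(TokenizerMapper.class);   // mapping: runs in parallel over input splits
        job.setCombinerClass(IntSumReducer.class);   // optional local pre-aggregation on each mapper
        job.setReducerClass(IntSumReducer.class);    // reducing: aggregates the grouped pairs

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // The framework splits the input, shuffles and sorts the intermediate pairs
        // by key, and delivers each key's group of values to a reduce task.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Such a job could then be launched with something like: hadoop jar wordcount.jar WordCountDriver /input/text /output/wordcounts (the jar name and paths are placeholders).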

Evolution of Big Data and its Impact on Database Management Systems


Evolution of Big Data:
Looking back over the last few decades, Big Data technology has grown
enormously. The major milestones in the evolution of Big Data are described
below:
Data Warehousing:
In the 1990s, data warehousing emerged as a solution to store and analyze large
volumes of structured data.
Hadoop:
Hadoop was introduced in 2006 by Doug Cutting and Mike Cafarella. It is an
open-source framework that provides distributed storage and large-scale data
processing.
NoSQL Databases:
Around 2009, NoSQL databases gained prominence, providing a flexible way to
store and retrieve unstructured data.
Cloud Computing:
Cloud computing allows companies to store their data in remote data centers,
reducing their infrastructure and maintenance costs.
Machine Learning:
Machine learning algorithms analyze huge volumes of data to extract meaningful
insights, which has driven the development of artificial intelligence (AI)
applications.
Data Streaming:
Data Streaming technology has emerged as a solution to process large volumes of
data in real time.
Edge Computing:
Edge computing is a distributed computing paradigm that moves data processing
to the edge of the network, closer to the source of the data.
Overall, big data technology has come a long way since the early days of data
warehousing. The introduction of Hadoop, NoSQL databases, cloud computing,
machine learning, data streaming, and edge computing has revolutionized how we
store, process, and analyze large volumes of data. As technology evolves, we can
expect Big Data to play a very important role in various industries.
