CS6CRT19 Big Data Analytics - Module 3, Session 1

The History of Hadoop


Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open source web search engine that was itself a part of the Lucene project.

The name Hadoop is made up: it is the name Doug Cutting's kid gave to a stuffed yellow elephant. Projects in the Hadoop ecosystem also tend to have names that are unrelated to their function, often with an elephant or other animal theme.

Apache Nutch was started in 2002, and a working crawler and search system quickly emerged. However, its creators realized that the architecture wouldn't scale to the billions of pages on the web. The solution came from a paper published in 2003 describing the architecture of Google's distributed filesystem, GFS. In 2004, the Nutch developers set about writing an open source implementation, the Nutch Distributed File System (NDFS).

In 2004, Google published the paper that introduced MapReduce to the world. Early in 2005, the Nutch developers had a working MapReduce implementation in Nutch, and by the middle of that year all the major Nutch algorithms had been ported to run using MapReduce and NDFS.

NDFS and the MapReduce implementation in Nutch were applicable beyond the domain of search, and in February 2006 they moved out of Nutch to form an independent subproject of Lucene called Hadoop. At around the same time, Doug Cutting joined Yahoo!, which provided a dedicated team and the resources to turn Hadoop into a system that ran at web scale. This was demonstrated in February 2008, when Yahoo! announced that its production search index was being generated by a 10,000-core Hadoop cluster.

In January 2008, Hadoop was made its own top-level project at Apache,
confirming its success and its diverse active community. By this time, Hadoop was
being used by many other companies besides Yahoo!, such as Facebook and the New York Times.

In April 2008, Hadoop broke a world record to become the fastest system to sort an entire terabyte of data. Running on a 910-node cluster, Hadoop sorted 1 terabyte in 209 seconds (just under 3.5 minutes), beating the previous year's winning time of 297 seconds. In November of the same year, Google reported that its MapReduce implementation had sorted 1 terabyte in 68 seconds. Then, in April 2009, a team at Yahoo! announced that it had used Hadoop to sort 1 terabyte in 62 seconds.

Today, Hadoop is widely used in mainstream enterprises. Hadoop's role as a general-purpose storage and analysis platform for big data has been recognized by the industry, a fact reflected in the number of products that use or incorporate Hadoop in some way. Commercial Hadoop support is available from large, established enterprise vendors, including EMC, IBM, Microsoft, and Oracle, as well as from specialist Hadoop companies such as Cloudera and Hortonworks.
