CS6CRT19 Big Data Analytics - Module 3, Session 1

The History of Hadoop


Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open source web search engine that was itself a part of the Lucene project.

The name Hadoop is made up: it is the name Doug Cutting's kid gave to a stuffed yellow elephant. Projects in the Hadoop ecosystem also tend to have names that are unrelated to their function, often with an elephant or other animal theme.

Apache Nutch was started in 2002, and a working crawler and search system quickly emerged. However, its creators realized that the architecture wouldn't scale to the billions of pages on the web. The solution came from a paper published in 2003 describing the architecture of Google's distributed filesystem, GFS. In 2004, the Nutch developers set about writing an open source implementation, the Nutch Distributed File System (NDFS).

In 2004, Google published the paper that introduced MapReduce to the world. Early in 2005, the Nutch developers had a working MapReduce implementation in Nutch, and by the middle of that year all the major Nutch algorithms had been ported to run using MapReduce and NDFS.

NDFS and the MapReduce implementation in Nutch were applicable beyond the domain of search, and in February 2006 they moved out of Nutch to form an independent subproject of Lucene called Hadoop. At around the same time, Doug Cutting joined Yahoo!, which provided a dedicated team and the resources to turn Hadoop into a system that ran at web scale. This was demonstrated in February 2008, when Yahoo! announced that its production search index was being generated by a 10,000-core Hadoop cluster.

In January 2008, Hadoop was made its own top-level project at Apache,
confirming its success and its diverse active community. By this time, Hadoop was
being used by many other companies besides Yahoo!, such as Facebook and the New York Times.

In April 2008, Hadoop broke a world record to become the fastest system to sort an entire terabyte of data. Running on a 910-node cluster, Hadoop sorted 1 terabyte in 209 seconds (just under 3.5 minutes), beating the previous year's winning time of 297 seconds. In November of the same year, Google reported that its MapReduce implementation had sorted 1 terabyte in 68 seconds. Then, in April 2009, a team at Yahoo! announced that it had used Hadoop to sort 1 terabyte in 62 seconds.

Today, Hadoop is widely used in mainstream enterprises. Hadoop's role as a general-purpose storage and analysis platform for big data has been recognized by the industry, a fact reflected in the number of products that use or incorporate Hadoop in some way. Commercial Hadoop support is available from large, established enterprise vendors, including EMC, IBM, Microsoft, and Oracle, as well as from specialist Hadoop companies such as Cloudera and Hortonworks.
