
CASE STUDY ON APPLICATION OF HADOOP

TEAM NAME: - “UDAAN”


Nirbhay Pandey (B.B.A)
Alok Kumar Chaturvedi (B.C.A)
Ashraful Haque (B.B.A)
Pintu kumar (B.B.A)
Saurav raj (B.B.A)

Quantum University,
Mandawara (22 km milestone),
Roorkee-Dehradun Highway.

History of Hadoop
Hadoop was started by Doug Cutting and Mike Cafarella in 2002. Its origin lies in the Google File System paper published by Google.

Let's trace the history of Hadoop through the following steps:


o In 2002, Doug Cutting and Mike Cafarella started working on Apache Nutch, an open-source web crawler software project.
o While working on Apache Nutch, they had to deal with big data. Storing that data proved very costly, which became a serious obstacle for the project and one of the important reasons for the emergence of Hadoop.
o In 2003, Google introduced a file system known as GFS (Google File System), a proprietary distributed file system developed to provide efficient access to data.
o In 2004, Google released a white paper on MapReduce, a technique that simplifies data processing on large clusters.
o In 2005, Doug Cutting and Mike Cafarella introduced a new file system known as NDFS (Nutch Distributed File System), which also included a MapReduce implementation.
o In 2006, Doug Cutting joined Yahoo. On the basis of the Nutch project, he introduced a new project, Hadoop, with a file system known as HDFS (Hadoop Distributed File System). Hadoop's first version, 0.1.0, was released the same year.
o Doug Cutting named the project Hadoop after his son's toy elephant.
o In 2007, Yahoo was running two clusters of 1,000 machines.
o In 2008, Hadoop became the fastest system to sort 1 terabyte of data, doing so on a 900-node cluster in 209 seconds.
o In 2013, Hadoop 2.2 was released.
o In 2017, Hadoop 3.0 was released.

INTRODUCTION TO HADOOP:-

Hadoop is an open-source distributed processing framework that manages data processing and storage for big data applications in scalable clusters of computer servers. It's at the center of an ecosystem of big data technologies that are primarily used to support advanced analytics initiatives, including predictive analytics, data mining and machine learning. Hadoop systems can handle various forms of structured and unstructured data, giving users more flexibility for collecting, processing, analysing and managing data than relational databases and data warehouses provide.
Hadoop's ability to process and store different types of data makes it a
particularly good fit for big data environments. They typically involve not
only large amounts of data, but also a mix of structured transaction
data and semistructured and unstructured information, such as internet
clickstream records, web server and mobile application logs, social media
posts, customer emails and sensor data from the internet of things (IoT).

Formally known as Apache Hadoop, the technology is developed as part of an open-source project within the Apache Software Foundation. Multiple vendors offer commercial Hadoop distributions, although the number of Hadoop vendors has declined because of an overcrowded market and competitive pressures driven by the increased deployment of big data systems in the cloud. The shift to the cloud also enables users to store data in lower-cost cloud object storage services instead of Hadoop's namesake file system; as a result, Hadoop's role is being reduced in some big data architectures.

Hadoop and big data:-


Hadoop runs on commodity servers and can scale up to support
thousands of hardware nodes. The Hadoop Distributed File System
(HDFS) is designed to provide rapid data access across the nodes in a
cluster, plus fault-tolerant capabilities so applications can continue to
run if individual nodes fail. Those features helped Hadoop become a
foundational data management platform for big data analytics uses
after it emerged in the mid-2000s.

Because Hadoop can process and store such a wide assortment of data, it
enables organizations to set up data lakes as expansive reservoirs for
incoming streams of information. In a Hadoop data lake, raw data is often
stored as is so data scientists and other analysts can access the full data
sets, if need be; the data is then filtered and prepared by analytics or IT
teams, as needed, to support different applications.

Data lakes generally serve different purposes than traditional data warehouses that hold cleansed sets of transaction data. But, in some cases, companies view their Hadoop data lakes as modern-day data warehouses. Either way, the growing role of big data analytics in business decision-making has made effective data governance and data security processes a priority in data lake deployments and Hadoop systems in general.

Components of Hadoop and how it works:-


The core components in the first iteration of Hadoop were MapReduce,
HDFS and Hadoop Common, a set of shared utilities and libraries. As its
name indicates, MapReduce uses map and reduce functions to split
processing jobs into multiple tasks that run at the cluster nodes where data is
stored and then to combine what the tasks produce into a coherent set of
results. MapReduce initially functioned as both Hadoop's processing engine
and cluster resource manager, which tied HDFS directly to it and limited
users to running MapReduce batch applications.
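To make the map-and-reduce pattern described above concrete, below is a minimal word-count sketch against the standard Hadoop MapReduce Java API. It is only an illustration under assumptions: the class name and the two command-line arguments (an HDFS input directory and a not-yet-existing output directory) are placeholders.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map task: runs on the nodes holding the input splits and emits (word, 1) pairs.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce task: combines all the counts emitted for one word into a single result.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on each node
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (must not exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The mapper runs next to the data it reads and the reducer merges the partial counts, which is exactly the split-the-work-then-combine-the-results behaviour described above.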

That changed in Hadoop 2.0, which became generally available in October 2013 when version 2.2.0 was released. It introduced Apache Hadoop YARN, a new cluster resource management and job scheduling technology that took over those functions from MapReduce. YARN -- short for Yet Another Resource Negotiator, but typically referred to by the acronym alone -- ended the strict reliance on MapReduce and opened up Hadoop to other processing engines and various applications besides batch jobs. For example, Hadoop can now run applications on the Apache Spark, Apache Flink, Apache Kafka and Apache Storm engines.

In Hadoop clusters, YARN sits between HDFS and the processing engines deployed by users. The resource manager uses a combination of containers, application coordinators and node-level monitoring agents to dynamically allocate cluster resources to applications and oversee the execution of processing jobs in a decentralized process. YARN supports multiple job scheduling approaches, including a first-in-first-out queue and several methods that schedule jobs based on assigned cluster resources.
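As a small, hedged illustration of YARN acting as this resource layer, the sketch below uses the YarnClient API to list the nodes and applications the ResourceManager is tracking. It assumes a Hadoop 3.x client with a yarn-site.xml on the classpath that points at a reachable ResourceManager.

import java.util.List;

import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClusterOverview {
    public static void main(String[] args) throws Exception {
        // Reads yarn-site.xml from the classpath to locate the ResourceManager.
        YarnConfiguration conf = new YarnConfiguration();
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(conf);
        yarn.start();

        // Node-level view: the NodeManagers YARN is monitoring and their capacity.
        List<NodeReport> nodes = yarn.getNodeReports(NodeState.RUNNING);
        for (NodeReport node : nodes) {
            System.out.printf("node %s: %d MB, %d vcores%n",
                    node.getNodeId(),
                    node.getCapability().getMemorySize(),
                    node.getCapability().getVirtualCores());
        }

        // Application-level view: every job YARN is tracking, whatever engine submitted it.
        for (ApplicationReport app : yarn.getApplications()) {
            System.out.printf("app %s (%s): %s%n",
                    app.getApplicationId(), app.getName(), app.getYarnApplicationState());
        }

        yarn.stop();
    }
}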

The Hadoop 2.0 series of releases also added high availability and
federation features for HDFS, support for running Hadoop clusters on
Microsoft Windows servers and other capabilities designed to expand the
distributed processing framework's versatility for big data management and
analytics.

Hadoop 3.0.0 was the next major version of Hadoop. Released by Apache in December 2017, it added a YARN Federation feature designed to enable YARN to support tens of thousands of nodes or more in a single cluster, up from a previous 10,000-node limit. The new version also included support for GPUs and erasure coding, an alternative to data replication that requires significantly less storage space.

Subsequent 3.1.x and 3.2.x updates enabled Hadoop users to run YARN containers inside Docker ones and introduced a YARN service framework that functions as a container orchestration platform. Two new Hadoop components were also added with those releases: a machine learning engine called Hadoop Submarine and the Hadoop Ozone object store, which is built on the Hadoop Distributed Data Store block storage layer and designed for use in on-premises systems.

Modules of Hadoop
1. HDFS (Hadoop Distributed File System): Google published its GFS paper, and HDFS was developed on the basis of it. Files are broken into blocks and stored on nodes across the distributed architecture (a minimal API sketch follows this list).
2. YARN (Yet Another Resource Negotiator): used for job scheduling and for managing the cluster.
3. MapReduce: a framework that helps Java programs perform parallel computation on data using key-value pairs. The Map task takes input data and converts it into a data set that can be computed as key-value pairs. The output of the Map task is consumed by the Reduce task, and the output of the reducer gives the desired result.
4. Hadoop Common: the Java libraries used to start Hadoop, which are also used by the other Hadoop modules.
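To show how HDFS exposes this block-and-node model to programs, here is a minimal Java sketch using the org.apache.hadoop.fs.FileSystem API. The file path is a placeholder, and the sketch assumes a client whose core-site.xml and hdfs-site.xml point at an existing HDFS cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlocksDemo {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath;
        // fs.defaultFS should point at the cluster's NameNode.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write a file; HDFS breaks it into blocks and replicates them across DataNodes.
        Path file = new Path("/tmp/hadoop-demo/sample.txt");   // placeholder path
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("hello from HDFS");
        }

        // Read the file back through the same API.
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }

        // Ask the NameNode where each block of the file is physically stored.
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("block hosted on: " + String.join(", ", block.getHosts()));
        }
    }
}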
Hadoop Architecture:-

The Hadoop architecture is a package of the file system, the MapReduce engine and HDFS (Hadoop Distributed File System). The MapReduce engine can be MapReduce/MR1 or YARN/MR2.
A Hadoop cluster consists of a single master and multiple slave nodes. The master node includes the NameNode and JobTracker, whereas each slave node includes a DataNode and TaskTracker.

Advantages of Hadoop
o Fast: In HDFS, data is distributed over the cluster and mapped, which helps in faster retrieval. Even the tools that process the data are often on the same servers, reducing the processing time. Hadoop is able to process terabytes of data in minutes and petabytes in hours.
o Scalable: A Hadoop cluster can be extended simply by adding nodes to the cluster.
o Cost effective: Hadoop is open source and uses commodity hardware to store data, so it is really cost effective compared with a traditional relational database management system.
o Resilient to failure: HDFS can replicate data over the network, so if one node goes down or some other network failure happens, Hadoop uses another copy of the data. Normally, data is replicated three times, but the replication factor is configurable (see the short sketch after this list).
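As a small illustration of that last point, the replication factor can be changed per file through the same FileSystem API. This is only a sketch; the path is a placeholder and the file is assumed to already exist in HDFS.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3");   // default for files created by this client

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/tmp/hadoop-demo/sample.txt");   // placeholder path

        // Ask the NameNode to keep two copies of this particular file instead of the default.
        fs.setReplication(file, (short) 2);
        System.out.println("replication now: " + fs.getFileStatus(file).getReplication());
    }
}

The same change can also be made from the command line with the hdfs dfs -setrep command.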

Features Of Hadoop
 It is best-suited for Big Data analysis

Typically, Big Data has an unstructured and distributed nature. This is what makes Hadoop clusters best suited for Big Data analysis. Hadoop functions on the ‘data locality’ concept, which means that instead of the actual data, the processing logic flows to the computing nodes, thereby consuming less network bandwidth. This increases the efficiency of Hadoop applications.
 It is scalable

The best thing about Hadoop clusters is that you can scale them
to any extent by adding additional cluster nodes to the network
without incorporating any modifications to application logic.
So, as the Big Data volume, variety, and velocity increase, you
can also scale the Hadoop cluster to accommodate the growing
data needs.
 It is fault-tolerant

In the Hadoop ecosystem, there’s a provision to replicate the input data to other cluster nodes as well. Thus, if ever a cluster node fails, data processing will not come to a standstill, as another cluster node can replace the failed node and continue the process.
Hadoop Applications in the real-world
1. Security and Law Enforcement
Hadoop is now used as an active tool in law enforcement. Thanks to its speedy and reliable Big Data analysis, Hadoop is helping law enforcement agencies (such as police departments) become more proactive, efficient, and accountable. For instance, the National Security Agency of the USA uses Hadoop to prevent terrorist attacks. Since Hadoop can help detect security breaches and suspicious activities in real time, it has become an effective tool to predict criminal activity and catch criminals.
2. Enhance customer satisfaction and monitor online
reputation
Businesses are now using Hadoop to analyze sales data and compare it against many other factors to determine when a specific product sells best. By continually monitoring sales data, business owners can find out why certain products sell better on particular days, at particular hours, or in particular seasons. In the same way, Hadoop can also mine social media and online conversations to see what your customers (both existing and potential) are saying about you on online platforms. It monitors the sentiments behind the comments and feedback of the customers. This insight helps marketers and business owners analyze customer pain points and understand what customers expect from the brand. All of this vital information can be used by businesses and companies to enhance the quality of their products, boost customer satisfaction, and improve their online reputation.
3. Monitor patient vitals
Many hospitals have started leveraging Hadoop to make their staff more productive in their work processes. Healthcare systems and machines generate large volumes of unstructured data. Conventional data processing systems cannot process and analyze such large quantities of raw data. However, Hadoop can. An excellent case in point is when Children’s Healthcare of Atlanta fitted sensors beside the beds in its ICU to continually track the vitals of child patients, such as blood pressure, heartbeat, and respiratory rate. The primary aim was to store and analyze these critical signs and be alerted if there was ever any change in the patterns. This allowed the healthcare provider to promptly send a team of doctors and medical assistants to check on patients in need. This was made possible using core components of the Hadoop ecosystem – Hive, Flume, Impala, Spark, and Sqoop.
4. Healthcare Intelligence
Healthcare insurance companies usually combine all the associated
costs (including the risks involved) and equally divide it by the total
number of members in a particular group. Naturally, the outcomes
are always dynamic since they keep changing. This is where
Hadoop’s scalable and inexpensive feature can be highly useful.
Hadoop can efficiently accommodate dynamic data and scale
according to the ever-changing needs. By using Hadoop-based
healthcare intelligence apps, both healthcare providers and
healthcare insurance companies can devise smart business solutions
at an affordable cost.
Let’s assume that a healthcare insurance company wishes to find the age below which people in a region aren’t prone to a specific disease, so that it can calculate the approximate cost of an insurance policy. To gather the age data of the people in that region, however, the company would have to invest a large sum of money in processing and analyzing vast volumes of datasets to extract relevant information about the disease in question, its symptoms, its typical victims, and so on. This is where Hadoop components like Pig, Hive, and MapReduce can come in handy – they can process large datasets at relatively low costs.
5. Track clickstream data
Essentially, Hadoop’s primary function is to store, process, and
analyze massive volumes of data, including clickstream data.
Hadoop can successfully capture the following:
 Where did a visitor originate from before reaching a particular website?
 What search term did the visitor use that led to the website?
 Which webpage did the visitor open first?
 What are the other webpages that interested the visitor?
 How much time did the visitor spend on each page?
 What product/service did the visitor decide to buy?
By helping you find the answers to all such questions, Hadoop offers an analysis of user engagement and website performance. Thus, by leveraging Hadoop, companies of all shapes and sizes can conduct clickstream analysis to optimize the user path, predict what product/service the customer is likely to buy next, and decide where to allocate their web resources.
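As a rough sketch of answering one of those questions with MapReduce, the job below totals the time visitors spent on each page. The log format it parses (visitorId, page and seconds-on-page separated by tabs) is a made-up assumption; real clickstream logs would need their own parsing.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TimePerPage {

    // Mapper: parse one log line and emit (page, secondsOnPage).
    public static class PageTimeMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumed format: visitorId <TAB> page <TAB> secondsOnPage
            String[] fields = value.toString().split("\t");
            if (fields.length < 3) {
                return;   // skip malformed lines
            }
            try {
                long seconds = Long.parseLong(fields[2]);
                context.write(new Text(fields[1]), new LongWritable(seconds));
            } catch (NumberFormatException ignored) {
                // skip lines whose time field is not numeric
            }
        }
    }

    // Reducer: sum the seconds recorded for each page across all visitors.
    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable v : values) {
                total += v.get();
            }
            context.write(key, new LongWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "time per page");
        job.setJarByClass(TimePerPage.class);
        job.setMapperClass(PageTimeMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // clickstream logs in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}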
6. Track geolocation data
Smartphones have become a crucial part of our lives now. With
the number of smartphone users around the world increasing as
we speak, these tiny devices are the heartbeat of the digital
world. So, why not capitalize on this opportunity and use
smartphones to your advantage? Businesses can use Hadoop to
track the geolocation data on smartphones and tablets to track
customers’ movements, behavior patterns, purchases, and
predict their next move. Not just that, Hadoop clusters can also
streamline massive amounts of geolocation data and help
organizations to identify the challenges in their business and
operation processes.
7. Track sensor data
Today, electronic gadgets and machines are using sensors to
enhance the user experience and more importantly, to harvest
customer data. The growing trend toward incorporating sensors
has become more pronounced following the increasing
adoption of IoT devices. In fact, sensor data is among the
fastest-growing data types now. Devices and machines are
infused with advanced sensors that can monitor and track a
host of features like temperature, speed, pressure, proximity,
location, image, price, motion, and much more. Since sensor
data tends to become overwhelming with time, Hadoop is the
best and most effective solution to track, store, and analyze
sensor data. By tracking and monitoring sensor data,
companies can obtain operational insights into their business
and improve their processes accordingly.
8. Strengthen security and compliance
Hadoop can efficiently analyze server-log data and respond to a security breach in real time. Server logs are computer-generated logs that capture network data operations, particularly security and regulatory compliance data. Server logs provide companies and organizations with important insights pertaining to network usage, security threats and compliance. Hadoop is a perfect fit for staging and analyzing this data. It is an excellent tool for extracting errors or detecting the occurrence of any suspicious event in a system (for example, login failures). By loading the server logs into Hadoop, network admins can identify the cause of a security breach and fix the issue promptly, as the sketch below illustrates.
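As a minimal, hedged sketch of that workflow, the map-only job below scans server logs stored in HDFS and keeps only the lines that match a suspicious-event pattern. The 'login failure' and 'failed password' patterns are placeholders; real logs would need their own matching rules.

import java.io.IOException;
import java.util.Locale;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SuspiciousLogFilter {

    // Map-only job: each mapper scans its share of the logs in parallel and
    // writes out only the lines that look like failed logins.
    public static class FilterMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString().toLowerCase(Locale.ROOT);
            if (line.contains("login failure") || line.contains("failed password")) {
                context.write(value, NullWritable.get());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "suspicious log filter");
        job.setJarByClass(SuspiciousLogFilter.class);
        job.setMapperClass(FilterMapper.class);
        job.setNumReduceTasks(0);   // no reduce phase: the mapper output is the result
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // server logs loaded into HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}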
Although these are only a handful of Hadoop applications in
the real-world scenario, many more are yet to come. As the Big
Data use cases expand and Hadoop technology matures, we
will see more of such pioneering applications of Hadoop.
THANK YOU !!
