Simplified Data Processing for Large Cluster: A MapReduce and Hadoop Based Study

Abdiaziz Omar Hassan and Abdulkadir Abdulahi Hasan
Received: May 29, 2021; Accepted: June 21, 2021; Published: July 9, 2021
Abstract: With the rapid development of computing technologies, there is an ever-increasing trend in the growth of data. Data scientists are overwhelmed with such a large and ever-increasing amount of data, as it now requires more processing channels. The big concern arising here for large-scale data is to provide support for the decision-making process. In this study, the MapReduce programming model is applied, an associated implementation introduced by Google. The programming model involves the computation of two functions, Map and Reduce. The MapReduce libraries automatically parallelize the computation and handle complex tasks, including big data distribution, load balancing, and fault tolerance. This MapReduce implementation, originating at Google, together with its open-source counterpart Hadoop, has the objective of handling computation over large clusters of commodity machines. Our use of the MapReduce and Hadoop frameworks is aimed at handling terabytes and petabytes of storage across thousands of machines working in parallel and processing at the same time. In this way, large-scale processing and manipulation of big data are achieved with effective results. This study presents the basics of MapReduce programming and the open-source Hadoop framework. The Hadoop framework can speed up the handling of big data and respond very fast.
Keywords: Google MapReduce Processes, Hadoop, Parallel Data Processing, HDFS, Cloud Computing, Large Cluster Data Processing
was made part of the Google database management system and the Google file system. MapReduce can be employed for scalability and is a fault-tolerant data processing tool that can handle and process huge data with a lower bound on computing nodes [3].

Discussing how MapReduce works, a distributed file system (DFS) first categorizes data into multiple categories, and the data is then presented as pairs containing keys and values. The MapReduce framework performs its applications and functions on single machines, where the data may be preprocessed before the map functions or post-processed after the MapReduce function has run [4]. Hadoop, a well-known open-source implementation of MapReduce for handling large datasets, employs an already provided user-level filesystem to handle storage across the cluster [5]. This approach provides reasonable speed while handling larger datasets across a large number of computing nodes, and it reduces application time by about 30% compared with ordinary data mining techniques [6].

1.1. Programming Model and Application of MapReduce Function

The programming model takes a defined set of input key/value pairs and produces a set of output key/value pairs. MapReduce comprises two functions: one is Map and the other is Reduce. The Map function takes an input pair and produces intermediate key/value pairs. These intermediate outputs are grouped by key by the MapReduce library and then passed further to the Reduce function. The Reduce function accepts an intermediate key and merges its values to form a smaller set of values. Let us take the example of counting the occurrences of each word in a large dataset and express it with the map and reduce functions. The code to do this counting of occurrences will be similar to the following:

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));

With this example, the program counts the occurrences of each word within the input files specified on the command line.

#include "mapreduce/mapreduce.h"

// User's map function
class WordCounter : public Mapper {
 public:
  virtual void Map(const MapInput& input) {
    const string& text = input.value();
    const int n = text.size();
    for (int i = 0; i < n; ) {
      // Skip past leading whitespace
      while ((i < n) && isspace(text[i])) i++;
      // Find word end
      int start = i;
      while ((i < n) && !isspace(text[i])) i++;
      if (start < i)
        Emit(text.substr(start, i - start), "1");
    }
  }
};
REGISTER_MAPPER(WordCounter);

// User's reduce function
class Adder : public Reducer {
  virtual void Reduce(ReduceInput* input) {
    // Iterate over all entries with the same key and add the values
    int64 value = 0;
    while (!input->done()) {
      value += StringToInt(input->value());
      input->NextValue();
    }
    // Emit sum for input->key()
    Emit(IntToString(value));
  }
};
REGISTER_REDUCER(Adder);

int main(int argc, char** argv) {
  ParseCommandLineFlags(argc, argv);

1.2. MapReduce Specification

The remainder of the driver builds the MapReduce specification for the job:

  MapReduceSpecification spec;

  // Store list of input files into "spec"
  for (int i = 1; i < argc; i++) {
    MapReduceInput* input = spec.add_input();
    input->set_format("text");
    input->set_filepattern(argv[i]);
    input->set_mapper_class("WordCounter");
  }

  // Specify the output files:
  //   /gfs/test/freq-00000-of-00100
  //   /gfs/test/freq-00001-of-00100
  //   ...
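To make the word-count logic above easy to check without the MapReduce runtime, the following is a minimal, self-contained sketch that emulates the map and reduce phases sequentially in a single process. The function names and structure here are illustrative only and are not part of the Google MapReduce library.

#include <iostream>
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Emulated "map" phase: emit an intermediate (word, 1) pair for every
// whitespace-delimited word, mirroring WordCounter::Map above.
static void MapPhase(const std::string& text,
                     std::vector<std::pair<std::string, int>>* intermediate) {
  std::istringstream in(text);
  std::string word;
  while (in >> word) intermediate->push_back({word, 1});
}

// Emulated shuffle plus "reduce" phase: group intermediate pairs by key and
// sum their values, mirroring the Adder reducer above.
static std::map<std::string, int> ReducePhase(
    const std::vector<std::pair<std::string, int>>& intermediate) {
  std::map<std::string, int> counts;
  for (const auto& kv : intermediate) counts[kv.first] += kv.second;
  return counts;
}

int main() {
  // Read one "document" from standard input and count its words.
  std::string document((std::istreambuf_iterator<char>(std::cin)),
                       std::istreambuf_iterator<char>());
  std::vector<std::pair<std::string, int>> intermediate;
  MapPhase(document, &intermediate);
  for (const auto& kv : ReducePhase(intermediate))
    std::cout << kv.first << " " << kv.second << "\n";
  return 0;
}

Running such a program over a small text file and comparing its totals against the distributed job's output is a simple way to validate the Map and Reduce logic before scaling out.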
Richard M. Yoo and his fellows have studied scalable MapReduce on a large-scale shared-memory system and discussed how dynamic runtimes simplify parallel programming while automatically handling such scenarios. They showed how a multi-layered approach, working on optimizations at the algorithm, implementation, and OS-interaction levels, delivers significant speedup improvements with 256 threads. They also identified the hurdles and roadblocks that limit the scalability of runtimes on shared-memory systems [9].

Kyong-Ha Lee and his fellows discussed Google's MapReduce technique, which handles and processes big data more simply and smoothly, together with the benefit of minimized cost. The main characteristic of this MapReduce model is that it is able to process large data sets distributed among multiple nodes and multiple channels [10].

B. Panda and his fellows highlighted the MapReduce system and its applications to big data at an international conference. They described the MapReduce mechanism as a proprietary system of Google, and discussed how distributed computing can be greatly simplified with the Map and Reduce functions, providing the basics and insights for achieving the desired performance [11].

Jeffrey Dean and his fellows discussed simplified data processing on large clusters with the MapReduce framework. They described the subsidiary infrastructure of Google's MapReduce, which allocates data to a distributed file system and enables the algorithms to locate data and make it available. They termed it easy to use in the opinion of programmers, as more than ten thousand distinct MapReduce programs had been implemented internally at Google within a four-year span [12].

Bayardo, Panda, and their fellows discussed massively parallel learning with the application of the MapReduce framework. They highlighted combining the MapReduce programming technique with a distributed file system as a way to achieve distributed computing objectives, with data processing over thousands of computing nodes [11].
Jaliya Ekanayake and her fellows discussed MapReduce for data-intensive scientific analyses. They examined the MapReduce technique for its applicability to large parallel data analyses, with efficient parallel/concurrent algorithms meeting the scalability and performance requirements for handling and processing scientific data [13].

Anam Alam and her fellows discussed the Hadoop architecture and its issues, together with their implications, at an international conference. Hadoop is categorized as a distributed program or framework used to handle a large amount of data, and it is usually used for data-intensive applications. With its extensive application, every social media site has made use of it [14].

R. Vijayakumari and her fellows presented a comparative analysis of the Google File System and the Hadoop Distributed File System. They discussed distributed computing, parallel computing, grid computing, and other parameters, including design goals, processes, file management, scalability, protection, security, cache management, and replication, to compare both methods and their application of the file system [15].
3. Methodology

The methods used may not look familiar to a general audience. The first one is MapReduce, which is in fact oriented to programmers rather than business users. It has gained popularity due to its easy application, its efficiency, and its ability to handle "Big Data" in a timely manner. The MapReduce framework, with its application and programming model, is discussed above, where a word-count example is employed with the MapReduce framework.

3.1. Hadoop

Another process employed and utilized is Hadoop, which is connected with a Java implementation and Java applications. It can be used in two further ways: through the streaming API, or by building Hadoop applications with C++. The Hadoop Distributed File System is the target file system for use with MapReduce programs and is best suited to a small number of very large files. With the use of replication, data availability can be ensured within the Hadoop Distributed File System (HDFS). To process all of the files created by the mapping mechanism, the Reduce program gets access to internode data. When map and reduce are executed, both programs write to the local file system to avoid putting a burden on HDFS. HDFS supports a multiple-readers, one-writer (MROW) approach. An indexing mechanism does not apply to HDFS, so it is best suited to read-only applications that only scan and read the contents of files.
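As an illustration of the streaming route mentioned above, the two small programs below sketch a word-count mapper and reducer that read from standard input and write tab-separated key/value lines to standard output, which is the contract Hadoop Streaming expects. The file names and structure are illustrative assumptions, not material taken from the paper's experiments.

// wc_mapper.cc -- streaming mapper: emit "<word><TAB>1" for every word read.
#include <iostream>
#include <string>

int main() {
  std::string word;
  while (std::cin >> word) std::cout << word << "\t1\n";
  return 0;
}

// wc_reducer.cc -- streaming reducer: input lines arrive sorted by key, so
// counts for the same word are adjacent and can be summed in a single pass.
#include <iostream>
#include <sstream>
#include <string>

int main() {
  std::string line, current;
  long long sum = 0;
  while (std::getline(std::cin, line)) {
    std::istringstream in(line);
    std::string word;
    long long count = 0;
    in >> word >> count;                       // parse "<word><TAB><count>"
    if (word != current && !current.empty()) { // key changed: emit previous total
      std::cout << current << "\t" << sum << "\n";
      sum = 0;
    }
    current = word;
    sum += count;
  }
  if (!current.empty()) std::cout << current << "\t" << sum << "\n";
  return 0;
}

Once compiled, such binaries would typically be handed to the Hadoop Streaming jar via its -input, -output, -mapper, and -reducer options; building Hadoop applications directly in C++ instead goes through the Pipes interface.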
3.1.1. Hadoop Architecture

The Hadoop Distributed File System stores data within its computing nodes, providing customized and high aggregate bandwidth across the entire cluster. A file system installation has several nodes: one single name node, called the master node, and various data nodes, called slave nodes. The name node is responsible for managing the file system namespace and controls the access to files by clients. The data nodes, or slave nodes, are distributed so that one data node is assigned per machine in the cluster, managing the data attached to the machine on which it runs. The name node executes operations on the file system namespace and assigns data blocks to the data nodes. The data nodes handle read and write requests from clients and perform operations as instructed [16]. HDFS manipulates and handles data chunks and replicates these chunks across the servers for performance, load balancing, and resiliency. An application will specify the number of replicas of a file right when it is created, and this count can be changed at any time after that. The name node has the ability to make decisions concerning block replication.

3.1.2. Deploying Hadoop

Hadoop can be deployed in three different ways. The first is standalone mode, the default mode of Hadoop, running as a single Java process. The second is pseudo-distributed mode, which involves configuring Hadoop to run on a single machine, with the different Hadoop processes running as separate Java processes. The third is fully distributed or cluster mode, involving one machine as the name node and another as the job tracker. There can also be a secondary name node that performs periodic handshaking with the name node for fault tolerance.

3.1.3. Replication Management

HDFS provides a reliable way to store huge data in a distributed environment as data blocks. The blocks are also replicated to provide fault tolerance. The default replication factor is 3, which is again configurable, so that each block is replicated three times and stored on different DataNodes.
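To make the effect of the default replication factor concrete, the short calculation below estimates how many block replicas and how much raw disk space a single file consumes. The 128 MB block size and the factor of 3 are assumptions standing in for the configurable dfs.blocksize and dfs.replication settings, not values taken from the paper.

// Illustrative arithmetic only: storage implied by HDFS block replication.
#include <cstdint>
#include <iostream>

int main() {
  const std::uint64_t kBlockSize   = 128ULL * 1024 * 1024;      // assumed block size (dfs.blocksize)
  const std::uint64_t kReplication = 3;                         // assumed replication factor (dfs.replication)
  const std::uint64_t fileSize     = 1ULL * 1024 * 1024 * 1024; // example: one 1 GiB file

  const std::uint64_t blocks = (fileSize + kBlockSize - 1) / kBlockSize;   // ceiling division
  std::cout << "blocks: " << blocks << "\n"                                // 8
            << "block replicas stored: " << blocks * kReplication << "\n"  // 24
            << "raw bytes on disk: " << fileSize * kReplication << "\n";   // 3 GiB
  return 0;
}

Under the default factor, a file therefore occupies roughly three times its logical size across the cluster, which is the price paid for tolerating DataNode failures.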
… re-use of data sources, including public data, to construct applications. There is a need to evaluate the best approach to use for filtering and analyzing the data. For optimized processing, Hadoop with MapReduce can be employed, as we have done in this paper with the basics of MapReduce programming and the open-source Hadoop framework. The Hadoop framework can speed up the processing of big data and respond very fast. The extensibility and simplicity of these frameworks are the critical factors that make them a replenishing tool for big data handling, processing, and management.

5. Conclusion

In this study, the MapReduce programming model was applied, an associated implementation introduced by Google. This programming model involves the computation of two functions, Map and Reduce.

Hadoop comprises an ecosystem of tools and technologies that requires careful analysis and expertise to determine the suitable mapping of technologies and enable a smooth migration.

Hadoop is a highly scalable platform, largely because of its ability to store and allocate large data sets across many servers. The servers used are quite inexpensive and can operate in parallel, and the processing power of the system can be improved by adding more servers.

The Hadoop MapReduce programming model offers the flexibility to process structured or unstructured data in various business organizations, which can use and operate on different types of data. Thus, they can achieve business value out of meaningful and beneficial data for analysis.

References

[1] J. R. Swedlow, G. Zanetti and C. Best, "Channeling the data deluge," Nature Methods, vol. 8, pp. 463-465, 2011.

[2] S. Maitrey and C. K. Jha, "An Integrated Approach for CURE Clustering using Map-Reduce Techniques," in Proceedings of Elsevier, vol. 2, 2013.

[3] D. DeWitt, "MapReduce: A major step backwards," The Database Column, 2011.

[4] Y. Kim and K. Shim, "Parallel Top-K Similarity Join Algorithms Using MapReduce," Arlington, VA, USA, 2012.

[5] J. Shafer, S. Rixner and A. L. Cox, "The Hadoop distributed filesystem: Balancing portability and performance," White Plains, NY, USA, 2010.

[6] C. A. Moturi et al., "Use of MapReduce for Data Mining and Data Optimization on a Web Portal," International Journal of Computer, vol. 56, no. 7, 2012.

[7] S. Maitrey and C. K. Jha, "MapReduce: Simplified Data Analysis of Big Data," Procedia Computer Science, vol. 57, pp. 563-571, 2015.

[8] J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," in Proc. USENIX OSDI, vol. 4, pp. 137-149, 2004.

[9] R. M. Yoo, A. Romano and C. Kozyrakis, "Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system," Austin, TX, USA, 2009.

[10] K.-H. Lee, Y.-J. Lee, H. Choi, Y. D. Chung and B. Moon, "Parallel data processing with MapReduce: a survey," ACM SIGMOD Record, vol. 40, no. 4, 2012.

[11] B. Panda, J. S. Herbach, S. Basu and R. J. Bayardo, "PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce," PVLDB, vol. 2, no. 2, pp. 1426-1437, 2009.

[12] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, 2008.

[13] J. Ekanayake, S. Pallickara and G. Fox, "MapReduce for Data Intensive Scientific Analyses," Indianapolis, IN, USA, 2008.

[14] A. Alam and J. Ahmed, "Hadoop Architecture and Its Issues," Las Vegas, NV, USA, 2014.

[15] R. Vijayakumari et al., "Comparative analysis of Google File System and Hadoop Distributed File System," International Journal of Advanced Trends in Computer Science and Engineering, vol. 3, no. 1, pp. 553-558, 2014.

[16] F. Wang, J. Qiu, J. Yang, B. Dong, X. Li and Y. Li, "Hadoop high availability through metadata replication," in Proc. of the First International Workshop on Cloud Data Management, pp. 37-44, 2009.

[17] H.-C. Yang, A. Dasdan, R.-L. Hsiao and D. S. Parker, "Map-reduce-merge: simplified relational data processing on large clusters," 2007.