
2022 2nd International Conference on Innovative Practices in Technology and Management (ICIPTM)

Apache Spark based analysis on word count application in Big Data
K. Subha
Department of Computer Science and Engineering
SRM Institute of Science and Technology, Vadapalani Campus
Chennai, India
[email protected]

Dr. N. Bharathi
Department of Computer Science and Engineering
SRM Institute of Science and Technology, Vadapalani Campus
Chennai, India
[email protected]

978-1-6654-6643-1/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICIPTM54933.2022.9753879

Abstract—The rise in the volume of data, as well as the type of data and the rate at which data is produced, has led to the development of novel processing methods capable of working with such massive amounts of data, termed Big Data. The most difficult aspects of managing Big Data include its collection and storage, as well as search, sharing, analysis, and visualization. This paper explains the characteristics, processing, and applications of big data and its challenges. It also explains how big data supports the Internet of Things and cloud computing; combining big data, IoT, and the cloud provides a new architecture.

Keywords— Big Data (BD), Hadoop, Spark, data processing.

I. INTRODUCTION

With the spike in the use of the internet, social media, mobile phones, and IoT devices, the volume of generated data is growing exponentially. According to Statista, the total quantity of data consumed globally is expected to reach 64.2 zettabytes in 2020, 79 zettabytes in 2021, and over 180 zettabytes in 2025 [1]. In fact, data is rising at such a rapid rate that, if current trends continue, the data volume will soon exceed the yottabyte scale. Handling a high volume of data is always a challenging issue for any organization, and the standard database management system is incapable of storing and analyzing large amounts of complicated data. Cox and Ellsworth coined the term "big data": a huge volume and variety of data that increases exponentially with time. This type of data, collected from different origins such as Facebook, Twitter, Android mobile phones, various sensors, laboratory test reports, clinical notes from hospitals, demographics data, and a variety of omics data, can be classified into three groups based on how it is organized: structured, semi-structured, and unstructured, as shown in Fig. 1. Structured data maintains a proper structure, like tables consisting of rows and columns. Semi-structured data is partially organized; it is a bridge between structured and unstructured data, with examples such as JSON, CSV, and XML. Unstructured data is not organized in a predefined schema; examples are image files, audio files, log files, and video files.

Figure 1. Types of BD

The most common characteristics that define BD are volume, velocity, and variety, commonly called the 3 V's of BD; these were later extended to the 5 V's: volume, velocity, variety, value, and veracity [16]. Fig. 2 shows the 5 V's of BD.

Figure 2. 5 V's of BD

1. Volume: Big data itself contains the meaning of volume. It defines the size of the data. Nowadays the amount of created data is measured in petabytes. From this huge volume of data, we can find value and hidden patterns.

2. Velocity: The rate at which data is generated and processed to fulfil needs and problems. Data is being generated at an alarming rate.

3. Variety: The type or format of the data, which may be structured, semi-structured, or unstructured, collected from different sources (emails, PDFs, photos, videos, audio). Variety is one of the most important characteristics of BD.

4. Value: The valuable information that can be extracted from raw data. Organizations are starting to generate remarkable value from their BD.

5. Veracity: Defines the inconsistency of data, i.e., the quality level of the captured data, which can differ greatly in noise, inconsistency, and bias [16]. Accurate analysis depends on the veracity of the source data.

The remaining paper is organized as follows. Section 2 describes big data processing. The role of big data in IoT and the cloud and its applications are analyzed in Section 3.

Section 4 presents the experimental results, and the research challenges are explained in Section 5.

II. BIG DATA PROCESS

BD is a collection of large amounts of data, so BD tools are used to process this high volume of data with low computational time. BD frameworks such as Hadoop and Spark are explained in this section: Hadoop supports batch processing, while Spark supports both batch and stream processing.

A. Hadoop

Hadoop is a big data processing framework that permits users to store and process data sets that are very big in size (gigabytes to petabytes) by allowing a group of computers, or nodes, to solve huge and complex data problems. It is a very flexible and low-cost solution for storing, analyzing, and processing ordered, semi-ordered, and unordered data. It is highly scalable because it uses commodity hardware, and it achieves fault tolerance through the replication factor. The Hadoop replication factor is 3, but it is changeable: it can be lowered to 2 or increased beyond 3. Hadoop provides advanced analytics for stored data; generally, there are four types of data analytics, namely descriptive, diagnostic, predictive, and prescriptive analytics. Hadoop also supports data mining and machine learning (ML). The initial step in Hadoop data processing is to divide a vast volume of data into smaller tasks. These small jobs are distributed across a Hadoop cluster and completed in parallel using the MapReduce algorithm; MapReduce is responsible for the parallel processing that reduces the processing time for a large volume of data [11].
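The paper does not reproduce any MapReduce code, so the following is only a minimal Python sketch of the split-map-shuffle-reduce idea described above; the sample sentences are invented, and a real job would run the same mapper and reducer logic in parallel over HDFS blocks under the Hadoop framework rather than in a single process.

# Minimal single-process simulation of the MapReduce word-count flow:
# map -> shuffle (group by key) -> reduce.
from collections import defaultdict

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in one line of an input split.
    for word in line.strip().lower().split():
        yield word, 1

def reduce_phase(word, counts):
    # Reducer: sum all counts received for one key.
    return word, sum(counts)

lines = ["big data needs big tools",            # invented sample input
         "spark and hadoop process big data"]

groups = defaultdict(list)                      # shuffle: group pairs by key
for line in lines:
    for word, one in map_phase(line):
        groups[word].append(one)

result = dict(reduce_phase(w, c) for w, c in groups.items())
print(result)                                   # {'big': 3, 'data': 2, ...}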
B. Components of Hadoop

The term Hadoop is often used to refer to both the core components of Hadoop and the ecosystem of related projects. The core components of Hadoop are the Hadoop Distributed File System (HDFS), MapReduce, YARN, and Hadoop Common. Hadoop Common is an essential part of the Apache Hadoop framework that refers to the collection of common utilities and libraries supporting the other Hadoop modules in the ecosystem [13]. Large amounts of data are stored in HDFS, which handles large data sets running on commodity hardware (CH). CH is low-specification, industry-grade hardware; it scales a single Hadoop cluster to hundreds or even thousands of nodes, and BD supports horizontal scaling. The next component is MapReduce, which has two parts, the mapper and the reducer; it is the processing unit of Hadoop and a core component of the Hadoop framework. MapReduce processes data by splitting a huge volume of data into smaller units and processing them simultaneously. MapReduce was the only way to access the data stored in HDFS; other components of the system include Hive and Pig. Hive provides SQL-like querying and strong statistical functions for users, while Pig is popular for its multi-query approach, which reduces the number of times that the data is scanned. The last component is YARN, which is short for Yet Another Resource Negotiator. YARN is a very important component because it prepares the RAM and CPU for Hadoop to run over the data stored in HDFS.

C. Spark

Spark is a BD processing engine and also a cluster computing engine that is designed for fast execution and in-memory computation and supports different workloads. Spark manages batch processing, interactive querying, streaming, and iterative computations [9]. The Apache Spark system includes the Spark core and its libraries, namely Spark MLlib, GraphX, Spark Streaming, and Spark SQL. Machine learning is performed by Spark's MLlib, graph analysis is handled by GraphX, stream processing is done by Spark Streaming [17], and structured data processing is addressed by Spark SQL. The performance of Spark can be increased by resource management [18].
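The paper gives no code for Spark itself; as an illustration only, a word count written against the Spark core with in-memory RDD operations might look like the short PySpark sketch below, where the local master URL and the file name input.txt are assumptions rather than details taken from the paper.

from pyspark.sql import SparkSession

# Local session; on a cluster the master would point to YARN or a standalone master.
spark = SparkSession.builder.master("local[*]").appName("wordcount").getOrCreate()
sc = spark.sparkContext

counts = (sc.textFile("input.txt")                   # hypothetical input file
            .flatMap(lambda line: line.split())      # split lines into words
            .map(lambda word: (word, 1))             # emit (word, 1) pairs
            .reduceByKey(lambda a, b: a + b))        # sum counts per word
counts.cache()                                       # keep the result in memory for reuse
print(counts.take(10))
spark.stop()

Spark SQL, MLlib, GraphX, and Spark Streaming are built on this same core engine, which is what lets one platform cover the batch, interactive, streaming, and iterative workloads listed above.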

III. APPLICATION OF BIG DATA IN IOT AND CLOUD

IoT sensor devices continuously generate data based on the application, so big data plays the major role in collecting, processing, and analyzing the large volume of data collected from IoT devices. The cloud is the internet of services: it is used for storage, as a platform for services that can be used by an organization or an individual, and to provide software services. IoT and the cloud, together with big data, play an important role in many applications. Table 1 summarizes recent applications of BD.

The authors described a new data storage framework that maintains high consistency. BD is a vast volume of information with a complicated structure, and traditional database management solutions are incapable of handling such a large amount of data. The rapidly growing number of BD applications requires an efficient database architecture to store and manage critical data and to support scalability and availability for data analytics. Parallel and distributed data processing needs to support consistency, availability, and partition tolerance, which play an important role in data processing. Like the ACID properties in relational database systems, the CAP theorem explains consistency, availability, and partition tolerance in BD management systems. To meet business needs, many NoSQL systems aim to achieve high consistency under the CAP theorem. The proposed system provides strong consistency through a Scalable Distributed Two-Layer Data Store (SD2DS) [1]. The work is divided into two parts: the first analyses all consistency issues, and the second designs a scheduling algorithm for supporting strong consistency. In that paper, high consistency is demonstrated for all basic operations.

Diabetes is a widespread disease, so the authors analyze diabetes datasets to predict optimal results for diabetes patients. Hadoop MapReduce with data mining algorithms, namely an attention network, Decision Tree (DT), outlier-based multiclass classification, and association rule techniques, was used to derive solutions for the patient. MapReduce is the processing engine for BD: the BD set is divided into manageable data sets with different attributes and then passed to the Decision Tree, the Apriori algorithm, and outlier-based multiclass classification. Using these algorithms, diabetes patients are classified and their insulin level is determined [2].

Smart meters are used to accurately read the energy consumption level of smart buildings and smart industry, and they play an important part in growing energy management systems. Velocity is one of the characteristics of big data: the sensors in smart meters generate data at an alarming rate, which defines the velocity. The authors designed a BD system that reads one million smart meters' data in near real time. The proposed system consists of four modules: data modelling, BD storage, a big data processing and querying block, and data visualization. The voluminous data is captured and modelled by an optimized query design, then stored in the BD storage, HDFS, which consists of master and slave nodes; the master nodes contain metadata, and the slave nodes are the actual workers. Data is fetched from the storage area and an Auto Regressive Integrated Moving Average (ARIMA) [3] model is applied for the analytics. Finally, the data visualization stage displays the energy consumption level on a device, which the house owner uses to manage devices for a lower energy bill.

With the increasing amount of data, storing BD in a single data centre is no longer feasible, and storing and accessing data is a challenging issue in BD. The authors store and analyze BD in multiple data centres located in different geographical areas. The Hadoop and Spark frameworks are designed to process data locally within the same data centre, so they must replicate all data to a single data centre before performing any operation in a locally distributed computation. Because of bandwidth constraints, communication costs, data privacy, and security, copying all data from several data centres to one data centre can become a bottleneck. The implemented Random Sample Partition (RSP) method divides the large data into packets of sample data blocks and distributes these data blocks across different data centres with or without duplication; the important data are replicated to increase availability [4].
TABLE I. RECENT APPLICATIONS IN BIG DATA

S. No. | Authors | Findings | Application Area | Year | Algorithm / Method
1 | Krechowicz, Adam; Deniziak, Stanisław; Łukawski, Grzegorz [1] | Provide strong consistency by using a novel data storage. | Data Storage | 2021 | SD2DS
2 | Jayasri, N.P.; Aruna, R. [2] | Evaluate diabetes patients to discover the optimal solution. | Healthcare | 2021 | DT, AA & outlier-based multiclass classification
3 | Gupta, Ragini, et al. [3] | Explained an efficient and accurate representation of energy consumption levels collected from smart meters. | Energy Management | 2020 | ARIMA
4 | Emara, T. Z.; Huang, J. Z. [4] | Analyzing big data in distributed data centres. | Data Storage | 2020 | RSP
5 | Shen, J.; Zhou, T.; Chen, L. [5] | Users' historical activity data and interests are used to recommend appropriate products to customers. | Recommendation System | 2020 | Collaborative filtering
6 | Silahtaroglu, Gokhan; Yılmazturk, Nevin [6] | Described a machine learning pre-diagnosis model for emergency departments. | Healthcare | 2019 | PNN
7 | Alam, Mahboob; Amjad, Mohd [7] | Forecast future weather conditions on available BD. | Science and Technology | 2019 | Parallel and distributed analytics
8 | Zhou, Shihao; Qiao, Zhilei; Du, Qianzhou; et al. [8] | BD text analytics to measure customer agility from online reviews. | Business | 2018 | SVD

In this work, the authors implement an efficient movie recommendation system in a BD environment. Recommendation systems are designed to recommend things to the user based on their interests; organizations like Netflix, Amazon, and others use recommendation frameworks to help their clients identify the right item, and such systems have to deal with a large volume of data. The proposed work implements the collaborative filtering algorithm to recommend appropriate products to customers [5].
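The cited implementation is not reproduced in the paper; purely as a generic sketch of collaborative filtering on Spark, MLlib's ALS recommender can be trained on (user, item, rating) triples, shown here with a tiny invented dataset.

from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("cf-sketch").getOrCreate()

# Invented (userId, movieId, rating) triples standing in for real activity data.
ratings = spark.createDataFrame(
    [(0, 10, 4.0), (0, 11, 3.0), (1, 10, 5.0), (1, 12, 2.0), (2, 11, 4.5)],
    ["userId", "movieId", "rating"])

als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating",
          rank=5, maxIter=10, coldStartStrategy="drop")
model = als.fit(ratings)

# Top-3 item recommendations for every user.
model.recommendForAllUsers(3).show(truncate=False)
spark.stop()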
Ambulance services are the first service provided by the emergency department in hospitals. One of the most important aspects of an ambulance service is determining the severity of the situation; another important analysis is finding the closest hospital or clinic that is least crowded, so the patient can be admitted immediately. For emergency departments, the authors explained a machine learning pre-diagnosis system that uses Random Forest, Decision Tree, and Probabilistic Neural Network (PNN) models based on DDA (Dynamic Decay Adjustment); the models are trained separately on the dataset. This model understands the emergency level of the case and finds the most suitable and available health care [6].

To forecast future weather conditions in cloud computing, the authors designed an architecture for parallel and distributed big data analysis. Based on available data, weather forecasting is used to predict the atmosphere for a certain geographical area and period. The suggested system uses Hadoop in conjunction with the MapReduce engine to process large amounts of data. The final forecast contains maximum and minimum temperatures as well as rainfall for any future date [7].

Through big data analytics technology, the huge amount of data created by internet users can be used to develop new products with significant strategic value. The suggested model uses a semantic keyword similarity method based on Singular Value Decomposition (SVD) to examine the connection between review volume, customer agility, and product performance, which is achieved in two stages. The first stage examines how the volume of online reviews promotes consumer agility, while the subsequent stage explores the connection between customer agility and product performance [8].

IV. EXPERIMENTAL SETUP & RESULT

Spark 3.1.2 was installed on a standalone node with 2 cores and 8 GB of RAM. In this study, we ran the word count program over varied data sizes with different numbers of cores to analyse the speedup and processing time. Figs. 3, 4, and 5 show that the processing time decreases as the number of cores is increased for a given document size.

Figure 3. Big Data Processing Time
Figure 4. Spark Processing Time with Two Cores
Figure 5. Spark Processing Time with Single Core
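The exact program used for the measurements is not listed in the paper; a run of the kind described (a Spark word count on a standalone node, repeated with different core counts) could be scripted roughly as below, where the input path and the timing loop are illustrative assumptions.

import time
from pyspark.sql import SparkSession

def word_count_time(path, cores):
    # Restrict Spark to a given number of local cores, mirroring the runs
    # with one and two cores reported in Figs. 3-5.
    spark = (SparkSession.builder
             .master(f"local[{cores}]")
             .appName(f"wordcount-{cores}-cores")
             .getOrCreate())
    start = time.perf_counter()
    counts = (spark.sparkContext.textFile(path)
              .flatMap(lambda line: line.split())
              .map(lambda w: (w, 1))
              .reduceByKey(lambda a, b: a + b))
    counts.count()                       # force execution of the lazy pipeline
    elapsed = time.perf_counter() - start
    spark.stop()
    return elapsed

for cores in (1, 2):                     # the experiment varied the core count
    print(cores, "core(s):", word_count_time("input.txt", cores), "seconds")

For a fixed document size, the measured time is expected to fall as cores are added, which is the trend the figures report.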
V. RESEARCH CHALLENGES

Big data research provides significant benefits to organizations, industries, and individuals in making better decisions, yet numerous issues remain to be solved. Addressing some BD research problems requires assistance from BD research groups, governments, and organizations. For most researchers, the characteristics of big data themselves pose a major challenge.

1. Volume

Volume is one of the challenging issues in BD. With increased internet usage and IoT devices, the volume of generated data grows day by day, but the percentage of data used for analysis is small. As the data flowing into a company increases, the percentage of data that can be processed decreases, and the challenge becomes evident. According to Statista, the total quantity of data consumed globally is expected to reach 64.2 zettabytes in 2020, 79 zettabytes in 2021, and over 180 zettabytes in 2025.

Figure 6. Data growth Rate

2. Variety

Handling different kinds of data, including structured, semi-structured, and unstructured data, is another issue in BD. Data locality is also a challenge in big data.

3. Velocity

The problem of data streaming falls under the velocity characteristic of BD. Data availability and concept drift are two BD challenges that must be addressed while data is in motion.
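Spark's streaming support, mentioned in Section II, targets exactly this data-in-motion case. Purely as an illustration (not taken from the paper), a Structured Streaming word count over a local socket source could be sketched as follows:

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("streaming-wordcount").getOrCreate()

# Hypothetical unbounded source: text lines arriving on localhost:9999.
lines = (spark.readStream.format("socket")
         .option("host", "localhost").option("port", 9999).load())

# Split each arriving line into words and keep a running count per word.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Continuously print the updated counts as new data flows in.
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()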
4. Veracity

Uncertainties, falsehood, and missing values are all examples of veracity issues. The quality of the data influences the quality of the research findings, and many experts believe that the largest difficulty in BD is truthfulness.

VI. CONCLUSION

The foundational concepts of BD are presented in this work, including BD characteristics, types of BD, BD processing, big data applications, and big data difficulties. Data analysis using the big data processing engine Spark shows that Spark is a better execution engine with high scalability, and that processing time decreases as the number of cores in Apache Spark is increased.

REFERENCES

[1] Krechowicz, Adam, Stanisław Deniziak, and Grzegorz Łukawski. "Highly Scalable Distributed Architecture for NoSQL Datastore Supporting Strong Consistency." IEEE Access 9 (2021): 69027-69043.
[2] Jayasri, N.P., and R. Aruna. "Big data analytics in health care by data mining and classification techniques." ICT Express, 2021.
[3] T. Z. Emara and J. Z. Huang. "Distributed Data Strategies to Support Large-Scale Data Analysis Across Geo-Distributed Data Centers." IEEE Access, vol. 8, pp. 178526-178538, 2020, doi: 10.1109/ACCESS.2020.3027675.
[4] R. Gupta, A. R. Al-Ali, I. A. Zualkernan, and S. K. Das. "Big Data Energy Management, Analytics and Visualization for Residential Areas." IEEE Access, vol. 8, pp. 156153-156164, 2020, doi: 10.1109/ACCESS.2020.3019331.
[5] Shen, J., Zhou, T., and Chen, L. "Collaborative filtering-based recommendation system for big data." International Journal of Computational Science and Engineering, vol. 21, p. 219, 2020, doi: 10.1504/ijcse.2020.105727.
[6] Gokhan Silahtaroglu and Nevin Yılmazturk. "Data analysis in health and big data: A machine learning medical diagnosis model based on patients' complaints." Communications in Statistics - Theory and Methods, 2019, doi: 10.1080/03610926.2019.1622728.
[7] Mahboob Alam and Mohd Amjad. "Weather forecasting using parallel and distributed analytics approaches on big data clouds." Journal of Statistics and Management Systems, vol. 22, no. 4, pp. 791-799, 2019, doi: 10.1080/09720510.2019.1609559.
[8] Shihao Zhou, Zhilei Qiao, Qianzhou Du, G. Alan Wang, Weiguo Fan, and Xiangbin Yan. "Measuring Customer Agility from Online Reviews Using Big Data Text Analytics." Journal of Management Information Systems, vol. 35, no. 2, pp. 510-539, 2018, doi: 10.1080/07421222.2018.1451956.
[9] Stoica, Ion. "Trends and challenges in big data processing." Proceedings of the VLDB Endowment 9.13 (2016): 1619.
[10] https://www.ibm.com/cloud/blog/hadoop-vs-spark
[11] Thakur, Bhupender Singh, and Kishori Lal Bansal. "Performance Evaluation of Apache Hadoop, Apache Spark, and Apache Flink." Advances in Management, Social Sciences and Technology: 93.
[12] Verma, Ankush, Ashik Hussain Mansuri, and Neelesh Jain. "Big data management processing with Hadoop MapReduce and Spark technology: A comparison." 2016 Symposium on Colossal Data Analysis and Networking (CDAN), IEEE, 2016.
[13] Lee, Jinbae, Bobae Kim, and Jong-Moon Chung. "Time estimation and resource minimization scheme for Apache Spark and Hadoop big data systems with failures." IEEE Access 7 (2019): 9658-9666.
[14] Nasser, T., and R. S. Tariq. "Big data challenges." J Comput Eng Inf Technol 4:3 (2015), doi: 10.4172/2324-9307.2.
[15] Salloum, S., Dautov, R., Chen, X., et al. "Big data analytics on Apache Spark." Int J Data Sci Anal 1, 145-164 (2016), https://doi.org/10.1007/s41060-016-0027-9.
[16] Hajjaji, Yosra, et al. "Big data and IoT-based applications in smart environments: A systematic review." Computer Science Review 39 (2021): 100318.
[17] Oussous, Ahmed, et al. "Big Data technologies: A survey." Journal of King Saud University - Computer and Information Sciences 30.4 (2018): 431-448.
[18] Aziz, Khadija, Dounia Zaidouni, and Mostafa Bellafkih. "Leveraging resource management for efficient performance of Apache Spark." Journal of Big Data 6.1 (2019): 1-23.
