
BIG DATA MINING AND ANALYTICS
ISSN 2096-0654, 03/06, pp. 32-40
Volume 5, Number 1, March 2022
DOI: 10.26599/BDMA.2021.9020016

Big Data with Cloud Computing: Discussions and Challenges

Amanpreet Kaur Sandhu
Abstract: With the recent advancements in computer technologies, the amount of data available is increasing
day by day. However, excessive amounts of data create great challenges for users. Meanwhile, cloud computing
services provide a powerful environment to store large volumes of data. They eliminate various requirements,
such as dedicated space and maintenance of expensive computer hardware and software. Handling big data is a
time-consuming task that requires large computational clusters to ensure successful data storage and processing.
In this work, the definition, classification, and characteristics of big data are discussed, along with various cloud services, such as Microsoft Azure, Google Cloud, Amazon Web Services, International Business Machines (IBM) Cloud, Hortonworks, and MapR. A comparative analysis of various cloud-based big data frameworks is also performed.
Various research challenges are defined in terms of distributed database storage, data security, heterogeneity, and
data visualization.

Key words: big data; data analysis; cloud computing; Hadoop

Amanpreet Kaur Sandhu is with University Institute of Computing, Chandigarh University, Mohali 140413, India. E-mail: [email protected]. To whom correspondence should be addressed.
Manuscript received: 2021-06-11; revised: 2021-09-12; accepted: 2021-09-13.
© The author(s) 2022. The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

1 Introduction

With recent technological advancements, the amount of data available is increasing day by day. For example, sensor networks and social networking sites generate overwhelming flows of data. In other words, big data are produced from multiple sources in different formats at very high speeds[1]. At present, big data represent an important research area. Big data are rapidly produced and are thus difficult to store, process, or manage using traditional software. Big data technologies are tools that are capable of storing meaningful information in different types of formats. To meet users' requirements and to analyze and store complex data, a number of analytical frameworks have been made available to aid users in analyzing complex structured and unstructured data[2]. Several programs, models, technologies, hardware, and software have been proposed and designed to access the information in big data. The main objective of these technologies is to store reliable and accurate results for big data[3]. In addition, big data require state-of-the-art technology to efficiently store and process large amounts of data within a limited run time.

Three different types of big data platforms are interactive analysis tools, stream processing tools, and batch processing tools[4]. Interactive analysis tools are used to process data in interactive environments and interact with real-time data; Apache Drill and Google's Dremel are frameworks for storing and querying real-time data. Stream processing tools are used to store information in continuous flows[5]; the main platforms for storing streaming information are S4 and Storm. The Hadoop infrastructure is utilized to store information in batches. Big data techniques are involved in various disciplines, such as signal processing, statistics, visualization, social network analysis, neural networks, and data mining[6]. Mohajer et al.[7] designed an interactive gradient algorithm that receives controlled messages from neighboring nodes; the proposed method uses a self-optimization framework for big data.
2 Definitions of Big Data

Big data are huge in size and are difficult to manage and analyze relative to traditional data. Storing big data requires scalable architecture and efficient storage and manipulation. Table 1 presents the existing definitions of big data.

Table 1 Definitions of big data.
Reference | Author's name | Definition
[8] | Batty | Big data are massive in size and cannot fit into Excel spreadsheets comprising approximately 16 000 columns and 1 million rows.
[9] | Havens et al. | Big data cannot be loaded into local storage devices (computer memory).
[10] | Fisher et al. | Big data cannot be easily processed and managed in a straightforward manner.
[11] | The State Council of the People's Republic of China | Big data have several characteristics, such as high application value, fast access speed, large volume, and multiple types.
[12] | Beyer and Laney | Big data have large volume, variety, and velocity that demand cost effectiveness and are helpful in decision making.

2.1 Characteristics of big data

Big data are characterized by three Vs: volume, velocity, and variety. These characteristics were introduced by Gartner to define the various challenges in big data[12]. With new-generation architecture, data are now stored in different types of formats; hence, the three Vs may be extended to five Vs, namely, volume, velocity, variety, value, and veracity[13]. Figure 1 shows the five Vs of big data.

Fig. 1 Five Vs of big data.

(1) Volume: Data are generated by multiple sources (sensors, social networks, smartphones, etc.) and are continuously expanding. The Internet produces global data in large increments. In 2012, approximately 2.5 exabytes (EB) of data were produced every day. According to a report of the International Data Corporation, the volume of data doubled by 2013, reaching 4.4 zettabytes (ZB), and in 2020 the volume of data reached 40 ZB. Table 2 shows the names of the units of data that can be measured in bytes[14].

Table 2 Units of data.
Name of unit | Equals | Size in bytes
Bit | 1 or 0 | 1/8
Nibble | 4 bits | 1/2
Byte | 8 bits | 1
Kilobyte (KB) | 1024 bytes | 2^10
Megabyte (MB) | 1024 KB | 2^20
Gigabyte (GB) | 1024 MB | 2^30
Terabyte (TB) | 1024 GB | 2^40
Petabyte (PB) | 1024 TB | 2^50
Exabyte (EB) | 1024 PB | 2^60
Zettabyte (ZB) | 1024 EB | 2^70
Yottabyte (YB) | 1024 ZB | 2^80

(2) Velocity: Data are growing exponentially at high speeds. Millions of connected devices are added on a daily basis, thereby leading to increases in not only volume but also velocity[15, 16]. One relevant example is YouTube, which generates big data at high speeds[17, 18]. Table 3 presents the number of users in India who had used social media networks by February 2021.

Table 3 Users in India as of February 2021.
Application name | Count
WhatsApp | 53 Crore
YouTube | 44.8 Crore
Facebook | 41 Crore
Instagram | 21 Crore
Twitter | 1.75 Crore

(3) Variety: Data are generated in multiple formats via social networks, smartphones, or sensors. These tools produce data in the form of data logs, images, videos, audio, documents, and text. Data may also be structured, semistructured, or unstructured[19].
(4) Value: Value is an important characteristic of big data. It relates to how data can be dealt with and converted into meaningful information[20].
(5) Veracity: Veracity refers to the quality, correctness, and trustworthiness of data. Therefore, maintaining veracity in data is mandatory[21, 22]. For example, data in huge amounts create confusion, whereas small amounts of data can convey incomplete or partial information.
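As a small worked example of the unit arithmetic in Table 2, the helper below (an illustrative sketch, not part of any surveyed framework) converts a raw byte count into the binary units listed in the table, each step being a factor of 1024.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes: float) -> str:
    """Convert a byte count to the largest binary unit from Table 2 (factor 1024)."""
    value, unit = float(num_bytes), UNITS[0]
    for next_unit in UNITS[1:]:
        if value < 1024:
            break
        value /= 1024
        unit = next_unit
    return f"{value:.2f} {unit}"

if __name__ == "__main__":
    print(human_readable(2.5 * 1024**6))  # the ~2.5 EB/day figure cited for 2012
    print(human_readable(40 * 1024**7))   # the ~40 ZB figure cited for 2020
```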

2.2 Types of big data

Data are produced at unprecedented rates from various sources, such as financial, government, health, and social networks. Such rapid growth of data can be attributed to smart devices, the Internet of Things, etc. In the last decades, companies have failed to store data efficiently and for long periods[23, 24]. This drawback relates to traditional technologies that lack adequate storage capacity and are costly. Meanwhile, big data require new storage methods backed by powerful technologies[25, 26]. Big data can be classified into several categories. Figure 2 depicts the classification of big data, and Table 4 summarizes the definitions of the various types of big data.

Fig. 2 Types of big data.

3 Big Data with Machine Learning

The main function of machine learning techniques is to discover knowledge and make intelligent decisions. Machine learning is used in various real-world applications, such as data mining, recognition systems, recommendation engines, and autonomous control systems. The machine learning domain can be divided into three areas, namely, supervised learning, unsupervised learning, and reinforcement learning[35].

3.1 Data streaming learning

Various real-world technologies, such as stock management, network traffic, and credit card transactions, generate huge datasets. Data mining plays an important role in finding interesting patterns and fetching values from hidden streams and datasets. Traditional data mining techniques are related to clustering, association rule mining, accuracy, scalability, and classification, whereas big data are related to dynamic environments[36, 37].

3.2 Deep learning

Deep learning is another important aspect of machine learning and pattern recognition. It allows predictive analysis and involves natural language processing, speech recognition, and computer vision. Deep learning is applied to resolve issues in data analysis and to help extract complex patterns from huge amounts of data. It is also called hierarchical learning because it extracts information from complex datasets at different levels. It is very helpful in the analysis of large volumes of data, information retrieval, data tagging, and discrimination tasks (e.g., prediction and classification)[38].

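As a small illustration of one of the unsupervised techniques named in Section 3 (clustering), the sketch below groups a few two-dimensional points with k-means. scikit-learn is an assumed dependency here, and the toy data are hypothetical; the example only shows the shape of such an analysis, not any method from this survey.

```python
from sklearn.cluster import KMeans  # assumed dependency: scikit-learn

# Hypothetical 2-D feature vectors (e.g., per-customer transaction statistics).
points = [
    [1.0, 1.2], [0.8, 1.0], [1.1, 0.9],   # roughly one group
    [8.0, 8.5], [7.9, 8.1], [8.3, 7.8],   # roughly another group
]

# Fit k-means with two clusters; n_init controls the number of random restarts.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

print(model.labels_)           # cluster index assigned to each point
print(model.cluster_centers_)  # coordinates of the two cluster centers
```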

Table 4 Types of big data.
Type | Category | Explanation
Data source | Social media | Social media represents an important aspect of big data. Facebook, Twitter, emails, and microblogs are social media sources that generate massive amounts of data daily[27].
Data source | Machine-generated data | Software and hardware, such as medical devices, computers, and other types of machines, that generate data without human interference.
Data source | Sensing | Various types of sensing devices that generate data and convert them into signals[28].
Data source | Transaction | Financial, business, and work data that carry time-based dimensions defining the data.
Data source | IoT | Tablets, smartphones, and digital camera devices are connected over the Internet and thus generate huge amounts of data and information.
Content format | Structured data | Structured data are in a consistent order with a well-defined format. Their advantage is that they are easy to maintain, access, and store on computers. Structured data are stored in the form of rows and columns; an example is a DataBase Management System (DBMS)[29].
Content format | Semi-structured data | Semi-structured data can be considered another form of structured data. They inherit a few properties of structured data but do not represent the data in database models. An example is Comma Separated Value (CSV) files[30].
Content format | Unstructured data | Unstructured data do not follow the formal structure rules of data models. Images, videos, text messages, and social media posts are examples of unstructured data.
Data store | Key value stores | Key value stores are used to store and access data in key/value pairs. They are basically designed to store massive data and manage heavy loads. Apache HBase, Apache Cassandra, Redis, and Riak are examples of key value store databases[31].
Data store | Graph stores | Graph stores are used to analyze data on the basis of the relationships between nodes, edges, and properties. Neo4j is an example of a graph store.
Data store | Column family stores | Column family stores keep data and information within a column of a table at the same location on a disk, in the same way a row store keeps row data together. Google Bigtable is an example of a column family store.
Data store | Document-oriented stores | Document-oriented stores offer complex data forms in multiple formats, such as XML, JSON, text, string, array, or binary forms. CouchDB and MongoDB are examples of document-oriented stores[32].
Data staging | Cleaning | Cleaning is a process in which noisy data, outliers, and missing values are removed.
Data staging | Transformation | In data transformation, data are transformed into an appropriate format for analysis.
Data staging | Normalization | Normalization is a process used to reduce redundancies in data[33].
Data processing | Batch data processing | MapReduce-based systems are used to process data in the form of batches. Apache Hadoop, Apache Mahout, Skytree Server, and Dryad are examples of batch processing.
Data processing | Real-time data processing | Streaming systems, such as S4, are based on distributed frameworks that allow users to design applications for processing continuous unbounded streams of data[34].
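The data-store categories in Table 4 differ mainly in how records are addressed. The pure-Python sketch below mimics the two most common access patterns, key/value lookup and document-oriented querying, without using any particular database (HBase, Redis, CouchDB, MongoDB, etc.); the record contents and field names are hypothetical.

```python
# Key/value pattern: every record is addressed by a single opaque key.
kv_store = {}
kv_store["user:42"] = "Amanpreet"       # put
print(kv_store.get("user:42"))          # get -> "Amanpreet"

# Document-oriented pattern: records are self-describing documents (e.g., JSON)
# that can be filtered on any field, not only the primary key.
documents = [
    {"_id": 1, "name": "sensor-a", "type": "temperature", "value": 21.5},
    {"_id": 2, "name": "sensor-b", "type": "humidity", "value": 40.0},
]

def find(collection, **criteria):
    """Return documents whose fields match all given criteria."""
    return [d for d in collection if all(d.get(k) == v for k, v in criteria.items())]

print(find(documents, type="temperature"))  # query by a non-key field
```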

4 Cloud Computing

Cloud computing offers a cost-efficient and scalable solution to store big data. According to the National Institute of Standards and Technology, "Cloud computing is based on pay-per-use services for enabling convenient, on-demand network access to a shared pool of configurable computing resources such as servers, networks, and services that can be rapidly provisioned and released with minimal management effort or service provider interaction". Cloud computing services can be classified into the following three categories[39]:
(1) Infrastructure as a Service (IaaS): These services are based on the principle of "pay for what you need" and provide high-performance computing to customers. Amazon Web Services (AWS), Elastic Compute Cloud, and Simple Storage Service (S3) are examples of IaaS; AWS and S3 provide online storage services. At nominal charges, customers can easily access the world's largest data centers. At present, three companies provide IaaS landscape services: Google, Microsoft, and HP. Google provides Google Compute Engine to access IaaS services, Microsoft provides a cloud platform through its Windows Azure Platform, and HP offers HP Cloud, which was designed by NASA and Rackspace.
(2) Software as a Service (SaaS): In SaaS, all applications run on remote cloud infrastructure and are accessed over the Internet. To access SaaS services, users need only an Internet connection and a web browser, such as Google Chrome or Internet Explorer[40]. Users connect to a desktop environment via a virtual machine in which all software programs are installed. SaaS provides more facilities to users than IaaS.
(3) Platform as a Service (PaaS): PaaS provides a runtime environment to users and allows them to create, test, and run web applications. Users can easily access PaaS on a pay-per-use basis using an Internet connection. PaaS provides the infrastructure (networking, storage, and services) and platform (DBMS, business intelligence, middleware) for running a web application life cycle. Examples of PaaS include Microsoft Azure and Google Cloud[41].
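As an illustration of consuming IaaS-style storage programmatically, the sketch below uploads and lists objects in an S3 bucket using boto3. It assumes AWS credentials are already configured in the environment and that a bucket with the hypothetical name "example-bigdata-bucket" exists; the file name is likewise illustrative.

```python
import boto3  # assumed dependency: the AWS SDK for Python

BUCKET = "example-bigdata-bucket"  # hypothetical bucket name

s3 = boto3.client("s3")  # credentials are read from the environment/config files

# Upload a local file as an object in the bucket.
s3.upload_file("sensor_log.csv", BUCKET, "raw/sensor_log.csv")

# List the objects stored under the "raw/" prefix.
response = s3.list_objects_v2(Bucket=BUCKET, Prefix="raw/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```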

The cloud computing environment has two important aspects: the frontend and the backend. From the frontend side, users access cloud services through an Internet connection; at the backend, all cloud services are run. Figure 3 shows the various types of cloud computing services[42].

Fig. 3 Cloud computing services.

Big data and cloud computing are closely associated. With technological changes, big data models provide distributed processing, parallel technologies, large storage capacity, and real-time analysis of heterogeneous databases. Data security and privacy are also considered in big data models. Big data require large amounts of storage space and thus entail the use of cloud computing. Cloud computing offers scalability and cost savings[43]; moreover, it provides massive amounts of storage capacity and processing power.

Cloud computing works on different types of technologies, such as distributed storage and virtualization, and processes data for different types of tasks. It runs distributed queries over multiple datasets and gives responses in a timely manner. Hadoop plays an important role in the storage of distributed datasets on the cloud[44]. The Hadoop Distributed File System (HDFS) stores large amounts of data in distributed form and is the data storage management system in Hadoop. The advantage of the HDFS is that it is cost effective and capable of managing thousands of nodes in a cluster and massive amounts of unstructured data. It works as a batch processing system with high-latency operations.

Moreover, Hadoop increases system performance and avoids network congestion. The HDFS is based on a master-slave architecture and comprises various types of daemons, such as DataNode, NameNode, Secondary NameNode, Resource Manager, and Node Manager. NameNode and Resource Manager work as master nodes, while DataNode and Node Manager work as slave nodes. In addition, the HDFS achieves fault tolerance with the help of data replication on various servers. The primary NameNode is used to solve computation problems and establishes coordination with DataNodes, while the Secondary NameNode manages the availability and replication of data. The relationship between big data and cloud computing is shown in Fig. 4.

Cloud computing provides various types of facilities, such as processing, computation, and storage of big data. Moreover, the cloud computing infrastructure offers an efficient and effective platform to determine the storage requirements of big data analysis. It is also correlated with new patterns for the analysis of the various types of resources that are available in the cloud. Several cloud-based technologies have been developed to deal with big data for parallel processing.

Fig. 4 Big data and cloud computing.
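HDFS is normally driven either through its Java API or through the `hdfs dfs` command-line tool that ships with Hadoop. The sketch below wraps two common commands from Python via subprocess; it assumes a working Hadoop installation with `hdfs` on the PATH, and the local and HDFS paths used are hypothetical.

```python
import subprocess

def hdfs(*args: str) -> str:
    """Run an `hdfs dfs` subcommand and return its standard output."""
    result = subprocess.run(
        ["hdfs", "dfs", *args], capture_output=True, text=True, check=True
    )
    return result.stdout

# Create a directory in HDFS and copy a local file into it (paths are illustrative).
hdfs("-mkdir", "-p", "/user/demo/raw")
hdfs("-put", "-f", "sensor_log.csv", "/user/demo/raw/")

# List the directory; HDFS reports replication and block information per file.
print(hdfs("-ls", "/user/demo/raw"))
```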



MapReduce (MapR) is an example of big data processing in a cloud environment that allows the storage of massive amounts of data in a cluster[45]. In other words, MapR is an efficient and cost-effective model for processing big data; the MapR framework comprises the map and reduce functions for handling big data.

Cloud computing also plays an important role in distributed system environments by facilitating storage, boosting computing power, and aiding network communication. Big data technologies store data in cloud clusters rather than in local storage file systems. Several companies provide big data cloud platforms, and various cloud computing platforms are available to store big data. Table 5 shows a comparative analysis of big data cloud frameworks for storing massive amounts of data[46]. Cloud services such as Microsoft Azure, Google Cloud, AWS, IBM, Hortonworks, and MapR are compared on the basis of various parameters.

5 Research Issues in Big Data

As data grow at exponential rates, a number of issues and problems emerge during the processing and storage of big data, and few tools are available to resolve them in a cloud environment. Technologies such as Pig Latin, Dryad, MongoDB, Cassandra, and MapR are not able to resolve these issues in big data processing. Even with the help of Hadoop and MapR, users cannot execute queries on databases, and these tools provide only low-level infrastructures for data processing and management. Some issues and problems in big data are summarized as follows[47]:
(1) Distributed database storage system: Numerous technologies are used to store and retrieve huge amounts of data, and cloud computing is an important aspect of big data. Big data are generated by multiple devices on a daily basis. At present, the main issues in distributed frameworks are the storage of data in a straightforward manner and the processing and migration of data between distributed servers.
(2) Data security: Security threats are an important issue in a cloud computing environment. Cloud computing has been transformed by modern information and communication technologies, and several types of unresolved security threats exist in big data. Data security threats are magnified by the variety, velocity, and volume of big data. Various issues and threats, such as data availability, confidentiality, real-time monitoring, identity and access authorization control, integrity, and privacy, exist when big data are used with cloud computing frameworks. Therefore, data security must be measured once data are outsourced to cloud service providers[48].
(3) Heterogeneity: Big data are heterogeneous in nature because data are gathered from multiple devices in different formats, such as images, videos, audio, and text. Before loading data into a warehouse, they need to be transformed and cleaned, and these processes present challenges in big data[49]. Combining all unstructured data and reconciling them for use in report creation are incredibly difficult to achieve in real time.
(4) Data processing and cleaning: Data storage and acquisition require preprocessing and cleaning, which involve data merging, data filtering, data consistency, and data optimization. Processing and cleaning data are thus difficult because of the wide variety of data sources[50]. Moreover, data sources may contain noise and errors, or they may be incomplete. The challenge is how to clean large amounts of data and how to determine whether such data are reliable.
(5) Data visualization: Data visualization is a technique to represent complex data in a graphical form for clear understanding. If the data are structured, then they can easily be represented in a traditional graphical way. If the data are unstructured or semistructured, then they are difficult to visualize in real time because of their high diversity.

6 Conclusion

In the last decades, the size of data has grown, and it continues to increase day by day. Data are generated in different formats (variety) by multiple sources; therefore, the variety of data is also expanding. Connected mobile devices and sensor networks generate data at very high speeds (velocity). Cloud computing services are used to process, analyze, and store data without the need for dedicated space and the maintenance of expensive computer hardware and software. This study reviews the relationship between big data and cloud computing. Furthermore, a comparative analysis of cloud-based big data services is performed. Big data involve various issues and problems, such as distributed database storage, data privacy/security, and heterogeneity of data formats.
Table 5 Big data cloud frameworks.
Parameter | Microsoft Azure | Google Cloud | AWS | IBM | Hortonworks | MapR
Founding date | Oct. 2008 | Oct. 2015 | 2006 | 2011 | June 2011 | 2009
Big data analytics | Provides Azure HDInsight services | Provides Google Cloud Dataproc services | Provides Amazon Elasticsearch | Provides various IBM analysis engines | Provides Hortonworks Data Platform (HDP) | Provides MapR data analysis platform
Type of software framework | Open-source framework | Open-source framework | Open-source framework | Open-source framework | Open-source framework | Licensed
Content format | Only unstructured data | Structured, semistructured, and unstructured | Structured, semistructured, and unstructured | Only unstructured data | Structured, semistructured, and unstructured | Structured, semistructured, and unstructured
Types of OS supported | Windows Server, Ubuntu 14 | Debian 8 | Linux, Ubuntu, CentOS | CentOS, RedHat, Ubuntu | Only CentOS 7 | Linux, Ubuntu
Applications | Batch and stream processing | Machine learning, streaming, and batch processing | Real-time, log analytics, and stream analytics | Data analytics | Real-time and stream analytics | Real-time, log analytics, and stream analytics
Framework execution | Hadoop | BigQuery | Elastic MapReduce | Elastic MapReduce | HDFS, YARN, MapReduce2 | MapR
Big data storage framework | Microsoft Azure | Google Cloud Services | S3 | IBM Cloud Object Storage | Hortonworks Data Platform | MapR Data Analysis Platform
Type of storage | Distributed | Distributed | Distributed | Distributed | Centralized | Distributed
Storage content format | XML | JSON, CSV | Any | Any | Any | Any
Storage size limit | Limited | Limited | Unlimited | Unlimited | Limited | Unlimited
Metadata | Yes | Yes | Yes | Yes | No | Yes
Relational database management system | SQL Azure | Cloud SQL | PostgreSQL, Oracle, MySQL | Oracle or MySQL | SQL | SQL
Big data warehouse | SQL Azure Data Warehouse | BigQuery | Amazon Redshift | Db2 warehouse | Hive warehouse | Data Warehouse Optimization (DWO)
Warehouse content format | ORC, RC, Parquet | JSON, CSV | ORC, CSV, TSV | CSV | ORC | JSON
NoSQL (Not only SQL) database system | Stored in table format | AppEngine data store | DynamoDB | Apache Accumulo | MongoDB | MongoDB
Streaming processing | StreamInsight | Nothing prepackaged | Apache Spark | Apache Spark | Search application | Storm, Spark Streaming, and Flink
Machine learning programming interface | Prediction application | Mahout | Mahout | Hadoop | Hortonworks Data Platform | HPE Intelligent Data Platform

References

[1] I. A. T. Hashem, I. Yaqoob, N. B. Anuar, S. Mokhtar, A. Gani, and S. U. Khan, The rise of 'big data' on cloud computing: Review and open research issues, Inform. Syst., vol. 47, pp. 98–115, 2015.
[2] J. H. Yu and Z. M. Zhou, Components and development in big data system: A survey, J. Electr. Sci. Technol., vol. 17, no. 1, pp. 51–72, 2019.
[3] S. Kumar and K. K. Mohbey, A review on big data based parallel and distributed approaches of pattern mining, J. King Saud Univ. – Comput. Inform. Sci., doi: 10.1016/j.jksuci.2019.09.006.
[4] Y. N. Liu, N. Li, X. Zhu, and Y. Qi, How wide is the application of genetic big data in biomedicine, Biomed. Pharmacother., vol. 133, p. 111074, 2021.
[5] V. Subramaniyaswamy, V. Vijayakumar, R. Logesh, and V. Indragandhi, Unstructured data analysis on big data using map reduce, Procedia Comput. Sci., vol. 50, pp. 456–465, 2015.
[6] S. Maitrey and C. K. Jha, MapReduce: Simplified data analysis of big data, Procedia Comput. Sci., vol. 57, pp. 563–571, 2015.
[7] A. Mohajer, M. Barari, and H. Zarrabi, Big data based self-optimization networking: A novel approach beyond cognition, Intell. Automat. Soft Comput., doi: 10.1080/10798587.2017.1312893.
[8] M. Batty, Big data, smart cities and city planning, Dialogues in Human Geography, vol. 3, no. 3, pp. 274–279, 2013.
[9] T. C. Havens, J. C. Bezdek, C. Leckie, L. O. Hall, and M. Palaniswami, Fuzzy c-means algorithms for very large data, IEEE Trans. Fuzzy Syst., vol. 20, no. 6, pp. 1130–1146, 2012.
[10] D. Fisher, R. DeLine, M. Czerwinski, and S. Drucker, Interactions with big data analytics, Interactions, vol. 19, no. 3, pp. 50–59, 2012.
[11] The State Council of the People's Republic of China, Action plan for promoting big data development, (in Chinese), http://www.gov.cn/zhengce/content/2015-09/05/content_10137.htm, 2015.
[12] M. A. Beyer and D. Laney, The importance of 'big data': A definition, Stamford, CT, USA: Gartner, G00235055, 2012.
[13] L. Rabhi, N. Falih, A. Afraites, and B. Bouikhalene, Big data approach and its applications in various fields: Review, Procedia Comput. Sci., vol. 155, pp. 599–605, 2019.
[14] F. Ridzuan and W. M. N. Wan Zainon, A review on data cleansing methods for big data, Procedia Comput. Sci., vol. 161, pp. 731–738, 2019.
[15] D. A. Shafiq, N. Z. Jhanjhi, and A. Abdullah, Load balancing techniques in cloud computing environment: A review, J. King Saud Univ. – Comput. Inform. Sci., doi: 10.1016/j.jksuci.2021.02.007.
[16] S. Amamou, Z. Trifa, and M. Khmakhem, Data protection in cloud computing: A survey of the state-of-art, Procedia Comput. Sci., vol. 159, pp. 155–161, 2019.
[17] P. J. Sun, Security and privacy protection in cloud computing: Discussions and challenges, J. Netw. Comput. Appl., vol. 160, p. 102642, 2020.
[18] R. Nachiappan, B. Javadi, R. N. Calheiros, and K. M. Matawie, Cloud storage reliability for big data applications: A state of the art survey, J. Netw. Comput. Appl., vol. 97, pp. 35–47, 2017.
[19] A. O'Driscoll, J. Daugelaite, and R. D. Sleator, 'Big data', Hadoop and cloud computing in genomics, J. Biomed. Inform., vol. 46, no. 5, pp. 774–781, 2013.
[20] S. Karimian-Aliabadi, D. Ardagna, R. Entezari-Maleki, E. Gianniti, and A. Movaghar, Analytical composite performance models for big data applications, J. Netw. Comput. Appl., vol. 142, pp. 63–75, 2019.
[21] H. F. Yu, A priori algorithm optimization based on Spark platform under big data, Microprocess. Microsyst., vol. 80, p. 103528, 2021.
[22] M. Muniswamaiah, T. Agerwala, and C. Tappert, Big data in cloud computing review and opportunities, Int. J. Comput. Sci. Inform. Technol., vol. 11, no. 4, pp. 43–57, 2019.
[23] T. Cherian and H. Bhadkamkar, A study and survey of big data using data mining techniques, Int. J. Eng. Sci. Res. Technol., vol. 6, no. 10, pp. 169–174, 2017.
[24] A. Mehmood, I. Natgunanathan, Y. Xiang, G. Hua, and S. Guo, Protection of big data privacy, IEEE Access, vol. 4, pp. 1821–1834, 2016.
[25] S. Kumar and M. Singh, Big data analytics for healthcare industry: Impact, applications, and tools, Big Data Mining Analytics, vol. 2, no. 1, pp. 48–57, 2019.
[26] S. Kumar and M. Singh, A novel clustering technique for efficient clustering of big data in Hadoop ecosystem, Big Data Mining Analytics, vol. 2, no. 4, pp. 240–247, 2019.
[27] C. K. Leung, Y. B. Chen, S. Y. Shang, and D. Y. Deng, Big data science on COVID-19 data, in Proc. 2020 IEEE 14th Int. Conf. Big Data Science and Engineering, Guangzhou, China, 2020, pp. 14–21.
[28] M. S. Mahmud, J. Z. Huang, S. Salloum, T. Z. Emara, and K. Sadatdiynov, A survey of data partitioning and sampling methods to support big data analysis, Big Data Mining Analytics, vol. 3, no. 2, pp. 85–101, 2020.
[29] S. Aslam and M. A. Shah, Load balancing algorithms in cloud computing: A survey of modern techniques, in Proc. 2015 IEEE National Software Engineering Conference, doi: 10.1109/NSEC.2015.7396341.
[30] D. A. Shafiq, N. Z. Jhanjhi, and A. Abdullah, Load balancing techniques in cloud computing environment: A review, J. King Saud Univ. – Comput. Inform. Sci., doi: 10.1016/j.jksuci.2021.02.007.
[31] A. Oussous, F. Z. Benjelloun, A. Ait Lahcen, and S. Belfkih, Big data technologies: A survey, J. King Saud Univ. – Comput. Inform. Sci., vol. 30, no. 4, pp. 431–448, 2018.
[32] R. Misra, B. Panda, and M. Tiwary, Big data and ICT applications: A study, in Proc. 2nd Int. Conf. Information and Communication Technology for Competitive Strategies, doi: 10.1145/2905055.2905099, 2016.
[33] B. Saraladevi, N. Pazhaniraja, P. V. Paul, M. S. S. Basha, and P. Dhavachelvan, Big data and Hadoop - A study in security perspective, Procedia Comput. Sci., vol. 50, pp. 596–601, 2015.
[34] A. Fahad, N. Alshatri, Z. Tari, A. Alamri, I. Khalil, A. Y. Zomaya, S. Foufou, and A. Bouras, A survey of clustering algorithms for big data: Taxonomy and empirical analysis, IEEE Trans. Emerg. Top. Comput., vol. 2, no. 3, pp. 267–279, 2014.
[35] A. Katal, M. Wazid, and R. H. Goudar, Big data: Issues, challenges, tools and good practices, in Proc. 6th Int. Conf. Contemporary Computing, Noida, India, 2013, pp. 404–409.
[36] C. W. Tsai, C. F. Lai, H. C. Chao, and A. V. Vasilakos, Big data analytics: A survey, J. Big Data, vol. 2, no. 1, pp. 1–32, 2015.
[37] Z. Lv, H. Song, P. Basanta-Val, A. Steed, and M. Jo, Next-generation big data analytics: State of the art, challenges, and future research topics, IEEE Trans. Ind. Informatics, vol. 13, no. 4, pp. 1891–1899, 2017.
[38] K. S. Jadon, R. S. Bhadoria, and G. S. Tomar, A review on costing issues in big data analytics, in Proc. 2015 Int. Conf. Computational Intelligence and Communication Networks (CICN), Jabalpur, India, 2016, pp. 727–730.
[39] O. Y. Al-Jarrah, P. D. Yoo, S. Muhaidat, G. K. Karagiannidis, and K. Taha, Efficient machine learning for big data: A review, Big Data Res., vol. 2, no. 3, pp. 87–93, 2015.
[40] G. S. Bhathal and A. Singh, Big data: Hadoop framework vulnerabilities, security issues and attacks, Array, vols. 1&2, p. 100002, 2019.
[41] J. Hurwitz, A. Nugent, F. Halper, and M. Kaufman, Big Data for Dummies. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2013.
[42] V. P. Lalitha, M. Y. Sagar, S. Sharanappa, S. Hanji, and R. Swarup, Data security in cloud, in Proc. 2017 Int. Conf. Energy, Communication, Data Analytics and Soft Computing, Chennai, India, 2017, pp. 3604–3608.
[43] C. L. Philip Chen and C. Y. Zhang, Data-intensive applications, challenges, techniques and technologies: A survey on big data, Inform. Sci., vol. 275, pp. 314–347, 2014.
[44] S. Salloum, J. Z. Huang, and Y. He, Random sample partition: A distributed data model for big data analysis, IEEE Trans. Ind. Inform., vol. 15, no. 11, pp. 5846–5854, 2019.
[45] L. Q. Kong, Z. F. Liu, and J. G. Wu, A systematic review of big data-based urban sustainability research: State-of-the-science and future directions, J. Clean. Prod., vol. 273, p. 123142, 2020.
[46] P. Pääkkönen and D. Pakkala, Reference architecture and classification of technologies, products and services for big data systems, Big Data Res., vol. 2, no. 4, pp. 166–186, 2015.
[47] M. Wook, N. A. Hasbullah, N. M. Zainudin, Z. Z. A. Jabar, S. Ramli, N. A. M. Razali, and N. M. M. Yusop, Exploring big data traits and data quality dimensions for big data analytics application using partial least squares structural equation modelling, J. Big Data, vol. 8, no. 1, pp. 1–15, 2021.
[48] S. Saif and S. Wazir, Performance analysis of big data and cloud computing techniques: A survey, Procedia Comput. Sci., vol. 132, pp. 118–127, 2018.
[49] S. M. Shamsuddin and S. Hasan, Data science vs. big data @ UTM big data centre, in Proc. 2015 IEEE Int. Conf. Science in Information Technology, Yogyakarta, Indonesia, 2015, pp. 1–4.
[50] T. Y. Yang and Y. Zhao, Application of cloud computing in biomedicine big data analysis, in Proc. 2017 Int. Conf. Algorithms, Methodology, Models and Applications in Emerging Technologies (ICAMMAET), Chennai, India, 2017, pp. 1–3.

Amanpreet Kaur Sandhu received the PhD degree from IKG Punjab Technical University, India in 2018. She is currently an assistant professor at University Institute of Computing, Chandigarh University, Mohali, India. Her research interests include image and video processing and Big Data analytics.
