
BIG DATA MINING AND ANALYTICS
ISSN 2096-0654, 03/06, pp. 32-40
Volume 5, Number 1, March 2022
DOI: 10.26599/BDMA.2021.9020016

Big Data with Cloud Computing: Discussions and Challenges

Amanpreet Kaur Sandhu
Abstract: With the recent advancements in computer technologies, the amount of data available is increasing
day by day. However, excessive amounts of data create great challenges for users. Meanwhile, cloud computing
services provide a powerful environment to store large volumes of data. They eliminate various requirements,
such as dedicated space and maintenance of expensive computer hardware and software. Handling big data is a
time-consuming task that requires large computational clusters to ensure successful data storage and processing.
In this work, the definition, classification, and characteristics of big data are discussed, along with various cloud services, such as Microsoft Azure, Google Cloud, Amazon Web Services, International Business Machines (IBM) Cloud, Hortonworks, and MapR. A comparative analysis of various cloud-based big data frameworks is also performed.
Various research challenges are defined in terms of distributed database storage, data security, heterogeneity, and
data visualization.

Key words: big data; data analysis; cloud computing; Hadoop

Amanpreet Kaur Sandhu is with University Institute of Computing, Chandigarh University, Mohali 140413, India. E-mail: [email protected]. To whom correspondence should be addressed.
Manuscript received: 2021-06-11; revised: 2021-09-12; accepted: 2021-09-13.
© The author(s) 2022. The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

1 Introduction

With recent technological advancements, the amount of data available is increasing day by day. For example, sensor networks and social networking sites generate overwhelming flows of data. In other words, big data are produced from multiple sources in different formats at very high speeds[1]. At present, big data represent an important research area. Big data are rapidly produced and are thus difficult to store, process, or manage using traditional software. Big data technologies are tools that are capable of storing meaningful information in different types of formats. To meet users' requirements and to analyze and store complex data, a number of analytical frameworks have been made available to aid users in analyzing complex structured and unstructured data[2]. Several programs, models, technologies, hardware, and software have been proposed and designed to access the information in big data. The main objective of these technologies is to store reliable and accurate results for big data[3]. In addition, big data require state-of-the-art technology to efficiently store and process large amounts of data within a limited run time.

Three different types of big data platforms are interactive analysis tools, stream processing tools, and batch processing tools[4]. Interactive analysis tools are used to process data in interactive environments and interact with real-time data; Apache Drill and Google's Dremel are frameworks for storing and querying real-time data. Stream processing tools are used to store information in continuous flows[5]; the main platforms for storing streaming information are S4 and Storm. The Hadoop infrastructure is utilized to store information in batches. Big data techniques are involved in various disciplines, such as signal processing, statistics, visualization, social network analysis, neural networks, and data mining[6]. Mohajer et al.[7] designed an interactive gradient algorithm that receives controlled messages from neighboring nodes; the proposed method uses a self-optimization framework for big data.
2 Definitions of Big Data

Big data are huge in size and are difficult to manage and analyze relative to traditional data. Storing big data requires scalable architecture and efficient storage and manipulation. Table 1 presents the existing definitions of big data.

Table 1 Definitions of big data.
Reference | Author's name | Definition
[8] | Batty | Big data are massive in size and cannot fit into Excel spreadsheets comprising approximately 16 000 columns and 1 million rows.
[9] | Havens et al. | Big data cannot be loaded into local storage devices (computer memory).
[10] | Fisher et al. | Big data cannot be easily processed and managed in a straightforward manner.
[11] | The State Council of the People's Republic of China | Big data have several characteristics, such as high application value, fast access speed, large volume, and multiple types.
[12] | Beyer and Laney | Big data have large volume, variety, and velocity that demand cost effectiveness and are helpful in decision making.

2.1 Characteristics of big data

Big data are characterized by three Vs: volume, velocity, and variety. These characteristics were introduced by Gartner to define the various challenges in big data[12]. With new-generation architecture, data are now stored in different types of formats; hence, the three Vs may be extended to five Vs, namely, volume, velocity, variety, value, and veracity[13]. Figure 1 shows the five Vs of big data.

Fig. 1 Five Vs of big data.

(1) Volume: Data are generated by multiple sources (sensors, social networks, smartphones, etc.) and are continuously expanding. The Internet produces global data in large increments. In 2012, approximately 2.5 exabytes (EB) of data were produced every day. According to a report of the International Data Corporation, the volume of data doubled by 2013, reaching 4.4 zettabytes (ZB), and in 2020 the volume of data reached 40 ZB. Table 2 shows the names of the units of data that can be measured in bytes[14].

Table 2 Units of data.
Name of unit | Equals | Size in bytes
Bit | 1 or 0 | 1/8
Nibble | 4 bits | 1/2
Byte | 8 bits | 1
Kilobyte (KB) | 1024 bytes | 2^10
Megabyte (MB) | 1024 KB | 2^20
Gigabyte (GB) | 1024 MB | 2^30
Terabyte (TB) | 1024 GB | 2^40
Petabyte (PB) | 1024 TB | 2^50
Exabyte (EB) | 1024 PB | 2^60
Zettabyte (ZB) | 1024 EB | 2^70
Yottabyte (YB) | 1024 ZB | 2^80

(2) Velocity: Data are growing exponentially at high speeds. Millions of connected devices are added on a daily basis, thereby leading to increases in not only volume but also velocity[15, 16]. One relevant example is YouTube, which generates big data at high speeds[17, 18]. Table 3 presents the number of users in India who had used social media networks by February 2021.

Table 3 Users in India as of February 2021.
Application name | Count
WhatsApp | 53 Crore
YouTube | 44.8 Crore
Facebook | 41 Crore
Instagram | 21 Crore
Twitter | 1.75 Crore

(3) Variety: Data are generated in multiple formats via social networks, smartphones, or sensors. These tools produce data in the form of data logs, images, videos, audio, documents, and text. Data may also be structured, semistructured, or unstructured[19].
(4) Value: Value is an important characteristic of big data. It relates to how data can be dealt with and converted into meaningful information[20].
(5) Veracity: Veracity refers to the quality, correctness, and trustworthiness of data. Therefore, maintaining veracity in data is mandatory[21, 22]. For example, data in huge amounts create confusion, whereas small amounts of data can convey incomplete or partial information.
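As a small worked example of the unit arithmetic in Table 2, the helper below (an illustrative sketch, not part of any surveyed framework) converts a raw byte count into the binary units listed in the table, each step being a factor of 1024.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes: float) -> str:
    """Convert a byte count to the largest binary unit from Table 2 (factor 1024)."""
    value, unit = float(num_bytes), UNITS[0]
    for next_unit in UNITS[1:]:
        if value < 1024:
            break
        value /= 1024
        unit = next_unit
    return f"{value:.2f} {unit}"

if __name__ == "__main__":
    print(human_readable(2.5 * 1024**6))  # the ~2.5 EB/day figure cited for 2012
    print(human_readable(40 * 1024**7))   # the ~40 ZB figure cited for 2020
```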

2.2 Types of big data

Data are produced at unprecedented rates from various sources, such as financial, government, health, and social networks. Such rapid growth of data can be attributed to smart devices, the Internet of Things, etc. In the last decades, companies have failed to store data efficiently and for long periods[23, 24]. This drawback relates to traditional technologies that lack adequate storage capacity and are costly. Meanwhile, big data require new storage methods backed by powerful technologies[25, 26]. Big data can be classified into several categories. Figure 2 depicts the classification of big data, and Table 4 summarizes the definitions of the various types of big data.

Fig. 2 Types of big data.

3 Big Data with Machine Learning

The main function of machine learning techniques is to discover knowledge and make intelligent decisions. Machine learning is used in various real-world applications, such as data mining, recognition systems, recommendation engines, and autonomous control systems. The machine learning domain can be divided into three areas, namely, supervised learning, unsupervised learning, and reinforcement learning[35].

3.1 Data streaming learning

Various real-world technologies, such as stock management, network traffic, and credit card transactions, generate huge datasets. Data mining plays an important role in finding interesting patterns and fetching values from hidden streams and datasets. Traditional data mining techniques are related to clustering, association rule mining, accuracy, scalability, and classification, whereas big data are related to dynamic environments[36, 37].

3.2 Deep learning

Deep learning is another important aspect of machine learning and pattern recognition. It allows predictive analysis and involves natural language processing, speech recognition, and computer vision. Deep learning is applied to resolve issues in data analysis and to help extract complex patterns from huge amounts of data. It is also called hierarchical learning because it extracts information from complex datasets at different levels. It is very helpful in the analysis of large volumes of data, information retrieval, data tagging, and discrimination tasks (e.g., prediction and classification)[38].

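As a small illustration of one of the unsupervised techniques named in Section 3 (clustering), the sketch below groups a few two-dimensional points with k-means. scikit-learn is an assumed dependency here, and the toy data are hypothetical; the example only shows the shape of such an analysis, not any method from this survey.

```python
from sklearn.cluster import KMeans  # assumed dependency: scikit-learn

# Hypothetical 2-D feature vectors (e.g., per-customer transaction statistics).
points = [
    [1.0, 1.2], [0.8, 1.0], [1.1, 0.9],   # roughly one group
    [8.0, 8.5], [7.9, 8.1], [8.3, 7.8],   # roughly another group
]

# Fit k-means with two clusters; n_init controls the number of random restarts.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

print(model.labels_)           # cluster index assigned to each point
print(model.cluster_centers_)  # coordinates of the two cluster centers
```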

Table 4 Types of big data.
Type | Category | Explanation
Data source | Social media | Social media represents an important aspect of big data. Facebook, Twitter, emails, and microblogs are social media sources that generate massive amounts of data daily[27].
Data source | Machine-generated data | Software and hardware, such as medical devices, computers, and other types of machines, that generate data without human interference.
Data source | Sensing | Various types of sensing devices that generate data and convert them into signals[28].
Data source | Transaction | Financial, business, and work data that carry time-based dimensions defining the data.
Data source | IoT | Tablets, smartphones, and digital camera devices are connected over the Internet and thus generate huge amounts of data and information.
Content format | Structured data | Structured data are in a consistent order with a well-defined format. Their advantage is that they are easy to maintain, access, and store on computers. Structured data are stored in the form of rows and columns; an example is a DataBase Management System (DBMS)[29].
Content format | Semi-structured data | Semi-structured data can be considered another form of structured data. They inherit a few properties of structured data but do not represent the data in database models. An example is Comma Separated Value (CSV) files[30].
Content format | Unstructured data | Unstructured data do not follow the formal structure rules of data models. Images, videos, text messages, and social media posts are examples of unstructured data.
Data store | Key value stores | Key value stores are used to store and access data in key/value pairs. They are basically designed to store massive data and manage heavy loads. Apache HBase, Apache Cassandra, Redis, and Riak are examples of key value store databases[31].
Data store | Graph stores | Graph stores are used to analyze data on the basis of the relationships between nodes, edges, and properties. Neo4j is an example of a graph store.
Data store | Column family stores | Column family stores keep data and information within a column of a table at the same location on a disk, in the same way a row store keeps row data together. Google Bigtable is an example of a column family store.
Data store | Document-oriented stores | Document-oriented stores offer complex data forms in multiple formats, such as XML, JSON, text, string, array, or binary forms. CouchDB and MongoDB are examples of document-oriented stores[32].
Data staging | Cleaning | Cleaning is a process in which noisy data, outliers, and missing values are removed.
Data staging | Transformation | In data transformation, data are transformed into an appropriate format for analysis.
Data staging | Normalization | Normalization is a process used to reduce redundancies in data[33].
Data processing | Batch data processing | MapReduce-based systems are used to process data in the form of batches. Apache Hadoop, Apache Mahout, Skytree Server, and Dryad are examples of batch processing.
Data processing | Real-time data processing | Streaming systems, such as S4, are based on distributed frameworks that allow users to design applications for processing continuous unbounded streams of data[34].
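The data-store categories in Table 4 differ mainly in how records are addressed. The pure-Python sketch below mimics the two most common access patterns, key/value lookup and document-oriented querying, without using any particular database (HBase, Redis, CouchDB, MongoDB, etc.); the record contents and field names are hypothetical.

```python
# Key/value pattern: every record is addressed by a single opaque key.
kv_store = {}
kv_store["user:42"] = "Amanpreet"       # put
print(kv_store.get("user:42"))          # get -> "Amanpreet"

# Document-oriented pattern: records are self-describing documents (e.g., JSON)
# that can be filtered on any field, not only the primary key.
documents = [
    {"_id": 1, "name": "sensor-a", "type": "temperature", "value": 21.5},
    {"_id": 2, "name": "sensor-b", "type": "humidity", "value": 40.0},
]

def find(collection, **criteria):
    """Return documents whose fields match all given criteria."""
    return [d for d in collection if all(d.get(k) == v for k, v in criteria.items())]

print(find(documents, type="temperature"))  # query by a non-key field
```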

4 Cloud Computing

Cloud computing offers a cost-efficient and scalable solution to store big data. According to the National Institute of Standards and Technology, "Cloud computing is based on pay-per-use services for enabling convenient, on-demand network access to a shared pool of configurable computing resources such as servers, networks, and services that can be rapidly provisioned and released with minimal management effort or service provider interaction". Cloud computing services can be classified into the following three categories[39]:
(1) Infrastructure as a Service (IaaS): These services are based on the principle of "pay for what you need" and provide high-performance computing to customers. Amazon Web Services (AWS), Elastic Compute Cloud, and Simple Storage Service (S3) are examples of IaaS; AWS and S3 provide online storage services. At nominal charges, customers can easily access the world's largest data centers. At present, three companies provide IaaS landscape services: Google, Microsoft, and HP. Google provides Google Compute Engine to access IaaS services, Microsoft provides a cloud platform through its Windows Azure Platform, and HP offers HP Cloud, which was designed by NASA and Rackspace.
(2) Software as a Service (SaaS): In SaaS, all applications run on remote cloud infrastructure and are accessed over the Internet. To access SaaS services, users need only an Internet connection and a web browser, such as Google Chrome or Internet Explorer[40]. Users connect to a desktop environment via a virtual machine in which all software programs are installed. SaaS provides more facilities to users than IaaS.
(3) Platform as a Service (PaaS): PaaS provides a runtime environment to users and allows them to create, test, and run web applications. Users can easily access PaaS on a pay-per-use basis using an Internet connection. PaaS provides the infrastructure (networking, storage, and services) and platform (DBMS, business intelligence, middleware) for running a web application life cycle. Examples of PaaS include Microsoft Azure and Google Cloud[41].
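As an illustration of consuming IaaS-style storage programmatically, the sketch below uploads and lists objects in an S3 bucket using boto3. It assumes AWS credentials are already configured in the environment and that a bucket with the hypothetical name "example-bigdata-bucket" exists; the file name is likewise illustrative.

```python
import boto3  # assumed dependency: the AWS SDK for Python

BUCKET = "example-bigdata-bucket"  # hypothetical bucket name

s3 = boto3.client("s3")  # credentials are read from the environment/config files

# Upload a local file as an object in the bucket.
s3.upload_file("sensor_log.csv", BUCKET, "raw/sensor_log.csv")

# List the objects stored under the "raw/" prefix.
response = s3.list_objects_v2(Bucket=BUCKET, Prefix="raw/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```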

The cloud computing environment has two important aspects: the frontend and the backend. From the frontend side, users access cloud services through an Internet connection; at the backend, all cloud services are run. Figure 3 shows the various types of cloud computing services[42].

Fig. 3 Cloud computing services.

Big data and cloud computing are closely associated. With technological changes, big data models provide distributed processing, parallel technologies, large storage capacity, and real-time analysis of heterogeneous databases. Data security and privacy are also considered in big data models. Big data require large amounts of storage space and thus entail the use of cloud computing. Cloud computing offers scalability and cost savings[43]; moreover, it provides massive amounts of storage capacity and processing power.

Cloud computing works on different types of technologies, such as distributed storage and virtualization, and processes data for different types of tasks. It runs distributed queries over multiple datasets and gives responses in a timely manner. Hadoop plays an important role in the storage of distributed datasets on the cloud[44]. The Hadoop Distributed File System (HDFS) stores large amounts of data in distributed form and is the data storage management system in Hadoop. The advantage of the HDFS is that it is cost effective and capable of managing thousands of nodes in a cluster and massive amounts of unstructured data. It works as a batch processing system with high-latency operations.

Moreover, Hadoop increases system performance and avoids network congestion. The HDFS is based on a master-slave architecture and comprises various types of daemons, such as DataNode, NameNode, Secondary NameNode, Resource Manager, and Node Manager. NameNode and Resource Manager work as master nodes, while DataNode and Node Manager work as slave nodes. In addition, the HDFS achieves fault tolerance with the help of data replication on various servers. The primary NameNode is used to solve computation problems and establishes coordination with DataNodes, while the Secondary NameNode manages the availability and replication of data. The relationship between big data and cloud computing is shown in Fig. 4.

Cloud computing provides various types of facilities, such as processing, computation, and storage of big data. Moreover, the cloud computing infrastructure offers an efficient and effective platform to determine the storage requirements of big data analysis. It is also correlated with new patterns for the analysis of the various types of resources that are available in the cloud. Several cloud-based technologies have been developed to deal with big data for parallel processing.

Fig. 4 Big data and cloud computing.
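HDFS is normally driven either through its Java API or through the `hdfs dfs` command-line tool that ships with Hadoop. The sketch below wraps two common commands from Python via subprocess; it assumes a working Hadoop installation with `hdfs` on the PATH, and the local and HDFS paths used are hypothetical.

```python
import subprocess

def hdfs(*args: str) -> str:
    """Run an `hdfs dfs` subcommand and return its standard output."""
    result = subprocess.run(
        ["hdfs", "dfs", *args], capture_output=True, text=True, check=True
    )
    return result.stdout

# Create a directory in HDFS and copy a local file into it (paths are illustrative).
hdfs("-mkdir", "-p", "/user/demo/raw")
hdfs("-put", "-f", "sensor_log.csv", "/user/demo/raw/")

# List the directory; HDFS reports replication and block information per file.
print(hdfs("-ls", "/user/demo/raw"))
```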



MapReduce (MapR) is an example of big data processing in a cloud environment that allows the storage of massive amounts of data in a cluster[45]. In other words, MapR is an efficient and cost-effective model for processing big data; the MapR framework comprises the map and reduce functions for handling big data.

Cloud computing also plays an important role in distributed system environments by facilitating storage, boosting computing power, and aiding network communication. Big data technologies store data in cloud clusters rather than in local storage file systems. Several companies provide big data cloud platforms, and various cloud computing platforms are available to store big data. Table 5 shows a comparative analysis of big data cloud frameworks for storing massive amounts of data[46]. Cloud services such as Microsoft Azure, Google Cloud, AWS, IBM, Hortonworks, and MapR are compared on the basis of various parameters.

5 Research Issues in Big Data

As data grow at exponential rates, a number of issues and problems emerge during the processing and storage of big data, and few tools are available to resolve them in a cloud environment. Technologies such as Pig Latin, Dryad, MongoDB, Cassandra, and MapR are not able to resolve these issues in big data processing. Even with the help of Hadoop and MapR, users cannot execute queries on databases, and these tools provide only low-level infrastructures for data processing and management. Some issues and problems in big data are summarized as follows[47]:
(1) Distributed database storage system: Numerous technologies are used to store and retrieve huge amounts of data, and cloud computing is an important aspect of big data. Big data are generated by multiple devices on a daily basis. At present, the main issues in distributed frameworks are the storage of data in a straightforward manner and the processing and migration of data between distributed servers.
(2) Data security: Security threats are an important issue in a cloud computing environment. Cloud computing has been transformed by modern information and communication technologies, and several types of unresolved security threats exist in big data. Data security threats are magnified by the variety, velocity, and volume of big data. Various issues and threats, such as data availability, confidentiality, real-time monitoring, identity and access authorization control, integrity, and privacy, exist when big data are used with cloud computing frameworks. Therefore, data security must be measured once data are outsourced to cloud service providers[48].
(3) Heterogeneity: Big data are heterogeneous in nature because data are gathered from multiple devices in different formats, such as images, videos, audio, and text. Before loading data into a warehouse, they need to be transformed and cleaned, and these processes present challenges in big data[49]. Combining all unstructured data and reconciling them for use in report creation are incredibly difficult to achieve in real time.
(4) Data processing and cleaning: Data storage and acquisition require preprocessing and cleaning, which involve data merging, data filtering, data consistency, and data optimization. Processing and cleaning data are thus difficult because of the wide variety of data sources[50]. Moreover, data sources may contain noise and errors, or they may be incomplete. The challenge is how to clean large amounts of data and how to determine whether such data are reliable.
(5) Data visualization: Data visualization is a technique to represent complex data in a graphical form for clear understanding. If the data are structured, then they can easily be represented in a traditional graphical way. If the data are unstructured or semistructured, then they are difficult to visualize in real time because of their high diversity.

6 Conclusion

In the last decades, the size of data has grown, and it continues to increase day by day. Data are generated in different formats (variety) by multiple sources; therefore, the variety of data is also expanding. Connected mobile devices and sensor networks generate data at very high speeds (velocity). Cloud computing services are used to process, analyze, and store data without the need for dedicated space and the maintenance of expensive computer hardware and software. This study reviews the relationship between big data and cloud computing. Furthermore, a comparative analysis of cloud-based big data services is performed. Big data involve various issues and problems, such as distributed database storage, data privacy/security, and heterogeneity of data formats.
Table 5 Big data cloud frameworks.
Parameter | Microsoft Azure | Google Cloud | AWS | IBM | Hortonworks | MapR
Founding date | Oct. 2008 | Oct. 2015 | 2006 | 2011 | June 2011 | 2009
Big data analytics | Provides Azure HDInsight services | Provides Google Cloud Dataproc services | Provides Amazon Elasticsearch | Provides various IBM analysis engines | Provides Hortonworks Data Platform (HDP) | Provides MapR data analysis platform
Type of software framework | Open-source framework | Open-source framework | Open-source framework | Open-source framework | Open-source framework | Licensed
Content format | Only unstructured data | Structured, semistructured, and unstructured | Structured, semistructured, and unstructured | Only unstructured data | Structured, semistructured, and unstructured | Structured, semistructured, and unstructured
Types of OS supported | Windows Server, Ubuntu 14 | Debian 8 | Linux, Ubuntu, CentOS | CentOS, RedHat, Ubuntu | Only CentOS 7 | Linux, Ubuntu
Applications | Batch and stream processing | Machine learning, streaming, and batch processing | Real-time, log analytics, and stream analytics | Data analytics | Real-time and stream analytics | Real-time, log analytics, and stream analytics
Framework execution | Hadoop | BigQuery | Elastic MapReduce | Elastic MapReduce | HDFS, YARN, MapReduce2 | MapR
Big data storage framework | Microsoft Azure | Google Cloud Services | S3 | IBM Cloud Object Storage | Hortonworks Data Platform | MapR Data Analysis Platform
Type of storage | Distributed | Distributed | Distributed | Distributed | Centralized | Distributed
Storage content format | XML | JSON, CSV | Any | Any | Any | Any
Storage size limit | Limited | Limited | Unlimited | Unlimited | Limited | Unlimited
Metadata | Yes | Yes | Yes | Yes | No | Yes
Relational database management system | SQL Azure | Cloud SQL | PostgreSQL, Oracle, MySQL | Oracle or MySQL | SQL | SQL
Big data warehouse | SQL Azure Data Warehouse | BigQuery | Amazon Redshift | Db2 warehouse | Hive warehouse | Data Warehouse Optimization (DWO)
Warehouse content format | ORC, RC, Parquet | JSON, CSV | ORC, CSV, TSV | CSV | ORC | JSON
NoSQL (Not only SQL) database system | Stored in table format | AppEngine data store | DynamoDB | Apache Accumulo | MongoDB | MongoDB
Streaming processing | StreamInsight | Nothing prepackaged | Apache Spark | Apache Spark | Search application | Storm, Spark Streaming, and Flink
Machine learning programming interface | Prediction application | Mahout | Mahout | Hadoop | Hortonworks Data Platform | HPE Intelligent Data Platform

References

[1] I. A. T. Hashem, I. Yaqoob, N. B. Anuar, S. Mokhtar, A. Gani, and S. U. Khan, The rise of 'big data' on cloud computing: Review and open research issues, Inform. Syst., vol. 47, pp. 98–115, 2015.
[2] J. H. Yu and Z. M. Zhou, Components and development in big data system: A survey, J. Electr. Sci. Technol., vol. 17, no. 1, pp. 51–72, 2019.
[3] S. Kumar and K. K. Mohbey, A review on big data based parallel and distributed approaches of pattern mining, J. King Saud Univ. – Comput. Inform. Sci., doi: 10.1016/j.jksuci.2019.09.006.
[4] Y. N. Liu, N. Li, X. Zhu, and Y. Qi, How wide is the application of genetic big data in biomedicine, Biomed. Pharmacother., vol. 133, p. 111074, 2021.
[5] V. Subramaniyaswamy, V. Vijayakumar, R. Logesh, and V. Indragandhi, Unstructured data analysis on big data using map reduce, Procedia Comput. Sci., vol. 50, pp. 456–465, 2015.
[6] S. Maitrey and C. K. Jha, MapReduce: Simplified data analysis of big data, Procedia Comput. Sci., vol. 57, pp. 563–571, 2015.
[7] A. Mohajer, M. Barari, and H. Zarrabi, Big data based self-optimization networking: A novel approach beyond cognition, Intell. Automat. Soft Comput., doi: 10.1080/10798587.2017.1312893.
[8] M. Batty, Big data, smart cities and city planning, Dialogues in Human Geography, vol. 3, no. 3, pp. 274–279, 2013.
[9] T. C. Havens, J. C. Bezdek, C. Leckie, L. O. Hall, and M. Palaniswami, Fuzzy c-means algorithms for very large data, IEEE Trans. Fuzzy Syst., vol. 20, no. 6, pp. 1130–1146, 2012.
[10] D. Fisher, R. DeLine, M. Czerwinski, and S. Drucker, Interactions with big data analytics, Interactions, vol. 19, no. 3, pp. 50–59, 2012.
[11] The State Council of the People's Republic of China, Action plan for promoting big data development, (in Chinese), http://www.gov.cn/zhengce/content/2015-09/05/content_10137.htm, 2015.
[12] M. A. Beyer and D. Laney, The importance of 'big data': A definition, Stamford, CT, USA: Gartner, G00235055, 2012.
[13] L. Rabhi, N. Falih, A. Afraites, and B. Bouikhalene, Big data approach and its applications in various fields: Review, Procedia Comput. Sci., vol. 155, pp. 599–605, 2019.
[14] F. Ridzuan and W. M. N. Wan Zainon, A review on data cleansing methods for big data, Procedia Comput. Sci., vol. 161, pp. 731–738, 2019.
[15] D. A. Shafiq, N. Z. Jhanjhi, and A. Abdullah, Load balancing techniques in cloud computing environment: A review, J. King Saud Univ. – Comput. Inform. Sci., doi: 10.1016/j.jksuci.2021.02.007.
[16] S. Amamou, Z. Trifa, and M. Khmakhem, Data protection in cloud computing: A survey of the state-of-art, Procedia Comput. Sci., vol. 159, pp. 155–161, 2019.
[17] P. J. Sun, Security and privacy protection in cloud computing: Discussions and challenges, J. Netw. Comput. Appl., vol. 160, p. 102642, 2020.
[18] R. Nachiappan, B. Javadi, R. N. Calheiros, and K. M. Matawie, Cloud storage reliability for big data applications: A state of the art survey, J. Netw. Comput. Appl., vol. 97, pp. 35–47, 2017.
[19] A. O'Driscoll, J. Daugelaite, and R. D. Sleator, 'Big data', Hadoop and cloud computing in genomics, J. Biomed. Inform., vol. 46, no. 5, pp. 774–781, 2013.
[20] S. Karimian-Aliabadi, D. Ardagna, R. Entezari-Maleki, E. Gianniti, and A. Movaghar, Analytical composite performance models for big data applications, J. Netw. Comput. Appl., vol. 142, pp. 63–75, 2019.
[21] H. F. Yu, A priori algorithm optimization based on Spark platform under big data, Microprocess. Microsyst., vol. 80, p. 103528, 2021.
[22] M. Muniswamaiah, T. Agerwala, and C. Tappert, Big data in cloud computing review and opportunities, Int. J. Comput. Sci. Inform. Technol., vol. 11, no. 4, pp. 43–57, 2019.
[23] T. Cherian and H. Bhadkamkar, A study and survey of big data using data mining techniques, Int. J. Eng. Sci. Res. Technol., vol. 6, no. 10, pp. 169–174, 2017.
[24] A. Mehmood, I. Natgunanathan, Y. Xiang, G. Hua, and S. Guo, Protection of big data privacy, IEEE Access, vol. 4, pp. 1821–1834, 2016.
[25] S. Kumar and M. Singh, Big data analytics for healthcare industry: Impact, applications, and tools, Big Data Mining Analytics, vol. 2, no. 1, pp. 48–57, 2019.
[26] S. Kumar and M. Singh, A novel clustering technique for efficient clustering of big data in Hadoop ecosystem, Big Data Mining Analytics, vol. 2, no. 4, pp. 240–247, 2019.
[27] C. K. Leung, Y. B. Chen, S. Y. Shang, and D. Y. Deng, Big data science on COVID-19 data, in Proc. 2020 IEEE 14th Int. Conf. Big Data Science and Engineering, Guangzhou, China, 2020, pp. 14–21.
[28] M. S. Mahmud, J. Z. Huang, S. Salloum, T. Z. Emara, and K. Sadatdiynov, A survey of data partitioning and sampling methods to support big data analysis, Big Data Mining Analytics, vol. 3, no. 2, pp. 85–101, 2020.
[29] S. Aslam and M. A. Shah, Load balancing algorithms in cloud computing: A survey of modern techniques, in Proc. 2015 IEEE National Software Engineering Conference, doi: 10.1109/NSEC.2015.7396341.
[30] D. A. Shafiq, N. Z. Jhanjhi, and A. Abdullah, Load balancing techniques in cloud computing environment: A review, J. King Saud Univ. – Comput. Inform. Sci., doi: 10.1016/j.jksuci.2021.02.007.
[31] A. Oussous, F. Z. Benjelloun, A. Ait Lahcen, and S. Belfkih, Big data technologies: A survey, J. King Saud Univ. – Comput. Inform. Sci., vol. 30, no. 4, pp. 431–448, 2018.
[32] R. Misra, B. Panda, and M. Tiwary, Big data and ICT applications: A study, in Proc. 2nd Int. Conf. Information and Communication Technology for Competitive Strategies, doi: 10.1145/2905055.2905099, 2016.
[33] B. Saraladevi, N. Pazhaniraja, P. V. Paul, M. S. S. Basha, and P. Dhavachelvan, Big data and Hadoop - A study in security perspective, Procedia Comput. Sci., vol. 50, pp. 596–601, 2015.
[34] A. Fahad, N. Alshatri, Z. Tari, A. Alamri, I. Khalil, A. Y. Zomaya, S. Foufou, and A. Bouras, A survey of clustering algorithms for big data: Taxonomy and empirical analysis, IEEE Trans. Emerg. Top. Comput., vol. 2, no. 3, pp. 267–279, 2014.
[35] A. Katal, M. Wazid, and R. H. Goudar, Big data: Issues, challenges, tools and good practices, in Proc. 6th Int. Conf. Contemporary Computing, Noida, India, 2013, pp. 404–409.
[36] C. W. Tsai, C. F. Lai, H. C. Chao, and A. V. Vasilakos, Big data analytics: A survey, J. Big Data, vol. 2, no. 1, pp. 1–32, 2015.
[37] Z. Lv, H. Song, P. Basanta-Val, A. Steed, and M. Jo, Next-generation big data analytics: State of the art, challenges, and future research topics, IEEE Trans. Ind. Informatics, vol. 13, no. 4, pp. 1891–1899, 2017.
[38] K. S. Jadon, R. S. Bhadoria, and G. S. Tomar, A review on costing issues in big data analytics, in Proc. 2015 Int. Conf. Computational Intelligence and Communication Networks (CICN), Jabalpur, India, 2016, pp. 727–730.
[39] O. Y. Al-Jarrah, P. D. Yoo, S. Muhaidat, G. K. Karagiannidis, and K. Taha, Efficient machine learning for big data: A review, Big Data Res., vol. 2, no. 3, pp. 87–93, 2015.
[40] G. S. Bhathal and A. Singh, Big data: Hadoop framework vulnerabilities, security issues and attacks, Array, vols. 1&2, p. 100002, 2019.
[41] J. Hurwitz, A. Nugent, F. Halper, and M. Kaufman, Big Data for Dummies. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2013.
[42] V. P. Lalitha, M. Y. Sagar, S. Sharanappa, S. Hanji, and R. Swarup, Data security in cloud, in Proc. 2017 Int. Conf. Energy, Communication, Data Analytics and Soft Computing, Chennai, India, 2017, pp. 3604–3608.
[43] C. L. Philip Chen and C. Y. Zhang, Data-intensive applications, challenges, techniques and technologies: A survey on big data, Inform. Sci., vol. 275, pp. 314–347, 2014.
[44] S. Salloum, J. Z. Huang, and Y. He, Random sample partition: A distributed data model for big data analysis, IEEE Trans. Ind. Inform., vol. 15, no. 11, pp. 5846–5854, 2019.
[45] L. Q. Kong, Z. F. Liu, and J. G. Wu, A systematic review of big data-based urban sustainability research: State-of-the-science and future directions, J. Clean. Prod., vol. 273, p. 123142, 2020.
[46] P. Pääkkönen and D. Pakkala, Reference architecture and classification of technologies, products and services for big data systems, Big Data Res., vol. 2, no. 4, pp. 166–186, 2015.
[47] M. Wook, N. A. Hasbullah, N. M. Zainudin, Z. Z. A. Jabar, S. Ramli, N. A. M. Razali, and N. M. M. Yusop, Exploring big data traits and data quality dimensions for big data analytics application using partial least squares structural equation modelling, J. Big Data, vol. 8, no. 1, pp. 1–15, 2021.
[48] S. Saif and S. Wazir, Performance analysis of big data and cloud computing techniques: A survey, Procedia Comput. Sci., vol. 132, pp. 118–127, 2018.
[49] S. M. Shamsuddin and S. Hasan, Data science vs. big data @ UTM big data centre, in Proc. 2015 IEEE Int. Conf. Science in Information Technology, Yogyakarta, Indonesia, 2015, pp. 1–4.
[50] T. Y. Yang and Y. Zhao, Application of cloud computing in biomedicine big data analysis, in Proc. 2017 Int. Conf. Algorithms, Methodology, Models and Applications in Emerging Technologies (ICAMMAET), Chennai, India, 2017, pp. 1–3.

Amanpreet Kaur Sandhu received the PhD degree from IKG Punjab Technical University, India in 2018. She is currently an assistant professor at University Institute of Computing, Chandigarh University, Mohali, India. Her research interests include image and video processing and Big Data analytics.
