Mobile Information Systems - 2021 - Chang - Cloud Computing Storage Backup and Recovery Strategy Based On Secure IoT and
Research Article
Cloud Computing Storage Backup and Recovery Strategy Based on
Secure IoT and Spark
Received 21 July 2021; Revised 30 September 2021; Accepted 29 October 2021; Published 23 November 2021
Copyright © 2021 Dajun Chang et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Spatial data accounts for a large proportion of the massive amounts of data that are constantly being generated, but much of this spatial data cannot be directly understood by people, and even a highly configured stand-alone computing device can hardly meet the needs of visualization processing. To protect data security and make it easier for users to search for data and to recover data deleted by mistake, this paper studies cloud computing storage backup and recovery strategies based on the secure Internet of Things and the Spark platform. In the method part, this article introduces the secure Internet of Things, Spark, and cloud computing backup and recovery, and presents two algorithms: cluster analysis and the Ullman algorithm. In the experimental part, this article describes the experimental environment and experimental objects and designs a data recovery experiment. In the analysis part, this article analyzes the challenge-response-verification framework, the number of data packets, the cost of calculation and communication, the choice of Spark method, the throughput of different platforms, and iteration and caching. The experimental results show that the loss rate of database 1 at the fourth node is 0.4%, 2.4%, 1.6%, and 3.2% and that the loss rate of each node is less than 5%, indicating that the system can respond to applications.
Many scholars at home and abroad have conducted research related to cloud computing storage backup and recovery strategies based on the secure IoT and Spark. Kumar believes that the Internet of Things is an emerging technology that can connect everyday objects to the Internet. Internet of Things technology provides an interface between different technologies, and new applications can be realized with the help of embedded physical devices with intelligent thinking capabilities, which play an important role in connecting to the Internet. The IoT gateway must be smart enough to perform operations on the collected data according to the respective applications. The author proposed Gateway Pi, an IoT smart security gateway framework integrated with the Raspberry Pi board. This proposal makes the IoT gateway a smart thing that runs like a normal PC. In addition to native gateway functions, the work also emphasizes the security of IoT gateways: the author proposed three measures to secure the IoT gateway, making the gateway act as a firewall, and used Gateway Pi to implement a cost-effective and reliable IoT architecture for smart irrigation. That study researched IoT gateways with the aim of improving the security of the Internet of Things, but the author did not draw the relevant framework diagram [1]. Mary A A believes that, in cloud computing, storing massive amounts of information is a very challenging task from the perspective of reliable storage of sensitive data and storage service quality. Among the different cloud security issues, data disaster tolerance is the most critical. The motivation of recovery technology is to let users retrieve data from any backup server when a server loses data and cannot provide it to the user. To achieve this goal, many studies have developed different techniques. The author therefore proposed a data disaster tolerance process using the opposition group search optimizer (OGSO) algorithm, mainly to avoid disasters in the cloud. The proposed data recovery process consists of four modules: (1) a file upload module, (2) a copy generation module, (3) a data backup module, and (4) a disaster recovery module. First, the data is split into multiple files, which are uploaded to the corresponding virtual machines using the OGSO algorithm. Then, a copy is generated based on the bandwidth of each file; the copies are mainly used for the data backup strategy. Finally, files are backed up and retrieved from the copies according to user queries. Experimental results show that the proposed OGSO-based data disaster recovery process outperforms other methods. The author studied storage issues in cloud computing but did not discuss security issues [2]. According to Interlandi M, debugging data processing logic in data-intensive scalable computing (DISC) systems is a difficult and time-consuming task. Today's DISC systems provide very few tools for debugging programs, so programmers spend countless hours collecting evidence (e.g., from log files) and performing trial-and-error debugging. To help with this work, the author built Titian, a library that enables data provenance, tracking data through transformations in Apache Spark. Data scientists who use the Titian Spark extension can quickly identify the input data underlying potential errors or abnormal results. Titian is built directly into the Spark platform and provides data provenance support at interactive speeds, several orders of magnitude faster than alternative solutions, while having minimal impact on Spark job performance; the observed overhead for capturing data lineage rarely exceeds 30% of the baseline job's execution time. The author carried out program debugging for Spark but did not compare the platform with other platforms [3].

This paper proposes cloud computing storage backup and recovery strategies based on the secure Internet of Things and Spark and conducts related research. In the method part, this article introduces the secure Internet of Things, Spark, and cloud computing backup and recovery, and presents two algorithms: cluster analysis and the Ullman algorithm. In the experimental part, this article describes the experimental environment and experimental objects and designs a data recovery experiment. In the analysis part, this article analyzes the challenge-response-verification framework, the number of data packets, the cost of calculation and communication, the choice of Spark method, the throughput of different platforms, and iteration and caching. The innovation of this article is to combine the secure Internet of Things with Spark and to use these two technologies to study storage backup and recovery strategies based on cloud computing, so as to maximize the value of data.

2. Cloud Computing Storage Backup and Recovery Strategy Method Based on Secure IoT and Spark

2.1. Secure Internet of Things. As a value-added application in the information network, the Internet of Things is also an extension of the special application of the communication network. The development of the Internet of Things industry involves three main elements. The first is recognition, which is the basic premise; the second is communication, which is a very important support and platform; and the third is application, which is the main and ultimate goal and fully demonstrates the Internet of Things itself. The development and application of the Internet of Things place very high demands on technology, and technical solutions are also an important driving force for the leapfrog development of the Internet of Things [4, 5].

Figure 1 shows the basic framework of the Internet of Things, which includes a comprehensive application layer, a network construction layer, a management service layer, and a perception recognition layer. Perception recognition layer: perception is the core technology of the Internet of Things and the key link between the physical world and the information world. This layer mainly includes automatic data acquisition equipment such as radio frequency identification (RFID) and wireless sensors, as well as various intelligent electronic products for manual information input [6]. Network construction layer: the key task of this layer is to connect the analysis and identification equipment of the lower layer to the Internet so that the upper layer can access the application. The foundation of the Internet of Things is the
9071, 2021, 1, Downloaded from https://onlinelibrary.wiley.com/doi/10.1155/2021/9505249, Wiley Online Library on [04/11/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Mobile Information Systems 3
[Figure 1: Basic framework of the Internet of Things. Diagram labels: application service layer (smart power, smart home, environmental monitoring, smart transportation); distributed data processing; network transport layer (data mining, cloud computing, mass storage); QR code, RFID, sensor.]
Internet and the next-generation Internet. Various wireless networks have been providing Internet services, relying on powerful computers and mass storage technology to collect various kinds of data [7]. Comprehensive application layer: with the continuous advancement of computers, online applications are also undergoing earthquake-like changes. File transfer and e-mail were the key early data services. Since then, such data services have become more widely used in user-centric network applications such as video, online games, and social networks [8].

At the same time, because the Internet of Things has a large number of terminal nodes, it has some shortcomings, for example in the interconnection and intercommunication of IoT systems and in communication through the network. Without security measures, the system provides hardly any control and can be exposed to various network attacks. In addition, the sensing nodes in the Internet of Things also have low mobility [9]. Therefore, a secure Internet of Things is proposed to address these security hazards. Figure 2 shows the security monitoring service system.

2.2. Spark. Spark is a fast, general-purpose engine for large-scale data processing. With sufficient memory, Spark runs up to 100 times faster than Hadoop MapReduce; even when data spills to disk, it is about 10 times faster. This is because Spark has a DAG execution engine that supports cyclic data flow and in-memory computing. Spark is implemented in Scala, a language that combines object-oriented and functional programming, and it can operate on distributed data sets as easily as on local collection objects. It is characterized by fast running speed, simple operation, strong versatility, and good compatibility [10, 11].

Spark provides comprehensive and unified management of data sets with different attributes, such as text or graphs, and provides a computing architecture that can process real-time data streams like ordinary data. With the help of the Spark computing framework, the computing speed of cluster applications has been significantly improved. By comparison, in-memory computation of programs on the Hadoop platform was only one percent as fast as on Spark, and disk-based computation on HDFS only one-tenth as fast [12, 13].
[Diagram labels: security monitoring service; data collection; data analysis; unified management.]
Figure 2: Security monitoring service (pictures from Baidu picture).
The specific characteristics of Spark are as follows. (1) It is fast. Spark applications can run on Hadoop clusters. In memory, Spark can run up to 100 times faster than Hadoop; even on disk, it runs about 10 times faster. Intermediate data can be stored in memory instead of on disk, which reduces the time spent rereading data from disk and improves computational efficiency [14]. (2) Spark supports building applications in multiple languages, mainly Java, Scala, and Python; Spark itself is written in Scala and provides more than 80 built-in high-level operators [15]. (3) It supports complex analytics: besides computing functions similar to Map and Reduce, it provides functions such as graph computation [16].

In addition to Spark, there are two other architectures, Hadoop and MapReduce. As a distributed system architecture, Hadoop can be used to store and process massive amounts of data. It enables large data sets to be processed across computer clusters using a simple programming model and is designed to scale from a single server to thousands of servers, each providing local computation and storage. The platform can be understood as a cluster operating system, and Spark and MapReduce are programming frameworks that run on this operating system [17, 18]. HDFS is a distributed file system derived from conventional computer file systems. Table 1 compares the Hadoop and Google cloud computing systems.

MapReduce, whose process is shown in Figure 3, is an important part of the Hadoop ecosystem. MapReduce is a high-performance parallel computing platform that forms a distributed, parallel computer cluster containing tens, hundreds, or even thousands of nodes [19].

First, the MapReduce library in the user program splits the input file into multiple pieces and then starts multiple copies of the program on the cluster. The master node then selects idle nodes from all worker nodes and assigns map or reduce tasks to them. A worker assigned a map task reads the contents of the corresponding input split, which is parsed into key/value pairs, one per line. The intermediate key/value pairs generated by the Map function are buffered in memory and written to local disk, and their locations on the local disk are returned to the master node, which forwards the location information to the worker that runs the reduce task. That worker then reads the saved data from the nodes that executed the map tasks and sorts the data. When the master node determines that all Map and Reduce tasks are complete, it wakes up the user program, and the MapReduce call in the user code returns.

2.3. Cloud Computing Storage Backup and Recovery. In cloud computing, data storage is provided as a service, which gives cloud data security some unique characteristics: (1) user data is stored on the cloud server, and both upload and download go through the network, which adds a transmission step and the risk of data leakage in transit; (2) the data is stored with a semitrusted third party; (3) cloud computing is based on a distributed network in which the servers act as nodes and the user's data is stored on a node in the network, so in theory an attacker who reaches one node may be able to access the surrounding nodes [20].

As an extension and derivative technology of cloud computing, cloud storage has also aroused great interest in industry, academia, and even government. Cloud storage is an emerging storage technology whose core is to store and manage resources on a cloud platform, enabling people to access data over the Internet in real time. Global IT giants such as Microsoft, Google, and Amazon, and Chinese companies such as Baidu, Alibaba, and Tencent, have done a lot of research on cloud storage and provide corresponding cloud storage platforms. The architecture of a mobile phone backup system based on cloud storage is shown in Figure 4.

Redundant user data in a cloud storage system increases the storage pressure on the cloud storage server, causes network transmission delays, and increases remote bandwidth pressure. In order to reduce the large amount of
redundant data in cloud storage servers and to save storage space and network bandwidth as far as possible, data deduplication technology has gradually become a hot research topic in recent years. In addition, the backup and recovery of this existing data, which puts pressure on the cloud storage system, are also important.

Compared with cloud computing, the security issues of cloud storage focus more on the data itself. Distributing data across cloud nodes during transmission may cause security risks. Internal attacks or illegal operations by employees may lead to data leakage and loss, and when the system is under attack, user information may be leaked. Compared with traditional storage, the new features of cloud storage bring many new security issues, especially the need to ensure the confidentiality and security of stored data as well as its integrity and availability [21, 22].

Table 1: Comparison of Hadoop cloud computing system and Google cloud computing system.

Hadoop cloud computing system    Google cloud computing system
Hadoop HDFS                      Google GFS
Hadoop MapReduce                 Google MapReduce
Hadoop HBase                     Google Bigtable
Hadoop ZooKeeper                 Google Chubby
Hadoop Pig                       Google Sawzall

2.4. Cluster Analysis. Cluster analysis divides a given data set into multiple classes or clusters so that objects in the same cluster are highly similar and objects in different clusters are highly dissimilar [23].

Assume a data set $B = \{b_1, b_2, \ldots, b_n\}$, where $b_m$ ($m = 1, 2, \ldots, n$) is a data object. The data set is divided into $l$ subsets $A_1, \ldots, A_l$ according to the similarity between the data objects, and these subsets satisfy the following conditions:

$$
\begin{cases}
A_m \neq \varnothing, & m = 1, 2, \ldots, l, \\
\bigcup_{m=1}^{l} A_m = B, & \\
A_m \cap A_n = \varnothing, & m \neq n;\ m, n = 1, 2, \ldots, l.
\end{cases}
\tag{1}
$$

2.4.1. Data Structure. A hierarchical clustering is a collection of nested clusters organized like a tree, and a data matrix is obtained by storing the data in structured form. The data matrix contains the attribute values of all data objects in the data set:

$$
I_{t \times q} =
\begin{bmatrix}
I_{1a} & \cdots & I_{1b} & \cdots & I_{1c} \\
\vdots &        & \vdots &        & \vdots \\
I_{xa} & \cdots & I_{xb} & \cdots & I_{xc} \\
\vdots &        & \vdots &        & \vdots \\
I_{ta} & \cdots & I_{tb} & \cdots & I_{tc}
\end{bmatrix},
\tag{2}
$$

where $t$ is the number of data objects in the data set and each object has $q$ different attributes.

The value of each element in the dissimilarity matrix represents the difference between two data objects:

$$
K_{t \times t} =
\begin{bmatrix}
0      &        &        &        &   \\
k(2,1) & 0      &        &        &   \\
k(3,1) & k(3,2) & 0      &        &   \\
\vdots & \vdots & \vdots & \ddots &   \\
k(t,1) & k(t,2) & \cdots & \cdots & 0
\end{bmatrix},
\tag{3}
$$

where $k(m, n)$ is the quantified dissimilarity between data objects $m$ and $n$. It is generally a nonnegative number, and the closer $k(m, n)$ is to 0, the more similar the two objects are.

2.4.2. Similarity Measurement. The distance between data objects is commonly used to evaluate their similarity: the shorter the distance, the higher the similarity. A clustering result is better when the data objects within a cluster are more similar to each other and the clusters differ more from one another. Typical similarity measures are as follows.

Euclidean distance is the true distance between two points in $t$-dimensional space [24]. The calculation formula is

$$
k(m, n) = \sqrt{(i_{x1} - i_{y1})^2 + (i_{x2} - i_{y2})^2 + \cdots + (i_{xt} - i_{yt})^2},
\tag{4}
$$

where $i_x$ and $i_y$ represent two $t$-dimensional data objects. A weight can also be assigned to the attribute of each dimension:

$$
k(m, n) = \sqrt{V_1 (i_{x1} - i_{y1})^2 + V_2 (i_{x2} - i_{y2})^2 + \cdots + V_t (i_{xt} - i_{yt})^2},
\tag{5}
$$

where $V_1, V_2, \ldots, V_t$ are the weights of the attributes in each dimension.

Manhattan distance sums the absolute differences between two objects over all dimensions of a multidimensional space [25]:

$$
k(m, n) = |i_{x1} - i_{y1}| + |i_{x2} - i_{y2}| + \cdots + |i_{xt} - i_{yt}|.
\tag{6}
$$

Minkowski distance is a generalization of Euclidean distance, and Euclidean distance is a special case of Minkowski distance:

$$
k(m, n) = \left( |i_{x1} - i_{y1}|^q + |i_{x2} - i_{y2}|^q + \cdots + |i_{xt} - i_{yt}|^q \right)^{1/q},
\tag{7}
$$

where $q$ is a positive integer. When $q = 1$, the Minkowski distance reduces to the Manhattan distance, and when $q = 2$, it reduces to the Euclidean distance.

In addition to using distance for similarity measurement, the similarity coefficient can also be used as the unit of
[Figure 3: MapReduce process. Diagram labels: user program; fork; master; assign map and assign reduce; workers; input splits 0 to 4; read; local write; remote read; write; output files 0 and 1.]
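The MapReduce execution flow described in Section 2.2 and shown in Figure 3 (split, map, local write, remote read, sort, reduce, output files) can be sketched in a single process as below. This is a minimal word-count illustration under stated assumptions, not Hadoop's implementation: the function names (`map_task`, `reduce_task`, `run_job`) are hypothetical, each "task" is an ordinary function call rather than a worker process, and the partition for each key is chosen with a CRC32 hash.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def map_task(split: str, num_reducers: int) -> List[List[Pair]]:
    """Map one input split to (word, 1) pairs and partition them into
    num_reducers regions, standing in for the map worker's local write."""
    regions: List[List[Pair]] = [[] for _ in range(num_reducers)]
    for word in split.split():
        # CRC32 gives a deterministic partition, unlike Python's salted hash().
        r = zlib.crc32(word.encode("utf-8")) % num_reducers
        regions[r].append((word, 1))
    return regions


def reduce_task(r: int, map_outputs: List[List[List[Pair]]]) -> Dict[str, int]:
    """Fetch region r from every map task (the remote read), sort the pairs
    by key, group the values per key, and sum them."""
    pairs = sorted(p for regions in map_outputs for p in regions[r])
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}


def run_job(splits: List[str], num_reducers: int = 2) -> List[Dict[str, int]]:
    """Play the master: one map task per split, then one reduce task per
    region; each reduce task yields one output 'file'."""
    map_outputs = [map_task(s, num_reducers) for s in splits]
    return [reduce_task(r, map_outputs) for r in range(num_reducers)]


if __name__ == "__main__":
    for i, output in enumerate(run_job(["spark hadoop spark", "hadoop hdfs", "spark"])):
        print(f"output file {i}: {output}")
```

Each key is routed to exactly one reduce task, so every word appears in exactly one output "file", mirroring output files 0 and 1 in the figure.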
[Diagram labels: service deployment; distributed file system; distributed database.]
Figure 4: Architecture diagram of mobile phone backup system based on cloud storage (pictures from Baidu picture).
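The partition conditions in Equation (1), the dissimilarity matrix in Equation (3), and the distance measures in Equations (4)–(7) of Section 2.4 can be sketched as follows. This is an illustrative sketch, not the authors' code: data objects are assumed to be plain numeric sequences of equal length, objects are indexed from 0 rather than 1, and the helper names (`minkowski`, `weighted_euclidean`, `dissimilarity_matrix`, `is_partition`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Set

Vector = Sequence[float]


def minkowski(x: Vector, y: Vector, q: int) -> float:
    """Equation (7): (sum_i |x_i - y_i|^q)^(1/q), q a positive integer."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    if q < 1:
        raise ValueError("q must be a positive integer")
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


def manhattan(x: Vector, y: Vector) -> float:
    """Equation (6): the q = 1 case of Equation (7)."""
    return minkowski(x, y, 1)


def euclidean(x: Vector, y: Vector) -> float:
    """Equation (4): the q = 2 case of Equation (7)."""
    return minkowski(x, y, 2)


def weighted_euclidean(x: Vector, y: Vector, weights: Vector) -> float:
    """Equation (5): each squared difference is scaled by its dimension's weight V_i."""
    if not len(x) == len(y) == len(weights):
        raise ValueError("x, y and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def dissimilarity_matrix(data: List[Vector],
                         dist: Callable[[Vector, Vector], float] = euclidean) -> List[List[float]]:
    """Equation (3): t x t matrix of k(m, n). Only the lower triangle is
    computed; it is mirrored because k is symmetric, and the diagonal stays 0."""
    t = len(data)
    k = [[0.0] * t for _ in range(t)]
    for m in range(t):
        for n in range(m):
            k[m][n] = k[n][m] = dist(data[m], data[n])
    return k


def is_partition(clusters: List[Set[int]], t: int) -> bool:
    """Equation (1): every cluster is non-empty, clusters are pairwise
    disjoint, and together they cover all t objects (indexed 0..t-1 here)."""
    covered: Set[int] = set()
    for cluster in clusters:
        if not cluster or covered & cluster:
            return False
        covered |= cluster
    return covered == set(range(t))
```

Expressing Manhattan and Euclidean distance through `minkowski` makes the special cases $q = 1$ and $q = 2$ stated after Equation (7) explicit in the code.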
the system needs to manage; the management layer is the core of the system and includes backup contacts, backup computers, and authentication centers; and the service layer represents the server resources that provide storage. Among the three layers of the backup system, the management layer is the core of the business management logic [27]. The client layer consists of the backup client software distributed across the network. Clients perform data backup operations, including backup plans, backup and test settings, and environment settings for recovery; backup and recovery security measures, including data encryption and decryption and the application for, management, and use of user certificates; and cancellation of the recovery process. The service layer can be implemented with one or more backup servers, and the storage device connected to the server provides the backup storage space. The management service of the archive backup system is deployed on the server side; it responds to backup and recovery requests and organizes, saves, and completes backup and recovery tasks [28].

The data recovery process includes three stages. The first is the review of the data recovery application: the backup client submits the recovery application to the console through the backup adapter, and the console authenticates the application through the CA. The second is the establishment of a data recovery channel: a data transmission channel is established between the backup client and the backup service under the scheduling of the console. The third is the completion of data recovery: when the data recovery is over, the connection between the client and the backup server is closed, and the recovery result is reported to the console [29].

4. Cloud Computing Storage Backup and Recovery Strategy Analysis Based on Secure IoT and Spark

4.1. Challenge-Response-Verification Framework. Figure 5 shows that as the number of challenged data blocks increases, the time overhead of challenge-response-verification gradually increases, while the increase in the overhead of the verification process is so small that it is difficult to see. Under normal circumstances, the time cost of generating the challenge is much smaller than the response cost and increases gradually with the number of challenged data blocks; but when the number of data blocks is very large, for example 1500, the time cost of the challenge increases sharply and becomes comparable to that of the verification process.

4.2. Number of Data Packets. After the test starts, the number of data packets received from each node by the remote monitoring software can be compared. The test results are shown in Table 4. The packet loss rate equals the number of packets sent minus the number of packets received, divided by the number of packets sent.

The data in Table 4 show that the number of data packets received by the monitoring software is smaller than the number sent by the monitoring nodes, indicating packet loss during data transmission. This phenomenon is largely due to the large irregularities in the communication unit of the low-power unit. The results in Table 4 show that the loss rate of database 1 at the fourth node is 0.4%, 2.4%, 1.6%, and 3.2% and that the loss rate of each node is less than 5%, indicating that the system can respond to applications.

4.3. Calculation and Communication Costs. The size of the stored files is limited to 20 KB to 20 MB, the number of elements ranges from 20 to 200, and the sample ratios are 10%, 20%, 30%, 40%, and 50%. The experimental results are shown in Figure 6. When the sample ratio is 50%, the cost of calculation and communication rises from 0 to 193 as the file size increases. These results indicate that the cost of calculation and communication increases with both file size and sample ratio.

The costs of the response and the query are similar, and so are the costs of the answer and verification processes. Figure 7 shows the experimental results for different query ratios. To confirm these verification results, the sample ratio in this paper is set to 10%–50%, the file size is limited to 10 MB, and each block has 200 elements. The data blocks are verified in the hybrid cloud P = {P1, P2, P3}, with ratios of 0.1, 0.2, 0.3, 0.4, and 0.5, respectively. The figure shows that the calculation and communication cost of query and response changes only slightly with the sample ratio, whereas the cost of answering and verification increases as the sample ratio increases. Here, the challenge and response are divided into two subprocesses, response 1 and response 2. Furthermore, the proportion of data blocks in each CSP largely affects the calculation and communication costs of queries and responses.

4.4. Spark Method Selection. For data storage and backup on the Spark platform, algorithms are needed to build a dedicated processing system. Here, the APCA segmentation, ratio R, difference D, and duration T methods are selected for the error study.
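The packet loss rate defined in Section 4.2, (packets sent − packets received) / packets sent, and the 5% acceptance threshold used there can be computed as below. This is a minimal sketch; the node names and packet counts in the usage block are hypothetical and are not the values in Table 4.

```python
from typing import Dict, Tuple


def packet_loss_rate(sent: int, received: int) -> float:
    """Section 4.2: (packets sent - packets received) / packets sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must lie between 0 and sent")
    return (sent - received) / sent


def nodes_within_threshold(counts: Dict[str, Tuple[int, int]],
                           threshold: float = 0.05) -> Dict[str, bool]:
    """For each node's (sent, received) counts, report whether its loss
    rate is below the threshold (5% by default)."""
    return {node: packet_loss_rate(sent, received) < threshold
            for node, (sent, received) in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not taken from Table 4.
    counts = {"node-a": (500, 498), "node-b": (500, 470)}
    for node, (sent, received) in counts.items():
        print(node, f"{packet_loss_rate(sent, received):.1%}")
    print(nodes_within_threshold(counts))
```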
[Plot: time overhead versus number of data blocks extracted per challenge (100, 300, 500, 1000, 1500); series: challenge, response, verification.]
Figure 5: Challenge-response-authentication process time overhead.
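Section 4.1 and Figure 5 measure the cost of a challenge-response-verification round, and Section 4.3 varies the fraction of blocks sampled per challenge. The paper does not specify its tagging scheme, so the sketch below assumes a simple spot-check: the data owner tags each block with HMAC-SHA256, the verifier challenges a random sample of block indices, the storage server returns those blocks with their tags, and the verifier recomputes the tags. Provable data possession schemes in the literature usually use homomorphic tags so that whole blocks need not be returned; this sketch omits that. All function names are hypothetical.

```python
import hashlib
import hmac
import math
import random
from typing import Dict, List, Sequence, Tuple


def tag_block(key: bytes, index: int, block: bytes) -> bytes:
    """Owner side: HMAC-SHA256 over the block index and contents, so a block
    cannot be swapped with another block or replayed at another position."""
    return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()


def make_challenge(num_blocks: int, sample_ratio: float, rng: random.Random) -> List[int]:
    """Verifier side: pick ceil(sample_ratio * num_blocks) distinct block indices."""
    count = min(num_blocks, max(1, math.ceil(sample_ratio * num_blocks)))
    return sorted(rng.sample(range(num_blocks), count))


def respond(stored: Dict[int, Tuple[bytes, bytes]],
            challenge: Sequence[int]) -> List[Tuple[int, bytes, bytes]]:
    """Storage server side: return (index, block, tag) for each challenged index."""
    return [(i, stored[i][0], stored[i][1]) for i in challenge]


def verify(key: bytes, challenge: Sequence[int],
           response: List[Tuple[int, bytes, bytes]]) -> bool:
    """Verifier side: the response must cover exactly the challenged indices,
    and every recomputed tag must match (compared in constant time)."""
    if [i for i, _, _ in response] != list(challenge):
        return False
    return all(hmac.compare_digest(tag_block(key, i, block), tag)
               for i, block, tag in response)
```

Raising `sample_ratio` makes it more likely that a corrupted block is caught, at the price of a larger response and more verification work, which is the trade-off the sample-ratio experiments explore.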
As can be seen from Figure 8, compared with APCA segmentation, the other three two-stage representation methods have relatively small errors. The experiment found that the ratio R method yields the smallest average error, and that the ratio R and difference D methods are both better than selecting important points by controlling the duration. Therefore, based on the results of running on Spark, it is best to use the ratio R method to select important points.

4.5. Throughput of Different Platforms. Figure 9 compares the throughput with that of the native systems. After the index mechanism is optimized, the throughput of S-TSQS is significantly higher than that of SparkDS and SparkSQL. Because SparkSQL has a better optimization strategy, its similarity query efficiency is slightly higher than that of SparkDS. Experimental data show that the query efficiency of S-TSQS is about 3–6 times that of SparkSQL and SparkDS.

4.6. Iteration and Caching. Figure 10 shows that when the number of iterations is 1, the processing times for 1–7 packets differ very little, but when the number of packets reaches 10, the effect of caching begins to appear: the cached run needs less processing time (104 hours) than the uncached run (133 hours). However, when the number of iterations is 3, there is basically no significant change as the number of data packets varies from 1 to 13, and the results remain basically the same throughout.

The default noncaching strategy means that no caching is performed, and the default caching strategy means that caching is set directly without any cost evaluation or optimization of the nodes worth caching. Four representative queries, referred to as query 1, query 2, query 3, and query 4, were selected from the experimental results for analysis. The result data are shown in Table 5 (values in seconds).
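The caching effect in Section 4.6 comes from reusing an intermediate result across iterations instead of recomputing its lineage each time, which is what Spark's `cache()` and `persist()` provide for RDDs. The toy model below is not Spark code: it counts how often a stage's function actually runs with and without caching. The `Stage` class and `iterate` helper are hypothetical names used only for illustration.

```python
from typing import Callable, List, Optional


class Stage:
    """One step in a toy lineage: output = fn(parent output or source data).
    Without cache(), every compute() re-runs the whole chain, roughly as an
    uncached RDD is recomputed for each action."""

    def __init__(self, fn: Callable[[List[float]], List[float]],
                 parent: Optional["Stage"] = None,
                 source: Optional[List[float]] = None) -> None:
        if parent is None and source is None:
            raise ValueError("a stage needs either a parent or source data")
        self.fn = fn
        self.parent = parent
        self.source = source
        self.cached = False
        self._memo: Optional[List[float]] = None
        self.evaluations = 0  # number of times fn has actually been executed

    def cache(self) -> "Stage":
        """Mark the stage so its result is kept after the first computation."""
        self.cached = True
        return self

    def compute(self) -> List[float]:
        if self.cached and self._memo is not None:
            return self._memo  # reuse the stored result; no recomputation
        inputs = self.parent.compute() if self.parent is not None else list(self.source)
        self.evaluations += 1
        result = self.fn(inputs)
        if self.cached:
            self._memo = result
        return result


def iterate(stage: Stage, iterations: int) -> float:
    """Run one action (a sum) per iteration, as an iterative job would."""
    return sum(sum(stage.compute()) for _ in range(iterations))


if __name__ == "__main__":
    plain = Stage(lambda xs: [x * 2 for x in xs], source=[1.0, 2.0, 3.0])
    cached = Stage(lambda xs: [x * 2 for x in xs], source=[1.0, 2.0, 3.0]).cache()
    iterate(plain, 3)
    iterate(cached, 3)
    print("uncached evaluations:", plain.evaluations, "cached evaluations:", cached.evaluations)
```

Both runs return the same totals; only the number of recomputations differs. This is why caching pays off only when the intermediate result is expensive and reused often enough, which is consistent with the cost evaluation the paper contrasts against the default caching strategy.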
[Plot: calculation and communication costs versus file size (20 to 20000) for sample ratios of 10%, 20%, 30%, 40%, and 50%.]
Figure 6: Experimental results using different ratios.

[Figure 7 plot: cost versus block query ratio (0.1 to 0.5).]

[Figure 8 plot: average error versus compression ratio (50 to 70).]

[Plot: data set size (10 to 100) versus throughput for SparkSQL UDF, Spark DS, and S-TSQS-10.]
Figure 9: Comparison with native system throughput.

[Figure 10 plot: operating hours versus data size (1, 3, 5, 7, 10, 13).]
issues, many people and companies have been taking a wait-and-see attitude toward cloud computing. Among these concerns, cloud storage security attracts the most attention. This paper studies it from two perspectives: data integrity and data privacy protection. A limitation of this article is that all of the projects it considers are independent; that is, the work is divided into multiple independent computing projects that are processed in parallel. In a real cloud environment, however, many projects are not independent but interdependent. Future work can explore how to improve the data center and reduce the completion time of dependent projects. In addition, the backup plan algorithm considers only one indicator, the backup cost, whereas in an actual cloud environment many other factors, such as the storage space of resources, affect project organization. Future work on project backup design could take these factors into account.

Data Availability

No data were used to support this study.

Conflicts of Interest

There are no potential conflicts of interest in this study.