Article
1 School of Information and Computer Science, Beijing Jiaotong University; [email protected]
2 Rail Transit College, Suzhou University; [email protected]
* Correspondence: [email protected]
Abstract: With the development of the Internet, users pay increasing attention to privacy protection when browsing the web, and more of them turn to anonymous communication tools such as The Second-Generation Onion Router (Tor), currently the most widely used anonymous communication system. Tor protects user privacy, but some criminals exploit this property to carry out illegal activities. This article studies anonymous network application classification and fingerprint attack technology. On the one hand, this provides technical and theoretical support for network supervisors to purify the network environment; on the other hand, vulnerabilities of the Tor system can be found so that Tor can be improved and continue to provide anonymity for legitimate users. This article mainly studies the identification and application classification of Tor traffic and further carries out fingerprint attacks to identify the specific page visited by the client. The main contributions of this paper are: (1) To address the low accuracy of Tor anonymous traffic identification and classification, this paper proposes the XL-Stacking model based on ensemble learning. K-Nearest-Neighbor (KNN), XGBoost, and Random Forest are selected for the first layer of the stacking model, and Logistic Regression is used in the second layer, which achieves higher classification accuracy with smaller feature dimensions. The algorithm can quickly determine whether a user's traffic is dark web traffic, reaching an accuracy of 99.7% on a self-collected data set. Dark web traffic is then further classified to quickly locate the traffic category among the following eight categories: video, web browsing, chat, file transfer, mail, P2P, audio, and VOIP. On the publicly available UNB-CIC dataset, the accuracy is 90.3% and the recall is 87.4%, which is better than the classification performance of similar work. (2) To address the large amount of data required for website fingerprint attacks, this paper proposes a spatiotemporal BiGRU-ResNet fingerprint attack model, which makes full use of the Tor website fingerprint sequence, including time, space, and website information triples, and fuses them into a spatiotemporal multi-modality that improves the efficiency and accuracy of model recognition. In the closed-world scenario, an accuracy of 98.46% was achieved, and with 100 instances per monitored page the accuracy reached 87.51%, proving that the model performs well even with small training samples. This reduces the cost for regulators to supervise the Tor anonymous network.
Keywords: anonymous network; Tor; traffic identification; website fingerprint attack; space-time multimodal
1. Introduction
The original purpose of anonymous networks is to protect user privacy. Surveys show that most users make reasonable use of anonymous networks, for example for anonymous voting. However, some anonymous websites host illegal pages and exploit the anonymity of the dark web to conduct illegal and criminal activities, seriously damaging a healthy Internet environment. Worse, they endanger the safety of people's lives and property, for example by using anonymity for online blackmail, spreading viruses, conducting cyberattacks, and carrying out illegal transactions such as selling drugs and guns on the dark web. For example, in 2013 the FBI seized Silk Road, the largest darknet website at the time. The site could only be accessed through the Tor anonymous network; it hosted a large volume of black-market transactions, with a monthly turnover as high as $1.2 million.
It is therefore urgent to monitor the dark web, and strengthening its supervision is necessary to protect the security of user information. At the same time, because of the anonymity of the dark web, supervision departments face great difficulty and resistance. Fingerprint attack technology can identify traffic on the dark web. On the one hand, it can help network administrators or judicial personnel take corresponding measures, which helps reduce the damage of such cyber-criminal activities to cyberspace security and maintains the security of the Internet. On the other hand, through the attack and defense of fingerprint identification, vulnerabilities of the system can be found, the Tor system can be improved and perfected, and secure anonymity can be provided for legitimate users. Therefore, building on existing research, this paper further explores the attack and defense technology of the Tor anonymous network, so that anonymous network technology is not abused by criminals and the network environment is purified, and, through the corresponding attack technologies, vulnerabilities of the Tor network can be discovered and the Tor network system improved.
Tor transmits data in fixed-size 512-byte packets called cells. Each cell contains a header and a payload, and there are two types of cells: control cells (Control) and relay cells (Relay). Control cells are used to parse and
7 Remote Sens. 2023, 15, x FOR PEER REVIEW 3 of 36
execute commands related to padding, building, extending, and tearing down links. Relay cells are used to transmit end-to-end data flows; a relay cell includes the identity of the data stream (StreamID), an end-to-end integrity check (Digest), the length of the forwarded data (Len), and the forwarding command (CMD). The structures of these two types are shown in Figures 1 and 2. Cells carry end-to-end communication messages between clients and servers. The CircID and Command fields are not encrypted and are used by each OR; the rest of the cell is encrypted.
Figure 1. Structure of a control cell [1].
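To make the cell layout concrete, the following sketch parses the unencrypted header fields described above. It assumes the 2-byte CircID of early Tor link protocols and the standard relay-header layout (CMD, Recognized, StreamID, Digest, Len); the field sizes are taken from the Tor specification, not from this paper, so treat them as illustrative.

```python
import struct

CELL_LEN = 512  # fixed-size Tor cell, as described above

def parse_cell(cell: bytes):
    """Split a fixed-size cell into CircID, Command, and payload.
    Assumes a 2-byte CircID (early link protocols; later ones use 4 bytes)."""
    if len(cell) != CELL_LEN:
        raise ValueError("expected a 512-byte cell")
    circ_id, command = struct.unpack_from(">HB", cell, 0)
    return circ_id, command, cell[3:]

def parse_relay_header(payload: bytes):
    """Relay payload header: CMD(1) | Recognized(2) | StreamID(2) | Digest(4) | Len(2)."""
    cmd, _recognized, stream_id, digest, length = struct.unpack_from(">BHHIH", payload, 0)
    return {"cmd": cmd, "stream_id": stream_id, "digest": digest, "len": length}
```

Only CircID and Command are visible to every OR; the relay header shown here sits inside the encrypted payload.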
attacks are based on the characteristics of anonymous traffic: different web application types such as email, file transfer, and chat are classified, and the traffic characteristics of different applications on anonymous networks are extracted. Machine learning methods are then used to build an application classification model, which associates users with communication targets and ultimately identifies which type of website a user has visited.
In 2017, the Canadian Institute for Cybersecurity proposed detecting and characterizing Tor traffic based on time analysis [2]. The feature set consists only of time-based statistics such as forward packet inter-arrival time (IAT), backward packet inter-arrival time, and flow duration. Algorithm models such as KNN, C4.5, and Random Forest can accurately determine, within 15 seconds, the types of web pages visited by users, such as chat, audio/video streaming, mail, and file transfer, but the accuracy of the model needs further improvement.
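As an illustration of such time-only features, the sketch below computes inter-arrival times and simple summary statistics from a list of packet timestamps; the function names are ours, not from the cited work.

```python
from statistics import mean

def interarrival_times(timestamps):
    """Inter-arrival times (IAT) between consecutive packets of one direction."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def time_features(timestamps):
    """Flow duration plus mean/min/max IAT, the kind of statistics used in [2]."""
    iat = interarrival_times(timestamps)
    return {
        "duration": timestamps[-1] - timestamps[0],
        "iat_mean": mean(iat),
        "iat_min": min(iat),
        "iat_max": max(iat),
    }
```

The same statistics would be computed separately for the forward and backward directions of a flow.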
In 2019, He Y. et al. proposed an Obfs4 traffic detection scheme based on two-stage filtering [3]. High-precision, real-time recognition of Obfs4 traffic is realized by combining coarse-grained fast filtering with fine-grained accurate recognition [4]. In the coarse-grained filtering stage, a randomness detection algorithm tests the randomness of the handshake packet payload, and the timing characteristics of packets are used to remove other interfering flows. In the fine-grained recognition stage, Obfs4 is identified with an SVM (support vector machine) with an accuracy above 99%. However, the randomness detection of this
method does not perform well when other encrypted traffic interferes, packets before and after the TCP handshake must be captured, and the space efficiency is poor.
Machine learning offers high efficiency and accuracy in identifying and classifying Tor traffic, so this paper selects machine learning models for Tor traffic identification and classification [5]. In existing studies, the feature selection of most machine learning algorithms lacks feature vectors tailored to Tor traffic. This paper combines Tor routing protocols to find targeted feature vectors, so as to improve the traffic analysis effect and model training efficiency at the source.
Based on the fingerprint information in a traffic link, a website fingerprint attack compares monitored unknown traffic with known traffic in a fingerprint database to identify the specific websites visited by users. A large amount of known website traffic is collected in advance to form a fingerprint database, and the fingerprint attack model is trained offline; in the online traffic identification stage, the specific websites visited by users are identified.
In recent years, fingerprint attacks based on deep neural networks (DNNs) have been researched and developed and have surpassed the effectiveness of machine-learning-based approaches: deep neural networks perform automatic feature extraction, making these attacks more effective than traditional ones [6].
Rimmer et al. proposed the first DNN-based attack, AWF [7], in 2017, pioneering the use of stacked denoising autoencoders (SDAE), long short-term memory (LSTM), and convolutional neural networks (CNN) to select features automatically. Rimmer et al. also built one of the largest website fingerprint data sets, containing more than 3 million network traces. In a closed world of 100 sites [8], the success rate was over 96%. However, the neural network they designed is too simple and shallow to extract network features well [9].
In 2018, Sirinam et al. proposed the deep fingerprinting (DF) model [10], which uses a CNN with a complex architecture design and achieves good results on their own data set: without defenses, DF reaches a closed-world accuracy of more than 98% on their large data set, better than all previous attacks.
Such DNN-based fingerprint attacks, which extract fingerprints automatically, are more effective than traditional attacks [11].
However, most deep learning algorithms do not make full use of the spatiotemporal information of data packets, and they need to collect a large number of data samples, which consumes more training time and cost and has low iteration efficiency [10]. From a practical point of view, user traffic cannot be collected at will because of privacy, the network status of the Tor system is constantly changing, and its plug-ins keep advancing. A good attack model therefore not only needs high accuracy but also needs to
adapt to changes in the Tor network system as quickly as possible and reduce the cost and time of model training. The new model proposed in this article addresses these problems.
This paper proposes the XL-Stacking model, based on the stacking ensemble, to identify Tor anonymous traffic and classify applications. As verified by experiments, the first layer of the stacked model uses the KNN, XGBoost, and Random Forest algorithms, and the second layer uses Logistic Regression, which achieves higher classification accuracy with smaller feature dimensions. Experiments show that the XL-Stacking model can quickly determine whether a user's traffic is darknet traffic; if so, it further classifies the darknet traffic and quickly locates the monitored traffic category. Specifically, the following eight categories can be identified: video, web browsing, chat, file transfer, email, P2P, audio, and VOIP.
A fingerprint attack aims to identify which specific monitored website a user visits. This article proposes the BiGRU-ResNet model based on spatiotemporal features. According to the characteristics of Tor traffic packets, it makes full use of the Tor website fingerprint sequence, including time, space, and website information triples, and fuses this spatiotemporal multi-modality to improve the efficiency and accuracy of model identification. BiGRU-ResNet uses a bidirectional GRU to extract temporal features and a ResNet to extract spatial features of website fingerprints. On a smaller sample space, the experimental results show that our attack performs better than state-of-the-art attacks with acceptable time overhead [12].
This article is organized as follows. Section 1, the introduction, explains the research background and significance, introduces the Tor anonymous communication system, surveys the current research status of Tor anonymous network traffic identification, application classification, and website fingerprint attacks at home and abroad, and summarizes the research content and innovations of this article. Section 2 introduces the Tor anonymous network traffic identification and application classification model and fingerprint attacks, proposes the XL-Stacking ensemble learning model and the BiGRU-ResNet model based on spatiotemporal features, and compares them with existing research. In Section 3, the efficiency and accuracy of the XL-Stacking model and the BiGRU-ResNet model are verified with experimental data. Section 4 discusses the results in a broader context and points out the shortcomings of this article and future research directions. Finally, conclusions are drawn in Section 5.
2.1. Tor anonymous network traffic identification and application classification model
This chapter proposes the XL-Stacking model for Tor anonymous traffic identification and classification. The overall architecture, shown in Figure 3, includes five modules: traffic collection, feature acquisition, data preprocessing, model training, and identification and classification.
Figure 3. Tor anonymous traffic identification and classification model architecture.
The traffic collection module collects raw traffic samples. The feature processing module extracts features from the collected traffic. After in-depth
analysis of the Tor and Obfs4 protocols, a combination of handshake packet length characteristics, information entropy characteristics, and time interval characteristics was selected. The data processing module calculates and preprocesses the correlated features at the granularity of data streams. The model training module learns traffic fingerprint features and trains the parameters of the final classifier. The trained model is then used to predict labels for anonymous communication traffic, that is, to identify whether user traffic is Tor anonymous traffic and to classify the application type of the Tor anonymous traffic. Different XL-Stacking classifiers are trained for the two scenarios of traffic identification and application classification. In both scenarios, captured traffic with unknown labels is feature-extracted, preprocessed, and fed into the model to obtain the classification result: in the traffic identification scenario, the model determines whether the input traffic is Tor anonymous network traffic; in the application classification scenario, it determines the application type corresponding to the input Tor anonymous traffic.
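The two-scenario decision flow above can be sketched as follows; the four callables are hypothetical stand-ins for the trained feature-extraction, preprocessing, identification, and application-classification modules.

```python
def classify_flow(raw_flow, extract, preprocess, is_tor, classify_app):
    """Stage 1: Tor / non-Tor identification; stage 2: application type,
    run only on flows identified as Tor anonymous traffic."""
    x = preprocess(extract(raw_flow))
    if not is_tor(x):
        return "non-Tor"
    return classify_app(x)  # one of the eight application categories
```

In the actual system, `is_tor` and `classify_app` would be two separately trained XL-Stacking classifiers operating on the same preprocessed feature vector.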
2.1.1. Feature design of the Tor anonymous traffic experimental data set
The purpose of feature extraction is to effectively distinguish whether page-load traffic is Tor traffic and to further classify the Tor application type, so as to effectively identify the categories of websites accessed over anonymous network traffic and strengthen regulators' management of the network environment. The quality of the features directly affects the quality of the results.
The Obfs4 plug-in further obfuscates packet sizes by reorganizing and randomly padding packets on top of the obfuscated traffic five-tuple. Based on this random padding, traffic characteristics related to information entropy can be analyzed. Although Obfs4 hides the surface characteristics of anonymous traffic, it does not perform reordering, random packet insertion, or delaying. Therefore, time-correlation features such as the packet interval time of traffic when accessing different services can be extracted for traffic fingerprint analysis. Liang Di et al. proposed the handshake packet length and information entropy features, and Arash et al. proposed the time interval feature. This article combines the traffic characteristics of these three dimensions and inputs them into the classifier model proposed in this chapter. The specific traffic characteristics are shown in Table 1.
Type | Feature
Handshake packet length characteristics | Total length of data stream, C2S data packet length, S2C data packet length; mean, minimum, maximum, total length, quartiles, median, and variance of the overall length.
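The information entropy characteristics mentioned above can be computed per payload as Shannon entropy over byte values; this is a generic sketch, not the exact estimator of the cited work.

```python
import math
from collections import Counter

def byte_entropy(payload: bytes) -> float:
    """Shannon entropy in bits per byte; Obfs4's random padding pushes this
    toward the 8-bit maximum, which the entropy features exploit."""
    if not payload:
        return 0.0
    n = len(payload)
    return -sum((c / n) * math.log2(c / n) for c in Counter(payload).values())
```

A constant payload scores 0 bits per byte, while uniformly random bytes approach 8, so handshake payloads of Obfs4 flows stand out from most plaintext protocols.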
good, their principles should be as different as possible, and the meta-learner should be simple.
The benefit of Stacking ensemble learning does not come from stacking many layers of models but from the learning abilities of different learners on different features. Multi-layer aggregation faces more severe over-fitting problems and has limited benefit; generally, two layers are enough. Therefore, the Stacking model in this article also uses two layers.
The XL-Stacking ensemble learning model in this article consists of the following two layers. The first-layer base learners are XGBoost, Random Forest, and KNN, as shown in Figure 4. The second-layer meta-learner is a simpler Logistic Regression model, which reduces the complexity of the model, as shown in Figure 5.
The XL-Stacking model is trained as follows:
(1) First, train each base learner with five-fold cross-validation on the original training set [16]: four of the five folds are used as training data and the remaining one as test data [17].
(2) After training, predict the held-out fold to obtain the prediction result ResultA, and predict the original test set to obtain the prediction result ResultB. Each model is trained five times; the five ResultA outputs are concatenated into one column, and the five ResultB outputs are averaged [18].
(3) Through steps (1) and (2), the primary models XGBoost, RF, and KNN yield 3 groups of ResultA and 3 groups of ResultB, from which the new training set and test set are formed.
(4) Input the new training set into the logistic regression model for training, predict on the newly generated test set, and obtain the final output.
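Steps (1)–(2) for a single base learner can be sketched as below; `fit` and `predict` are hypothetical callables standing in for KNN, XGBoost, or Random Forest, and the fold split is a plain contiguous one for illustration.

```python
from statistics import mean

def kfold_indices(n, k=5):
    """Split range(n) into k contiguous folds (the five-fold split of step 1)."""
    fold_size, folds, start = n // k, [], 0
    for i in range(k):
        end = start + fold_size + (1 if i < n % k else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds

def stack_features(train_X, train_y, test_X, fit, predict, k=5):
    """One base learner's contribution to the meta-learner's inputs:
    out-of-fold predictions on the training set (ResultA) and the average
    of the k test-set prediction passes (ResultB)."""
    n = len(train_X)
    result_a = [None] * n
    test_preds = []
    for fold in kfold_indices(n, k):
        hold = set(fold)
        tr_X = [x for i, x in enumerate(train_X) if i not in hold]
        tr_y = [y for i, y in enumerate(train_y) if i not in hold]
        model = fit(tr_X, tr_y)
        for i in fold:                                  # predict the held-out fold
            result_a[i] = predict(model, train_X[i])
        test_preds.append([predict(model, x) for x in test_X])
    result_b = [mean(col) for col in zip(*test_preds)]  # average the k passes
    return result_a, result_b
```

Running this for each of the three base learners and column-stacking the outputs yields the new training and test sets of step (3), which feed the logistic regression meta-learner.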
2.2. Tor network fingerprint attack model based on spatiotemporal characteristics
Machine learning methods applied to fingerprint attacks have no obvious effect because the traffic differences between websites of the same type are small; the design and construction of the feature set affect the accuracy of the classification results more than the choice of classifier. Some research has proposed Tor anonymous traffic feature selection methods with different classification granularities, drawing on expert experience in analyzing the operating mechanisms of the Tor
network and its plug-ins. However, the selection of features rests on different assumptions, so these methods cannot be applied in real network environments, and as anonymous network access technology is repeatedly upgraded, previously selected features may no longer be effective.
Deep neural networks can extract website fingerprint features more effectively, but they require a large amount of data, and the time cost of model training is very high. Supervisors must collect data sets in advance; such an identification model is static, and time degrades the accuracy of the classifier [10]. Fingerprint models require large amounts of data, and their accuracy decreases significantly over time. Since website traces change rapidly, attackers must frequently update the trace database to match user traffic, which weakens the attack in practice. As deep learning develops, another problem is that networks keep getting deeper. Because of the vanishing gradient problem, neural networks with more layers are harder to train: simply increasing the number of layers has little effect, since backpropagation passes the gradient to earlier layers and repeated multiplication makes the gradient vanishingly small. As a result, as the number of layers increases, performance saturates or even begins to decline. Therefore, a new attack model is needed that achieves good results on small samples without gradient vanishing.
To address the problems that fingerprint models require a large amount of data and that model accuracy decreases significantly over time, this paper proposes a deep neural network model based on temporal and spatial features, the BiGRU-ResNet model. BiGRU-ResNet uses a two-layer bidirectional GRU to extract temporal features and an 18-layer ResNet to extract spatial features of website fingerprints. On smaller sample spaces, our attack performs better than state-of-the-art attacks with acceptable time overhead, significantly reducing the amount of training data required for website fingerprinting attacks. This shortens the time required for data collection and mitigates the problem of data staleness. The model framework proposed in this chapter is shown in Figure 6.
The website fingerprint is fed into the Bi-GRU network to extract temporal features and, in parallel, into the 18-layer ResNet network to obtain its spatial features. The feature vectors extracted by the GRU and ResNet networks are merged into one vector and sent to the Softmax classifier for classification. See Algorithm 1 for pseudocode of the overall procedure. The sequences extracted from Tor cells form the data set F, the constructed fingerprint database of monitored websites. Here, fk is the unknown fingerprint to be tested, which serves as the input vector. After detection and the fully connected layers of the BiGRU-ResNet model, a probability value p is obtained and compared with the threshold λ. If p ≥ λ, the fingerprint is added to the candidate set; otherwise, it is a fingerprint of an unmonitored page. If fingerprint data exists in the candidate set and belongs to a previously constructed monitored-page fingerprint, that monitored-page fingerprint is returned; otherwise, it is classified to the most similar monitored page.
Algorithm 1: Fingerprint attack algorithm based on BiGRU-ResNet
Input: data set F extracted from Tor cells
Output: matching fingerprint id
Begin:
  candidates ← ∅
  for fk ∈ F do
    <Xt-1, Xt-2, Xt> = featureVector(fu, fk)
    p ← PBiGRU-ResNet(fu.id = fk.id | <Xt-1, Xt-2, Xt>)
    if p ≥ λ then
      candidates = candidates ∪ {<fk, p>}
    else
      return unmonitored
    end if
  end for
  if |candidates| > 0 and sameIds(candidates) then
    return candidates[0].id
  else
    return similarId()
  end if
End
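A direct, pure-Python transcription of Algorithm 1 might look like this; `score` is a hypothetical callable wrapping the trained BiGRU-ResNet (returning the most likely monitored-page id and its probability), and the `similarId()` fallback is rendered here as choosing the highest-probability candidate, which is one plausible reading of "classified to the most similar monitored page".

```python
def match_fingerprint(F, score, lam):
    """Open-world matching loop of Algorithm 1."""
    candidates = []
    for fk in F:
        page_id, p = score(fk)           # probability from BiGRU-ResNet + softmax
        if p >= lam:
            candidates.append((page_id, p))
        else:
            return "unmonitored"         # below threshold: unmonitored page
    if candidates and len({pid for pid, _ in candidates}) == 1:
        return candidates[0][0]          # all candidates agree on one page id
    # similarId(): fall back to the most probable candidate (our assumption)
    return max(candidates, key=lambda c: c[1])[0]
```

The threshold λ trades precision against recall: raising it sends more fingerprints to the unmonitored branch, which matters most in the open-world setting.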
The "skip" connection between a ResNet block's input and output facilitates the optimization of large networks: by simply having deep blocks copy previous blocks, deeper networks can be extrapolated from shallower ones, simplifying the optimization of larger networks. This facilitates higher-level feature extraction and improves expressiveness. ResNet-18 is both general enough to accept any sequence of similar inputs and powerful enough to perform well on them.
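The identity shortcut reduces to one line: the block output is the transformed input plus the input itself, so a block whose transform outputs zeros simply copies the previous block. A minimal sketch over plain Python lists:

```python
def residual_block(x, transform):
    """y = F(x) + x: the skip connection adds the block input back onto the
    transformed output, letting gradients and features bypass the block."""
    fx = transform(x)
    return [a + b for a, b in zip(fx, x)]
```

Because the fallback is the identity mapping rather than zero, adding blocks cannot easily make a deeper network worse than its shallower counterpart, which is what counters the degradation described above.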
2.2.2. Time dimension information extraction layer
We use a two-layer GRU network to extract the temporal features of website fingerprints. Because the traffic at time t depends not only on previous traffic states but also on future traffic states, we choose a bidirectional GRU network. It can extract the impact of both past and future moments on the traffic state and thereby capture the timing characteristics of website fingerprints.
There is an input at each moment, and the hidden layer has two nodes, one for forward computation and the other for backward computation; the output layer is determined by these two values [20]. Computing the output requires the input Xt of the current time step t and the forward and backward hidden states at time t-1. The hidden state Ht at time t is obtained by a weighted summation of the hidden states of the two directions, and the output Yt is then computed. Whh and Whq are the weight
coefficients, and bh and bq are the biases corresponding to the hidden state and output at a given moment; the forward hidden state is computed as formula (1):

H⃗t = σ(Xt Wxh + H⃗t−1 Whh + bh)    (1)
The BiGRU module in this article uses 1D input. Experiments have shown [10] that 1D input trains much faster than 2D input, even when the total amount of input data is the same; the analysis suggests this difference comes from tensor operations, which must process higher-dimensional data. The 1D input thus trains faster and provides better classification performance. The GRU input dimension equals the vector dimension of the website fingerprint, 5000, and the output is determined by the GRU in both directions simultaneously: the output dimension of each direction is 256, so the output dimension of the bidirectional GRU is 512.
2.2.3. Spatiotemporal information fusion layer
Features learned from directional inputs differ significantly from those learned from temporal inputs, making it difficult for a shared model to find one set of shared weights. To combine temporal and spatial-directional features effectively, we train each of the above models separately and take the arithmetic mean of their Softmax outputs. We concatenate the 512-dimensional spatial feature vector output by the ResNet network with the 512-dimensional temporal feature vector output by the BiGRU network. This produces a 1024-dimensional website fingerprint feature vector, which is then passed to the Dropout layer and the Softmax layer for fingerprint prediction. The Softmax layer predicts the probability that the input website fingerprint belongs to each monitored website, and the monitored website with the largest probability value is the prediction result [13].
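The fusion step reduces to concatenation followed by a linear layer and softmax; in this sketch `W` and `b` are hypothetical trained classifier parameters, and the dimensions are kept small for illustration (512 + 512 → 1024 in the actual model).

```python
import math

def fuse_and_classify(spatial, temporal, W, b):
    """Concatenate the two feature vectors, apply a linear layer, then softmax;
    returns (predicted class index, probability vector)."""
    v = spatial + temporal                        # joint fingerprint representation
    logits = [sum(wi * xi for wi, xi in zip(row, v)) + bi
              for row, bi in zip(W, b)]
    m = max(logits)                               # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)), probs
```

The argmax over the probability vector implements the rule above: the monitored website with the largest probability is the prediction.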
To further reduce model overfitting, this article uses the Dropout algorithm, which prevents overfitting even in complex neural networks and improves their efficiency.
In the training phase of the model, a standard neural network layer computes formula (2):

z_i = w_i · y + b_i,  y_i = f(z_i)    (2)

Setting the probability p, the Dropout computation of formula (3) is used instead:
r_j ~ Bernoulli(p),  ỹ = r ∘ y,  z_i = w_i · ỹ + b_i    (3)

Here Bernoulli is the Bernoulli random distribution function, used to randomly generate a 0-1 mask vector with probability p; wi denotes the weight parameters of the network, ri the entries of the random mask, and bi the bias. Finally, the dropped neurons are restored and the above process is repeated.
Dropout reduces interactions between hidden nodes by randomly ignoring them: during propagation, each neuron stops working with probability p, so the learning process does not become overly dependent on local features. Overfitting mostly occurs in fully connected layers and is less of a problem in convolutional layers. Experiments show that setting the Dropout probability p to 0.5 during model training improves the robustness of the model and prevents overfitting.
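The masking step of formula (3) can be sketched as below; we use the inverted-dropout convention (rescaling survivors by 1/(1-p) at train time so no change is needed at test time), which differs slightly from the plain formulation above, and p here is the drop probability.

```python
import random

def dropout(y, p, training=True):
    """Zero each activation with probability p via a Bernoulli mask (the r
    vector of formula (3)); scale survivors by 1/(1-p) (inverted dropout)."""
    if not training or p == 0.0:
        return list(y)
    mask = [0.0 if random.random() < p else 1.0 for _ in y]
    return [m * v / (1.0 - p) for m, v in zip(mask, y)]
```

At inference (`training=False`) the layer is the identity, so the expected activation magnitude matches training.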
The normalization mentioned in the previous section helps the model learn and generalize to new data, so the data should not only be normalized before entering the model but also after each transformation inside the network. Batch Normalization (BN) is a type of layer proposed by Ioffe and Szegedy in 2015 [21]. The BatchNormalization layer receives an axis parameter that specifies which feature axis should be normalized. The main advantage of batch normalization is that it helps gradient propagation and improves classification performance, learning faster while maintaining or even improving accuracy.
Therefore, we combine BN and Dropout to improve both performance and generalization. Adding a BN layer requires additional training time, increasing the time per epoch roughly twofold compared to the model without BN. However, we believe BN is worth applying because the additional training time is compensated by the faster learning rate and ultimately higher test accuracy. In the BiGRU-ResNet model, we apply BN
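Per-feature batch normalization over one mini-batch reduces to standardizing and then applying the learnable scale and shift; `gamma`, `beta`, and `eps` below are illustrative defaults, not values from this paper.

```python
import math
from statistics import mean, pvariance

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize one feature across a mini-batch, then scale and shift
    (Ioffe & Szegedy, 2015); eps guards against division by zero."""
    mu = mean(batch)
    var = pvariance(batch, mu)
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in batch]
```

Keeping each layer's inputs roughly zero-mean and unit-variance is what stabilizes gradient propagation and permits the faster learning rate mentioned above.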
number of unmonitored web fingerprint instances and therefore requires a large amount of data. For the sake of user privacy and for comparison with similar work, this article uses the data collected by Rimmer et al. in 2018, currently the largest data set for website fingerprinting attacks, on which many studies have been conducted.
1. Closed world: Rimmer et al. visited the homepages of the 1200 most popular websites
according to Alexa. They started by filtering the list of popular websites and removing
duplicate entries. Data for these 1,200 websites were collected in four iterations,
each containing 300 websites. Network traces were collected over the four iterations in
approximately 14 days starting in January 2017. After collecting data on 3.6 million
page views, invalid entries were filtered out; these entries were caused by timeouts
and by browser or Selenium driver crashes. Additionally, Rimmer et al. filtered out web
pages that displayed verification codes on every visit. Finally, the dataset was
balanced by fixing the same number of traces for each site, ensuring an even
distribution of instances across sites. After this filtering process, the closed world
dataset of Rimmer et al. consists of 900 websites, each with 2500 valid web traces. We
call this dataset CW900. Similarly, for datasets consisting of subsets of the websites,
corresponding notations are used: the datasets for the top 100, 200, and 500
websites are called CW100, CW200, and CW500, respectively [7].
2. Open world: Since the open world data is only used for testing and not
for training the model, only one instance is collected for each page in the open
world. 400,000 web traces were collected from the Alexa list. In addition,
2,000 test traces were collected for each of the monitored sites of the closed
CW200 set (400,000 in total). Finally, 800,000 traces were evaluated in the open world,
half from the closed world and half from the open world. Rimmer et al.'s dataset
represents a 4x increase over the largest dataset in previous work [7]. The
complete dataset composition is shown in Table 3:
Table 3. Dataset Composition

Data           Website number                         Instances per website             Length of trace
CW100          100                                    2500                              5000
CW200          200                                    2500                              5000
CW200_400000   monitored: 200; unmonitored: 400,000   monitored: 2000; unmonitored: 1   5000
memory       8 GB
hard disk    256 GB
CPU          4-core
processor    AMD A8-700
Python       3.6
We designed two experiments.
The first experiment is the identification of Tor traffic, validated on the public ISCX
Tor dataset. The experimental results are then compared with the results of the
Canadian research institute.
The second experiment further classifies Tor traffic by application, using the ISCX
Tor data published by UNB-CIC, with a comparative analysis of the experimental
results.
2.4.2. Traffic application classification experimental design
The experiment uses the PyTorch deep learning framework to implement the model
we designed. Deep learning experiments were conducted on a cloud platform with an
Nvidia 2080Ti GPU and 12 GB of CPU memory. The specific experimental configuration
is shown in Table 5.

Table 5. The experimental configuration
We designed three experiments:
1. Fingerprint attack experiment in a closed world
2. Open world fingerprint attack experiment
3. Ablation experiment
Precision = TP / (TP + FP)    (4)
TP is the number of correctly classified regulated sites. TN is the number of
correctly classified non-regulated websites. FN is the number of regulated sites
incorrectly classified as unregulated sites. FP is the number of non-regulated
websites incorrectly classified as regulated websites [10].
The closed world fingerprint attack is a multi-classification task, which is evaluated
using accuracy and recall. The specific definitions are shown in equations (5) and (6):
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (5)
Recall = TP / (TP + FN)    (6)
The open-world fingerprinting attack is a two-class problem. The performance of the
model is reflected not only in correctly identifying monitored web pages, but also in
minimizing the misidentification of non-monitored pages as monitored pages.
Therefore, the True Positive Rate (TPR) and False Positive Rate (FPR) are used to
evaluate the performance of the model [24]. TPR is the proportion of monitored pages
that are correctly classified as any monitored page. FPR is the ratio of unmonitored
traffic that is misclassified as a monitored site, a measure of the attacker's false
identifications. See equations (7) and (8):

TPR = TP / (TP + FN)    (7)
FPR = FP / (FP + TN)    (8)
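The metrics in equations (4)-(8) can be written directly from the TP/TN/FP/FN counts. A plain-Python sketch (the function names are ours):

```python
# Metrics from equations (4)-(8), computed from confusion-matrix counts.
# TP/TN/FP/FN follow the definitions given above for regulated (monitored)
# and non-regulated (unmonitored) sites.
def precision(tp, fp):
    return tp / (tp + fp)                      # eq. (4)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)     # eq. (5)

def recall(tp, fn):
    return tp / (tp + fn)                      # eq. (6); same form as TPR, eq. (7)

def false_positive_rate(fp, tn):
    return fp / (fp + tn)                      # eq. (8)
```

For example, with 991 true positives and 9 false positives, `precision(991, 9)` gives 0.991.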
3. Results
recall rate is also the highest at 99.3%. A higher recall rate means a lower false negative
rate in network supervision. The experimental results strongly demonstrate that although
the Tor protocol and the Obfs4 plug-in obfuscate the surface characteristics of the traffic,
it still differs from normal network traffic in combined features such as handshake
packet length features, information entropy features, and time interval features.

Figure 9. Comparison of Tor anonymous traffic identification experimental results
Finally, on the ISCX Tor dataset, the test set is used to evaluate the precision
and recall of the different application categories, compared with the RF, KNN, and
C4.5 models used by UNB-CIC. The results are shown in Figures 12 and 13, which show
the precision and recall values for each of the eight application categories (VOIP,
AUDIO, BROWSING, CHAT, MAIL, FILE-TRANSFER, P2P, VIDEO). It can be seen
that the XL-Stacking model proposed in this article achieves the best classification
results.
In a closed world, network regulators can determine the specific websites visited by
users through fingerprint attacks and monitor the network environment. We compare
the BiGRU-ResNet model proposed in this article with the SDAE, CNN, and LSTM models
proposed by Rimmer et al. on the same dataset. First, on the CW100 dataset, the
impact of different numbers of instances per monitored website on the accuracy of
fingerprint attacks is shown in Table 6 and Figure 14.
When the number of instances per monitored page is 100, the accuracy reaches
87.51%, proving that this model can achieve good results even when the training sample
is small. It reduces the cost of attacks to a certain extent and significantly
reduces the amount of training data required to perform website fingerprint attacks.
This shortens the time required for data collection and reduces the likelihood of data
instability issues. As the number of training examples increases, the accuracy of website
identification also increases. When 2500 instances are involved in training and testing,
the accuracy of each model reaches its maximum. The BiGRU-ResNet model proposed
in this article has higher accuracy than the other models at every number of instances.
When the number of instances per monitored page is 2500, the accuracy reaches a
maximum of 98.46%, well above the 96.26% accuracy of the best CNN model proposed
by Rimmer et al.

Table 6. Accuracy with different numbers of instances
Figure 14. The accuracy of each model on CW100 with different numbers of instances
Collecting more monitored traffic for each website helps improve the classification
accuracy of these classifiers [26], because more data helps the model train on and
learn the characteristics of the sequence more comprehensively. But this type of
model depends strongly on the amount of data [27]. In contrast, our model still
performs well even with a limited number of instances per website; it is robust
compared to state-of-the-art fingerprint attacks.
We then compare the accuracy of each model when the number of instances per
page in the closed world datasets CW100, CW200, CW500, and CW900 is 2500.
The results are shown in Table 7 and Figure 15. It can be observed that as the number
and variety of monitored pages increase, each model shows a small decline. But on
every dataset, the BiGRU-ResNet model has a higher page recognition accuracy than
the other models.

Table 7. The accuracy of each model on different datasets
Figure 15. The accuracy of each model on the CW100, CW200, CW500, and CW900 datasets
Figure 16. BiGRU-ResNet results on different numbers of unmonitored pages
It can be seen that as the number of unmonitored pages in the fingerprint attack
increases, the FPR decreases. When the number of unmonitored pages reaches its
maximum of 40,000 (a 1:1 ratio of monitored to unmonitored pages), the FPR reaches
its minimum value. The TPR decreases slightly as the FPR decreases. Generally
speaking, the smaller the proportion of monitored pages in the training set, the less is
known about the monitored category and the lower the TPR; more unmonitored pages
bias the attack model toward correctly distinguishing monitored from unmonitored,
thereby reducing the FPR.
To demonstrate the efficiency of the BiGRU-ResNet model proposed in this article
in open world fingerprint attacks, we compare it with the SDAE, CNN, and LSTM models
mentioned in the previous section on the same CW200_40000 dataset, as shown in
Figure 17. The results show that the BiGRU-ResNet model performs best on TPR and
FPR: with the 40,000-trace unmonitored training set, TPR and FPR are 86.26% and 3.21%,
respectively. This shows that the proposed model also achieves the best results in open
world fingerprint recognition.
Figure 17. TPR and FPR for each model
1) Remove the spatial dimension information extraction layer. That is, only the
ResNet-18 component that extracts spatial information is removed; the temporal
information extraction component BiGRU remains, and its output is fed directly into
the fusion layer.
2) Remove the time dimension information extraction layer. That is, only the BiGRU
component that extracts temporal information is removed; the spatial dimension
information extraction component ResNet-18 remains, and its output is fed directly
into the fusion layer.
3) Remove the spatiotemporal information fusion layer. The output vectors of the
spatial dimension and time dimension information extraction layers are not
concatenated and passed to the Dropout and Softmax layers; instead, traffic prediction
is performed directly, and the maximum of the output probabilities of the ResNet-18
component and the BiGRU component is taken as the experimental result of the
ablation experiment.
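The max-probability rule in ablation setting 3) can be sketched in a few lines of plain Python (illustrative names; the real components output per-class probability distributions):

```python
# Ablation setting 3): no fusion layer. Take the elementwise maximum of
# the two components' class-probability vectors and predict the class
# with the largest fused value.
def max_prob_prediction(resnet_probs, bigru_probs):
    fused = [max(a, b) for a, b in zip(resnet_probs, bigru_probs)]
    return max(range(len(fused)), key=fused.__getitem__)
```

For example, with `resnet_probs = [0.1, 0.7, 0.2]` and `bigru_probs = [0.5, 0.3, 0.2]`, the fused vector is `[0.5, 0.7, 0.2]` and the prediction is class 1.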
The results of the ablation experiment are shown in Table 8 [30]. On the closed world
CW100 and open world CW200_400000 test sets, the model without the time
dimension information extraction layer achieves similar results on both datasets,
showing that the final prediction performance of the fingerprint attack depends more on
the spatial information in the traffic. It can be seen that ResNet-18 has clear
advantages in extracting spatial information from traffic. In contrast, the time
dimension extraction layer has a much smaller impact on the prediction results, but it
is still an essential part.
Table 8. Ablation experimental results
4. Discussion
fingerprinting attack. The time required for data collection is shortened, and the problem
of data instability over time is reduced. Compared with previous state-of-the-art WF
attacks, deep learning WF attacks can still achieve high accuracy even with a small
amount of training data.
This article proposes the BiGRU-ResNet model based on space and time. According
to the characteristics of Tor traffic packets, it makes full use of the Tor website fingerprint
sequence, including the time, space, and website information triple, integrating multiple
modalities to improve the efficiency and accuracy of model recognition. It dramatically
reduces the amount of training data required to perform website fingerprinting attacks,
shortens the time required for data collection, and reduces the possibility of data
stability issues.
This paper sees a bright future for applying deep learning to anonymous network
fingerprinting attacks. The model can be further improved in the future to increase
classification accuracy and to handle more complex scenarios: for example, fingerprint
recognition scenarios where users visit multiple websites at the same time, or download
files in the background while listening to music, which adds noise traffic. Attention
should also be paid to improving the computational efficiency of the model.
5. Conclusions
The XL-Stacking model based on ensemble learning proposed in this paper can not
only identify Tor anonymous traffic but also classify Tor traffic. In terms of Tor traffic
identification, the proposed model outperforms traditional SVM as well as the C4.5 and
KNN results of the Canadian Network Research Institute, with higher precision and
recall rates of 99.1% and 99.3%, respectively. In the Tor application traffic classification
experiment, it also reached an accuracy of 90.3% on the ISCX public dataset, better than
UNB-CIC's RF, KNN, and C4.5 multi-classification models.
Experimental results show that the proposed model can realize the identification
and application classification of Tor anonymous traffic, providing technical support for
the supervision of Tor traffic. At the same time, it lays the foundation for further
website fingerprinting.
Experimental results show that the BiGRU-ResNet model proposed in this article
achieves high accuracy in both closed world and open world scenarios. At an
acceptable time and training cost, its attack performance is better than similar work:
in the closed world scenario the accuracy rate is 98.46%, and in the open world
scenario the true positive rate is 86.26%.
This article achieves a balance between training accuracy and training efficiency.
When the number of instances per monitored page is 100, the accuracy reaches
87.51%, so good results can be achieved even with smaller training samples. This allows
attackers to use fewer resources and less time to collect data. It reduces the cost of
attacks to a certain extent, allowing weak attackers with fewer data collection resources
to successfully conduct WF attacks. In terms of supervision of the network environment,
it reduces the cost and improves the efficiency of supervisors' monitoring of the Tor
anonymous network and lays the foundation for subsequent research.
Author Contributions:
Data Availability Statement: Not applicable.
Funding:
Conflicts of Interest: The authors declare no conflict of interest.
Appendix A
Base learner and meta-learner parameter settings used in the XL-Stacking model
The main parameters of the XGBoost model are: the gamma parameter, the
threshold on the node-splitting loss function (the larger the value, the more
conservative the model); and max_depth, the maximum depth of the tree, whose
reasonable setting can prevent overfitting. booster specifies the type of weak learner;
the default value 'gbtree' means a tree-based model is used, and this article keeps the
default. This article first sets the parameters to their default initial values and uses
grid search to tune the max_depth and gamma parameters, first making rough
adjustments and then fine-tuning, finally setting max_depth to 7 and gamma to 0. The
regularization parameters alpha and lambda are adjusted in the range [0, 1, 2, 3, 4, 5];
the best results are alpha = 2 and lambda = 1.
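The rough-then-fine tuning procedure described above can be sketched generically. The scoring function below is a toy stand-in of our own; in the actual experiments each candidate max_depth/gamma value would be scored by cross-validated grid search:

```python
# Coarse-to-fine parameter search: pick the best value on a coarse grid,
# then refine around it with a smaller step.
def coarse_to_fine(score, coarse_grid, fine_step=1):
    best = max(coarse_grid, key=score)                       # rough pass
    fine_grid = [best - fine_step, best, best + fine_step]   # refine around it
    return max(fine_grid, key=score)                         # fine pass

# Toy objective peaking at 7 (mirroring the chosen max_depth = 7):
score = lambda d: -(d - 7) ** 2
best_depth = coarse_to_fine(score, [3, 6, 9, 12])  # -> 7
```

The coarse pass selects 6 from the grid, and the fine pass around it lands on the true optimum 7, illustrating why two passes can beat a single coarse grid.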
The method of tuning the random forest algorithm is similar. First set the
parameters to their initial values. max_features is the maximum number of features
the random forest allows a single decision tree to use [31]. Increasing max_features
generally improves the performance of the model, because at each node there are more
options to consider. We set max_features to None and simply use all features; every
tree can take advantage of them, with no restriction on any tree. n_estimators is the
number of subtrees to build. More subtrees can give the model better performance but
also make it slower. We searched from a starting value of 1 to an ending value of 100
in steps of 5 and finally selected n_estimators = 30; increasing this parameter further
does not improve the model significantly. min_samples_leaf is the minimum leaf size;
smaller leaves make it easier for the model to capture noise in the training data [32].
We again searched from 1 to 100 in steps of 5. As min_samples_leaf increases, the
decision-tree submodels gradually move from complex to simple structures, and the
classifier's F1 score gradually decreases. To maintain the classification performance of
the classifier while shortening model training time, min_samples_leaf is set to 10. In
the same way, min_samples_split is set to 20.
For the KNN algorithm, we choose Euclidean distance to measure the distance
between two samples. First set the parameters to their initial values; n_neighbors is
the number of selected neighbors, and grid search tunes it to 9.
The logistic regression model is implemented with the LogisticRegression class
of the sklearn library. The penalty parameter is the penalty term; in this article we
choose L1 regularization.
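Putting the appendix together, the XL-Stacking assembly with the tuned parameters can be sketched with scikit-learn. One substitution to note: sklearn's GradientBoostingClassifier stands in for XGBoost so the snippet needs only scikit-learn; with xgboost installed, `XGBClassifier(max_depth=7, gamma=0, reg_alpha=2, reg_lambda=1)` would take its place.

```python
# Sketch of the XL-Stacking model: KNN, a gradient-boosted tree model
# (stand-in for XGBoost), and Random Forest as first-layer base learners,
# with an L1-regularized Logistic Regression meta-learner.
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

base_learners = [
    ("knn", KNeighborsClassifier(n_neighbors=9, metric="euclidean")),
    ("gbdt", GradientBoostingClassifier(max_depth=7)),  # XGBoost stand-in
    ("rf", RandomForestClassifier(n_estimators=30, max_features=None,
                                  min_samples_leaf=10, min_samples_split=20)),
]
meta = LogisticRegression(penalty="l1", solver="liblinear")
model = StackingClassifier(estimators=base_learners, final_estimator=meta)
```

Calling `model.fit(X, y)` trains the base learners with internal cross-validation and fits the meta-learner on their out-of-fold predictions, which is the stacking scheme described in the body of the paper.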
Appendix B

layer             parameter
ResNet            Input_dim = 10*5000, Output_dim = 1*512, activation: ReLU
Bi-GRU            Input_dim = 1*5000, Output_dim = 1*512
Dropout           p = 0.5
Fully connected   Input_dim = 1*1024, activation: Softmax
                  Optimizer: Adam, Batchsize: 64, Epochs: 30
Compared with using a fixed learning rate, we found it more effective to adjust the
learning rate based on validation set performance. The two parameters we tune are the
initial learning rate and the factor used for learning rate decay. In our experiments, a
default learning rate that is too small leads to more local minima, while increasing the
learning rate does not improve the accuracy of the model. In some cases, increasing
the patience value of a training session can slightly improve accuracy, but it also
significantly increases the average number of training epochs. Through experimental
comparison, we finally start training with a learning rate of 0.001, the default value of
the Adam optimizer. The network is allowed to train for 5 epochs without improving
validation accuracy (the patience value) before the learning rate is reduced by the decay
factor. The lowest learning rate is 0.00001, and training stops after 10 epochs without
improvement in validation accuracy. 30 epochs are used in all experimental settings.
We save the best-performing model on the validation set, which can be reloaded to
perform the final classification on the test set.
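The schedule just described can be sketched in plain Python; the decay factor of 0.1 is our illustrative choice, since the article tunes it experimentally:

```python
# Validation-driven learning-rate schedule: start at 1e-3, decay the rate
# after every 5 epochs without improvement, floor at 1e-5, and stop early
# after 10 stagnant epochs.
def run_schedule(val_accs, lr=1e-3, factor=0.1, patience=5,
                 min_lr=1e-5, stop_patience=10):
    best, stale = float("-inf"), 0
    for acc in val_accs:
        if acc > best:
            best, stale = acc, 0                # improvement: reset counter
        else:
            stale += 1
            if stale % patience == 0:
                lr = max(lr * factor, min_lr)   # decay after 5 stagnant epochs
            if stale >= stop_patience:
                break                           # early stopping
    return lr
```

Feeding in a run whose validation accuracy stops improving after the first epoch decays the rate twice (at epochs 5 and 10 of stagnation) and then stops, ending at the 1e-5 floor.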
References
1. Basyoni, L.; Fetais, N.; Erbad, A.; Mohamed, A.; Guizani, M. Traffic analysis attacks on Tor: a survey. In Proceedings of the 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT), 2020; pp. 183-188.
2. Lashkari, A.H.; Gil, G.D.; Mamun, M.S.I.; Ghorbani, A.A. Characterization of tor traffic using time based features. In Proceedings of the International Conference on Information Systems Security and Privacy, 2017; pp. 253-262.
3. Cao, Z.; Li, Z.; Zhang, J.; Fu, H. A Homogeneous Stacking Ensemble Learning Model for Fault Diagnosis of Rotating Machinery With Small Samples. IEEE Sensors Journal 2022, 22, 8944-8959, doi:10.1109/JSEN.2022.3163760.
4. He, Y.; Hu, L.; Gao, R. Detection of tor traffic hiding under obfs4 protocol based on two-level filtering. In Proceedings of the 2019 2nd International Conference on Data Intelligence and Security (ICDIS), 2019; pp. 195-200.
5. Lingyu, J.; Yang, L.; Bailing, W.; Hongri, L.; Guodong, X. A hierarchical classification approach for tor anonymous traffic. In Proceedings of the 2017 IEEE 9th International Conference on Communication Software and Networks (ICCSN), 6-8 May 2017, 2017; pp. 239-243.
6. Pandey, L. Lip Reading as an Active Mode of Interaction with Computer Systems; University of California, Merced: 2022.
7. Rimmer, V.; Preuveneers, D.; Juarez, M.; Van Goethem, T.; Joosen, W. Automated website fingerprinting through deep learning. arXiv preprint arXiv:1708.06376 2017.
8. Bhat, S.; Lu, D.; Kwon, A.; Devadas, S. Var-CNN: A data-efficient website fingerprinting attack based on deep learning. arXiv preprint arXiv:1802.10215 2018.
9. He, X.; Wang, J.; He, Y.; Shi, Y. A Deep Learning Approach for Website Fingerprinting Attack. In Proceedings of the 2018 IEEE 4th International Conference on Computer and Communications (ICCC), 7-10 Dec. 2018, 2018; pp. 1419-1423.
10. Sirinam, P.; Imani, M.; Juarez, M.; Wright, M. Deep fingerprinting: Undermining website fingerprinting defenses with deep learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018; pp. 1928-1943.
11. Lu, Y.; Cai, M.; Zhao, C.; Zhao, W. Tor Anonymous Traffic Identification Based on Parallelizing Dilated Convolutional Network. Applied Sciences 2023, 13, 3243.
12. Wang, M.; Li, Y.; Wang, X.; Liu, T.; Shi, J.; Chen, M. 2ch-TCN: A Website Fingerprinting Attack over Tor Using 2-channel Temporal Convolutional Networks. In Proceedings of the 2020 IEEE Symposium on Computers and Communications (ISCC), 7-10 July 2020, 2020; pp. 1-7.
13. Wu, S.; Wang, Y. Attention-based Encoder-Decoder Recurrent Neural Networks for HTTP Payload Anomaly Detection. In Proceedings of the 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), 30 Sept.-3 Oct. 2021, 2021; pp. 1452-1459.
14. Chen, H.Y.; Lin, T.N. The Challenge of Only One Flow Problem for Traffic Classification in Identity Obfuscation Environments. IEEE Access 2021, 9, 84110-84121, doi:10.1109/ACCESS.2021.3087528.
15. Attarian, R.; Abdi, L.; Hashemi, S. AdaWFPA: Adaptive online website fingerprinting attack for tor anonymous network: A stream-wise paradigm. Computer Communications 2019, 148, 74-85.
16. Xie, X.; Zhang, X.; Fu, J.; Jiang, D.; Yu, C.; Jin, M. Location recommendation of digital signage based on multi-source information fusion. Sustainability 2018, 10, 2357.
17. Džeroski, S.; Ženko, B. Is combining classifiers with stacking better than selecting the best one? Machine Learning 2004, 54, 255-273.
18. Xian, S.; Li, T.; Cheng, Y. A novel fuzzy time series forecasting model based on the hybrid wolf pack algorithm and ordered weighted averaging aggregation operator. International Journal of Fuzzy Systems 2020, 22, 1832-1850.
19. Wang, S.; Wang, X.; Guo, X. Advanced Face Mask Detection Model Using Hybrid Dilation Convolution Based Method. Journal of Software Engineering and Applications 2023, 16, 1-19.
20. Luo, Z.; Zhu, J.; Li, Z.; Liu, S. Research the Method of Joint Segmentation and POS Tagging for Tibetan Using BiGRU-CRF. In Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence, 2020; pp. 1-6.
21. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, 2015; pp. 448-456.
22. Xu, J.; Wang, J.; Qi, Q.; Sun, H.; He, B. Deep Neural Networks for Application Awareness in SDN-based Network. In Proceedings of the 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP), 17-20 Sept. 2018, 2018; pp. 1-6.
23. Zhu, S.; Xu, X.; Gao, H.; Xiao, F. CMTSNN: A Deep Learning Model for Multiclassification of Abnormal and Encrypted Traffic of Internet of Things. IEEE Internet of Things Journal 2023, 10, 11773-11791, doi:10.1109/JIOT.2023.3244544.
24. Baek, I.; Kim, S.B. 3-Dimensional convolutional neural networks for predicting StarCraft II results and extracting key game situations. PLoS One 2022, 17, e0264550.
25. Zhioua, S. Tor traffic analysis using hidden markov models. Security and Communication Networks 2013, 6, 1075-1086.
26. Panchenko, A.; Mitseva, A.; Henze, M.; Lanze, F.; Wehrle, K.; Engel, T. Analysis of fingerprinting techniques for Tor hidden services. In Proceedings of the 2017 Workshop on Privacy in the Electronic Society, 2017; pp. 165-175.
27. Wang, Y.; Xu, H.; Guo, Z.; Qin, Z.; Ren, K. snWF: Website Fingerprinting Attack by Ensembling the Snapshot of Deep Learning. IEEE Transactions on Information Forensics and Security 2022, 17, 1214-1226, doi:10.1109/TIFS.2022.3158086.
28. Sirinam, P. Website fingerprinting using deep learning; Rochester Institute of Technology: 2019.
29. Wang, T.; Huang, Z.; Wu, J.; Cai, Y.; Li, Z. Semi-Supervised Medical Image Segmentation with Co-Distribution Alignment. Bioengineering 2023, 10, 869.
30. Zhang, X.; Hu, D.; Li, S.; Luo, Y.; Li, J.; Zhang, C. Aircraft Detection from Low SCNR SAR Imagery Using Coherent Scattering Enhancement and Fused Attention Pyramid. Remote Sensing 2023, 15, 4480.
31. Zhang, Y.; Deng, Q.; Liang, W.; Zou, X. An efficient feature selection strategy based on multiple support vector machine technology with gene expression data. BioMed Research International 2018, 2018.
32. Zhang, Z. Data Sets Modeling and Frequency Prediction via Machine Learning and Neural Network. In Proceedings of the 2021 IEEE International Conference on Emergency Science and Information Technology (ICESIT), 22-24 Nov. 2021, 2021; pp. 855-863.
33. Zhou, B.; Yin, Y.; Wang, M.; Zhang, R.; Zhang, Y.; Guo, W. Identification of Strong Motion Record Baseline Drift Based on Bayesian Optimized Transformer Network. 2023.