Vehicle Network Intrusion Detection Based On K-Nearest Neighbor Variational Autoencoder Using Contrastive Learning
Chenyun Duan1 , Lei Du1,2 , Liyi Zeng2,* , Zhaoquan Gu1,2,*
1 School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
2 Department of New Networks, Pengcheng Laboratory, Shenzhen, China
[email protected], {dul,zengly}@pcl.ac.cn, [email protected]
Authorized licensed use limited to: R V College of Engineering. Downloaded on April 26,2025 at 21:20:47 UTC from IEEE Xplore. Restrictions apply.
to determine the processing priority when multiple pieces of information are input simultaneously. Information with more leading zeros in the ID has a higher priority. DoS attacks against vehicle networks take advantage of this property, which will be mentioned later in this paper.

Fig. 3: DoS, fuzzy (obfuscation), and spoofing injection attacks.

B. Attack Types

The CAN bus system may be subject to many types of injection attacks, such as DoS attacks, fuzzy (obfuscation) attacks, and spoofing attacks. This is because the CAN bus system is based on broadcasting and adopts a message priority scheme without encryption or authentication mechanisms. It is therefore easy to carry out injection attacks by modifying the CAN ID. The specific characteristics of each injection attack are listed below and shown in Fig. 3:

• DoS attack: The attacker exploits the message priority mechanism by injecting, within a short period of time, messages whose IDs have very high priority, so that other legitimate messages cannot be processed promptly. The most common attack injects the ID "0x0000", which delays the processing of legitimate messages with normal IDs, as shown in Fig. 3(a).
• Fuzzy attack: A fuzzy attack injects many messages with random IDs and random data in a short period of time. Compared to DoS attacks, the message IDs injected by fuzzy attacks are random, and the attack mainly uses malicious information to cause car malfunctions. For example, inserting random malicious messages among normal messages may cause errors in the originally coherent and complete actions of the car, as shown in Fig. 3(b).
• Deception attack (gear/RPM): In a deception (spoofing) attack, an attacker attempts to impersonate an operator and issue commands to the car for execution over a period of time. Compared to the above two types of attacks, deception attacks are more targeted. For example, if the ID corresponding to the throttle control is 0x0123, the attacker can forge messages with that ID and control the vehicle's throttle, as shown in Fig. 3(c).

Through the above attack methods, attackers can transmit incorrect information in the CAN bus system and cause related problems or malfunctions in the car, which may cause serious threats and significant losses.

C. Problem Formulation

To perform network intrusion detection on the CAN bus system, our main focus is on processing the characteristics of attack traffic and using models to identify it. In this paper, we process common attacks and benign traffic on the CAN bus to obtain traffic characteristics as inputs to the model. After processing by the model, we obtain traffic characteristics and use them to determine the type of traffic and obtain outputs. Specifically, for the input traffic information $p_i$, we first process it to obtain the original traffic features $x_i$, then use the model to obtain the hidden features $z_i$ of the traffic, and use $z_i$ to perform traffic detection to obtain the final traffic label $y_i$.

The recognition process for known and unknown classes is different. We first use the hidden features $z_i$ obtained from the above model, combined with the features of the classes obtained from the training set, and use k-nearest neighbors to obtain the known class label $y_i$ of the traffic. Then, we use the hidden features $z_i$ and the known class $y_i$ as inputs to obtain the decoded original features $x'_i$ through the model. We use the difference between $x'_i$ and $x_i$ to determine whether the recognized known class label $y_i$ is correct. If it is correct, the traffic belongs to the known class $y_i$; if it is incorrect, the traffic is an unknown attack.

III. METHODOLOGY

In this section, we consider how to obtain class features with a more concentrated distribution after VAE encoding, in order to improve the accuracy of detection results. When trying to identify and classify samples, a good distance function is often needed to measure the distance between the detected samples and the trained sample classes. However, due to the sparsity and uneven distribution of samples in high-dimensional data, the concept of distance begins to lose effectiveness because of the curse of dimensionality [10]. Therefore, direct detection may lead to poor detection results and an inability to distinguish sample boundaries well. To address the above problems, we introduce a novel model, named VAEsK. We utilize contrastive loss to replace the original loss calculation method and employ distance for class discrimination. This model effectively leverages existing class labels to minimize intra-class distance and maximize inter-class distance during training, thus achieving the goal of identifying network traffic categories. The overall architecture of the model is shown in Fig. 4.

We propose a VAEsK model based on contrastive learning for intrusion detection of vehicular traffic. The model aims to detect known and unknown vehicular traffic, and mainly consists of the known traffic detection stage and the unknown traffic detection stage. Among them, the known traffic detection stage utilizes the VAE model trained with contrastive learning, and uses the obtained hidden features and training data to
perform k-nearest neighbor classification on the test data to obtain traffic labels. The unknown traffic detection stage utilizes the labels obtained during the known traffic detection stage and calculates the reconstruction error using the test data and the obtained labels.

Fig. 4: The overall architecture of VAEsK

A. VAEsK based on contrastive loss and k-nearest neighbors

To achieve the above goals, we consider improving the loss function of VAE. As shown in Fig. 5, we hope to improve the loss function to achieve changes in class distance in the latent space, and obtain a classification boundary that improves the detection accuracy rate. This will make the data with the same class label as concentrated as possible and the data with different class labels as dispersed as possible.

The original loss function of the VAE model [3] is calculated as follows:

$$L(\theta, \phi, x, x') = \min_{\theta,\phi} \mathbb{E}_x \|x - x'\| \tag{1}$$

Here, $\theta$ is the parameter of the encoder and $\phi$ is the parameter of the decoder. $x \in \mathbb{R}^q$ is a data sample input from the training set and $x' \in \mathbb{R}^q$ is its reconstruction output by the decoder. We use formula (1) to calculate the loss of the model and continuously update the model. In order to achieve the above ideas, we use contrastive learning to calculate the loss, which updates the model to achieve better classification performance by calculating the losses of the same and different labels in the same batch of data. By using this loss calculation method, we can reduce the distance between samples of the same class and increase the distance between samples of different classes, thereby achieving better classification results. Specifically, when training the encoder $f$ and decoder $h$ of the VAE model, the loss function of this paper is as follows:

$$d_s = \sum_{x_i, x_j} \|x_i - x_j\| \tag{2}$$

$$d_d = \sum_{x_i, x_j} \max(0, \alpha - \|x_i - x_j\|) \tag{3}$$

$$L_C = \min_{\theta,\phi} \mathbb{E}_x \|x - x'\| \tag{4}$$

$$L(\theta, \phi, x, x') = L_C + \omega \, \mathbb{E}_{x_i, x_j} (d_s + d_d) \tag{5}$$

The loss is divided into two parts: one is the reconstruction loss of the data reconstructed by the VAE model, and the other is the distance loss of the same and different classes calculated using the contrastive learning approach in this paper. Formula (2) represents the sum of all distances between samples that share the same label, while formula (3) represents the sum over all pairs of samples with different labels. In this paper, a distance threshold $\alpha$ is set. When the distance between samples with different labels is greater than this threshold, the samples are already far enough apart that the pair does not affect the contrastive loss, that is, its term is set to 0. Finally, when calculating the loss, a weight value $\omega$ is applied to the contrastive loss, and the weight of the contrastive loss is controlled through this hyperparameter $\omega$.

After minimizing this loss, the model takes into account not only the loss caused by reconstruction itself but also the impact of the categories and labels on the generation of latent features in the model. Therefore, by reducing the distance between the latent features trained by the encoder, we can attempt to make distance judgments by calculating the Euclidean distance. However, it should be noted that we cannot determine whether the class feature space extracted by the model has a clear class center (the class center of a class may not be distributed within it), which could potentially cause issues when using the class center method for judgment. Therefore, considering the principle of local sample distribution, we opt to use the k-nearest neighbor method to obtain the corresponding test labels. It should be noted that the k-nearest neighbor classification method we use is more suitable for scenarios with a large number of training samples and may not be as effective with smaller sample sizes. The reason is straightforward: since we classify a test sample based on the labels of the training samples around it, if there are too few training samples, the test sample may be misclassified due to an insufficient number of surrounding sample points. Therefore, when selecting and processing datasets, we ensure a sufficient number of samples in the training set to prevent misclassification of test samples due to sample size issues, while also preserving the original class distributions as much as possible.
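The loss terms and the k-nearest neighbor step described in this subsection can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: distances are computed on latent vectors `z` (our reading of "class distance in the latent space", although formulas (2) and (3) are written with $x_i$, $x_j$), each unordered pair is counted once, the expectation in the reconstruction term is taken as a batch mean, and the pair sums are used directly without further normalization. All function and parameter names (`contrastive_terms`, `total_loss`, `knn_predict`, `alpha`, `omega`, `k`) are hypothetical.

```python
import numpy as np
from collections import Counter


def pairwise_distances(z):
    """Euclidean distance between every pair of rows in z (shape: n x d)."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def contrastive_terms(z, labels, alpha):
    """Return (d_s, d_d) in the spirit of formulas (2) and (3).

    Assumption: each unordered pair (i < j) is counted once.
    """
    dist = pairwise_distances(z)
    same = labels[:, None] == labels[None, :]
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)  # keep pairs with i < j
    d_s = dist[same & upper].sum()
    # Different-label pairs farther apart than alpha contribute 0.
    d_d = np.maximum(0.0, alpha - dist[~same & upper]).sum()
    return d_s, d_d


def total_loss(x, x_rec, z, labels, alpha, omega):
    """Reconstruction loss plus the omega-weighted contrastive term, as in formula (5)."""
    l_c = np.linalg.norm(x - x_rec, axis=1).mean()  # batch mean stands in for E_x
    d_s, d_d = contrastive_terms(z, labels, alpha)
    return l_c + omega * (d_s + d_d)


def knn_predict(z_train, y_train, z_test, k=3):
    """Label each test vector by majority vote among its k nearest training vectors."""
    preds = []
    for z in z_test:
        dist = np.linalg.norm(z_train - z, axis=1)
        nearest = np.argsort(dist)[:k]
        preds.append(Counter(y_train[nearest].tolist()).most_common(1)[0][0])
    return np.array(preds)
```

In a real training loop these terms would be computed with an automatic differentiation framework so that gradients flow back to the encoder; the NumPy version only shows how the quantities are formed. The majority vote in `knn_predict` also makes the sample-size caveat concrete: with too few training vectors near a test point, the k nearest neighbors may come from other classes.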
includes a flag indicating whether it is an injection message.
Subsequently, we use the injection information flags in each
attack dataset to determine whether the traffic is injected attack
traffic or normal traffic.
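The flag-based labeling described above can be sketched as follows. This is only an illustration under assumptions: the record layout (a CAN ID paired with a boolean injection flag) and the names `label_frames`, `attack_name`, and `"Normal"` are hypothetical and not taken from the dataset's actual schema.

```python
def label_frames(frames, attack_name):
    """Assign a class label to each frame of one attack capture.

    frames: iterable of (can_id, injected) pairs, where injected is True
            for injected messages (hypothetical record layout).
    attack_name: label given to injected frames, e.g. "DoS".
    """
    return [attack_name if injected else "Normal" for _, injected in frames]
```

Applying this per attack capture yields one multi-class label per frame, with all non-injected frames sharing the same normal-traffic label.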
B. Experiments Setup
formulas:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{7}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{8}$$

$$\text{F1 score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{9}$$

D. Detection Results

1) Results for known attack: To compare the performance differences between the methods used in our experiment and other machine learning algorithms, this paper also implements various commonly used network traffic recognition models and compares them with the model proposed in this study. We use algorithms such as decision tree, multilayer perceptron (MLP), extreme gradient boosting (XGBoost), and random forest (RF) to compare with our experimental method in terms of known attack classification, as shown in Table I.

TABLE I: Known attack detection compared to other model algorithms

Model           Accuracy   Difference from the best
Decision Tree   0.6800     -0.3176
MLP             0.9933     -0.0043
XGBoost         0.9976      0
RF              0.6390     -0.3586
VAEsK (ours)    0.9898     -0.0078

From Table I, it is not difficult to see that in terms of identifying known network attacks, the XGBoost model, the MLP model, and the method proposed in this paper have almost the same recognition accuracy, reaching over 98.9% and far higher than the other recognition algorithms. This indicates that the model and recognition method proposed in this paper can effectively solve the problem of identifying known network traffic in vehicle networks, and also provide a certain possibility for detecting unknown network attack traffic in the future. By comparison, the tree-based decision tree and random forest models perform relatively poorly, with an accuracy rate of less than 70%. The possible reason for this is that the preprocessing method of the data results in high dimensions for each traffic feature. For excessively high dimensions, it is not suitable to use tree structure algorithms for attack recognition, which may result in overfitting and low accuracy.

Therefore, in terms of identifying known network attacks, the method used in this paper has high accuracy, and its other evaluation indicators in fine-grained classification also have good results. This paper takes normal traffic, DoS, Fuzzy, and RPM as examples of known traffic classes, as shown in Table II.

TABLE II: Identification of various traffic on known network attacks

          Precision   Recall   F1-score   Support
Normal    0.9867      0.9960   0.9914     34136
DoS       0.9997      0.9935   0.9966     7222
Fuzzy     0.9797      0.9513   0.9653     8536
RPM       0.9984      0.9963   0.9973     13960

From the table, it is evident that the model presented in this paper has achieved a high level of recognition in terms of precision, recall, and F1 values. The precision for every traffic class exceeds 97.9%, and the recall and F1 scores also exceed 99% for all classes except Fuzzy attacks. This demonstrates that the model can effectively distinguish and finely classify different types of known vehicular traffic. Achieving a high level of fine-grained classification of known network attacks provides a solid foundation for the future detection of unknown network attacks.

2) Results for unknown attack: Due to the limited amount of traffic information and attack types in vehicle network traffic data, in order to detect unknown attacks, this paper removes one type of attack from the training set each time and considers this attack as an unknown attack to be detected together with the original test traffic in the test set. This approach aims to create a detection dataset that resembles an open set, where the model identifies both known and unknown attacks, simulating real-world conditions where new threats may arise. For example, if the unknown attack is DoS, then the training set includes benign traffic, Fuzzy, gear, and RPM, while the test set includes the four types of traffic mentioned above and the DoS traffic, which plays the role of the unknown attack. Table III shows the detection performance with DoS, Fuzzy, and gear as unknown network attacks. The paper utilizes the improved reconstruction error detection method to detect unknown network attacks. The paper mainly presents three detection indicators: Precision, Recall, and F1 score. For the sake of convenience, this paper abbreviates the three indicators as Prec, Rec, and F1.

From Table III, it can be seen that whether it is a known attack or an unknown attack, the results of each indicator are above 87%, and the F1 score corresponding to the gear attack is the highest, above 93.4%. In contrast, the recall rate of Fuzzy and the accuracy rate of DoS are 0.9462 and 0.9692, respectively, but this result is acceptable in this paper because we did not use any unknown attack labels during training. It can also be observed that, when identifying unknown network attacks, the model balances performance between known and unknown attacks.

Besides, we compare the performance of unknown attack detection with two baseline methods, Self-supervised [12] and CAAE [11]. The first method uses self-supervised learning, is trained with both normal and generated data, and additionally uses RPM data to improve the results. The second method uses convolutional data processing, based on an autoencoder and the generation of adversarial samples, to detect unknown network attacks. As shown in Table IV, the bold font represents the most effective evaluation metric for each attack. We can see that when the
DoS attack is an unknown attack, the model recognition in this paper is not as good as the two methods compared; however, for Fuzzy attacks, the precision and F1 score of our model are the highest, reaching 96.9% and 92.7% respectively, a clear advantage. For gear attacks, although the model results in this paper are not particularly outstanding, they are still at a relatively high level. This result is acceptable because this paper only uses the 11-bit base ID as input for detection, whereas many more effective methods use the full 29 bits of base ID and extended ID as input. As a result, when a series of consecutive time-stamped frames is used as input, this paper uses fewer features.

TABLE III: The detection performance of unknown network attacks

                   Unknown result                Known result
Unknown attack     Prec     Rec      F1          Prec     Rec      F1
DoS                0.9517   0.9454   0.9486      0.9577   0.9592   0.9571
Fuzzy              0.9692   0.8883   0.9270      0.9684   0.9772   0.9718
gear               0.9476   0.9631   0.9553      0.9121   0.8860   0.8978

V. RELATED WORK

In this section, we introduce some related research work from the perspectives of models and algorithms, mainly including the Variational Autoencoder and the current status of intrusion detection in the CAN bus system.

A. Variational Autoencoder

Variational Autoencoder (VAE) is a model based on the Autoencoder (AE) with some improvements. It provides a formula that relates the obtained results to probability [3]. AE and VAE both consist of two parts: an encoder and a decoder. VAE uses two neural networks to establish a probability density distribution model. One performs variational inference on the original input data, generating the variational probability distribution of the hidden variables; this is the inference network. The other restores an approximate probability distribution of the raw data from the generated variational distribution of the latent variables; this is the generative network. That is to say, the posterior distribution of the latent variables is approximated by the inference network. Furthermore, since the generative model utilizes a probability distribution, the VAE can capture the distribution of the data more easily than the AE, and of course, the corresponding specific data may also be affected by the distribution. Common uses of VAE include extending it to collaborative filtering with implicit feedback [13] and using it for the detection of anomaly attacks [14]. However, when commonly used VAE models are applied to traffic recognition, they may not be able to extract hidden features well, resulting in less compact hidden features and poorer detection performance. Therefore, it is necessary to improve the extraction of hidden features.

B. Intrusion Detection In CAN bus System

Because CAN bus traffic is not encrypted, and with the continuous development of intelligent vehicles, traffic detection technology for the CAN bus system in the in-vehicle network of intelligent vehicles is constantly being updated and improved. In terms of known attack detection for CAN bus traffic, an intrusion detection technique based on a deep convolutional neural network (DCNN) has achieved lower false negative and error rates in traffic detection [4]. M. Müter et al. [9] proposed an IDS for vehicular network traffic and examined its response to attacks in different attack scenarios. Afterward, in order to cope with adversarial attacks on in-vehicle networks, Seo et al. [5] proposed a method based on generative adversarial networks to train ML-based IDS, demonstrating the importance of handling adversarial situations. Martinelli et al. [6] employed four k-nearest neighbor (KNN) classifiers to distinguish between four types of attacks aimed at the CAN bus. These algorithms encompass two variations of fuzzy rough KNNs, named the discord classifier and a fuzzy unordered rule induction algorithm. From the above research, it can be seen that the detection of known vehicle traffic has achieved high accuracy. However, with the development of in-vehicle networks, attacks against them are constantly changing, and how to detect unknown in-vehicle network attacks is also an important problem. Hoang et al. [7] proposed
a convolutional adversarial autoencoder based model for intrusion detection of known and unknown attacks, achieving detection of unknown attacks with a high F1 score and low error rate. Yang et al. [8] proposed a multi-tiered hybrid IDS to detect known and unknown attacks on vehicular networks, and achieved high detection accuracy. However, the detection of unknown vehicle network attacks still suffers from a series of problems such as low accuracy and high false positive rate, which need further research and solutions.

VI. CONCLUSION

In this paper, we study unknown attack detection in vehicular networks and provide a detailed explanation of the characteristics of vehicular network traffic formats, as well as related attack methods and formats. We improve the loss calculation method of VAE in order to detect attacks on vehicular networks. The proposed VAEsK model achieves outstanding results on vehicular network traffic datasets and attains high detection accuracy, verifying the effectiveness of this model and providing new insights and solutions for the improvement and development of unknown network attack detection technology in the future, particularly in the automotive industry. However, the current detection methods still face some challenges. For instance, they may incorrectly identify some benign traffic as unknown traffic or some unknown traffic as known, which can impact the practical application of the detection method. Therefore, further exploration and refinement of the detection method are needed to enhance its accuracy and reliability. This will be a focus of future work, where we aim to develop more robust techniques for distinguishing between known and unknown traffic, thereby improving the overall performance and applicability of the detection method in real-world scenarios.

ACKNOWLEDGEMENT

REFERENCES

[1] G. Leen, D. Heffernan, "Expanding automotive electronic systems", Computer, vol. 35, 2002, pp. 88–93.
[2] BOSCH, "CAN Specification Version 2.0", 1991.
[3] D. P. Kingma, M. Welling, "Auto-encoding variational Bayes", arXiv preprint arXiv:1312.6114, 2013.
[4] H. M. Song, J. Woo, H. K. Kim, "In-vehicle network intrusion detection using deep convolutional neural network", Vehicular Communications, 2020, vol. 21, ISSN 2214-2096.
[5] E. Seo, J. Kim, W. Lee, J. Seok, "Adversarial Attack of ML-based Intrusion Detection System on In-vehicle System using GAN", International Conference on Ubiquitous and Future Networks (ICUFN), 2023, pp. 700-703.
[6] F. Martinelli, F. Mercaldo, V. Nardone, A. Santone, "Car hacking identification through fuzzy logic algorithms", Proceedings of the 2017 IEEE International Conference on Fuzzy Systems (FUZZIEEE'17), 2017, CA, 1–7.
[7] T.-N. Hoang, D. Kim, "Detecting in-vehicle intrusion via semi-supervised learning-based convolutional adversarial autoencoders", Vehicular Communications, 2022, pp. 100520.
[8] L. Yang, A. Moubayed, A. Shami, "MTH-IDS: A Multitiered Hybrid Intrusion Detection System for Internet of Vehicles", IEEE Internet of Things Journal, 2022, pp. 616-632.
[9] M. Müter, N. Asaj, "Entropy-based anomaly detection for in-vehicle networks", 2011 IEEE Intelligent Vehicles Symposium (IV), 2011, pp. 1110-1115.
[10] A. Zimek, E. Schubert, H.-P. Kriegel, "A survey on unsupervised outlier detection in high-dimensional numerical data", Statistical Analysis and Data Mining, 2012.
[11] T.-N. Hoang, D. Kim, "Detecting in-vehicle intrusion via semi-supervised learning-based convolutional adversarial autoencoders", Vehicular Communications, 2022, pp. 100520.
[12] H. M. Song, H. K. Kim, "Self-supervised anomaly detection for in-vehicle network using noised pseudo normal data", IEEE Trans. Veh. Technol., 2021, vol. 70, no. 2, pp. 1098–1108.
[13] D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, "Variational Autoencoders for Collaborative Filtering", Proceedings of the 2018 World Wide Web Conference, International World Wide Web Conferences Steering Committee, 2018, pp. 689–698.
[14] S. Zavrak, M. İskefiyeli, "Anomaly-Based Intrusion Detection From Network Flow Features Using Variational Autoencoder", IEEE Access, 2020, vol. 8, pp. 108346-108358.