LogPrompt: A Log-based Anomaly Detection Framework Using Prompts
2023 International Joint Conference on Neural Networks (IJCNN) | 978-1-6654-8867-9/23/$31.00 ©2023 IEEE | DOI: 10.1109/IJCNN54540.2023.10191948
* Corresponding author.

Abstract—Log data are widely used in anomaly detection tasks of software systems. At present, log anomaly detection methods [...]

[...] detection technology. Log data are closely related to network security and big data. Software systems usually record operation and status information by printing console logs. A complex software system may generate many logs. When an anomaly occurs, log data can help operation and maintenance (O&M) personnel discover system failures in time. Therefore, log data are widely used in log anomaly detection tasks.
Traditional log data analysis and anomaly detection are performed manually by O&M personnel on the basis of their professional knowledge. However, manual log analysis cannot meet current requirements because system log data are massive and unstructured. Therefore, log anomaly detection based on deep learning has become an important research trend. However, log anomaly detection tasks need to consider the following issues:
(1) Learning from the whole labeled log data requires [...]

[...] LogPrompt, which leverages prompts to enable the PLM to learn more about the representations of logs. Compared with training a model from scratch, the PLM provides good parameter initialization, which can reduce resource consumption. Even with little training data, LogPrompt achieves good detection performance.
• Semantic and sequential tokens are comprehensively considered and embedded to help the PLM effectively and efficiently detect point and conditional anomalies.
• Focal loss is used to replace cross entropy loss, which alleviates the class imbalance of real-world log data. Thus, the evaluation metrics are improved.

II. RELATED WORK
A. Deep Learning-based Log Anomaly Detection
Du et al. [2] proposed DeepLog, which leverages long short-term memory (LSTM) to capture the execution sequence of
Authorized licensed use limited to: Zhejiang University. Downloaded on June 15,2024 at 15:29:21 UTC from IEEE Xplore. Restrictions apply.
normal logs in the training stage, and then determines whether log sequences are abnormal in the anomaly detection stage.
Meng et al. [3] proposed LogAnomaly, which also uses an LSTM model. They also proposed a word embedding method called Template2Vec that considers synonym and antonym information. They aimed to learn the semantic similarity between log keywords and templates, and to solve the online learning problem of new templates.
Zhang et al. [4] proposed LogRobust, which extracts the semantic information of log events and represents it as a semantic vector. They use an attention-based bidirectional LSTM (Bi-LSTM) model to capture contextual information in log sequences and automatically learn the representations of different log events. Thus, LogRobust can identify and handle unstable log events and sequences.
At present, large-scale language models are widely used in the field of natural language processing (NLP), and they show good performance in various NLP tasks. Devlin et al. [5] proposed BERT in 2018, which is based on a bidirectional Transformer encoder for pre-training. Inspired by BERT, Guo et al. [6] proposed LogBERT, which captures sequential information by learning from normal log sequences.

B. Pre-trained Language Model
A pre-trained language model (PLM) first learns from general tasks, which adjusts its parameter weights, and is then transferred to downstream tasks for further learning so that it adapts to them.
A great number of studies have shown that language models pre-trained on large corpora can learn general language representations and provide better parameter initialization, which not only improves the generalization ability of models but also speeds up convergence on the target task. In addition, pre-training saves a lot of resources by avoiding training a new model from scratch.
BERT [5] is a widely used pre-trained language model. It uses the masked language model (MLM) objective, which predicts the masked tokens in a sentence with a Transformer encoder. RoBERTa [7] and ALBERT [8] are improvements of BERT. RoBERTa mainly changes static masking to dynamic masking. ALBERT compresses the model size and reduces the number of parameters through cross-layer parameter sharing and factorized embedding parameterization.

C. Prompt Tuning
PLMs are usually pre-trained on large corpora, but the semantics in logs may differ greatly from those in the pre-training tasks because of their diverse training objectives. Prompt tuning [9] is a method to adapt downstream tasks to the PLM. The PLM can better understand the log anomaly detection task through prompt tuning. Moreover, prompt tuning is suitable for few-shot learning, which can significantly improve the learning capabilities for machine intelligence and practical adaptive applications by accessing only a small number of labeled examples [10].
Prompt tuning mainly includes three processes: template engineering, answer searching and answer mapping. Template engineering refers to the process of designing prompt templates, which can help PLMs "recall" what they "learned" during pre-training. It is a technical approach to activate the knowledge of PLMs. Prompt templates mainly include discrete templates (e.g., PET [11]) and continuous templates (e.g., P-tuning [12]), both of which contain a [MSK] token. Answer searching refers to searching the vocabulary of the PLM for the token with the highest probability of filling the [MSK] position in the prompt template. Answer mapping maps the prediction of [MSK] to a label, which is normal or anomaly in the log anomaly detection task. The mapping function is also called the verbalizer. Thus, the process of answer mapping is also called verbalizing.

III. DEFINITIONS AND TASK DESCRIPTION
A. Definitions
1) Log: A log is also called a log entry, which is typically printed to a console or file by log print statements in programs. Most logs contain information such as timestamp, log level, and log content. Log content consists of constants and variables. The constant content printed by the same print statement is the same. The variable is also called a parameter, and it reflects the variable information of the running system. Under different states or activities, the variable may be different. In general, log content is simply referred to as a log. If the log content is segmented by certain delimiters (e.g., blank space), then each element obtained is called a token. The log length usually refers to the number of tokens. We can represent a log as:

$$L = \{t_1, t_2, \ldots, t_m\} \tag{1}$$

where $L$ represents a log, $t_i$ represents the $i$-th token, and $m$ represents the log length.
2) Log sequence: A log sequence is usually composed of logs in chronological order. The length of a log sequence usually refers to the number of log entries contained in the log sequence. Therefore, we can represent a log sequence as:

$$S_L = \{L_1, L_2, \ldots, L_n\} \tag{2}$$

where $S_L$ represents a log sequence, $L_i$ represents the $i$-th log, and $n$ represents the length of the log sequence.

B. Task Description
Given a log sequence $S_L = \{L_1, L_2, \ldots, L_n\}$, we hope to detect whether the log sequence reflects point anomalies or conditional anomalies that occur in the system. Specifically, a point anomaly refers to the existence of a log $L_i \in S_L$ that indicates that an anomaly has occurred in the system. A conditional anomaly refers to the existence of a subsequence $S_L' = \{L_i, L_{i+1}, \ldots\} \subseteq S_L$ that indicates that an anomaly has occurred in the system. The objective of the log anomaly detection task is to detect whether anomalies occur in the running system by analyzing logs.
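The definitions of a log (Eq. 1) and a log sequence (Eq. 2) can be sketched as simple data structures. This is a minimal illustration, assuming whitespace as the only delimiter; the class name `Log`, the helper `build_sequence` and the sample log contents are hypothetical and not part of the original work.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Log:
    """A log L = {t_1, ..., t_m}: the tokens of one log entry's content."""
    tokens: Tuple[str, ...]

    @classmethod
    def from_content(cls, content: str) -> "Log":
        # Segment the log content by blank space; each element is a token.
        return cls(tuple(content.split()))

    def __len__(self) -> int:
        # The log length m is the number of tokens.
        return len(self.tokens)


def build_sequence(contents: List[str]) -> List[Log]:
    """A log sequence S_L = {L_1, ..., L_n}; input is assumed chronological."""
    return [Log.from_content(c) for c in contents]


# Hypothetical log contents, used only for illustration.
sequence = build_sequence([
    "Receiving block src dest",
    "NameSystem allocateBlock",
])
print(len(sequence))     # n, the length of the log sequence
print(len(sequence[0]))  # m, the length of the first log
```

Under this representation, a point anomaly concerns a single `Log` in the list, whereas a conditional anomaly concerns a contiguous slice of the list.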
[Figure: overview of the LogPrompt framework. Recoverable stage labels: (1) log preprocessing, which produces structured logs; (4) prompt-based tuning on the PLM with focal loss; (5) log anomaly detection, in which the verbalizer maps the predicted label word to normal or anomaly. The PLM input combines semantic, sequential and template embeddings, e.g. "[CLS] semantic Receiving ... allocateBlock sequential bbb51b95 3d91fa85 it is [MSK]".
Prompt repository templates: ① semantic <SEM> sequential <SEQ> it is [MSK]; ② <SEM> <SEQ> it is [MSK]; ③ <SEM> <SEQ> normal or anomaly ? [MSK]; ④ <SEM> <SEQ> h[PRO] h[PRO] [MSK].
Label words: Anomaly: error, failure; Normal: test, size.]
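The discrete templates and label words listed in the figure above can be sketched as plain data, together with the two operations they support: filling the <SEM> and <SEQ> placeholders, and mapping a predicted label word to a class. This is a minimal sketch, not the authors' implementation. The names `TEMPLATES`, `LABEL_WORDS`, `fill_template` and `verbalize` are hypothetical, the probabilities passed to `verbalize` are made up, and template ④ is omitted because its [PRO] tokens come from a trained prompt encoder.

```python
from typing import Dict, List

# Discrete templates (1)-(3) from the prompt repository, as token lists.
TEMPLATES: Dict[int, List[str]] = {
    1: ["semantic", "<SEM>", "sequential", "<SEQ>", "it", "is", "[MSK]"],
    2: ["<SEM>", "<SEQ>", "it", "is", "[MSK]"],
    3: ["<SEM>", "<SEQ>", "normal", "or", "anomaly", "?", "[MSK]"],
}

# Label words mapped to classes: 1 = anomaly, 0 = normal.
LABEL_WORDS: Dict[str, int] = {"error": 1, "failure": 1, "test": 0, "size": 0}


def fill_template(template: List[str], x_sem: List[str], x_seq: List[str]) -> List[str]:
    """Replace <SEM> and <SEQ> with the semantic and execution sequences."""
    filled: List[str] = []
    for token in template:
        if token == "<SEM>":
            filled.extend(x_sem)
        elif token == "<SEQ>":
            filled.extend(x_seq)
        else:
            filled.append(token)
    return filled


def verbalize(mask_probs: Dict[str, float]) -> int:
    """Pick the most probable label word for [MSK] and map it to a class."""
    best_word = max(LABEL_WORDS, key=lambda word: mask_probs.get(word, 0.0))
    return LABEL_WORDS[best_word]


x_sem = ["Receiving", "block", "src", "dest", "NameSystem", "allocateBlock"]
x_seq = ["bbb51b95", "3d91fa85"]
prompt = fill_template(TEMPLATES[1], x_sem, x_seq)
print(" ".join(prompt))

# Hypothetical [MSK] probabilities standing in for the PLM's output.
print(verbalize({"error": 0.6, "test": 0.3}))
```

Restricting the arg max to the label words, rather than the full vocabulary, keeps the mapping total: every prediction resolves to either normal or anomaly.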
2) Label Words and Verbalizer: Label words $V_{label}$ are constructed manually to represent normal and abnormal meanings. The selected label words should be in the vocabulary $V$ of the PLM $\mathcal{L}$. Then, a mapping called the verbalizer, which maps the label words to the class set $Y = \{0, 1\}$ (0 represents normal, 1 represents anomaly), can be manually constructed. Formally, the verbalizer can be represented as:

$$M : v \rightarrow y \tag{6}$$

where $v$ represents a word in the label words, that is, $v \in V_{label}$, and $y$ represents the category in the log anomaly detection task, that is, normal or anomaly. For example, label words such as "error" and "failure" can be used to indicate anomalies, and other words unrelated to anomalies, such as "test" and "size", can be used to indicate normal instances.

C. Embeddings
Logs are unstructured text data. Thus, many works use word embeddings to represent the semantic information of logs. Word2Vec [15] is a non-contextual embedding method that mainly includes skip-gram and continuous bag of words. It has two main limitations. The first limitation is that the embeddings are static, which means the embedding of a word is always the same regardless of its context. Thus, it fails to model polysemous words. For example, the word "block" may refer to a data block in HDFS, or to preventing something from happening, and Word2Vec fails to distinguish between the two. The second is the out-of-vocabulary problem, which means it cannot represent words that do not appear in the vocabulary.
Language models such as ELMo [16], GPT [17] and BERT [5] were proposed to address the limitations of non-contextual embeddings. They use contextual embeddings to capture the semantic information of words in different contexts. Moreover, they can effectively alleviate the out-of-vocabulary problem by pre-training on a large-scale corpus.
The contextual embeddings produced by PLMs are powerful. Thus, we leverage BERT-based models to generate log semantic and sequential embeddings by inputting the log texts and the log event execution sequences, respectively. The main reason for modeling the two kinds of embeddings is that system anomalies can be mainly divided into point and conditional anomalies [18]. Point anomalies are usually detected at the semantic level of log texts, while conditional anomalies are usually detected from the log event execution sequences. We combine them to improve the robustness of LogPrompt.

D. Prompt Tuning
Prompt-based tuning enables the PLM to better understand the log anomaly detection task, learn the representations of normal and abnormal logs, and distinguish them.
First, the semantic sequence $x_{sem}$ and execution sequence $x_{seq}$ are filled into the placeholders <SEM> and <SEQ> of the prompt template $T$, respectively. Then, the prompt sequence $x_{in}$ is obtained:

$$x_{in} = \{[PRO]_1, \ldots, x_{sem}, [PRO]_i, \ldots, x_{seq}, [PRO]_j, \ldots, [MSK]\} \tag{7}$$

Then, the PLM $\mathcal{L}$ predicts the [MSK] in $x_{in}$, that is, $P([MSK] = v \mid x_{in})$ is calculated to indicate which word $v$ in $V_{label}$ is the best substitute for [MSK]. Finally, the word $v$ with the highest probability is selected, and the result $y$ is obtained according to the mapping $M$:

$$y = M\left(\arg\max_{v \in V_{label}} P([MSK] = v \mid x_{in})\right) \tag{8}$$

For example, a log semantic sequence and an execution sequence are represented as $x_{sem}$ = {Receiving, block, src, dest, NameSystem, allocateBlock} and $x_{seq}$ = {bbb51b95, 3d91fa85}. They are filled into the given prompt template $T$ = {semantic, <SEM>, sequential, <SEQ>, it, is, [MSK]} to obtain $x_{in}$ = {semantic, $x_{sem}$, sequential, $x_{seq}$, it, is, [MSK]}. The PLM $\mathcal{L}$ embeds $x_{in}$ and predicts the [MSK] token. We define the verbalizer as $M$ = {{error, failure} → 1, {test, size} → 0}. If the prediction of [MSK] is "error", then the verbalizer maps it to the anomaly label. If the prediction of [MSK] is "test", then it is mapped to the normal label. This case study is shown in Fig. 2.

E. Loss Function
In the training stage, the PLM learns the words that represent normal or anomaly through prompts and performs anomaly detection. For supervised training, a loss function is used to evaluate the model and to guide its parameter updates. The target of the log anomaly detection task is to detect whether an anomaly occurs in the system through a log entry or a log execution sequence. In essence, log anomaly detection is a binary classification task. Thus, most deep learning-based models for log anomaly detection use cross entropy loss as the loss function. For the log anomaly detection task, the cross entropy loss function is defined as follows:

$$L_{CE} = -y\log(p) - (1 - y)\log(1 - p) = \begin{cases} -\log(p), & \text{if } y = 1 \\ -\log(1 - p), & \text{if } y = 0 \end{cases} \tag{9}$$

where $y$ represents the ground-truth label of a single log entry or a log event execution sequence, $y = 1$ indicates that the log is abnormal, $y = 0$ indicates that the log is normal, and $p$ represents the probability that the log is predicted as abnormal.
However, abnormal logs are rarer than normal logs in the real world. Thus, capturing the semantic words that represent system anomalies is more difficult. To alleviate the impact of class imbalance, a weight factor $\alpha$ is added to the cross entropy loss function to obtain the balanced cross entropy loss function, which is defined as follows:

$$L_{BCE} = -y\alpha\log(p) - (1 - y)(1 - \alpha)\log(1 - p) = \begin{cases} -\alpha\log(p), & \text{if } y = 1 \\ -(1 - \alpha)\log(1 - p), & \text{if } y = 0 \end{cases} \tag{10}$$
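Eq. (9) and Eq. (10) can be checked numerically with a short per-sample sketch. The function names, the clamping constant `eps` and the sample values are assumptions made for illustration; `alpha` is the weight factor of Eq. (10), taken in [0, 1].

```python
import math


def cross_entropy(y: int, p: float, eps: float = 1e-12) -> float:
    """Eq. (9): y is the ground-truth label, p the predicted anomaly probability."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p) if y == 1 else -math.log(1.0 - p)


def balanced_cross_entropy(y: int, p: float, alpha: float, eps: float = 1e-12) -> float:
    """Eq. (10): abnormal samples are weighted by alpha, normal ones by 1 - alpha."""
    p = min(max(p, eps), 1.0 - eps)
    if y == 1:
        return -alpha * math.log(p)
    return -(1.0 - alpha) * math.log(1.0 - p)


# Hypothetical values: an abnormal log predicted with probability 0.5.
print(cross_entropy(1, 0.5))
print(balanced_cross_entropy(1, 0.5, alpha=0.75))
```

With `alpha` above 0.5, a misclassified abnormal log contributes more to the loss than a misclassified normal log at the same confidence, which is how the weighting counteracts the scarcity of abnormal samples.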
[Fig. 2 (case study). Panel 1, raw logs parsed into the fields Date, Time, ..., Event ID and Event Template: "08/11/09 20:35:19 ... bbb51b9 Receiving block <*> src: /<*> dest: /<*>" and "08/11/09 20:35:20 ... 3d91fa85 BLOCK* NameSystem.allocateBlock: <*> <*>". Panel 3, prompt: "semantic Receiving block src dest NameSystem allocateBlock sequential bbb51b9 3d91fa85 it is [MSK]", followed by anomaly detection by the PLM and verbalizer.]
where $y$ and $p$ are defined as above, and $\alpha \in [0, 1]$ is a weight factor used to adjust the imbalance between normal logs and abnormal logs.
We hope that the model can pay more attention to the logs that are difficult to classify, in order to improve the accuracy of classification. Focal loss [19] further improves the balanced cross entropy loss by adding the modulating factors $(1 - p)^{\gamma}$ and $p^{\gamma}$, where $\gamma$ is a focusing parameter. This not only alleviates the log class imbalance problem, but also makes the model focus more on the hard-to-classify logs during training. Focal loss is defined as follows:

$$L_{F} = -y\alpha(1 - p)^{\gamma}\log(p) - (1 - y)(1 - \alpha)p^{\gamma}\log(1 - p) = \begin{cases} -\alpha(1 - p)^{\gamma}\log(p), & \text{if } y = 1 \\ -(1 - \alpha)p^{\gamma}\log(1 - p), & \text{if } y = 0 \end{cases} \tag{11}$$

where $y$, $p$ and $\alpha$ are defined as above, and $\gamma$ is the adjustable focusing parameter.

V. EXPERIMENTS
A. Datasets
Loghub [20] is an open-source platform that collects many real-world log datasets, including supercomputer, distributed and operating system logs. The HDFS and BGL datasets are selected as experimental datasets in this paper.
• The HDFS dataset is generated by MapReduce jobs (such as distributed sorting and text scanning) running on 203 Amazon EC2 nodes. It includes 11,197,954 log entries, about 2.9% of which are abnormal logs. Based on the identifier block_id in the HDFS dataset, the entries can be grouped into 575,060 log sequences. Since the anomalies in the HDFS dataset are labeled based on block_id, it is necessary to consider point anomalies and conditional anomalies that may be included in the log event execution sequences of the corresponding block_id. In general, point anomalies can be detected by the semantic information of log texts, and conditional anomalies are detected by the log execution sequences.
• The BGL dataset is an open dataset of logs collected from the BlueGene/L supercomputer system at Lawrence Livermore National Labs (LLNL). It contains fine-grained labels, so it is widely used for the log anomaly detection task. Compared with the HDFS dataset, the BGL dataset does not have identifiers.
For each dataset, logs are randomly sampled to build a training set that contains 10,000 log sequences for few-shot learning, since the whole datasets are much larger. The whole datasets are used to evaluate the performance of LogPrompt and compare it with other log anomaly detection frameworks.

TABLE I
DETAILED STATISTICAL INFORMATION OF THE DATASETS USED IN THE EXPERIMENTS

Dataset   # Training set (ano.)   # Testing set (ano.)
HDFS      10,000 (298)            575,060 (16,838)
BGL       10,000 (759)            4,747,963 (348,460)

B. Experimental Design and Results
The performance of log anomaly detection is usually evaluated by four metrics: precision, recall, F1 score and accuracy. We conduct comparative experiments to explore the influence of different parameter settings and to compare with previous work, and conduct an ablation study to demonstrate the effectiveness of leveraging semantic and sequential information in LogPrompt. We put forward research questions (RQs) and answer these RQs through experiments.
RQ1: What are the advantages of LogPrompt compared with other log anomaly detection frameworks?
To evaluate the performance of LogPrompt, we compared it with four other log anomaly detection frameworks based
on deep learning, namely, DeepLog, LogAnomaly, LogRobust and LogBERT. Table II shows the performance of these log anomaly detection frameworks evaluated on the two log datasets.

TABLE II
PERFORMANCE OF LOGPROMPT AND FOUR OTHER LOG ANOMALY DETECTION FRAMEWORKS

                         HDFS                              BGL
Framework                Precision↑  Recall↑  F1 score↑    Precision↑  Recall↑  F1 score↑
DeepLog                  0.8844      0.6949   0.7734       0.8974      0.8278   0.8612
LogAnomaly               0.9415      0.4047   0.5619       0.7312      0.7609   0.7408
LogRobust                0.9830      0.9480   0.9460       0.8970      0.7920   0.8410
LogBERT                  0.8702      0.7810   0.8232       0.8940      0.9232   0.9083
LogPrompt (SEM)          0.9832      0.7436   0.8459       0.9841      0.8741   0.9219
LogPrompt (SEQ)          0.6819      0.5783   0.5202       0.8842      0.8963   0.8902
LogPrompt (SEM + SEQ)    0.9965      0.9474   0.9713       0.9848      0.9753   0.9799

The experimental results of LogPrompt are obtained under the optimal parameter combination. ALBERT-base-v2 is used as the PLM, the prompt encoder type is Bi-LSTM, and other parameters are set to the default values of ALBERT-base-v2. In the experiment, three different random seeds are set and three models are trained accordingly, and each model is evaluated on the testing set after training. Table II reports the mean value of the metrics of the three models on the testing sets.
Table II shows that LogPrompt exceeds most baseline models on the HDFS dataset. It is only slightly lower than LogRobust in recall. It outperforms all baseline models on the BGL dataset. In addition, the ablation experiments show that if LogPrompt uses only the semantic or only the sequential information of the logs, most metrics are worse than those obtained using both the semantic and sequential information of the logs.
RQ2: What is the effect of using different PLMs on the performance of log anomaly detection?
We compare six different BERT, RoBERTa and ALBERT models to explore the impact of using different PLMs on the performance. Table III shows their performance on the two log datasets.

TABLE III
PERFORMANCE EFFECT OF DIFFERENT PLMS ON LOGPROMPT

                      HDFS                                         BGL
PLM                   Precision↑  Recall↑  F1 score↑  Accuracy↑    Precision↑  Recall↑  F1 score↑  Accuracy↑
BERT-base-uncased     0.9964      0.8911   0.9406     0.9949       0.9861      0.9367   0.9607     0.9941
BERT-large-uncased    0.9981      0.5909   0.7065     0.9813       1.0000      0.7542   0.8599     0.9817
RoBERTa-base          0.9975      0.8938   0.9426     0.9951       0.9843      0.9363   0.9597     0.9939
RoBERTa-large         0.9811      0.6982   0.8092     0.9857       0.9649      0.8555   0.9030     0.9867
ALBERT-base-v1        0.9968      0.8938   0.9419     0.9951       0.9872      0.9607   0.9737     0.9961
ALBERT-base-v2        0.9965      0.9474   0.9713     0.9963       0.9848      0.9753   0.9799     0.9970

The performance of different PLMs may differ across datasets. For example, RoBERTa-large has good performance on the BGL dataset, but it has poor performance on the HDFS dataset. We also find that light models are better than large models. For example, BERT-base-uncased and RoBERTa-base achieve better recall than BERT-large-uncased and RoBERTa-large, respectively. Moreover, ALBERT-base-v1 and ALBERT-base-v2 have better performance than the BERT and RoBERTa models. One possible reason is that they compress the model size and reduce the number of parameters through cross-layer parameter sharing and factorized embedding parameterization.
RQ3: What is the effect of using different kinds of prompts on the performance of log anomaly detection?
We use the PLM ALBERT-base-v2 for the experiments. On the one hand, a continuous template is constructed automatically based on Bi-LSTM. On the other hand, discrete templates are constructed manually. The prompt designs are shown in Table IV and the experimental results are shown in Table V.

TABLE IV
DESIGN OF PROMPTS

Prompt     Template                           Label words
Bi-LSTM    <SEM> <SEQ> h[PRO] h[PRO] [MSK]    Anomaly: "error"; Normal: "normal"
Manual-0   <SEM> <SEQ> it is [MSK]            Anomaly: "error"; Normal: "normal"
Manual-1   <SEM> <SEQ> it is [MSK]            Anomaly: "normal"; Normal: "error"
Manual-2   <SEM> <SEQ> anomaly ? [MSK]        Anomaly: "yes"; Normal: "no"
None       <SEM> <SEQ> [MSK]                  Anomaly: "error"; Normal: "normal"

The experimental results show that the precision, F1 score and accuracy of constructing a continuous template automatically based on Bi-LSTM are higher than those of constructing
discrete templates manually (Manual-0, Manual-1 and Manual-2) on the HDFS dataset. On the BGL dataset, automatically constructing continuous templates outperforms manually constructing discrete templates in F1 score and accuracy. In conclusion, constructing continuous templates automatically is feasible and effective, and the tedious procedure of manually constructing templates is avoided.

TABLE V
PERFORMANCE OF USING DIFFERENT KINDS OF PROMPTS

            HDFS                                         BGL
Prompt      Precision↑  Recall↑  F1 score↑  Accuracy↑    Precision↑  Recall↑  F1 score↑  Accuracy↑
Bi-LSTM     0.9965      0.9474   0.9713     0.9963       0.9848      0.9753   0.9799     0.9970
Manual-0    0.8982      0.9849   0.9112     0.9947       0.9680      0.9734   0.9702     0.9955
Manual-1    0.9863      0.8219   0.8960     0.9857       0.9863      0.8219   0.8960     0.9857
Manual-2    0.9759      0.9420   0.9586     0.9961       0.9733      0.9795   0.9764     0.9964
None        0.3440      0.8456   0.4891     0.9196       0.6320      0.0178   0.0346     0.9253

Comparing Manual-0 with Manual-1, recall and F1 score decrease when the label word mapping is exchanged. This suggests that the model misunderstands the semantic information of the logs when the label words are used with meanings opposite to the ground-truth labels. For example, the label word "error" might be interpreted as having a positive meaning and the log classified as a normal instance.
If [MSK] is predicted without other prompt tokens (None), then the metrics decrease significantly, which implies that prompt construction is crucial to the log anomaly detection task.
RQ4: Does focal loss alleviate the log class imbalance problem compared with cross entropy loss?
We construct the continuous template automatically based on Bi-LSTM, and use cross entropy loss and focal loss on the PLM ALBERT-base-v2 to conduct anomaly detection experiments. The experimental results are shown in Table VI.

TABLE VI
PERFORMANCE OF USING DIFFERENT KINDS OF LOSS FUNCTIONS

Dataset   Loss function        Precision↑  Recall↑  F1 score↑  Accuracy↑
HDFS      Cross entropy loss   0.8982      0.9849   0.9112     0.9947
HDFS      Focal loss           0.9965      0.9474   0.9713     0.9963
BGL       Cross entropy loss   0.9966      0.8861   0.9359     0.9912
BGL       Focal loss           0.9848      0.9753   0.9799     0.9970

For the HDFS dataset, precision, F1 score and accuracy are higher when using focal loss than when using cross entropy loss, but recall is lower. For the BGL dataset, although cross entropy loss yields a higher precision than focal loss, it yields lower recall and F1 score. Overall, using focal loss can alleviate the log class imbalance problem to a certain extent and improve the overall performance of anomaly detection.
We visualize the output logits when evaluating on the testing set of the BGL dataset, as shown in Fig. 3.

[Fig. 3 panels: (a) predictions using cross entropy loss; (b) ground truths using cross entropy loss; (c) predictions using focal loss; (d) ground truths using focal loss.]

Fig. 3(a) shows the predictions of ALBERT-base-v2 trained with cross entropy loss. The region to the upper left of the line y = x is classified as anomaly, and the region to the lower right is classified as normal. Fig. 3(b) shows the ground-truth labels of the logs. We can observe that many anomalies are classified as normal.
Fig. 3(c) shows the predictions of ALBERT-base-v2 trained with focal loss. The region to the upper left of the line y = x is classified as anomaly, and the region to the lower right is classified as normal. Fig. 3(d) shows the ground-truth labels of the logs. Compared with cross entropy loss, the number of misclassified logs is greatly reduced.
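The focal loss compared against cross entropy loss in RQ4 can be sketched per sample from its definition in Eq. (11). This is a minimal illustration rather than the training code used in the experiments. The function name, the clamping constant `eps` and the default values `alpha=0.25` and `gamma=2.0` are assumptions; the settings used in the experiments are not stated here.

```python
import math


def focal_loss(y: int, p: float, alpha: float = 0.25, gamma: float = 2.0,
               eps: float = 1e-12) -> float:
    """Eq. (11): y is the ground-truth label (1 = abnormal), p the predicted
    probability of the abnormal class. alpha and gamma defaults are assumed."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    if y == 1:
        # The factor (1 - p)^gamma shrinks the loss of well-classified abnormal logs.
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    # The factor p^gamma shrinks the loss of well-classified normal logs.
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)


# Hypothetical predictions for two abnormal logs: one easy, one hard.
easy = focal_loss(1, 0.9)
hard = focal_loss(1, 0.1)
print(easy, hard, hard / easy)
```

Setting `gamma=0` removes the modulating factor and recovers the balanced cross entropy loss of Eq. (10), so the focusing parameter is the only difference between the two losses.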
In conclusion, compared with using cross entropy loss, using focal loss can detect anomalies more accurately. Moreover, the intra-class distance within the normal and anomaly class clusters is smaller and the inter-class distance between them is larger, which greatly reduces the possibility of predicting anomalous logs as normal logs.

VI. CONCLUSION
This paper proposes LogPrompt, which is a prompt-based log anomaly detection framework. It constructs prompts to guide the PLM to learn semantic and sequential information in the logs, improving the evaluation metrics of the log anomaly detection task. Focal loss is used instead of cross entropy loss to guide model optimization during training, which alleviates the imbalance between normal and abnormal log samples. Experiments show that LogPrompt can detect anomalies more effectively and efficiently by learning semantic and sequential information from prompts, even when trained with little log data.

REFERENCES
[1] G. Pang, C. Shen, L. Cao, and A. V. D. Hengel, “Deep learning for anomaly detection: A review,” ACM Computing Surveys (CSUR), vol. 54, no. 2, pp. 1–38, 2021.
[2] M. Du, F. Li, G. Zheng, and V. Srikumar, “Deeplog: Anomaly detection and diagnosis from system logs through deep learning,” in Proceedings of the 2017 ACM SIGSAC conference on computer and communications security, 2017, pp. 1285–1298.
[3] W. Meng, Y. Liu, Y. Zhu, S. Zhang, D. Pei, Y. Liu, Y. Chen, R. Zhang, S. Tao, P. Sun et al., “Loganomaly: Unsupervised detection of sequential and quantitative anomalies in unstructured logs,” in IJCAI, vol. 19, no. 7, 2019, pp. 4739–4745.
[4] X. Zhang, Y. Xu, Q. Lin, B. Qiao, H. Zhang, Y. Dang, C. Xie, X. Yang, Q. Cheng, Z. Li et al., “Robust log-based anomaly detection on unstable log data,” in Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019, pp. 807–817.
[5] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
[6] H. Guo, S. Yuan, and X. Wu, “Logbert: Log anomaly detection via bert,” in 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021, pp. 1–8.
[7] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “Roberta: A robustly optimized bert pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.
[8] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut, “Albert: A lite bert for self-supervised learning of language representations,” arXiv preprint arXiv:1909.11942, 2019.
[9] P. Liu, W. Yuan, J. Fu, Z. Jiang, H. Hayashi, and G. Neubig, “Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing,” arXiv preprint arXiv:2107.13586, 2021.
[10] N. Zhang, S. Deng, Z. Sun, J. Chen, W. Zhang, and H. Chen, “Relation adversarial network for low resource knowledge graph completion,” in Proceedings of The Web Conference 2020, 2020, pp. 1–12.
[11] T. Schick and H. Schütze, “Few-shot text generation with natural language instructions,” in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 390–402.
[12] X. Liu, Y. Zheng, Z. Du, M. Ding, Y. Qian, Z. Yang, and J. Tang, “Gpt understands, too,” arXiv preprint arXiv:2103.10385, 2021.
[13] M. Du and F. Li, “Spell: Streaming parsing of system event logs,” in 2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE, 2016, pp. 859–864.
[14] P. He, J. Zhu, Z. Zheng, and M. R. Lyu, “Drain: An online log parsing approach with fixed depth tree,” in 2017 IEEE international conference on web services (ICWS). IEEE, 2017, pp. 33–40.
[15] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” Advances in neural information processing systems, vol. 26, 2013.
[16] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, “Deep contextualized word representations,” in Proceedings of NAACL-HLT, 2018, pp. 2227–2237.
[17] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever et al., “Language models are unsupervised multitask learners,” OpenAI blog, vol. 1, no. 8, p. 9, 2019.
[18] R. Chalapathy and S. Chawla, “Deep learning for anomaly detection: A survey,” arXiv preprint arXiv:1901.03407, 2019.
[19] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2980–2988.
[20] S. He, J. Zhu, P. He, and M. R. Lyu, “Loghub: a large collection of system log datasets towards automated log analytics,” arXiv preprint arXiv:2008.06448, 2020.