
2017 IEEE 24th International Conference on Web Services

Drain: An Online Log Parsing Approach with Fixed Depth Tree

Pinjia He∗, Jieming Zhu∗, Zibin Zheng†, and Michael R. Lyu∗
∗Computer Science and Engineering Department, The Chinese University of Hong Kong, China
{pjhe, jmzhu, lyu}@cse.cuhk.edu.hk
†Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University), Ministry of Education
School of Data and Computer Science, Sun Yat-sen University, China
[email protected]

Abstract—Logs, which record valuable system runtime information, have been widely employed in Web service management by service providers and users. A typical log analysis based Web service management procedure is to first parse raw log messages because of their unstructured format, and then apply data mining models to extract critical system behavior information, which can assist Web service management. Most of the existing log parsing methods focus on offline, batch processing of logs. However, as the volume of logs increases rapidly, model training of offline log parsing methods, which employs all existing logs after log collection, becomes time consuming. To address this problem, we propose an online log parsing method, namely Drain, that can parse logs in a streaming and timely manner. To accelerate the parsing process, Drain uses a fixed depth parse tree, which encodes specially designed rules for parsing. We evaluate Drain on five real-world log data sets with more than 10 million raw log messages. The experimental results show that Drain has the highest accuracy on four data sets, and comparable accuracy on the remaining one. Besides, Drain obtains 51.85%∼81.47% improvement in running time compared with the state-of-the-art online parser. We also conduct a case study on an anomaly detection task using Drain in the parsing step, which demonstrates the effectiveness of Drain in log analysis.

Index Terms—Log parsing; Online algorithm; Log analysis; Web service management

978-1-5386-0752-7/17 $31.00 © 2017 IEEE    DOI 10.1109/ICWS.2017.13

I. INTRODUCTION

The prevalence of cloud computing, which enables on-demand service delivery, has made Service-oriented Architecture (SOA) a dominant architectural style. Nowadays, more and more developers leverage existing Web services to build their own systems because of their rich functionality and “plug-and-play” property. Although developing Web service based systems is convenient and lightweight, Web service management is a significant challenge for both service providers and users. Specifically, service providers (e.g., Amazon EC2 [1]) are expected to provide services with no failures or SLA (service-level agreement) violations to a large number of users. Similarly, service users need to effectively and efficiently manage the adopted services, which has been discussed in many recent works (e.g., Web service monitoring [2]). In this context, log analysis based service management techniques, which employ service logs to achieve automatic or semi-automatic service management, have been widely studied.

Logs are usually the only data resource available that records service runtime information. In general, a log message is a line of text printed by logging statements (e.g., printf(), logging.info()) written by developers. Thus, log analysis techniques, which apply data mining models to get insights into system behaviors, are in widespread use for service management. For service providers, there are studies in anomaly detection [3], [4], fault diagnosis [5], [6] and performance improvement [7]. For service users, typical examples include business model mining [8], [9] and user behavior analysis [10], [11].

Most of the data mining models used in these log analysis techniques require structured input (e.g., an event list or a matrix). However, raw log messages are usually unstructured, because developers are allowed to write free-text log messages in source code. Thus, the first step of log analysis is log parsing, where unstructured log messages are transformed into structured events. An unstructured log message, as in the following example, usually contains various forms of system runtime information: timestamp (records the occurring time of an event), verbosity level (indicates the severity level of an event, e.g., INFO), and raw message content (free-text description of a service operation).

081109 204655 556 INFO dfs.DataNode$PacketResponder: Received block blk_3587508140051953248 of size 67108864 from /10.251.42.84

Traditionally, log parsing relies heavily on regular expressions [12], which are designed and maintained manually by developers. However, this manual method is not suitable for logs generated by modern services for the following three reasons. First, the volume of logs is increasing rapidly, which makes the manual method prohibitive. For example, a large-scale service system can generate 50 GB of logs (120∼200 million lines) per hour [13]. Second, as open-source platforms (e.g., GitHub) and Web services become popular, a system often consists of components written by hundreds of developers globally [3]. Thus, the people in charge of the regular expressions may not know the original logging purpose, which makes manual management even harder. Third, logging statements in modern systems update frequently (e.g., hundreds of new logging statements every month [14]). In order to maintain a correct regular expression set, developers need to check all logging statements regularly, which is tedious and error-prone.

Log parsing has been widely studied to parse raw log messages automatically. Most existing log parsers focus on offline, batch processing. For example, Xu et al. [3] design a method to automatically generate regular expressions based on source code. However, source code is often inaccessible in practice (e.g., for Web service components). For general log parsing, recent studies propose data-driven methods [4], [15], which directly extract log templates from raw log messages. These log parsers are offline, and limited by the memory of a single computer. Besides, they fail to align with the log collecting manner. A typical log collection system has a log shipper installed on each node to forward log entries in a streaming manner to a centralized server that contains a log parser [16]. The offline log parsers need to employ all logs collected over a certain period (e.g., 1 h) for parser training. In contrast, an online log parser parses logs in a streaming manner, and it does not require an offline training step. Thus, current systems highly demand online log parsing, which has only been studied in a few preliminary works [16], [17]. However, we observe that the parsers proposed in these works are not accurate and efficient enough, which makes them not eligible for log parsing in modern Web services or Web service based systems.

In this paper, we propose an online log parsing method, namely Drain, that can accurately and efficiently parse raw log messages in a streaming manner. Drain does not require source code or any information other than raw log messages. Drain can automatically extract log templates from raw log messages and split them into disjoint log groups. It employs a parse tree with fixed depth to guide the log group search process, which effectively avoids constructing a very deep and unbalanced tree. Besides, specially designed parsing rules are compactly encoded in the parse tree nodes. We evaluate Drain on five real-world log data sets with more than 10 million raw log messages. Drain demonstrates the highest accuracy on four data sets, and comparable accuracy on the remaining one. Besides, Drain obtains 51.85%∼81.47% improvement in running time compared with the state-of-the-art online parser [16]. We also demonstrate the effectiveness of Drain in log analysis by tackling a real-world anomaly detection task [3]. In summary, our paper makes the following contributions:

• This paper presents the design of an online log parsing method (Drain), which encodes specially designed parsing rules in a parse tree with fixed depth.
• Extensive experiments have been conducted on five real-world log data sets, which demonstrate the superiority of Drain in terms of accuracy and efficiency.
• The source code of Drain has been publicly released [18], allowing for easy use by researchers and practitioners in future study.

The remainder of this paper is organized as follows. Section II presents an overview of the log parsing process. Section III describes our online log parsing method, Drain. We evaluate the performance of Drain in Section IV. Related work is introduced in Section V. Finally, we conclude this paper in Section VI.

II. OVERVIEW OF LOG PARSING

The goal of log parsing is to transform raw log messages into structured log messages, as described in Figure 1.

(raw log messages)
081109 204608 Receiving block blk_3587 src: /10.251.42.84:57069 dest: /10.251.42.84:50010
081109 204655 PacketResponder 0 for block blk_4003 terminating
081109 204655 Received block blk_3587 of size 67108864 from /10.251.42.84

    -- Log Parsing -->

(structured log messages)
blk_3587   Receiving block * src: * dest: *
blk_4003   PacketResponder * for block * terminating
blk_3587   Received block * of size * from *

Fig. 1: Overview of Log Parsing

Specifically, raw log messages are unstructured data, including timestamps and raw message contents. The raw log messages in Figure 1 are simplified HDFS raw log messages collected on the Amazon EC2 platform [3]. In the parsing process, a parser distinguishes between the constant part and the variable part of each raw log message. The constant part consists of tokens that describe a system operation template (i.e., log event), such as “Receiving block * src: * dest: *” in Figure 1, while the variable part is the remaining tokens (e.g., “blk_3587”) that carry dynamic runtime system information. A typical structured log message contains a matched log event and fields of interest (e.g., the HDFS block ID “blk_3587”). Typical log parsers [4], [15], [16], [17] regard log parsing as a clustering problem, where they cluster raw log messages with the same log event into a log group. The following section introduces our proposed log parser, which clusters the raw log messages into different log groups in a streaming manner.

III. METHODOLOGY

In this section, we briefly introduce Drain, a fixed depth tree based online log parsing method. When a new raw log message arrives, Drain will preprocess it with simple regular expressions based on domain knowledge. Then we search for a log group (i.e., a leaf node of the tree) by following the specially-designed rules encoded in the internal nodes of the tree. If a suitable log group is found, the log message will be matched with the log event stored in that log group. Otherwise, a new log group will be created based on the log message. In the following, we first introduce the structure of the fixed depth tree (i.e., parse tree). Then we explain how Drain parses raw log messages by searching the nodes of the parse tree.

A. Overall Tree Structure

When a raw log message arrives, an online log parser needs to search for the most suitable log group for it, or create a new log group. In this process, a simple solution is to compare the raw log message with the log event stored in each log group one by one. However, this solution is very slow because the number of log groups increases rapidly during parsing. To accelerate this process, we design a parse tree with fixed depth to guide the log group search, which effectively bounds the number of log groups that a raw log message needs to be compared with.

The parse tree is illustrated in Figure 2. The root node is in the top layer of the parse tree; the bottom layer contains the leaf nodes; other nodes in the tree are internal nodes.
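The fixed-depth organization described above can be sketched as a small data model. This is an illustrative assumption on our part (plain Python classes with hypothetical names such as `ParseTree.leaf_groups`), not the released Drain implementation:

```python
from dataclasses import dataclass, field

@dataclass
class LogGroup:
    # A leaf-level cluster: the current template plus the IDs of its members.
    log_event: list[str]                      # token list; "*" marks a variable position
    log_ids: list[int] = field(default_factory=list)

@dataclass
class ParseTree:
    # depth counts the layers above the leaves, as in Figure 2 (depth = 3 there).
    depth: int = 3
    root: dict = field(default_factory=dict)  # nested dicts acting as internal nodes

    def leaf_groups(self, length: int, first_tokens: list[str]) -> list:
        """Follow the 'Length: n' branch, then the first (depth - 2) tokens,
        down to the leaf's list of log groups (created on demand)."""
        node = self.root.setdefault(length, {})
        for tok in first_tokens[: self.depth - 2]:
            node = node.setdefault(tok, {})
        return node.setdefault("#groups", [])

tree = ParseTree(depth=3)
groups = tree.leaf_groups(4, ["Receive"])
groups.append(LogGroup(log_event="Receive from node *".split(), log_ids=[1]))
```

Because the leaf list is created lazily with `setdefault`, searching and inserting share the same traversal, which mirrors how an online parser must handle a log event it has never seen before.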

Root node and internal nodes encode specially-designed rules to guide the search process. They do not contain any log groups. Each path in the parse tree ends with a leaf node, which stores a list of log groups, and we only plot one leaf node here for simplicity. Each log group has two parts: log event and log IDs. The log event is the template that best describes the log messages in this group, which consists of the constant part of a log message. The log IDs record the IDs of the log messages in this group. One special design of the parse tree is that the depth of all leaf nodes is the same and is fixed by a predefined parameter depth. For example, the depth of the leaf nodes in Figure 2 is fixed to 3. This parameter bounds the number of nodes Drain visits during the search process, which greatly improves its efficiency. Besides, to avoid tree branch explosion, we employ a parameter maxChild, which restricts the maximum number of children of a node. In the following, for clarity, we define an n-th layer node as a node whose depth is n. Besides, unless otherwise stated, we use the parse tree in Figure 2 as an example in the following explanation.

Fig. 2: Structure of Parse Tree in Drain (depth = 3). The root node links to 1-st layer internal nodes such as “Length: 4”, “Length: 5”, ..., “Length: 10”; these link to 2-nd layer internal nodes such as “Send”, “Receive”, “Starting”, “*”; each path ends with a leaf node holding a list of log groups (e.g., Log Event: “Receive from node *”, Log IDs: [1, 23, 25, 46, 345, ...]).

B. Step 1: Preprocess by Domain Knowledge

According to our previous empirical study on existing log parsing methods [19], preprocessing can improve parsing accuracy. Thus, before employing the parse tree, we preprocess the raw log message when it arrives. Specifically, Drain allows users to provide simple regular expressions based on domain knowledge that represent commonly-used variables, such as IP addresses and block IDs. Then Drain will remove the tokens matched by these regular expressions from the raw log message. For example, block IDs in Figure 1 will be removed by “blk_[0-9]+”.

The regular expressions employed in this step are often very simple, because they are used to match tokens instead of log messages. Besides, a data set usually requires only a few such regular expressions. For example, the data sets used in our evaluation section require at most two such regular expressions.

C. Step 2: Search by Log Message Length

In this step and step 3, we explain how we traverse the parse tree according to the encoded rules and finally find a leaf node.

Drain starts from the root node of the parse tree with the preprocessed log message. The 1-st layer nodes in the parse tree represent log groups whose log messages are of different log message lengths. By log message length, we mean the number of tokens in a log message. In this step, Drain selects a path to a 1-st layer node based on the log message length of the preprocessed log message. For example, for the log message “Receive from node 4”, Drain traverses to the internal node “Length: 4” in Figure 2. This is based on the assumption that log messages with the same log event will probably have the same log message length. Although it is possible that log messages with the same log event have different log message lengths, this can be handled by simple postprocessing. Besides, our experiments in Section IV-B demonstrate the superiority of Drain in terms of parsing accuracy even without postprocessing.

D. Step 3: Search by Preceding Tokens

In this step, Drain traverses from the 1-st layer node searched in step 2 to a leaf node. This step is based on the assumption that tokens in the beginning positions of a log message are more likely to be constants. Specifically, Drain selects the next internal node by the tokens in the beginning positions of the log message. For example, for the log message “Receive from node 4”, Drain traverses from the 1-st layer node “Length: 4” to the 2-nd layer node “Receive” because the token in the first position of the log message is “Receive”. Then Drain will traverse to the leaf node linked with the internal node “Receive”, and go to step 4.

The number of internal nodes that Drain traverses in this step is (depth − 2), where depth is the parse tree parameter restricting the depth of all leaf nodes. Thus, there are (depth − 2) layers that encode the first (depth − 2) tokens in the log messages as search rules. In the example above, we use the parse tree in Figure 2 for simplicity, whose depth is 3, so we search by only the token in the first position. In practice, Drain can consider more preceding tokens with larger depth settings. Note that if depth is 2, Drain considers only the first layer used by step 2.

In some cases, a log message may start with a parameter, for example, “120 bytes received”. These kinds of log messages can lead to branch explosion in the parse tree, because each parameter (e.g., 120) would be encoded in an internal node. To avoid branch explosion, we only consider tokens that do not contain digits in this step. If a token contains digits, it will match a special internal node “*”. For example, for the log message above, Drain will traverse to the internal node “*” instead of “120”. Besides, we also define a parameter maxChild, which restricts the maximum number of children of a node. If a node already has maxChild children, any non-matched tokens will match the special internal node “*” among all its children.

E. Step 4: Search by Token Similarity

Before this step, Drain has traversed to a leaf node, which contains a list of log groups. The log messages in these log groups comply with the rules encoded in the internal nodes along the path. For example, the log group in Figure 2 has log event “Receive from node *”, where the log messages contain 4 tokens and start with the token “Receive”.

In this step, Drain selects the most suitable log group from the log group list. We calculate the similarity simSeq between the log message and the log event of each log group. simSeq is defined as follows:

    simSeq = ( Σ_{i=1}^{n} equ(seq1(i), seq2(i)) ) / n,    (1)

where seq1 and seq2 represent the log message and the log event respectively; seq(i) is the i-th token of the sequence; n is the log message length of the sequences; and the function equ is defined as follows:

    equ(t1, t2) = 1 if t1 = t2, and 0 otherwise,    (2)

where t1 and t2 are two tokens. After finding the log group with the largest simSeq, we compare it with a predefined similarity threshold st. If simSeq ≥ st, Drain returns the group as the most suitable log group. Otherwise, Drain returns a flag (e.g., None in Python) to indicate that there is no suitable log group.

F. Step 5: Update the Parse Tree

If a suitable log group is returned in step 4, Drain will add the log ID of the current log message to the log IDs in the returned log group. Besides, the log event in the returned log group will be updated. Specifically, Drain scans the tokens in the same positions of the log message and the log event. If the two tokens are the same, we do not modify the token in that token position. Otherwise, we update the token in that token position to a wildcard (i.e., *) in the log event.

If Drain cannot find a suitable log group, it creates a new log group based on the current log message, where the log IDs contain only the ID of the log message and the log event is exactly the log message. Then, Drain will update the parse tree with the new log group. Intuitively, Drain traverses from the root node to the leaf node that should contain the new log group, and adds the missing internal nodes and leaf node accordingly along the path. For example, assume the current parse tree is the tree on the left-hand side of Figure 3, and a new log message “Receive 120 bytes” arrives. Then Drain will update the parse tree to the tree on the right-hand side of Figure 3. Note that the new internal node in the 3-rd layer is encoded as “*” because the token “120” contains digits.

Fig. 3: Parse Tree Update Example (depth = 4). The left tree has a single path Root → “Length: 3” → “Send” → “block”, ending in a leaf with log group (Log Event: “Send block 44”, Log IDs: [1]); in the right tree, a new path Root → “Length: 3” → “Receive” → “*” has been added, ending in a leaf with log group (Log Event: “Receive 120 bytes”, Log IDs: [2]).

IV. EVALUATION

A. Experimental Settings

1) Log Data Sets: The log data sets used in our evaluation are summarized in Table I. These five real-world data sets range from supercomputer logs (BGL and HPC) to distributed system logs (HDFS and Zookeeper) to standalone software logs (Proxifier). Companies rarely release their log data to the public, because it may violate confidentiality clauses. We obtained three log data sets from other researchers with their generous support. Specifically, BGL is a log data set collected by Lawrence Livermore National Labs (LLNL) from the BlueGene/L supercomputer system [20]. HPC logs are collected from a high performance cluster, which has 49 nodes with 6,152 cores and 128GB memory per node [21]. HDFS is a log data set collected from a 203-node cluster on the Amazon EC2 platform in [3]. We also collect two log data sets for evaluation. One is collected from Zookeeper installed on a 32-node cluster in our lab. The other consists of logs of the standalone software Proxifier.

TABLE I: Summary of Log Data Sets

System     Description                             #Log Messages   Log Message Length   #Events
BGL        BlueGene/L Supercomputer                4,747,963       10~102               376
HPC        High Performance Cluster (Los Alamos)   433,490         6~104                105
HDFS       Hadoop File System                      11,175,629      8~29                 29
Zookeeper  Distributed System Coordinator          74,380          8~27                 80
Proxifier  Proxy Client                            10,108          10~27                8

2) Comparison: To prove the effectiveness of Drain, we compare its performance with four existing log parsing methods in terms of accuracy, efficiency and effectiveness on subsequent log mining tasks. Specifically, two of them are offline log parsers, and the other two are online log parsers. The ideas of these log parsers are briefly introduced as follows:

• LKE [4]: This is an offline log parsing method developed by Microsoft. It employs hierarchical clustering and heuristic rules.
• IPLoM [15]: IPLoM conducts a three-step hierarchical partitioning before template generation in an offline manner.
• SHISO [17]: In this online parser, a tree with a predefined number of children in each node is used to guide log group searching.

36
• Spell [16]: This method uses the longest common subsequence to search for log groups in an online manner. It accelerates the searching process by subsequence matching and a prefix tree.

3) Evaluation Metric and Experimental Setup: We use the F-measure [22], [23], which is a typical evaluation metric for clustering algorithms, to evaluate the accuracy of log parsing methods. The definition of accuracy is as follows:

    Accuracy = 2 * Precision * Recall / (Precision + Recall),    (3)

where Precision and Recall are defined as follows:

    Precision = TP / (TP + FP),    (4)

    Recall = TP / (TP + FN),    (5)

where a true positive (TP) decision assigns two log messages with the same log event to the same log group; a false positive (FP) decision assigns two log messages with different log events to the same log group; and a false negative (FN) decision assigns two log messages with the same log event to different log groups. This evaluation metric is also used in our previous study [19] on existing log parsers.

TABLE II: Parameter Setting of Drain

        BGL   HPC   HDFS   Zookeeper   Proxifier
depth   3     4     3      3           4
st      0.3   0.4   0.5    0.3         0.3

We run all experiments on a Linux server with an Intel Xeon E5-2670v2 CPU and 128GB DDR3 1600 RAM, running 64-bit Ubuntu 14.04.2 with Linux kernel 3.16.0. We run each experiment 10 times to avoid bias. For the preprocessing step of Drain (step 1), we remove obvious parameters in the log messages (i.e., IP addresses in HPC, Zookeeper and HDFS, core IDs in BGL, block IDs in HDFS, and application IDs in Proxifier). The parameter setting of Drain is shown in Table II. Besides, we empirically set maxChild to 100 for all experiments. The number of children of a tree node rarely exceeds maxChild, because the encoded rules in the parse tree can already distribute the logs evenly to different paths. We also re-tune the parameters of the other log parsers to optimize their performance, which is not presented here because of the space limit. We put them in our released source code [18] for further reference.

B. Accuracy of Drain

Accuracy demonstrates how well a log parser matches raw log messages with the correct log events. Accuracy is important because parsing errors can degrade the performance of subsequent log mining tasks. Intuitively, an offline log parsing method could obtain higher accuracy compared with an online one, because an offline method enjoys all raw log messages at the beginning of parsing, while an online method adjusts its parsing model gradually in the parsing process.

In this section, we evaluate the accuracy of two offline and two online log parsing methods on the data sets described in Table I. The evaluation results are in Table III. LKE fails to handle the data sets except Proxifier, because its O(n^2) time complexity makes it too slow for the other data sets. Thus, for the other four data sets, as with the existing work [19], [24], we evaluate LKE's accuracy on sample data sets with 2k log messages randomly extracted from the original ones, while all parsers are evaluated on the 2k sample data sets as in our previous paper [19].

TABLE III: Parsing Accuracy of Log Parsing Methods

                      BGL    HPC    HDFS   Zookeeper   Proxifier
Offline Log Parsers
LKE                   0.67   0.17   0.57   0.78        0.85
IPLoM                 0.99   0.65   0.99   0.99        0.85
Online Log Parsers
SHISO                 0.87   0.53   0.93   0.68        0.85
Spell                 0.98   0.82   0.87   0.99        0.87
Drain                 0.99   0.84   0.99   0.99        0.86

We observe that the proposed online parsing method, namely Drain, obtains the best accuracy on four data sets, even compared with the offline log parsing methods. For the data set Proxifier, Drain also has the second best accuracy (i.e., 0.86), and it is comparable to Spell, which obtains the highest accuracy (0.87) on this data set. LKE is not that good on some data sets, because it employs an aggressive clustering strategy, which can lead to under-partitioning. IPLoM obtains high accuracy on most data sets because of its specially-designed heuristic rules. SHISO uses the similarity of characters in log messages to search for the corresponding log events. This strategy is too coarse-grained, which causes inaccuracy. Spell is accurate, but its strategy, based only on the longest common subsequence, can lead to under-partitioning.

Drain has the overall best accuracy for three reasons. First, it compounds both the log message length and the first few tokens, which are effective and specially-designed rules, to construct the fixed depth tree. Second, Drain only uses tokens that do not contain digits to guide the searching process, which effectively avoids over-partitioning. Third, the tunable tree depth and similarity threshold st allow users to conduct fine-grained tuning on different data sets.

TABLE IV: Running Time (Sec) of Log Parsing Methods

                      BGL        HPC      HDFS      Zookeeper   Proxifier
Offline Log Parsers
LKE                   N/A        N/A      N/A       N/A         8888.49
IPLoM                 140.57     12.74    333.03    2.17        0.38
Online Log Parsers
SHISO                 10964.55   582.14   6649.23   87.61       8.41
Spell                 447.14     47.28    676.45    5.27        0.87
Drain                 115.96     8.76     325.7     1.81        0.27
Improvement           74.07%     81.47%   51.85%    65.65%      68.97%

C. Efficiency of Drain

To evaluate the efficiency of Drain, we measure the running time of Drain and the four existing log parsers on the five real-world log data sets described in Table I. In Table IV, we demonstrate the running time of these log parsers. LKE fails to handle

four data sets in a reasonable time (i.e., days or weeks), so we mark the corresponding results as not available.

Considering the online parsing methods, SHISO takes too much time on some data sets (e.g., more than 3 h on BGL). This is mainly because SHISO only limits the number of children of its tree nodes, which can cause a very deep parse tree. Spell obtains better efficiency performance, because it employs a prefix tree structure to store all log events found, which greatly reduces its running time. However, Spell does not restrict the depth of its prefix tree either, and it calculates the longest common subsequence between two log messages, which is time consuming. Compared with the existing online parsing methods, our proposed Drain requires the least running time on all five data sets. Specifically, Drain only needs 2 min to parse 4m BGL log messages and 6 min to parse 10m HDFS log messages. Drain greatly improves the running time of existing online parsing methods. The improvements on the five real-world data sets are at least 51.85%, and it reduces the running time by 81.47% on HPC. Drain also outperforms the existing offline log parsing methods. It requires less running time than IPLoM on all five data sets. Moreover, as an online log parsing method, Drain is not limited by the memory of a single computer, which is the bottleneck of most offline log parsing methods. For example, IPLoM needs to load all log messages into computer memory, and it will construct extra data structures of comparable size at runtime. Thus, although IPLoM is efficient too, it may fail to handle large-scale log data. Drain is not limited by the memory of a single computer, because it processes the log messages one by one.

TABLE V: Log Size of Sample Datasets for Efficiency Experiments

BGL        400   4k     40k    400k   4m
HPC        600   3k     15k    75k    375k
HDFS       1k    10k    100k   1m     10m
Zookeeper  4k    8k     16k    32k    64k
Proxifier  600   1200   2400   4800   9600

Because the log size of modern systems is rapidly increasing, a log parsing method is expected to handle large-scale log data. Thus, to simulate the increase of log size, we also measure the running time of these log parsers on 25 sampled log data sets with varying log sizes (i.e., numbers of log messages) as described in Table V. The log messages in these sampled data sets are randomly extracted from the real-world data sets in Table I.

The evaluation results are illustrated in Figure 4, which is in logarithmic scale. In this figure, we observe that, compared with other methods, the running time of LKE rises faster as

enjoys linear time complexity. The time complexity of Drain is O((d + cm)n), where d is the depth of the parse tree, c is the number of candidate log groups in the leaf node, m is the log message length, and n is the number of log messages. Obviously, d and m are constants. c can also be regarded as a constant, because the quantity of candidate log groups in each leaf node is nearly the same, and the number of log groups is far less than that of log messages. Thus, the time complexity of Drain is O(n). For SHISO and Spell, the depth of the parse tree could increase during the parsing process. Second, we use the specially-designed simSeq to calculate the similarity between a log message and a log event candidate. Its time complexity is O(m1 + m2), where m1 and m2 are the numbers of tokens in them respectively. In Drain, m1 = m2. By comparison, SHISO and Spell calculate the longest common subsequence between two sequences, whose time complexity is O(m1 m2).

D. Effectiveness of Drain on Real-World Anomaly Detection Task

In the previous sections, we demonstrated the superiority of Drain in terms of accuracy and efficiency. Although high accuracy is necessary for log parsing methods, it does not guarantee good performance in the subsequent log mining task. For example, because log mining could be sensitive to some critical events, a small parsing error may cause an order of magnitude performance degradation in log mining [19]. To evaluate the effectiveness of Drain on subsequent log mining tasks, we conduct a case study on a real-world anomaly detection task.

We use the HDFS log data set in this case study. Specifically, the raw log messages in the HDFS data set [3] record system operations on 575,061 HDFS blocks with a total of 29 log event types. Among these blocks, 16,838 are manually labeled as anomalies by the original authors. In the original paper [3], the authors employ Principal Component Analysis (PCA) to detect these anomalies. Next, we will briefly introduce the anomaly detection workflow, including log parsing and log mining. In the log parsing step, all the raw log messages are parsed into structured log messages. Each structured log message contains the corresponding HDFS block ID and a log event. A source code-based log parsing method is used in the original paper, which is not discussed here because source code is inaccessible in many cases (e.g., in third party libraries). In log mining, we first use the structured log messages to generate an event count matrix, where each row represents an HDFS block; each column represents a log event type; each cell counts the occurrence of an event on a certain
the log size increases. Because the time complexity of LKE HDFS block. Then we use TF-IDF [25] to preprocess the
is O(n2 ), and the time complexity of other methods is O(n), event count matrix. Intuitively, TF-IDF gives lower weights to
while n is the number of log messages. IPLoM is comparable common event types, which are less likely to contribute to the
to Drain, but it requires substantial amounts of memory as anomaly detection process. Finally, the event count matrix is
explained above. Online parsing methods (i.e., SHISO, Spell, fed into PCA, which automatically marks the blocks as normal
Drain) process log message one by one, and they all use a or abnormal.
parse tree to accelerate the log event search process. Drain is In our case study, we evaluate the performance of the
faster than others because of two main reasons. First, Drain anomaly detection task with different log parsing methods

38
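The log mining steps just described (event count matrix, TF-IDF weighting, then PCA) can be sketched as follows. This is a minimal illustration with hypothetical toy data; the original study's exact TF-IDF variant and PCA setup may differ:

```python
import math
from collections import Counter

# Hypothetical toy data: (HDFS block ID, log event type) pairs, as
# produced by a log parser. The real study covers 575,061 blocks and
# 29 event types; three blocks and three event types suffice here.
parsed = [
    ("blk_1", "E1"), ("blk_1", "E2"), ("blk_1", "E2"),
    ("blk_2", "E1"), ("blk_2", "E2"),
    ("blk_3", "E1"), ("blk_3", "E5"),   # E5 occurs in one block only
]

blocks = sorted({b for b, _ in parsed})
events = sorted({e for _, e in parsed})

# Event count matrix: one row per block, one column per event type,
# each cell counting occurrences of that event on that block.
counts = {b: Counter() for b in blocks}
for b, e in parsed:
    counts[b][e] += 1
matrix = [[counts[b][e] for e in events] for b in blocks]

# IDF-style weighting: an event type that appears in most blocks gets
# a low weight, so common events contribute little to the detector.
n_blocks = len(blocks)
idf = {e: math.log(n_blocks / sum(1 for b in blocks if counts[b][e]))
       for e in events}
weighted = [[cell * idf[e] for cell, e in zip(row, events)]
            for row in matrix]
```

The weighted matrix would then be fed to a PCA-based detector, which flags blocks whose event profile deviates from the dominant normal pattern; any off-the-shelf PCA routine can play that role.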
Fig. 4: Running Time of Log Parsing Methods on Data Sets of Different Sizes
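The complexity gap behind these running time trends can be made concrete. Below is a sketch (with illustrative tokens; the wildcard handling is an assumption, not the paper's exact definition) contrasting a position-wise similarity in the spirit of simSeq, which is linear in the number of tokens and applicable because Drain compares equal-length sequences, with the dynamic-programming longest common subsequence used in SHISO/Spell-style matching, which is quadratic:

```python
def sim_seq(seq1, seq2):
    """Position-wise similarity of two equal-length token sequences:
    O(m) comparisons for m tokens."""
    assert len(seq1) == len(seq2)
    if not seq1:
        return 1.0
    same = sum(1 for t1, t2 in zip(seq1, seq2) if t1 == t2)
    return same / len(seq1)

def lcs_len(seq1, seq2):
    """Longest common subsequence length by dynamic programming:
    O(m1 * m2) table cells for sequences of m1 and m2 tokens."""
    m1, m2 = len(seq1), len(seq2)
    dp = [[0] * (m2 + 1) for _ in range(m1 + 1)]
    for i in range(1, m1 + 1):
        for j in range(1, m2 + 1):
            if seq1[i - 1] == seq2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m1][m2]

# Illustrative message and template; "<*>" marks a wildcard position.
message = "Receiving block blk_3587 src: /10.0.0.1".split()
template = "Receiving block <*> src: <*>".split()
print(sim_seq(message, template))  # 3 of 5 positions match -> 0.6
```

For the million-message data sets used in the efficiency experiments, this per-comparison difference compounds into the running time gaps observed in Figure 4.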

TABLE VI: Anomaly Detection with Different Log Parsing Methods (16,838 True Anomalies)

              Parsing   Reported  Detected       False
              Accuracy  Anomaly   Anomaly        Alarm
IPLoM         0.99      10,998    10,720 (63%)   278 (2.5%)
SHISO         0.93      13,050    11,143 (66%)   1,907 (14.6%)
Spell         0.87      10,949    10,674 (63%)   275 (2.5%)
Drain         0.99      10,998    10,720 (63%)   278 (2.5%)
Ground truth  1.00      11,473    11,195 (66%)   278 (2.4%)

The experimental results are shown in Table VI. In this table, reported anomaly is the number of anomalies reported by the PCA model; detected anomaly is the number of true anomalies reported; false alarm is the number of wrongly reported ones. We use four existing log parsing methods to handle the parsing step of this anomaly detection task. We do not use LKE because it cannot handle this large amount of data. Ground truth is the experiment using the exactly correct parsing results.

We can observe that Drain obtains nearly the optimal anomaly detection performance. It detects 10,720 true anomalies with only 278 false alarms. Although 37% of the anomalies are not detected, this is caused by the log mining step: even when all the log messages are correctly parsed, the log mining model still leaves 34% of the anomalies at large. Note that although IPLoM demonstrates the same anomaly detection performance as Drain, their parsing results are different. We also observe that SHISO, although it has high parsing accuracy (0.93), does not perform well in this anomaly detection task. By using SHISO, we would report 1,907 false alarms, which is about 6 times worse than the other methods. This would largely increase the workload of developers, because they usually need to check the reported anomalies manually. Among the online parsing methods, Drain not only has the highest parsing accuracy, as demonstrated in Section IV-B, but also obtains nearly optimal performance in the anomaly detection case study.

V. RELATED WORK

Log Analysis for Service Management. Logs, which record system runtime information, are in widespread use for service management tasks, such as business model mining [8], [9], user behavior analysis [10], [11], anomaly detection [3], [4], [26], fault diagnosis [5], [6], and performance improvement [7]. Log parsing is a critical step to enable automated and effective log analysis [19], because most of these techniques require structured log messages as input. Thus, we believe our proposed online parsing method can benefit these techniques and future studies on log analysis.

Log Parsing. Log parsing has been widely studied in recent years. Xu et al. [3] design a source code based log parser that achieves high accuracy. However, source code is often inaccessible in practice (e.g., for Web service components). Other work proposes data-driven approaches (LKE [4], IPLoM [15], SHISO [17], Spell [16]), in which data mining techniques are employed to extract log templates and split raw log messages into different log groups accordingly. Specifically, LKE and IPLoM are offline log parsers, which are studied in our previous evaluation study on offline log parsers [19]. SHISO and Spell are online log parsers, which parse log messages in a streaming manner and are not limited by the memory of a single computer. In this paper, we propose an online log parser, namely Drain, that greatly outperforms the existing online log parsers in terms of both accuracy and efficiency. It even performs better than the state-of-the-art offline parsers.
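As a rough illustration of such streaming parsing with a fixed depth search structure, the sketch below routes each incoming message by its token count and leading token, then merges it into the most similar existing group. All names, thresholds, and the similarity rule are illustrative assumptions, not the authors' released implementation:

```python
DEPTH_TOKENS = 1      # leading tokens used for routing (illustrative)
SIM_THRESHOLD = 0.5   # minimum similarity to join an existing group

# Index: (token count, leading tokens) -> list of log groups.
# Keeping the key depth fixed bounds the search cost per message.
tree = {}

def similarity(tokens, template):
    # A position already generalized to "<*>" matches any token.
    same = sum(1 for t, u in zip(tokens, template) if u == "<*>" or t == u)
    return same / len(tokens)

def parse(message):
    """Consume one log message and update the groups in place."""
    tokens = message.split()
    key = (len(tokens), tuple(tokens[:DEPTH_TOKENS]))
    groups = tree.setdefault(key, [])
    best = max(groups, key=lambda g: similarity(tokens, g["template"]),
               default=None)
    if best is not None and similarity(tokens, best["template"]) >= SIM_THRESHOLD:
        # Positions that disagree become wildcards in the template.
        best["template"] = [u if t == u else "<*>"
                            for t, u in zip(tokens, best["template"])]
    else:
        groups.append({"template": tokens})

for line in [
    "Receiving block blk_1 src /10.0.0.1",
    "Receiving block blk_2 src /10.0.0.2",
    "Deleting block blk_1 file /tmp/a",
]:
    parse(line)

templates = sorted(" ".join(g["template"])
                   for gs in tree.values() for g in gs)
```

Because the routing key has bounded depth, each message only touches a small candidate list, which is what keeps the per-message cost low in this style of parser.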
Reliability of Web Service Systems. Many recent studies focus on enhancing the reliability of Web service systems. Cubo et al. [27] use dynamic software product lines to reconfigure service failures dynamically. Service selection and recommendation are also widely studied [28], [29]; these studies usually employ QoS (quality of service) values to characterize the reliability of different Web services. Jurca et al. [30] propose a reliable QoS monitoring technique based on client feedback. Yao et al. [31] develop a model with accountability for business and QoS compliance. Besides, Chen et al. [32] propose a performance prediction method for component-based applications. Our proposed online log parser is critical for log analysis techniques, which can complement these methods in reliability enhancement for Web service systems. Log analysis methods can also improve the reliability of many existing service systems [33], [34], [35].

VI. CONCLUSION

Log parsing is critical for log analysis based Web service management techniques. This paper proposes an online log parsing method, namely Drain, that parses raw log messages in a streaming manner. Drain adopts a fixed depth parse tree, which encodes specially designed rules in its tree nodes, to accelerate the log group search process. To evaluate the effectiveness of Drain, we conduct experiments on five real-world log data sets. The experimental results show that Drain greatly outperforms existing online log parsers in terms of accuracy and efficiency. Drain even obtains better performance than the state-of-the-art offline log parsers, which are limited by the memory of a single computer. Besides, we conduct a case study on a real-world anomaly detection task, which demonstrates the effectiveness of Drain on log analysis tasks.

ACKNOWLEDGMENT

The work described in this paper was supported by the National Natural Science Foundation of China (Project No. 61332010 and 61472338), the National Basic Research Program of China (973 Project No. 2014CB347701), and the Research Grants Council of the Hong Kong Special Administrative Region, China (No. CUHK 14234416 of the General Research Fund).

REFERENCES

[1] Amazon EC2. [Online]. Available: https://aws.amazon.com/tw/ec2/
[2] R. Ding, H. Zhou, J. Lou, H. Zhang, Q. Lin, Q. Fu, D. Zhang, and T. Xie, "Log2: A cost-aware logging mechanism for performance diagnosis," in ATC'15: Proc. of the USENIX Annual Technical Conference, 2015.
[3] W. Xu, L. Huang, A. Fox, D. Patterson, and M. Jordan, "Detecting large-scale system problems by mining console logs," in SOSP'09: Proc. of the ACM Symposium on Operating Systems Principles, 2009.
[4] Q. Fu, J. Lou, Y. Wang, and J. Li, "Execution anomaly detection in distributed systems through unstructured log analysis," in ICDM'09: Proc. of the International Conference on Data Mining, 2009.
[5] W. E. Wong, V. Debroy, R. Golden, X. Xu, and B. Thuraisingham, "Effective software fault localization using an RBF neural network," TR'12: IEEE Transactions on Reliability, 2012.
[6] D. Q. Zou, H. Qin, and H. Jin, "UiLog: Improving log-based fault diagnosis by log analysis," Journal of Computer Science and Technology, vol. 31, no. 5, pp. 1038–1052, 2016.
[7] Y. Sun, H. Li, I. G. Councill, J. Huang, W. C. Lee, and C. L. Giles, "Personalized ranking for digital libraries based on log analysis," in WIDM'08: Proc. of the 10th ACM Workshop on Web Information and Data Management, 2008, pp. 133–140.
[8] H. J. Cheng and A. Kumar, "Process mining on noisy logs: can log sanitization help to improve performance?" Decision Support Systems, vol. 79, pp. 138–149, 2015.
[9] H. R. Motahari-Nezhad, R. Saint-Paul, B. Benatallah, and F. Casati, "Deriving protocol models from imperfect service conversation logs," TKDE'08: IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 12, pp. 1683–1698, 2008.
[10] X. Yu, M. Li, I. Paik, and K. H. Ryu, "Prediction of web user behavior by discovering temporal relational rules from web log data," in DEXA'12: Proc. of the 23rd International Conference on Database and Expert Systems Applications, 2012, pp. 31–38.
[11] N. Poggi, V. Muthusamy, D. Carrera, and R. Khalaf, "Business process mining from e-commerce web logs," in Business Process Management, 2013, pp. 65–80.
[12] D. Lang, "Using SEC," USENIX ;login: Magazine, vol. 38, 2013.
[13] H. Mi, H. Wang, Y. Zhou, M. R. Lyu, and H. Cai, "Toward fine-grained, unsupervised, scalable performance diagnosis for production cloud computing systems," IEEE Transactions on Parallel and Distributed Systems, vol. 24, pp. 1245–1255, 2013.
[14] W. Xu, "System problem detection by mining console logs," Ph.D. dissertation, University of California, Berkeley, 2010.
[15] A. Makanju, A. Zincir-Heywood, and E. Milios, "A lightweight algorithm for message type extraction in system application logs," TKDE'12: IEEE Transactions on Knowledge and Data Engineering, 2012.
[16] M. Du and F. Li, "Spell: Streaming parsing of system event logs," in ICDM'16: Proc. of the 16th International Conference on Data Mining, 2016.
[17] M. Mizutani, "Incremental mining of system log format," in SCC'13: Proc. of the 10th International Conference on Services Computing, 2013.
[18] Drain source code. [Online]. Available: http://appsrv.cse.cuhk.edu.hk/~pjhe/Drain.py
[19] P. He, J. Zhu, S. He, J. Li, and M. R. Lyu, "An evaluation study on log parsing and its use in log mining," in DSN'16: Proc. of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2016.
[20] A. Oliner and J. Stearley, "What supercomputers say: A study of five system logs," in DSN'07, 2007.
[21] L. A. N. S. LLC. Operational data to support and enable computer science research. [Online]. Available: http://institutes.lanl.gov/data/fdata
[22] C. Manning, P. Raghavan, and H. Schutze, Introduction to Information Retrieval. Cambridge University Press, 2008.
[23] Evaluation of clustering. [Online]. Available: http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html
[24] L. Tang, T. Li, and C. Perng, "LogSig: Generating system events from raw textual logs," in CIKM'11: Proc. of the ACM International Conference on Information and Knowledge Management, 2011.
[25] G. Salton and C. Buckley, "Term weighting approaches in automatic text retrieval," Cornell University, Tech. Rep., 1987.
[26] W. Zhang, F. Bastani, I. L. Yen, K. Hulin, F. Bastani, and L. Khan, "Real-time anomaly detection in streams of execution traces," in HASE'12: Proc. of the 14th International Symposium on High-Assurance Systems Engineering, 2012, pp. 32–39.
[27] J. Cubo, N. Gamez, E. Pimentel, and L. Fuentes, "Reconfiguration of service failures in DAMASCo using dynamic software product lines," in SCC'15: Proc. of the 12th International Conference on Services Computing, 2015, pp. 114–121.
[28] S. Y. Hwang, W. P. Liao, and C. H. Lee, "Web services selection in support of reliable web service choreography," in ICWS'10: Proc. of the 17th International Conference on Web Services, 2010, pp. 115–122.
[29] S. Meng, Z. Zhou, T. Huang, D. Li, S. Wang, F. Fei, W. Wang, and W. Dou, "A temporal-aware hybrid collaborative recommendation method for cloud service," in ICWS'16: Proc. of the 23rd International Conference on Web Services, 2016, pp. 252–259.
[30] R. Jurca, B. Faltings, and W. Binder, "Reliable QoS monitoring based on client feedback," in WWW'07: Proc. of the 16th International Conference on World Wide Web, 2007, pp. 1003–1012.
[31] J. Yao, S. Chen, C. Wang, D. Levy, and J. Zic, "Modelling collaborative services for business and QoS compliance," in ICWS'11: Proc. of the 18th International Conference on Web Services, 2011, pp. 299–306.
[32] S. Chen, Y. Liu, I. Gorton, and A. Liu, "Performance prediction of component-based applications," JSS'05: Journal of Systems and Software, vol. 74, no. 1, pp. 35–43, 2005.
[33] A. Iwai and M. Aoyama, "Automotive cloud service systems based on service-oriented architecture and its evaluation," in CLOUD'11: Proc. of the 4th International Conference on Cloud Computing, 2011.
[34] J. Zhang, B. Iannucci, M. Hennessy, K. Gopal, S. Xiao, S. Kumar, D. Pfeffer, B. Aljedia, Y. Ren, M. Griss, S. Rosenberg, J. Cao, and A. Rowe, "Sensor data as a service: a federated platform for mobile data-centric service development and sharing," in SCC'13: Proc. of the 10th International Conference on Services Computing, 2013.
[35] Y. Duan, G. Fu, N. Zhou, X. Sun, N. C. Narendra, and B. Hu, "Everything as a service (XaaS) on the cloud: origins, current and future trends," in CLOUD'15: Proc. of the 8th International Conference on Cloud Computing, 2015, pp. 621–628.