AWSOM-LP: An Effective Log Parsing Technique Using Pattern Recognition and Frequency Analysis
Abstract—Logs provide users with useful insights to help with a variety of development and operations tasks. The problem is that logs are often unstructured, making their analysis a complex task. This is mainly due to the lack of guidelines and best practices for logging, combined with the large number of logging libraries at the disposal of software developers. Several studies aim to automatically parse large logs. The main objective is to extract templates from samples of log data that can be used to recognize future logs. In this paper, we propose AWSOM-LP, a powerful log parsing and abstraction tool that is highly accurate, stable, and efficient. AWSOM-LP is built on the idea of applying pattern recognition and frequency analysis. First, log events are organized into patterns using a simple text processing method. Frequency analysis is then applied locally to instances of the same group to identify the static and dynamic content of log events. When applied to 16 log datasets of the LogPai project, AWSOM-LP achieves an average grouping accuracy of 93.5%, which outperforms the accuracy of five leading log parsing tools, namely Logram, Lenma, Drain, IPLoM, and AEL. Additionally, AWSOM-LP can generate more than 80% of the final log templates from 10% to 50% of the entire log dataset and can parse up to one million log events in an average time of 5 minutes. AWSOM-LP is available online as an open-source tool. It can be used by practitioners and researchers to parse large log files effectively and efficiently so as to support log analysis tasks.
Index Terms—Log Parsing, Log Abstraction, Log Analytics, Software Logging, Software Engineering.
• I. Sedki is with the Department of ECE, Concordia University, Montreal, QC, Canada. E-mail: [email protected]
• A. Hamou-Lhadj is with the Department of ECE, Concordia University, Montreal, QC, Canada. E-mail: [email protected]
• O. Ait-Mohamed is with the Department of ECE, Concordia University, Montreal, QC, Canada. E-mail: [email protected]

1 INTRODUCTION

SOFTWARE logging is an important activity that is used by software developers to record critical events that help them analyze a system at runtime. Logs are generated by logging statements inserted in the source code. An example of a logging statement is shown in Figure 1, which is a code snippet extracted from a Hadoop Java source file.

The generated log event (Figure 1) is composed mainly of two parts: the log header and the log message. The log header typically contains the timestamp, the log level, and the logging function. The log message contains static tokens (i.e., usually text) such as "Received block", "of size", and "from" in the example of Figure 1, and dynamic tokens, which represent variable values. In the example of Figure 1, we have three variable values, which represent the block id (blk_-5627280853087685), the block size (67108864), and an IP address (/10.251.91.84).

Log files contain a wealth of information regarding the execution of a software system, used to support various software engineering tasks including anomaly detection [1] [2] [3] [9] [10] [11] [12], debugging and understanding system failures [4] [5] [6] [7] [8], testing [13], performance analysis [14] [15] [16], automatic analysis of logs during operation [17] [8] [14] [18] [19], failure prediction [8] [20], data leakage detection [12], and, recently, AI for IT Operations (AIOps) [21] [17]. Logs, however, have historically been difficult to work with. First, typical log files can be significantly large (in the order of millions of events) [22] [23] [7]. Add to this, the practice of logging remains largely ad hoc, with no known guidelines and best practices [24]. There is no standardized way of representing log data either [23]. To make things worse, logs are rarely structured [1] [20], making it difficult to extract meaningful information from large raw log files [25] [26].

In this paper, we focus on the problem of log parsing and abstraction, which consists of automatically converting unstructured raw log events into a structured format that would facilitate future analysis. Several log parsing and abstraction tools have been proposed (e.g., [27] [28] [29]). More precisely, log parsing techniques focus on parsing the log message. This is because log headers usually follow the same format within a log file. Parsing a log message is therefore reduced to the problem of automatically distinguishing the static text from the dynamic variables. The result of parsing the log event of Figure 1 consists of extracting the template shown in Figure 1, where the structure of the log message is clearly identified. A simple way to parse log events is to use regular expressions [30] [31].
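As a simple illustration of this manual approach, the Python sketch below hand-writes a regular expression for the single event type of Figure 1. The group names (block_id, size, ip) are ours and purely illustrative; they are not part of AWSOM-LP.

```python
import re

# Hand-written pattern for the event type of Figure 1 (illustrative only).
pattern = re.compile(
    r"Received block (?P<block_id>blk_-?\d+) of size (?P<size>\d+) from (?P<ip>/[\d.]+)"
)

msg = "Received block blk_-5627280853087685 of size 67108864 from /10.251.91.84"
print(pattern.match(msg).groupdict())
# {'block_id': 'blk_-5627280853087685', 'size': '67108864', 'ip': '/10.251.91.84'}
```

Writing such an expression is easy for one event type; the difficulty discussed next is that real systems produce thousands of event types, each of which would need its own hand-maintained pattern.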
Fig. 1. A code snippet showing an example of a logging statement, the generated log event, and the expected log template
The problem is that there may exist thousands of such templates in industrial log files [16] [32]. In addition, as the system evolves, new log formats are introduced due to the use of various logging libraries, requiring constant updates of the regular expressions.

In this paper, we propose AWSOM-LP¹, a powerful automated log parsing approach and tool that leverages pattern recognition and frequency analysis. AWSOM-LP starts by identifying patterns of log events using similarity measurements and clusters them into groups. It then applies frequency analysis to the instances of each group to differentiate between the static and dynamic tokens of log messages. The idea is that tokens that are repeated more frequently are more likely to be static tokens than variable values. This is not the first time that frequency analysis is used in log parsing. Logram, a recent approach proposed by Dai et al. [27], also uses frequency analysis. However, Logram applies frequency analysis to the entire log file, which makes it hard to find a clear demarcation between static and dynamic tokens. AWSOM-LP, on the other hand, applies frequency analysis to log events that belong to the same pattern, which increases the likelihood of distinguishing between static and dynamic tokens. In addition, AWSOM-LP does not require building 3-gram tables as is the case for Logram, which greatly simplifies the parsing process.

We evaluated AWSOM-LP using 16 log datasets from the LogPai benchmark² and compared it to five leading log parsing tools, namely Drain [29], AEL [31], Lenma [28], IPLoM [33], and Logram [27]. Our results show that AWSOM-LP performs better than these tools in parsing 13 out of the 16 log datasets. In addition, our approach has an average accuracy of 93.5%, whereas the second best method, Logram, has an average accuracy of 82.5%. Additionally, AWSOM-LP can parse large files in a few minutes. When applied to 12 log files, it took AWSOM-LP less than 5 min to parse up to 1 million log events. For small files (100 thousand events and less), the average parsing time is less than 1 minute. AWSOM-LP is also stable. It requires between 10% and 50% of the log data to learn 80% of the templates.

AWSOM-LP is available online as an open-source tool³. It can be used by practitioners and researchers to parse large log files to support the various log analysis techniques.

1. AWSOM-LP stands for AbdelWahab Hamou-Lhadj, ISsam Sedki, and OtMane Ait-Mohamed Log Parser.
2. https://github.com/logpai
3. https://github.com/SRT-Lab/awsom-lp

Paper organization. The paper is organized as follows. Section 2 introduces the background of log parsing and surveys prior work in that area. Section 3 presents a comprehensive description of the AWSOM-LP approach through a simple running example. Section 4 reports the outcomes of assessing AWSOM-LP's accuracy, efficiency, and ease of stabilization. Section 5 discusses the limitations and the threats to the validity of our findings. Finally, Section 6 concludes the paper.

2 RELATED WORK

Log analysis has received a great deal of attention from researchers and practitioners in recent years, due to the increasing need to understand complex systems at runtime. Log parsing is an essential prerequisite for log analysis tasks. Perhaps the most comprehensive survey of log parsing techniques is the one proposed by El-Masri et al. [24], in which the authors proposed a quality model for classifying log parsing techniques and examined 17 different log parsing tools using this model. Existing tools can be categorized into groups based on the techniques they use, namely rule-based parsing tools, frequent token mining, natural language processing, and classification/clustering approaches. Another excellent study surveying log parsing tools is the one of Zhu et al. [30]. We discuss the main approaches in what follows and conclude with a general comparison of AWSOM-LP with these techniques.

Jiang et al. [34] introduced AEL (Abstracting Execution Logs), a log parsing method that relies on textual similarity to group log events together. AEL starts by detecting trivial dynamic variables using hard-coded heuristics based on system knowledge (e.g., IP addresses, numbers, memory addresses). The resulting log events are then tokenized and assigned to bins based on the number of terms they contain. This grouping is used to compare log messages in each bin and abstract them into templates. The problem with AEL is that it assumes that events that contain the same number of terms should be grouped together, resulting in many false positives.
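As a toy illustration of the binning idea described above (a simplification for exposition, not AEL's actual implementation), the following Python snippet groups already-abstracted events by their number of tokens; only events in the same bin would then be compared with each other. The "Deleting block <*>" events are hypothetical and only serve to populate a second bin.

```python
from collections import defaultdict

events = [
    "Received block <*> of size <*> from <*>",
    "PacketResponder <*> for block <*> terminating",
    "Deleting block <*>",   # hypothetical event, for illustration
    "Deleting block <*>",
]

# Bin events by token count; comparisons happen only within a bin.
bins = defaultdict(list)
for event in events:
    bins[len(event.split())].append(event)

for size, members in sorted(bins.items()):
    print(size, members)
# 3 ['Deleting block <*>', 'Deleting block <*>']
# 6 ['PacketResponder <*> for block <*> terminating']
# 8 ['Received block <*> of size <*> from <*>']
```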
Vaarandi et al. [35], [36] proposed SLCT (Simple Logfile Clustering Tool), which uses clustering techniques to identify log templates. SLCT groups log events together based on their most common frequent terms. To this end, the approach relies on a density-based clustering algorithm. To recognize the dynamic tokens, SLCT uses frequency analysis across all log lines in the log file. LogCluster [21] is an improved version of SLCT proposed by the same authors. LogCluster extracts all frequent terms from the log messages and arranges them into tuples. Then, it splits the log file into clusters that contain at least a certain number of log messages, ensuring that all log events in the same cluster match the pattern constructed by the frequent words and the wildcards, which replace the dynamic variables.

Another clustering approach is the one proposed by Makanju et al., called IPLoM (Iterative Partitioning Log Mining) [33]. IPLoM employs a heuristic-based hierarchical clustering algorithm. The first step is to partition log messages; for this, the authors used heuristics based on the size of log events. The next step is to further divide each partition based on the highest number of similar terms. The resulting leaf nodes of the hierarchical partitioning are treated as clusters and event types.

Fu et al. proposed LKE (Log Key Extraction) [37], another clustering-based approach, which uses a distance-based clustering technique. Log events are grouped together using weighted edit distance, giving more weight to terms that appear at the beginning of log events. Then, LKE splits the clusters until each raw log event in the same cluster belongs to the same log key, and extracts the common parts of the raw log events in each cluster to generate event types. Tang et al. proposed LogSig [38], which considers the words present in a log event as signatures of event types. LogSig identifies log events using a set of predefined message signatures. First, it converts log messages into pairs of terms. Then, it forms log-message clusters using a local search strategy. LogSig selects the terms that appear frequently in each cluster and uses them to build the event templates.

Hamooni et al. proposed LogMine [39], which uses MapReduce to abstract heterogeneous log messages generated from various systems. The LogMine algorithm consists of a hierarchical clustering module combined with pattern recognition. It uses regular expressions based on domain knowledge to detect a set of dynamic variables. Then, it replaces the real value of each field with its name. It then clusters similar log messages with the friends-of-friends clustering algorithm.

Natural Language Processing (NLP) techniques have also been used for log parsing. Logram, a recent approach proposed by Dai et al. [27], is an automated log parsing approach developed to address the growing size of logs and the need for low-latency log analysis tools. It leverages n-gram dictionaries to achieve efficient log parsing. Logram stores the frequencies of n-grams in logs and relies on the n-gram dictionaries to distinguish between static tokens and dynamic variables. Moreover, as the n-gram dictionaries can be constructed concurrently and aggregated efficiently, Logram can achieve high scalability when deployed in a multi-core environment without sacrificing parsing accuracy. In Logram, the identification of dynamic and static tokens depends on a threshold applied to the number of times the n-grams occur. An automated approach to estimate this threshold was also proposed. Kobayashi et al. proposed NLP-LTG (Natural Language Processing–Log Template Generation) [40], which considers event template extraction from log messages as a problem of labeling sequential data in natural language. It uses Conditional Random Fields (CRF) [41] to classify terms in log messages as static or dynamic. To construct the labeled data (the ground truth), it uses human knowledge and regular expressions.

Thaler et al. proposed NLM-FSE (Neural Language Model for Signature Extraction) [42], which trains a character-based neural network to classify the static and dynamic parts of log messages. The approach constructs the training model through four layers: (1) the embedding layer transforms the categorical character input into a feature vector; (2) the Bidirectional-LSTM layer [43] allows each prediction to be conditioned on the complete past and future context of a sequence; (3) the dropout layer avoids over-fitting by concatenating the results of the bi-LSTM layer; and (4) the fully connected, feed-forward neural network layer predicts the event template using the Softmax activation function.

He et al. [29] proposed Drain, a tool that abstracts log messages into event types using a parse tree. The Drain algorithm consists of five steps. Drain starts by pre-processing raw log messages using regular expressions to identify trivial dynamic tokens, just like AWSOM-LP. Then, it builds a parse tree using the number of tokens in log events. Drain assumes that tokens that appear at the beginning of a log message are most likely static tokens. It uses a similarity metric that compares leaf nodes to event types to identify log groups.

Spell (Streaming Parser for Event Logs using an LCS) is a log parser that converts log messages into event types. Spell relies on the idea that log messages that are produced by the same logging statement can be assigned a type, which represents their longest common subsequence (LCS). The LCS of two messages is likely to consist of static fields.

The main difference between AWSOM-LP and existing approaches lies in the way AWSOM-LP is designed. AWSOM-LP leverages the idea that the static and dynamic tokens of log events can be better identified if we use frequency analysis on instances of log events that belong to the same group. We use a simple pattern recognition technique based on text similarity to identify these groups. This contrasts with techniques that use clustering alone (e.g., AEL and IPLoM) and those that apply frequency analysis to the entire log file (e.g., Logram). From this perspective, AWSOM-LP combines the best of these methods.
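To make this contrast concrete, the toy Python sketch below (our own illustration, not code from Logram or AWSOM-LP) shows why a single file-wide frequency cutoff is hard to pick: a recurring variable value can be more frequent globally than a static token of a rare event type, whereas within a single group the static tokens appear in every instance.

```python
from collections import Counter

# Toy log with two event types; "blk_1" is a variable value that happens to recur.
log = [
    "Deleting block blk_1",
    "Deleting block blk_2",
    "Verification succeeded for blk_1",
    "Verification succeeded for blk_1",
    "Verification succeeded for blk_3",
]

# Global frequency analysis (counts over the whole file).
global_freq = Counter(tok for line in log for tok in line.split())
print(global_freq["blk_1"], global_freq["Deleting"])    # 3 2: the variable outscores a static token

# Local frequency analysis, restricted to one group of similar events.
group = [line for line in log if line.startswith("Verification")]
local_freq = Counter(tok for line in group for tok in line.split())
print(local_freq["Verification"], local_freq["blk_1"])  # 3 2: static tokens now top the counts
```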
3 APPROACH

AWSOM-LP consists of four main steps: pre-processing, pattern recognition, frequency analysis, and replacement of the remaining numerical dynamic variables. Similar to existing log parsing techniques, AWSOM-LP requires an initial dataset to recognize the structure of the log events. It goes through different steps to build a model that characterizes the information in the given log dataset. The first step is a pre-processing step where header information such as the timestamp, the log level, and the logging function is identified. We also replace trivial dynamic variables such as IP and MAC addresses with the expression '<*>'. The second step of AWSOM-LP is to identify similar log events and group them into patterns, used later to distinguish between the static and the dynamic tokens. The next step is to apply frequency analysis locally to the instances of each group to determine the static and dynamic tokens. We conjecture that the frequency of static tokens is considerably higher than that of dynamic tokens when frequency analysis is applied to each group of log events. The last step consists of fine-tuning the result to further improve the parsing accuracy by replacing the remaining numerical dynamic variables. We explain each step in more detail in the following subsections. To illustrate our approach, we use the sample log events from the HDFS log dataset shown in Figure 2, which is one of the datasets used to evaluate AWSOM-LP. We added a line number to each log event to help explain the approach.

3.1 Pre-processing

We start pre-processing the log events by identifying header information, which usually contains the timestamp, process ID, log level, and the logging function. This information is easily identifiable using simple regular expressions, as shown in similar studies (e.g., [27]). In addition, the LogPai datasets (used in the evaluation section) come with many regular expressions to detect headers in various log files. For example, in the HDFS log events of Figure 2, we can see that all log events start with a timestamp (e.g., 081109 203615), a process ID (e.g., 148), a log level (e.g., INFO), and a logging function (e.g., dfs.DataNode$PacketResponder:). The regular expression that extracts this header information is as follows:

<Date> <Time> <Pid> <Level> <Component>: <Content>

Another essential part of the pre-processing step is the identification of trivial dynamic variables such as IP and MAC addresses, which are replaced with a standard token, <*> in our case. Identifying trivial variables can increase parsing accuracy, as shown by He et al. [44] and Dai et al. [27]. This step also increases our chances of identifying similar log events that should be instances of the same pattern. The pattern recognition step is discussed in the next subsection.

In this paper, we detect the most common formats of the variables listed below (note that AWSOM-LP allows users to define other regular expressions that may describe, for example, domain-specific trivial variables). The regular expressions to detect these variables are included in the AWSOM-LP git repository⁴; an illustrative sketch is given at the end of this subsection.

• Directory paths such as "Library/Logs"
• IPv4 addresses with or without the port number, such as "210.245.165.136" and "210.245.165.136:8080"
• Any value that starts with "0x", in the form of "0x0001FC"
• MAC addresses in the form of "FF:F2:9F:16:EB:27:00:0D:60:E9:14:D8"
• Months such as "Jan" or "January"
• Days such as "Thu" or "Thursday"
• Time values such as "12:23:34.893"
• URLs such as "http://www.google.com" and "https://www.google.com" (i.e., with https)

The result of applying the pre-processing step to the HDFS running example is shown in Figure 3, where the header information is omitted and the IP addresses in Lines 3, 6, 7, 10, 11, and 12 have been replaced by <*>.

4. https://github.com/SRT-Lab/awsom-lp
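As a rough sketch of this pre-processing step, the Python snippet below strips the HDFS header and replaces a few trivial dynamic variables with <*>. The regular expressions are simplified stand-ins written for this example; the ones shipped in the AWSOM-LP repository may differ.

```python
import re

# Header format of the HDFS dataset: <Date> <Time> <Pid> <Level> <Component>: <Content>
HEADER = re.compile(
    r"^(?P<Date>\d{6}) (?P<Time>\d{6}) (?P<Pid>\d+) (?P<Level>\w+) (?P<Component>\S+): (?P<Content>.*)$"
)

# Simplified regexes for a few trivial dynamic variables (illustrative only).
TRIVIAL = [
    re.compile(r"/?\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),     # IPv4, optional port
    re.compile(r"\b0x[0-9A-Fa-f]+\b"),                          # hex values such as 0x0001FC
    re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5,}[0-9A-Fa-f]{2}\b"),   # MAC-like addresses
    re.compile(r"\bhttps?://\S+"),                              # URLs
]

def preprocess(line: str) -> str:
    """Drop the log header and replace trivial dynamic variables with <*>."""
    m = HEADER.match(line)
    content = m.group("Content") if m else line
    for rx in TRIVIAL:
        content = rx.sub("<*>", content)
    return content

print(preprocess("081109 203615 148 INFO dfs.DataNode$PacketResponder: "
                 "Received block blk_-5627280853087685 of size 67108864 from /10.251.91.84"))
# Received block blk_-5627280853087685 of size 67108864 from <*>
```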
TABLE 1
Example of the frequency analysis results applied to Pattern 1

Term                        Frequency
PacketResponder             5
0                           1
1                           1
2                           3
for                         5
block                       5
blk_38865049064139660       1
blk_-6952295868487656571    1
blk_8229193803249955061     1
blk_-6670958622368987959    1
blk_572492839287299681      1
terminating                 5

… "[678]" as dynamic tokens. One might think that this step could have been included as part of the pre-processing stage of AWSOM-LP when looking for trivial variables. However, we found that doing so would affect the result of the frequency analysis step by letting more non-numerical dynamic variables appear more often than the threshold, ending up falsely included as static tokens.

The final result of parsing the HDFS example log events of Figure 2 is shown below. As we can see, all templates have been accurately detected.

1) PacketResponder <*> for block <*> terminating
2) BLOCK* NameSystem.addStoredBlock: blockMap updated: <*> is added to <*> size <*>
3) Received block <*> of size <*> from <*>
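The following Python sketch reproduces this local frequency analysis on Pattern 1. The five events are our reconstruction of the group behind Table 1 (the pairing of responder numbers with block ids is illustrative), and the cutoff rule used here (keep a token only if it occurs in every instance of the group) is a simplification of AWSOM-LP's configurable threshold.

```python
from collections import Counter

# Reconstructed instances of Pattern 1 (token frequencies match Table 1; the exact
# pairing of responder numbers and block ids is illustrative).
pattern_1 = [
    "PacketResponder 1 for block blk_38865049064139660 terminating",
    "PacketResponder 0 for block blk_-6952295868487656571 terminating",
    "PacketResponder 2 for block blk_8229193803249955061 terminating",
    "PacketResponder 2 for block blk_-6670958622368987959 terminating",
    "PacketResponder 2 for block blk_572492839287299681 terminating",
]

# Local frequency analysis: count tokens only within this group.
freq = Counter(tok for event in pattern_1 for tok in event.split())

# Simplified cutoff: a token is static if it appears in every event of the group.
cutoff = len(pattern_1)
template = " ".join(tok if freq[tok] >= cutoff else "<*>" for tok in pattern_1[0].split())
print(template)   # PacketResponder <*> for block <*> terminating
```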
TABLE 4
Accuracy of AWSOM-LP compared with other log parsers. We have highlighted the results that are the highest among the parsers in bold.

Name         Drain   AEL     Lenma   IPLoM   Logram  AWSOM-LP
Android      0.933   0.867   0.976   0.716   0.848   0.970
Apache       0.693   0.693   0.693   0.693   0.699   0.999
BGL          0.822   0.818   0.577   0.792   0.740   0.945
Hadoop       0.545   0.539   0.535   0.373   0.965   0.991
HDFS         0.999   0.999   0.998   0.998   0.981   0.988
HealthApp    0.609   0.615   0.141   0.651   0.969   0.955
HPC          0.929   0.990   0.915   0.979   0.959   0.997
Linux        0.250   0.241   0.251   0.235   0.460   0.988
Mac          0.515   0.579   0.551   0.503   0.666   0.977
OpenSSH      0.507   0.247   0.522   0.508   0.847   0.945
OpenStack    0.538   0.718   0.759   0.697   0.545   0.840
Proxifier    0.973   0.968   0.955   0.975   0.951   0.739
Spark        0.902   0.965   0.943   0.883   0.903   0.992
Thunderbird  0.803   0.782   0.814   0.505   0.761   0.669
Windows      0.983   0.983   0.277   0.554   0.957   0.984
Zookeeper    0.962   0.922   0.842   0.967   0.955   0.999
Average      0.748   0.745   0.672   0.689   0.825   0.936

TABLE 5
Comparison between AWSOM-LP grouping and matching accuracy

Name         Grouping Accuracy   Matching Accuracy
Android      0.970               0.879
Apache       0.999               0.990
BGL          0.945               0.437
Hadoop       0.991               0.427
HDFS         0.988               0.997
HealthApp    0.955               0.887
HPC          0.985               0.997
Linux        0.988               0.839
Mac          0.977               0.652
OpenSSH      0.945               0.801
OpenStack    0.840               0.816
Proxifier    0.739               0.810
Spark        0.992               0.755
Thunderbird  0.669               0.363
Windows      0.984               0.694
Zookeeper    0.999               0.683
Average      0.936               0.730
TABLE 6
Examples of mismatches between the ground truth and AWSOM-LP results

Log file      Comparison
BGL           Event: generating core.21045
              Ground truth: generating core<*>
              AWSOM-LP: generating <*>
BGL           Event: 10765 total interrupts. 0 critical input interrupts.
              Ground truth: <*>total interrupts. 0 critical input interrupts.
              AWSOM-LP: <*>total interrupts. <*>critical input interrupts.
BGL           Event: idoproxydb hit ASSERT condition: ASSERT expression=0
              Ground truth: idoproxydb hit ASSERT condition: ASSERT expression=0
              AWSOM-LP: idoproxydb hit ASSERT condition: ASSERT expression=<*>
Hadoop        Event: task_1445144423722_0020_m_000007 Task Transitioned from NEW to SCHED
              Ground truth: task<*>Task Transitioned from NEW to SCHED
              AWSOM-LP: <*>Task Transitioned from NEW to SCHED
Hadoop        Event: Progress of TaskAttempt attempt_1445144423_0020_m_000000_0 is : 0.1795899
              Ground truth: Progress of TaskAttempt attempt<*>is :<*>
              AWSOM-LP: Progress of TaskAttempt <*>is :<*>
Hadoop        Event: Using callQueue class java.util.concurrent.LinkedBlockingQueue
              Ground truth: Using callQueue class java.util.concurrent.LinkedBlockingQueue
              AWSOM-LP: Using callQueue class <*>
Thunderbird   Event: (Release Date: Mon Sep 27 22:15:07 EDT 2004)
              Ground truth: <*>(Release Date: <*>EDT <*>)
              AWSOM-LP: <*>(Release Date: <*>)

4.2 Ease of stabilization

We assess the ease of stabilization by processing portions of the logs and measuring how much knowledge, in terms of input log events, is required by AWSOM-LP to reach a reasonable accuracy. In other words, we want to know the likelihood of creating a comprehensive set of log templates from a small amount of log data that can recognize future raw log events. Building a dictionary of templates from a limited set of log events is desirable for improved efficiency and scalability. To measure ease of stabilization, we follow the same approach as the one proposed by Dai et al. [27]. For each dataset, we select portions of the log file and measure how many templates are extracted using these portions compared to the total number of templates that are generated when using the entire file. We start with 5% of the log events and increase this portion by 10% until we reach 100% coverage of the log file. This part of the evaluation is applied to larger log files (not only the 2,000 log events used for accuracy). The LogPai log datasets are organized in two different ways. Some datasets such as Hadoop and OpenStack are organized into folders that contain multiple log files. For these, we selected the largest log file from each dataset for evaluation. The other datasets, such as BGL, HDFS, etc., are saved as one log file. For these, we took the entire file as our testbed. However, we excluded Thunderbird, Windows, and HDFS because of the large amount of data they contain, which is in the order of gigabytes. Additionally, we excluded the HPC dataset because the file was corrupt and we were not able to process it. The exact files used to evaluate ease of stabilization can be found in the AWSOM-LP GitHub repository. In total, we used 12 log files out of the 16 datasets of LogPai.

TABLE 7
The log datasets used for ease of stabilization

File         Size      Number of Lines
Apache       5.1MB     56,482
Android      192.3MB   1,555,000
BGL          743.2MB   4,747,000
Spark        166MB     1,225,000
OpenSSH      73MB      655,000
Zookeeper    10MB      74,000
OpenStack    41MB      137,000
Proxifier    3MB       21,000
Mac          17MB      117,000
HealthApp    24MB      253,000
Hadoop       2MB       7,000
Linux        2.4MB     27,000

Results

Figure 4 shows the results of ease of stabilization of AWSOM-LP. The red line in each figure means that 80% of the templates have been detected. As we can see, AWSOM-LP can generate 80% of the log templates with less than 35% of the logs in 8 log datasets out of 12. The results are even better with the Apache, Mac, Spark, and Proxifier log files, in which case AWSOM-LP needs less than 10% of the logs to generate 80% of the templates. For the Hadoop and HealthApp logs, we need 25% of the total size of the log files. For BGL, OpenSSH, Android, and OpenStack, AWSOM-LP requires at least 50% of the data to discover 80% or more of the templates. These results can be attributed to the fact that these logs contain a large number of patterns, forcing AWSOM-LP to learn new patterns as new logs come in. We did not find a correlation between ease of stabilization and the size of log files. Smaller log files such as Hadoop (7,000 log events) may require a larger set to stabilize AWSOM-LP than larger files. The key factors that impact ease of stabilization are the number of patterns the logs contain and the variability in the data (i.e., whether newer log events are introduced frequently during the execution of the logged system).

Fig. 4. Results of ease of stabilization of AWSOM-LP. The x-axis refers to the percentage of log events used for parsing, while the y-axis refers to the percentage of templates that were identified. The red line indicates that 80% of the templates were identified.
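A minimal sketch of the stabilization measurement described in Section 4.2, assuming a hypothetical extract_templates() helper that runs AWSOM-LP over a list of log events and returns the set of templates it produces (sampling is simplified here to taking the leading slice of the file):

```python
def stabilization_curve(events, extract_templates):
    """Fraction of all templates recovered from growing portions of the log."""
    total = extract_templates(events)                     # templates from the entire file
    curve = []
    for pct in [5] + list(range(15, 100, 10)) + [100]:    # 5%, then +10% steps up to 100%
        sample = events[: len(events) * pct // 100]
        found = extract_templates(sample)
        curve.append((pct, len(found & total) / len(total)))
    return curve
```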
4.3 Efficiency

To assess the efficiency of AWSOM-LP, we record the execution time to complete the end-to-end parsing process. For this aspect of the study, we use the same log files as the ones used to assess ease of stabilization (see Table 7). Note that for the Android, BGL, and Spark logs, which contain more than 1 million events, we measure AWSOM-LP's efficiency for these logs up to 1 million log events. This limitation is mainly caused by our limited computing environment. All experiments were conducted on a MacBook Pro laptop running macOS Big Sur version 11.4 with a 6-core Intel Core i7 CPU at 2.2GHz, 32GB of 2400MHz DDR4 RAM, and a 256GB SSD hard drive.

Unlike other studies that use the file size to measure efficiency [30] [45], we decided in this paper to use the number of log events instead. We believe that the number of log events is more representative of the amount of information contained in a log file. For example, the Hadoop log file has a size of 1.5MB but only contains 7,253 log events, whereas the Spark log file has a size of 1MB but contains 8,750 log events. This is because the format and complexity of log events vary from one dataset to another. Using the size of the storage space to measure efficiency is therefore misleading.

For each file, we measure the efficiency of AWSOM-LP by randomly selecting various portions of the file to see how AWSOM-LP performs as the size of the file increases. For example, for the Apache log file of Table 7, we start by measuring the efficiency of AWSOM-LP when applied to 10,000 log events, and increase this number by a factor of 2 until we reach the total size of the file (i.e., 56,482 events).
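The timing loop itself can be sketched as follows, assuming a hypothetical parse_log() helper that performs a full AWSOM-LP pass over the given events; the doubling schedule mirrors the Apache example above.

```python
import time

def efficiency_curve(events, parse_log, start=10_000):
    """Execution time of parse_log() on slices of the log, doubling the slice size each round."""
    results = []
    n = start
    while n < len(events):
        t0 = time.perf_counter()
        parse_log(events[:n])
        results.append((n, time.perf_counter() - t0))
        n *= 2                                   # increase the number of events by a factor of 2
    t0 = time.perf_counter()
    parse_log(events)                            # final measurement on the entire file
    results.append((len(events), time.perf_counter() - t0))
    return results
```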
Results

Fig. 5. Results of AWSOM-LP efficiency. The x-axis represents the number of log events and the y-axis represents the execution time in seconds.

Figure 5 shows the results of AWSOM-LP's efficiency, which varies from 3 seconds to 1,296 seconds (i.e., 22 min) depending on the log file. For Hadoop, Proxifier, Apache, Linux, Zookeeper, and Mac, AWSOM-LP took from 3 to 100 seconds to parse each file. For larger logs such as HealthApp, Spark, OpenSSH, BGL, and Android, AWSOM-LP took between 143 seconds (3 min) and 1,296 seconds (22 min). This variation is mainly due to the structure of the log events of each file and, of course, the number of lines processed. Note, however, that the Android logs took the most time to parse (1,296 seconds, i.e., 22 min), and this is due to the complexity of Android events. Each event contains a large number of static and dynamic tokens, which affects AWSOM-LP's pattern recognition step. On average, AWSOM-LP's parsing time is 233 seconds (4 min).

Additionally, the figure shows that the parsing time increases linearly with respect to the file size (in terms of the number of log events), meaning that AWSOM-LP's efficiency does not degrade as the size of the file increases.

In conclusion, the overall efficiency of AWSOM-LP is very good, at about 5 minutes on average. This performance is obviously impacted by the hardware used to run the experiments. We anticipate that using more computing power and better hardware, combined with parallel programming, would significantly increase the efficiency of AWSOM-LP.

5 THREATS TO VALIDITY

In this section, we discuss the threats to the validity of this study, which are organized into internal, external, conclusion, and reliability threats.

Internal validity: Internal validity threats concern the factors that might impact our results. We assessed AWSOM-LP using 16 log datasets from the LogPai benchmark.
We cannot ascertain that AWSOM-LP's accuracy would be the same if applied to other datasets. This said, these datasets cover software systems from different domains, which makes them a good testbed for log parsing and analysis tools. Another internal threat to validity is related to the threshold we used when applying local frequency analysis, which is the minimum frequency. A different threshold may lead to different results. To mitigate this threat, we experimented with different thresholds, including the median frequency, and found that the minimum frequency yields the best results. We should work towards automating the selection of the threshold tailored to specific log files based on statistical analysis of sample log data. Also, despite our efforts in implementing and testing AWSOM-LP, errors may have occurred. To mitigate this threat, we tested the tool on many log files and we also checked its output manually on small samples. In addition, we make the tool and the data available on GitHub to allow researchers to reproduce the results. Finally, to check the accuracy of AWSOM-LP, we had to examine the differences between the results obtained by AWSOM-LP and the ground truth. This was done semi-automatically through scripts and manual inspections. All efforts were made to reduce potential errors.

Reliability validity: Reliability validity concerns the possibility of replicating this study. We provide an online package to facilitate the assessment, replicability, and reproducibility of this study.

Conclusion validity: Conclusion validity threats correspond to the correctness of the obtained results. We applied AWSOM-LP to 16 log files that are widely used in similar studies. We made every effort to review the accuracy (grouping and matching), efficiency, and ease of stabilization experiments to ensure that we properly interpret the results. The tool and the files used in every step of this study are made available online to allow the assessment and reproducibility of our results.

External validity: External validity is about the generalizability of the results. We performed our study on 16 log files that cover a wide range of software systems. We do not claim that our results can be generalized to all possible log files, in particular to industrial and proprietary logs to which we did not have access.

6 CONCLUSION

We presented AWSOM-LP, a powerful log parsing approach and tool that can be used by researchers and practitioners to parse and abstract unstructured raw log data, an important first step of any viable log analysis task. AWSOM-LP differs from other tools in its design. It uses a clever way to distinguish between the static and dynamic tokens of log events by applying frequency analysis to instances of events that are grouped in the same pattern. By doing so, AWSOM-LP is capable of clearly extracting log templates that can be used to recognize and structure log events. AWSOM-LP is more accurate in parsing a representative set of 16 log files of the LogPai project than any existing open-source log parser. Not only that, AWSOM-LP is also efficient. It took on average 4 min to parse 12 log files with up to 1 million events each. Further, AWSOM-LP does not need to read the entire log file to learn the templates. It requires between 10% and 50% of the data to recognize at least 80% of the templates, making it a very stable tool, ready to process logs in real time.

Future work should build on this work by focusing on the following aspects: (a) apply AWSOM-LP to more logs, especially those from industrial systems; (b) improve AWSOM-LP by adding more regular expressions to identify other trivial dynamic variables such as domain-specific variables; (c) investigate a simple way to recommend a cutoff threshold for the frequency analysis step based on the characteristics of the log data; and (d) improve the efficiency of the tool when applied to log files with a large number of patterns and high variability.

REFERENCES

[1] W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan, "Detecting large-scale system problems by mining console logs," in Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP '09). New York, NY, USA: Association for Computing Machinery, 2009, pp. 117–132. [Online]. Available: https://doi.org/10.1145/1629575.1629587
[2] S. He, J. Zhu, P. He, and M. R. Lyu, "Experience report: System log analysis for anomaly detection," in Proceedings of the 27th IEEE International Symposium on Software Reliability Engineering (ISSRE 2016). IEEE, 2016, pp. 207–218.
[3] Q. Lin, H. Zhang, J.-G. Lou, Y. Zhang, and X. Chen, "Log clustering based problem identification for online service systems," in Proceedings of the 38th International Conference on Software Engineering Companion. ACM, 2016, pp. 102–111.
[4] T. Barik, R. DeLine, S. M. Drucker, and D. Fisher, "The bones of the system: A case study of logging and telemetry at Microsoft," in Proceedings of the 38th International Conference on Software Engineering (ICSE 2016), Companion Volume, 2016, pp. 92–101.
[5] J. Cito, P. Leitner, T. Fritz, and H. C. Gall, "The making of cloud applications: An empirical study on software development for the cloud," in Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015), 2015, pp. 393–403.
[6] P. Huang, C. Guo, J. R. Lorch, L. Zhou, and Y. Dang, "Capturing and enhancing in situ system observability for failure detection," in Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2018), 2018, pp. 1–16.
[7] A. Oliner and J. Stearley, "What supercomputers say: A study of five system logs," in Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2007), 2007, pp. 575–584.
[8] Q. Lin, K. Hsieh, Y. Dang, H. Zhang, K. Sui, Y. Xu, J. Lou, C. Li, Y. Wu, R. Yao, R. Chintalapati, and D. Zhang, "Predicting node failure in cloud service systems," in Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/SIGSOFT FSE 2018), 2018, pp. 480–490.
[9] M. Islam, W. Khreich, and A. Hamou-Lhadj, "Anomaly detection techniques based on kappa-pruned ensembles," IEEE Transactions on Reliability, vol. 67, no. 1, pp. 212–229, 2018.
[10] J.-G. Lou, Q. Fu, S. Yang, Y. Xu, and J. Li, "Mining invariants from console logs for system problem detection," in USENIX Annual Technical Conference, 2010, pp. 23–25.
[11] W. Xu, L. Huang, A. Fox, D. Patterson, and M. Jordan, "Online system problem detection by mining patterns of console logs," in Proceedings of the 9th IEEE International Conference on Data Mining (ICDM 2009), 2009, pp. 588–597.
[12] R. Zhou, M. Hamdaqa, H. Cai, and A. Hamou-Lhadj, "MobiLogLeak: A preliminary study on data leakage caused by poor logging practices," in Proceedings of the International Conference on Software Analysis, Evolution, and Reengineering (SANER 2020), ERA Track, 2020, pp. 577–581.
[13] Z. M. Jiang, A. E. Hassan, G. Hamann, and P. Flora, "Automatic identification of load testing problems," in Proceedings of the 24th IEEE International Conference on Software Maintenance (ICSM 2008), 2008, pp. 307–316.
[14] S. He, Q. Lin, J. Lou, H. Zhang, M. R. Lyu, and D. Zhang, "Identifying impactful service system problems via log analysis," in Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/SIGSOFT FSE 2018), 2018, pp. 60–70.
[15] K. Nagaraj, C. Killian, and J. Neville, "Structured comparative analysis of systems logs to diagnose performance problems," in Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. USENIX Association, 2012, pp. 26–26.
[16] M. Chow, D. Meisner, J. Flinn, D. Peek, and T. F. Wenisch, "The mystery machine: End-to-end performance analysis of large-scale internet services," in Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI '14). USA: USENIX Association, 2014, pp. 217–231.
[17] Y. Dang, Q. Lin, and P. Huang, "AIOps: Real-world challenges and research innovations," in Proceedings of the 41st International Conference on Software Engineering (ICSE 2019), Companion Volume, 2019, pp. 4–5.
[18] N. El-Sayed, H. Zhu, and B. Schroeder, "Learning from failure across multiple clusters: A trace-driven approach to understanding, predicting, and mitigating job terminations," in 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), 2017, pp. 1333–1344.
[19] P. Huang, C. Guo, J. R. Lorch, L. Zhou, and Y. Dang, "Capturing and enhancing in situ system observability for failure detection," in Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (OSDI '18). USA: USENIX Association, 2018, pp. 1–16.
[20] Q. Fu, J.-G. Lou, Q. Lin, R. Ding, D. Zhang, and T. Xie, "Contextual analysis of program logs for understanding system behaviors," in Proceedings of the 10th Working Conference on Mining Software Repositories. IEEE Press, 2013, pp. 397–400.
[21] R. Vaarandi and M. Pihelgas, "LogCluster - a data clustering and pattern mining algorithm for event logs," in 2015 11th International Conference on Network and Service Management (CNSM), 2015, pp. 1–7.
[22] M. Lemoudden and B. E. Ouahidi, "Managing cloud-generated logs using big data technologies," in Proceedings of the International Conference on Wireless Networks and Mobile Communications (WINCOM 2015), 2015, pp. 1–7.
[23] A. Miranskyy, A. Hamou-Lhadj, E. Cialini, and A. Larsson, "Operational-log analysis for big data systems: Challenges and solutions," IEEE Software, vol. 33, no. 2, pp. 55–59, 2016.
[24] D. El-Masri, F. Petrillo, Y.-G. Guéhéneuc, A. Hamou-Lhadj, and A. Bouziane, "A systematic literature review on automated log abstraction techniques," Information and Software Technology, vol. 122, p. 106276, 2020.
[25] W. Shang, Z. M. Jiang, B. Adams, A. E. Hassan, M. W. Godfrey, M. Nasser, and P. Flora, "An exploratory study of the evolution of communicated information about the execution of large software systems," in 2011 18th Working Conference on Reverse Engineering, 2011, pp. 335–344.
[26] W. Shang, Z. M. Jiang, H. Hemmati, B. Adams, A. E. Hassan, and P. Martin, "Assisting developers of big data analytics applications when deploying on Hadoop clouds," in 2013 35th International Conference on Software Engineering (ICSE), 2013, pp. 402–411.
[27] H. Dai, H. Li, W. Shang, T.-H. Chen, and C.-S. Chen, "Logram: Efficient log parsing using n-gram dictionaries," pp. 2–9, 2020.
[28] K. Shima, "Length matters: Clustering system log messages using length of words," CoRR, vol. abs/1611.03213, 2016.
[29] P. He, J. Zhu, Z. Zheng, and M. R. Lyu, "Drain: An online log parsing approach with fixed depth tree," in 2017 IEEE International Conference on Web Services (ICWS). IEEE, 2017, pp. 33–40.
[30] J. Zhu, S. He, J. Liu, P. He, Q. Xie, Z. Zheng, and M. R. Lyu, "Tools and benchmarks for automated log parsing," in Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice. IEEE Press, 2019, pp. 121–130.
[31] Z. M. Jiang, A. E. Hassan, G. Hamann, and P. Flora, "An automated approach for abstracting execution logs to execution events," pp. 249–267, 2008.
[32] H. Mi, H. Wang, Y. Zhou, M. R.-T. Lyu, and H. Cai, "Toward fine-grained, unsupervised, scalable performance diagnosis for production cloud computing systems," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 6, pp. 1245–1255, 2013.
[33] A. Makanju, A. N. Zincir-Heywood, and E. E. Milios, "A lightweight algorithm for message type extraction in system application logs," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 11, pp. 1921–1936, 2012.
[34] Z. M. Jiang, A. E. Hassan, G. Hamann, and P. Flora, "An automated approach for abstracting execution logs to execution events," Journal of Software Maintenance and Evolution: Research and Practice, vol. 20, no. 4, pp. 249–267, 2008.
[35] R. Vaarandi, "Mining event logs with SLCT and LogHound," in Network Operations and Management Symposium (NOMS 2008). IEEE, 2008, pp. 1071–1074.
[36] R. Vaarandi, "A data clustering algorithm for mining patterns from event logs," in 3rd IEEE Workshop on IP Operations & Management (IPOM 2003). IEEE, 2003, pp. 119–126.
[37] Q. Fu, J.-G. Lou, Y. Wang, and J. Li, "Execution anomaly detection in distributed systems through unstructured log analysis," in Ninth IEEE International Conference on Data Mining (ICDM 2009). IEEE, 2009, pp. 149–158.
[38] L. Tang, T. Li, and C.-S. Perng, "LogSig: Generating system events from raw textual logs," in Proceedings of the 20th ACM International Conference on Information and Knowledge Management. ACM, 2011, pp. 785–794.
[39] H. Hamooni, B. Debnath, J. Xu, H. Zhang, G. Jiang, and A. Mueen, "LogMine: Fast pattern recognition for log analytics," in Proceedings of the 25th ACM International Conference on Information and Knowledge Management. ACM, 2016, pp. 1573–1582.
[40] S. Kobayashi, K. Fukuda, and H. Esaki, "Towards an NLP-based log template generation algorithm for system log analysis," in Proceedings of The Ninth International Conference on Future Internet Technologies. ACM, 2014, p. 11.
[41] J. Lafferty, A. McCallum, and F. C. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proceedings of the International Conference on Machine Learning, 2001, pp. 282–289.
[42] S. Thaler, V. Menkovski, and M. Petkovic, "Towards a neural language model for signature extraction from forensic logs," in 2017 5th International Symposium on Digital Forensic and Security (ISDFS). IEEE, 2017, pp. 1–6.
[43] A. Graves, N. Jaitly, and A.-r. Mohamed, "Hybrid speech recognition with deep bidirectional LSTM," in 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE, 2013, pp. 273–278.
[44] P. He, J. Zhu, S. He, J. Li, and M. R. Lyu, "An evaluation study on log parsing and its use in log mining," in 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 2016, pp. 654–661.
[45] P. He, J. Zhu, P. Xu, Z. Zheng, and M. R. Lyu, "A directed acyclic graph approach to online log parsing," arXiv preprint arXiv:1806.04356, 2018.