
Analyzing Log Analysis: An Empirical Study

of User Log Mining


S. Alspaugh, University of California, Berkeley and Splunk Inc.; Beidi Chen and Jessica Lin,
University of California, Berkeley; Archana Ganapathi, Splunk Inc.; Marti A. Hearst
and Randy Katz, University of California, Berkeley
https://www.usenix.org/conference/lisa14/conference-program/presentation/alspaugh

This paper is included in the Proceedings of the 28th Large Installation System Administration Conference (LISA14).
November 9–14, 2014 • Seattle, WA
ISBN 978-1-931971-17-1

Open access to the Proceedings of the 28th Large Installation System Administration Conference (LISA14) is sponsored by USENIX.

Abstract

We present an in-depth study of over 200K log analysis queries from Splunk, a platform for data analytics. Using these queries, we quantitatively describe log analysis behavior to inform the design of analysis tools. This study includes state machine based descriptions of typical log analysis pipelines, cluster analysis of the most common transformation types, and survey data about Splunk user roles, use cases, and skill sets. We find that log analysis primarily involves filtering, reformatting, and summarizing data and that non-technical users increasingly need data from logs to drive their decision making. We conclude with a number of suggestions for future research.

Tags: log analysis, query logs, user modeling, Splunk, user surveys

1 Introduction

Log analysis is the process of transforming raw log data into information for solving problems. The market for log analysis software is huge and growing as more business insights are obtained from logs. Stakeholders in this industry need detailed, quantitative data about the log analysis process to identify inefficiencies, streamline workflows, automate tasks, design high-level analysis languages, and spot outstanding challenges. For these purposes, it is important to understand log analysis in terms of discrete tasks and data transformations that can be measured, quantified, correlated, and automated, rather than through qualitative descriptions and experience alone.

This paper helps meet this need using over 200K queries recorded from a commercial data analytics system called Splunk. One challenge is that logged system events are not an ideal representation of human log analysis activity [3]. Logging code is typically not designed to capture human behavior at the most efficacious level of granularity. Even if it were, recorded events may not reflect internal mental activities. To help address this gap, we supplement the reported data with results of a survey of Splunk sales engineers regarding how Splunk is used in practice.

In our analysis, we examine questions such as: What transformations do users apply to log data in order to analyze it? What are common analysis workflows, as described by sequences of such transformations? What do such workflows tell us, qualitatively, about the nature of log analysis? Who performs log analysis and to what end? What improvements do we need to make to analysis tools, as well as to the infrastructure that logs activities from such tools, in order to improve our understanding of the analysis process and make it easier for users to extract insights from their data?

The answers to these questions support a picture of log analysis primarily as a task of filtering, reformatting, and summarizing. Much of this activity appears to be data munging, supporting other reports in the literature [28]. In addition, we learn from our survey results that users outside of IT departments, including marketers and executives, are starting to turn to log analysis to gain business insights. Together, our experience analyzing these queries and the results of our analysis suggest several important avenues for future research: improving data transformation representation in analytics tools, implementing integrated provenance collection for user activity records, improving data analytics interfaces and creating intelligent predictive assistants, and further analyzing other data analysis activities from other systems and other types of data besides logs.

∗ This author was an employee of Splunk Inc. when this paper was written.


Figure 1: The default Splunk GUI view displays the first several events indexed, with extracted fields highlighted on
the side, and a histogram of the number of events over time displayed along the top. The user types their query into
the search bar at the top of this view.

2 Related Work

We discuss (1) systems for log analysis, (2) techniques for log analysis, and (3) results of log analysis, so that those log analysis activities can be compared to our observations. We also discuss (4) user studies of system administrators – one of the primary classes of log analysts – and (5) of search engine users – where query logs are the main source of data on user behavior and needs.

Systems for log analysis: The purpose of this section is not to compare Splunk to other analysis systems, but to describe the uses these systems support, to provide a sense of how our observations fit within the larger context. Dapper, Google's system tracing infrastructure, is used by engineers to track request latency, guarantee data correctness, assess data access costs, and find bugs [34]. From their detailed descriptions, we can infer that engineers use transformations similar to those used by Splunk users. Other systems, such as Sawzall and PigLatin, include query languages that extract data from logs with heavy use of these same types of transformations [30, 26]. These points suggest that the activity records we have collected may represent typical log analysis usage, despite being gathered from only one system.

Techniques for log analysis: Published techniques for log analysis center around the main challenges in working with logs, such as dealing with messy formats, and solving event-based problems [23]. This includes event and host clustering [20, 21], root failure diagnosis [8, 17], anomaly detection [18], dependency inference [25, 19], and data extraction [16, 39]. Although their motivating use cases overlap with Splunk use cases, in our observations, the use of such techniques appears to be relatively rare (even though Splunk does provide, e.g., clustering and anomaly detection functionality).

Results of log analysis: Log analysis is also used in research as a means to an end rather than as the subject itself. Logs have been used to explain system behavior [7, 6], understand failures [31, 24], identify design flaws [11], spot security vulnerabilities [15], highlight new phenomena [29], and drive system simulations [12]. To the extent that such research involves heavy application of human inference rather than “automatic” statistical inference techniques, like many of those mentioned in the previous section, it appears to more closely align with our observations of log analysis behavior. However, the problems addressed are naturally of an academic nature, whereas Splunk users are often looking for timely business insights specific to their situation.

System administrator user studies: As system administrators are one of the primary classes of log analysts, studies of their behavior are relevant to our study of log analysis. Researchers have studied system administrators to characterize their work environments and problems commonly faced [4], as well as the mental models they form [13]. One study surveying 125 system administrators discovered that accuracy, reliability, and credibility are considered the most important features in tools [38]. Other researchers have called for more standardization in system administration activities – such efforts will benefit from the data we present [9].
event: a raw, timestamped item of data indexed by Splunk, similar to a tuple or row in databases
field: a key corresponding to a value in an event, similar to the concept of a column name
value: part of an event corresponding to a certain field, similar to a particular column entry in a particular row
query: a small program written in the Splunk query language, consisting of pipelined stages
stage: a portion of a query syntactically between pipes; conceptually a single transformation
transformation: an abstract category of similar commands, e.g., filter or aggregate; each stage is a transformation
command: the part of a stage that indicates what operation to apply to the data
argument: the parts of a stage that indicate what fields, values, or option values to use with a command
interactive: a query that is run when it is entered by the user into the search bar
scheduled: a query that has been saved by a user and scheduled to run periodically like a cron job

Table 1: Terminology describing Splunk data.
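The terminology in Table 1 maps naturally onto a small data model. As a hypothetical illustration (not the paper's actual parser), a query string can be split into stages, each with a command and arguments; this naive splitter ignores quoting and subsearches, which a real Splunk parser must handle:

```python
# Toy illustration of the Table 1 terminology: a query is a pipeline of
# stages; each stage is a command followed by arguments.
def parse_stages(query):
    stages = []
    for raw in query.split("|"):  # stages are syntactically between pipes
        tokens = raw.strip().split()
        if tokens:
            stages.append({"command": tokens[0], "arguments": tokens[1:]})
    return stages

stages = parse_stages("search error | stats count by status")
print(stages[0]["command"])    # search
print(stages[1]["arguments"])  # ['count', 'by', 'status']
```

A production parser (such as the one released with this study) must additionally handle quoted strings, nested subsearches, and command-specific grammars, which is why simple pipe-splitting is only a sketch.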

Search engine query log studies: While we are unaware of prior work that uses query logs to study analysis behavior, query logs are often used to study search engine user behavior. People have used search engine query logs to model semantic relationships [22], track user preferences [35], and identify information needs [32]. Techniques involve examining query terms and analyzing user sessions [14, 33]. Due to data quality issues discussed in Section 4, we could not analyze user sessions, but other aspects of our current and previous work parallel these techniques [2]. Employing some of these techniques to examine data analysis activity logs is a promising avenue of future research. Going forward we expect that the study of human information seeking behavior will be enriched through the study of analysis query logs.

3 Splunk Logs and Queries

We collected queries from Splunk1, a platform for indexing and analyzing large quantities of data from heterogeneous data sources, especially machine-generated logs. Splunk is used for a variety of data analysis needs, including root cause failure detection, web analytics, A/B testing and product usage statistics. Consequently, the types of data sets indexed in Splunk also span a wide range, such as system event logs, web access logs, customer records, call detail records, and product usage logs. This section describes the Splunk data collection and query language in more detail; Table 1 lists the terminology introduced in this section.

3.1 Overview

Data collection: To use Splunk, the user indicates the data that Splunk must index, such as a log directory on a file system. Splunk organizes this data into temporal events by using timestamps as delineators, and processes these events using a MapReduce-like architecture [5]. Splunk does not require the user to specify a schema for the data, because much log data is semi-structured or unstructured, and there is often no notion of a schema that can be imposed on the data a priori. Rather, fields and values are extracted from events at run time based on the source type. Specifically, when a user defines a new source type, Splunk guides the user in constructing regular expressions to extract fields and values from each incoming raw event.

Query language: Splunk includes a query language for searching and manipulating data and a graphical user interface (GUI) with tools for visualizing query results. The query consists of a set of stages separated by the pipe character, and each stage in turn consists of a command and arguments. Splunk passes events through each stage of a query. Each stage filters, transforms or enriches data it receives from the previous stage, and pipes it to the subsequent stage, updating the displayed results as they are processed. A simple example of a query is a plain text search for specific strings or matching field-value pairs. A more complex example can perform more advanced transformations, such as clustering the data using k-means. Users can save certain queries and schedule them to be run on a given schedule, much like a cron job. We call these queries scheduled queries.

Graphical user interface: Users almost always compose Splunk queries in the GUI. The default GUI view displays the first several events indexed, with extracted fields highlighted on the left hand side, and a histogram of the number of events over time displayed along the top. A screen shot of this default view is shown in Figure 1. The user types their query into the search bar at the top of this view. When the user composes their query in the GUI, we call it an interactive query.

When the user enters a query that performs a filter, the GUI updates to display events which pass through the filter. When the user uses a query to add or transform a field, the GUI displays events in updated form. Most queries result in visualizations such as tables, time series, and histograms, some of which appear in the GUI when the query is executed, in the “Visualization” tab (Figure 1).

1 www.splunk.com
Users can also create “apps,” which are custom views that display the results of pre-specified queries, possibly in real time, which is useful for things like monitoring and reporting. Although the set of visualizations Splunk offers does not represent the full breadth of all possible visualizations, they still capture a large set of standard, commonly used ones.

3.2 An Example Splunk Query

The Splunk query language is modeled after the Unix grep command and pipe operator. Below is an example query that provides a count of errors by detailed status code:

search error | stats count by status | lookup statuscodes status OUTPUT statusdesc

This example has three stages: search, stats, and lookup are the commands in each stage, count by and OUTPUT are functions and option flags passed to these commands, and “error”, “status”, “statuscodes”, and “statusdesc” are arguments. In particular, “status” and “statusdesc” are fields.

To see how this query operates, consider the following toy data set:

0.0  -  error  404
0.5  -  OK     200
0.7  -  error  500
1.5  -  OK     200

The first stage of the query (search error) filters out all events not containing the word “error”. After this stage, the data looks like:

0.0  -  error  404
0.7  -  error  500

The second stage (stats count by status) aggregates events by applying the count function over events grouped according to the “status” field, to produce the number of events in each “status” group:

count  status
1      404
1      500

The final stage (lookup statuscodes status OUTPUT statusdesc) performs a join on the “status” field between the data and an outside table that contains descriptions of each of the codes in the “status” field, and puts the corresponding descriptions into the “statusdesc” field:

count  status  statusdesc
1      404     Not Found
1      500     Internal Server Error

4 Study Data

We collected over 200K Splunk queries. The data set consists of a list of timestamped query strings. Table 2 summarizes some basic information about this query set.

Total queries               203691
Interactive queries          18872
Scheduled queries           184819
Distinct scheduled queries   17085

Table 2: Characteristics of the set of queries analyzed from the Splunk logs.

We wrote a parser for this query language; the parser is freely available2. This parser is capable of parsing over 90% of all queries in the data set; the remaining failures may be valid, as those queries may be malformed. (This limitation only affects the cluster analysis in Section 6.)

It is important to note that we do not have access to any information about the data over which the queries were issued, because these data sets are proprietary and thus unavailable. Having access only to query logs is a common occurrence in data analysis, and methodologies that can work under these circumstances are therefore important to develop. Further, by manually inspecting the queries and using them to partially reconstruct some data sets from the fields and values mentioned in the queries, we are fairly certain that these queries were issued over many different sources of data (e.g., web server logs, security logs, retail transaction logs, etc.), suggesting the results presented here will generalize across different datasets.

It is also important to note that some of the queries labeled as interactive in our data set turned out to be programmatically issued from sources external to Splunk, such as a user-written script. It is difficult to separate these mislabeled queries from the true interactive queries, so we leave their analysis to future work, and instead focus our analysis in this paper on scheduled queries.

2 https://github.com/salspaugh/splparser
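To make the stage-by-stage semantics of the example in Section 3.2 concrete, the three-stage pipeline can be emulated in ordinary Python on the toy data set. This is only a sketch: Splunk's search command does a full-text match over the raw event, which we approximate here with a field comparison, and the contents of the statuscodes lookup table are assumed from the walkthrough above.

```python
# Emulation of: search error | stats count by status
#               | lookup statuscodes status OUTPUT statusdesc
from collections import Counter

events = [
    {"time": 0.0, "type": "error", "status": "404"},
    {"time": 0.5, "type": "OK",    "status": "200"},
    {"time": 0.7, "type": "error", "status": "500"},
    {"time": 1.5, "type": "OK",    "status": "200"},
]
# Assumed contents of the external lookup table.
statuscodes = {"404": "Not Found", "500": "Internal Server Error"}

# Stage 1 (search error): keep only events containing "error".
filtered = [e for e in events if e["type"] == "error"]

# Stage 2 (stats count by status): count events per "status" group.
counts = Counter(e["status"] for e in filtered)

# Stage 3 (lookup ... OUTPUT statusdesc): join in the description column.
result = [{"count": c, "status": s, "statusdesc": statuscodes[s]}
          for s, c in counts.items()]
```

Each stage consumes the previous stage's output, mirroring how Splunk pipes events between stages.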
5 Transformation Analysis

The Splunk query language is complex and supports a wide range of functionality, including but not limited to: reformatting, grouping and aggregating, filtering, reordering, converting numerical values, and applying data mining techniques like clustering, anomaly detection, and prediction. It has 134 distinct core commands at the time of this writing, and commands are often added with each new release. In addition, users and Splunk app developers can define their own commands.

[Figure 2: The distribution of data transformations that are used in log analysis. The top graph (N = 82680 stages) shows, for each transformation, the percent of stages that apply that transformation. The bottom graph (N = 17085 queries) shows, for each transformation, the percent of queries that contain that transformation at least once (so the percents do not add to 100).]

We originally attempted to analyze the logs in terms of command frequencies, but it was difficult to generalize from these in a way that is meaningful outside of Splunk [1]. So, to allow for comparisons to other log analysis workflows and abstract our observations beyond the Splunk search language, we manually classified these 134 commands into 17 categories representing the types of transformations encoded, such as filtering, aggregating, and reordering (Table 3).
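Given such a command-to-category classification, the two statistics plotted in Figure 2 (percent of stages applying each transformation, and percent of queries using it at least once) reduce to simple counting. A minimal sketch follows; the CATEGORY map shown is a tiny assumed subset of the paper's 134-command classification, and queries are represented simply as lists of stage command names:

```python
# Sketch of the two Figure 2 statistics. CATEGORY is a small assumed
# subset of the manual command classification (Table 3).
from collections import Counter

CATEGORY = {"search": "Filter", "where": "Filter", "stats": "Aggregate",
            "eval": "Augment", "sort": "Reorder", "table": "Project"}

def stage_and_query_frequencies(queries):
    stage_counts, query_counts, total_stages = Counter(), Counter(), 0
    for query in queries:  # each query: a list of stage command names
        seen = set()
        for command in query:
            category = CATEGORY.get(command, "Miscellaneous")
            stage_counts[category] += 1
            seen.add(category)
            total_stages += 1
        for category in seen:  # count each query at most once per category
            query_counts[category] += 1
    pct_stages = {c: 100 * n / total_stages for c, n in stage_counts.items()}
    pct_queries = {c: 100 * n / len(queries) for c, n in query_counts.items()}
    return pct_stages, pct_queries

pct_stages, pct_queries = stage_and_query_frequencies(
    [["search", "stats"], ["search", "eval", "stats"], ["search", "sort"]])
```

Because a query can apply the same transformation in several stages, the per-query percentages are deduplicated per category, which is why they can sum to more than 100.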
Note that because some Splunk commands are overloaded with functionality, several commands actually perform multiple types of transformations, such as aggregation followed by renaming. In these cases, we categorized the command according to its dominant use case.

We use this categorization scheme to answer the following questions about log analysis activity:

• How are the individual data transformations statistically distributed? What are the most common transformations users perform? What are the least common?
• How are sequences of transformations statistically distributed? What type of transformations do queries usually start with? What do they end with? What transformations typically follow a given other transformation?
• How many transformations do users typically apply in a given query? What are the longest common subsequences of transformations?

5.1 Transformation Frequencies

We first counted the number of times that each transformation was used (Figure 2). The most common are Cache (27% of stages), Filter (26% of stages), Aggregate (10% of stages), Macro (10% of stages), and Augment (9% of stages). Scheduled queries are crafted and set up to run periodically, so the heavy use of caching and macros is unsurprising: Splunk adds caching to scheduled queries to speed their execution, and macros capture common workflows, which are likely to be discovered by users after the iterative, ad hoc querying that results in a “production-ready” scheduled query. Although we do not report directly on them here due to data quality issues (Section 4), anecdotally, it appears that interactive queries have a similar distribution except that the use of Cache and Macro is less frequent, and the use of Input is more frequent.

For each transformation type, we also computed the number of queries that used that transformation (Figure 2). This gives us some idea of how many of the queries would be expressible in a restricted subset of the language, which is interesting because it tells us the relative importance of various transformations.

From this we see that Filter transformations are extremely important – 99% of scheduled queries use such transformations. Without Aggregate transformations, 42% of scheduled queries would not be possible. Around a quarter of queries use Augment, Rename, and Project transformations, and 17% use commands that Transform columns.

In contrast, Joins are only used in 6% of scheduled queries. This possible difference from database workloads could be because log data is not usually relational and generally has no schema, so it may often not have information that would satisfy key constraints needed for join, or it may already be sufficiently denormalized for most uses. It could also be because these are scheduled queries, and expensive Join operations have been optimized away, although again anecdotally the interactive queries do not suggest this. Reorder transformations are also used only 6% of the time – log events are already ordered by time by Splunk, and this is probably often the desired order. Input and Output transformations are used in only 2% of scheduled queries – these
Aggregate: coalesce values of a given field or fields (columns) into one summary value. Top commands: stats (86.0%), timechart (9.0%), top (3.0%). Examples: stats sum(size kb); timechart count by region; top hostname.
Augment: add a field (column) to each event, usually a function of other fields. Top commands: eval (57.0%), appendcols (19.0%), rex (15.0%). Examples: eval pct=count/total*100; spath input=json; rex "To: (?<to>.*)".
Cache: write to or read from cache for fast processing. Top commands: summaryindex (98.0%), sitimechart (30.0%). Examples: summaryindex namespace=foo; sitimechart count by city.
Filter: remove events (rows) not meeting the given criteria. Top commands: search (100.0%), where (7.0%), dedup (4.0%). Examples: search name="alspaugh"; where count > 10; dedup session id.
Input: input events into the system from elsewhere. Top commands: inputlookup (88.0%). Example: inputlookup data.csv.
Join: join two sets of events based on matching criteria. Top commands: join (82.0%), lookup (16.0%). Examples: join type=outer ID; lookup.
Macro: apply user-defined sequence of Splunk commands. Top commands: `sourcetype metrics` (50.0%), `forwarder metrics` (13.0%).
Meta: configure execution environment. Top commands: localop (83.0%). Example: localop.
Miscellaneous: commands that do not fit into other categories. Top commands: noop (39.0%). Example: noop.
Output: write results to external storage or send over network. Top commands: outputlookup. Example: outputlookup results.csv.
Project: remove all columns except those selected. Top commands: table (80.0%), fields (22.0%). Examples: table region total; fields count.
Rename: rename fields. Top commands: rename (100.0%). Example: rename cnt AS Count.
Reorder: reorder events based on some criteria. Top commands: sort (100.0%). Example: sort - count.
Set: perform set operations on data. Top commands: append (66.0%), set (40.0%). Examples: append [...]; set intersect [...] [...].
Transform: mutate the value of a given field for each event. Top commands: fillnull (96.0%), convert (2.0%). Examples: fillnull status; convert num(run time).
Transpose: swap events (rows) with fields (columns). Top commands: transpose (100.0%). Example: transpose.
Window: add fields that are windowing functions of other data. Top commands: streamstats (90.0%). Example: streamstats first(edge).

Table 3: Manual classification of commands in the Splunk Processing Language into abstract transformation categories. For each transformation category, the Top Commands list shows the most-used commands in that category; the percentage shows, of all queries containing a given transformation, what percent contained that command.
again could have been optimized away, or possibly captured in Macros. Lastly, the other transformations are used in nearly zero queries. In the case of Windowing transformations, this could be because windowed operations are accomplished “manually” through sequences of Augment transformations or via overloaded commands that were classified as other transformation types. We were surprised such operations were not more common. In the case of the others, such as Transpose, it is more likely because log data is rarely of the type for which such operations are needed.

5.2 Transformation Pipelines

Next, for each pair of transformation types, we counted the number of times within a query that the first transformation of the pair was followed by the second transformation of the pair. We used these counts to compute, for each transformation, how frequently each of the other transformation types followed it in a query.

We used these frequencies to create a state machine graph, as shown in Figure 3. Each node is a type of transformation, and each edge from transformation A to transformation B indicates the number of times B was used after A as a fraction of the number of times A was used. Also included as nodes are states representing the start of a query, before any command has been issued, and the end of a query, when no further commands are issued. The edges between these nodes can be thought of as transition probabilities that describe how likely a user is to issue transformation B after having issued transformation A.

Using these graphs, we can discover typical log analysis pipelines employed by Splunk users. We exclude from presentation sequences with Cache transformations, as those have in most cases been automatically added to scheduled queries by Splunk to optimize them, as well as Macros, because these can represent any transformation, so we do not learn much by including them. The remaining top transformation pipelines by weight (where the weight of a path is the product of its edges) are:

• Filter
• Filter | Aggregate
• Filter | Filter3
• Filter | Augment | Aggregate
• Filter | Reorder
• Filter | Augment

The preponderance of Filter transformations in typical pipelines is not surprising given that it is the most frequently applied transformation. It also makes sense in the context of log analysis – logging collects a great deal of information over the course of operation of a system, only a fraction of which is likely to be relevant to a given situation. Thus it is almost always necessary to get rid of this extraneous information. We investigate Filter, Aggregate, and Augment transformations in more detail in Section 6 to explain why these also appear in common pipelines.

These transformation sequences may seem simple compared to some log analysis techniques published in conferences like KDD or DSN [20, 25]. These pipelines more closely correspond to the simpler use cases described in the Dapper or Sawzall papers [34, 30]. There are many possible explanations for this: Most of the problems faced by log analysts may not be data mining or machine learning problems, and when they are, they may be difficult to map to published data mining and machine learning algorithms. Human intuition and domain expertise may be extremely competitive with state of the art machine learning and other techniques for a wide variety of problems – simple filters, aggregations and transformations coupled with visualizations are powerful tools in the hands of experts. Other reasons are suggested by user studies and first-hand industry experience [23, 38]. Users may prefer interpretable, easily adaptable approaches over black boxes that require lots of mathematical expertise. It is worth further investigating the types of analysis techniques currently in widespread use and assessing how the research on analysis techniques can better address practitioner needs.

We hypothesize that one important variable determining which transformation sequences are most often needed is the data type. Thus, we created more focused state machine graphs for two commonly analyzed source types by pulling out all queries that explicitly specified that source type4: Figure 4 shows the analysis applied to server access logs, used for web analytics (measuring traffic, referrals, and clicks). Figure 5 shows the results on operating system event logs (analyzing processes, memory and CPU usage). These figures suggest that indeed, query patterns can be expected to differ significantly depending on the type of data being analyzed. This could be due to the domain of the data, which could cause the types of questions asked to vary, or it could be due to the format of the data. For example, web logs may have a more regular format, allowing users to avoid the convoluted processing required to normalize less structured data sources.

Other important factors likely include who the user is and what problems they are trying to solve. For example, in

3 These can be thought of as one Filter that happened to be applied in separate consecutive stages.
4 Source type can be specified in Filter transformations – this is what we looked for.
Figure 3: State machine diagram describing, for all distinct scheduled queries, the pairwise transition frequency between the command categories described in the text. Only edges with weight greater or equal to .05 are shown, for clarity.
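A transition-frequency state machine like the one in Figure 3 can be derived by counting pairwise transitions across parsed pipelines. A minimal sketch, using a few hypothetical pipelines rather than the measured query set:

```python
from collections import Counter, defaultdict

# Hypothetical parsed pipelines; each is a sequence of transformation categories.
pipelines = [
    ["Filter", "Aggregate"],
    ["Filter", "Augment", "Aggregate"],
    ["Filter", "Reorder"],
]

transitions = defaultdict(Counter)
for p in pipelines:
    states = ["Start"] + p + ["End"]       # add explicit Start/End states
    for a, b in zip(states, states[1:]):
        transitions[a][b] += 1

# Normalize counts into per-state transition frequencies (the edge weights).
freq = {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
        for a, nxt in transitions.items()}

print(freq["Start"])  # every hypothetical pipeline here starts with Filter
```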

Figure 4: The pairwise transition frequency between transformations for web access log queries (256 distinct queries).
Figure 5: The pairwise transition frequency between transformations for OS event log queries (46 distinct queries).

the case of web access log data, an operations user will want to know, “Where are the 404s?5 Are there any hosts that are down? Is there a spike in traffic that I should add capacity for?” A marketer will want to know, “What keywords are people searching today after my recent press release? What are the popular webinars viewed on my website?” A salesperson may ask, “Of the visitors today, how many are new versus returning, and how can I figure out whom to engage in a sales deal next based on what they’re looking for on the web site?” Capturing this supplemental information from data analysis tools to include in the analysis would be useful for later tailoring tools to particular use cases. We have gathered some information about this (Section 7) but unfortunately we could not cross-reference this data with query data.

5.3 Longest Subsequences

To investigate what longer, possibly more complex, queries look like, we looked at the longest common subsequences of transformations (Table 4). Again, we excluded Cache and Macro transformations from presentation. We again see the preponderance of Filter, Aggregate, and Augment transformations. Beyond that, the most striking feature is the preponderance of Augment transformations, particularly in the longer subsequences. To gain more insight into exactly what such sequences of Augment transformations are doing, we look more closely at such transformations in the following section.

5 404 is an HTTP standard response code indicating the requested resource was not found.

6 Cluster Analysis

Recall from Section 5 that three of the most common transformation types in log analysis are Filter, Aggregate and Augment. To find out more details about why and how such transformations are used, we clustered query stages containing these types of transformations, and then examined the distribution of transformations across these clusters. Clustering provides an alternative to manually looking through thousands of examples to find patterns. Similar conclusions would likely have been arrived at using manual coding techniques (i.e., content analysis), but this would have been more time-consuming.
In clustering these transformations, we investigate the following sets of questions:
• What are the different ways in which Filter, Aggregate, and Augment transformations are applied, and how are these different ways distributed?
• Can we identify higher-level tasks and activities by identifying related clusters of transformations? Do these clusters allow us to identify common workflow patterns? What can we infer about the user’s information needs from these groups?
• How well do the commands in the Splunk query language map to the tasks users are trying to perform? What implications do the clusters we find have on data transformation language design?

Length Count % Queries Subsequence
2 2866 16.77 Transform | Aggregate
2 2675 6.13 Augment | Augment
2 2446 14.06 Filter | Aggregate
2 2170 12.70 Aggregate | Rename
2 1724 8.42 Filter | Augment
3 2134 12.49 Transform | Aggregate | Rename
3 1430 4.00 Augment | Augment | Augment
3 746 4.24 Aggregate | Augment | Filter
3 718 4.20 Aggregate | Join | Filter
3 717 4.20 Aggregate | Project | Filter
4 710 4.16 Aggregate | Project | Filter | Rename
4 710 4.16 Transform | Aggregate | Augment | Filter
4 694 2.71 Augment | Augment | Augment | Augment
4 472 2.73 Filter | Augment | Augment | Augment
4 234 1.37 Augment | Augment | Augment | Project
5 280 1.62 Filter | Augment | Augment | Augment | Augment
5 222 1.30 Augment | Augment | Augment | Augment | Project
5 200 0.61 Augment | Augment | Augment | Augment | Augment
5 171 1.00 Augment | Augment | Augment | Augment | Filter
5 167 0.98 Filter | Augment | Augment | Augment | Aggregate
6 161 0.94 Augment | Augment | Augment | Augment | Filter | Filter
6 160 0.94 Augment | Augment | Filter | Filter | Filter | Augment
6 160 0.94 Augment | Augment | Augment | Filter | Filter | Filter
6 148 0.87 Filter | Augment | Augment | Augment | Augment | Filter
6 102 0.60 Augment | Aggregate | Augment | Augment | Augment | Augment

Table 4: Longest common subsequences of transformations along with count of how many times such sequences
appeared, and the percent of queries they appeared in.
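The counts and per-query percentages in Table 4 can be computed by enumerating the contiguous subsequences of each query's transformation sequence. A sketch over a few hypothetical pipelines:

```python
from collections import Counter, defaultdict

# Hypothetical parsed query pipelines (sequences of transformation types).
queries = [
    ["Filter", "Aggregate"],
    ["Filter", "Augment", "Augment", "Aggregate"],
    ["Filter", "Aggregate", "Rename"],
]

counts = Counter()                 # how many times each subsequence appears
appears_in = defaultdict(set)      # which queries each subsequence appears in
for qid, q in enumerate(queries):
    for n in range(2, len(q) + 1):             # subsequence lengths >= 2
        for i in range(len(q) - n + 1):
            sub = tuple(q[i:i + n])
            counts[sub] += 1
            appears_in[sub].add(qid)

pct = {s: 100 * len(ids) / len(queries) for s, ids in appears_in.items()}
print(counts[("Filter", "Aggregate")], round(pct[("Filter", "Aggregate")], 2))
```

Sorting `counts` by subsequence length and frequency yields a table of the same shape as Table 4.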

To cluster each set of transformations, we:
(1) parsed each query (see: Section 4)
(2) extracted the stages consisting of the given transformation type,
(3) converted the stages into feature vectors,
(4) projected these feature vectors down to a lower dimensional space using PCA,
(5) projected these features further down into two dimensions, to allow visualization of the clusters, using t-SNE [37], and lastly
(6) manually identified and labeled clusters in the data.
Then, to count the number of transformations in each cluster, we use a random sample of 300 labeled examples from the clustering step to estimate the true proportion of stages in each cluster within 95% confidence intervals. 6

6.1 Types of Filters

Filter stages primarily consist of the use of the search command, which almost all Splunk queries begin with, and which allows users to both select events from a source and filter them in a variety of ways. We clustered all distinct Filter stages and discovered 11 cluster types using 26 features 7 (Figure 6). Some of the clusters overlap, in that some examples could belong to more than one group. We discuss how we resolve this below.
The most common application of Filter is to use multi-predicate logical conditions to refine an event set, where these predicates are themselves filters of the other types, such as those that look for matches of a given field (e.g., search status=404), or those that look for any event containing a specified string (e.g., search "Authentication failure for user: alspaugh"). When a Filter could go into multiple categories, it was placed into this one, which also contains Filters with many predicates of the same type in a statement with many disjunctions and negations. Thus, it is the largest category. Considering each filter predicate individually might be more informative; we leave that to future work.
Another common Filter pulls data from a given source, index, or host (like a SELECT clause in SQL). These resemble Filters that look for a match on a given field, but return all events from a given source rather than all events with a specific value in a specific field.
Other types of filters include those that deduplicate events, and those that filter based on time range, index, regular expression match, or the result of a function eval-

6 Assuming cluster distribution is multinomial with k parameters p_i, we use the formula n = k^-1 (1 - k^-1) / (.05/1.96)^2 (which assumes each cluster is equally likely) to estimate the sample size required to estimate the true parameters with a 95% confidence interval. The maximum required size was 246.
7 See Section 11 for more information about the features used.
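The sample-size estimate in footnote 6 can be reproduced numerically; this assumes the standard formula for estimating a proportion p = 1/k with margin of error .05 at z = 1.96, the constants given in the footnote:

```python
from math import ceil

def sample_size(k, err=0.05, z=1.96):
    # n = p(1 - p) / (err/z)^2 with p = 1/k (each cluster assumed equally likely)
    p = 1.0 / k
    return ceil(p * (1 - p) / (err / z) ** 2)

# 11 Filter, 5 Aggregate, and 8 Augment cluster types were found; the largest
# required sample over these is 246, matching the footnote.
print([sample_size(k) for k in (11, 5, 8)])
```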

Figure 6: Distribution of different types of Filter transformations.

Figure 7: Distribution of different types of Aggregate transformations.
uation on the fields of each event. Lastly, some Filter transformations include the use of macros; others, the use of subsearches, the results of which are used as arguments to further constrain the current filter.
These use cases reveal several things:
• It is helpful to be able to simultaneously treat log data both as structured (field-value filters, similar to SQL WHERE clauses) and as unstructured (string-contains searches, similar to grep).
• Some commands in Splunk, like search, are heavily overloaded. A redesign of the language could make it easier to identify what users are doing, by bringing the task performed and the command invoked to perform it more in line with one another. For example, there could be a distinct command for each task identified above. This might also form a more intuitive basis on which to organize a data transformation language or interface, but would need to be evaluated for usability.
• Though it may appear that time range searches are not as prevalent as might have been suspected given the importance of the time dimension in log data, this is because the time range is most often encoded in other parameters that are passed along with the query. So time is still one of the most important filter dimensions for log analysis, but this is not reflected in these results.

6.2 Types of Aggregates

We discovered five Aggregate cluster types using 46 features (Figure 7). The most common Aggregate command is stats, which applies a specific aggregation function to any number of fields grouped by any number of other fields and returns the result. Most often, commonplace aggregation functions like count, avg, and max are used. Almost 75% of Aggregates are of this type. Another 10% of Aggregates do this, but then also prepare the output for visualization in a chart rather than simply return the results (see the “Visualization” tab discussed in Section 3). Another common type of Aggregate is similar to these, but first buckets events temporally, aggregates each bucket, and displays the aggregated value over time in a histogram. Another type first aggregates, then sorts, then returns the top N results (e.g., top user). The last type groups by time, but not necessarily into uniformly sized buckets (e.g., when forming user sessions).
The takeaways from this are:
• Visualizing the results of aggregations is reasonably popular, though much of the time, simply viewing a table of the results suffices. Aggregations lead to the types of relational graphics that many people are familiar with, such as bar and line graphs [36]. Users might also appreciate having the ability to more easily visualize the result of Filter transformations as well; for example, using brushing and linking. 8
• For log analysis, when visualization is used, it is more likely to visualize an aggregate value over buckets of time than aggregated over all time.

6.3 Types of Augments

Augments add or transform a field for each event. The most commonly used such command is eval, which is another example of a heavily overloaded command. We discovered eight classes of Augment use by clustering over 127 features (Figure 8). These classes shed light onto the results of Section 5 and reveal what some of

8 Brushing and linking is an interactive visualization technique wherein multiple views of data are linked and data highlighted in one view (i.e., a filter) appears also highlighted in the other view (i.e., a bar graph or heat map).

the long pipelines full of Augment transformations were likely doing.
The most common ways users transform their data are by manipulating strings (e.g., eval name=concat(first, " ", last)), conditionally updating fields (e.g., eval type=if(status>=400, "failure", "success")), performing arithmetic (e.g., eval pct=cnt/total*100), calculating date-time information (e.g., eval ago=now()-time), applying multi-valued operations (e.g., eval nitems=mvcount(items)), or simple value assignments (e.g., eval thresh=5). Other Augment operations add a field that indicates which group an event belongs to and still others use the results of a subsearch to update a field.
These tasks reveal that:
• Aside from filtering and aggregation, much of log analysis consists of data munging (i.e., translating data from one format into another, such as converting units, and reformatting strings). This is supported by other studies of data analysis in general [28]. Such data munging transformations could be mined to create more intelligent logging infrastructure that outputs data in a form already more palatable to end-users, or could be incorporated into an automated system that converts raw logs into nicely structured information. The more complicated transformations should be evaluated to identify whether the tool could be made more expressive.
• Just as with Filter transformations, here we observe heavily overloaded commands (i.e., eval). Refactoring functionality to clean up the mapping between tasks and commands would help here for the same reasons.

Figure 8: Distribution of different types of Augment transformations.

7 Usage Survey

The analytic results open many questions about usage goals that can best be answered by talking to the people who use the system. To this end, we administered a survey to Splunk sales engineers and obtained responses that describe the use cases, data sources, roles, and skill sets of 39 customer organizations. Note: these are not responses directly from customers; rather, each sales engineer answered each question once for each of three customers, based on their firsthand knowledge and experience working with those customers. Figure 9 summarizes the results visually.

7.1 Survey Results

The main results are:
User roles: The bulk of Splunk users are in IT and engineering departments, but there is an important emerging class of users in management, marketing, sales, and finance. This may be because more business divisions are interleaving one or more machine generated log data sources for business insights.
Programming experience: Although most Splunk users are technically savvy, most only have limited to moderate amounts of programming experience.
Splunk experience: Surprisingly, many of the customers reported on did not consistently have expertise with Splunk; in fact, some users had no Splunk experience. This may be an artifact of the fact that the survey respondents were sales engineers, who may have opted to reply about more recent or growing customer deployments.
Use cases: Along with the main user roles, the main use cases are also IT-oriented, but, consistent with the other responses, Splunk is sometimes used to analyze business data.
Data sources: Correspondingly, the main type of data explored with Splunk is typical IT data: logs from web servers, firewalls, network devices, and so on. However, customers also used Splunk to explore sales, customer, and manufacturing data.
Transformations applied: Customers primarily use Splunk to extract strings from data, perform simple arithmetic, and manipulate date and time information. In some cases, customers perform more complex operations such as outlier removal and interpolation.
Statistical sophistication: Customers generally do not use Splunk to perform very complicated statistical analysis, limiting themselves to operations like computing descriptive statistics and looking for correlations. In one instance, a customer reported having a team of “math

Figure 9: Summary of survey answers. Each vertical line represents a customer. Each colored grouping represents a
different question and each row in the group represents one possible response to that question. A dot is present along
a given column and row if the option corresponding to that row was selected for the question in that group, for the
customer in that column.

junkies” that exported data out of Splunk, ran “very sophisticated batch analytics,” and then imported those results back into Splunk for reporting.
Data mash ups: The degree to which customers combine data sources in their analysis varies across individual users and organizations. Some organizations almost always combine data sources for their analysis while a nearly equal number almost never do. This could be in part due to diversity in Splunk expertise and use cases.
Other tools: To better understand the ecosystem in which Splunk exists, we asked what other data analysis tools customers used. In keeping with their IT-oriented roles and use cases, command line tools are frequently used by most Splunk users, in addition to databases, scripting languages, and desktop visualization tools like Tableau. A significant number of customers used custom in-house applications for analyzing their data. A relatively small number used cluster computing frameworks or analysis languages like MATLAB.
Based on these results, we make the following predictions.
• IT and engineering professionals will be increasingly called upon to use their expertise working with machine data to aid other business divisions in their information-seeking needs, and will gain some expertise in these other domains as a result (deduced from user role and use case data).
• Classic tools of the trade for system administrators and engineers will be increasingly picked up by less technical users with other types of training, causing an evolution in both the features offered by the tools of the trade as well as the skills typically held by these other users (deduced from user role data). Although it is likely that more people in a growing variety of professions will learn how to program over the coming years, the market for log and data analysis tools that do not require programming experience will likely grow even faster (deduced from programming experience data).
• There is still no “one stop shop” for data analysis and exploration needs – customers rely on a variety of tools depending on their needs and individual expertise (based on the other tools data). This may be due to the existence of a well-established toolchain where different components are integrated into a holistic approach, not used disparately. Better understanding of which parts of different tools draw users would help both researchers and businesses that make data analysis products understand where to focus their energies.

8 Conclusion

In this paper we presented detailed, quantitative data describing the process of log analysis. While there have been a number of system administrator user studies, there have been few if any quantitative reports on traces of user behavior from an actual log analysis system at the level of detail we provide. In addition, we provide qualitative survey data for high-level context. Together these are important sources of information that can be used to inform product design, guide user testing, construct statistical user models, and even create smart interfaces that make recommendations to users to enhance their analysis capabilities. We first summarize our main observations, then follow with a call to action for current tool builders and future researchers.
Filtering: In our observations, a large portion of log analysis activity in Splunk consists of filtering. One possible explanation is that log analysis is often used to solve problems that involve hunting down a few particular pieces of data – a handful of abnormal events or a particular record. This could include account troubleshooting, performance debugging, intrusion detection, and other security-related problems. Another possible explanation is that much of the information collected in logs, e.g., for debugging during development, is not useful for end-users of the system. In other words, logs include many different types of data logged for many different reasons, and the difference between signal and noise may depend on perspective.
Reformatting: Our analysis of Augment transformations suggested that most of these transformations were for the purpose of data munging, or reformatting and cleaning data. The prevalence of reformatting as a portion of log analysis activity is likely reflective of the fact that much log data is structured in an inconsistent, ad hoc manner. Taken together, the prevalence of filtering and reformatting activity in Splunk suggests that it may be useful for system developers to collaborate with the end users of such systems to ensure that data useful for the day-to-day management of such systems is collected. Alternatively, another possible explanation is that the Splunk interface is not as useful for other types of analysis. However, other reports indicate that indeed, much of data analysis in general does involve a lot of data munging [28].
Summarization: We observed that it is common in Splunk to Aggregate log data, which is a way of summarizing it. Summarization is a frequently-used technique in data analysis in general, and is used to create some of the more common graph types, such as bar charts and line graphs [36]. This suggests it may be useful to automatically create certain types of summarization
to present to the user to save time. In log analysis with Splunk, summarizing with respect to the time dimension is an important use case.
The complexity of log analysis activity: We were not able to determine whether Splunk users make use of some of the more advanced data mining techniques proposed in the literature, such as techniques for event clustering and failure diagnosis [20, 25]. One possible explanation for this is that due to the complexity and variability of real world problems, as well as of logged information, designing one-size-fits-all tools for these situations is not feasible. Alternatively, such analyses may occur using custom code outside of Splunk or other analytics products as part of a large toolchain, into which we have little visibility. This idea is supported by some of the Splunk survey results (Section 7). Other possible explanations include lack of problems that require complicated solutions, lack of expertise, or requirements that solutions be transparent, which may not be the case for statistical techniques. It could also be the case that such techniques are used, but are drowned out by the volume of data munging activity. Finally, it may be that we simply were not able to find more complex analytics pipelines because programmatically identifying such higher-level activities from sequences of smaller, lower-level steps is a difficult problem.
Log analysis outside of IT departments: Our survey results also suggest that log analysis is not just for IT managers any longer; increasing numbers of non-technical users need to extract business insights from logs to drive their decision making.

9 Future Work

Need for integrated provenance collection: Understandably, most data that is logged is done so for the purpose of debugging systems, not building detailed models of user behavior [3]. This means that much of the contextual information that is highly relevant to understanding user behavior is not easily available, and even basic information must be inferred through elaborate or unreliable means [10]. We hope to draw attention to this issue to encourage solutions to this problem.
Improving transformation representation: In the process of analyzing the query data, we encountered difficulties relating to the fact that many commands in the Splunk language are heavily overloaded and can do many different things. For example, stats can both aggregate and rename data. When this is the case, we are more likely to have to rely on error-prone data mining techniques like clustering and classification to resolve ambiguities involved in automatically labeling user activities. If the mapping between analysis tasks and analysis representation (i.e., the analysis language) were less muddied, it would alleviate some of the difficulties of analyzing this activity data and pave the way for easier modeling of user behavior.
Opportunities for predictive interfaces: Thinking forward, detailed data on user behavior can be fed into advanced statistical models that return predictions about user actions. Studies such as the one we present are important for designing such models, including identifying what variables to model and the possible values they can take on. Other important variables to model could include who the user is, where their data came from, and what problems they are trying to solve. These could be used to provide suggestions to the user about what they might like to try, similar to how other recently successful tools operate [27].
Further analyses of data analysis activity: Finally, in this paper, we only presented data analysis activity from one system. It would be informative to compare this to data analysis activity from other systems, and on other types of data besides log data. Thus, we make our analysis code public so others can more easily adapt and apply our analysis to more data sets and compare results.

10 Acknowledgments

Thanks to David Carasso for being the go-to person for answering obscure questions about Splunk. Thanks to Ari Rabkin and Yanpei Chen for feedback on early drafts. This research is supported in part by NSF CISE Expeditions Award CCF-1139158, LBNL Award 7076018, and DARPA XData Award FA8750-12-2-0331, and gifts from Amazon Web Services, Google, SAP, The Thomas and Stacey Siebel Foundation, Adobe, Apple, Inc., Bosch, C3Energy, Cisco, Cloudera, EMC, Ericsson, Facebook, GameOnTalis, Guavus, HP, Huawei, Intel, Microsoft, NetApp, Pivotal, Splunk, Virdata, VMware, and Yahoo!.

11 Availability

The code used to generate the facts and figures presented in this paper, along with additional data that was not included due to lack of space, can be found at:
https://github.com/salspaugh/lupe

References

[1] ALSPAUGH, S., ET AL. Towards a data analysis recommendation system. In OSDI Workshop on Managing Systems Automatically

and Dynamically (MAD) (2012).

[2] ALSPAUGH, S., ET AL. Building blocks for exploratory data analysis tools. In KDD Workshop on Interactive Data Exploration and Analytics (IDEA) (2013).
[3] ALSPAUGH, S., ET AL. Better logging to improve interactive data analysis tools. In KDD Workshop on Interactive Data Exploration and Analytics (IDEA) (2014).
[4] BARRETT, R., ET AL. Field studies of computer system administrators: Analysis of system management tools and practices. In ACM Conference on Computer Supported Cooperative Work (CSCW) (2004).
[5] BITINCKA, L., ET AL. Optimizing data analysis with a semi-structured time series database. In OSDI Workshop on Managing Systems via Log Analysis and Machine Learning Techniques (SLAML) (2010).
[6] CHEN, Y., ET AL. Design implications for enterprise storage systems via multi-dimensional trace analysis. In ACM Symposium on Operating Systems Principles (SOSP) (2011).
[7] CHEN, Y., ET AL. Interactive analytical processing in big data systems: A cross-industry study of mapreduce workloads. International Conference on Very Large Databases (VLDB) (2012).
[8] CHIARINI, M. Provenance for system troubleshooting. In USENIX Conference on System Administration (LISA) (2011).
[9] COUCH, A. L. “Standard deviations” of the “average” system administrator. USENIX Conference on System Administration (LISA) (2008).
[10] GOTZ, D., ET AL. Characterizing users’ visual analytic activity for insight provenance. IEEE Information Visualization Conference (InfoVis) (2009).
[11] GRAY, J. Why do computers stop and what can be done about it? Tech. rep., 1985.
[12] HAUGERUD, H., AND STRAUMSNES, S. Simulation of user-driven computer behaviour. In USENIX Conference on System Administration (LISA) (2001).
[13] HREBEC, D. G., AND STIBER, M. A survey of system administrator mental models and situation awareness. In ACM Conference on Computer Personnel Research (SIGCPR) (2001).
[14] JANSEN, B. J., ET AL. Real life information retrieval: A study of user queries on the web. SIGIR Forum (1998).
[15] KLEIN, D. V. A forensic analysis of a distributed two-stage web-based spam attack. In USENIX Conference on System Administration (LISA) (2006).
[16] LAENDER, A. H. F., RIBEIRO-NETO, B. A., DA SILVA, A. S., AND TEIXEIRA, J. S. A brief survey of web data extraction tools. SIGMOD Record (2002).
[17] LAO, N., ET AL. Combining high level symptom descriptions and low level state information for configuration fault diagnosis. In USENIX Conference on System Administration (LISA) (2004).
[18] LOU, J.-G., ET AL. Mining dependency in distributed systems through unstructured logs analysis. SIGOPS Operating System Review (2010).
[19] LOU, J.-G., FU, Q., WANG, Y., AND LI, J. Mining dependency in distributed systems through unstructured logs analysis. SIGOPS Operating System Review (2010).
[20] MAKANJU, A. A., ET AL. Clustering event logs using iterative partitioning. In ACM International Conference on Knowledge Discovery and Data Mining (KDD) (2009).
[21] MARMORSTEIN, R., AND KEARNS, P. Firewall analysis with policy-based host classification. In USENIX Conference on System Administration (LISA) (2006).
[22] NORIAKI, K., ET AL. Semantic log analysis based on a user query behavior model. In IEEE International Conference on Data Mining (ICDM) (2003).
[23] OLINER, A., ET AL. Advances and challenges in log analysis. ACM Queue (2011).
[24] OLINER, A., AND STEARLEY, J. What supercomputers say: A study of five system logs. In IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (2007).
[25] OLINER, A. J., ET AL. Using correlated surprise to infer shared influence. In IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (2010).
[26] OLSTON, C., ET AL. Pig latin: A not-so-foreign language for data processing. In ACM International Conference on Management of Data (SIGMOD) (2008).
[27] KANDEL, S., ET AL. Wrangler: Interactive visual specification of data transformation scripts. In ACM Conference on Human Factors in Computing Systems (CHI) (2011).
[28] KANDEL, S., ET AL. Enterprise data analysis and visualization: An interview study. In Visual Analytics Science & Technology (VAST) (2012).
[29] PANG, R., ET AL. Characteristics of internet background radiation. In ACM SIGCOMM Internet Measurement Conference (IMC) (2004).
[30] PIKE, R., ET AL. Interpreting the data: Parallel analysis with Sawzall. Scientific Programming Journal (2005).
[31] PINHEIRO, E., ET AL. Failure trends in a large disk drive population. In USENIX Conference on File and Storage Technologies (FAST) (2007).
[32] POBLETE, B., AND BAEZA-YATES, R. Query-sets: Using implicit feedback and query patterns to organize web documents. In International Conference on World Wide Web (WWW) (2008).
[33] RICHARDSON, M. Learning about the world through long-term query logs. ACM Transactions on the Web (TWEB) (2008).
[34] SIGELMAN, B. H., ET AL. Dapper, a large-scale distributed systems tracing infrastructure. Tech. rep., Google, Inc., 2010.
[35] SILVESTRI, F. Mining query logs: Turning search usage data into knowledge. Foundations and Trends in Information Retrieval (2010).
[36] TUFTE, E., AND HOWARD, G. The Visual Display of Quantitative Information. Graphics Press, 1983.
[37] VAN DER MAATEN, L., AND HINTON, G. Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research (JMLR) (2008).
[38] VELASQUEZ, N. F., ET AL. Designing tools for system administrators: An empirical test of the integrated user satisfaction model. In USENIX Conference on System Administration (LISA) (2008).
[39] XU, W., ET AL. Experience mining Google’s production console logs. In OSDI Workshop on Managing Systems via Log Analysis and Machine Learning Techniques (SLAML) (2010).
