
Identifying Frequent User Tasks from Application Logs

Himel Dev
Department of Computer Science
University of Illinois at Urbana-Champaign
[email protected]

Zhicheng Liu
Adobe Research
San Francisco, CA
[email protected]
ABSTRACT
In light of the continuous growth in log analytics, application logs remain a valuable source for understanding and analyzing patterns in user behavior. Today, almost every major software company employs analysts to reveal user insights from log data. To understand the tasks and challenges of these analysts, we conducted a background study with a group of analysts from a major software company. A fundamental analytics objective that we recognized through this study involves identifying frequent user tasks from application logs. More specifically, analysts are interested in identifying operation groups that represent meaningful tasks performed by many users inside applications. This is challenging, primarily because of the nature of modern application logs, which are long, noisy, and consist of events drawn from a high-cardinality event set. In this paper, we address these challenges to design a novel frequent pattern ranking technique that extracts frequent user tasks from application logs. Our experimental study shows that our proposed technique significantly outperforms the state of the art on real-world data.

ACM Classification Keywords
H.5.m. Information Interfaces and Presentation (e.g. HCI): Miscellaneous; H.2.8. Database Management: Database Applications

Author Keywords
User Task; Application Log; Frequent Pattern Mining; Pattern Ranking

INTRODUCTION
Application log data contain valuable information about user behavior that can inform technical or business decisions. With effective analysis of such data, marketers may be able to correlate user behavior with marketing goals (e.g., successful purchase) and improve promotion strategies; application developers may better prioritize features in product roadmaps, discover potential bugs, and make automatic recommendations without interrupting users' workflows. These valuable applications drive companies and organizations to employ data analysts to reveal user insights from log data.

To understand the tasks and challenges of the analysts, we conducted a background study with a group of log analysts from a major software company. Based on our study, we find that identifying meaningful and frequent user tasks is an important milestone in many log analysis scenarios. For example, to recognize which features need to be prioritized in product roadmaps, product managers need to identify the major user requirements, which can be revealed by examining user tasks. Similarly, to automatically recommend workflows, it is crucial to recognize user intent, which can be modeled based on frequent tasks.

While identifying frequent user tasks is important, doing so for large-scale, complex log data is challenging. The challenges arise from two main sources: the volume and complexity of the data, and the diversity in user behavior and domain context. The current practice of identifying frequent user tasks involves applying frequent pattern mining techniques such as frequent itemset mining and sequential pattern mining [3, 4, 5, 12]. These techniques, however, are not customized for frequent tasks and cannot address the aforementioned challenges.

In this paper, we present a novel frequent pattern ranking technique to extract frequent user tasks from application logs. The key idea of our approach is to rank patterns based on membership based cohesion, which prioritizes the patterns whose events appear contiguously in the supporting sequences with no or few outliers (events not belonging to the pattern). We apply our technique on real-world log data and conduct a user study to evaluate the effectiveness of our approach. Our experimental study shows that our approach significantly outperforms the state of the art for a variety of standard metrics such as NDCG and P/R@k.

In summary, our contributions are as follows:

• We conduct a three-phase background study to understand the vitals of log analytics. Our study reveals many idiosyncrasies of log analytics that we present in the Objectives, Data and Techniques sections.

• We formulate the frequent task identification problem using a set of example-driven assumptions and propose a novel frequent pattern ranking technique to solve this problem. We present the details in the Identifying Frequent User Tasks section.

• We conduct a user study to evaluate the effectiveness of our approach using real-world data. We compare our approach with the state of the art using a standard set of metrics. We provide the details in the Evaluation section.

IUI 2017, March 13–16, 2017, Limassol, Cyprus. Copyright © 2017 ACM. ISBN 978-1-4503-4348-0/17/03. http://dx.doi.org/10.1145/3025171.3025184
Figure 1. Statistics on the datasets we have analyzed

BACKGROUND STUDY
To understand the scale and complexity of log data and the techniques and challenges of the associated analysis, we conducted a mixed-methods study consisting of three phases. In the first phase, we conducted in-depth interviews to understand the vitals of log analytics, including the primary objectives, characteristics of data, and state-of-the-art techniques. In the second phase, we performed some of the analysts' work routines to get first-hand experience with both the data and techniques. In the third phase, we conducted a pilot study to get user feedback on the findings of our analysis. Note that we recruited the same group of participants in the first and third phases of the study.

Participants
As this study requires understanding the vitals of log analytics, the domain knowledge of participants is of critical importance. To address this concern, we recruited two groups of people who are practitioners and stakeholders of log analytics. The first group consists of three analysts who work for a major software company that sells multimedia and creativity software products. The daily jobs of the analysts involve analyzing and reporting user behavior on the company's software applications. They execute queries to retrieve log data from Hadoop servers, wrangle and analyze the data, and prepare presentations on any insights they have found. The second group consists of two product managers who are domain experts on the software applications. We included product managers because they work with a much larger set of users and offer a high-level perspective on the common tasks shared by the analysts.

OBJECTIVES, DATA AND TECHNIQUES
In this section, we discuss the findings of our background study, which reveal many idiosyncrasies of log analytics.

Objectives
Based on our interviews, we recognize frequent task identification as one of the primary objectives of log analytics. For example, some analysts want to understand what users are doing at a certain phase or period: "What do most users do in their early projects?". Likewise, some want to understand the evolution of users: "Are users performing similar/different tasks in their early and later projects?". Again, some care about the elements of a successful project: "What are some common tasks present in successful (published) projects?". We note that many such analytics goals require identifying and understanding frequent user tasks. Though we recognize other relevant objectives of log analytics, such as determining the usefulness of a particular application feature, we focus on identifying frequent user tasks as it is fundamental to both the analysts and managers we interviewed.

Data
Based on our first-hand analysis and the interviews, we identify the following distinguishing characteristics of log data.

• The cardinality of the event set is large. The number of possible user operations in a modern software application can range from hundreds to tens of thousands. Each of these operations triggers one or more events that are recorded in application logs. Typically, logs are pre-processed to reflect the user operations.

• A session can have an arbitrary length. While most users use software applications for a few minutes or hours in a session, others leave them open for days. It is common for a session to contain several hundred events, albeit the upper bound is much higher.

• There is substantial variety in user behavior. Users may perform the same task in a variety of ways. This variety arises from the presence of different operation combinations that can be used to achieve the same outcome.

• Application logs are noisy. A sizable fraction of users execute a hodgepodge of operations to perform an intended task, which often includes operations that are not required for performing the task. These misplaced operations act as a source of noise. Noise can also arise from mistakes, i.e., unintended user operations.

Techniques
Because we focus on frequent task identification, we shall only discuss the techniques that are relevant to this problem. Based on our study, we find that the analysts apply frequent pattern mining techniques such as frequent itemset mining [3] and sequential pattern mining [4] on log data to identify frequent user tasks.
Figure 2. Event sequence fragments from eight distinct logs (corresponding to a photo editing application) show instances of T1 (version management task) and T2 (cropping task). The instances of T1 are highlighted in red font, with red vertical lines outlining the instance boundaries. The instances of T2 are highlighted in blue font, with blue vertical lines outlining the instance boundaries. For T2, Paste implies Copy followed by Paste.

A major challenge of applying frequent pattern mining techniques is the fact that such techniques often generate a huge number of patterns [2]. For example, we applied several classes of frequent pattern mining techniques on real-world datasets, and the results (Figure 1) show that the number of patterns can be overwhelming, even for very strict classes such as closed patterns. Further, most of these patterns are not very useful to the analysts, as they do not represent user tasks. Consequently, analysts face the formidable challenge of manually exploring the patterns and their supporting logs to identify the useful patterns representing user tasks. We extensively study the literature of frequent pattern mining in search of techniques that can address this problem. While we find numerous works addressing the concern of finding useful patterns, none entirely solves this problem, primarily because the concept of pattern usefulness is context dependent. Indeed, patterns that are useful in one context may not be relevant in another. For this reason, the problem of finding appropriate patterns representing user tasks remains open.

IDENTIFYING FREQUENT USER TASKS
In this section, we discuss our problem formulation, review the relevant concepts of frequent pattern mining, and present our solution to the frequent task identification problem.

Problem Formulation
While the idea of a user task is easy to understand, a formal formulation in log analytics is absent. For example, one could define it as a sequence of user operations that actualizes a user intent. Another way is to define it as a set of user operations that achieves some milestone. While the former considers the ordering of operations, the latter does not. There are other such factors (e.g., adjacency of operations) that one needs to consider while defining a user task. We define user task based on a set of example-driven assumptions that we recognized during our first-hand analysis and later confirmed with the analysts. Before we report these assumptions (A1 to A4), we present two well-acknowledged tasks (T1 and T2) that we shall use to explain the assumptions.

T1: This task involves creating a new file from an existing one via copy-pasting. This is a naive version management strategy commonly used by novice users to preserve the content of a file while attempting exploratory operations. In Figure 2, the instances of this task are highlighted in red font, with red vertical lines outlining the instance boundaries.

T2: This task involves cropping an image and checking the size of its dimensions. This is yet another common task performed by many users to generate an image with a specific length and/or width. In Figure 2, the instances of this task are highlighted in blue font, with blue vertical lines outlining the instance boundaries.
A1: Operations corresponding to a task may or may not have any associated ordering.

The first two event sequence fragments in Figure 2 show different operation orderings for T1. There are two alternative operation sequences that can be executed to perform this task. The first alternative is to (i) open the existing file, (ii) copy items from the existing file, (iii) create a new file, and (iv) paste the items in the new file. The second alternative is to (i) create a new file, (ii) open the existing file, (iii) copy items from the existing file, and (iv) paste the items in the new file. While the operations corresponding to the task have some partial ordering (open followed by copy followed by paste, new followed by paste), there is no absolute order. The third and fourth event sequence fragments in Figure 2 show two operation orderings for T2. For this task, users may check the image size either before or after cropping. Our discussions with the analysts reveal that there are many such tasks where the operation ordering is variable. While the aforementioned tasks have two alternative operation orderings, other tasks may have more than two alternatives.

A2: A user may execute a required operation multiple times within the duration of a task.

The fifth and sixth event sequence fragments in Figure 2 show repetition of operations for T1 and T2. Our first-hand analysis reveals that users often repeat operations in a loop till they achieve the intended results. Within the task duration, a user may repeat a required subset of operations in a loop.

A3: A user may perform multiple tasks in a single session.

The seventh and eighth event sequence fragments in Figure 2 show users performing at least two different tasks (T1 and T2) in a session. In fact, most users perform many different tasks in a session.

A4: To perform a task, a user executes the corresponding operations contiguously, with no or few outliers (operations that are not part of the task).

In all eight event sequence fragments of Figure 2, we see users executing the operations of a task contiguously, without any outlier (unnecessary operation). In some cases, users may execute one or more unnecessary operations within a task by mistake; yet, such occurrences should be few.
Frequent Pattern Mining
In this subsection, we review the concepts of frequent pattern mining. There has been extensive research on this topic; we discuss only the concepts relevant to our problem.

Preliminaries
We will be using the following definitions to discuss different classes of frequent patterns.

DEFINITION 1. An event sequence S = [E^1, E^2, ..., E^m] (E^i ∈ E) is an ordered list of events, where E denotes the event dictionary and i denotes the order of event E^i in S.

DEFINITION 2. A sequence database D = {S_1, S_2, ..., S_n} is an unordered set of sequences.

DEFINITION 3. In our discussion, a pattern P is either (i) a set of events whose members appear in random order, or (ii) a sequence of events that appear as subsequence(s), in one or more sequences in an event sequence database.

DEFINITION 4. The support set D_P of a pattern P in a sequence database D is the largest subset of D where P appears in all sequences belonging to D_P. The support of P is quantified as the percentage ratio of the size of D_P and D.

DEFINITION 5. Frequent patterns F = {P_1, P_2, ..., P_f} is a set of patterns (of the same type) where the support of each pattern in a given database is no less than a user-specified threshold Θ.

DEFINITION 6. The occurrence window W_{P,S} of a pattern P in a sequence S refers to the interval(s) within S that contain P.

DEFINITION 7. The minimum length occurrence window or minimum occurrence window W_{P,S}^{(L-)} of a pattern P in a sequence S refers to the minimum length interval(s) within S that contain P. Here, the function L() returns length and the superscript (L-) denotes minimum length.

W_{P,S}^{(L-)} = \arg\min_{W_{P,S}} L(W_{P,S})    (1)
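For concreteness, the following is a minimal Python sketch of Definition 7 (an illustration we add here, with a hypothetical function name; it is not an implementation from this paper). It finds a minimum occurrence window of a set-based pattern in a single sequence by brute-force scanning:

    def min_occurrence_window(pattern, sequence):
        """Return (start, end) of a minimum length interval of `sequence`
        containing every event of the set-based `pattern` (Definition 7),
        or None if the sequence does not support the pattern."""
        pattern = set(pattern)
        best = None
        for start in range(len(sequence)):
            seen = set()
            for end in range(start, len(sequence)):
                if sequence[end] in pattern:
                    seen.add(sequence[end])
                if seen == pattern:  # interval [start, end] contains P
                    if best is None or (end - start) < (best[1] - best[0]):
                        best = (start, end)
                    break  # growing `end` further only lengthens the window
        return best

    # The itemset {A,B,C} in S3 = [E,A,B,C] from Table 1(a):
    print(min_occurrence_window({"A", "B", "C"}, ["E", "A", "B", "C"]))  # (1, 3)

Here the pattern length (3) equals the minimum window length in both supporting sequences (S3 and S5), giving a length ratio of 1, which is consistent with {A,B,C} appearing among the cohesive itemsets in Table 1(b).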
Frequent Pattern Classes
We discuss different classes of frequent patterns (with examples in Table 1) based on two dimensions relevant to our problem.

Order Dimension: Frequent patterns extracted from an event sequence database can either be sets or sequences (of events). A set based frequent pattern does not consider the order of events in database sequences and is called a frequent itemset [3]. In contrast, a sequence based frequent pattern considers the order of events in database sequences and is called a sequential pattern [4].

ID | Sequence
S1 | [A,C,D]
S2 | [B,C,E]
S3 | [E,A,B,C]
S4 | [B,E]
S5 | [E,B,A,C]

(a) Input sequences

Class | Patterns from Input Sequences
Frequent Itemset | {A},{B},{C},{E},{A,B},{A,C},{A,E},{B,C},{B,E},{C,E},{A,B,C},{A,B,E},{A,C,E},{B,C,E},{A,B,C,E}
Cohesive Itemset | {A},{B},{C},{E},{A,B},{A,C},{B,C},{B,E},{A,B,C},{A,B,E},{A,B,C,E}
Sequential Pattern | [A],[B],[C],[E],[A,C],[B,C],[B,E],[E,A],[E,B],[E,C]
2-gram | [B,C]
Episode | [A],[B],[C],[E],[A,C],[B,C]

(b) Patterns with >= 40% support. For cohesive itemsets and episodes, we used the ratio of pattern length (number of elements in the pattern) to occurrence window length as the cohesion parameter, and set the threshold value to 1.

Table 1. Examples of frequent patterns

Cohesion Dimension: Cohesion, or adjacency of pattern elements (events) in the support set (the supporting database sequences), is a central concept in our problem. The pattern classes that address the concern of cohesion include the N-gram, the episode, and the cohesive itemset. N-grams and episodes are subclasses of sequential patterns, whereas cohesive itemsets are a subclass of frequent itemsets. An N-gram is a sequential pattern whose elements appear contiguously in supporting sequences, with no outlier. An episode is a sequential pattern for which the average length of the minimum occurrence windows in supporting sequences is smaller than a user-specified threshold [12]. A cohesive itemset is similar, except that it is a frequent itemset [5].

Limitations
We study the state-of-the-art frequent pattern mining techniques to determine if any of these techniques can solve the problem of frequent task identification. We find that the existing techniques fail to satisfy one or more of the assumptions that we reported (Table 2), and consequently fall short in solving the problem.

• Order-sensitive patterns such as sequential patterns and frequent episodes fail to satisfy assumption A1. More specifically, if a task has many alternative operation orderings, then none of these orderings may be frequent for a given threshold.

• Cohesion-insensitive patterns such as frequent itemsets and sequential patterns fail to satisfy assumption A4. More specifically, these patterns do not distinguish between a set of operations that appears adjacently and a set of operations that appears randomly in supporting sequences. For example, the subsets of the top k popular operations may appear in many sequences; however, the operations of these sets may or may not be executed as a group to achieve some milestone.

• Cohesion-insensitive patterns such as frequent itemsets and sequential patterns also fail to satisfy assumption A3. In particular, these patterns may contain operations from several tasks.

• Finally, the existing cohesion-sensitive patterns such as frequent episodes and cohesive itemsets fail to satisfy assumption A2. In these patterns, cohesion is determined in terms of the lengths of minimum occurrence windows, which fails to accommodate a required operation being repeated an arbitrary number of times.

State-of-the-Art | A1 | A2 | A3 | A4
Frequent Itemset | | | × | ×
Sequential Pattern | × | | × | ×
Frequent Episode | × | × | |
Cohesive Itemset | | × | |

Table 2. Limitations of state-of-the-art techniques in terms of the four assumptions. A1: Operations corresponding to a task may or may not have any associated ordering. A2: A user may execute a required operation multiple times within the duration of a task. A3: A user may perform multiple tasks in a single session. A4: To perform a task, a user executes the corresponding operations contiguously, with no or few outliers.

Membership Based Cohesion for Patterns
To address the limitations of existing approaches, we introduce the idea of membership based cohesion for frequent patterns. To introduce this idea, we first present the concept of the outlier based minimum occurrence window.

DEFINITION 8. The outlier based minimum occurrence window W_{P,S}^{(O-)} of a pattern P in a sequence S refers to the interval(s) within S that contain P while containing the minimum possible number of outliers (i.e., elements not belonging to the pattern P). Here, the function O() returns the number of outliers and the superscript (O-) denotes minimum outlier.

W_{P,S}^{(O-)} = \arg\min_{W_{P,S}} O(W_{P,S})    (2)

DEFINITION 9. The minimum outlier based maximum length occurrence window or minimum outlier based maximum occurrence window W_{P,S}^{(O-)(L+)} of a pattern P in a sequence S refers to the maximum length interval(s) within S that contain P while containing the minimum possible number of outliers. In other words, the minimum outlier based maximum occurrence window refers to the interval(s) that contain P and include as many elements of P as possible without including any element not belonging to P. Here, the function L() returns length and the superscript (L+) denotes maximum length.

W_{P,S}^{(O-)(L+)} = \arg\max_{W_{P,S}^{(O-)}} L(W_{P,S}^{(O-)})    (3)

For example, consider the sequences in Table 3. If we perform frequent itemset mining on these sequences with support threshold Θ = 0.5, we get itemsets such as {A,B,C} and {D,E,F}. The minimum occurrence windows (according to DEFINITION 7) of itemset {A,B,C} in S1, S2 and S3 are marked with underlines. Contrast these with the minimum outlier based maximum occurrence windows (according to DEFINITION 9), which are marked with overlines. The former involves minimizing window length, whereas the latter involves first minimizing the outlier count and then maximizing window length.

ID | Sequence
S1 | [A,A,A,B,B,B,B,B,C,C,C,D,B,B,E,F]
S2 | [G,H,I,J,A,B,A,B,A,B,A,B,C]
S3 | [D,F,X,Y,E,B,A,B,A,B,A,C,C]

Table 3. Example sequences
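To make Definitions 8 and 9 concrete, here is a small brute-force Python sketch (again our illustration; the function name is hypothetical) that applies the two criteria in order: minimize outliers first, then maximize length:

    def min_outlier_max_occurrence_window(pattern, sequence):
        """Return (start, end) of a minimum outlier based maximum occurrence
        window of the set-based `pattern` in `sequence` (Definition 9),
        or None if the sequence does not support the pattern."""
        pattern = set(pattern)
        candidates = []
        for start in range(len(sequence)):
            for end in range(start, len(sequence)):
                window = sequence[start:end + 1]
                if pattern <= set(window):  # the interval contains P
                    outliers = sum(1 for e in window if e not in pattern)
                    # Sorting key: fewest outliers first, then longest window.
                    candidates.append((outliers, -(end - start), (start, end)))
        return min(candidates)[2] if candidates else None

    # {D,E,F} in S1 of Table 3: every containing interval spans the D ... E,F
    # tail, so the best window is [D,B,B,E,F] with two outliers (the two Bs).
    s1 = list("AAABBBBBCCCDBBEF")
    print(min_outlier_max_occurrence_window({"D", "E", "F"}, s1))  # (11, 15)

This brute-force enumeration is cubic in the sequence length; it is meant only to pin down the definitions, not to be efficient.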
Membership Based Cohesion
We formulate the cohesion of a pattern as the signed difference between the pattern's length (the number of elements in the pattern) and the median outlier count in its minimum outlier based maximum occurrence windows (equivalently, its outlier based minimum occurrence windows).

For example, consider the sequences in Table 3. If we perform frequent itemset mining on these sequences with threshold Θ = 0.5, we get itemsets such as {A,B,C} and {D,E,F}. The length of both {A,B,C} and {D,E,F} is 3. There are no outliers in the outlier based minimum occurrence windows of {A,B,C} in S1, S2 and S3; therefore, the median outlier count is 0. According to our formulation, the cohesion score of itemset {A,B,C} is 3 − 0 = 3. In contrast, the number of outliers in the outlier based minimum occurrence windows of {D,E,F} in both S1 and S3 is 2; therefore, the median outlier count is 2. According to our formulation, the cohesion score of itemset {D,E,F} is 3 − 2 = 1.

Notice that the average length of the minimum occurrence windows for itemset {D,E,F} is 5, which is smaller than that of itemset {A,B,C}. According to state-of-the-art cohesive pattern ranking, itemset {D,E,F} is therefore more cohesive than itemset {A,B,C}. However, if we consider the median outlier count in the outlier based minimum occurrence windows, the order is the opposite. Our formulation of membership based cohesion uses the second ordering. In other words, it prefers a small outlier count over a small window length.

Our formulation of membership based cohesion uses pattern length. There are two key reasons for this. The first is to allow salient patterns/tasks with more events/operations to get priority over short patterns/tasks. The second is based on the fact that tasks with a higher number of operations have more room for errors/outliers, which needs to be accounted for accordingly.
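Reusing min_outlier_max_occurrence_window from the sketch above, the cohesion score itself reduces to a few lines; the following sketch (our illustration, not the paper's implementation) reproduces the worked example:

    from statistics import median

    def membership_cohesion(pattern, database):
        """Pattern length minus the median outlier count of the pattern's
        minimum outlier based maximum occurrence windows, taken over the
        sequences that support the pattern."""
        pattern = set(pattern)
        outlier_counts = []
        for seq in database:
            window = min_outlier_max_occurrence_window(pattern, seq)
            if window is None:  # sequence does not support the pattern
                continue
            start, end = window
            outlier_counts.append(
                sum(1 for e in seq[start:end + 1] if e not in pattern))
        return len(pattern) - median(outlier_counts)

    table3 = [list("AAABBBBBCCCDBBEF"),   # S1
              list("GHIJABABABABC"),      # S2
              list("DFXYEBABABACC")]      # S3
    print(membership_cohesion({"A", "B", "C"}, table3))  # 3 - 0 = 3
    print(membership_cohesion({"D", "E", "F"}, table3))  # 3 - 2 = 1 (prints 1.0)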
Frequent Task Identification
To identify frequent user tasks, we first perform frequent itemset mining on the log dataset with a small threshold (e.g., 5%, 10%). Then, we rank the itemsets in descending order based on the membership based cohesion score. In fact, frequent itemset mining and the proposed ranking can be performed simultaneously. We hypothesize that the top itemsets with high membership based cohesion are likely to be frequent user tasks. Notice that the top itemsets satisfy all four assumptions (A1 to A4) that we reported. In particular, the use of the outlier count in determining the cohesion score resolves the concern of assumption A2, without failing the other assumptions.
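A minimal sketch of this pipeline, continuing the sketches above and assuming the frequent itemsets have already been produced by an off-the-shelf miner (the mining step is standard, so only the ranking is shown; the length filter of 3 mirrors the one used in our evaluation):

    def rank_itemsets_by_cohesion(itemsets, database, min_length=3):
        """Rank mined frequent itemsets in descending order of membership
        based cohesion; Python's stable sort keeps ties in input order."""
        scored = [(membership_cohesion(p, database), p)
                  for p in itemsets if len(p) >= min_length]
        scored.sort(key=lambda sp: -sp[0])
        return scored

    for score, pattern in rank_itemsets_by_cohesion(
            [{"A", "B", "C"}, {"D", "E", "F"}], table3):
        print(score, sorted(pattern))  # {A,B,C} (score 3) above {D,E,F} (1)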
EVALUATION
We evaluate the effectiveness of our proposed frequent task identification technique using real-world data. Our evaluation focuses on the following aspects of effectiveness:

• Q1. How effective is our pattern ranking technique in identifying frequent user tasks?

• Q2. How effective is our ranking technique compared to the state of the art?

• Q3. How meaningful are the resultant patterns, i.e., the potential frequent user tasks?

Dataset
We applied our frequent task identification technique on a real-world log dataset. The logs are from a desktop-based photo editing application that has several million users. Two of the analysts from our background study work with similar logs originating from the same application. We took a representative sample of the logs to compile our dataset. Our dataset has 10,000 event sequences and accommodates 1497 unique events. The median sequence length in the dataset is 35.

Method
Our evaluation method consists of two phases. In the first phase, we performed frequent itemset mining and ranked the resultant itemsets based on two different ranking criteria. In the second phase, we selected a representative sample from these itemsets and conducted a user study to evaluate the potential of the selected itemsets to represent frequent user tasks.

Phase I: Frequent Itemset Mining and Ranking
We applied frequent itemset mining on our log dataset using a 10% support threshold and acquired 738 itemsets. From these itemsets, we selected the 540 itemsets with length >= 3 and ranked them based on membership based cohesion. We used standard competition ranking, i.e., "1224" ranking, to rank the itemsets [22]. We found that the cohesion scores of the ranked itemsets dropped rapidly after the first few entries, and based on this observation, we determined a cut-off score (cut-off score = 2) to categorize the itemsets into two groups. The first group consists of the top 16 itemsets, which are highly likely to be frequent user tasks, and the second group consists of the remaining itemsets. The rationale behind selecting 2 as a cut-off score was to maintain a reasonable number of itemsets in the user evaluation, as such evaluations are challenging for users.

In addition to our ranking, we ranked the 540 itemsets with length >= 3 based on the state of the art cohesive itemset mining/ranking technique proposed in [5]. We used standard competition ranking for this procedure as well.
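The "1224" scheme described in [22] assigns tied items the same rank and skips the subsequent positions; a two-line sketch (ours, for illustration):

    def competition_ranks(scores):
        """Standard competition ("1224") ranking of scores, higher is better:
        tied items share a rank; the next distinct score skips the tied slots."""
        ordered = sorted(scores, reverse=True)
        return [ordered.index(s) + 1 for s in scores]

    print(competition_ranks([4.8, 4.8, 3.6, 4.8, 2.0]))  # [1, 1, 4, 1, 5]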
Phase II: User Study
Based on the two rankings mentioned above, we selected 36 itemsets covering each of the following groups: (i) the top 16 itemsets based on our ranking, (ii) the top 16 itemsets based on the state of the art ranking, (iii) randomly selected itemsets from the remaining ones (rank > 16 in both rankings). We conducted a user study to evaluate the potential of these selected itemsets to represent frequent user tasks.

Participants: We conducted our study with ten participants, users of the creativity application, who have intermediate to expert level experience with the application. Some of these users are log analysts. We recruited participants by sending invitations via email within an organization that heavily uses the application. Each of our participants had a minimum of one year of experience with the application.

Tool: To conduct our user study, we developed a tool that allows a user to browse the supporting sequences of a selected pattern. The tool marks the minimum outlier based maximum occurrence windows of a selected pattern using vertical lines. It also highlights the elements belonging to the pattern. Figure 3 shows a screenshot of the tool.

Figure 3. Screenshot of our user study tool

Method: Our user study was a combination of exploration and survey. In the exploration phase, we asked participants to browse the supporting sequences of a selected itemset to understand the usage of the corresponding operation group. For each itemset, we asked participants to browse at least 10 supporting sequences, while recommending that they browse more sequences if required. After the browsing phase, in the survey phase, we asked participants to rate the likelihood of the itemset (operation group) representing a frequent user task on a scale of 1 to 5. We also asked them to explain their decision and explain the task itself in a few sentences. We repeated this procedure for all 36 itemsets.

Evaluation Metrics
We use the following evaluation metrics to investigate the three questions that we reported.

• M1. To evaluate the effectiveness of our pattern ranking technique, we compare the task scores for two groups of itemsets: (i) top 16 and (ii) remaining.

• M2. To compare the effectiveness of our ranking with that of the state of the art, we use information retrieval based metrics such as normalized discounted cumulative gain (NDCG) and precision/recall at rank k (P/R@k).

• M3. To evaluate the interpretability of our results, we perform an in-depth analysis of the itemsets/operation-groups that potentially represent frequent user tasks.

Normalized Discounted Cumulative Gain (NDCG)
NDCG is the standard metric to measure a model's ranking quality in information retrieval. It measures the performance of a ranking technique based on the graded goodness of the ranked entities. It varies from 0.0 to 1.0, with 1.0 representing the ideal ranking of the entities. Here we give a brief background on NDCG.

The cumulative gain (CG) of a ranking model's ordering is the sum of goodness scores over the ranked entities. CG at a particular rank position p is defined as:

CG_p = \sum_{i=1}^{p} rel_i    (4)

where rel_i is the relevance of the result at position i.

Discounted cumulative gain (DCG) is the sum of each ranked entity's goodness score discounted by its position in the ranking. DCG is therefore higher when top quality entities are ranked higher in the results, and lower when they are ranked lower. DCG is defined as:

DCG_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i + 1)}    (5)

NDCG is defined as the ratio of a model's DCG and the ideal DCG (IDCG):

nDCG_p = \frac{DCG_p}{IDCG_p}    (6)
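Equations 4–6 translate directly into code; as a reference point, a short sketch (ours, not tied to any particular IR library):

    import math

    def dcg(relevances):
        """Discounted cumulative gain of a ranked list of graded
        relevance scores (Equation 5)."""
        return sum((2 ** rel - 1) / math.log2(i + 1)
                   for i, rel in enumerate(relevances, start=1))

    def ndcg(relevances):
        """DCG normalized by the DCG of the ideal, descending
        ordering of the same scores (Equation 6)."""
        ideal = dcg(sorted(relevances, reverse=True))
        return dcg(relevances) / ideal if ideal > 0 else 0.0

    # Placing the most relevant entity second, not first, costs nDCG:
    print(round(ndcg([2, 3, 1]), 3))  # ~0.843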
Precision/Recall at Rank k (P/R@k)
Precision and recall are set-based metrics for binary classification tasks. In a ranking context, the relevant entities should be present within the top k entries. Therefore, to evaluate a ranking technique, measuring the precision and recall scores at rank k can serve as a good metric. Another advantage of using P/R@k is that the scores are easy to interpret.
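Under the binary labeling used later in this evaluation (itemsets with average ratings greater than 3 count as relevant), P@k and R@k reduce to a few lines; a sketch under that reading of the setup:

    def precision_recall_at_k(ranked_labels, k):
        """P@k and R@k for a ranked list of binary relevance labels
        (1 = potential task, 0 = not likely to be a task)."""
        hits = sum(ranked_labels[:k])
        total_relevant = sum(ranked_labels)
        return hits / k, (hits / total_relevant if total_relevant else 0.0)

    # 18 of the 36 studied itemsets are labeled relevant (Table 6); a ranking
    # that places all of them on top yields P@16 = 1.0, R@16 = 16/18 ≈ 0.89.
    labels = [1] * 18 + [0] * 18
    print(precision_recall_at_k(labels, 16))  # (1.0, 0.888...)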
Experimental Results
In this subsection, we report the results of our experimental evaluation.

Top 16 vs Remaining
Based on the user ratings from our study, we calculate the task score for each selected itemset as the average rating provided by the users. We compare the task scores for the top 16 itemsets and the remaining itemsets, for the two rankings. Figure 4 shows the boxplot summary of task scores for the two groups of itemsets for our ranking, whereas Figure 5 shows a similar plot for the state of the art ranking.

Notice that for our ranking, the task scores for the top 16 itemsets are much higher than those of the remaining itemsets. Among the top 16 itemsets, 14 itemsets received task scores ranging from 4.3 to 4.8, whereas the other 2 received 3.6 and 3.5. Among the 20 itemsets outside the top 16, only 2 itemsets received more than 3 (3.4 and 3.5). As a matter of fact, these 2 itemsets with high scores are reasonably well ranked (rank 17 and 22) in our ranking.

Figure 4. Boxplot summary of task scores (average user ratings) for two groups of itemsets based on our ranking

For the state of the art ranking, the difference in task scores for the two groups of itemsets is not clear. While the average task score is higher for the top 16 itemsets, there are many itemsets outside the top 16 that received high task scores.
Figure 5. Boxplot summary of task scores (average user ratings) for two groups of itemsets based on the state of the art ranking

Comparison with State of the Art
To compare our ranking with the state of the art, we compute NDCG and P/R@k for both rankings.

NDCG: To compute NDCG, we only use the itemsets that are within the top 16 either in our ranking or in the state of the art ranking. There are 26 such itemsets. For these itemsets, we use the average user ratings as graded relevance scores. Further, instead of using the actual ordering of these itemsets in a ranking, we use the relative ordering of these 26 itemsets as per the ranking. We do this to avoid ranking gaps for the itemsets with rank > 16 in both rankings.

Ranking Method | DCG | nDCG
Our Method | 187.77 | 0.95
State of the Art | 135.27 | 0.69

Table 4. DCG and nDCG scores for the two rankings

Table 4 shows the DCG and nDCG scores for the two rankings. Notice that the NDCG score for our ranking (0.95) is much higher than that of the state of the art (0.69). This implies that our ranking is more effective in placing relevant itemsets on top.

P/R@k: We compute P/R@k for the two rankings, for different k values. To compute P/R@k, we classify the itemsets with average ratings greater than 3 as potential tasks, and the other itemsets as not likely to be tasks.

k | P@k (Our Method) | R@k (Our Method) | P@k (State of the Art) | R@k (State of the Art)
2 | 1.0 | 0.11 | 1.0 | 0.11
4 | 1.0 | 0.22 | 0.5 | 0.11
6 | 1.0 | 0.33 | 0.5 | 0.17
8 | 1.0 | 0.44 | 0.63 | 0.28
10 | 1.0 | 0.56 | 0.6 | 0.33
12 | 1.0 | 0.67 | 0.58 | 0.39
14 | 1.0 | 0.78 | 0.64 | 0.5
16 | 1.0 | 0.89 | 0.56 | 0.5

Table 5. P/R@k for the two rankings

Table 5 shows the P/R@k scores for the two rankings, for different k values. Notice that for most k values, the precision and recall scores for our ranking are much higher than those of the state of the art.

Potential Frequent Tasks
Based on the average ratings from our user study, we identify the itemsets with average ratings greater than 3 as potential tasks, and the other itemsets as not likely to be tasks. In Table 6, we present user-provided explanations of these potential tasks.

As per our user study, both intermediate and expert users are familiar with many of the potential tasks: "That makes sense.", "This is an obvious task.", "This is what I expect. This is 100% a task.".

In fact, they could relate to some of these tasks: "I can imagine this being a task. I do this a lot.", "This is something I do a lot too.".

Yet, at times some users recognized an itemset as a task based on the example operation sequences: "I am identifying this as a task based on the example logs.", "I am not familiar with 'Nudge'. This is based on the sequences I have seen."

While many of the potential tasks didn't surprise the users, some did: "I don't understand why they are deleting the layer.", "Weird, so many people are doing it.".

Users also validated our assumption regarding the ordering of operations in a task (assumption A1): "Fair mix of orders; I am not sure if there is any order for this task.", "This is pretty consistent, the sequence doesn't matter."
Id | Itemset/Operation-Group | User Description | R1 | R2 | U
I1 | {Crop,Open,Image Size} | Cropping an image and checking the size of its dimensions | 2 | 2 | 4.8
I2 | {New,Open,New Document:Custom,Paste} | Creating a new file from an existing one via copy-pasting | 1 | 20 | 4.8
I3 | {Open,New Document:Custom,Paste} | Creating a new file from an existing one via copy-pasting | 8 | 22 | 4.8
I4 | {Open,Paste,Select Canvas} | Selecting the entire canvas before pasting; mostly done by novice users | 8 | 25 | 4.8
I5 | {Free Transform,Nudge,Move} | Applying different move operations to put something in the correct position | 2 | 7 | 4.7
I6 | {New,Open,New Document:Custom} | Opening a file | 2 | 1 | 4.6
I7 | {Edit Type Layer,New Type Layer,Move} | Editing and moving text | 2 | 11 | 4.6
I8 | {Free Transform,Edit Type Layer,Move} | Editing text and adjusting text size | 2 | 14 | 4.6
I9 | {Free Transform,Paste,Move} | Scaling and rotating something during copy-pasting | 2 | 19 | 4.6
I10 | {Layer Order,Free Transform,Move} | Moving something and changing the ordering of layers | 8 | 5 | 4.5
I11 | {Free Transform,Open,Drag Layer} | Positioning something via Drag Layer & making a movement to scene graph | 8 | 18 | 4.5
I12 | {New,Open,Paste} | Creating a new file from an existing one via copy-pasting | 8 | 21 | 4.5
I13 | {Free Transform,Edit Type Layer,Nudge} | Editing text and adjusting text size | 8 | 23 | 4.4
I14 | {New,New Document:Custom,Paste} | Pasting stuff into a new file from a file belonging to a different application | 8 | 24 | 4.3
I15 | {Free Transform,Move,Delete Layer} | Trying to fix a mistake; not an actual task as the goal is not by choice | 8 | 7 | 3.6
I16 | {Rectangular Marquee,Free Transform,Deselect} | Doing a pointless thing; not a real task | 17 | 9 | 3.5
I17 | {Crop,New,Open,New Document:Custom} | Creating a cropped version of an image while preserving the original one | 8 | 26 | 3.5
I18 | {Free Transform,Open,Nudge} | Small tweak to put something in a layer | 22 | 11 | 3.4

Table 6. User-provided descriptions of itemsets (potential frequent user tasks) with (i) rank based on our ranking (R1); (ii) rank based on the state of the art ranking (R2); (iii) average user rating (U). The itemsets with rank >= 16 in our ranking are highlighted in gray.

IMPLICATIONS
Our work draws from, and has implications for, several research threads.

Implications for User Behavior Modeling
Understanding user behavior is crucial for the design and operation of modern software applications. A number of early studies analyzed log data to model user behavior. For example, [1, 14] study the web revisitation behavior of users by analyzing web logs. [15] studies the image search behavior of users by analyzing query logs. [20, 19] built systems to capture user behavior from clickstream data. More specific log data based user behavior models appear in [6, 18].

This paper addresses a fundamental problem in the log data based user behavior modeling space. We assert that understanding user tasks is often a crucial first step in understanding user behavior. Tasks are semantically meaningful units that offer better user insight compared to raw action sequences.

Implications for Temporal Event Sequence Analysis
Temporal event sequence data is pervasive in many application domains, including electronic commerce and digital marketing [10, 21], user workflow and behavior analysis [20, 9], online education [17] and healthcare [16, 13]. In recent years, we have seen several research works that utilize frequent pattern mining techniques to analyze temporal event sequence data [10, 7, 16].

The concepts introduced in this paper can be useful in temporal event sequence analysis. For example, membership based cohesion may reveal useful structural patterns for understanding relationships among temporal events. Such patterns can serve as a special class of behavioral motifs.

Implications for Clickstream Analysis and Visualization
Clickstream data is a valuable source for understanding user behavior. Researchers have proposed a variety of techniques that can be used for analyzing and visualizing [8, 13, 23] clickstream data. More recently, researchers have integrated visual exploration with analytics to develop visual analytics systems for clickstreams [9, 10, 20, 21]. Visual analytics systems for temporal event sequence data [11, 16] can also be used for clickstreams.

User task identification is a useful exercise in clickstream analysis and visualization. In particular, converting click sequences into coarse-grained task sequences can be extremely useful. For example, clustering users based on task sequences has two benefits over clustering based on click sequences. First, tasks are semantically meaningful and consequently, the clusters are easily interpretable. Second, tasks eliminate noise (e.g., mistakes or unintended actions) and consequently, the clusters are robust. Click sequence to task sequence conversion can also be useful in clickstream visualization. In particular, visualizing task sequences instead of click sequences can address the challenge of dealing with long clickstreams that are hard to visualize.
DISCUSSION
In this section, we discuss different dimensions of our proposed pattern ranking technique.

Dropping Assumptions
The four assumptions that we report form the basis of our approach. Yet, one or more of these assumptions may not be applicable for certain applications.

For example, some applications may require extracting tasks with a defined operation ordering, which implies dropping assumption A1. Notice that our definition of the outlier based minimum occurrence window, and accordingly the formulation of membership based cohesion, is applicable to both itemsets and sequential patterns. Thus, our pattern ranking technique can extract frequent tasks in the presence of a strict ordering constraint. In recent times, the concept of the soft pattern has emerged in many application domains. In contrast to the strict ordering constraint of sequential patterns, soft patterns have a flexible ordering constraint. Our method can be applied with soft patterns to uncover the partial ordering of operations within tasks.

If we drop assumption A2, state-of-the-art cohesive itemset mining can reveal frequent user tasks. Notice that cohesive itemsets satisfy all assumptions except A2.

Dropping assumption A3 will ease the frequent user task identification problem.

Finally, dropping assumption A4 will harden the frequent user task identification problem. Under this setting, all frequent itemsets or sequential patterns are likely to be tasks.

Threshold Θ
The value of the support threshold Θ affects the result of frequent pattern mining and, consequently, of pattern ranking. We applied frequent pattern mining on our log dataset with different support thresholds such as 1%, 2%, 5% and 10%. In addition, we ranked the resultant patterns using membership based cohesion. We find that, with a lower support threshold, we get more fine-grained potential tasks (operation groups).

Semantic Factors
As frequent pattern mining techniques are syntactic, they ignore semantic issues such as task equivalence. For example, in Table 6, I8 and I13 represent the same logical task. However, it is not possible for a syntax based technique to capture such semantic similarity.

Containment
Containment is a central concept in frequent pattern mining. Notice that, if a pattern is frequent, each of its subpatterns is frequent as well. To address this issue, the ideas of closed and maximal frequent patterns came into play. A closed frequent pattern is a frequent pattern that includes as many events as possible without compromising support. The idea of the maximal frequent pattern is even stricter: a maximal frequent pattern is a frequent pattern that is not contained within another frequent pattern. Both frequent itemsets and sequential patterns have corresponding closed and maximal classes. Interestingly, we cannot use closed or maximal patterns for identifying frequent tasks. These pattern classes eliminate candidates based on containment, which is not a relevant criterion in our problem formulation. As a result, using these pattern classes carries the risk of eliminating potential tasks from the ranking pool.
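For a concrete sense of the closed class (a small sketch we add for illustration; not from the paper), the following filters a support table down to closed itemsets, i.e., those with no proper superset of equal support:

    def closed_itemsets(supports):
        """Given a {frozenset: support} map of frequent itemsets, keep
        only the closed ones (no proper superset with equal support)."""
        return {p for p in supports
                if not any(p < q and supports[q] == supports[p]
                           for q in supports)}

    sup = {frozenset("AB"): 3, frozenset("ABC"): 3, frozenset("BC"): 4}
    print(closed_itemsets(sup))
    # {B,C} and {A,B,C} survive; {A,B} is absorbed by {A,B,C} because both
    # have support 3, which is exactly how a potential task could be lost.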
Interpretability
There are different types of high-level structures that we can extract from event sequences. From a human-centred perspective, interpretability is a key factor in deciding which type to use. While machine learning based models are effective for prediction, such models are often difficult for humans to understand. In contrast, frequent patterns are easy for humans to understand.

CONCLUSION
In this paper, we propose a novel frequent pattern ranking technique to extract frequent user tasks from application logs. Our technique is based on membership based cohesion, which prioritizes the patterns whose events appear contiguously in the supporting sequences with no or few outliers. We apply our technique on a real-world log dataset and conduct a user study to evaluate its effectiveness. Our experimental study shows that our technique outperforms the state of the art for a variety of standard metrics such as NDCG and P/R@k.

ACKNOWLEDGEMENTS
We thank Matthew Hoffman, Hidy Kong, Stephen Nielson, Manoj Ravi, and John Thompson for their valuable feedback on this project. We also thank our user study participants for their time and feedback.
REFERENCES
1. Eytan Adar, Jaime Teevan, and Susan T. Dumais. 2008. Large Scale Analysis of Web Revisitation Patterns. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '08). ACM, New York, NY, USA, 1197–1206. DOI: http://dx.doi.org/10.1145/1357054.1357241

2. Charu C. Aggarwal and Jiawei Han. 2014. Frequent Pattern Mining. Springer Publishing Company, Incorporated.

3. Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. 1993. Mining Association Rules Between Sets of Items in Large Databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD '93). ACM, New York, NY, USA, 207–216. DOI: http://dx.doi.org/10.1145/170035.170072

4. Rakesh Agrawal and Ramakrishnan Srikant. 1995. Mining Sequential Patterns. In Proceedings of the Eleventh International Conference on Data Engineering (ICDE '95). IEEE Computer Society, Washington, DC, USA, 3–14. http://dl.acm.org/citation.cfm?id=645480.655281

5. Boris Cule, Bart Goethals, and Celine Robardet. 2009. A New Constraint for Mining Sets in Sequences. In Proceedings of the 2009 SIAM International Conference on Data Mining (SDM '09). 317–328. DOI: http://dx.doi.org/10.1137/1.9781611972795.28

6. R. Stuart Geiger and Aaron Halfaker. 2013. Using Edit Sessions to Measure Participation in Wikipedia. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work (CSCW '13). ACM, New York, NY, USA, 861–870. DOI: http://dx.doi.org/10.1145/2441776.2441873

7. Brian C. Keegan, Shakked Lev, and Ofer Arazy. 2016. Analyzing Organizational Routines in Online Knowledge Collaborations: A Case for Sequence Analysis in CSCW. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing (CSCW '16). ACM, New York, NY, USA, 1065–1079. DOI: http://dx.doi.org/10.1145/2818048.2819962

8. Joseph B. Kruskal and James M. Landwehr. 1983. Icicle Plots: Better Displays for Hierarchical Clustering. The American Statistician 37, 2 (1983), 162–168.

9. Heidi Lam, Daniel M. Russell, Diane Tang, and Tamara Munzner. 2007. Session Viewer: Visual Exploratory Analysis of Web Session Logs. In Symposium on Visual Analytics Science and Technology (VAST). 147–154.

10. Zhicheng Liu, Yang Wang, Mira Dontcheva, Matt Hoffman, Seth Walker, and Alan Wilson. 2017. Patterns and Sequences: Interactive Exploration of Clickstreams to Understand Common Visitor Paths. IEEE Transactions on Visualization and Computer Graphics 23, 1 (January 2017).

11. Sana Malik, Fan Du, Megan Monroe, Eberechukwu Onukwugha, Catherine Plaisant, and Ben Shneiderman. 2015. Cohort Comparison of Event Sequences with Balanced Integration of Visual Analytics and Statistics. In Proceedings of the 20th International Conference on Intelligent User Interfaces (IUI '15). ACM, New York, NY, USA, 38–49. DOI: http://dx.doi.org/10.1145/2678025.2701407

12. Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. 1997. Discovery of Frequent Episodes in Event Sequences. Data Mining and Knowledge Discovery 1, 3 (Jan. 1997), 259–289. DOI: http://dx.doi.org/10.1023/A:1009748302351

13. Megan Monroe, Rongjian Lan, Hanseung Lee, Catherine Plaisant, and Ben Shneiderman. 2013. Temporal Event Sequence Simplification. IEEE Transactions on Visualization and Computer Graphics 19, 12 (2013), 2227–2236.

14. Hartmut Obendorf, Harald Weinreich, Eelco Herder, and Matthias Mayer. 2007. Web Page Revisitation Revisited: Implications of a Long-term Click-stream Study of Browser Usage. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '07). ACM, New York, NY, USA, 597–606. DOI: http://dx.doi.org/10.1145/1240624.1240719

15. Jaimie Y. Park, Neil O'Hare, Rossano Schifanella, Alejandro Jaimes, and Chin-Wan Chung. 2015. A Large-Scale Study of User Image Search Behavior on the Web. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI '15). ACM, New York, NY, USA, 985–994. DOI: http://dx.doi.org/10.1145/2702123.2702527

16. Adam Perer and Fei Wang. 2014. Frequence: Interactive Mining and Visualization of Temporal Frequent Event Sequences. In Proceedings of the 19th International Conference on Intelligent User Interfaces (IUI '14). ACM, New York, NY, USA, 153–162. DOI: http://dx.doi.org/10.1145/2557500.2557508

17. Huamin Qu and Qing Chen. 2015. Visual Analytics for MOOC Data. IEEE Computer Graphics and Applications 35, 6 (Nov 2015), 69–75. DOI: http://dx.doi.org/10.1109/MCG.2015.137

18. Jeffrey M. Rzeszotarski and Aniket Kittur. 2011. Instrumenting the Crowd: Using Implicit Behavioral Measures to Predict Task Performance. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (UIST '11). ACM, New York, NY, USA, 13–22. DOI: http://dx.doi.org/10.1145/2047196.2047199

19. Gang Wang, Tristan Konolige, Christo Wilson, Xiao Wang, Haitao Zheng, and Ben Y. Zhao. 2013. You Are How You Click: Clickstream Analysis for Sybil Detection. In Proceedings of the 22nd USENIX Conference on Security (SEC '13). USENIX Association, Berkeley, CA, USA, 241–256. http://dl.acm.org/citation.cfm?id=2534766.2534788

20. Gang Wang, Xinyi Zhang, Shiliang Tang, Haitao Zheng, and Ben Y. Zhao. 2016. Unsupervised Clickstream Clustering for User Behavior Analysis. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI '16). ACM, New York, NY, USA, 225–236. DOI: http://dx.doi.org/10.1145/2858036.2858107

21. Jishang Wei, Zeqian Shen, Neel Sundaresan, and Kwan-Liu Ma. 2012. Visual Cluster Exploration of Web Clickstream Data. In Visual Analytics Science and Technology (VAST), 2012 IEEE Conference on. IEEE, 3–12.

22. Wikipedia. 2017. Ranking — Wikipedia, The Free Encyclopedia. (2017). https://en.wikipedia.org/wiki/Ranking [Online; accessed 9-January-2017].

23. Jian Zhao, Zhicheng Liu, Mira Dontcheva, Aaron Hertzmann, and Alan Wilson. 2015. MatrixWave: Visual Comparison of Event Sequence Data. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI '15). ACM, New York, NY, USA, 259–268. DOI: http://dx.doi.org/10.1145/2702123.2702419
