Software Trace and Log Analysis: Pattern Reference Second Edition
Dmitry Vostokov
Software Diagnostics Institute
OpenTask
You must not circulate this book in any other binding or cover, and you
must impose the same condition on any acquirer.
A CIP catalog record for this book is available from the British Library.
Table of Contents
Preface 14
A 17
Abnormal Value 17
Activity Disruption 18
Activity Divergence 20
Activity Overlap 21
Activity Region 23
Activity Theatre 24
Adjoint Message 25
Adjoint Space 27
Anchor Messages 35
B 38
Back Trace 38
Basic Facts 42
Bifurcation Point 43
Blackout 45
Break-in Activity 47
C 48
Calibrating Trace 48
Circular Trace 52
Correlated Discontinuity 53
Corrupt Message 54
Counter Value 55
Coupled Activities 56
D 57
Data Association 57
Data Flow 58
Data Interval 59
Data Reversal 60
Data Selector 62
Declarative Trace 63
Defamiliarizing Effect 64
Density Distribution 66
Dialogue 67
Diegetic Messages 70
Discontinuity 71
E 73
Empty Trace 73
Error Distribution 74
Error Message 75
Error Powerset 76
Error Thread 77
F 83
Factor Group 83
Fiber Bundle 86
Fiber of Activity 88
File Size 89
Focus of Tracing 90
Fourier Activity 91
G 93
Glued Activity 93
Gossip 94
Guest Component 95
H 96
Hidden Error 96
Hidden Facts 97
I 98
Identification Messages 98
Inter-Correlation 110
Interspace 112
Intra-Correlation 113
L 116
M 121
Macrofunction 121
Milestones 133
Motif 137
N 138
No Activity 140
O 143
P 145
Q 156
R 157
S 163
Surveyor 180
T 181
Timeout 185
U 204
UI Message 204
V 206
W 209
Bibliography 217
Notes 219
33 new trace and log analysis patterns have been discovered since the
publication of the first edition almost two years ago. Memory Dump
Analysis Anthology has also grown to 3,800 pages with the publication of
volumes 8b, 9a, and 9b. Significant advances were made in software
diagnostics theory, and they are reflected in the added analysis patterns. This
edition also features a better index, minor corrections to the patterns from
the first edition, and one pattern from the forthcoming volume 10a.
http://www.facebook.com/TraceAnalysis
http://www.facebook.com/groups/dumpanalysis
http://www.linkedin.com/groups/8473045
Preface
The need for this reference book arose when we started working on the
next version of “Accelerated Windows Software Trace Analysis” training1.
The previous version was two years old, and Software Diagnostics
Institute2 had already added 40 more trace and log analysis patterns to
its catalog. All of them (almost 100 patterns in total) were scattered
among 3,300 pages of various Memory Dump Analysis Anthology volumes
(3 – 7, 8a), and a few were found only in Software Diagnostics Library3. So we
decided to reprint all these patterns and their illustrations in one small
book in full color for easy reference. During editing, we also corrected
various mistakes.
http://www.dumpanalysis.org/contact
Abnormal Value
Please note that we also have the Significant Event (page 167) pattern,
which is more general and also covers messages without a variable part or just
suspicious log entries.
Activity Disruption
Sometimes a few Error Messages (page 75) or Periodic Errors (page 147)
with low Statement Density (page 178) for specific Activity Regions (page
23) or Adjoint Threads of Activity (for a specific component, file or
function, page 30) may constitute Activity Disruption. If the particular
functionality was no longer available at the logging time, then its
unavailability may not be explained by such disruptions, and such
messages may be considered False Positive Errors (page 85) in relation to
the reported problem:
Activity Divergence
Activity Overlap
For example, a first request may start a new session, and we expect
the second request to be processed by the same, already established session:
However, users report that a second session started upon the second
request. If we filter the execution log by session id, we find that the session
initialization prologs (page 196) overlap. The new session started
because the first session initialization had not completed:
Activity Region
Activity Theatre
In addition to Message Patterns (page 129), there are higher level patterns
of specific activities and Motifs (page 137). Such activities may or may
not coincide with specific components (modules) because they may be
grouped based on implementation messages, software internals semantics
and not on architectural and design entities (as in Use Case Trail analysis
pattern, page 205). Moreover, the same components may “play” different
activity roles. Once assigned, Activity Theatre “scripts” can be compared
with “scripts” from other traces and logs (Inter-Correlation, page 110) or
different parts of the same log (Intra-Correlation, page 113). This pattern
is illustrated in the following diagram:
Adjoint Message
Adjoint Space
There is also a reverse situation when we use logs to see past data
changes before memory snapshot time (Paratext memory analysis
pattern10):
Adjoint Thread of Activity
Anchor Messages
Back Trace
Usually, when we analyze traces and find Anchor Message (page 35) or
Error Message (page 75), we backtrack using a combination of Data Flow
(page 58) and Message Sets (page 129). Then we select the appropriate log
messages to form Back Trace leading to a possible root cause message:
This pattern is different from the Error Thread (page 77) pattern that
just backtracks messages having the same TID (or, in general, ATID18). It is
also different from the Exception Stack Trace (page 81) pattern that is just a
serialized stack trace from a memory snapshot.
Background and Foreground Components
Basic Facts
At least we can be sure that this trace was taken for the user test01
especially when we expect this or similar trace statements. If we do not
see this trace statement, we may suppose that the trace was taken at the
wrong time, for example, when the problem had already happened.
Bifurcation Point
The following two software traces from working and non-working software
environments are a perfect example of the pattern. We borrow the name
of this pattern from catastrophe theory21:
First, we notice that in both traces the PIDs are the same (2768 and
3756), and we can conclude for this reason that most likely both traces
came from the same environment and session. Second, messages A, B, C
and further are identical up to messages X and Y. The latter two messages
differ greatly in their query results XXX and YYY. After that, message
distribution differs greatly in both size and content. Despite the same
tracing time, 15 seconds, Statement Current (page 178) is 155 msg/s for
the working case and 388 msg/s for the non-working case.
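As a rough illustration, the comparison described above can be sketched in Python. The message strings, the sample traces, and the function names find_bifurcation and statement_current are hypothetical; real traces would first need their variable parts (timestamps, for example) normalized before a message-by-message comparison makes sense.

```python
from itertools import zip_longest

def find_bifurcation(working, failing):
    """Return the index of the first message at which two traces differ,
    or None if the traces are identical."""
    for index, (w, f) in enumerate(zip_longest(working, failing)):
        if w != f:
            return index
    return None

def statement_current(message_count, seconds):
    """Average number of messages per second over the tracing interval."""
    return message_count / seconds

# Hypothetical message texts standing in for messages A, B, C, X and Y.
working = ["A", "B", "C", "query result XXX", "D"]
failing = ["A", "B", "C", "query result YYY", "E", "F"]

point = find_bifurcation(working, failing)
print(point, working[point], failing[point])
```

Messages before the returned index are common to both traces, so analysis can concentrate on the messages at and after it.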
Blackout
We recently analyzed a Process Monitor log that had a gap of several hours,
which we call Blackout. If you see such a pattern, it might have the following
possible causes:
Break-in Activity
This is a message or a set of messages that surface just before the end of
Discontinuity (page 71) of Adjoint Thread (page 30) and possibly triggered
it:
Calibrating Trace
Multiple traces and logs are usually collected for diagnosing distributed
systems. Different tools and tracing settings (circular, sequential, file size
limit) may be used, systems may be unsynchronized, and individual system
tracing may be started at different times due to manual tracing setup and
switching between systems. There may be Blackouts (page 45), Circular
(page 52), and Truncated (page 203) traces. When we analyze such a trace
set (Inter-Correlation, page 110) we usually select one trace or log that is
used as Calibrating Trace. It is used for measuring all other traces against
Basic Facts (page 42) such as start and end tracing times, and the time of
the problem. One such scenario is illustrated in the following diagram:
Characteristic Message Block
Bird’s eye view of software traces22 makes it easier to see their coarse
blocked structure:
Circular Trace
Correlated Discontinuity
Corrupt Message
Sometimes log messages are formatted with mistakes; buffers are not
cleared before copying; copied strings are truncated; tracing
implementation and presentation contain coding defects. There can be
internal corruption when messages are formed or “corruption” during
presentation, for example, by default field conversion rules (as in Excel). We
call this pattern Corrupt Message. Such messages may affect trace and log
analysis because a data search may not show all relevant results. We then
recommend double-checking findings by using Data Flow (page 58) of a
different Message Invariant (page 128).
Counter Value
This pattern covers performance monitoring and its logs. Counter Value is
some variable in memory, for example, Module Variable 23 memory
analysis pattern, that is updated periodically to reflect some aspect of state
or calculated from different variables and presented in trace messages. We
can organize such messages in a similar format as ETW based traces we
usually consider as examples for our trace patterns:
Therefore, all other trace and log analysis patterns such as Adjoint
Thread (page 30, can be visualized via different colors on a graph), Focus
of Tracing (page 90), Characteristic Message Block (page 49, for graphs),
Activity Region (page 23), Significant Event (page 167), and others can be
applicable here. There are also some specific patterns such as Global
Monotonicity and Constant Value that we discuss with examples in later
reference editions.
Coupled Activities
Data Association
This pattern is also different from Data Flow (page 58) where a
value stays constant across different sources and messages. It is also
different from Gossip (page 94) pattern that involves more semantic
changes. Metaphorically we can think of this pattern as a partial
derivative25.
Data Flow
Data Interval
When we have very large traces and Basic Facts (page 42) containing some
data values such as a user name, device name, or registry key value, we
may use the Data Interval analysis pattern to select the trace fragment for the
initial log analysis. The first and the last trace messages containing the selected
data form the closed Data Interval. Depending on the trace size and other
considerations, we can also choose open Data Intervals. It is illustrated in
the following diagram where we use interval notation borrowed from
mathematics26:
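The selection of closed and open intervals can be sketched as follows. This is a minimal sketch that assumes messages are plain strings and that the data value is matched by substring search; the function name data_interval and the sample trace are hypothetical.

```python
def data_interval(messages, value, closed=True):
    """Return the slice of messages between the first and last message
    containing `value`. A closed interval includes both boundary
    messages; an open interval excludes them."""
    hits = [i for i, text in enumerate(messages) if value in text]
    if not hits:
        return []
    first, last = hits[0], hits[-1]
    if closed:
        return messages[first:last + 1]
    return messages[first + 1:last]

# Hypothetical trace fragment; "test01" plays the role of a Basic Fact.
trace = [
    "service started",
    "logon request for user test01",
    "profile loaded",
    "session created for test01",
    "idle",
]
print(data_interval(trace, "test01"))
print(data_interval(trace, "test01", closed=False))
```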
Data Reversal
But it can also happen for some message types and not for others.
Typical examples here are Enter/Leave trace messages for nested
synchronization objects such as monitors and critical sections:
Since we talk about the same message type (the same Message
Invariant, page 128), this pattern is different from Event Sequence Order
(page 78) pattern.
Data Selector
Declarative Trace
Defamiliarizing Effect
Ange Leccia, Motionless Journeys27, by Fabien Danesi
Density Distribution
Dialogue
Message source and target are not only IP addresses or port numbers but
also window handles, for example. Sometimes, different Process ID and
Thread ID combinations (Client ID, CID) play the roles of source and
target. In such cases some parts of a message text may signify reply and
response as shown graphically:
Note that in all illustrations above the 3rd request does not have
a reply message: a possible Incomplete History (page 102) pattern.
Diegetic Messages
Some modules may emit messages that tell about their status, but
from their message text, we know the larger computation story like in a
process startup sequence example30.
Discontinuity
However, when looking for any Discontinuities (page 71) for the
thread 5476 we see the gap of more than 7 minutes:
Empty Trace
Empty Trace is another trivial missing pattern that we need to add to make
software log analysis pattern system complete. It ranges from an empty
trace message list where only a meta-trace header (if any) describing
overall trace structure is present to a few messages where we expect
thousands. It is also an extreme case of Truncated Trace (page 201), No
Activity (page 140) and Missing Component (page 134) patterns for a
trace taken as a whole. Note that an empty trace file does not necessarily
have a zero file size because a tracing architecture may preallocate some
file space for block data writing.
Error Distribution
Error Message
Error Powerset
A typical software trace may contain several Error Messages (page 75) with
different error codes and different exception names with Exception Stack
Traces (page 81). Searching for individual codes or exceptions in problem
databases may show many matches. Searching for all of them may show
nothing. Therefore, we can construct the set of all subsets of the set of
codes and exceptions (a power set35) and perform analytic reasoning (and
a search) on certain subsets chosen according to the problem description,
Trace Viewpoints (page 198) such as Use Case Trails (page 205), Motifs
(page 137), Focus of Tracing (page 90), Foreground Components (page
39), (Adjoint, page 30) Threads of Activity (page 181), and simply some
Activity Regions (page 23) and Message Sets (page 129).
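The subset construction can be sketched in Python with the standard itertools module. The error codes, the contents of the problem database, and the function names below are purely illustrative assumptions; a real search would query whatever problem database is actually available.

```python
from itertools import chain, combinations

def powerset(items):
    """All subsets of `items`, from the empty set to the full set."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )

def matching_subsets(errors, known_problems):
    """Yield (subset, problem) pairs where a known problem is described
    by exactly that non-empty subset of observed errors."""
    for subset in powerset(sorted(errors)):
        for name, signature in known_problems.items():
            if subset and set(subset) == signature:
                yield subset, name

# Hypothetical error codes and problem database entries.
observed = {"0x80070005", "0x800706BA", "TimeoutException"}
database = {
    "RPC server unavailable after timeout": {"0x800706BA", "TimeoutException"},
    "Access denied": {"0x80070005"},
}
for subset, problem in matching_subsets(observed, database):
    print(subset, "->", problem)
```

Because the number of subsets doubles with every additional code, in practice only the subsets suggested by the problem description would be searched.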
Error Thread
When we see Error Message (page 75) or Exception Stack Trace (page 81)
in a log file, we might want to explore the sequence of messages from the
same Thread of Activity (page 181) that led to the error. Such Message
Set (page 129) has analogy with memory analysis patterns such as
Execution Residue (of partial stack traces without overwrites36) and Stack
Trace (where the error message is a top stack frame37):
Event Sequence Order
Event Sequence Phase
Sometimes we have several use case instances traced into one log file.
Messages and Activity Regions (page 23) from many Use Case Trails (page
205) intermingle and make analysis difficult especially with the absence of
UCID (Use Case ID), any other identification tags, or Linked Messages
(page 120). However, initially, most of the time we are interested in a
sequence of Significant Events (page 167). After finding Anchor Messages
(page 35), we can use Time Deltas (page 184) to differentiate between
trace statements from different Use Case Trails. Here we assume correct
Event Sequence Order (page 78). We call this pattern Event Sequence
Phase by analogy with wave phases38. All such individual “waves” may
have different “shapes” due to various delays between different stages of
their use case and implementation narratives:
Exception Stack Trace
Often the analysis of software traces starts with searching for short textual
patterns, like a failure or an exception code or simply the word
“exception”. In addition, some software components can record their own
exceptions or exceptions that were propagated to them including full stack
traces. It is all common in .NET and Java environments. Here is a synthetic
and beautified example based on real software traces:
In the embedded stack trace we see that App object was trying to
enumerate business objects and asked Store object to get some data. The
latter object was probably trying to communicate with the real data store
via DCOM. The communication attempt failed with HRESULT.
Factor Group
We borrowed the next trace and log analysis pattern name from factor
groups in mathematics (or quotient groups39). Here a group is, of course,
not a mathematical group40, but just a group (or set) of log messages or
trace statements. However, every trace message has variable and invariant
parts (Message Invariant, page 128). Variable parts usually contain some
values, addresses or status bits. They can even be string values. Such
values form a set too and can be partitioned into disjoint (non-overlapping)
subsets. For example, a window foreground status can be either TRUE or
FALSE. In addition, we can group messages into disjoint factor groups, each
one having either only true or only false foreground status. The following
trace graph illustrates a WindowHistory64 41 log where it was reported that
one window was periodically losing and gaining focus:
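The partitioning into disjoint factor groups can be sketched as follows. The record layout (message number, window handle, foreground status), the handle values, and the function name factor_groups are assumptions made for illustration and are not the actual WindowHistory log format.

```python
from collections import defaultdict

def factor_groups(messages, key):
    """Partition messages into disjoint groups by the value of one
    variable part, extracted by `key`."""
    groups = defaultdict(list)
    for message in messages:
        groups[key(message)].append(message)
    return dict(groups)

# Hypothetical records: (message number, window handle, foreground status).
log = [
    (1, "0x000A01C2", True),
    (2, "0x000A01C2", False),
    (3, "0x000B0244", True),
    (4, "0x000A01C2", True),
    (5, "0x000A01C2", False),
]
groups = factor_groups(log, key=lambda record: record[2])
print([number for number, _, _ in groups[True]])
print([number for number, _, _ in groups[False]])
```

Every message lands in exactly one group, so the groups together cover the whole log without overlap.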
OpenProcess error 5
Fiber Bundle
Modern software trace recording, visualization, and analysis tools such
as Process Monitor, Xperf, WPR, and WPA provide stack traces associated
with trace messages. If we consider stack traces as software traces, we have, in a
more general case, traces (fibers) bundled together on (attached to) a base
software trace. For example, a trace message that mentions an IRP can
have its I/O stack attached together with the thread stack trace with
function calls leading to a function that emitted the trace message.
Another example is an association of different types of traces with trace
messages such as managed and unmanaged ones. This general trace
analysis pattern needed a name, so we opted for Fiber Bundle as an
analogy with a fiber bundle 42 from mathematics. Here’s a graphical
representation of stack traces recorded for each trace message where one
message also has an associated I/O stack trace:
[Diagram labels: Trace messages; I/O stack; Thread stack trace]
Fiber of Activity
When using complex trace and log analysis patterns such as Fourier
Activity (page 91) we may be first interested in selecting all instances of a
particular message type from specific Thread of Activity (page 181) and
then look for Time Deltas (page 184), Discontinuities (page 71), Data Flow
(page 58), and other patterns. We call this analysis pattern Fiber of Activity
by the analogy of fibers43 (lightweight threads) since the individual thread
execution flow is “co-operative” inside, whereas threads themselves are
preempted outside. The following diagram from Fourier Activity analysis
pattern example illustrates the concept by showing three fibers:
File Size
Trace and log analysis starts with the assessment of artifact File Size,
especially with multiple logging scenarios in distributed systems. If all log
files are of the same size, we might have either Circular Traces (page 52) or
Truncated Traces (page 201). Both point to wrong trace timing plan44 or
just using default tracing tool configuration.
Focus of Tracing
Fourier Activity
Sometimes we have trace and log messages that appear with a certain time
frequency throughout the whole log or a specific Thread of Activity (page 181). Such
frequencies may fluctuate reflecting varying system or process
performance. Analyzing trace areas where such messages have different
Time Deltas (page 184) may point to additional diagnostic log messages
useful for root cause analysis. The following minimal trace graph depicts
the recent log analysis for proprietary file copy operation where the
frequency of internal communication channel entry/exit Opposition
Messages (page 143) was decreasing from time to time. Such periods were
correlating with increased time intervals between “entry” and “exit”
messages. Analysis of messages between them revealed additional
diagnostic statements missing in periods of higher frequency and
corresponding Timeouts (page 185) adding up to overall performance
degradation and slowness of copy operation.
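The measurement of intervals between “entry” and “exit” messages can be sketched as follows. This is a minimal sketch under the assumption that messages have already been parsed into (time, kind) pairs; the timestamps, the threshold, and the function names are hypothetical.

```python
def entry_exit_deltas(messages):
    """Pair each "entry" message with the next "exit" message and return
    (entry time, duration) tuples. Times are in seconds."""
    deltas = []
    entry_time = None
    for time, kind in messages:
        if kind == "entry":
            entry_time = time
        elif kind == "exit" and entry_time is not None:
            deltas.append((entry_time, time - entry_time))
            entry_time = None
    return deltas

def slow_periods(deltas, threshold):
    """Entry times whose entry/exit duration exceeds `threshold`."""
    return [start for start, duration in deltas if duration > threshold]

# Hypothetical timestamps for channel entry/exit Opposition Messages.
messages = [
    (0.0, "entry"), (0.5, "exit"),
    (1.0, "entry"), (1.5, "exit"),
    (2.0, "entry"), (5.0, "exit"),
    (6.0, "entry"), (6.5, "exit"),
]
deltas = entry_exit_deltas(messages)
print(slow_periods(deltas, threshold=1.0))
```

The entry times returned by slow_periods mark the trace areas where the messages between “entry” and “exit” deserve a closer look.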
Glued Activity
Adjoint Thread (page 30) invariants like PID can be reused giving rise to
curious CDF (ETW) traces where two separate execution entities are glued
together in one trace. For example, in one trace we see AppA and AppB
sharing the same PID:
Gossip
This pattern has a funny name, Gossip. We originally thought of calling it
Duplicated Message but gave it the new name to allow for the possibility
that the semantics of the same message are distorted in subsequent trace
messages from different Adjoint Threads (page 30). Here is a typical ETW /
CDF trace example (distortion free) of the same message content seen in
different modules (we omitted some columns like Date and Time):
Guest Component
Hidden Error
Hidden Facts
The previous patterns such as Basic Facts (page 42) and Vocabulary Index
(page 208) address the mapping of a problem description to software
execution artifacts such as traces and logs. The Indirect Facts (page 104) analysis
pattern addresses the problem of an incomplete problem description.
However, we need another pattern for completeness that addresses the
mapping from a log to troubleshooting and debugging recommendations.
We call it Hidden Facts: facts that are uncovered by trace and log analysis. Of
course, there can be many such hidden facts, and usually, they are
uncovered after narrowing down analysis to particular Threads of Activity
(page 181), Adjoint Threads (page 30), Message Context (page 125),
Message Set (page 129), or Data Flow (page 58) patterns. The need for
this pattern arose during the pattern-oriented analysis of the trace
case study from Malcolm McCaffery 49 and can be illustrated in the
following diagram:
Identification Messages
For example, in one case there were problems with a custom
status bar. However, the window handle for it or its parent wasn’t
specified in the problem report. In the log file, we had a lot of messages
describing the GUI behavior of many windows. To find the status bar, we
reasoned that it should have a small height but a large width. Indeed, we found
one such child window. In addition, for this window the log file contained
many messages related to frequent window text changes, possibly
reflecting the status bar updates. Having identified the window handle, we
proceeded to the analysis of another log with thousands of window
messages. Because of the known window handle, we were able to select
only the messages pertaining to our problem status bar.
Implementation Discourse
Impossible Trace
void bar();  // forward declaration: foo() calls bar() before its definition

void foo()
{
    TRACE("foo: start");
    bar();
    TRACE("foo: end");
}

void bar()
{
    TRACE("bar: start");
    // some code ...
    TRACE("bar: end");
}
Incomplete History
Indexical Trace
Indirect Facts
Sometimes, in the case of missing Basic Facts (page 42), we can discern
Indirect Facts from the message text and even from other patterns. For
example, in one incident we were interested in all messages from a
certain process, but its PID was missing from the problem
description. Fortunately, we were able to get its PID from one of the
individual messages from a completely different source:
Indirect Message
Inter-Correlation
shown with an incorrect dimension. For this reason, we request the
application trace and, in addition, a WindowHistory 51 trace to see
how the coordinates of all windows change over time. We easily find some
Basic Facts (page 42) in both traces such as window class name or time,
but it looks like the window handle is different. In another set of traces
recorded for comparison, we have the same window handle values; the class name
is absent from the ETW trace, but the process and thread IDs for the same
window handle are different. For this reason, we do not see a correlation
between these traces and suspect that the traces in the two sets
were recorded in different terminal sessions, for example:
ETW trace:
WindowHistory trace:
Interspace
General traces and logs52 may have Message Space (page 131) regions
“surrounded” by the so-called Interspace. Such Interspace regions may
link individual Message Space regions like in this diagram generalizing
WinDbg !process 0 3f command output:
Intra-Correlation
Sometimes we see a functional activity and Basic Facts (page 42) in a trace.
Then we might want to find a correlation between that activity and facts in
another part of the trace. If that intra-correlation fits into our problem
description, we may claim a possible explanation or, if we are lucky, we
have just found an inference to the best explanation, as philosophers of
science like to say. Here is an example, but this time using the
WindowHistory tracing tool53. A third-party application was frequently
losing the focus, and the suspicion was on a terminal services client
process. We found that the following WindowHistory trace fragment
corresponded to that application:
We can see that most of the time when Application A window loses
focus, Application B window gets it.
Last Activity
Layered Periodization
Message layer:
Linked Messages
Macrofunction
Marked Message
Here [+] means the activity is present in the trace and [-] means
the activity is either undetected or definitely not present. Sometimes a
non-present activity can be a marked activity corresponding to all-inclusive
unmarked present activity (see, for example, No Activity pattern, page
140).
Master Trace
When reading and analyzing software traces and logs we always compare
them to Master Trace. Other names for this pattern borrowed from
narrative theory include Metatrace, Canonical Trace or Archetype. When
we look at the software trace from a system we either know the correct
sequence of Activity Regions (page 23), expect certain Background and
Foreground Components (page 39), Event Sequence Order (page 78), or
mentally construct a model based on our experience and Implementation
Discourse (page 100). For the latter example, software engineers internalize
software master narratives when they construct code and write tracing
code for supportability. For the former example, it is important to have a
repository of traces corresponding to Master Traces. Such a repository
helps in finding deviations after Bifurcation Point (page 43). Consider such
comparisons similar to regression testing when we check the computation
output against the expected prerecorded sequence.
Message Change
Message Context
Message Cover
Message Interleave
Message Invariant
Most of the time, software trace messages coming from the same source
code fragment (PLOT 60 ) contain invariant parts such as function and
variable names and descriptions, and mutable parts such as pointer values and
error codes. Message Invariant is a pattern useful for comparative analysis
of several trace files where we are interested in message differences. For
example, in one troubleshooting scenario, certain objects were not created
correctly for one user. We suspected a different object version was linked
to a user profile. We recorded separate application debug traces for each
user, and we could see the version 0x4 for a problem user and 0x5 for all
other normal users:
Message Pattern
Now we come to the trace and log analysis pattern that we call Message
Pattern. It is an ordered set of messages from Thread of Activity (page
181) or Adjoint Thread of Activity (page 30) having Message Invariants
(page 128) that can be used for matching another ordered set of messages
in another (Inter-Correlation, page 110) or the same trace or log (Intra-
Correlation, page 113). A typical Message Pattern from one of our own
trace and log analysis sessions is depicted in the following diagram:
Message Set
Often, especially for large software logs, we need to select messages based
on some criteria, be it a set of Error Messages (page 75), a set of messages
containing Basic Facts (page 42), or some other predicate. Then we can use
selected messages from that message set as Anchor Messages (page 35) or
reverse Pivot Messages (page 151) as an aid in further analysis.
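Selecting a message set by a predicate can be sketched in a few lines of Python. The log lines, the predicates, and the function name message_set are hypothetical; the indices are kept so that the selected messages can later serve as anchors into the full log.

```python
def message_set(messages, predicate):
    """Return (index, message) pairs for messages satisfying `predicate`."""
    return [(i, m) for i, m in enumerate(messages) if predicate(m)]

# Hypothetical log lines.
log = [
    "Service() - entry",
    "OpenProcess error 5",
    "logon for user test01",
    "Service.CleanUp complete",
    "CreateFile error 32",
]
errors = message_set(log, lambda m: " error " in m)
facts = message_set(log, lambda m: "test01" in m)
print(errors)
print(facts)
```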
Message Space
Meta Trace
Milestones
They can also be a part of Significant Events (page 167), serve the
role of Anchor Messages (page 35), and be a part of Basic Facts (page 42)
and Vocabulary Index (page 208).
Missing Component
Missing Data
Some tracing architectures, especially the ones that intercept API calls by
filtering or hooking, may log synchronous requests by remembering to
write down the return result in the same trace message later on when the
response is available after the wait. If such data is still not available in the
log or trace, it may point to some blocked request for which another
software execution artifact analysis (such as memory dump analysis) is
necessary. In some cases, the analysis of the corresponding Fiber Bundle
(page 86) stack trace may point to Blocking Module64 or the involvement
of file system filters (Stack Trace65). This analysis pattern, which we call
Missing Data, is illustrated in the following diagram:
Missing Message
Motif
News Value
// LogA
05/11/10 18:28:15.1562 : Service() - entry
[...]
14/12/10 10:31:58.0381 : Notification: sleep
* Start of new log *
14/12/10 10:34:38.4687 : Service() - entry
[...]
14/12/10 11:53:35.2729 : Service.CleanUp complete
* Start of new log *
14/12/10 11:56:11.7031 : Service() - entry
[...]
14/12/10 15:25:23.3004 : Notification: sleep
// LogB
[ 1] 12/14 10:34:29:890 Entry: ctor
[...]
[ 2] 12/14 11:53:30:866 Exit: COMServer.Server.DeleteObject
// LogC
[ 1] 12/14 11:56:03:359 Entry: ctor
[...]
[20] 12/14 15:30:20:110 Exit: Kernel32.Buffer.Release
analysis of the more recent logs. We also see that portions of LogA overlap
with LogB and LogC and, for this reason, have analysis value for us.
No Activity
0:000> ~*kL
0:000> !cs -l -o -s
-----------------------------------------
DebugInfo = 0x01facdd0
Critical section = 0x01da19c0 (+0x1DA19C0)
LOCKED
LockCount = 0x2
WaiterWoken = No
OwningThread = 0x00001384
RecursionCount = 0x1
LockSemaphore = 0x578
SpinCount = 0x00000000
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled
0:000> ~~[1384]
^ Illegal thread error in '~~[1384]'
No Trace Metafile
In some cases when we don’t have TMF files (Trace Meta Files) it is
possible to detect broad behavioral patterns such as:
Opposition Messages
open / close
create / destroy
allocate / free (deallocate)
call / return
enter / exit (leave)
load / unload
save / load
lock / unlock
map / unmap
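One use of the pairs listed above is to look for opening messages that never receive their opposite. The following sketch assumes messages have already been parsed into (verb, object) pairs; the sample trace and the function name unmatched are hypothetical.

```python
def unmatched(messages, opening, closing):
    """Return the objects for which an `opening` message has no later
    `closing` message. Each message is a (verb, object) pair."""
    pending = []
    for verb, obj in messages:
        if verb == opening:
            pending.append(obj)
        elif verb in closing and obj in pending:
            pending.remove(obj)
    return pending

# Hypothetical parsed messages: (verb, object name).
trace = [
    ("lock", "cs1"),
    ("lock", "cs2"),
    ("unlock", "cs2"),
    ("open", "file.tmp"),
    ("close", "file.tmp"),
]
print(unmatched(trace, "lock", {"unlock"}))
print(unmatched(trace, "enter", {"exit", "leave"}))
```

The opening and closing verbs are passed explicitly because one word may belong to two pairs, as "load" does in load / unload and save / load.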
Original Message
This pattern deals with software trace messages where a certain activity is
repeated several times, but only the first message occurrence or a
specific message vocabulary has significance for the analysis activity. A typical
example from CDF / ETW tracing is module load events:
Palimpsest Messages
Palimpsest Messages are messages where some part or all of their content
was erased or overwritten.
relevant data is erased and by using Intra- (page 113) and Inter-
Correlation (page 110), and via the analysis of Message Invariants (page
128), it is possible to recover the original data. Also, as in the Recovered
Messages (page 157) pattern, it may be possible to use Message Context
(page 125) to infer some partial content.
Periodic Error
Periodic Error is the obvious and to some extent the trivial pattern. It is an
error or status value that is observed periodically many times:
or
Here single trace entries can be isolated from the trace and studied
in detail. We should be aware, though, that some modules might report
Periodic Errors that are false positives in the sense that they are expected as
a part of implementation details, for example, when a function returns an
error to indicate that a bigger buffer is required or to estimate its size for
a subsequent call.
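Detecting such repeated errors and the time deltas between their occurrences can be sketched as follows. The message texts, the timestamps, the substring test for "error", and the function name periodic_errors are assumptions made for illustration; whether a reported error is a false positive still has to be judged from the implementation details.

```python
from collections import defaultdict

def periodic_errors(messages, min_count=3):
    """Group error messages by text and return, for each text seen at
    least `min_count` times, the time deltas between occurrences."""
    times = defaultdict(list)
    for time, text in messages:
        if "error" in text.lower():
            times[text].append(time)
    return {
        text: [b - a for a, b in zip(seen, seen[1:])]
        for text, seen in times.items()
        if len(seen) >= min_count
    }

# Hypothetical (time in seconds, message text) entries.
trace = [
    (0, "QueryInfo error 122"),
    (3, "worker started"),
    (10, "QueryInfo error 122"),
    (20, "QueryInfo error 122"),
    (21, "OpenProcess error 5"),
    (30, "QueryInfo error 122"),
]
print(periodic_errors(trace))
```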
Periodic Message Block
This pattern is similar to Periodic Error (page 147) but not limited to errors
or failure reports. One such example we recently encountered is when
some adjoint activity (such as messages from specific PID, Adjoint Thread,
page 30) stops appearing after the middle of the trace, and after that there
are repeated blocks of similar messages (Message Invariant, page 128)
from different PIDs with their threads checking some condition (for
example, waiting for an event and reporting timeouts):
Piecewise Activity
Activity Regions (page 23) or blocks of messages having the same TID or
PID usually follow each other in a typical complex software trace. This
succession can be completely random and independent, or it may be linear,
based on IPC or some inter-thread communication mechanism. For
example, after filtering out Background Components (page 39) we may
find that an RPC client call setup is followed by messages from an RPC
server:
Using a coordinate approach with message number and PID axes,
we can reformat this minimal trace diagram:
Pivot Message
Punctuated Activity
Quotient Trace
Recovered Messages
Relative Density
Resume Activity
Ruptured Trace
The name of the pattern comes from the notion of repeated DNA
sequences80.
Shared Point
Sometimes we know from Basic Facts (page 42) some data or activity that we
seek to identify in different traces collected together to perform Inter-
Correlational analysis (page 105). It can be a shared file name, a named
synchronization object, a locked file with sharing violations, a common
virtual address in kernel space, or just some activity notification. We name
this pattern by analogy with intersecting curves in some abstract space.
Sheaf of Activities
Significant Event
When we look at software traces, whether searching or just
scrolling, certain messages grab our attention immediately. We call them
Significant Events. It could be a recorded exception (Exception Stack
Trace, page 81) or an error, Basic Fact (page 42), a trace message from
Vocabulary Index (page 208), or just any trace statement that marks the
start of some activity we want to explore in depth, for example, a certain
DLL is attached to the process, a coupled process is started, or a function is
called. The start of a trace and the end of it are trivial Significant Events
and are used in deciding whether the trace is Circular (page 52), in
determining the trace recording interval or its average Statement Current
(page 178).
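The two trivial Significant Events, the first and the last message, are enough to estimate the recording interval and an average Statement Current in messages per second. The sketch below assumes time-of-day stamps within a single day; the `recording_interval_and_current` helper and the sample stamps are hypothetical.

```python
from datetime import datetime


def recording_interval_and_current(stamps, fmt="%H:%M:%S.%f"):
    """Return (interval in seconds, average messages per second).

    stamps: message timestamps as strings, in recording order.
    Assumes the whole trace falls within one day.
    """
    first = datetime.strptime(stamps[0], fmt)
    last = datetime.strptime(stamps[-1], fmt)
    interval = (last - first).total_seconds()
    current = len(stamps) / interval if interval > 0 else float("inf")
    return interval, current


# Hypothetical trace: five messages recorded over two seconds.
stamps = [
    "10:00:00.000",
    "10:00:00.500",
    "10:00:01.000",
    "10:00:01.500",
    "10:00:02.000",
]
interval, current = recording_interval_and_current(stamps)
```

Here the interval is 2 seconds and the average current is 2.5 messages per second.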
Silent Messages
[...]
11 ms: message
12 ms: ----
13 ms: message
14 ms: ----
15 ms: message
16 ms: message
17 ms: ----
18 ms: ----
19 ms: message
[...]
Singleton Event
Small DA+TA
EnableFunctionality = 0
Sparse Trace
Sometimes we do not see anything in the trace or see very little because
trace statements did not cover a particular source code fragment (see also
PLOTs84):
Split Trace
Some tracing tools such as CDFControl85 have the option to split software
traces and logs into several files during a long recording. Although this
should be done judiciously, it is sometimes necessary. What should we do if we get
several trace files and want to use some other analysis tool such as
CDFAnalyzer86? If we know that the problem happened just before the
tracing was stopped, we can look at the last few files from the file
sequence (although we recommend Circular Trace here, page 52).
Otherwise, we can convert them into CSV files and import them into Excel, which
also supports adjoint threading87.
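Once the split files are converted to CSV, they can also be concatenated into a single file before import. The sketch below assumes every part starts with the same header row and that file names sort in recording order; the `merge_split_traces` helper, the file names, and the column layout are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def merge_split_traces(parts, merged_path):
    """Concatenate split CSV trace files, writing the header only once.

    parts: paths of the split files, already in recording order.
    Returns the number of message rows written.
    """
    rows_written = 0
    header_written = False
    with open(merged_path, "w", newline="") as out:
        writer = csv.writer(out)
        for part in parts:
            with open(part, newline="") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:
                    continue  # skip an empty part
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Demonstration with two tiny hypothetical parts in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "trace_001.csv").write_text("N,PID,TID,Message\n1,100,200,start\n")
    (tmp / "trace_002.csv").write_text("N,PID,TID,Message\n2,100,200,stop\n")
    merged = tmp / "merged.csv"
    count = merge_split_traces(sorted(tmp.glob("trace_*.csv")), merged)
    merged_text = merged.read_text()
```

Sorting by file name works only if the tool numbers the parts with fixed-width suffixes; otherwise the parts should be ordered by their first timestamp.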
State and Event
State Dump
N PID TID
21 5928 8092 LookupAccountSid failed. Result = -2146238462
[...]
1013 5928 1340 SQL execution needs a retry. Result = 0
number of messages. For our first trace, we see that messages start from
the very beginning, and in our second trace they also almost start from the
beginning. So such adjustment should not give much better results here.
Also, these statements continue to be recorded till the very end of these
traces.
The possibility that much more was traced that resulted in lower
density for the second trace should be discarded because we have much
lower current. Perhaps the environment was not quite the same for the
second tracing. However, the same relative density for two different errors
suggests that they are correlated, and the higher density of the first error
suggests that we should start our investigation from it.
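The comparison above can be sketched numerically. The code assumes that Statement Density is the number of occurrences of a message divided by the total number of messages in the trace, and that relative density is the ratio of two such densities within one trace, so the total cancels. All counts below are hypothetical and do not come from the traces discussed here.

```python
def statement_density(occurrences, total_messages):
    """Fraction of the trace taken up by one specific message."""
    return occurrences / total_messages


def relative_density(occurrences_a, occurrences_b):
    """Ratio of two densities in the same trace; the total cancels out."""
    return occurrences_a / occurrences_b


# Hypothetical counts for two errors, A and B, in two traces.
first_trace = {"total": 100_000, "A": 200, "B": 50}
second_trace = {"total": 80_000, "A": 40, "B": 10}

density_a_first = statement_density(first_trace["A"], first_trace["total"])
density_a_second = statement_density(second_trace["A"], second_trace["total"])

relative_first = relative_density(first_trace["A"], first_trace["B"])
relative_second = relative_density(second_trace["A"], second_trace["B"])
```

In this made-up case the density of error A differs between the traces, while the relative density of A to B is 4.0 in both, which is the kind of agreement that hints at a correlation between the two errors.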
Surveyor
Thread of Activity
When we have software traces that record process identifiers (PID) and
thread identifiers (TID) it is important to differentiate between trace
statements sorted by time and Thread of Activity. The latter is simply the
flow of trace messages sorted by TID, and it is very helpful in cases with
dense traces coming from hundreds of processes and components. Here is
an example from MessageHistory89 bulk trace fragment showing different
Threads of Activity in different font styles:
[...]
21:5:41:990 S PID: a7c TID: 554 HWND: 0x0000000000010E62 Class:
“ToolbarWindow32” Title: “” WM_USER+4b (0x44b) wParam: 0x14 lParam: 0x749e300
21:5:41:990 S PID: a7c TID: 554 HWND: 0x00010E4A Class: “CtrlNotifySink”
Title: “” WM_NOTIFY (0x4e) wParam: 0x0 lParam: 0x749efa8
21:5:41:990 S PID: a7c TID: 554 HWND: 0x00010E62 Class: “ToolbarWindow32”
Title: “” WM_USER+3f (0x43f) wParam: 0x14 lParam: 0x749e1e0
21:5:41:990 S PID: a7c TID: 554 HWND: 0x00010E62 Class: “ToolbarWindow32”
Title: “” WM_USER+4b (0x44b) wParam: 0x14 lParam: 0x749e300
21:5:41:990 S PID: a7c TID: 554 HWND: 0x00010E62 Class: “ToolbarWindow32”
Title: “” WM_USER+19 (0x419) wParam: 0x14 lParam: 0x0
21:5:41:990 S PID: a7c TID: 554 HWND: 0x00010E62 Class: “ToolbarWindow32”
Title: “” WM_USER+61 (0x461) wParam: 0x6 lParam: 0x0
21:5:41:990 S PID: a7c TID: 554 HWND: 0x00010E62 Class: “ToolbarWindow32”
Title: “” WM_USER+56 (0x456) wParam: 0x0 lParam: 0x0
21:5:41:990 S PID: a7c TID: 554 HWND: 0x00010E4A Class: “CtrlNotifySink”
Title: “” WM_NOTIFY (0x4e) wParam: 0x0 lParam: 0x749f290
21:5:41:990 S PID: a7c TID: 554 HWND: 0x000E04A8 Class: “CtrlNotifySink”
Title: “” WM_NCPAINT (0x85) wParam: 0xffffffffcc043bdb lParam: 0x0
21:5:41:990 P PID: a7c TID: 554 HWND: 0x000E04A8 Class: “CtrlNotifySink”
Title: “” WM_PAINT (0xf) wParam: 0x0 lParam: 0x0
21:5:42:007 S PID: 1a8 TID: 660 HWND: 0x0001003C Class: “CiceroUIWndFrame”
Title: “TF_FloatingLangBar_WndTitle” WM_WINDOWPOSCHANGING (0x46) wParam: 0x0
lParam: 0x29af030
21:5:42:007 P PID: a7c TID: 9b4 HWND: 0x00010084 Class: “CiceroUIWndFrame”
Title: “TF_FloatingLangBar_WndTitle” WM_TIMER (0x113) wParam: 0x6 lParam: 0x0
21:5:42:007 P PID: 1a8 TID: 660 HWND: 0x0001003C Class: “CiceroUIWndFrame”
Title: “TF_FloatingLangBar_WndTitle” WM_TIMER (0x113) wParam: 0x8 lParam: 0x0
21:5:42:007 P PID: a7c TID: 9b4 HWND: 0x00010084 Class: “CiceroUIWndFrame”
Title: “TF_FloatingLangBar_WndTitle” WM_TIMER (0x113) wParam: 0x9 lParam: 0x0
21:5:42:022 P PID: a7c TID: a28 HWND: 0x0001061A Class: “WPDShServiceObject”
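Separating a time-ordered trace of this kind into Threads of Activity can be sketched in a few lines of Python. The regular expression assumes one message per line in the "time kind PID: x TID: y rest" layout seen above; the sample lines are abbreviated for illustration, and the `threads_of_activity` helper is hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout: "<time> <S|P> PID: <hex> TID: <hex> <rest of message>"
LINE = re.compile(
    r"^(?P<time>\S+) (?P<kind>[SP]) "
    r"PID: (?P<pid>[0-9a-f]+) TID: (?P<tid>[0-9a-f]+) (?P<rest>.*)$"
)


def threads_of_activity(lines):
    """Group trace lines by (PID, TID), keeping the time order in each group."""
    threads = defaultdict(list)
    for line in lines:
        match = LINE.match(line)
        if match:
            threads[(match["pid"], match["tid"])].append(line)
    return dict(threads)


# Abbreviated, illustrative lines in the same layout as the fragment above.
lines = [
    "21:5:41:990 S PID: a7c TID: 554 WM_NOTIFY (0x4e)",
    "21:5:42:007 S PID: 1a8 TID: 660 WM_WINDOWPOSCHANGING (0x46)",
    "21:5:42:007 P PID: a7c TID: 9b4 WM_TIMER (0x113)",
    "21:5:42:007 P PID: 1a8 TID: 660 WM_TIMER (0x113)",
]
threads = threads_of_activity(lines)
```

Keying on the (PID, TID) pair rather than on TID alone is a design choice: it keeps threads from different processes apart even if their identifiers happen to coincide.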
We sort by TID 7948 to see what happened before the error and get
additional information like the server name:
Time Delta
Time Delta is a time interval between Significant Events (page 167) or any
messages of interest, in general. For example:
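As a minimal sketch with hypothetical timestamps (not taken from a real trace), a Time Delta between two messages of interest can be computed as follows. The `time_delta` helper assumes both stamps belong to the same day.

```python
from datetime import datetime


def time_delta(start_stamp, end_stamp, fmt="%H:%M:%S.%f"):
    """Return the interval in seconds between two same-day timestamps."""
    start = datetime.strptime(start_stamp, fmt)
    end = datetime.strptime(end_stamp, fmt)
    return (end - start).total_seconds()


# Hypothetical Significant Events: a request and its completion.
delta = time_delta("10:44:15.000", "10:46:55.000")
```

For these two stamps the delta is 160 seconds.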
Timeout
We filtered the trace for the error message TID and found three
Timeouts of 30 minutes each:
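A search for such gaps in one filtered thread can be sketched as below. The timestamps are hypothetical seconds from the start of a trace, and the `find_timeouts` helper with its tolerance parameter is an illustrative assumption.

```python
def find_timeouts(timestamps, expected, tolerance):
    """Return (before, after) pairs whose gap is close to the expected timeout.

    timestamps: message times in seconds for one thread, in trace order.
    """
    return [
        (before, after)
        for before, after in zip(timestamps, timestamps[1:])
        if abs((after - before) - expected) <= tolerance
    ]


# Hypothetical thread: short bursts separated by 30-minute (1800 s) waits.
thread_times = [0, 2, 1802, 1805, 3605, 3606, 5406]
timeouts = find_timeouts(thread_times, expected=1800, tolerance=5)
```

With these values three gaps of exactly 1800 seconds are reported.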
Trace Acceleration
Trace Dimension
Trace Extension
Trace Frames
Trace Mask
Trace Partition
Trace Viewpoints
Error viewpoints (see also False Positive Error, page 85, Periodic
Error, page 145, and Error Distribution, page 74)
Use case (functional) viewpoints (see also Use Case Trail, page
205)
Architectural (design) viewpoints (see also Milestones, page 133)
Implementation viewpoints (see also Implementation Discourse,
page 98, Macrofunctions, page 121, and Focus of Tracing, page
90)
Non-functional viewpoints (see also Counter Value, page 54, and
Diegetic Messages, page 70)
Signal / noise viewpoints (see also Background and Foreground
Components, page 39)
Traces of Individuality
Translated Message
Sometimes we have messages that report an error but do not give
exact details. For example, “Communication error. The problem on the
server side” or “Access denied error”. This may be the case of Translated
Messages. Such messages are plain language descriptions or
reinterpretations of flags, error and status codes contained in another log
message. These descriptions may come from a system API, for example,
FormatMessage from Windows API, or from custom
formatting code. Since the code translating the message is in close
proximity to the original message, both messages usually follow each other
with zero or very small Time Delta (page 184), come from the same
component, file, function, and belong to the same Thread of Activity (page
181):
This pattern is different from Gossip (page 94) because the latter
messages come from different modules, and, although they reflect some
underlying event, they are independent of each other.
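The proximity criteria for Translated Messages (same thread, same component, near-zero Time Delta) suggest a simple filter for candidate pairs. The sketch below uses a hypothetical record layout, threshold, and sample messages; it only finds candidates and does not prove that one message translates the other.

```python
from typing import NamedTuple


class Msg(NamedTuple):
    time: float   # seconds from the start of the trace
    tid: int      # thread identifier
    module: str   # emitting component
    text: str


def translated_pairs(trace, max_delta=0.001):
    """Return adjacent same-thread, same-module pairs with a tiny Time Delta."""
    pairs = []
    last_by_tid = {}
    for msg in trace:
        prev = last_by_tid.get(msg.tid)
        if (
            prev is not None
            and prev.module == msg.module
            and msg.time - prev.time <= max_delta
        ):
            pairs.append((prev, msg))
        last_by_tid[msg.tid] = msg
    return pairs


# Hypothetical fragment: a status code followed by its plain language form.
trace = [
    Msg(1.000, 10, "ModuleA", "Connect failed. Status = 0x80070005"),
    Msg(1.000, 10, "ModuleA", "Access denied error"),
    Msg(1.000, 11, "ModuleB", "Worker started"),
    Msg(2.500, 10, "ModuleA", "Retrying"),
]
pairs = translated_pairs(trace)
```

Messages from other modules at the same time, which would be closer to Gossip, are not paired because the filter requires the same thread and module.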
Truncated Trace
UI Message
By filtering the emitting module we can create Adjoint Thread (page 30):
Master Traces (page 123) may also correspond to use cases, but
they should ideally correspond to only one use case instance.
Visibility Limit
Often it is not possible to trace from the very beginning of the software
execution. Obviously, internal application tracing cannot trace anything
before the application starts and completes its early initialization. The same holds for
system-wide tracing, which cannot trace before the logging subsystem or
service starts. For this reason, each log has its visibility limit in addition to
possible Truncation (page 201) or Missing Components (page 134):
Visitor Trace
Some traces and logs may have Periodic Message Blocks (page 148) with
very similar message structure and content (mostly Message Invariants,
page 128). The only significant difference between them is some unique
data. We call this pattern Visitor Trace by analogy with the Visitor design
pattern95 where tracing code “visits” each object data or data part to log its
content or status.
Vocabulary Index
What will you do when confronted with one million trace messages
recorded between 10:44:15 and 10:46:55 with an average trace Statement
Current (page 178) of 7,000 msg/s from dozens of modules and only a
one-sentence problem description? One solution is to search for
specific vocabulary relevant to the problem description. For example, if the
problem is intermittent re-authentication, then we might search for the
word “password” or a similar one drawn from a troubleshooting domain
vocabulary. So it is useful to have a Vocabulary Index to search for. In our
trace example, the search for “password” jumps straight to a small Activity
Region (page 23) of authorization modules starting from message
#180,010, and the last “password” occurrence is in message
#180,490, which narrows the initial analysis region to about 500 messages. Note
the similarity here between a book and its index and a trace as a software
narrative and its vocabulary index.
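The narrowing step described above can be sketched as a search that returns the first and last message numbers containing any word from the vocabulary. The `vocabulary_region` helper and the message texts are hypothetical; only the two message numbers echo the example in the text.

```python
def vocabulary_region(messages, vocabulary):
    """Return (first, last) message numbers that mention a vocabulary word.

    messages: iterable of (message_number, text) pairs in trace order.
    Returns None when no message matches.
    """
    words = [word.lower() for word in vocabulary]
    hits = [
        number
        for number, text in messages
        if any(word in text.lower() for word in words)
    ]
    return (hits[0], hits[-1]) if hits else None


# Hypothetical messages around the region discussed in the text.
messages = [
    (180009, "Session created"),
    (180010, "Validating password for user"),
    (180200, "Password check completed"),
    (180490, "Cached password cleared"),
    (180491, "Session idle"),
]
region = vocabulary_region(messages, ["password"])
```

The returned pair bounds the Activity Region to inspect first; a larger vocabulary can be passed in the same call to widen the net.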
Watch Thread
This analysis pattern is different from State Dump (page 177) which
is about intrinsic tracing where the developer of logging statements
already incorporated variable watch in the source code. Watch Threads
are completely independent of original tracing and may be added
independently. Counter Value (page 54) is the simplest example of Watch
Thread if done externally because the former usually doesn’t require
source code and often means some OS or Module Variable96 independent
of product internals. Watch Thread is also similar to Data Flow (page 58)
pattern where specific data we are interested in is a part of every trace
message.
Index of Patterns
A
Adjoint Thread of Activity, 3, 18, 19, 21, 27, 33, 38, 50, 51, 61, 62, 64, 69, 84, 85, 95, 100, 104, 105, 106, 110, 136, 137, 142, 146, 168, 171, 174, 175, 189, 216, 220, 228, 233
B
CDF, 25, 33, 48, 57, 79, 84, 88, 105, 106, 150, 164, 208
Characteristic Message Block, 53, 56, 61, 71, 135
Constant Value, 61
Corrupt Message, 59
Counter Value, 4, 60, 203, 227, 240
Coupled Activities, 4, 62
Coupled Processes, 62
D
Data Flow, 4, 41, 59, 64, 99, 103, 110, 228, 240
Data Interval, 4, 65
Data Reversal, 4, 66, 68
Data Selector, 4, 69, 70
E
Empty Trace, 82
Error Distribution, 5, 83, 227
Error Message, 5, 18, 41, 62, 83, 84, 85, 86, 108, 111, 142, 148, 178, 205, 211
Error Thread, 41, 86
ETW, 25, 33, 38, 42, 60, 88, 105, 106, 124, 125, 136, 150, 158, 164, 178
Event Sequence Order, 5, 68, 88, 89, 106, 139
Focus of Tracing, 6, 61, 85, 101, 195, 227
Fourier Activity, 6, 99, 102, 104
G
Global Monotonicity, 61
Glued Activity, 105, 137
Indirect Message, 6, 111, 118, 119
Inter-Correlation, 7, 27, 52, 58, 63, 69, 74, 116, 124, 131, 146, 159, 166, 187, 188, 222, 236
Interspace, 7, 126, 150
Intra-Correlation, 7, 27, 58, 62, 74, 124, 127, 142, 146, 194
Missing Component, 82, 107, 153, 156, 160, 198, 232, 235
Periodic Error, 8, 18, 19, 79, 95, 167, 168, 204, 227
S
Sequence Repeat Anomaly, 9, 186
Shared Point, 187
Sheaf of Activities, 9, 63, 99, 142, 188
Significant Event, 17, 22, 61, 89, 131, 152, 178, 191, 210, 233
Sparse Trace, 9, 70, 118, 131, 163, 193, 196, 198
Spiking Thread, 185
Split Trace, 49, 116, 200
Stack Trace, 86, 154
State and Event, 201
State Dump, 10, 203, 240
Statement Density and Current, 25, 47, 74, 104, 162, 180, 191, 192, 204, 215, 222, 237
Step Dumps, 30
T
Thread of Activity, 10, 21, 33, 34, 50, 58, 69, 79, 85, 86, 99, 102, 104, 110, 142, 146, 162, 175, 207, 211, 216, 228, 230
Time Delta, 89, 99, 102, 162, 210, 211
Timeout, 102, 175, 211, 212
Trace Extension, 10, 206, 219
Trace Frames, 220
Trace Mask, 10, 66, 222, 223
Trace Partition, 220, 224
Trace Viewpoint, 10, 85, 227
Traces of Individuality, 229
Translated Message, 10, 230
Truncated Trace, 52, 82, 100, 220, 232, 235
Bibliography
Accelerated Windows Software Trace Analysis: Training Course Transcript (ISBN: 978-1-908043-42-9)
Software Narratology: An Introduction to the Applied Science of Software Stories (ISBN: 978-1-908043-07-8)
Software Trace and Memory Dump Analysis: Patterns, Tools, Processes and Best Practices (ISBN: 978-1-908043-23-8)
Notes
1 http://www.patterndiagnostics.com/accelerated-windows-software-trace-analysis-book
2 http://www.DumpAnalysis.org
3 http://www.DumpAnalysis.org/blog
4 http://www.patterndiagnostics.com/malware-narratives-materials
5 http://en.wikipedia.org/wiki/Divergence
6 Event Tracing for Windows http://msdn.microsoft.com/en-us/library/windows/desktop/aa363668(v=vs.85).aspx
7 Citrix Diagnostic Facility
8 Memory Dump Analysis Anthology, Volume 9a, 149
9 Ibid., Volume 7, page 173
10 Ibid., Volume 7, page 225
11 http://www.debuggingexperts.com/adjoint-thread
12 Memory Dump Analysis Anthology, Volume 1, page 503
13 Looks like biology keeps giving insights into software, there is even a software phenotype metaphor (http://turingmachine.org/~dmg/papers/dmg2009_iwsc_siblings.pdf) although a bit restricted to code, and we also need an Extended Software Phenotype. http://en.wikipedia.org/wiki/The_Extended_Phenotype
14 Event Tracing for Windows http://msdn.microsoft.com/en-us/library/windows/desktop/aa363668(v=vs.85).aspx
15 Citrix Diagnostic Facility
16 http://support.citrix.com/article/ctx122741
17 http://en.wikipedia.org/wiki/Adjoint
18 Memory Dump Analysis Anthology, Volume 5, page 279
19 Ibid., Volume 4, page 241
20 Ibid., Volume 3, page 342
21 http://en.wikipedia.org/wiki/Catastrophe_theory
22 Memory Dump Analysis Anthology, Volume 4, page 329
23 Ibid., Volume 7, page 98
24 Ibid., Volume 1, page 419
25 http://en.wikipedia.org/wiki/Partial_derivative
26 http://en.wikipedia.org/wiki/Interval_(mathematics)
27 http://www.plpfilmmakers.com/motionless-journeys
28 Memory Dump Analysis Anthology, Volume 3, page 342
29 http://en.wikipedia.org/wiki/Diegesis
30 Memory Dump Analysis Anthology, Volume 2, page 387
31 http://www.debuggingexperts.com/memory-dump-trace-analysis-unified-pattern-approach
32 Memory Dump Analysis Anthology, Volume 2, page 387
33 http://www.patterndiagnostics.com/accelerated-software-trace-analysis
34 http://support.citrix.com/article/CTX122741
35 http://en.wikipedia.org/wiki/Power_set
36 Memory Dump Analysis Anthology, Volume 2, page 239
37 Ibid., Volume 1, page 395
38 http://en.wikipedia.org/wiki/Phase_(waves)
39 http://en.wikipedia.org/wiki/Quotient_group
40 http://en.wikipedia.org/wiki/Group_(mathematics)
41 http://support.citrix.com/article/CTX109235
42 http://en.wikipedia.org/wiki/Fiber_bundle
43 https://en.wikipedia.org/wiki/Fiber_(computer_science)
44 Memory Dump Analysis Anthology, Volume 7, page 437
45 https://en.wikipedia.org/wiki/Fourier_series
46 http://en.wikipedia.org/wiki/Manifold
47 Memory Dump Analysis Anthology, Volume 1, page 271
48 Ibid., Volume 7, page 162
49 http://chentiangemalc.wordpress.com/2014/06/24/case-of-the-outlook-cannot-display-this-view/
50 Memory Dump Analysis Anthology, Volume 5, page 272
51 http://support.citrix.com/article/CTX106985
52 Special and General Trace and Log Analysis, Memory Dump Analysis Anthology, Volume 8b, page 119
53 http://www.dumpanalysis.org/blog/index.php/2007/02/15/windowhistory-40/
54 http://en.wikipedia.org/wiki/Periodization
55 http://en.wikipedia.org/wiki/Roman_Jakobson
56 http://en.wikipedia.org/wiki/Distinctive_features
57 http://en.wikipedia.org/wiki/Phonology
58 Memory Dump Analysis Anthology, Volume 5, page 279
59 http://en.wikipedia.org/wiki/Cover_(topology)
60 Memory Dump Analysis Anthology, Volume 5, page 272
61 Special and General Trace and Log Analysis, Memory Dump Analysis Anthology, Volume 8b, page 119
62 http://en.wikipedia.org/wiki/Metanarrative
63 http://en.wikipedia.org/wiki/Milestone_(project_management)
64 Memory Dump Analysis Anthology, Volume 6, page 54
65 Ibid., Volume 8a, page 48
66 https://en.wikipedia.org/wiki/Motive_(algebraic_geometry)
67 Memory Dump Analysis Anthology, Volume 7, page 386
68 Ibid., Volume 1, page 298
69 http://en.wikipedia.org/wiki/Binary_opposition
70 http://en.wikipedia.org/wiki/Ferdinand_de_Saussure
71 http://en.wikipedia.org/wiki/Palimpsest
72 Memory Dump Analysis Anthology, Volume 8a, page 121
73 http://en.wikipedia.org/wiki/Piecewise_linear_function
74 http://www.dumpanalysis.org/blog/index.php/2009/02/17/wait-chain-patterns/
75 https://en.wikipedia.org/wiki/Quotient_space_(topology)
76 http://en.wikipedia.org/wiki/Relative_density
77 Memory Dump Analysis Anthology, Volume 6, page 62
78 Ibid., Volume 1, page 305
79 Ibid., Volume 4, page 279
80 http://en.wikipedia.org/wiki/Repeated_sequence_(DNA)
81 http://en.wikipedia.org/wiki/Sheaf_(mathematics)
82 http://en.wikipedia.org/wiki/Singleton_pattern
83 Special and General Trace and Log Analysis, Memory Dump Analysis Anthology, Volume 8b, page 119
84 Memory Dump Analysis Anthology, Volume 5, page 272
85 http://support.citrix.com/article/CTX111961
86 http://support.citrix.com/article/CTX122741
87 http://www.debugging.tv/Frames/0x14/DebuggingTV_Frame_0x14.pdf
88 http://www.debugging.tv/
89 http://www.dumpanalysis.org/blog/index.php/2007/01/17/messagehistory-20/
90 http://www.debugging.tv/
91 Memory Dump Analysis Anthology, Volume 5, page 276
92 http://en.wikipedia.org/wiki/Boris_Uspensky
93 Memory Dump Analysis Anthology, Volume 2, page 387
94 http://en.wikipedia.org/wiki/Use_case
95 http://en.wikipedia.org/wiki/Visitor_pattern
96 Memory Dump Analysis Anthology, Volume 7, page 98