Software Trace and Log Analysis

Pattern Reference
Second Edition
Dmitry Vostokov
Software Diagnostics Institute

OpenTask

Published by OpenTask, Republic of Ireland

Copyright © 2016 by Dmitry Vostokov

Copyright © 2016 by Software Diagnostics Institute

All rights reserved. No part of this book may be reproduced, stored in a
retrieval system, or transmitted, in any form or by any means, without the
prior written permission of the publisher.

You must not circulate this book in any other binding or cover, and you
must impose the same condition on any acquirer.

OpenTask books are available through booksellers and distributors
worldwide. For further information or comments send requests to
[email protected].

Product and company names mentioned in this book may be trademarks
of their owners.

A CIP catalog record for this book is available from the British Library.

ISBN-13: 978-1-908043-82-5 (Paperback)

First printing, 2016



Table of Contents

Preface to the Second Edition 13

Preface 14

About the Author 15

A 17

Abnormal Value 17

Activity Disruption 18

Activity Divergence 20

Activity Overlap 21

Activity Region 23

Activity Theatre 24

Adjoint Message 25

Adjoint Space 27

Adjoint Thread of Activity 30

Anchor Messages 35

B 38

Back Trace 38

Background and Foreground Components 39



Basic Facts 42

Bifurcation Point 43

Blackout 45

Break-in Activity 47

C 48

Calibrating Trace 48

Characteristic Message Block 49

Circular Trace 52

Correlated Discontinuity 53

Corrupt Message 54

Counter Value 55

Coupled Activities 56

D 57

Data Association 57

Data Flow 58

Data Interval 59

Data Reversal 60

Data Selector 62

Declarative Trace 63

Defamiliarizing Effect 64

Density Distribution 66

Dialogue 67

Diegetic Messages 70

Discontinuity 71

Dominant Event Sequence 72

E 73

Empty Trace 73

Error Distribution 74

Error Message 75

Error Powerset 76

Error Thread 77

Event Sequence Order 78

Event Sequence Phase 79

Exception Stack Trace 81

F 83

Factor Group 83

False Positive Error 85

Fiber Bundle 86

Fiber of Activity 88

File Size 89

Focus of Tracing 90

Fourier Activity 91

G 93

Glued Activity 93

Gossip 94

Guest Component 95

H 96

Hidden Error 96

Hidden Facts 97

I 98

Identification Messages 98

Implementation Discourse 100

Impossible Trace 101

Incomplete History 102

Indexical Trace 103

Indirect Facts 104

Indirect Message 105



Inter-Correlation 110

Interspace 112

Intra-Correlation 113

L 116

Last Activity 116

Layered Periodization 117

Linked Messages 120

M 121

Macrofunction 121

Marked Message 122

Master Trace 123

Message Change 124

Message Context 125

Message Cover 126

Message Interleave 127

Message Invariant 128

Message Pattern 129

Message Set 130

Message Space 131



Meta Trace 132

Milestones 133

Missing Component 134

Missing Data 135

Missing Message 136

Motif 137

N 138

News Value 138

No Activity 140

No Trace Metafile 142

O 143

Opposition Messages 143

Original Message 144

P 145

Palimpsest Messages 145

Periodic Error 147

Periodic Message Block 148

Piecewise Activity 149

Pivot Message 151



Punctuated Activity 155

Q 156

Quotient Trace 156

R 157

Recovered Messages 157

Relative Density 158

Resume Activity 159

Ruptured Trace 161

S 163

Sequence Repeat Anomaly 163

Shared Point 164

Sheaf of Activities 165

Significant Event 167

Silent Messages 168

Singleton Event 170

Small DA+TA 171

Sparse Trace 173

Split Trace 174

State and Event 175



State Dump 177

Statement Density and Current 178

Surveyor 180

T 181

Thread of Activity 181

Time Delta 184

Timeout 185

Trace Acceleration 188

Trace Dimension 189

Trace Extension 191

Trace Frames 192

Trace Mask 194

Trace Partition 196

Trace Viewpoints 198

Traces of Individuality 200

Translated Message 201

Truncated Trace 203

U 204

UI Message 204

Use Case Trail 205

V 206

Visibility Limit 206

Visitor Trace 207

Vocabulary Index 208

W 209

Watch Thread 209

Index of Patterns 211

Bibliography 217

Notes 219

Preface to the Second Edition

33 new trace and log analysis patterns have been discovered since the
publication of the first edition almost two years ago. Memory Dump
Analysis Anthology has also grown to 3,800 pages with the publication of
volumes 8b, 9a, and 9b. Significant advances were made in software
diagnostics theory, and they are reflected in the added analysis patterns.
This edition also features a better index, minor corrections to the patterns
from the first edition, and one pattern from the forthcoming volume 10a.

In addition to the previous contact details, please also refer to the Facebook
trace analysis page, the DA+TA group, and The Software Diagnostics Group on
LinkedIn:

http://www.facebook.com/TraceAnalysis
http://www.facebook.com/groups/dumpanalysis
http://www.linkedin.com/groups/8473045

Preface

The need for this reference book arose when we started working on the
next version of the “Accelerated Windows Software Trace Analysis” training[1].
The previous version was two years old, and Software Diagnostics
Institute[2] had already added 40 more trace and log analysis patterns to
its catalog. All of them (almost 100 patterns in total) were scattered
among 3,300 pages of various Memory Dump Analysis Anthology volumes
(3 – 7, 8a), and a few were found only in Software Diagnostics Library[3]. So
we decided to reprint all these patterns and their illustrations in one small
book, in full color, for easy reference. During editing, we also corrected
various mistakes.

If you encounter any error, please contact me using this form:

http://www.dumpanalysis.org/contact

Alternatively, send me a personal message using this contact e-mail:

[email protected]

Alternatively, contact me via Twitter: @DumpAnalysis



About the Author

Dmitry Vostokov is an internationally recognized expert, speaker, educator,
scientist and author. He is the founder of the pattern-oriented software
diagnostics, forensics and prognostics discipline and of Software Diagnostics
Institute (DA+TA: DumpAnalysis.org + TraceAnalysis.org). Vostokov has also
authored more than 30 books on software diagnostics, forensics and
problem-solving, memory dump analysis, debugging, software trace and log
analysis, reverse engineering, and malware analysis. He has more than 20
years of experience in software architecture, design, development and
maintenance in a variety of industries, including leadership, technical and
people management roles. Dmitry also founded DiaThings, Logtellect,
OpenTask Iterative and Incremental Publishing (OpenTask.com), Software
Diagnostics Services (formerly Memory Dump Analysis Services),
PatternDiagnostics.com, and Software Prognostics. In his spare time, he
presents various topics on Debugging.TV and explores Software Narratology,
an applied science of software stories that he pioneered, and its further
development as Narratology of Things and Diagnostics of Things (DoT). His
current area of interest is theoretical software diagnostics.

Abnormal Value

While preparing a presentation on malware narratives[4], we found that one
essential pattern is missing from the current log analysis pattern
catalog. Most of the time, we see some abnormal or unexpected value in a
software trace or log, such as a network address outside the expected
range, and this triggers further investigation. The message structure may
have the same Message Invariant (page 128), but the variable part may
contain such values as depicted graphically:

Please note that we also have the Significant Event (page 167) pattern,
which is more general and also covers messages without a variable part or
just suspicious log entries.
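
The check for a network address outside the expected range can be sketched in code. This is only an illustration: the message format `Connecting to <address>` and the expected range 10.0.0.0/8 are assumptions made for the example and do not come from any real log.

```python
import ipaddress
import re

# Hypothetical Message Invariant: the fixed text "Connecting to "
# followed by a variable IPv4 address.
MESSAGE_INVARIANT = re.compile(r"Connecting to (\d+\.\d+\.\d+\.\d+)")

# Assumed expected range; in practice it comes from knowledge of the environment.
EXPECTED_NETWORK = ipaddress.ip_network("10.0.0.0/8")


def abnormal_values(messages):
    """Return (message number, address) pairs with an address outside the expected range."""
    found = []
    for number, text in enumerate(messages, start=1):
        match = MESSAGE_INVARIANT.search(text)
        if match and ipaddress.ip_address(match.group(1)) not in EXPECTED_NETWORK:
            found.append((number, match.group(1)))
    return found


sample = [
    "Connecting to 10.1.2.3",
    "Connecting to 10.1.2.4",
    "Connecting to 203.0.113.7",
]
print(abnormal_values(sample))
```

Only the third sample message is reported, because its variable part falls outside the assumed range while its invariant part is the same as in the other two.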

Activity Disruption

Sometimes a few Error Messages (page 75) or Periodic Errors (page 147)
with low Statement Density (page 178) for specific Activity Regions (page
23) or Adjoint Threads of Activity (for a specific component, file or
function, page 30) may constitute Activity Disruption. If the particular
functionality was no longer available at the logging time, then its
unavailability may not be explained by such disruptions, and such
messages may be considered False Positive Errors (page 85) in relation to
the reported problem:

But if we have Periodic Message Blocks (page 148) containing only
Periodic Errors (page 147), Activity Region (page 23) or Adjoint Thread
(page 30) Discontinuity (page 71), or simply No Activity (page 140), then
we may have a complete cessation of activity that may correlate with
the unavailable functionality:

Activity Divergence

Sometimes we have several Threads of Activity (page 181, for example,
from the same process) visible for a certain period, and then suddenly we
see only one such thread till the end of a trace (or even none). It may be an
indication of an application hang or some other abnormal behavior if it is
normal to have several active threads doing logging. If we consider such
activities (including Adjoint Threads, page 30) as vectors running through
some temporal “surface”, we can use an analogy of a divergence[5]:

Activity Overlap

Sometimes specific parts of simultaneous Use Case Trails (page 205),
blocks of Significant Events (page 167) or Message Sets (page 130) in
general may overlap. It may point to possible synchronization problems
such as race conditions (prognostics) or be visible root causes of them if
such problems are reported (diagnostics). We call this pattern Activity
Overlap:

For example, a first request may start a new session, and we expect
the second request to be processed by the same, already established session:

However, users report that a second session started upon the second
request. If we filter the execution log by session id, we find out that the
session initialization prologs (page 196) overlap. The new session started
because the first session initialization was not completed:
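
The session-id filtering and the overlap check can also be sketched in code. The record layout, the session ids, and the "Session init" message text below are hypothetical and serve only to illustrate the idea.

```python
# Each record: (message number, session id, message text); all values are illustrative.
log = [
    (1, "S1", "Session init start"),
    (2, "S2", "Session init start"),
    (3, "S1", "Session init end"),
    (4, "S2", "Session init end"),
]


def init_intervals(records):
    """Map each session id to the (first, last) message numbers of its initialization prolog."""
    intervals = {}
    for number, session, text in records:
        if text.startswith("Session init"):
            first, _ = intervals.get(session, (number, number))
            intervals[session] = (first, number)
    return intervals


def overlapping(intervals):
    """Return pairs of session ids whose prolog intervals overlap."""
    items = sorted(intervals.items(), key=lambda item: item[1][0])
    pairs = []
    for i, (a, (_a_start, a_end)) in enumerate(items):
        for b, (b_start, _b_end) in items[i + 1:]:
            if b_start <= a_end:  # the later prolog starts before the earlier one ends
                pairs.append((a, b))
    return pairs


print(overlapping(init_intervals(log)))
```

Here the prolog of session S2 starts at message 2, before the prolog of S1 ends at message 3, so the pair is reported as overlapping.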

Activity Region

When looking at lengthy traces with thousands and millions of messages
(trace statements), we can see regions of activity where the statement
current (Jm, msg/s) is much higher than in surrounding temporal regions
(Statement Current, page 178). Here is an illustration for a typical ETW[6] /
CDF[7] trace where a middle region of activity (Jm2) signifies a system
performing some response function like a user session initialization and
application launch:
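
Apart from the illustration, statement current can also be estimated numerically. The following is a minimal sketch: the one-second window and the rule "more than three times the median current" are assumptions chosen for the example, not a definition from this reference.

```python
from collections import Counter


def statement_current(timestamps, window=1.0):
    """Return {window start: messages per second} for timestamps given in seconds."""
    counts = Counter(int(t // window) for t in timestamps)
    return {index * window: count / window for index, count in sorted(counts.items())}


def activity_regions(currents, factor=3.0):
    """Return window starts whose current exceeds `factor` times the median current."""
    values = sorted(currents.values())
    median = values[len(values) // 2]
    return [start for start, current in currents.items() if current > factor * median]


# Illustrative message timestamps (seconds from the start of a trace).
timestamps = [0.1, 1.2, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 3.5, 4.4]
currents = statement_current(timestamps)
print(currents)
print(activity_regions(currents))
```

With these sample timestamps, the window starting at 2.0 s contains eight messages while every other window contains one, so it is the only candidate Activity Region.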

Activity Theatre

In addition to Message Patterns (page 129), there are higher level patterns
of specific activities and Motifs (page 137). Such activities may or may
not coincide with specific components (modules) because they may be
grouped based on implementation messages, software internals semantics
and not on architectural and design entities (as in Use Case Trail analysis
pattern, page 205). Moreover, the same components may “play” different
activity roles. Once assigned, Activity Theatre “scripts” can be compared
with “scripts” from other traces and logs (Inter-Correlation, page 110) or
different parts of the same log (Intra-Correlation, page 113). This pattern
is illustrated in the following diagram:

Adjoint Message

By analogy with Adjoint Thread of Activity (page 30), we introduce the Adjoint
Message analysis pattern. Most if not all analysis patterns focus on log
message text and consider TID, PID, Module, source file and function as its
attributes. However, we can choose one of the attributes and consider it as
a message in its own right with the original message text consigned now as
another attribute. Then we can analyze the structure of the trace from the
perspective of that newly selected message:

Since the number of different message values is now smaller (for
example, module names) compared to normal trace messages, we can use
them in protein-like encoding and structure analysis schemes (see
Software Trace and Logs as Proteins[8]). We metaphorically name Adjoint
Messages as Amino-acid-Messages (A-Messages). We can also compress
sequences of the same message into one message, which may be useful for
pattern matching (and even use different color intensities to represent
message cardinalities):
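
Choosing an attribute as the message and compressing repeated messages can be sketched as follows. The trace rows, module names, and message texts are hypothetical.

```python
from itertools import groupby

# (TID, module, message text); values are illustrative only.
trace = [
    (5676, "ModuleA", "Entering fooA"),
    (5676, "ModuleA", "Leaving fooA"),
    (5680, "ModuleB", "Entering barA"),
    (5676, "ModuleA", "Entering fooA"),
    (5676, "ModuleA", "Query failed"),
    (5676, "ModuleA", "Leaving fooA"),
]


def adjoint_messages(rows, attribute_index):
    """Treat the chosen attribute as the message of each row."""
    return [row[attribute_index] for row in rows]


def compress(messages):
    """Collapse runs of identical messages into (message, cardinality) pairs."""
    return [(message, len(list(run))) for message, run in groupby(messages)]


print(compress(adjoint_messages(trace, 1)))
```

The cardinality in each pair is the number of consecutive original messages that were merged, which is the quantity a viewer could map to color intensity.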

Adjoint Space

Sometimes we need memory reference information not available in
software traces and logs, for example, to see the pointer dereferences, to
follow pointers and linked structures. In such cases, memory dumps saved
during logging sessions may help. In the case of process memory dumps,
we can even have several Step Dumps[9]. We may force complete or kernel
memory dumps after saving a log file. We call this pattern Adjoint Space:

Then we can analyze logs and memory dumps together, for
example, to follow pointer data further in memory space:

There is also a reverse situation when we use logs to see past data
changes before memory snapshot time (Paratext memory analysis
pattern[10]):

Adjoint Thread of Activity

This pattern is an extension of Thread of Activity (page 181) based on the
concept of multibraiding (see below). There is also an article published in
Debugged! MZ/PE magazine[11].

Having considered computational threads as braided strings[12] and
after discerning software trace analysis patterns, we can see formatted and
tabulated software trace output in a new light and employ the “fabric of
traces” and braid metaphors for the Adjoint Thread concept. This new concept
was motivated by reading about Extended Phenotype[13] and extensive
analysis of Citrix ETW[14]-based CDF[15] traces using CDFAnalyzer[16]. The
term Adjoint was borrowed from mathematics[17] because the concept we discuss
below resembles this metaphorical formula: (Thread A, B) = [A, Thread B].
Let me first illustrate adjoint threading using simplified trace tables.
Consider this generalized software trace example (we omitted the date and
time columns for visual clarity):

# Source Dir PID TID File Name Function Message

1 \src\subsystemA 2792 5676 file1.cpp fooA Message text…

2 \src\subsystemA 2792 5676 file1.cpp fooA Message text…

3 \src\subsystemA 2792 5676 file1.cpp fooA Message text…

4 \src\lib 2792 5680 file2.cpp barA Message text…

5 \src\subsystemA 2792 5680 file1.cpp fooA Message text…

6 \src\subsystemA 2792 5676 file1.cpp fooA Message text…

7 \src\lib 2792 5680 file2.cpp fooA Message text…

8 \src\lib 2792 5680 file2.cpp fooA Message text…

9 \src\subsystemB 2792 3912 file3.cpp barB Message text…

10 \src\subsystemB 2792 3912 file3.cpp barB Message text…

11 \src\subsystemB 2792 3912 file3.cpp barB Message text…

12 \src\subsystemB 2792 3912 file3.cpp barB Message text…

13 \src\subsystemB 2792 3912 file3.cpp barB Message text…

14 \src\subsystemB 2792 3912 file3.cpp barB Message text…

15 \src\subsystemB 2792 2992 file4.cpp fooB Message text…

16 \src\subsystemB 2792 3008 file4.cpp fooB Message text…

… … … … … … …

We see several threads in a process with PID 2792. In CDFAnalyzer, we
can filter trace messages by any column, and if we filter by TID,
we get a view of any Thread of Activity (page 181). However, each thread
can “run” through any source directory, file name or function. If a function
belongs to a library, then multiple threads can access it. This source location
(we can consider it as a subsystem), file or function view of activity is called
an Adjoint Thread. For example, if we filter the Source Dir column by
subsystemA in the trace above, we get this table:
32 Adjoint Thread of Activity

# Source Dir PID TID File Name Function Message

1 \src\subsystemA 2792 5676 file1.cpp fooA Message …

2 \src\subsystemA 2792 5676 file1.cpp fooA Message …

3 \src\subsystemA 2792 5676 file1.cpp fooA Message …

5 \src\subsystemA 2792 5680 file1.cpp fooA Message …

6 \src\subsystemA 2792 5676 file1.cpp fooA Message …

7005 \src\subsystemA 2792 5664 file1.cpp fooA Message …

10198 \src\subsystemA 2792 5664 file1.cpp fooA Message …

10364 \src\subsystemA 2792 5664 file1.cpp fooA Message …

10417 \src\subsystemA 2792 5664 file1.cpp fooA Message …

10420 \src\subsystemA 2792 5676 file1.cpp fooA Message …

10422 \src\subsystemA 2792 5680 file1.cpp fooA Message …

10587 \src\subsystemA 2792 5664 file1.cpp fooA Message …

10767 \src\subsystemA 2792 5680 file1.cpp fooA Message …

11126 \src\subsystemA 2792 5668 file1.cpp fooA Message …

11131 \src\subsystemA 2792 5680 file1.cpp fooA Message …

11398 \src\subsystemA 2792 5676 file1.cpp fooA Message …

11501 \src\subsystemA 2792 5668 file1.cpp fooA Message …

11507 \src\subsystemA 2792 5668 file1.cpp fooA Message …

11509 \src\subsystemA 2792 5664 file1.cpp fooA Message …

11513 \src\subsystemA 2792 5680 file1.cpp fooA Message …

11524 \src\subsystemA 2792 5668 file1.cpp fooA Message …

… … … … … … …
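
The difference between filtering by TID (Thread of Activity) and filtering by another column (Adjoint Thread) can be sketched in a few lines. The rows are the first six rows of the generalized trace table; the `Row` type and the `filter_by` helper are hypothetical names introduced for this illustration and are not part of CDFAnalyzer.

```python
from collections import namedtuple

Row = namedtuple("Row", "number source_dir pid tid file_name function")

# The first six rows of the generalized trace table (message text omitted).
trace = [
    Row(1, r"\src\subsystemA", 2792, 5676, "file1.cpp", "fooA"),
    Row(2, r"\src\subsystemA", 2792, 5676, "file1.cpp", "fooA"),
    Row(3, r"\src\subsystemA", 2792, 5676, "file1.cpp", "fooA"),
    Row(4, r"\src\lib", 2792, 5680, "file2.cpp", "barA"),
    Row(5, r"\src\subsystemA", 2792, 5680, "file1.cpp", "fooA"),
    Row(6, r"\src\subsystemA", 2792, 5676, "file1.cpp", "fooA"),
]


def filter_by(rows, column, value):
    """Keep only the rows whose given column equals the given value."""
    return [row for row in rows if getattr(row, column) == value]


# Thread of Activity: one TID, possibly many source locations.
thread = filter_by(trace, "tid", 5680)
# Adjoint Thread: one source location, possibly many TIDs.
adjoint = filter_by(trace, "source_dir", r"\src\subsystemA")

print([row.number for row in thread])
print([row.number for row in adjoint])
print(sorted({row.tid for row in adjoint}))
```

The last line lists the threads that "run" through subsystemA in this fragment, which is what makes the subsystem view a braid through several threads.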

We can graphically view subsystemA as a braid string that
“permeates the fabric of threads”:

We can get many different braids by changing filters, hence
multibraiding. Here is another example of a driver source file view initially
permeating two process contexts and four threads:

# Source Dir PID TID File Name Function Message

41 \src\sys\driver 3636 3848 entry.c DriverEntry IOCTL …

80 \src\sys\driver 3636 3896 entry.c DriverEntry IOCTL …

99 \src\sys\driver 3636 3896 entry.c DriverEntry IOCTL …

102 \src\sys\driver 3636 3896 entry.c DriverEntry IOCTL …

179 \src\sys\driver 3636 3896 entry.c DriverEntry IOCTL …

180 \src\sys\driver 3636 3896 entry.c DriverEntry IOCTL …

311 \src\sys\driver 3636 3896 entry.c DriverEntry IOCTL …

447 \src\sys\driver 3636 3896 entry.c DriverEntry IOCTL …



448 \src\sys\driver 3636 3896 entry.c DriverEntry IOCTL …

457 \src\sys\driver 2792 5108 entry.c DriverEntry IOCTL …

608 \src\sys\driver 3636 3896 entry.c DriverEntry IOCTL …

614 \src\sys\driver 3636 3896 entry.c DriverEntry IOCTL …

655 \src\sys\driver 3636 3896 entry.c DriverEntry IOCTL …

675 \src\sys\driver 3636 3896 entry.c DriverEntry IOCTL …

678 \src\sys\driver 3636 3896 entry.c DriverEntry IOCTL …

680 \src\sys\driver 3636 3896 entry.c DriverEntry IOCTL …

681 \src\sys\driver 3636 3896 entry.c DriverEntry IOCTL …

1145 \src\sys\driver 3636 4960 entry.c DriverEntry IOCTL …

1153 \src\sys\driver 3636 4960 entry.c DriverEntry IOCTL …

1154 \src\sys\driver 3636 4960 entry.c DriverEntry IOCTL …

… … … … … … …

Anchor Messages

When a software trace is lengthy, it is useful to partition it into several
regions based on a sequence of Anchor Messages. We can determine the
choice of them by Vocabulary Index (page 208) or Adjoint Thread of
Activity (page 30). For example, an ETW trace with almost 900,000
messages recorded during a desktop connection for 6 minutes can be split
into 14 segments by the adjoint thread of DLL_PROCESS_ATTACH message
(the message was generated by DllMain of an injected module, not shown
in the trace output for formatting clarity):

# PID TID Time Message


24226 2656 3480 10:41:05.774 AppA.exe: DLL_PROCESS_ATTACH
108813 4288 4072 10:41:05.774 AppB.exe: DLL_PROCESS_ATTACH
112246 4180 3836 10:41:05.940 DllHost.exe: DLL_PROCESS_ATTACH
135473 2040 3296 10:41:12.615 AppC.exe: DLL_PROCESS_ATTACH
694723 1112 1992 10:44:23.393 AppD.exe: DLL_PROCESS_ATTACH
703962 5020 1080 10:44:42.014 DllHost.exe: DLL_PROCESS_ATTACH
705511 4680 3564 10:44:42.197 DllHost.exe: DLL_PROCESS_ATTACH
705891 1528 2592 10:44:42.307 regedit.exe: DLL_PROCESS_ATTACH
785231 2992 4912 10:45:26.516 AppE.exe: DLL_PROCESS_ATTACH
786523 3984 1156 10:45:26.605 powershell.exe: DLL_PROCESS_ATTACH
817979 4188 4336 10:45:48.707 wermgr.exe: DLL_PROCESS_ATTACH
834875 3976 1512 10:45:52.342 LogonUI.exe: DLL_PROCESS_ATTACH
835229 4116 3540 10:45:52.420 AppG.exe: DLL_PROCESS_ATTACH
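
Partitioning by anchors can be sketched as a lookup of each message number among the anchor message numbers. The anchor numbers are taken from the table; the helper names and the convention that an anchor opens its own segment are assumptions made for this example.

```python
from bisect import bisect_right

# Message numbers of the 13 DLL_PROCESS_ATTACH anchors listed in the table.
anchors = [24226, 108813, 112246, 135473, 694723, 703962, 705511,
           705891, 785231, 786523, 817979, 834875, 835229]


def segment_of(message_number, anchor_numbers):
    """Return the segment index: 0 is before the first anchor, k starts at the k-th anchor."""
    return bisect_right(anchor_numbers, message_number)


def partition(message_numbers, anchor_numbers):
    """Distribute message numbers into len(anchor_numbers) + 1 segments."""
    segments = [[] for _ in range(len(anchor_numbers) + 1)]
    for number in message_numbers:
        segments[segment_of(number, anchor_numbers)].append(number)
    return segments


print(len(partition([], anchors)))
print(segment_of(820000, anchors))
```

Thirteen anchors yield fourteen segments, and a message such as number 820000 falls into the segment opened by the wermgr.exe anchor (message 817979).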

Each region can be analyzed independently for any anomalies, for
example, to look for the answer to the question of why wermgr.exe was
launched. We illustrate partitioning on the following schematic diagram:

It is also possible to produce a different trace segmentation by
interleaving the regions above with another set of Anchor Messages (page 35)
comprising the adjoint thread of DLL_PROCESS_DETACH message:

Back Trace

Usually, when we analyze traces and find an Anchor Message (page 35) or
Error Message (page 75), we backtrack using a combination of Data Flow
(page 58) and Message Sets (page 130). Then we select the appropriate log
messages to form Back Trace leading to a possible root cause message:

This pattern is different from the Error Thread (page 77) pattern that
just backtracks messages having the same TID (or, in general, ATID[18]). It is
also different from the Exception Stack Trace (page 81) pattern that is just a
serialized stack trace from a memory snapshot.

Background and Foreground Components

A metaphorical bijection[19] from literary narratology to software
narratology[20] provides a pattern of Background and Foreground
Components. We can easily illustrate it on pseudo-trace color diagrams.
Suppose we troubleshoot a graphical issue using an ETW trace containing
the output from all components of the problem system. Graphics
components and their messages are considered foreground for a trace
viewer (a person) against numerous background components (for example,
database, file, and registry access, shown in shades of green):

Trace viewers (for example, CDFAnalyzer) can filter out background
component messages and present only foreground components (which I
propose to call component foregrounding):

Of course, this process is iterative, and parts of what once was
foreground become background and candidates for further filtering:

Basic Facts

A typical trace is a detailed narrative accompanied by a problem
description that lists essential facts. For this reason, the first task of any
trace analysis is to check the presence of Basic Facts in the trace. If they
are not visible or do not correspond, the trace was possibly not recorded
during the problem occurrence or was taken from a different computer or
under different conditions. Here is an example. A user “test01” cannot
connect to a published application in a terminal services environment.
We look at the trace and find this statement:

No PID TID Date Time Statement


[...]
3903 3648 5436 4/29/2009 16:17:36.150 User Name: test01
[...]

At least we can be sure that this trace was taken for the user test01
especially when we expect this or similar trace statements. If we do not
see this trace statement, we may suppose that the trace was taken at the
wrong time, for example, when the problem had already happened.

Bifurcation Point

The following two software traces from working and non-working software
environments are a perfect example of the pattern. We borrow the name
of this pattern from catastrophe theory[21]:

Working trace (issue is absent):

# PID TID Message


[...]
25 2768 3056 Trace Statement A
26 3756 2600 Trace Statement B
27 3756 2600 Trace Statement C
[...]
149 3756 836 Trace Statement X (Query result: XXX)
150 3756 836 Trace Statement 150.1
151 3756 836 Trace Statement 151.1
152 3756 836 Trace Statement 152.1
153 3756 836 Trace Statement 153.1
[...]

Non-working trace (issue is present):

# PID TID Message


[...]
27 2768 3056 Trace Statement A
28 3756 2176 Trace Statement B
29 3756 2176 Trace Statement C
[...]
151 3756 5940 Trace Statement Y (Query result: YYY)
152 3756 5940 Trace Statement 152.2
153 3756 5940 Trace Statement 153.2
154 3756 5940 Trace Statement 154.2
155 3756 5940 Trace Statement 155.2
[...]

First, we notice that in both traces the PIDs are the same (2768 and
3756), and for this reason we can conclude that most likely both traces
came from the same environment and session. Second, messages A, B, C
and further are identical up to messages X and Y. The latter two messages
differ greatly in their query results XXX and YYY. After that, the message
distribution differs greatly in both size and content. Despite the same
tracing time, 15 seconds, Statement Current (page 178) is 155 msg/s for
the working and 388 msg/s for the non-working case.

We can easily observe Bifurcation Points when the tracing noise ratio is
small and, for example, in the case of terminal services environments, we
can achieve that by selecting appropriate tracing modules based on
the problem description or by filtering irrelevant modules from full CDF traces.
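
The comparison of the two traces can be sketched as a search for the first position where the message texts differ. The lists below are an abbreviated version of the two example traces (the elided parts are left out), and the comparison deliberately ignores message numbers and TIDs, which differ between runs; both choices are assumptions of this sketch.

```python
# Message texts only, abbreviated from the working and non-working example traces.
working = [
    "Trace Statement A",
    "Trace Statement B",
    "Trace Statement C",
    "Trace Statement X (Query result: XXX)",
    "Trace Statement 150.1",
]
non_working = [
    "Trace Statement A",
    "Trace Statement B",
    "Trace Statement C",
    "Trace Statement Y (Query result: YYY)",
    "Trace Statement 152.2",
]


def bifurcation_point(first, second):
    """Return the index of the first position where the message texts differ, or None."""
    for index, (a, b) in enumerate(zip(first, second)):
        if a != b:
            return index
    return None


index = bifurcation_point(working, non_working)
print(index, working[index], non_working[index])
```

In a noisy trace, a position-by-position comparison like this one is too strict, which is why reducing the tracing noise first makes the pattern easier to observe.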

Blackout

We recently analyzed a Process Monitor log that had a gap of several hours,
which we call Blackout. If you see such a pattern, it might have the following
possible causes:

• Some files from Split Trace (page 174) are missing;
• Split Trace file set was artificially created;
• The system in the tracing scope was paused or frozen (for example, a
  virtualized system), or restarted;
• The tracing itself was paused.

The Blackout pattern is different from Visibility Limit (page 206): the
latter is about the inherent inability to trace, whereas the former is only a
temporary inability due to the circumstances listed above. It is also different
from the Discontinuity (page 71) pattern, which is about gaps in
individual Threads of Activity (page 181) or Adjoint Threads of Activity
(page 30).
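
A gap across the whole trace can be found by comparing the timestamps of consecutive messages regardless of their thread. This is a minimal sketch; the timestamps and the one-hour threshold are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def blackouts(timestamps, threshold):
    """Return (previous, next) timestamp pairs separated by more than `threshold`."""
    return [
        (earlier, later)
        for earlier, later in zip(timestamps, timestamps[1:])
        if later - earlier > threshold
    ]


# Illustrative timestamps of all messages in a log, in recording order.
times = [
    datetime(2016, 1, 1, 10, 0, 0),
    datetime(2016, 1, 1, 10, 0, 1),
    datetime(2016, 1, 1, 13, 30, 0),
    datetime(2016, 1, 1, 13, 30, 2),
]
gaps = blackouts(times, timedelta(hours=1))
print(gaps)
```

Applying the same function to the timestamps of a single TID or a single Adjoint Thread would look for Discontinuity rather than Blackout.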

Break-in Activity

This is a message or a set of messages that surface just before the end of
Discontinuity (page 71) of Adjoint Thread (page 30) and possibly triggered
it:

Calibrating Trace

Multiple traces and logs are usually collected for diagnosing distributed
systems. Different tools and tracing settings (circular, sequential, file size
limit) may be used, systems may be unsynchronized, and individual system
tracing may be started at different times due to manual tracing setup and
switching between systems. There may be Blackouts (page 45), Circular
(page 52), and Truncated (page 203) traces. When we analyze such a trace
set (Inter-Correlation, page 110) we usually select one trace or log that is
used as Calibrating Trace. It is used for measuring all other traces against
Basic Facts (page 42) such as start and end tracing times, and the time of
the problem. One such scenario is illustrated in the following diagram:

Characteristic Message Block

A bird’s eye view of software traces[22] makes it easier to see their coarse
blocked structure:

Further finer structure is discernible, and we can even see nested
blocks:

We can see some blocks of output when scrolling a trace viewer
window, but if a viewer supports zooming, it is possible to get an overview
and jump directly into Characteristic Message Block, for example, debug
messages of repeated attempts to query a database. If a viewer supports
message coloring, it also helps here. Sometimes, the latter technique is
useful when we want to ignore bulk messages and start an analysis around
block boundaries.

Circular Trace

This is an obvious structural trace analysis pattern. Sometimes, the
information about circularity is missing in the problem description, or the
trace metadata does not reflect it. Then Circular Traces can be detected by
trace File Size (page 89) (usually large) and from timestamps, like in this
100Mb CDF trace snippet:

No Module PID TID Date Time Statement


[Begin of trace listing]
1 ModuleA 4280 1736 5/28/2009 08:53:50.496 [... Trace statement 1]
2 ModuleB 6212 6216 5/28/2009 08:53:52.876 [... Trace statement 2]
3 ModuleA 4280 4776 5/28/2009 08:54:13.537 [... Trace statement 3]
[... Some traced exceptions helpful for analysis ...]
3799 ModuleA 4280 3776 5/28/2009 09:15:00.853 [... Trace statement 3799]
3800 ModuleA 4280 1736 5/27/2009 09:42:12.029 [... Trace statement 3800]
[... Skipped ...]
[... Skipped ...]
[... Skipped ...]
579210 ModuleA 4280 4776 5/28/2009 08:53:35.989 [... Trace statement 579210]
[End of trace listing]

We can usually find the analysis region at the beginning of such
traces because as soon as an elusive and hard-to-reproduce problem happens,
the trace is stopped.
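
Detection from timestamps can be sketched as a search for the place where time goes backwards between consecutive messages. The rows are the visible rows of the snippet; the date format string is an assumption based on how the timestamps are printed there.

```python
from datetime import datetime

# (message number, date and time) for the visible rows of the snippet.
rows = [
    (1, "5/28/2009 08:53:50.496"),
    (2, "5/28/2009 08:53:52.876"),
    (3, "5/28/2009 08:54:13.537"),
    (3799, "5/28/2009 09:15:00.853"),
    (3800, "5/27/2009 09:42:12.029"),
    (579210, "5/28/2009 08:53:35.989"),
]


def wrap_points(numbered_timestamps):
    """Return message numbers whose timestamp is earlier than that of the previous row."""
    parsed = [
        (number, datetime.strptime(text, "%m/%d/%Y %H:%M:%S.%f"))
        for number, text in numbered_timestamps
    ]
    return [
        later_number
        for (_, earlier), (later_number, later) in zip(parsed, parsed[1:])
        if later < earlier
    ]


print(wrap_points(rows))
```

Message 3800 is where the oldest surviving messages begin, so the most recent messages, and usually the analysis region, are the ones just before it.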

Correlated Discontinuity

When analyzing Inter-Correlation (page 110) or Intra-Correlation (page
113) and finding Discontinuities (page 71) in one part of a trace or in one
trace (for example, in client-server environments), it is useful to see if there
are corresponding Correlated Discontinuities in another part of the same
trace (for example, in a different Thread of Activity, page 181) or in a
different trace. Such a pattern may point to the underlying communication
problem and may suggest gathering a different trace (for example, a
network trace) for further analysis.

Corrupt Message

Sometimes log messages are formatted with mistakes; buffers are not
cleared before copying; copied strings are truncated; tracing
implementation and presentation contain coding defects. There can be
internal corruption when messages are formed or “corruption” during
presentation, for example, by default field conversion rules (like in Excel). We
call this pattern Corrupt Message. Such messages may affect trace and log
analysis because a data search may not show all relevant results. We then
recommend double-checking findings by using Data Flow (page 58) of a
different Message Invariant (page 128).

Counter Value

This pattern covers performance monitoring and its logs. Counter Value is
some variable in memory (see, for example, the Module Variable[23] memory
analysis pattern) that is updated periodically to reflect some aspect of state,
or is calculated from different variables, and is presented in trace messages.
We can organize such messages in a format similar to the ETW-based traces we
usually consider as examples for our trace patterns:

Source PID TID Function Value


=================================================
[...]
System 0 0 Committed Memory 12,002,234,654
Process 844 0 Private Bytes 345,206,456
System 0 0 Committed Memory 12,002,236,654
Process 844 0 Working Set 122,160,068
[...]

Therefore, all other trace and log analysis patterns such as Adjoint
Thread (page 30, can be visualized via different colors on a graph), Focus
of Tracing (page 90), Characteristic Message Block (page 49, for graphs),
Activity Region (page 23), Significant Event (page 167), and others can be
applicable here. There are also some specific patterns such as Global
Monotonicity and Constant Value that we discuss with examples in later
reference editions.

Coupled Activities

Sometimes we need to know about the client-server interaction between
components, threads, or processes in order to find out where the problem
started. For example, if we have Error Message (page 75) or Discontinuity
(page 71) in one PID Adjoint Thread of Activity (page 30), and we know
that the process uses an API from another PID, we can look at the latter PID
Adjoint Thread to see if there are any Error Messages or other problems.
The failure in the server can propagate to the client as illustrated in the
following diagram:

We call this pattern Coupled Activities by analogy with the Coupled Processes
memory analysis pattern24. It can help in Intra- (page 113) and Inter-
Correlation (page 110) analysis, for example in choosing adjoint threads
from Sheaf of Activities (page 165).

Data Association

Sometimes we are interested in changes in particular {property, value}
pairs or tuples {x1, x2, x3, ...} in general, where xi can be a number or a
substring. It is a more general pattern than Message Change (page 124)
because such tuples can be from different sources and belong to different
messages:

This pattern is also different from Data Flow (page 58) where a
value stays constant across different sources and messages. It is also
different from the Gossip (page 94) pattern, which involves more semantic
changes. Metaphorically we can think of this pattern as a partial
derivative25.

Data Flow

If trace messages contain some character or formatted data that is passed
from module to module or between threads and processes, it is possible to
trace that data and form a Data Flow thread similar to the Adjoint Thread (page
30) we get when we filter by a specific message. However, in the Data Flow
case we have different message types.

Data Interval

When we have very large traces and Basic Facts (page 42) containing some
data values such as a user name, device name, or registry key value, we
may use the Data Interval analysis pattern to select the trace fragment for the
initial log analysis. The first and the last trace messages containing the selected
data form the closed Data Interval. Depending on the trace size and other
considerations, we can also choose open Data Intervals. It is illustrated in
the following diagram where we use interval notation borrowed from
mathematical analysis26:

Interval boundary messages may also be used as Trace Mask
(page 194) for another trace.
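
The interval selection described above can be sketched as follows. This is only an illustration under stated assumptions: messages are plain strings, the data value is matched as a substring, and the helper names closed_data_interval and select_interval as well as the sample messages are hypothetical.

```python
def closed_data_interval(messages, value):
    """Return (first, last) indexes of messages containing value, or None."""
    hits = [i for i, text in enumerate(messages) if value in text]
    if not hits:
        return None
    return hits[0], hits[-1]

def select_interval(messages, value, closed=True):
    """Return the trace fragment bounded by the first and last hit.

    closed=True keeps the boundary messages ([first, last]);
    closed=False drops them ((first, last)).
    """
    bounds = closed_data_interval(messages, value)
    if bounds is None:
        return []
    first, last = bounds
    if closed:
        return messages[first:last + 1]
    return messages[first + 1:last]

# Hypothetical trace messages; "UserA" plays the role of the Basic Facts value.
trace = [
    "service started",
    "logon request for UserA",
    "profile loaded",
    "registry key opened for UserA",
    "service idle",
]
print(select_interval(trace, "UserA"))
print(select_interval(trace, "UserA", closed=False))
```

The closed variant keeps both boundary messages, which is convenient when they are later reused as a Trace Mask.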

Data Reversal

Sometimes we notice that data values are in a different order than
expected. We call this pattern Data Reversal. By data values, we mean
some variable parts of a specific repeated message such as the address of
some structure or object. Data Reversal may happen for one message
type:

But it can also happen for some message types and not for others.
Typical examples here are Enter/Leave trace messages for nested
synchronization objects such as monitors and critical sections:

Since we talk about the same message type (the same Message
Invariant, page 128), this pattern is different from Event Sequence Order
(page 78) pattern.

In rare cases, we may observe Data Reversal inside one message
with several variable parts, but this may also be a case of Data Association
(page 57).

Data Selector

Data Selector is a variant of Inter-Correlation (page 110) trace analysis
pattern where we use data found in one trace to select Message Set (page
129) or Adjoint (page 30) Thread of Activity (page 181) in another trace.
This analysis activity is depicted in the following picture where we have a
client log and corresponding server log. In the server log, we have entries
for many client sessions. To select messages corresponding to our client
session we use some data attribute in the client trace, for example, the
username, and Linked Messages (page 120) analysis pattern to find one of
the messages in the server log that contains the same username. Then we
find out which user session it belongs to and form its Adjoint Thread (page
30):

This pattern is different from Identification Messages (page 98)
where we don’t even know the object that emitted trace messages. In
the Data Selector case we know in principle what kind of messages we are
looking for. We just need to select among many alternatives.

Declarative Trace

The trace statements in source code can be considered as Declarative
Trace by analogy with variable declaration and definition in programming
languages such as C and C++. Declaration of the variable doesn’t mean that
the variable will be actually used. Some declared variables such as arrays
will actually expand in memory when used (as in .bss sections). The same is
true of trace messages from Declarative Trace. Some of them will not appear
in the actual software execution trace, and some will be repeated because
of loops and multiple code reentrance. However, Declarative Traces are
useful for studying the possibilities of tracing and logging design,
implementation, and coverage (for example, Sparse Trace, page 171).
Some trace analysis patterns are also applicable for Declarative Traces
such as Message Sets (page 129) and Bifurcation Points (page 43) among
different source code versions. This is illustrated in the following picture:

Defamiliarizing Effect

“Capturing delicate moments, one gives birth to a poetry of traces ... ”

Ange Leccia, Motionless Journeys27, by Fabien Danesi

In this pattern from software narratology,28 we see sudden unfamiliar trace
statements across the familiar landscape of Characteristic Blocks (page 48)
and Activity Regions (page 23).

Example of a familiar trace:



Example of the new trace from a problem system:



Density Distribution

Sometimes we find a grouping of some messages in one trace and then we
are interested in the same groupings either in the same trace (Intra-
Correlation, page 112) or in another trace (Inter-Correlation, page 110).
We may consider such grouping as having some local density compared to
the global Statement Density (page 178) pattern. Then we might be
interested in that selected message grouping density distribution as
illustrated on this minimal trace graph:

Dialogue

Dialogue is an important pattern, especially in the network trace analysis.
It usually involves a message source, a different message target (although
both can be the same) and some alternation between them as shown in
this abstract trace diagram:

Source and target are not only IP addresses or port numbers but
also window handles, for example. Sometimes, the roles of source and
target are played by different Process ID and Thread ID combinations (Client ID,
CID). In such cases some parts of a message text may signify request and
reply as shown graphically:

A similar illustration can be made for a multi-computer trace, for
example, when several traces from different servers are combined into
one, where a combination of CID and a computer ID (Co) or just CO can
play the roles of source and target.

Note that in all illustrations above the third request does not have
a reply message: a possible Incomplete History (page 102) pattern.
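
A request without a reply can be found mechanically once source, target, and message role are known. The following minimal sketch assumes each message is already reduced to a hypothetical (source, target, kind) tuple, where kind is "request" or "reply", and that a reply answers the oldest pending request in the opposite direction. Real traces would first need these fields extracted from message text.

```python
from collections import defaultdict, deque

def unanswered_requests(messages):
    """Return indexes of requests that never get a reply.

    Each message is (source, target, kind) with kind "request" or "reply".
    A reply from B to A answers the oldest pending request from A to B.
    """
    pending = defaultdict(deque)  # (requester, responder) -> request indexes
    for index, (source, target, kind) in enumerate(messages):
        if kind == "request":
            pending[(source, target)].append(index)
        elif kind == "reply" and pending[(target, source)]:
            pending[(target, source)].popleft()
    return sorted(i for queue in pending.values() for i in queue)

# Hypothetical dialogue: the third request has no reply.
dialogue = [
    ("A", "B", "request"),
    ("B", "A", "reply"),
    ("A", "B", "request"),
    ("B", "A", "reply"),
    ("A", "B", "request"),
]
print(unanswered_requests(dialogue))
```

Here "A" and "B" stand for whatever plays the roles of source and target: IP addresses, window handles, or CIDs.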

Diegetic Messages

Like in literature (and in narratology, in general), we have components that
trace themselves, and components that tell the story of computation
including status updates they query from other components and
subsystems. This pattern gets its name from diegesis29. Here’s the
difference between diegetic (in blue) and non-diegetic trace messages:

PID    TID    TIME          MESSAGE
11864  11912  06:34:53.598  ModuleA: foo called bar. Status OK.
11620  10372  06:34:59.754  ModuleB: ModuleA integrity check. Status OK.

Some modules may emit messages that tell about their status, but
from their message text, we know the larger computation story like in a
process startup sequence example30.

Discontinuity

Sometimes there are reported delays in application startup, session
initialization, long response times and simply the absence of response. All
these problems can be reflected in software traces showing sudden gaps in
Threads of Activity (page 181). This pattern is called Discontinuity by
analogy with continuous and discontinuous functions in mathematics. Here
is an example. One process had a long period of CPU spiking calculation,
and we recorded a CDF trace. When we open it we see this Periodic Error
(page 145):

N    PID   TID   Time          Message
[...]
326  2592  5476  08:17:18.823  OpenRegistry: Attempting to open [.. Hive path ..]
327  2592  5476  08:17:18.824  OpenRegistry: Failed: 2
[...]

However, when looking for any Discontinuities (page 71) for the
thread 5476 we see the gap of more than 7 minutes:

N     PID   TID   Time          Message
[...]
3395  2592  5476  08:17:19.608  OpenRegistry: Attempting to open [.. Hive path ..]
3396  2592  5476  08:17:19.608  OpenRegistry: Failed: 2
3461  2592  5476  08:24:31.137  OpenRegistry: Attempting to open [.. Hive path ..]
3462  2592  5476  08:24:31.137  OpenRegistry: Failed: 2
[...]

For this reason, we have three possibilities here:

1. The process twice did lengthy CPU spiking calculations involving
registry access and was quiet between them.
2. Registry access belonged to some background activity and ceased for 7
minutes, and during that time the process had CPU spiking intensive
calculation.
3. This discontinuity is irrelevant because either the calculation module
was not selected for tracing or it simply doesn’t have relevant tracing
statement coverage for the code that does the calculation.

The full case study is covered in September 2009 issue of Debugged!
MZ/PE magazine31.
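
Looking for such gaps can be automated. The sketch below is an illustration rather than the tooling used in the case study: it assumes the "N PID TID Time Message" row layout shown above, timestamps within a single day, and a hypothetical one-minute threshold; parse_row and find_discontinuities are names introduced here.

```python
from datetime import datetime, timedelta

def parse_row(row):
    """Split an 'N PID TID Time Message' row; the layout is assumed."""
    number, pid, tid, time_text, message = row.split(None, 4)
    timestamp = datetime.strptime(time_text, "%H:%M:%S.%f")
    return int(number), int(pid), int(tid), timestamp, message

def find_discontinuities(rows, tid, threshold=timedelta(minutes=1)):
    """Return (previous N, next N, gap) for gaps above threshold in one thread."""
    gaps = []
    previous = None  # (message number, timestamp) of the last row for this TID
    for row in rows:
        number, _pid, row_tid, timestamp, _message = parse_row(row)
        if row_tid != tid:
            continue
        if previous is not None and timestamp - previous[1] > threshold:
            gaps.append((previous[0], number, timestamp - previous[1]))
        previous = (number, timestamp)
    return gaps

trace = [
    "3395 2592 5476 08:17:19.608 OpenRegistry: Attempting to open [.. Hive path ..]",
    "3396 2592 5476 08:17:19.608 OpenRegistry: Failed: 2",
    "3461 2592 5476 08:24:31.137 OpenRegistry: Attempting to open [.. Hive path ..]",
    "3462 2592 5476 08:24:31.137 OpenRegistry: Failed: 2",
]
gaps = find_discontinuities(trace, tid=5476)
for before, after, gap in gaps:
    print(f"gap of {gap} between messages {before} and {after}")
```

A gap found this way only marks where to look; deciding among the three possibilities listed above still requires knowledge of which modules were traced.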

Dominant Event Sequence

Sometimes we have an insufficiently detailed problem description or there
are several similar parallel user activities going on simultaneously, for
example, several sessions are launched in a terminal services
environment. In such cases when tracing is done for the duration of a
specific user activity, this pattern may help. Here we select a full sequence
of events or event sequence based on some Basic Facts (page 42). For
example, if a session ID was missing in the problem description we can
choose the longest and fullest process launch sequence32 and assume that
its session ID was the one missing:

Empty Trace

Empty Trace is another trivial missing pattern that we need to add to make
the software log analysis pattern system complete. It ranges from an empty
trace message list where only a meta-trace header (if any) describing
overall trace structure is present to a few messages where we expect
thousands. It is also an extreme case of Truncated Trace (page 201), No
Activity (page 140) and Missing Component (page 134) patterns for a
trace taken as a whole. Note that an empty trace file does not necessarily
have a zero file size because a tracing architecture may preallocate some
file space for block data writing.

Error Distribution

Sometimes we need to pay attention to Error Distribution, for example,
the distribution of the same error across a software log space or different
Error Messages (page 75) in different parts of the same software log or
trace (providing effective partition):

Error Message

While working on Accelerated Windows Software Trace Analysis training,33
we discovered some missing patterns needed for completeness despite
their triviality. One of them is called Error Message. Here an error is
reported either explicitly (“operation failed”) or implicitly as an operation
status result such as 0xC00000XX. Sometimes, a trace message designer
specifies that the number value is supplied for information only and should
be ignored. Some Error Messages may contain information that is not
relevant to the current software incident, the so-called False Positive
Errors (page 85). Some tracing architectures and tools include message
information category for errors, such as Citrix CDF, where we can filter by
error category34 to get Adjoint Thread (page 30). Please note that the
association of a trace statement with an error category is left to the
discretion of an engineer writing code. Also, information category
messages may contain implicit errors such as the last error and return
status reports.

Error Powerset

A typical software trace may contain several Error Messages (page 75) with
different error codes and different exception names with Exception Stack
Traces (page 81). Searching for individual codes or exceptions in problem
databases may show many matches. Searching for all of them may show
nothing. Therefore, we can construct the set of all subsets of the set of
codes and exceptions (a power set35) and perform analytic reasoning (and
a search) based on certain subsets based on the problem description,
Trace Viewpoints (page 198) such as Use Case Trails (page 205), Motifs
(page 137), Focus of Tracing (page 90), Foreground Components (page
39), (Adjoint, page 30) Threads of Activity (page 181), and simply some
Activity Regions (page 23) and Message Sets (page 129).

The following picture illustrates Error Powerset analysis pattern
with a trace that has 4 error messages where two messages have the same
error code.
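
The construction of the power set can be sketched in a few lines. The error codes below are hypothetical placeholders chosen to mirror the picture (four error messages, two of which share a code), and power_set is a helper name introduced for this illustration.

```python
from itertools import chain, combinations

def power_set(items):
    """Return all subsets of the distinct items, smallest subsets first."""
    distinct = sorted(set(items))
    return [
        set(subset)
        for subset in chain.from_iterable(
            combinations(distinct, size) for size in range(len(distinct) + 1)
        )
    ]

# Hypothetical codes from a trace with 4 error messages, two sharing a code.
error_codes = ["0x80070005", "0x80004005", "0x80070005", "0xC0000022"]
subsets = power_set(error_codes)
print(len(subsets))
for subset in subsets:
    print(sorted(subset))
```

Because duplicate codes collapse into one element, four error messages with three distinct codes give eight candidate subsets. Since the number of subsets doubles with every additional distinct code, in practice we would restrict the search to subsets suggested by the problem description.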

Error Thread

When we see Error Message (page 75) or Exception Stack Trace (page 81)
in a log file, we might want to explore the sequence of messages from the
same Thread of Activity (page 181) that led to the error. Such Message
Set (page 129) has analogy with memory analysis patterns such as
Execution Residue (of partial stack traces without overwrites36) and Stack
Trace (where the error message is a top stack frame37):

Event Sequence Order

In any system, this pattern is expected as a precondition to its normal
behavior. Any out-of-order events should raise suspicion as they
might result from or lead to synchronization problems. The sequence need not
consist only of trace messages from different threads; it can also span
processes. For example, image load events in CDF / ETW traces can
indicate the wrong configuration of a service startup order. The following
diagram depicts a possible pattern scenario:

Event Sequence Phase

Sometimes we have several use case instances traced into one log file.
Messages and Activity Regions (page 23) from many Use Case Trails (page
205) intermingle and make analysis difficult especially with the absence of
UCID (Use Case ID), any other identification tags, or Linked Messages
(page 120). However, initially, most of the time we are interested in a
sequence of Significant Events (page 167). After finding Anchor Messages
(page 35), we can use Time Deltas (page 184) to differentiate between
trace statements from different Use Case Trails. Here we assume correct
Event Sequence Order (page 78). We call this pattern Event Sequence
Phase by analogy with wave phases38. All such individual “waves” may
have different “shapes” due to various delays between different stages of
their use case and implementation narratives:

In the picture above, we also identified Dominant Event Sequence
(page 72) for use case instance C.

Exception Stack Trace

Often the analysis of software traces starts with searching for short textual
patterns, like a failure or an exception code or simply the word
“exception”. In addition, some software components can record their own
exceptions or exceptions that were propagated to them including full stack
traces. This is common in .NET and Java environments. Here is a synthetic
and beautified example based on real software traces:

N      PID  TID  Message
[...]
265799 8984 4216 ComponentA.Store.GetData threw exception:
'System.Reflection.TargetInvocationException: DCOM connection to server failed
with error: 'Exception from HRESULT: 0x842D0001' --->
System.Runtime.InteropServices.COMException (0x842D0001): Exception from
HRESULT: 0x842D0001
at ComponentA.GetData(Byte[] pKey)
at System.RuntimeMethodHandle._InvokeMethodFast(Object target, Object[]
arguments, SignatureStruct& sig, MethodAttributes methodAttributes,
RuntimeTypeHandle typeOwner)
at System.RuntimeMethodHandle.InvokeMethodFast(Object target, Object[]
arguments, Signature sig, MethodAttributes methodAttributes, RuntimeTypeHandle
typeOwner)
at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags
invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean
skipVisibilityChecks)
at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags
invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at ComponentB.Connections.ComInterfaceProxy.Invoke(IMessage message)'
265800 8984 4216 === Begin Exception Dump ===
265801 8984 4216 ComponentB.Exceptions.ConnectionException: DCOM connection to
server failed with error: 'Exception from HRESULT: 0x842D0001' --->
System.Runtime.InteropServices.COMException (0x842D0001): Exception from
HRESULT: 0x842D0001
265802 8984 4216 at ComponentA.Store.GetData(Byte[] pKey)
[...]
265808 8984 4216 Exception rethrown at [0]:
265809 8984 4216 at
System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg,
IMessage retMsg)
265810 8984 4216 at
System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData,
Int32 type)
265811 8984 4216 at ComponentA.Store.GetData(Byte[] pKey)
265812 8984 4216 at ComponentA.App.EnumBusinessObjects()
[...]
265816 8984 4216 ===> InnerException:
265817 8984 4216 ** COM Exception Error Code: 0x842d0001
265818 8984 4216 System.Runtime.InteropServices.COMException (0x842D0001):
Exception from HRESULT: 0x842D0001
265819 8984 4216 at ComponentA.Store.GetData(Byte[] pKey)
265820 8984 4216 === End Exception Dump ===
[...]

In the embedded stack trace we see that the App object was trying to
enumerate business objects and asked the Store object to get some data. The
latter object was probably trying to communicate with the real data store
via DCOM. The communication attempt failed with HRESULT 0x842D0001.

Factor Group

We borrowed the next trace and log analysis pattern name from factor
groups in mathematics (or quotient groups39). Here a group is, of course,
not a mathematical group40, but just a group (or set) of log messages or
trace statements. However, every trace message has variable and invariant
parts (Message Invariant, page 128). Variable parts usually contain some
values, addresses or status bits. They can even be string values. Such
values form a set too and can be partitioned into disjoint (non-overlapping)
subsets. For example, a window foreground status can be either TRUE or
FALSE. In addition, we can group messages into disjoint factor groups each
one having either only true or only false foreground status. The following
trace graph illustrates a WindowHistory6441 log where it was reported that
one window was periodically losing and gaining focus:

We found messages related to the reported process window title.
We found another such group of messages for a different process window
by using Density Distribution (page 66) pattern. Then a factor group was
formed with two subgroups and their Relative Density (page 157) was
compared. For correlated alternating values, it was expected to be 1. It
was a very simple case, of course, which was analyzed just by looking at a
textual log, but in more complex cases computer assistance is required. A
member of a factor group can also be generalized as a message subset
with messages having variable part values from some domain subset or
even calculated from it (a predicate): Mi = { m | P(m) }, where the original
group of messages is a disjoint union of such message subsets: M = ∪ Mi.
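
The partition into disjoint subsets can be sketched as a grouping by the value of a key function, which plays the role of the predicate. The message tuples below are hypothetical and are not taken from the WindowHistory log; factor_groups is a name introduced here.

```python
from collections import defaultdict

def factor_groups(messages, key):
    """Partition messages into disjoint subsets Mi, one per key(m) value."""
    groups = defaultdict(list)
    for message in messages:
        groups[key(message)].append(message)
    return dict(groups)

# Hypothetical window messages: (message number, window title, foreground).
messages = [
    (1, "AppA", True),
    (2, "AppB", False),
    (3, "AppA", False),
    (4, "AppB", True),
    (5, "AppA", True),
    (6, "AppB", False),
]
groups = factor_groups(messages, key=lambda m: m[2])
print(len(groups[True]), len(groups[False]))
```

Every message lands in exactly one group, so the groups are disjoint and their union is the original message set. The group sizes can then be compared, for example, when checking whether two alternating values occur equally often.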

False Positive Error

We often see such errors in software traces recorded during deviant
software behavior (often called non-working software traces). When we
double check their presence in normally expected software behavior traces
(often called working traces), we find them there too. We already
mentioned similar false positives when we introduced the first software
trace analysis pattern called Periodic Error (page 145). Here is an example
that was taken from the real trace. In a non-working trace we found the
following error in Adjoint Thread (page 30) of Foreground Component
(page 39):

OpenProcess error 5

However, we found the same error in the working trace, continued
looking and found several other errors:

Message request report: last error 1168, ...
[...]
GetMsg result -2146435043

The last one is 8010001D if converted to a hex status, but,
unfortunately, the same errors were present in the working trace too in
the same Activity Regions (page 23).

After that, we started comparing both traces looking for Bifurcation
Point (page 43), and we found the error that was only present in a non-
working trace with significant trace differences after that:

Error reading from the named pipe: 800700E9

My favorite tool (WinDbg) to convert error and status values gave
this description:

0:000> !error 800700E9
Error code: (HRESULT) 0x800700e9 (2147942633) - No process is on the
other end of the pipe.
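
Both steps of this example, converting a signed decimal status to hex and filtering out errors that also appear in the working trace, can be sketched as follows. The helper names to_hex_status and errors_only_in are introduced here, and comparing whole message strings is a simplifying assumption: real messages usually need their variable parts normalized first.

```python
def to_hex_status(value):
    """Format a signed or unsigned 32-bit status value as 8 hex digits."""
    return f"{value & 0xFFFFFFFF:08X}"

def errors_only_in(non_working, working):
    """Return error messages seen in the non-working trace only, in order."""
    known = set(working)
    return [message for message in non_working if message not in known]

print(to_hex_status(-2146435043))

working_errors = [
    "OpenProcess error 5",
    "GetMsg result -2146435043",
]
non_working_errors = [
    "OpenProcess error 5",
    "GetMsg result -2146435043",
    "Error reading from the named pipe: 800700E9",
]
print(errors_only_in(non_working_errors, working_errors))
```

The errors left after the comparison are the candidates worth decoding with a tool such as WinDbg.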

Fiber Bundle

The modern software trace recording, visualization, and analysis tools such
as Process Monitor, Xperf, WPR, and WPA provide stack traces associated
with trace messages. If we consider stack traces as software traces, we have, in a
more general case, traces (fibers) bundled together on (attached to) a base
software trace. For example, a trace message that mentions an IRP can
have its I/O stack attached together with the thread stack trace with
function calls leading to a function that emitted the trace message.
Another example is an association of different types of traces with trace
messages such as managed and unmanaged ones. This general trace
analysis pattern needed a name, so we opted for Fiber Bundle as an
analogy with a fiber bundle42 from mathematics. Here’s a graphical
representation of stack traces recorded for each trace message where one
message also has an associated I/O stack trace:
[Figure: trace messages, each with an attached thread stack trace; one message also has an attached I/O stack]

Fiber of Activity

When using complex trace and log analysis patterns such as Fourier
Activity (page 91) we may be first interested in selecting all instances of a
particular message type from a specific Thread of Activity (page 181) and
then looking for Time Deltas (page 184), Discontinuities (page 71), Data Flow
(page 58), and other patterns. We call this analysis pattern Fiber of Activity
by analogy with fibers43 (lightweight threads) since the individual thread
execution flow is “co-operative” inside, whereas threads themselves are
preempted outside. The following diagram from Fourier Activity analysis
pattern example illustrates the concept by showing three fibers:

This analysis pattern is different from trace-wide Sheaf of Activities
(page 165) where the latter is about selecting messages as Adjoint Threads
of Activity (page 30) which may span several processes and threads.

File Size

Trace and log analysis starts with the assessment of artifact File Size,
especially with multiple logging scenarios in distributed systems. If all log
files are of the same size, we might have either Circular Traces (page 52) or
Truncated Traces (page 201). Both point to wrong trace timing plan44 or
just using default tracing tool configuration.

Focus of Tracing

Activity Region pattern (page 23) highlights “mechanical” and syntactical
aspects of trace analysis whereas this pattern brings attention to changing
semantics of trace message flow, for example, in a terminal services
environment, from login messages during session initialization to database
search. Here is a graphical illustration of the pattern where tracing focus
region spans three regions of activity:

Fourier Activity

Sometimes we have trace and log messages that appear with a certain time
frequency throughout the whole log or a specific Thread of Activity (page 181). Such
frequencies may fluctuate reflecting varying system or process
performance. Analyzing trace areas where such messages have different
Time Deltas (page 184) may point to additional diagnostic log messages
useful for root cause analysis. The following minimal trace graph depicts
the recent log analysis for proprietary file copy operation where the
frequency of internal communication channel entry/exit Opposition
Messages (page 143) was decreasing from time to time. Such periods were
correlating with increased time intervals between “entry” and “exit”
messages. Analysis of messages between them revealed additional
diagnostic statements missing in periods of higher frequency and
corresponding Timeouts (page 185) adding up to overall performance
degradation and slowness of copy operation.

Additional analysis of Data Association (page 57) in a different
message type between available communication buffers and the total
number of such buffers revealed significant frequency drop during
constant Data Flow (page 58) of zero available communication buffers:

We call this analysis pattern Fourier Activity by analogy with Fourier
series45 in mathematics. This pattern is for individual message types and
can also be considered a fine-grained example of Statement Current (page
178) and Trace Acceleration (page 188) analysis patterns which can be
used to detect areas of different frequencies in individual Fibers (Adjoint
Threads of Activities, page 30, formed from the same Thread of Activity).
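
Periods of lower message frequency can be located by comparing Time Deltas between consecutive instances of one message type. The following is a minimal sketch with hypothetical timestamps (in seconds); the factor of 2 over the median delta is an arbitrary assumption, not a recommended threshold, and the function names are introduced here.

```python
from statistics import median

def time_deltas(timestamps):
    """Return the differences between consecutive timestamps (seconds)."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

def low_frequency_positions(timestamps, factor=2.0):
    """Return positions of deltas larger than factor times the median delta."""
    deltas = time_deltas(timestamps)
    if not deltas:
        return []
    typical = median(deltas)
    return [i for i, delta in enumerate(deltas) if delta > factor * typical]

# Hypothetical times of "entry" messages from one Fiber of Activity.
entry_times = [0.0, 1.0, 2.0, 3.0, 7.0, 11.0, 12.0, 13.0]
print(time_deltas(entry_times))
print(low_frequency_positions(entry_times))
```

The positions returned point to trace areas where messages between consecutive instances are worth reading for additional diagnostic statements.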

Glued Activity

Adjoint Thread (page 30) invariants like PID can be reused giving rise to
curious CDF (ETW) traces where two separate execution entities are glued
together in one trace. For example, in one trace we see AppA and AppB
sharing the same PID:

#       Module   PID   TID   Time      Message
[...]
242583  ProcMon  5492  9476  11:04:33  LoadImageEvent for ImageName: …\AppA.exe PID: 5492
256222  ProcMon  5492  9476  11:04:50  ProcessDestroyEvent for PPID: 12168 PID: 5492
274887  ProcMon  5492  1288  11:05:18  LoadImageEvent for ImageName: …\AppB.exe PID: 5492
[...]

Other similar examples may include different instances of
components sharing the same name, source code or even, in general,
periodic tracing sessions appended to the end of the same trace file,
although we think that the latter should be a separate pattern. We named
this pattern Glued Activity by analogy with different thread strings glued
together (in general, manifolds46 glued along their boundaries). Another
name might be along the line of adjoint thread ID reuse (ATID reuse).
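
PID reuse of this kind can be flagged automatically. The sketch below assumes the message text format shown in the trace above and treats only images ending in .exe as process executables, which is a simplification; glued_pids is a name introduced for this illustration.

```python
import re
from collections import defaultdict

# Assumed message format: "LoadImageEvent for ImageName: <path>.exe PID: <pid>".
LOAD_RE = re.compile(r"LoadImageEvent for ImageName: (\S+\.exe) PID: (\d+)")

def glued_pids(messages):
    """Return {pid: [executable names]} for PIDs used by more than one executable."""
    images = defaultdict(list)
    for message in messages:
        match = LOAD_RE.search(message)
        if match:
            name = match.group(1).rsplit("\\", 1)[-1]
            pid = int(match.group(2))
            if name not in images[pid]:
                images[pid].append(name)
    return {pid: names for pid, names in images.items() if len(names) > 1}

trace = [
    r"LoadImageEvent for ImageName: …\AppA.exe PID: 5492",
    "ProcessDestroyEvent for PPID: 12168 PID: 5492",
    r"LoadImageEvent for ImageName: …\AppB.exe PID: 5492",
]
print(glued_pids(trace))
```

Once a glued PID is found, the Adjoint Thread for that PID can be split at the ProcessDestroyEvent message before further analysis.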

Gossip

This pattern has a funny name: Gossip. We originally thought of calling it
Duplicated Message but gave it the new name allowing for the possibility
of the semantics of the same message to be distorted in subsequent trace
messages from different Adjoint Threads (page 30). Here is a typical ETW /
CDF trace example (distortion free) of the same message content seen in
different modules (we omitted some columns like Date and Time):

#      Module   PID   TID   Message
[...]
26875  ModuleA  2172  5284  LoadImageEvent:
ImageName(\Device\HarddiskVolume2\Windows\System32\notepad.exe)
ProcessId(0x000000000000087C)
26876  ModuleB  2172  5284  LoadImageEvent:
ImageName(\Device\HarddiskVolume2\Windows\System32\notepad.exe),
ProcessId(2172)
26877  ModuleC  2172  5284  ImageLoad: fileName=notepad.exe, pid:
000000000000087C
[...]

In such cases, when constructing Event Sequence Order (page 78)
we recommend choosing messages from one source instead of mixing
events from different sources, for example:

#      Module   PID   TID   Message
[...]
26875  ModuleA  2172  5284  LoadImageEvent:
ImageName(\Device\HarddiskVolume2\Windows\System32\notepad.exe)
ProcessId(0x000000000000087C)
[...]
33132  ModuleA  4180  2130  LoadImageEvent:
ImageName(\Device\HarddiskVolume2\Windows\System32\calc.exe)
ProcessId(0x0000000000001054)
[...]
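
To recognize that messages from different modules repeat the same content, their variable parts first need to be brought to one representation. The sketch below normalizes the three process ID notations seen in the first Gossip example above (hex with a 0x prefix, decimal, and 16 hex digits without a prefix). The regular expressions are assumptions derived only from those three messages, and the messages are joined into single lines for simplicity.

```python
import re

# (pattern, numeric base) pairs; order matters because the hex form is tried first.
PID_PATTERNS = [
    (re.compile(r"ProcessId\(0x([0-9A-Fa-f]+)\)"), 16),  # hex with 0x prefix
    (re.compile(r"ProcessId\((\d+)\)"), 10),             # decimal
    (re.compile(r"pid:\s*([0-9A-Fa-f]{16})"), 16),       # 16 hex digits, no prefix
]

def extract_pid(message):
    """Return the process ID mentioned in the message text as an int, or None."""
    for pattern, base in PID_PATTERNS:
        match = pattern.search(message)
        if match:
            return int(match.group(1), base)
    return None

messages = [
    r"LoadImageEvent: ImageName(\Device\HarddiskVolume2\Windows\System32\notepad.exe) ProcessId(0x000000000000087C)",
    r"LoadImageEvent: ImageName(\Device\HarddiskVolume2\Windows\System32\notepad.exe), ProcessId(2172)",
    "ImageLoad: fileName=notepad.exe, pid: 000000000000087C",
]
print([extract_pid(message) for message in messages])
```

All three messages yield the same process ID, which also matches the PID column of the trace.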

Guest Component

Sometimes, when comparing normal, expected (working) and abnormal
(non-working) traces we can get a clue for further troubleshooting and
debugging by looking at module load events. For example, when we see an
unexpected module loaded in our non-working trace, its function (and
sometimes even module name) can signify some difference to pay
attention to:

#     PID  TID  Time          Message
[...]
4492  908  912  11:06:41.953  LoadImageEvent:ImageName(\WINDOWS\system32\3rdPartySso.dll)
[...]

We call this pattern Guest Component, and it is different from
Missing Component (page 134). In the latter pattern, a missing
component in one trace may appear in another, but the component name
is known a priori and expected. In the former pattern, the component is
unexpected. For example, in the trace above, its partial name fragment
“Sso” may suggest looking at differences in authentication
where in a non-working case SSO (single sign-on) was configured.
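
Finding unexpected modules amounts to a set difference between the module load events of the two traces. The sketch below assumes the LoadImageEvent message format shown above; the kernel32.dll entry is a hypothetical example of a module common to both traces, and the function names are introduced here.

```python
import re

# Assumed message format: "LoadImageEvent:ImageName(<path>)".
IMAGE_RE = re.compile(r"LoadImageEvent:ImageName\(([^)]+)\)")

def loaded_modules(messages):
    """Return the set of lower-cased module file names from load events."""
    modules = set()
    for message in messages:
        match = IMAGE_RE.search(message)
        if match:
            modules.add(match.group(1).rsplit("\\", 1)[-1].lower())
    return modules

def guest_components(non_working, working):
    """Return modules loaded in the non-working trace only."""
    return loaded_modules(non_working) - loaded_modules(working)

working = [r"LoadImageEvent:ImageName(\WINDOWS\system32\kernel32.dll)"]
non_working = [
    r"LoadImageEvent:ImageName(\WINDOWS\system32\kernel32.dll)",
    r"LoadImageEvent:ImageName(\WINDOWS\system32\3rdPartySso.dll)",
]
print(guest_components(non_working, working))
```

Swapping the arguments gives candidates for the Missing Component pattern instead.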

Hidden Error

Sometimes we look at a trace or log and instead of Error Messages (page
75) we only see their “signs” such as a DLL load event for an error or fault
reporting module or a module that is related to symbol files such as
diasymreader.dll. This pattern is named by analogy with Hidden Exception47 in the
memory dump analysis pattern catalog although sometimes we can see
such modules in the memory dump Module Collection48. For example, the
presence of diasymreader module may signify an unreported .NET
exception and suggest a dump collection strategy.

Hidden Facts

The previous patterns such as Basic Facts (page 42) and Vocabulary Index
(page 208) address the mapping of a problem description to software
execution artifacts such as traces and logs. Indirect Facts (page 104) analysis
pattern addresses the problem of an incomplete problem description.
However, we need another pattern for completeness that addresses the
mapping from a log to troubleshooting and debugging recommendations.
We call it Hidden Facts that are uncovered by trace and log analysis. Of
course, there can be many such hidden facts, and usually, they are
uncovered after narrowing down analysis to particular Threads of Activity
(page 181), Adjoint Threads (page 30), Message Context (page 125),
Message Set (page 129), or Data Flow (page 58) patterns. The need for
that pattern had arisen during the pattern-oriented analysis of the trace
case study from Malcolm McCaffery49 and can be illustrated in the
following diagram:

Identification Messages

Often, we need to identify the source of messages based on problem
object or subsystem description (the what question) before we proceed to
answer the where question (where in the trace we can find messages related
to the problem). Even when we know where the messages are, there can be
many sources to select from (if we don’t know the answer to the where question we can
use Indirect Message analysis pattern, page 105). To answer the what
question we propose Identification Messages analysis pattern. Basic Fact
(page 42) problem description may include properties and behavioral
description of the problem object or subsystem. Based on that we can map
them to the log messages that such an object can produce:

These messages may not be Error Messages (page 75) or some
other type of messages reflecting abnormal behavior. These messages are
only used to identify the software object, module or subsystem.

For example, in one case there were problems with the custom
status bar. However, the window handle for it or its parent wasn’t
specified in the problem report. In the log file, we had a lot of messages
describing GUI behavior of many windows. To find the status bar, we
reasoned that it should have a small height but a long width. Indeed, we found
one such child window. In addition, for this window the log file contained
many messages related to frequent window text changes, possibly
reflecting the status bar updates. Having identified the window handle, we
proceeded to the analysis of another log with thousands of window
messages. Because of the known window handle we were able to select
only messages pertaining to our problem status bar.

Implementation Discourse

If we look at any non-trivial trace, we see different Implementation
Discourses. Components are written in different languages and adhere to
different runtime environments, binary models, and interface frameworks.
All these implementation variations influence the structure, syntax, and
semantics of trace messages. For example, .NET debugging traces differ
from file system driver or COM debugging messages. For this reason, we
establish the new field of Software Trace Linguistics as a science of
software trace languages. Some parallels can be drawn here towards
software linguistics (the science of software languages) although we came
to that conclusion independently while thinking about applying
“ethnography of speaking” to software trace narration.

Impossible Trace

Sometimes, we look at a trace and say it is impossible. For example, this
fragment shows that the function foo had been called:

# Module PID TID Message
[...]
1001 ModuleA 202 404 foo: start
1002 ModuleA 202 404 foo: end
[...]

However, if we look at the corresponding source code (PLOT50) we
see that something is missing: the function bar must have been called with
its own set of trace messages we don’t see in the trace:

void bar(); // forward declaration: foo calls bar before bar is defined

void foo()
{
    TRACE("foo: start");
    bar();
    TRACE("foo: end");
}

void bar()
{
    TRACE("bar: start");
    // some code ...
    TRACE("bar: end");
}

We suspect that the runtime code was modified, perhaps by patching.
In other cases of missing messages, we can also suspect thrown exceptions
or local buffer overflows that led to a wrong return address, skipping the
code with the expected tracing statements. A mismatch between the trace
and the source code we are looking at is also possible if the trace was
produced by an older version of the code that did not call bar.
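
A check of this kind can be sketched in Python. We assume that the expected message sequence (the template) has been derived by hand from the source code above; the function name and the list representation of a trace are illustrative only.

```python
def missing_from_template(observed, template):
    """Return template messages that cannot be matched, in order, in the observed messages."""
    missing = []
    pos = 0
    for expected in template:
        try:
            # Search only after the previously matched message to preserve order
            pos = observed.index(expected, pos) + 1
        except ValueError:
            missing.append(expected)
    return missing

# Template derived from the source code of foo and bar
template = ["foo: start", "bar: start", "bar: end", "foo: end"]
# Messages 1001 and 1002 from the trace fragment
observed = ["foo: start", "foo: end"]

missing = missing_from_template(observed, template)
```

A non-empty result flags the fragment for further investigation; it does not say which of the possible causes applies.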

Incomplete History

Typical software narrative history consists of requests and responses, for
example, function or object method calls and returns:

# Module PID TID Time File Function Message
[...]
26060 dllA 1604 7108 10:06:21.746 fileA.c foo Calling bar
[...]
26232 dllA 1604 7108 10:06:22.262 fileA.c foo bar returns 0x5
[...]

The code that generates execution history is response-complete if
it traces both requests and responses. For such code (except in cases
where tracing is stopped before a response) the absence of expected
responses could be a sign of blocked threads or quiet exception
processing. The code that generates execution history is exception-
complete if it also traces exception processing. Response-complete and
exception-complete code is called call-complete. If we do not see
response messages for call-complete code, we have Incomplete History.

In general, we can talk about the absence of certain messages in a
trace as a deviation from the standard trace sequence template
corresponding to a use case. The difference there is that a request may
be missing too.
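
As a minimal sketch, unanswered requests can be found by pairing call and return messages per thread. The message wording ("Calling bar", "bar returns ...") follows the fragment above; the tuple layout and the third sample message are assumptions made for illustration.

```python
import re

CALL = re.compile(r"^Calling (\w+)$")
RETURN = re.compile(r"^(\w+) returns ")

def unanswered_calls(messages):
    """messages: iterable of (number, tid, text).
    Returns numbers of call messages with no later return on the same TID."""
    pending = {}  # (tid, callee) -> message numbers still awaiting a response
    for number, tid, text in messages:
        m = CALL.match(text)
        if m:
            pending.setdefault((tid, m.group(1)), []).append(number)
            continue
        m = RETURN.match(text)
        if m:
            waiting = pending.get((tid, m.group(1)))
            if waiting:
                waiting.pop()  # the most recent call is considered answered
    return sorted(n for waiting in pending.values() for n in waiting)

trace = [
    (26060, 7108, "Calling bar"),
    (26232, 7108, "bar returns 0x5"),
    (26300, 7108, "Calling bar"),  # hypothetical call that never returns
]
incomplete = unanswered_calls(trace)
```

For call-complete code, each reported message number is a candidate for a blocked thread or for tracing that stopped too early.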

Indexical Trace

This pattern describes a variant of Inter-Correlation (page 110) pattern
when we have a trace that has messages of interest pointing to specific
Activity Regions (page 23) in another trace. The latter trace can be very
large, from another computer, and split into many parts (Split Trace, page
174). This pattern is very helpful when we need to diagnose the problem in
the large split trace, but we do not know when it happened. Then an index
trace that may have recorded software execution account (for example, in
the case of a broker-like architecture) can point to the right trace fragment
from the split trace.

Indirect Facts

Sometimes, in the case of missing Basic Facts (page 42), we can discern
Indirect Facts from the message text and even from other patterns. For
example, in one incident we were interested in all messages from a
certain process, but its PID was missing from the problem
description. Fortunately, we were able to get its PID from one of the
individual messages from a completely different source:

Indirect Message

Sometimes we have Basic Facts (page 42) in a problem description but
can’t find messages corresponding to them in a trace or log file, although
we are sure the tracing (logging) was done correctly. This may be because
we have Sparse Trace (page 171), or we are not well familiar with product
or system tracing messages (such as with Implementation Discourse, page
100).

In such a case, we search for Indirect Message of a possible cause:



Having found such a message, we may hypothesize that Missing
Message (page 136) should have been located nearby (this is based on
semantics of both messages), and we then explore corresponding Message
Context (page 125):

The same analysis strategy is possible for missing causal messages.
Here we search for effect or side effect messages:

Having found them we proceed with further analysis:



Inter-Correlation

This pattern is analogous to Intra-Correlation pattern (page 113), but it
involves several traces from possibly different trace agents recorded
(most commonly) at the same time or during an overlapping time interval:

Let’s look at a typical example of an application subclassing
windows to add an additional look and feel element to its GUI or to inject
hooks into window messaging. Suppose this application also records
important trace points, such as window parameters before and after
subclassing, using ETW technology. When we run the application in a
terminal services environment, all windows (including those of other
processes) are shown with incorrect dimensions. We, for this reason,
request the application trace and, in addition, a WindowHistory 51 trace
to see how the coordinates of all windows change over time. We easily
find some Basic Facts (page 42) in both traces, such as window class name
or time, but it looks like the window handle is different. In another set of
traces recorded for comparison, we have the same window handle values;
the class name is absent from the ETW trace, but the process and thread
IDs for the same window handle are different. We, for this reason, do not
see a correlation between these traces and suspect that the two traces in
each set were recorded in different terminal sessions, for example:

ETW trace:

# PID TID Time Message
[...]
46750 5890 6960 10:17:18.825 Subclassing, handle=0x100B8, class=MyWindowClass, [...]
[...]

WindowHistory trace:

Handle: 0001006E Class: "MyWindowClass" Title: ""
Captured at: 10:17:19:637
Process ID: 19e0
Thread ID: 16e4
Parent: 0
Screen position (l,t,r,b): (-2,896,1282,1026)
Client rectangle (l,t,r,b): (0,0,1276,122)
Visible: true
Window placement command: SW_SHOWNORMAL
Foreground: false
HungApp: false
Minimized: false
Maximized: false
[...]
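
Comparing Basic Facts across the two traces requires bringing their values to a common form first. The following Python sketch assumes that the ETW trace prints PID and TID in decimal while WindowHistory prints handles and IDs in hexadecimal; this is our reading of the fragments above, not a documented property of either tool. The dictionary layout is hypothetical.

```python
def compare_window_facts(etw, window_history):
    """Normalize both records to integers and report which facts match."""
    etw_values = {
        "handle": int(etw["handle"], 16),
        "pid": int(etw["pid"]),  # assumed decimal
        "tid": int(etw["tid"]),  # assumed decimal
        "class": etw["class"],
    }
    wh_values = {
        "handle": int(window_history["handle"], 16),
        "pid": int(window_history["pid"], 16),  # assumed hexadecimal
        "tid": int(window_history["tid"], 16),  # assumed hexadecimal
        "class": window_history["class"],
    }
    return {name: etw_values[name] == wh_values[name] for name in etw_values}

etw = {"handle": "0x100B8", "pid": "5890", "tid": "6960", "class": "MyWindowClass"}
window_history = {"handle": "0001006E", "pid": "19e0", "tid": "16e4", "class": "MyWindowClass"}

matches = compare_window_facts(etw, window_history)
```

Under these assumptions only the class name agrees, which is consistent with the suspicion that the traces came from different sessions.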

Interspace

General traces and logs52 may have Message Space (page 131) regions
“surrounded” by the so-called Interspace. Such Interspace regions may
link individual Message Space regions like in this diagram generalizing
WinDbg !process 0 3f command output:

Intra-Correlation

Sometimes we see a functional activity and Basic Facts (page 42) in a trace.
Then we might want to find a correlation between that activity and facts in
another part of the trace. If that intra-correlation fits into our problem
description, we may claim a possible explanation or, if we are lucky, we
have just found an inference to the best explanation, as philosophers of
science like to say. Here is an example using the WindowHistory tracing
tool53. A third-party application was frequently losing the focus, and the
suspicion was on a terminal services client process. We found that the
following WindowHistory trace fragment corresponded to that application:

Handle: 00050586 Class: "Application A Class" Title: ""
Title changed at 15:52:4:3 to "Application A"
Title changed at 15:52:10:212 to "Application A - File1"
[...]
Process ID: 89c
Thread ID: d6c
[...]
Visible: true
Window placement command: SW_SHOWNORMAL
Placement changed at 15:54:57:506 to SW_SHOWMINIMIZED
Placement changed at 15:55:2:139 to SW_SHOWNORMAL
Foreground: false
Foreground changed at 15:52:4:3 to true
Foreground changed at 15:53:4:625 to false
Foreground changed at 15:53:42:564 to true
Foreground changed at 15:53:44:498 to false
Foreground changed at 15:53:44:498 to true
Foreground changed at 15:53:44:592 to false
Foreground changed at 15:53:45:887 to true
Foreground changed at 15:53:47:244 to false
Foreground changed at 15:53:47:244 to true
Foreground changed at 15:53:47:353 to false
Foreground changed at 15:54:26:416 to true
Foreground changed at 15:54:27:55 to false
Foreground changed at 15:54:27:55 to true
Foreground changed at 15:54:27:180 to false
Foreground changed at 15:54:28:428 to true
Foreground changed at 15:54:28:771 to false
Foreground changed at 15:54:28:865 to true
Foreground changed at 15:54:29:99 to false
Foreground changed at 15:54:30:877 to true
Foreground changed at 15:54:57:521 to false
Foreground changed at 15:55:2:76 to true
Foreground changed at 15:57:3:378 to false
Foreground changed at 15:57:11:396 to true
Foreground changed at 15:57:29:601 to false
Foreground changed at 15:57:39:803 to true
Foreground changed at 15:58:54:41 to false
Foreground changed at 15:59:8:96 to true
Foreground changed at 16:1:19:478 to false
Foreground changed at 16:1:27:527 to true
Foreground changed at 16:1:39:914 to false
Foreground changed at 16:2:0:515 to true
Foreground changed at 16:7:14:628 to false
Foreground changed at 16:7:24:246 to true
Foreground changed at 16:9:53:523 to false
Foreground changed at 16:10:15:919 to true
Foreground changed at 16:10:31:426 to false
Foreground changed at 16:11:12:818 to true
Foreground changed at 16:11:59:538 to false
Foreground changed at 16:12:39:456 to true
Foreground changed at 16:13:6:364 to false

The corresponding terminal services client window trace fragment does
not show any foreground changes, but the main window of another
application has many of them:

Handle: 000D0540 Class: "Application B Class" Title: "Application B"
[...]
Process ID: 3ac
Thread ID: bd4
[...]
Foreground: false
Foreground changed at 15:50:36:972 to true
Foreground changed at 15:50:53:732 to false
Foreground changed at 15:50:53:732 to true
Foreground changed at 15:50:53:826 to false
Foreground changed at 15:51:51:352 to true
Foreground changed at 15:51:53:941 to false
Foreground changed at 15:53:8:135 to true
Foreground changed at 15:53:8:182 to false
Foreground changed at 15:53:10:178 to true
Foreground changed at 15:53:13:938 to false
Foreground changed at 15:53:30:443 to true
Foreground changed at 15:53:31:20 to false
Foreground changed at 15:53:31:20 to true
Foreground changed at 15:53:31:129 to false
Foreground changed at 15:53:34:78 to true
Foreground changed at 15:53:34:795 to false
Foreground changed at 15:53:34:795 to true
Foreground changed at 15:53:34:873 to false
Foreground changed at 15:53:36:901 to true
Foreground changed at 15:53:42:502 to false
Foreground changed at 15:53:42:502 to true
Foreground changed at 15:53:42:564 to false
Foreground changed at 15:57:3:425 to true
Foreground changed at 15:57:4:595 to false
Foreground changed at 15:57:10:507 to true
Foreground changed at 15:57:11:318 to false
Foreground changed at 15:57:29:632 to true
Foreground changed at 15:57:31:67 to false
Foreground changed at 15:57:32:721 to true
Foreground changed at 15:57:33:844 to false
Foreground changed at 15:58:54:88 to true
Foreground changed at 15:58:56:178 to false
Foreground changed at 15:59:6:505 to true
Foreground changed at 15:59:7:987 to false
Foreground changed at 16:1:19:525 to true
Foreground changed at 16:1:19:961 to false
Foreground changed at 16:1:26:607 to true
Foreground changed at 16:1:27:434 to false
Foreground changed at 16:1:39:914 to true
Foreground changed at 16:1:39:992 to false
Foreground changed at 16:1:49:798 to true
Foreground changed at 16:2:0:437 to false
Foreground changed at 16:7:14:628 to true
Foreground changed at 16:7:14:847 to false
Foreground changed at 16:7:18:76 to true
Foreground changed at 16:7:24:106 to false
Foreground changed at 16:9:58:790 to true
Foreground changed at 16:10:4:16 to false
Foreground changed at 16:10:4:874 to true
Foreground changed at 16:10:4:890 to false
Foreground changed at 16:10:8:634 to true
Foreground changed at 16:10:15:779 to false
Foreground changed at 16:10:56:766 to true
Foreground changed at 16:10:59:402 to false
Foreground changed at 16:10:59:652 to true
Foreground changed at 16:10:59:667 to false
Foreground changed at 16:12:9:397 to true
Foreground changed at 16:12:39:347 to false
Foreground changed at 16:13:18:375 to true
Foreground changed at 16:14:33:656 to false

We can see that most of the time when Application A window loses
focus, Application B window gets it.
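
This observation can be checked mechanically. The Python sketch below uses a small subset of the timestamps above and assumes that the last field is milliseconds without zero padding; the 100 ms tolerance and the function names are our own choices for illustration.

```python
def to_ms(stamp):
    """Convert 'H:M:S:ms' (unpadded fields, as assumed here) to milliseconds."""
    h, m, s, ms = (int(part) for part in stamp.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def correlated_losses(a_lost, b_gained, tolerance_ms=100):
    """Return focus-loss times of A that are followed by a focus gain of B within the tolerance."""
    gained = [to_ms(t) for t in b_gained]
    return [
        stamp for stamp in a_lost
        if any(0 <= g - to_ms(stamp) <= tolerance_ms for g in gained)
    ]

# Subset of "Foreground changed ... to false" times for Application A
a_lost = ["15:53:4:625", "15:57:3:378", "15:57:29:601", "15:58:54:41", "16:9:53:523"]
# Subset of "Foreground changed ... to true" times for Application B
b_gained = ["15:53:8:135", "15:57:3:425", "15:57:29:632", "15:58:54:88", "16:9:58:790"]

matched = correlated_losses(a_lost, b_gained)
```

A high share of matched losses supports the hypothesis, while unmatched ones (here with gaps of several seconds) may need a wider tolerance or a different explanation.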

Last Activity

Sometimes we need to analyze the last activity before Significant Event
(page 167) or Discontinuity (page 71). By this pattern, we mean a loose
semantic collection of messages before process exit, for example. It may
give some clues to further troubleshooting. In one incident, a process was
suddenly exiting. Its own detailed trace did not have any messages
explaining the exit, probably due to insufficient tracing coverage (Sparse
Trace, page 171). Fortunately, a different external trace (from Process
Monitor) was collected (Inter-Correlation, page 110), and it had LDAP
network communication messages just before the thread and process exit
events.

Layered Periodization

We borrowed this pattern name from historiography. This periodization54
of software trace messages includes individual messages, then
aggregated messages from threads, then processes as wholes, and finally,
individual computers (in a client-server or similar sense). We can better
illustrate this graphically.

Message layer:

Thread layer (different colors correspond to different TID):



Process layer (different colors correspond to different PID):

Please note that it is also possible to have a periodization based on
modules, functions, and individual messages, but it may be complicated
because different threads can enter the same module or function. Here
other patterns are more appropriate like Activity Region (page 23),
Characteristic Message Block (page 48), and Background and Foreground
Components (page 39).

Linked Messages

Sometimes we have Linked Messages through some common parameter
or attribute. We can find one such example in ETW traces related to kernel
process creation notifications. Here we got Adjoint Thread (page 30) for
module PIDNotify:

# Module PID TID Time Message
[...]
128762 PIDNotify 1260 6208 15:53:15.691 Create: ParentID 0x000004EC PID 0x000018D4
[...]
128785 PIDNotify 6356 6388 15:53:15.693 Load: ImageName \Device\HarddiskVolume1\Windows\System32\abscript.exe PID 0x000018D4
[...]
131137 PIDNotify 6356 4568 15:53:15.936 Create: ParentID 0x000018D4 PID 0x00001888
[...]
131239 PIDNotify 6280 6376 15:53:15.958 Load: ImageName \Device\HarddiskVolume1\Windows\System32\wscript.exe PID 0x00001888
[...]
132899 PIDNotify 6356 5704 15:53:16.462 Create: ParentID 0x000018D4 PID 0x00001FD0
[...]
132906 PIDNotify 8144 7900 15:53:16.464 Load: ImageName \Device\HarddiskVolume1\Windows\System32\cmd.exe PID 0x00001FD0
[...]

We see that messages 128762 and 128785 are linked by PID
parameter and linked to messages 131137 and 132899 by PID - ParentID
parameter relationship. Similar linkages exist for messages 131137 /
131239 and 132899 / 132906.
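
The linkage can be recovered programmatically. The Python sketch below relies only on the message wording shown in the fragment above (Create: ParentID ... PID ... and Load: ImageName ... PID ...); the function name and the returned structures are illustrative assumptions.

```python
import re

CREATE = re.compile(r"Create: ParentID (0x[0-9A-Fa-f]+) PID (0x[0-9A-Fa-f]+)")
LOAD = re.compile(r"Load: ImageName (\S+) PID (0x[0-9A-Fa-f]+)")

def link_messages(texts):
    """Link Create and Load messages by PID and by the PID - ParentID relationship."""
    parent_of, image_of = {}, {}
    for text in texts:
        m = CREATE.search(text)
        if m:
            parent_of[int(m.group(2), 16)] = int(m.group(1), 16)
            continue
        m = LOAD.search(text)
        if m:
            # Keep only the file name part of the image path
            image_of[int(m.group(2), 16)] = m.group(1).rsplit("\\", 1)[-1]
    children = {}
    for pid, parent in parent_of.items():
        children.setdefault(parent, []).append(pid)
    return parent_of, image_of, children

texts = [
    r"Create: ParentID 0x000004EC PID 0x000018D4",
    r"Load: ImageName \Device\HarddiskVolume1\Windows\System32\abscript.exe PID 0x000018D4",
    r"Create: ParentID 0x000018D4 PID 0x00001888",
    r"Load: ImageName \Device\HarddiskVolume1\Windows\System32\wscript.exe PID 0x00001888",
    r"Create: ParentID 0x000018D4 PID 0x00001FD0",
    r"Load: ImageName \Device\HarddiskVolume1\Windows\System32\cmd.exe PID 0x00001FD0",
]
parent_of, image_of, children = link_messages(texts)
child_images = [image_of[pid] for pid in children[0x18D4]]
```

The result shows abscript.exe as the parent of both wscript.exe and cmd.exe, which is the same chain we read from the messages by eye.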

Macrofunction

Macrofunction is a single semantic unit of several trace messages where
individual messages serve the role of microfunctions. We borrowed this
idea and distinction from functionalist linguistics. An example would be a
software trace fragment where messages log an attempt to update a
database:

# Module PID TID Time Message
[...]
42582 DBClient 5492 9476 11:04:33.398 Opening connection
[...]
42585 DBClient 5492 9476 11:04:33.398 Sending SQL command
[...]
42589 DBServer 6480 10288 11:04:33.399 Executing SQL command
[...]
42592 DBClient 5492 9476 11:04:33.400 Closing connection
[...]

These Macrofunctions need not be from the same ATID (Glued
Activity, page 93) in the traditional sense like in the example above unless
we form Adjoint Threads (page 30) from certain fragments like “DB”.

Marked Message

Based on ideas of Roman Jakobson55 about “marked” and “unmarked”
categories we propose this pattern that groups trace messages based on
having some feature or property. For example, marked messages may
point to some domain of software activity, such as one related to
functional requirements and, for this reason, may help in troubleshooting
and debugging. Unmarked messages include all other messages that don’t
say anything about such activities (although they may include messages
pointing to such activities indirectly that we are unaware of) or messages
that say explicitly that no such activity has occurred. We can even borrow
a notation of distinctive features56 from phonology57 and annotate any
trace or log after analysis to compare it with Master Trace (page 123), for
example, compose the following list of software trace distinctive features:

session database queries [+]
session initialization [-]
socket activity [+]
process A launched [+]
process B launched [-]
process A exited [-]
[...]

Here [+] means the activity is present in the trace and [-] means
the activity is either undetected or definitely not present. Sometimes a
non-present activity can be a marked activity corresponding to all-inclusive
unmarked present activity (see, for example, No Activity pattern, page
140).
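
Such an annotation lends itself to a simple comparison. In the Python sketch below, True stands for [+] and False for [-]; the trace features are taken from the list above, while the Master Trace features are hypothetical values chosen for illustration.

```python
def unexpected_features(trace, master):
    """Return features whose presence in the trace differs from the Master Trace."""
    return {
        name: {"trace": trace.get(name, False), "master": expected}
        for name, expected in master.items()
        if trace.get(name, False) != expected
    }

# Distinctive features from the annotated trace: True is [+], False is [-]
trace = {
    "session database queries": True,
    "session initialization": False,
    "socket activity": True,
    "process A launched": True,
    "process B launched": False,
    "process A exited": False,
}
# Hypothetical Master Trace features for the same use case
master = {**trace, "session initialization": True, "process B launched": True}

differences = unexpected_features(trace, master)
```

A feature missing from the annotation is treated as [-], in line with the reading that [-] covers both undetected and definitely absent activity.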

Master Trace

When reading and analyzing software traces and logs we always compare
them to Master Trace. Other names for this pattern borrowed from
narrative theory include Metatrace, Canonical Trace or Archetype. When
we look at the software trace from a system we either know the correct
sequence of Activity Regions (page 23), expect certain Background and
Foreground Components (page 39), Event Sequence Order (page 78), or
mentally construct a model based on our experience and Implementation
Discourse (page 100). For the latter example, software engineers internalize
software master narratives when they construct code and write tracing
code for supportability. For the former example, it is important to have a
repository of traces corresponding to Master Traces. Such a repository
helps in finding deviations after Bifurcation Point (page 43). Consider such
comparisons similar to regression testing when we check the computation
output against the expected prerecorded sequence.

Message Change

Sometimes, when we find Anchor Message (page 35) related to our
problem description (for example, a COM port error) we are interested in
its evolution throughout a software narrative:

# PID TID Message
[...]
126303 5768 1272 OpenComPort returns Status = 0x0
[...]
231610 3464 1576 OpenComPort returns Status = 0x0
[...]
336535 5768 4292 OpenComPort returns Status = 0x0
[...]
423508 5252 2544 OpenComPort returns Status = 0xc000000f
[...]
531247 5768 5492 OpenComPort returns Status = 0xc000000f
[...]
639039 772 3404 OpenComPort returns Status = 0xc000000f
[...]

Then we can check activity between changes.
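
Locating the change is a matter of comparing neighboring occurrences of the message. A minimal Python sketch, assuming the status values have already been extracted into (message number, status) pairs:

```python
def change_points(messages):
    """Return (last message before, first message after) pairs where the status changes."""
    changes = []
    previous = None
    for number, status in messages:
        if previous is not None and status != previous[1]:
            changes.append((previous[0], number))
        previous = (number, status)
    return changes

# (message number, Status) pairs from the OpenComPort messages above
messages = [
    (126303, 0x0),
    (231610, 0x0),
    (336535, 0x0),
    (423508, 0xC000000F),
    (531247, 0xC000000F),
    (639039, 0xC000000F),
]
changes = change_points(messages)
```

Each returned pair bounds the region of the trace to inspect, here the messages between 336535 and 423508.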



Message Context

In some cases, it is useful to consider a message context: a set of
surrounding messages having some relation to the chosen message:

Message Cover

One of the powerful trace analysis techniques is using Adjoint Threads of
Activity (page 30) to filter various linear message activities (as a
generalization of Thread of Activity, page 181). Such filtered activities can
then be analyzed either separately (Sheaf of Activities, page 165) or
together, as in the new pattern we introduce here. If we identify parallel
ATIDs (ATID is Adjoint TID58) and see that one covers the other, we can
then make a hypothesis that they are Intra-Correlated (page 113). Here is
a graphical example of Periodic Message Block (page 148) largely
composed of various Error Messages (page 75) that covers periodic
Discontinuities (page 71) from another ATID (we can also consider the
latter as Periodic Message Blocks consisting of Silent Messages, page
168):

This pattern is analogous to a cover59 in topology.

Message Interleave

We factored out Anchor Messages (page 35) example of Message
Interleave into this pattern. It covers superposition of different Anchor
Messages, for example, process launch and exit, or DLL load and unload:

Message Invariant

Most of the time software trace messages coming from the same source
code fragment (PLOT60) contain invariant parts such as function and
variable names, descriptions, and mutable parts such as pointer values and
error codes. Message Invariant is a pattern useful for comparative analysis
of several trace files where we are interested in message differences. For
example, in one troubleshooting scenario, certain objects were not created
correctly for one user. We suspected a different object version was linked
to a user profile. We recorded separate application debug traces for each
user, and we could see the version 0x4 for a problem user and 0x5 for all
other normal users:

# Module PID TID Message
[...]
2782 ModuleA 2124 5648 CreateObject: pObject 0x00A83D30 data ([...]) version 0x4
[...]

# Module PID TID Message
[...]
4793 ModuleA 2376 8480 CreateObject: pObject 0x00BA4E20 data ([...]) version 0x5
[...]
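
Separating the invariant from the mutable parts can be sketched in Python. The sketch assumes that mutable values are hexadecimal numbers preceded by a name, as in the CreateObject messages above, and that pointer values (pObject) are expected to differ and can be ignored; both are assumptions made for this example.

```python
import re

NAMED_HEX = re.compile(r"(\w+) (0x[0-9A-Fa-f]+)")

def split_message(text):
    """Return the invariant part of a message and its named mutable values."""
    values = {name: int(value, 16) for name, value in NAMED_HEX.findall(text)}
    invariant = NAMED_HEX.sub(lambda m: f"{m.group(1)} <value>", text)
    return invariant, values

def differing_values(text_a, text_b, ignore=("pObject",)):
    """Compare mutable values of two messages that share the same invariant."""
    invariant_a, values_a = split_message(text_a)
    invariant_b, values_b = split_message(text_b)
    if invariant_a != invariant_b:
        raise ValueError("messages do not share the same invariant")
    return {
        name: (values_a[name], values_b[name])
        for name in values_a
        if name not in ignore and values_a[name] != values_b[name]
    }

problem_user = "CreateObject: pObject 0x00A83D30 data ([...]) version 0x4"
normal_user = "CreateObject: pObject 0x00BA4E20 data ([...]) version 0x5"

invariant, _ = split_message(problem_user)
differences = differing_values(problem_user, normal_user)
```

The invariant identifies the same message in both traces, and the remaining difference is the version value.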

Message Pattern

Now we come to the trace and log analysis pattern that we call Message
Pattern. It is an ordered set of messages from Thread of Activity (page
181) or Adjoint Thread of Activity (page 30) having Message Invariants
(page 128) that can be used for matching another ordered set of messages
in another (Inter-Correlation, page 110) or the same trace or log (Intra-
Correlation, page 113). A typical Message Pattern from one of our own
trace and log analysis sessions is depicted in the following diagram:

Message Set

Often, especially for large software logs, we need to select messages based
on some criteria be it a set of Error Messages (page 75), a set of messages
containing Basic Facts (page 42), or some other predicate. Then we can use
selected messages from that message set as Anchor Messages (page 35) or
reverse Pivot Messages (page 151) as an aid in further analysis.

Message Space

The message stream can be considered as a union of Message Spaces. A
message space is an ordered set of messages preserving the structure of
the overall trace. Such messages may be selected based on the memory
space they came from or can be selected by some other general attribute,
or a combination of attributes and facts. The difference from Message Set
(page 130) is that Message Space is usually much larger (with large scale
structure) with various Message Sets extracted from it later for
fine-grained analysis. This pattern also fits nicely with Adjoint Spaces
(page 27). Here’s an example of kernel and managed spaces in the same
CDF / ETW trace from Windows platform where we see that kernel space
messages came not only from System process but also from other process
contexts:

In the context of general traces and logs61 such as debugger logs,
separate Message Space regions may be linked (or “surrounded”) by
Interspace (page 112).

Meta Trace

So far, we have been discussing trace analysis patterns related to the
execution of a particular software version. However, software code
changes, and so does its tracing and logging output: from large-scale
changes where components are replaced to small-scale code refactoring
affecting message structure and format. On a software narratological level,
this corresponds to a narrative about a software trace or log, its evolution.
Such an analysis pattern is different from Master Trace (page 123)
pattern where the latter is similar to what Metanarrative62 usually means
in narratology: a master or grand idea, an expected trace if all functional
requirements were correctly identified and implemented during software
construction, and non-functional ones met during software execution.

Milestones

Trace messages may correspond to specific implementation code such as
recording the status of an operation, dumping data values, printing errors,
or they may correspond to higher levels of software design and
architecture, and even to use case stories. We call such messages
Milestones by analogy with project management63. Alternative names can
be Chapter Messages, Summary Messages, Checkpoints, or Use Case
Messages. These are different from Macrofunctions (page 121) which are
collections of messages grouped by some higher function. Milestone
messages are specifically designed distinct trace statements:

They can also be a part of Significant Events (page 167), serve the
role of Anchor Messages (page 35), and be a part of Basic Facts (page 42)
and Vocabulary Index (page 208).

Missing Component

When we do not see expected trace statements, we wonder whether the
component was not loaded, its container ceased to exist, or it simply was
not selected for tracing. In many support cases, there is a trade-off
between tracing everything and the size of trace files. Customers and
engineers usually prefer smaller files to analyze. However, in the case of
predictable and reproducible issues with short duration, we can always
select all components or deselect a few (instead of selecting a few).

In Discontinuity (page 71) pattern, a sudden and silent gap in trace
statements could happen because not all necessary components were
selected for tracing.

Sometimes, in cases when the missing component was selected for
tracing but we do not see any trace output from it, other module traces
can give us an indication, perhaps showing the load failure message. For
example, Process Monitor tracing done in parallel can reveal load failures.

Missing Data

Some tracing architectures, especially the ones that intercept API calls by
filtering or hooking, may log synchronous requests by remembering to
write down the return result in the same trace message later on, when the
response is available after the wait. If such data is still not available in the
log or trace, it may point to some blocked request for which another
software execution artifact analysis (such as memory dump analysis) is
necessary. In some cases, the analysis of the corresponding Fiber Bundle
(page 86) stack trace may point to Blocking Module64 or the involvement
of file system filters (Stack Trace65). This analysis pattern that we call
Missing Data is illustrated in the following diagram:

Missing Message

Sometimes the absence of messages, for example, errors and exceptions,
may save time during troubleshooting and debugging by pointing to what
was not happening and provide additional insight. For example, in the
picture below we see the same exceptions in the new and old incidents.
However, in the old incident we see another exception that was linked to
one unavailable server in distributed broker architecture. For this reason,
we can assume provisionally that all servers were operational when the
new incident happened.

Missing Message pattern is different from Missing Component
(page 134) pattern where the latter may point to the component that was
not loaded or executed, or simply that it was not selected for tracing.

Motif

Often, when we look at software trace fragments, we recognize certain
motifs such as client-server interaction, publisher-subscriber notifications,
database queries, plugin sequence initialization, and others. The idea of
this pattern name comes from motives66 in mathematics. It is different
from Master Trace (page 123) which corresponds to a normal use-case or
working software scenario and may contain several Motifs as it usually
happens in complex software environments. On the other side of the
spectrum, there are software narremes (basic narrative units67) and
Macrofunctions (single semantic units, page 121). Motifs help to further
bridge the great divide between software construction and software
diagnostics with software narremes corresponding to implementation
patterns, Macrofunctions to design patterns, and Motifs to architectural
patterns although an overlap between these categories is possible.

News Value

News Value is a pattern that assigns relative importance to software traces
for problem-solving purposes, especially when it is related to problem
description, recent incidents, and timestamps of other supporting artifacts
(memory dumps, other traces). For example, in one scenario, an ETW trace
was provided with three additional log files:

# Source PID TID Date Time Message
0 Header 1260 1728 12/14/2010 06:48:56.289 ?????
[...]
215301 Unknown 640 808 12/14/2010 07:22:57.508 ????? Unknown( 16): GUID=[...]
(No Format Information found).

// LogA
05/11/10 18:28:15.1562 : Service() - entry
[...]
14/12/10 10:31:58.0381 : Notification: sleep
* Start of new log *
14/12/10 10:34:38.4687 : Service() - entry
[...]
14/12/10 11:53:35.2729 : Service.CleanUp complete
* Start of new log *
14/12/10 11:56:11.7031 : Service() - entry
[...]
14/12/10 15:25:23.3004 : Notification: sleep

// LogB
[ 1] 12/14 10:34:29:890 Entry: ctor
[...]
[ 2] 12/14 11:53:30:866 Exit: COMServer.Server.DeleteObject

// LogC
[ 1] 12/14 11:56:03:359 Entry: ctor
[...]
[20] 12/14 15:30:20:110 Exit: Kernel32.Buffer.Release

From the description of the problem, we expected LogB and LogC to
be logs from two subsequent process executions where the first launch
fails (LogB) and the second launch succeeds (LogC). Looking at their start
and end times, we see that they make sense from the problem description
perspective, but we have to dismiss the ETW trace and most of LogA as
recorded earlier and having no value for Inter-Correlation (page 110)
analysis of the more recent logs. We also see that portions of LogA overlap
with LogB and LogC and, for this reason, have analysis value for us.
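
The same reasoning can be expressed as an interval overlap test. In the Python sketch below, the start and end times are taken from the fragments above, rounded down to whole seconds; we assume that LogA dates are in day/month/year order and that LogB and LogC (which carry no year) were recorded in 2010. The part names for LogA are ours.

```python
from datetime import datetime

def overlaps(a, b):
    """True if two (start, end) time spans share at least one moment."""
    return a[0] <= b[1] and b[0] <= a[1]

spans = {
    "ETW": (datetime(2010, 12, 14, 6, 48, 56), datetime(2010, 12, 14, 7, 22, 57)),
    "LogA part 1": (datetime(2010, 11, 5, 18, 28, 15), datetime(2010, 12, 14, 10, 31, 58)),
    "LogA part 2": (datetime(2010, 12, 14, 10, 34, 38), datetime(2010, 12, 14, 11, 53, 35)),
    "LogA part 3": (datetime(2010, 12, 14, 11, 56, 11), datetime(2010, 12, 14, 15, 25, 23)),
    "LogB": (datetime(2010, 12, 14, 10, 34, 29), datetime(2010, 12, 14, 11, 53, 30)),
    "LogC": (datetime(2010, 12, 14, 11, 56, 3), datetime(2010, 12, 14, 15, 30, 20)),
}
recent = ("LogB", "LogC")

# For every other artifact, list the recent logs it overlaps with
news_value = {
    name: [r for r in recent if overlaps(span, spans[r])]
    for name, span in spans.items()
    if name not in recent
}
```

Artifacts with an empty list were recorded outside the time of the recent logs and can be set aside first.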

No Activity

No Activity is the limit of Discontinuity pattern (page 71). The absence of
activity can be seen at a thread level or at a process level where it is similar
to Missing Component pattern (page 134). The difference from the latter
pattern is that we know for certain that we selected our process modules
for tracing but do not see any trace messages. Consider this example:

# Source PID TID Time Function Message
1 TraceSettings 1480 8692 08:04:20.682 **** Start Trace Session
[... TraceSettings messages 2-11 show that we selected AppA for tracing ...]
12 ModuleB 3124 4816 08:04:37.049 WorkerThread Worker thread running
13 TraceSettings 1480 8692 08:04:41.966 **** Trace Session was stopped

Only modules from AppA process and modules from a coupled
process (for example, ModuleB) were selected. However, we only see a
reminder message from the coupled process (3124.4816:
ModuleB!WorkerThread) and no messages for 21 seconds. Fortunately,
AppA process memory dump was saved during the tracing session:

Debug session time: Fri May 21 08:04:31.000 2010 (GMT+0)

We see two threads waiting for a critical section:

0:000> ~*kL

14 Id: 640.8b8 Suspend: 1 Teb: 7ffa7000 Unfrozen


ChildEBP RetAddr
0248f8c0 7c827d29 ntdll!KiFastSystemCallRet
0248f8c4 7c83d266 ntdll!ZwWaitForSingleObject+0xc
0248f900 7c83d2b1 ntdll!RtlpWaitOnCriticalSection+0x1a3
0248f920 0040dea8 ntdll!RtlEnterCriticalSection+0xa8
[...]
0248f9a4 77ce78aa rpcrt4!Invoke+0x30
0248f9c0 77ce7a94 rpcrt4!NdrCallServerManager+0x17
0248fcb8 77ce7b7c rpcrt4!NdrStubCall+0x1d6
0248fcd0 77c7ff7a rpcrt4!NdrServerCall+0x15
0248fd04 77c8042d rpcrt4!DispatchToStubInCNoAvrf+0x38
0248fd58 77c80353 rpcrt4!RPC_INTERFACE::DispatchToStubWorker+0x11f
0248fd7c 77c7e0d4 rpcrt4!RPC_INTERFACE::DispatchToStub+0xa3
0248fdbc 77c7e080 rpcrt4!RPC_INTERFACE::DispatchToStubWithObject+0xc0
0248fdfc 77c812f0 rpcrt4!LRPC_SCALL::DealWithRequestMessage+0x41e
0248fe20 77c88678 rpcrt4!LRPC_ADDRESS::DealWithLRPCRequest+0x127
0248ff84 77c88792 rpcrt4!LRPC_ADDRESS::ReceiveLotsaCalls+0x430
0248ff8c 77c8872d rpcrt4!RecvLotsaCallsWrapper+0xd
0248ffac 77c7b110 rpcrt4!BaseCachedThreadRoutine+0x9d

15 Id: 640.18c0 Suspend: 1 Teb: 7ffdb000 Unfrozen


ChildEBP RetAddr
01b8ff40 7c827d29 ntdll!KiFastSystemCallRet
01b8ff44 7c83d266 ntdll!ZwWaitForSingleObject+0xc
01b8ff80 7c83d2b1 ntdll!RtlpWaitOnCriticalSection+0x1a3
01b8ffa0 0040dba7 ntdll!RtlEnterCriticalSection+0xa8
[...]
01b8ffec 00000000 kernel32!BaseThreadStart+0x34

Unfortunately, the critical section is owned by a missing thread, so the
blocked threads wait forever:

0:000> !cs -l -o -s
-----------------------------------------
DebugInfo = 0x01facdd0
Critical section = 0x01da19c0 (+0x1DA19C0)
LOCKED
LockCount = 0x2
WaiterWoken = No
OwningThread = 0x00001384
RecursionCount = 0x1
LockSemaphore = 0x578
SpinCount = 0x00000000
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled

0:000> ~~[1384]
^ Illegal thread error in '~~[1384]'

Apparently, the AppA process was hanging, which explains why we do
not see any activity in the trace.
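
The check that leads to this pattern can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the trace is
assumed to be already parsed into (source, PID, TID, time) tuples, the
elided TraceSettings messages 2-11 are left out, the AppA PID is taken from
the dump thread IDs above (640 hexadecimal), and the helper names
silent_pids and session_seconds are hypothetical.

```python
from datetime import datetime

# Simplified view of the trace above: (source, pid, tid, time).
messages = [
    ("TraceSettings", 1480, 8692, "08:04:20.682"),
    ("ModuleB", 3124, 4816, "08:04:37.049"),
    ("TraceSettings", 1480, 8692, "08:04:41.966"),
]

def silent_pids(messages, selected_pids):
    """Return the selected PIDs that produced no trace messages at all."""
    seen = {pid for _, pid, _, _ in messages}
    return sorted(set(selected_pids) - seen)

def session_seconds(messages, fmt="%H:%M:%S.%f"):
    """Return the time span covered by the trace in seconds."""
    times = [datetime.strptime(time, fmt) for _, _, _, time in messages]
    return (max(times) - min(times)).total_seconds()

# AppA (PID 0x640) and the coupled process (PID 3124) were selected.
no_activity = silent_pids(messages, {0x640, 3124})
duration = session_seconds(messages)
```

Here no_activity contains only the AppA PID, although the session lasted
about 21 seconds, which is the signal to look for a memory dump.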

No Trace Metafile

This pattern is similar to the No Component Symbols68 memory analysis
pattern:

# Module PID TID Time Message


21372 \src\dllA 2968 5476 3:55:10.004 Calling foo()
21373 Unknown 2968 5476 3:55:10.004 ????? Unknown( 27): GUID=1EF56EBD-A7FC-
4892-8DBA-00AD813F8A24 (No Format Information found).
21374 Unknown 2968 5476 3:55:10.004 ????? Unknown( 27): GUID=1EF56EBD-A7FC-
4892-8DBA-00AD813F8A24 (No Format Information found).
21375 Unknown 2968 5476 3:55:10.004 ????? Unknown( 27): GUID=1EF56EBD-A7FC-
4892-8DBA-00AD813F8A24 (No Format Information found).
21376 Unknown 2968 5476 3:55:10.004 ????? Unknown( 28): GUID=1EF56EBD-A7FC-
4892-8DBA-00AD813F8A24 (No Format Information found).
21377 Unknown 2968 5476 3:55:10.004 ????? Unknown( 23): GUID=1EF56EBD-A7FC-
4892-8DBA-00AD813F8A24 (No Format Information found).
21378 \src\dllA 2968 5476 3:55:10.004 Calling bar()

In some cases, when we do not have TMF (Trace Message Format) files, it
is still possible to detect broad behavioral patterns such as:

• Circular Trace (page 52)
• Statement Density and Current (page 178)
• Discontinuity (page 71)
• Time Delta (page 184)
• Trace Acceleration (page 188)

By looking at Thread of Activity (page 181), we can sometimes also
infer the possible component name based on surrounding trace messages
with present TMF files, especially when we have source code access. For
example, in the trace shown above, it can be dllA or any other module that
the foo function calls.
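
The neighbor-based inference can be sketched as follows. This is a minimal
illustration, assuming the trace is already parsed into (number, module,
TID) tuples; the function name neighbor_modules is hypothetical, and the
result is only a list of candidates, not a definitive attribution.

```python
# Simplified view of the trace above: (number, module, tid).
trace = [
    (21372, r"\src\dllA", 5476),
    (21373, "Unknown", 5476),
    (21374, "Unknown", 5476),
    (21375, "Unknown", 5476),
    (21376, "Unknown", 5476),
    (21377, "Unknown", 5476),
    (21378, r"\src\dllA", 5476),
]

def neighbor_modules(trace):
    """For every Unknown message, find the nearest known modules
    before and after it on the same thread."""
    result = {}
    for i, (number, module, tid) in enumerate(trace):
        if module != "Unknown":
            continue
        before = next((m for _, m, t in reversed(trace[:i])
                       if t == tid and m != "Unknown"), None)
        after = next((m for _, m, t in trace[i + 1:]
                      if t == tid and m != "Unknown"), None)
        result[number] = (before, after)
    return result

candidates = neighbor_modules(trace)
```

For the fragment above, every unknown message is bracketed by dllA on the
same thread, which agrees with the inference made in the text.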

Opposition Messages

We borrowed this pattern name from binary opposition69, which originated
in Saussure’s structuralism70. It covers pairs of messages usually found
in software traces and logs, such as:

• open / close
• create / destroy
• allocate / free (deallocate)
• call / return
• enter / exit (leave)
• load / unload
• save / load
• lock / unlock
• map / unmap

The absence of an opposite may point to problems such as
synchronization issues and leaks, or to Incomplete History (page 102; for
example, wait chains). It is always possible that the second term is
missing due to Sparse Trace (page 173), but this is a poor implementation
choice that leads to confusion during troubleshooting and debugging.
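
A search for missing opposites can be sketched as below. This is a minimal
illustration, assuming messages are already reduced to (verb, resource)
pairs; the OPPOSITES table is a deliberately small subset of the list
above (the ambiguous "load" verb is left out), and the function name
unmatched is hypothetical.

```python
# A subset of the opposition pairs listed above: opener -> closer.
OPPOSITES = {"open": "close", "lock": "unlock", "allocate": "free"}

def unmatched(events):
    """Return (openers without a closer, closers without an opener).

    events is an iterable of (verb, resource) pairs in trace order.
    """
    closers = {closer: opener for opener, closer in OPPOSITES.items()}
    pending = {}   # (opener, resource) -> number of unanswered openers
    orphans = []   # closers that were never preceded by an opener
    for verb, resource in events:
        if verb in OPPOSITES:
            key = (verb, resource)
            pending[key] = pending.get(key, 0) + 1
        elif verb in closers:
            key = (closers[verb], resource)
            if pending.get(key, 0) > 0:
                pending[key] -= 1
            else:
                orphans.append((verb, resource))
    leftovers = [key for key, count in pending.items() for _ in range(count)]
    return leftovers, orphans

events = [("open", "fileA"), ("lock", "cs1"),
          ("close", "fileA"), ("unlock", "cs2")]
leftovers, orphans = unmatched(events)
```

In this hypothetical input, the lock on cs1 is never released (a possible
synchronization problem), and the unlock of cs2 has no visible lock (a
possible Incomplete History or Sparse Trace).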

Original Message

This pattern deals with software trace messages where a certain activity is
repeated several times, but only the first message occurrence, or a
specific message vocabulary, has significance for analysis. A typical
example from CDF / ETW tracing is module load events:

# Module PID TID Time Message


[...]
35835 ModuleA 11000 11640 17:27:28.720 LoadImageEvent:
\Device\HarddiskVolume2\Windows\System32\userinit.exe PId 5208
[...]
37684 ModuleA 12332 9576 17:27:29.063 LoadImageEvent:
\Windows\System32\userinit.exe PId 573C
[...]
37687 ModuleA 12332 9576 17:27:29.064 LoadImageEvent:
\Windows\System32\userinit.exe PId 573C
[...]

What we are looking at here is a Message Invariant (page 128) like
“.exe”, but we are interested in the occurrence of specific path structures
like \Device\HarddiskVolume because, in our troubleshooting context, they
signify the process launch sequence during terminal session initialization.
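
Selecting such messages can be sketched with a regular expression. This is
only an illustration: the trace is assumed to be parsed into (number,
message) pairs, and the names DEVICE_PATH and original_messages are
hypothetical.

```python
import re

# Path structure of interest: \Device\HarddiskVolume<N>\...
DEVICE_PATH = re.compile(r"\\Device\\HarddiskVolume\d+\\")

# Messages from the trace fragment above: (number, message).
trace = [
    (35835, r"LoadImageEvent: \Device\HarddiskVolume2\Windows\System32\userinit.exe PId 5208"),
    (37684, r"LoadImageEvent: \Windows\System32\userinit.exe PId 573C"),
    (37687, r"LoadImageEvent: \Windows\System32\userinit.exe PId 573C"),
]

def original_messages(trace, image):
    """Return numbers of load events for image that use the device path form."""
    return [number for number, message in trace
            if image in message and DEVICE_PATH.search(message)]

originals = original_messages(trace, "userinit.exe")
```

Only message 35835 is selected; the later repeats share the “.exe”
invariant but lack the device path structure.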

Palimpsest Messages

Palimpsest Messages are messages where some part or all of their content
was erased or overwritten.

The name of this pattern comes from palimpsest71 manuscript
scrolls. Such messages may be a part of malnarratives72 or result from
Circular Tracing (page 52) or trace buffer corruption. Sometimes, not all
relevant data is erased, and by using Intra-Correlation (page 112) and
Inter-Correlation (page 105), and via the analysis of Message Invariants
(page 128), it is possible to recover the original data. Also, as in the
Recovered Messages (page 157) pattern, it may be possible to use Message
Context (page 125) to infer some partial content.

Periodic Error

Periodic Error is an obvious and, to some extent, trivial pattern. It is an
error or status value that is observed periodically many times:

No PID TID Date Time Message


[...]
664957 1788 22504 4/23/2009 17:59:14.600 MyClass::Initialize: Cannot open connection
"Client ID: 310", status=5
[...]
668834 1788 19868 4/23/2009 19:11:52.979 MyClass::Initialize: Cannot open connection
"Client ID: 612", status=5
[...]

or

No PID TID Date Time Message


[...]
202314 1788 19128 4/21/2009 16:03:46.861 HandleDataLevel: Error 12005 Getting Mask
[...]
347653 1788 17812 4/22/2009 13:26:00.735 HandleDataLevel: Error 12005 Getting Mask
[...]

Here single trace entries can be isolated from the trace and studied
in detail. We should be aware, though, that some modules might report
Periodic Errors that are false positives in the sense that they are
expected as a part of implementation details, for example, when a function
returns an error to indicate that a bigger buffer is required or to
estimate its size for a subsequent call.
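
Finding repeated errors whose variable parts differ can be sketched as
follows. This is a minimal illustration, assuming only the message text is
kept and quotes are plain ASCII; the normalization rule (masking quoted
substrings) and the names normalize and periodic_errors are assumptions
made for this example.

```python
import re
from collections import Counter

# Message texts from the two trace fragments above.
messages = [
    'MyClass::Initialize: Cannot open connection "Client ID: 310", status=5',
    'MyClass::Initialize: Cannot open connection "Client ID: 612", status=5',
    'HandleDataLevel: Error 12005 Getting Mask',
    'HandleDataLevel: Error 12005 Getting Mask',
]

def normalize(message):
    """Mask quoted, variable parts so that repeated errors compare equal."""
    return re.sub(r'"[^"]*"', '"<*>"', message)

def periodic_errors(messages, min_count=2):
    """Return normalized messages seen at least min_count times."""
    counts = Counter(normalize(m) for m in messages)
    return {msg: n for msg, n in counts.items() if n >= min_count}

repeated = periodic_errors(messages)
```

Whether a repeated message is a real problem or an expected false positive
still has to be decided from the implementation context, as noted above.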

Periodic Message Block

This pattern is similar to Periodic Error (page 147) but is not limited to
errors or failure reports. One such example we recently encountered is
when some adjoint activity (such as messages from a specific PID, Adjoint
Thread, page 30) stops appearing after the middle of the trace, and after
that there are repeated blocks of similar messages (Message Invariant,
page 128) from different PIDs with their threads checking some condition
(for example, waiting for an event and reporting timeouts):

Piecewise Activity

Activity Regions (page 23) or blocks of messages having the same TID or
PID usually follow each other in a typical complex software trace. Such
succession can be completely random and independent, or it may be linear,
based on IPC or some inter-thread communication mechanism. For
example, after filtering out Background Components (page 39) we may
find that an RPC client call setup is followed by messages from an RPC
server:

Using a coordinate approach with message number and PID axes, we can
reformat this minimal trace diagram:

We borrowed the name for this pattern from the concept of a
piecewise linear function73 in mathematics (and piecewise continuity). In
some problem scenarios where we encountered this analysis pattern, it
was complemented by the Discontinuity (page 71) pattern. For example, an
RPC call may be blocked, and we do not see client messages after that
break until the end of the trace. In such cases, we always recommend
forcing a complete memory dump to check for wait chain memory analysis
patterns74.

Pivot Message

Suppose we form an Adjoint Thread (page 30) based on some message or
operation type or some other attribute:

However, we do not know where to start looking backward for any
anomalies relevant to our problem:

We go back to our full trace and find a problem message:



Although it is not in the Adjoint Thread we formed previously, we
still consider it a Pivot Message that helps us to go backward there:

Punctuated Activity

Sometimes we have a uniform stream of messages that belong to some
Activity Region (page 23), Thread of Activity (page 181), or Adjoint Thread
of Activity (page 30). We can use micro-Discontinuities (page 71) to
structure that message stream, especially if the semantics of trace
messages is not yet fully clear to us. This may also help us to recognize
Visitor Trace (page 207). Originally we wanted to call this pattern Micro
Delays, but, after recognizing that such delays only make sense for one
activity (since there can be too many of them in the overall log), we named
this pattern Punctuated Activity. Usually, such delays are small compared
to Timeouts (page 185) and belong to Silent Messages (page 168).

Quotient Trace

In the Adjoint Message (page 25) analysis pattern description, we
mentioned compressing message sequences having the same message
attribute into one message. Considering the trace as a “topological” space
and the message attribute as an “equivalence” relation, we introduce the
Quotient Trace analysis pattern by analogy with quotient space75 in
topology. By endowing message sequences having the same attribute with
some “metric” such as the cardinality of Message Set (page 130), we can
also visually distinguish the resulting quotient messages if they have the
same attribute but come from different sequences at different times. All
this is illustrated in the following diagram:
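
The compression step can be sketched with the standard itertools.groupby
function, which groups only adjacent items and so keeps sequences at
different times apart. This is a minimal illustration; the (number, PID)
message tuples and the function name quotient_trace are assumptions.

```python
from itertools import groupby

def quotient_trace(trace, attribute):
    """Compress each run of adjacent messages with the same attribute
    into one (attribute value, cardinality) quotient message."""
    return [(key, len(list(group)))
            for key, group in groupby(trace, key=attribute)]

# Hypothetical trace: (message number, PID).
trace = [(1, 100), (2, 100), (3, 100), (4, 200), (5, 100), (6, 100)]
quotient = quotient_trace(trace, attribute=lambda message: message[1])
```

The two runs from PID 100 stay separate quotient messages and can be told
apart by their cardinalities (3 and 2).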

Recovered Messages

If we analyze ETW-based traces such as CDF, we may frequently encounter
the No Trace Metafile (page 142) pattern, especially after product updates
and fixes. This complicates pattern analysis because we may not be able to
see Significant Events (page 167), Anchor Messages (page 35), and Error
Messages (page 75). In some cases, we can recover messages by
comparing Message Context (page 125) for unknown messages. If we have
source code access, this may also help. Both approaches are illustrated in
the following diagram:

The same approach may also be applied to a different kind of trace
artifacts when some messages are corrupt. In such cases, it is possible to
recover diagnostic evidence and, therefore, we call this pattern Recovered
Messages.

Relative Density

This pattern describes anomalies in semantically related pairs of trace
messages, for example, “data arrival” and “data display”. Their Statement
Densities (page 178) can be put in a ratio (also called specific gravity76) and
compared between working and non-working scenarios. Because the total
numbers of trace messages cancel each other, we have just the mutual
ratio of two message types. In our hypothetical “data” example, the
increased ratio of “data arrival” to “data display” messages accounts for
reported visual data loss and sluggish GUI.

Resume Activity

Whereas Break-in Activity (page 47) is usually unrelated to the thread or
adjoint thread that has a discontinuity, the Resume Activity pattern
highlights messages from that thread:

We can see the difference in the following graphical representation
of the two traces, where in a working trace a break-in preceded resume
activity, but in a non-working trace both patterns were absent:

Ruptured Trace

Recently we analyzed a few logs which ended with a specialized Activity
Region (page 23) from a subsystem that sets operational parameters. The
problem description stated that the system became unresponsive after
changing parameters in a certain sequence. Usually, for that system, when
we stop logging (even after setting parameters) we end up with messages
from some Background Components (page 39), since some time passes
between the end of the parameter setting activity and the time the
operator sends the stop logging request:

However, in the problem case, we see that the message flow stops right
in the middle of the parameter setting activity:

So we advised checking for any crashes or hangs, and, indeed, it was
found that the system was actually experiencing system crashes, and we
got memory dumps for analysis where we found Top Module77 from a
3rd-party vendor related to the parameter setting activity.

Please also note an analogy here between normal thread stack
traces from threads that are waiting most of the time and a Spiking
Thread78 stack trace caught in the middle of some function.

We call this pattern Ruptured Trace after a ruptured computation79.

Note that if it is possible to restart the system and resume the
same tracing, we may get an instance of the Blackout (page 45) analysis
pattern.

Sequence Repeat Anomaly

Sometimes we have Periodic Message Blocks (page 148) of a few adjacent
messages, for example, when flags are translated into separate messages
per bit. Then we may have a pattern of Sequence Repeat Anomaly when
one of several message blocks has missing or added messages compared
to the larger number of expected identical message blocks. Then the
Missing Message (page 135) and its Message Context (page 125) may be
explored further. The following diagram illustrates the pattern:

The name of the pattern comes from the notion of repeated DNA
sequences80.
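
Detecting the odd block can be sketched as a comparison against the most
frequent block. This is a minimal illustration, assuming the trace has
already been split into blocks of message texts; the flag names and the
function name anomalous_blocks are hypothetical.

```python
from collections import Counter

def anomalous_blocks(blocks):
    """Return the most common block and, for every block that differs
    from it, a (block index, missing messages, added messages) tuple."""
    expected, _ = Counter(tuple(block) for block in blocks).most_common(1)[0]
    report = []
    for index, block in enumerate(blocks):
        if tuple(block) != expected:
            missing = [m for m in expected if m not in block]
            added = [m for m in block if m not in expected]
            report.append((index, missing, added))
    return expected, report

# Hypothetical flags translated into one message per bit.
blocks = [
    ["FLAG_A set", "FLAG_B set", "FLAG_C set"],
    ["FLAG_A set", "FLAG_B set", "FLAG_C set"],
    ["FLAG_A set", "FLAG_C set"],
    ["FLAG_A set", "FLAG_B set", "FLAG_C set"],
]
expected, report = anomalous_blocks(blocks)
```

The third block is reported with "FLAG_B set" as the missing message, whose
context can then be explored further.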

Shared Point

Sometimes we know from Basic Facts (page 42) some data or activity we
seek to identify in different traces collected together to perform
Inter-Correlational analysis (page 105). It can be a shared file name, a
named synchronization object, a locked file with sharing violations, a
common virtual address in kernel space, or just some activity notification.
We named this pattern by analogy with intersecting curves in some
abstract space.

It is similar to the Linked Messages (page 120) pattern but is more
high-level and not confined to a common parameter (it can be an action
description).

Sheaf of Activities

Inter-Correlation (page 105) analysis between normal and problem
logs to find a Bifurcation Point (page 43, and a possible root cause)
becomes a difficult task when both traces come from separate
environments with different Background Components (page 39). Here a
new analysis pattern with a name borrowed from sheaves81 in mathematics
can help. This pattern is also a tool for tracking properties of trace
message subsets. First, we find out important message types around some
Activity Region (page 23) where we hope to find a difference between two
traces:

Then we create several Adjoint Threads (page 30) from different
message types, for example, based on the operation type or function
name:

Then we analyze subtraces separately to find a bifurcation point
in each of them, and then use this knowledge to find
differences between the original full traces.

Significant Event

When looking at software traces and either searching or just
scrolling, certain messages grab attention immediately. We call them
Significant Events. It could be a recorded exception (Exception Stack
Trace, page 81) or an error, Basic Fact (page 42), a trace message from
Vocabulary Index (page 208), or just any trace statement that marks the
start of some activity we want to explore in depth, for example, a certain
DLL is attached to the process, a coupled process is started, or a function is
called. The start of a trace and the end of it are trivial Significant Events
and are used in deciding whether the trace is Circular (page 52), in
determining the trace recording interval or its average Statement Current
(page 178).

Silent Messages

We mostly analyze real messages in software traces and logs. In such
message streams, we may easily see detectable Discontinuity (page 71)
patterns. However, in some cases, it is beneficial to analyze the absence of
messages. A message stream is not uniform; there may be different
Statement Currents (page 178). If the time resolution is 1 ms, for example,
then we may have a current of N msg/ms, or, in the case of a lesser current
such as 0.5 msg/ms, we have the so-called Silent Messages (----):

[...]
11 ms: message
12 ms: ----
13 ms: message
14 ms: ----
15 ms: message
16 ms: message
17 ms: ----
18 ms: ----
19 ms: message
[...]

So, by a silent message we understand a possible message that
would occupy the minimal time resolution gap. If we look at the following
illustration, we see that the whole pattern analysis apparatus can be
applied to the analysis of the distribution of silent messages.

This pattern is different from the Discontinuity pattern because the
latter is about large unexpected silences, and it is different from Sparse
Trace (page 173), which is about trace statements missing from source
code.
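
Enumerating silent messages can be sketched as listing the empty slots at
the minimal time resolution. This is a minimal illustration, assuming
message times are already rounded to whole milliseconds; the function name
silent_slots is hypothetical.

```python
def silent_slots(message_times_ms, start, end):
    """Return the millisecond slots in [start, end] without any message."""
    present = set(message_times_ms)
    return [t for t in range(start, end + 1) if t not in present]

# Message times from the fragment above (11 ms to 19 ms).
message_times = [11, 13, 15, 16, 19]
silent = silent_slots(message_times, 11, 19)

# Statement Current over the same interval, in msg/ms.
current = len(message_times) / (19 - 11 + 1)
```

The resulting list of silent slots (12, 14, 17, and 18 ms) can then be
analyzed like any other message stream.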

Singleton Event

There are events that by design or system configuration should be seen in
a log only once or not seen at all if the code responsible for them was
executed before the tracing session. For example, the launch of certain
services during
system initialization should not be seen again when we trace system
activity long after that. It can also be just messages from singleton82
objects in the application log. The appearance of extra Singleton Events
may point to design violations or some abnormal system events such as
process restart. The latter may Intra-Correlate (page 112) with the start of
the fault handling process such as WerFault.exe in Windows Process
Monitor logs (Guest Component, page 95).

Small DA+TA

Recently we performed the diagnostic analysis of a software incident
where certain functionality was not available to users and provided the
report based on analysis patterns such as Focus of Tracing (page 90) and
Opposition Messages (page 143). We also conjectured some hypotheses
explaining the observed abnormal behavior. However, in the end, the
problem was solved not by the analysis of a lengthy software execution log
but by looking at a small configuration INI file where the non-working
functionality was simply disabled in one line:

EnableFunctionality = 0

Even before that analysis, we were thinking about the importance
of Small DA+TA such as configuration files and registry details that can be
considered as general software traces83. Here DA+TA means Dump Artifact
+ Trace Artifact, and Big DA+TA refers to software execution memory dump
artifacts and trace artifacts that can be really huge. The analysis pattern is
illustrated in the following diagram where we see no difference between
working and non-working scenarios due to insufficient trace coverage
(Sparse Trace, page 173):

Sparse Trace

Sometimes we do not see anything in the trace or see very little because
trace statements did not cover a particular source code fragment (see also
PLOTs84):

The Sparse Trace pattern is different from the Missing Component (page
134) pattern, where some modules are not explicitly included for logging
although there is logging code there, and from the Visibility Limit (page
206) pattern, where tracing is intrinsically impossible. Often technical
support and escalation engineers request more trace statements, and
software engineers extend tracing coverage iteratively as needed.

Split Trace

Some tracing tools such as CDFControl85 have the option to split software
traces and logs into several files during long recording. Although this
should be done judiciously, it is sometimes necessary. What should we do
if we get several trace files and want to use some other analysis tool such
as CDFAnalyzer86? If we know that the problem happened just before the
tracing was stopped, we can look at the last few files from the file
sequence (although we recommend Circular Trace here, page 52).
Otherwise, we can convert them into CSV files and import them into Excel,
which also supports adjoint threading87.

State and Event

For event- or message-driven architectures, it is important to
differentiate between event and state messages (including state
transition). For example, a system may be doing some work while being in
some particular state with much tracing and respond to various external
events, with each of them having a corresponding trace message. Upon
such an event, the system transitions to some other state with its own set
of possible trace messages. We call such a pattern State and Event. A
typical example here is a windowing terminal services system and the
WM_ENDSESSION event illustrated in the following abstract trace diagram
with a corresponding state transition diagram below it:

State Dump

Introduced in the Debugging TV88 Frames episode 0x32 about Android / Java
debugging, this pattern solves the problem of program state analysis when
memory dump generation is not available, does not help, or is
complicated, as in the case of interpreted code. A developer identifies a
set of state variables and periodically prints their values to the output
logging stream. Such output may include, but is not limited to, Counter
Values (page 54).

Statement Density and Current

Sometimes we have several disjoint Periodic Errors (page 147) and
possible False Positives (page 85). We wonder where we should start or
how to assign relative priorities for troubleshooting suggestions. Here the
Statement Density and Current pattern can help. The statement (message)
density is simply the ratio of the number of occurrences of a specific trace
statement (message) in the trace to the total number of all recorded
messages.

Consider this software trace with two frequent messages:

N PID TID
21 5928 8092 LookupAccountSid failed. Result = -2146238462
[...]
1013 5928 1340 SQL execution needs a retry. Result = 0

We have approx. 7,500 statements for the former and approx.
1,250 statements for the latter. The total number of trace statements is
185,700, so we have the corresponding approx. statement densities: 0.04
and 0.0067. Their relative ratio 7,500 / 1,250 is 6.

We collected another trace of the same problem at a different
time with the same errors. It has 71,100 statements and only 160 and 27
statements counted for the messages above. The ratio 160 / 27 is approx.
the same, 5.93, which suggests that the messages are correlated. However,
the statement density is much lower, approx. 0.002 and 0.00038, and this
suggests taking a closer look at the second trace to see whether these
problems started some time after the start of the recording.

We can also check Statement Current as the number of statements
(messages) per unit of time. We recorded the first trace over a period of
195 seconds and the second over a period of 650 seconds. For this
reason, we have 952 msg/s and 109 msg/s respectively. It suggests that the
problem might have started at some time during the second trace, or that
more modules were selected for the first trace. To make sure, we adjust
the total number of messages for these two traces. We find the first
occurrence of the error and subtract its message number from the total
number of messages. For our first trace, we see that the messages start
from the very beginning, and in our second trace they also start almost
from the beginning. So such an adjustment should not give much better
results here. Also, these statements continue to be recorded till the very
end of these traces.

To avoid being lost in this discussion, I repeat the main results:

           Density            Relative Density   Current, all msg/s
Trace 1    0.04 / 0.0067      6                  952
Trace 2    0.002 / 0.00038    5.93               109

The possibility that much more was traced that resulted in lower
density for the second trace should be discarded because we have much
lower current. Perhaps the environment was not quite the same for the
second tracing. However, the same relative density for two different errors
suggests that they are correlated, and the higher density of the first error
suggests that we should start our investigation from it.
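
The arithmetic behind these results can be reproduced with a short sketch.
The counts and durations are the approximate figures quoted above; the
function name summarize and the dictionary keys are assumptions made for
this illustration.

```python
def summarize(first, second, total, seconds):
    """Compute statement densities, their relative ratio, and the
    statement current for one trace."""
    return {
        "density_first": first / total,      # first error / all messages
        "density_second": second / total,    # second error / all messages
        "relative_density": first / second,  # totals cancel out
        "current": total / seconds,          # all messages per second
    }

trace1 = summarize(first=7500, second=1250, total=185700, seconds=195)
trace2 = summarize(first=160, second=27, total=71100, seconds=650)

for name, result in (("Trace 1", trace1), ("Trace 2", trace2)):
    print(name, {key: round(value, 5) for key, value in result.items()})
```

The relative density stays close to 6 in both traces even though the
densities and currents differ by an order of magnitude, which is the basis
for treating the two errors as correlated.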

The reason we came up with this statistical trace analysis pattern is
that two different engineers analyzed the same trace, and both
suggested different troubleshooting paths based on selected Error
Messages (page 75) from software traces. So we did a statistical analysis to
prioritize their suggestions.

Surveyor

Sometimes, the presence of some messages in a trace or log shows that
some other tracing or logging tool was running or that some process was
also doing tracing. We call this analysis pattern Surveyor. Such discovered
tracing may not be related to the trace we are looking at (compare to
Trace Extension, page 191) but may help with finding additional traces in
the system as illustrated in the following diagram:

Thread of Activity

When we have software traces that record process identifiers (PID) and
thread identifiers (TID) it is important to differentiate between trace
statements sorted by time and Thread of Activity. The latter is simply the
flow of trace messages sorted by TID, and it is very helpful in cases with
dense traces coming from hundreds of processes and components. Here is
an example from a MessageHistory89 bulk trace fragment showing different
Threads of Activity in different font styles:

Start time: 21:5:36:651


Format time: 21:5:43:133
Number of messages sent: 24736
Number of messages posted: 905

[...]
21:5:41:990 S PID: a7c TID: 554 HWND: 0×0000000000010E62 Class:
“ToolbarWindow32″ Title: “” WM_USER+4b (0×44b) wParam: 0×14 lParam: 0×749e300
21:5:41:990 S PID: a7c TID: 554 HWND: 0×00010E4A Class: “CtrlNotifySink”
Title: “” WM_NOTIFY (0×4e) wParam: 0×0 lParam: 0×749efa8
21:5:41:990 S PID: a7c TID: 554 HWND: 0×00010E62 Class: “ToolbarWindow32″
Title: “” WM_USER+3f (0×43f) wParam: 0×14 lParam: 0×749e1e0
21:5:41:990 S PID: a7c TID: 554 HWND: 0×00010E62 Class: “ToolbarWindow32″
Title: “” WM_USER+4b (0×44b) wParam: 0×14 lParam: 0×749e300
21:5:41:990 S PID: a7c TID: 554 HWND: 0×00010E62 Class: “ToolbarWindow32″
Title: “” WM_USER+19 (0×419) wParam: 0×14 lParam: 0×0
21:5:41:990 S PID: a7c TID: 554 HWND: 0×00010E62 Class: “ToolbarWindow32″
Title: “” WM_USER+61 (0×461) wParam: 0×6 lParam: 0×0
21:5:41:990 S PID: a7c TID: 554 HWND: 0×00010E62 Class: “ToolbarWindow32″
Title: “” WM_USER+56 (0×456) wParam: 0×0 lParam: 0×0
21:5:41:990 S PID: a7c TID: 554 HWND: 0×00010E4A Class: “CtrlNotifySink”
Title: “” WM_NOTIFY (0×4e) wParam: 0×0 lParam: 0×749f290
21:5:41:990 S PID: a7c TID: 554 HWND: 0×000E04A8 Class: “CtrlNotifySink”
Title: “” WM_NCPAINT (0×85) wParam: 0xffffffffcc043bdb lParam: 0×0
21:5:41:990 P PID: a7c TID: 554 HWND: 0×000E04A8 Class: “CtrlNotifySink”
Title: “” WM_PAINT (0xf) wParam: 0×0 lParam: 0×0
21:5:42:007 S PID: 1a8 TID: 660 HWND: 0×0001003C Class: “CiceroUIWndFrame”
Title: “TF_FloatingLangBar_WndTitle” WM_WINDOWPOSCHANGING (0×46) wParam: 0×0
lParam: 0×29af030
21:5:42:007 P PID: a7c TID: 9b4 HWND: 0×00010084 Class: “CiceroUIWndFrame”
Title: “TF_FloatingLangBar_WndTitle” WM_TIMER (0×113) wParam: 0×6 lParam: 0×0
21:5:42:007 P PID: 1a8 TID: 660 HWND: 0×0001003C Class: “CiceroUIWndFrame”
Title: “TF_FloatingLangBar_WndTitle” WM_TIMER (0×113) wParam: 0×8 lParam: 0×0
21:5:42:007 P PID: a7c TID: 9b4 HWND: 0×00010084 Class: “CiceroUIWndFrame”
Title: “TF_FloatingLangBar_WndTitle” WM_TIMER (0×113) wParam: 0×9 lParam: 0×0
21:5:42:022 P PID: a7c TID: a28 HWND: 0×0001061A Class: “WPDShServiceObject”
182 Thread of Activity

Title: “WPDShServiceObject_WND” WM_TIMER (0×113) wParam: 0xd lParam: 0×0


21:5:42:022 P PID: a7c TID: 9b4 HWND: 0×00010084 Class: “CiceroUIWndFrame”
Title: “TF_FloatingLangBar_WndTitle” WM_TIMER (0×113) wParam: 0×8 lParam: 0×0
21:5:42:022 P PID: a7c TID: 9b4 HWND: 0×00010084 Class: “CiceroUIWndFrame”
Title: “TF_FloatingLangBar_WndTitle” WM_PAINT (0xf) wParam: 0×0 lParam: 0×0
21:5:42:036 P PID: 1a8 TID: 660 HWND: 0×0001003C Class: “CiceroUIWndFrame”
Title: “TF_FloatingLangBar_WndTitle” WM_TIMER (0×113) wParam: 0×5 lParam: 0×0
21:5:42:054 S PID: a7c TID: 9b4 HWND: 0×0001006C Class: “ReBarWindow32″ Title:
“” WM_USER+10 (0×410) wParam: 0×2 lParam: 0×0
21:5:42:054 S PID: a7c TID: 9b4 HWND: 0×0001006C Class: “ReBarWindow32″ Title:
“” WM_USER+18 (0×418) wParam: 0×2 lParam: 0×1041a
21:5:42:054 S PID: a7c TID: 9b4 HWND: 0×0001006C Class: “ReBarWindow32″ Title:
“” WM_USER+1a (0×41a) wParam: 0×0 lParam: 0×1041c
21:5:42:054 S PID: a7c TID: 9b4 HWND: 0×0001006C Class: “ReBarWindow32″ Title:
“” WM_USER+19 (0×419) wParam: 0×0 lParam: 0×0
21:5:42:054 S PID: a7c TID: 9b4 HWND: 0×00010084 Class: “CiceroUIWndFrame”
Title: “TF_FloatingLangBar_WndTitle” WM_WINDOWPOSCHANGING (0×46) wParam: 0×0
lParam: 0×2bef960
21:5:42:054 P PID: a7c TID: 9b4 HWND: 0×00010084 Class: “CiceroUIWndFrame”
Title: “TF_FloatingLangBar_WndTitle” WM_TIMER (0×113) wParam: 0×10 lParam: 0×0
21:5:42:054 P PID: a7c TID: 9b4 HWND: 0×00010084 Class: “CiceroUIWndFrame”
Title: “TF_FloatingLangBar_WndTitle” WM_TIMER (0×113) wParam: 0×5 lParam: 0×0
21:5:42:074 S PID: a7c TID: 554 HWND: 0×00010E32 Class: “DirectUIHWND” Title:
“” WM_NCHITTEST (0×84) wParam: 0×0 lParam: 0×640406
21:5:42:074 S PID: a7c TID: 554 HWND: 0×00010E30 Class: “DUIViewWndClassName”
Title: “” WM_NCHITTEST (0×84) wParam: 0×0 lParam: 0×640406
21:5:42:074 S PID: a7c TID: 554 HWND: 0×00010E32 Class: “DirectUIHWND” Title:
“” WM_SETCURSOR (0×20) wParam: 0×10e32 lParam: 0×2000001
21:5:42:074 S PID: a7c TID: 554 HWND: 0×00010E30 Class: “DUIViewWndClassName”
Title: “” WM_SETCURSOR (0×20) wParam: 0×10e32 lParam: 0×2000001
21:5:42:074 S PID: a7c TID: 554 HWND: 0×00010E20 Class: “ShellTabWindowClass”
Title: “Release” WM_SETCURSOR
[...]

Usually, when we see an error indication, we select its current
thread of activity and investigate what happened in this process and
thread before. Here is a synthesized example from real CDF traces:

No PID TID Time Message


[...]
165797 4280 5696 07:07:23.709 FreeToken Handle 00000000
165798 4660 7948 07:07:23.709 EnumProcesses failed. Error=-2144534527
165799 7984 6216 07:07:23.749 GetData threw exception
165800 7984 6216 07:07:23.750 === Begin Exception Dump ===
[...]

We filter by TID 7948 to see what happened before the error and get
additional information such as the server name:

No PID TID Time Message


[...]
165223 4660 7948 07:07:23.704 GetServerName: Exit. ServerName = SERVER02
165224 4660 7948 07:07:23.704 GetServerProcesses: ServerName is SERVER02
165798 4660 7948 07:07:23.709 EnumProcesses failed. Error=-2144534527
[...]
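
Extracting a Thread of Activity amounts to keeping only the messages of one
TID in their original order. The sketch below is a minimal illustration,
assuming the trace is parsed into (number, PID, TID, time, message) tuples
and that the elided messages are simply absent; the function name
thread_of_activity is hypothetical.

```python
# Messages from the two fragments above: (no, pid, tid, time, message).
trace = [
    (165223, 4660, 7948, "07:07:23.704", "GetServerName: Exit. ServerName = SERVER02"),
    (165224, 4660, 7948, "07:07:23.704", "GetServerProcesses: ServerName is SERVER02"),
    (165797, 4280, 5696, "07:07:23.709", "FreeToken Handle 00000000"),
    (165798, 4660, 7948, "07:07:23.709", "EnumProcesses failed. Error=-2144534527"),
    (165799, 7984, 6216, "07:07:23.749", "GetData threw exception"),
    (165800, 7984, 6216, "07:07:23.750", "=== Begin Exception Dump ==="),
]

def thread_of_activity(trace, tid):
    """Return the messages of one thread, preserving trace order."""
    return [message for message in trace if message[2] == tid]

error_thread = thread_of_activity(trace, 7948)
```

The filtered list ends with the EnumProcesses failure and is preceded by
the two messages that name the server.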

Time Delta

Time Delta is a time interval between Significant Events (page 167) or any
messages of interest, in general. For example:

# Module PID TID Time File Function Message


1 10:06:18.994 (Start)
[...]
6060 dllA 1604 7108 10:06:21.746 fileA.c DllMain DLL_PROCESS_ATTACH
[...]
24480 dllA 1604 7108 10:06:32.262 fileA.c Exec Path: C:\Program
Files\CompanyA\appB.exe
[...]
24550 dllB 1604 9588 10:06:32.362 fileB.c PostMsg Event Q
[...]
28230 10:07:05.170 (End)

Such Time Deltas are useful in examining delays. In the trace
fragment above, we are interested in dllA activity from its load until it
launches appB.exe. We see that the Time Delta was only about 10.5 seconds.
Message #24550 was the last message from process ID 1604, and after
that, we did not “hear” from that PID for more than 30 seconds until the
tracing was stopped.
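
The two Time Deltas discussed above can be computed directly from the time stamps. The following is a minimal sketch that assumes all time stamps belong to the same day and use the HH:MM:SS.mmm format of the fragment; parse_time and time_delta are hypothetical helper names.

```python
from datetime import datetime, timedelta


def parse_time(stamp: str) -> datetime:
    # Assumes the HH:MM:SS.mmm format used in the trace fragment (no date part).
    return datetime.strptime(stamp, "%H:%M:%S.%f")


def time_delta(start: str, end: str) -> timedelta:
    """Time interval between two messages of interest recorded on the same day."""
    return parse_time(end) - parse_time(start)


# dllA load (#6060) until it launches appB.exe (#24480)
load_to_launch = time_delta("10:06:21.746", "10:06:32.262")
# last message from PID 1604 (#24550) until the end of tracing (#28230)
silence = time_delta("10:06:32.362", "10:07:05.170")

print(load_to_launch.total_seconds())
print(silence.total_seconds())
```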

Timeout

Some Discontinuities (page 71) may be Periodic (page 148) as Silent
Messages (page 168). If such Discontinuities belong to the same Thread of
Activity (page 181) and their Time Deltas (page 184) are constant, we may
see the Timeout pattern. When Timeouts are followed by an Error Message
(page 75), we can identify them by Back Tracing (page 38). Timeouts
differ from Blackouts (page 45): the latter are usually Singleton
Events (page 170) and have large Time Deltas.
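
A search for Timeouts inside one Thread of Activity can be sketched as follows. This is an illustration only: the function name find_timeouts, the thresholds min_gap_s and tolerance_s, and the time stamps are all hypothetical, and we assume the input is already filtered to a single TID with HH:MM:SS time stamps from the same day.

```python
from datetime import datetime


def find_timeouts(stamps, min_gap_s=60.0, tolerance_s=1.0):
    """Return (start, end, gap_seconds) for silent gaps of nearly equal length.

    A gap is a candidate Discontinuity when it lasts at least min_gap_s seconds.
    Candidates are reported as Timeouts when at least two of them have the
    same duration within tolerance_s seconds.
    """
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    gaps = []
    for i in range(1, len(times)):
        seconds = (times[i] - times[i - 1]).total_seconds()
        if seconds >= min_gap_s:
            gaps.append((stamps[i - 1], stamps[i], seconds))
    best = []
    for _, _, reference in gaps:
        group = [g for g in gaps if abs(g[2] - reference) <= tolerance_s]
        if len(group) > len(best):
            best = group
    return best if len(best) >= 2 else []


# Hypothetical time stamps of one thread that ends with an error message.
stamps = ["09:00:00", "09:00:01", "09:30:01", "09:30:02",
          "10:00:02", "10:00:03", "10:30:03", "10:30:04"]

for start, end, gap in find_timeouts(stamps):
    print(f"silent from {start} to {end}: {gap / 60:.0f} minutes")
```

Requiring at least two gaps of the same length is what separates this sketch from a Blackout search, where a single large gap would be enough.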

Here is a generalized graphical case study. An error message was
identified based on incident Basic Facts (page 42):

We filtered the trace for the error message TID and found three
Timeouts of 30 minutes each:

Trace Acceleration

Sometimes we have a sequence of Activity Regions (page 23) with
increasing values of Statement Current (page 178), as depicted here:

The boundaries of regions may be blurry and arbitrarily drawn.
Nevertheless, Statement Current is visibly increasing or decreasing, hence
the name of this pattern by analogy with physical acceleration, a
second-order derivative. We can also metaphorically use here the notion of a
partial derivative for trace Statement Current and Acceleration for
Threads of Activity (page 181) and Adjoint Threads of Activity (page 30),
but whether it is useful remains to be seen.
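
To make the analogy concrete, Statement Current can be measured as messages per second in consecutive time windows, and Acceleration as its change from one window to the next. The sketch below assumes fixed-size windows instead of hand-drawn Activity Regions; the function names and the sample time stamps are hypothetical.

```python
def statement_currents(times_s, window_s):
    """Messages per second in consecutive windows of window_s seconds.

    times_s: sorted message time stamps in seconds from the start of tracing.
    """
    if not times_s:
        return []
    start = times_s[0]
    n_windows = int((times_s[-1] - start) // window_s) + 1
    counts = [0] * n_windows
    for t in times_s:
        counts[int((t - start) // window_s)] += 1
    return [count / window_s for count in counts]


def acceleration(currents, window_s):
    """Change of Statement Current per second between adjacent windows."""
    return [(b - a) / window_s for a, b in zip(currents, currents[1:])]


# Hypothetical trace: 10, 20, and then 40 messages in three 10-second windows.
times = ([i * 1.0 for i in range(10)]
         + [10 + i * 0.5 for i in range(20)]
         + [20 + i * 0.25 for i in range(40)])

currents = statement_currents(times, window_s=10)
print(currents)
print(acceleration(currents, window_s=10))
```

Positive values in the second list correspond to the increasing Statement Current case; negative values would indicate deceleration.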

Trace Dimension

We introduce the Trace Dimension pattern to address the
emerging complexity of logs from distributed environments. By a
distributed environment, we mean not only a collection of multiple
computers (for example, client-server) but also terminal services
environments with several different user sessions on one computer (OS)
and, in some cases, even multiple user processes (IPC). If a task can be
performed on one machine, in one session, or inside one process, then
splitting it across several computers, sessions, or processes usually
results in logs with added Distributed Infrastructure Messages (DIM), such
as those from proxies and channels:

So, one of the trace simplification strategies is to request the
reproduction and its tracing in a simplified environment (such as inside one
terminal services session) to eliminate DIMs. In one case, we analyzed a
trace for a clipboard paste problem in a Windows terminal services
environment. After a clipboard copy, different data was pasted into
different applications. The same behavior was observed for application
processes running inside different sessions and for processes running within
one session. However, the log was collected for the more complex
multiple-session scenario and contained many False Positive Errors (page 85),
which completely disappeared from the one-session scenario log.

The DIM abbreviation played a role in naming this pattern. Additionally,
if sessions can be considered a second dimension, then separate VMs can
be considered a third dimension, separate clouds a fourth dimension,
and so on.

Trace Extension

Trace Extension is an obvious log analysis pattern about trace
messages that refer to some other trace or log that may or may not exist.
Sometimes, such messages contain instructions for enabling additional
tracing that the current trace source cannot cover. We have seen this in
some trace statements from .NET Exception Stack Traces (page 81).

Trace Frames

Narrative theory distinguishes between frame types such as (Fludernik,
McHale, Nelles, Wolf):

• Introductory framing (missing end frame) [----------
• Terminal framing (missing opening frame) ----------]
• Framing with both opening and end frames [----------]
• Interpolated framing [--[ ]--[ ]----]

At the level of the software trace or Adjoint Thread (page 30) as a
whole, the first three types correspond to various types of the Trace
Partition pattern (page 196) where certain parts are missing, such as Head,
Prologue, Core, Epilogue, or Tail. The first two types can also be instances
of the Truncated Trace pattern (page 201). Interpolated framing can be an
instance of multiple Discontinuities (page 71). All four types also
correspond to Foreground Component messages (page 39), and in general, we
have multiple Trace Frames as depicted:

Trace Mask

Trace Mask is a superposition of two (or more) different traces. This is
different from the Inter-Correlation (page 110) pattern, where we may only
search for certain messages without synthesizing a new log. Trace Mask is
most useful when the traces have different time scales (or significantly
different Statement Currents, page 178). Then we impose an
additional structure on one of the traces:

We got the idea from Narrative Masks discussed in Miroslav
Drozda’s book “Narativní masky ruské prózy” (“Narrative Masks in Russian
Prose”).

A very simple example of Trace Mask is shown in Debugging TV [90]
Episode 0x15.
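
A superposition of two time-sorted traces into one synthesized log can be sketched with a simple merge. This is a hedged illustration, not a description of any particular tool: the function trace_mask, the labels, and both sample traces are hypothetical, and we assume both traces use the same clock and HH:MM:SS.mmm time stamps from the same day, so that string comparison orders them correctly.

```python
import heapq


def trace_mask(trace_a, trace_b, label_a="A", label_b="B"):
    """Merge two time-sorted traces into one log, tagging the origin of each message.

    Each trace is a list of (time_stamp, message) pairs sorted by time stamp.
    """
    tagged_a = ((stamp, label_a, text) for stamp, text in trace_a)
    tagged_b = ((stamp, label_b, text) for stamp, text in trace_b)
    return list(heapq.merge(tagged_a, tagged_b))


# Hypothetical sparse trace with a coarse time scale ...
sparse = [("10:00:00.000", "Session start"),
          ("10:00:05.000", "Logon complete")]
# ... superposed on a hypothetical dense trace with a fine time scale.
dense = [("10:00:01.120", "Read configuration"),
         ("10:00:03.480", "Open database"),
         ("10:00:06.010", "Render UI")]

mask = trace_mask(sparse, dense)
for stamp, origin, text in mask:
    print(stamp, origin, text)
```

In the merged log, the sparse trace messages act as the additional structure: they split the dense trace into the parts before and after the logon.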

Trace Partition

Here we introduce a software narratological partitioning of a trace into
Head, Prologue, Core, Epilogue, and Tail segments. It is useful for
comparative software trace analysis. Suppose a trace started just before
the reproduction steps, or a start marker was injected (by CDFMarker [91],
for example), and finished just after the last repro steps or after an end
marker was injected. Then its core trace messages are surrounded by
prologue and epilogue statements. What comes before and after is not
necessary for analysis and usually distracts an analyst. These parts are
shown as gray areas in the following picture, where the left trace is for
the working (non-working) scenario, and the right trace is for the
non-working (working) scenario:

The sizes of the core segments need not be the same because
environments and executed code paths might be different. However, often
some traces are truncated. Also, sometimes it is difficult to establish
whether the first trace is normal and the second has a tail, or the first
one is truncated and the second one is normal with an optional tail. Here
artificial markers are important.

Trace Viewpoints

Reading Boris Uspensky’s [92] book “A Poetics of Composition: The Structure
of the Artistic Text and Typology of a Compositional Form” (in its original
Russian version) led me to borrow the concept of viewpoints. The resulting
analysis pattern is called Trace Viewpoints. These viewpoints are
“subjective” (semantically laden from the perspective of a trace and log
reader) and can be (not limited to):

• Error viewpoints (see also False Positive Error, page 85, Periodic
  Error, page 145, and Error Distribution, page 74)
• Use case (functional) viewpoints (see also Use Case Trail, page 205)
• Architectural (design) viewpoints (see also Milestones, page 133)
• Implementation viewpoints (see also Implementation Discourse,
  page 98, Macrofunctions, page 121, and Focus of Tracing, page 90)
• Non-functional viewpoints (see also Counter Value, page 54, and
  Diegetic Messages, page 70)
• Signal / noise viewpoints (see also Background and Foreground
  Components, page 39)

In comparison, Activity Regions (page 23), Data Flow (page 58),
Thread of Activity (page 181), and Adjoint Thread of Activity (page 30) are
“objective” (structural, syntactical) viewpoints.

Traces of Individuality

If Implementation Discourse (page 98) focuses on objective
technology-specific discourse, then this pattern focuses on subjective
elements in a software log and its messages. Here we mean specific naming or
logging conventions that come either from an individual engineer’s habit or
from a corporate coding standard. As an example, consider a trace message
from a catch statement:

"Surprise, surprise, should have never been caught."



Translated Message

Sometimes we have messages that report an error but do not give
exact details. For example, “Communication error. The problem on the
server side” or “Access denied error”. This may be the case of Translated
Messages. Such messages are plain language descriptions or
reinterpretations of flags, error codes, and status codes contained in
another log message. These descriptions may come from a system API, for
example, FormatMessage from the Windows API, or from custom
formatting code. Since the code translating the message is in close
proximity to the code emitting the original message, both messages usually
follow each other with zero or very small Time Delta (page 184), come from
the same component, file, and function, and belong to the same Thread of
Activity (page 181):

This pattern is different from Gossip (page 94) because Gossip
messages come from different modules and, although they reflect some
underlying event, are independent of each other.
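
The criteria listed above (same Thread of Activity, same component, zero or very small Time Delta) suggest a simple heuristic search for candidate pairs. The sketch below is only one possible reading: the regular expression for "code-like" tokens, the threshold max_delta_s, the function name translated_pairs, and all sample messages are assumptions made for illustration.

```python
import re
from datetime import datetime

# Assumption: an original message carries a hex value or a number of 3+ digits.
CODE = re.compile(r"0x[0-9A-Fa-f]+|-?\d{3,}")


def parse_time(stamp):
    return datetime.strptime(stamp, "%H:%M:%S.%f")


def translated_pairs(messages, max_delta_s=0.001):
    """Return (original, translation) candidate pairs.

    messages: (time, module, tid, text) tuples ordered by time. A pair is two
    consecutive messages of one thread and one module, at most max_delta_s
    seconds apart, where only the first message contains a code-like token.
    """
    pairs = []
    last_by_thread = {}
    for msg in messages:
        time, module, tid, text = msg
        prev = last_by_thread.get(tid)
        if prev is not None:
            delta = (parse_time(time) - parse_time(prev[0])).total_seconds()
            if (prev[1] == module and delta <= max_delta_s
                    and CODE.search(prev[3]) and not CODE.search(text)):
                pairs.append((prev, msg))
        last_by_thread[tid] = msg
    return pairs


# Hypothetical trace fragment.
trace = [
    ("10:15:02.100", "ModuleA", 3204, "Connect returned 0x80070005"),
    ("10:15:02.100", "ModuleA", 3204, "Access denied error"),
    ("10:15:02.350", "ModuleB", 4410, "Retry count 3"),
    ("10:15:03.000", "ModuleA", 3204, "Closing session"),
]

for original, translation in translated_pairs(trace):
    print(original[3], "->", translation[3])
```

Messages from different modules that describe the same event are deliberately not paired here, since they would be closer to Gossip than to Translated Message.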

Truncated Trace

Sometimes a software trace is truncated because the trace session was
stopped prematurely, often when a problem did not manifest itself
visually. We can diagnose such traces by their short duration and by missing
Anchor Messages (page 35) or Missing Components (page 134) necessary
for analysis. My favorite example is user session initialization in a
terminal services environment, where problem effects are visible only after
the session is fully initialized and an application is launched, but a
truncated trace only shows the launch of winlogon.exe despite the presence
of a process creation trace provider or other components that record the
process launch sequence [93]. The trace itself lasts only a few seconds
after that.
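
A quick check for missing Anchor Messages can be sketched as a search for expected strings. The list of expected anchors below (process names after winlogon.exe) and the sample messages are assumptions chosen for illustration; in practice the list would come from a normal trace of the same scenario.

```python
def missing_anchors(messages, expected_anchors):
    """Return the expected anchor strings that never occur in the trace messages."""
    text = "\n".join(messages).lower()
    return [anchor for anchor in expected_anchors if anchor.lower() not in text]


# Hypothetical truncated trace: only the first process launch was recorded.
messages = [
    "Session 2 created",
    "CreateProcess: winlogon.exe",
]
# Assumed anchors for a fully initialized session with a launched application.
expected = ["winlogon.exe", "userinit.exe", "explorer.exe"]

print(missing_anchors(messages, expected))
```

A non-empty result, together with a short overall trace duration, is a hint that the trace may be truncated rather than a proof of it.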

UI Message

This pattern is very useful for troubleshooting system-wide issues because
we can map visual behavior to various Activity Regions (page 23) and
consider such messages as Significant Events (page 167).

#     Module   PID   TID   Time          Message
[...]
2782  ModuleA  2124  5648  10:58:03.356  CreateWindow: Title "..." Class "..."
[...]
3512  ModuleA  2124  5648  10:58:08.154  Menu command: Save Data
[...]
3583  ModuleA  2124  5648  10:58:08.155  CreateWindow: Title "Save As" Class "Dialog"
[... Data update and replication-related messages ...]
4483  ModuleA  2124  5648  10:58:12.342  DestroyWindow: Title "Save As" Class "Dialog"
[...]

By filtering by the emitting module, we can create an Adjoint Thread (page 30):

#     Module   PID   TID   Time          Message
[...]
2782  ModuleA  2124  5648  10:58:03.356  CreateWindow: Title "..." Class "..."
3512  ModuleA  2124  5648  10:58:08.154  Menu command: Save Data
3583  ModuleA  2124  5648  10:58:08.155  CreateWindow: Title "Save As" Class "Dialog"
4483  ModuleA  2124  5648  10:58:12.342  DestroyWindow: Title "Save As" Class "Dialog"
[...]
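
Filtering by the emitting module, as in the fragment above, can be sketched as a generic attribute filter. The function name adjoint_thread and the dictionary layout are hypothetical, and the ModuleB messages are invented to stand for the elided lines.

```python
def adjoint_thread(messages, **criteria):
    """Select the messages whose fields equal all given criteria.

    For example, module="ModuleA" keeps only the messages emitted by that module.
    """
    return [m for m in messages
            if all(m.get(field) == value for field, value in criteria.items())]


trace = [
    {"no": 2782, "module": "ModuleA", "tid": 5648,
     "text": 'CreateWindow: Title "..." Class "..."'},
    {"no": 2783, "module": "ModuleB", "tid": 5648,
     "text": "Loading settings"},            # hypothetical elided message
    {"no": 3512, "module": "ModuleA", "tid": 5648,
     "text": "Menu command: Save Data"},
    {"no": 3583, "module": "ModuleA", "tid": 5648,
     "text": 'CreateWindow: Title "Save As" Class "Dialog"'},
    {"no": 3600, "module": "ModuleB", "tid": 5648,
     "text": "Replicating data"},            # hypothetical elided message
    {"no": 4483, "module": "ModuleA", "tid": 5648,
     "text": 'DestroyWindow: Title "Save As" Class "Dialog"'},
]

for m in adjoint_thread(trace, module="ModuleA"):
    print(m["no"], m["text"])
```

The same helper works for any other message attribute, for example a PID or a source file name, which is what distinguishes an Adjoint Thread from an ordinary Thread of Activity selected by TID.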

Use Case Trail

Use cases [94] are implemented in various components such as subsystems,
processes, modules, and source code files. Most of the time, with a good
logging implementation, we can see Use Case Trails: log messages
corresponding to use case scenarios. For simple systems, one log may fully
correspond to just one use case, but for complex systems, especially
distributed client-server ones, there may be several use case instances
present simultaneously in one log. One way to disentangle them in the
absence of a UCID (Use Case ID) or some other grouping tag is to use Event
Sequence Phase (page 79).

Master Traces (page 123) may also correspond to use cases, but
they should ideally correspond to only one use case instance.

Visibility Limit

Often it is not possible to trace from the very beginning of the software
execution. Obviously, internal application tracing cannot trace anything
before the application starts and completes its early initialization. The
same is true of system-wide tracing, which cannot trace anything before the
logging subsystem or service starts. For this reason, each log has its
Visibility Limit in addition to possible Truncation (page 201) or Missing
Components (page 134):

One of the solutions would be to use different logging tools and
Inter-Correlation (page 105) to glue activities together, for example,
Process Monitor and CDFControl for terminal services environments.

Visitor Trace

Some traces and logs may have Periodic Message Blocks (page 148) with
very similar message structure and content (mostly Message Invariants,
page 128). The only significant difference between them is some unique
data. We call this pattern Visitor Trace by analogy with the Visitor design
pattern [95], where tracing code “visits” each object’s data or data part to
log its content or status.

Vocabulary Index

What will you do when confronted with one million trace messages
recorded between 10:44:15 and 10:46:55 with an average trace Statement
Current (page 178) of 7,000 msg/s from dozens of modules, and with only a
one-sentence problem description? One solution is to search for a
specific vocabulary relevant to the problem description. For example, if the
problem is intermittent re-authentication, then we might search for the
word “password” or a similar one drawn from a troubleshooting domain
vocabulary. So it is useful to have a Vocabulary Index to search. In our
trace example, the search for “password” jumps straight to a small Activity
Region (page 23) of authorization modules starting from message
#180,010, and the last “password” occurrence is in message
#180,490, which narrows the initial analysis region to about 500 messages.
Note the similarity between a book with its index and a trace as a software
narrative with its vocabulary index.
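
A Vocabulary Index can be sketched as a mapping from each word to the numbers of the messages that contain it, much like a book index maps terms to pages. The function names and the sample messages below are hypothetical; only the message numbers #180,010 and #180,490 come from the example above.

```python
import re
from collections import defaultdict


def build_vocabulary_index(messages):
    """Map each lower-cased word to the list of message numbers that use it.

    messages: (number, text) pairs in trace order.
    """
    index = defaultdict(list)
    for number, text in messages:
        for word in set(re.findall(r"[a-z]+", text.lower())):
            index[word].append(number)
    return index


def analysis_region(index, word):
    """First and last message numbers where the word occurs, or None."""
    hits = index.get(word.lower(), [])
    return (hits[0], hits[-1]) if hits else None


# Hypothetical messages around the authorization Activity Region.
trace = [
    (180009, "Session heartbeat"),
    (180010, "Auth: password prompt shown"),
    (180200, "Auth: Password verified"),
    (180490, "Auth: password cache cleared"),
    (180491, "Session heartbeat"),
]

index = build_vocabulary_index(trace)
print(analysis_region(index, "password"))
```

Building the index once makes it cheap to try several words from the troubleshooting domain vocabulary instead of rescanning a million messages for each search.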

Watch Thread

When we do tracing and logging, much of the computational activity is not
visible. For live tracing and debugging, this can be alleviated by adding
Watch Threads. These are selected memory locations that may or may not
be formatted according to specific data structures and are inspected at
each main trace message occurrence or after specific intervals or events:

This analysis pattern is different from State Dump (page 177), which
is about intrinsic tracing where the developer of the logging statements
has already incorporated a variable watch in the source code. Watch Threads
are completely independent of the original tracing and may be added
separately. Counter Value (page 54) is the simplest example of Watch
Thread if done externally because the former usually doesn’t require
source code and often means some OS or Module Variable [96] independent
of product internals. Watch Thread is also similar to the Data Flow (page 58)
pattern, where the specific data we are interested in is a part of every
trace message.

Index of Patterns

A

Abnormal Value, 17
Activity Disruption, 3, 18
Activity Divergence, 21
Activity Overlap, 22
Activity Region, 3, 18, 19, 25, 61, 71, 85, 89, 95, 101, 116, 135, 139, 169, 175, 184, 189, 215, 228, 233, 238
Activity Theatre, 3, 26, 27
Adjoint Message, 3, 27, 28, 176
Adjoint Space, 30, 150
Adjoint Thread, 33, 34
Adjoint Thread of Activity, 3, 18, 19, 21, 27, 33, 38, 50, 51, 61, 62, 64, 69, 84, 85, 95, 100, 104, 105, 106, 110, 136, 137, 142, 146, 168, 171, 174, 175, 189, 216, 220, 228, 233
Anchor Messages, 3, 38, 40, 41, 89, 140, 143, 148, 152, 178, 232

B

Back Trace, 41, 211
Background and Foreground Components, 4, 42, 85, 95, 135, 139, 169, 184, 188, 220, 227
Basic Facts, 4, 46, 52, 65, 80, 109, 111, 117, 118, 125, 127, 148, 153, 187, 191, 211
Bifurcation Point, 47, 48, 70, 95, 139, 188
Blackout, 4, 49, 50, 52, 186, 211
Break-in Activity, 51, 181

C

Calibrating Trace, 4, 52
CDF, 25, 33, 48, 57, 79, 84, 88, 105, 106, 150, 164, 208
Characteristic Message Block, 53, 56, 61, 71, 135
Circular Trace, 52, 57, 100, 162, 165, 191, 200
Constant Value, 61
Corrupt Message, 59
Counter Value, 4, 60, 203, 227, 240
Coupled Activities, 4, 62
Coupled Processes, 62

D

Data Association, 4, 63, 68, 103
Data Flow, 4, 41, 59, 64, 99, 103, 110, 228, 240
Data Interval, 4, 65
Data Reversal, 4, 66, 68
Data Selector, 4, 69, 70
Declarative Trace, 4, 70
Defamiliarizing Effect, 71
Density Distribution, 74, 94
Dialogue, 75
Diegetic Messages, 5, 78, 227
Discontinuities, 58, 99, 175, 211
Discontinuity, 4, 5, 19, 50, 51, 58, 62, 79, 131, 142, 153, 160, 162, 170, 192, 193, 211, 220
Dominant Event Sequence, 80, 90

E

Empty Trace, 82
Error Distribution, 5, 83, 227
Error Message, 5, 18, 41, 62, 83, 84, 85, 86, 108, 111, 142, 148, 178, 205, 211
Error Powerset, 5, 85
Error Thread, 41, 86
ETW, 25, 33, 38, 42, 60, 88, 105, 106, 124, 125, 136, 150, 158, 164, 178
Event Sequence Order, 5, 68, 88, 89, 106, 139
Event Sequence Phase, 89, 234
Exception Stack Trace, 42, 85, 86, 91, 191, 219
Execution Residue, 86

F

Factor Group, 93
False Positive Error, 5, 18, 84, 95, 204, 217, 227
Fiber Bundle, 5, 97, 154
Fiber of Activity, 6, 99, 104

File Size, 57, 100
Focus of Tracing, 6, 61, 85, 101, 195, 227
Fourier Activity, 6, 99, 102, 104

G

Global Monotonicity, 61
Glued Activity, 105, 137
Gossip, 64, 106
Guest Component, 107, 194

H

Hidden Error, 108
Hidden Exception, 108
Hidden Facts, 109

I

Identification Messages, 6, 69, 110, 111
Implementation Discourse, 6, 113, 118, 139, 227, 229
Impossible Trace, 114
Incomplete History, 77, 115, 163
Indexical Trace, 116
Indirect Facts, 109, 117
Indirect Message, 6, 111, 118, 119
Inter-Correlation, 7, 27, 52, 58, 63, 69, 74, 116, 124, 131, 146, 159, 166, 187, 188, 222, 236
Interspace, 7, 126, 150
Intra-Correlation, 7, 27, 58, 62, 74, 124, 127, 142, 146, 194

L

Last Activity, 131
Layered Periodization, 132
Linked Messages, 7, 69, 89, 136, 188

M

Macrofunction, 137, 152, 157, 227
Marked Message, 138
Master Trace, 138, 139, 151, 157, 235
Message Change, 63, 140
Message Context, 7, 110, 121, 141, 166, 178, 186
Message Cover, 142
Message Interleave, 143
Message Invariant, 7, 17, 59, 68, 93, 145, 146, 164, 166, 168, 236
Message Pattern, 7, 26, 146
Message Set, 7, 22, 41, 69, 70, 85, 86, 110, 148, 149, 150, 176
Message Space, 7, 126, 149, 150
Meta Trace, 150
Milestones, 8, 152, 227
Missing Component, 82, 107, 153, 156, 160, 198, 232, 235
Missing Data, 8, 154
Missing Message, 8, 121, 156, 186
Module Collection, 108
Module Variable, 60, 240
Motif, 26, 85, 157

N

News Value, 158
No Activity, 8, 19, 82, 138, 160
No Component Symbols, 162
No Trace Metafile, 8, 162, 178

O

Opposition Messages, 8, 102, 163, 195
Original Message, 164

P

Palimpsest Messages, 8, 165
Paratext, 32
Periodic Error, 8, 18, 19, 79, 95, 167, 168, 204, 227
Periodic Message Block, 19, 142, 168, 186, 211, 236
Piecewise Activity, 169
Pivot Message, 148, 171, 174
Punctuated Activity, 9, 175

Q

Quotient Trace, 9, 176

R

Recovered Messages, 9, 166, 178, 179
Relative Density, 94, 180, 205
Resume Activity, 181

Ruptured Trace, 9, 184, 185

S

Sequence Repeat Anomaly, 9, 186
Shared Point, 187
Sheaf of Activities, 9, 63, 99, 142, 188
Significant Event, 17, 22, 61, 89, 131, 152, 178, 191, 210, 233
Silent Messages, 9, 142, 175, 192, 211
Singleton Event, 194, 211
Small DA+TA, 9, 195, 196
Sparse Trace, 9, 70, 118, 131, 163, 193, 196, 198
Spiking Thread, 185
Split Trace, 49, 116, 200
Stack Trace, 86, 154
State and Event, 201
State Dump, 10, 203, 240
Statement Density and Current, 25, 47, 74, 104, 162, 180, 191, 192, 204, 215, 222, 237
Step Dumps, 30
Surveyor, 10, 206

T

Thread of Activity, 10, 21, 33, 34, 50, 58, 69, 79, 85, 86, 99, 102, 104, 110, 142, 146, 162, 175, 207, 211, 216, 228, 230
Time Delta, 89, 99, 102, 162, 210, 211
Timeout, 102, 175, 211, 212
Top Module, 185
Trace Acceleration, 10, 104, 162, 215
Trace Dimension, 10, 216
Trace Extension, 10, 206, 219
Trace Frames, 220
Trace Mask, 10, 66, 222, 223
Trace Partition, 220, 224
Trace Viewpoint, 10, 85, 227
Traces of Individuality, 229
Translated Message, 10, 230
Truncated Trace, 52, 82, 100, 220, 232, 235
U

UI Message, 233
Use Case Trail, 11, 22, 26, 85, 89, 227, 234

V

Visibility Limit, 50, 198, 235
Visitor Trace, 11, 175, 236
Vocabulary Index, 38, 109, 153, 191, 237, 238

W

Watch Thread, 11, 239, 240



Bibliography

Accelerated Windows Software Trace Analysis: Training Course Transcript (ISBN: 978-1-908043-42-9)

Advanced Trace and Log Analysis (ISBN: 978-1-908043-77-1)

Introduction to Pattern-Driven Software Problem Solving (ISBN: 978-1-908043-17-7)

Malware Narratives: An Introduction (ISBN: 978-1-908043-48-1)

Memory Dump Analysis Anthology, Volume 3 (ISBN: 978-1-906717-43-8)

Memory Dump Analysis Anthology, Volume 4 (ISBN: 978-1-906717-86-5)

Memory Dump Analysis Anthology, Volume 5 (ISBN: 978-1-906717-96-4)

Memory Dump Analysis Anthology, Volume 6 (ISBN: 978-1-908043-19-1)

Memory Dump Analysis Anthology, Volume 7 (ISBN: 978-1-908043-51-1)

Memory Dump Analysis Anthology, Volume 8a (ISBN: 978-1-908043-53-5)

Memory Dump Analysis Anthology, Volume 8b (ISBN: 978-1-908043-54-2)

Memory Dump Analysis Anthology, Volume 9a (ISBN: 978-1-908043-35-1)

Memory Dump Analysis Anthology, Volume 9b (ISBN: 978-1-908043-36-8)

Mobile Software Diagnostics: An Introduction (ISBN: 978-1-908043-65-8)

Pattern-Based Software Diagnostics: An Introduction (ISBN: 978-1-908043-49-8)

Pattern-Driven Software Diagnostics: An Introduction (ISBN: 978-1-908043-38-2)

Pattern-Oriented Network Forensics: A Pattern Language Approach (ISBN: 978-1-908043-78-8)

Pattern-Oriented Network Trace Analysis (ISBN: 978-1-908043-58-0)

Software Narratology: An Introduction to the Applied Science of Software Stories (ISBN: 978-1-908043-07-8)

Software Trace and Memory Dump Analysis: Patterns, Tools, Processes and Best Practices (ISBN: 978-1-908043-23-8)

Systemic Software Diagnostics: An Introduction (ISBN: 978-1-908043-39-9)

Theoretical Software Diagnostics: Collected Articles (ISBN: 978-1-908043-98-6)



Notes

[1] http://www.patterndiagnostics.com/accelerated-windows-software-trace-analysis-book
[2] http://www.DumpAnalysis.org
[3] http://www.DumpAnalysis.org/blog
[4] http://www.patterndiagnostics.com/malware-narratives-materials
[5] http://en.wikipedia.org/wiki/Divergence
[6] Event Tracing for Windows http://msdn.microsoft.com/en-us/library/windows/desktop/aa363668(v=vs.85).aspx
[7] Citrix Diagnostic Facility
[8] Memory Dump Analysis Anthology, Volume 9a, page 149
[9] Ibid., Volume 7, page 173
[10] Ibid., Volume 7, page 225
[11] http://www.debuggingexperts.com/adjoint-thread
[12] Memory Dump Analysis Anthology, Volume 1, page 503
[13] Looks like biology keeps giving insights into software; there is even a software phenotype metaphor (http://turingmachine.org/~dmg/papers/dmg2009_iwsc_siblings.pdf), although a bit restricted to code, and we also need an Extended Software Phenotype. http://en.wikipedia.org/wiki/The_Extended_Phenotype
[14] Event Tracing for Windows http://msdn.microsoft.com/en-us/library/windows/desktop/aa363668(v=vs.85).aspx
[15] Citrix Diagnostic Facility
[16] http://support.citrix.com/article/ctx122741
[17] http://en.wikipedia.org/wiki/Adjoint
[18] Memory Dump Analysis Anthology, Volume 5, page 279
[19] Ibid., Volume 4, page 241
[20] Ibid., Volume 3, page 342
[21] http://en.wikipedia.org/wiki/Catastrophe_theory
[22] Memory Dump Analysis Anthology, Volume 4, page 329
[23] Ibid., Volume 7, page 98
[24] Ibid., Volume 1, page 419
[25] http://en.wikipedia.org/wiki/Partial_derivative
[26] http://en.wikipedia.org/wiki/Interval_(mathematics)
[27] http://www.plpfilmmakers.com/motionless-journeys
[28] Memory Dump Analysis Anthology, Volume 3, page 342
[29] http://en.wikipedia.org/wiki/Diegesis
[30] Memory Dump Analysis Anthology, Volume 2, page 387
[31] http://www.debuggingexperts.com/memory-dump-trace-analysis-unified-pattern-approach
[32] Memory Dump Analysis Anthology, Volume 2, page 387
[33] http://www.patterndiagnostics.com/accelerated-software-trace-analysis
[34] http://support.citrix.com/article/CTX122741
[35] http://en.wikipedia.org/wiki/Power_set
[36] Memory Dump Analysis Anthology, Volume 2, page 239
[37] Ibid., Volume 1, page 395
[38] http://en.wikipedia.org/wiki/Phase_(waves)
[39] http://en.wikipedia.org/wiki/Quotient_group
[40] http://en.wikipedia.org/wiki/Group_(mathematics)
[41] http://support.citrix.com/article/CTX109235
[42] http://en.wikipedia.org/wiki/Fiber_bundle
[43] https://en.wikipedia.org/wiki/Fiber_(computer_science)
[44] Memory Dump Analysis Anthology, Volume 7, page 437
[45] https://en.wikipedia.org/wiki/Fourier_series
[46] http://en.wikipedia.org/wiki/Manifold
[47] Memory Dump Analysis Anthology, Volume 1, page 271
[48] Ibid., Volume 7, page 162
[49] http://chentiangemalc.wordpress.com/2014/06/24/case-of-the-outlook-cannot-display-this-view/
[50] Memory Dump Analysis Anthology, Volume 5, page 272
[51] http://support.citrix.com/article/CTX106985
[52] Special and General Trace and Log Analysis, Memory Dump Analysis Anthology, Volume 8b, page 119
[53] http://www.dumpanalysis.org/blog/index.php/2007/02/15/windowhistory-40/
[54] http://en.wikipedia.org/wiki/Periodization
[55] http://en.wikipedia.org/wiki/Roman_Jakobson
[56] http://en.wikipedia.org/wiki/Distinctive_features
[57] http://en.wikipedia.org/wiki/Phonology
[58] Memory Dump Analysis Anthology, Volume 5, page 279
[59] http://en.wikipedia.org/wiki/Cover_(topology)
[60] Memory Dump Analysis Anthology, Volume 5, page 272
[61] Special and General Trace and Log Analysis, Memory Dump Analysis Anthology, Volume 8b, page 119
[62] http://en.wikipedia.org/wiki/Metanarrative
[63] http://en.wikipedia.org/wiki/Milestone_(project_management)
[64] Memory Dump Analysis Anthology, Volume 6, page 54
[65] Ibid., Volume 8a, page 48
[66] https://en.wikipedia.org/wiki/Motive_(algebraic_geometry)
[67] Memory Dump Analysis Anthology, Volume 7, page 386
[68] Ibid., Volume 1, page 298
[69] http://en.wikipedia.org/wiki/Binary_opposition
[70] http://en.wikipedia.org/wiki/Ferdinand_de_Saussure
[71] http://en.wikipedia.org/wiki/Palimpsest
[72] Memory Dump Analysis Anthology, Volume 8a, page 121
[73] http://en.wikipedia.org/wiki/Piecewise_linear_function
[74] http://www.dumpanalysis.org/blog/index.php/2009/02/17/wait-chain-patterns/
[75] https://en.wikipedia.org/wiki/Quotient_space_(topology)
[76] http://en.wikipedia.org/wiki/Relative_density
[77] Memory Dump Analysis Anthology, Volume 6, page 62
[78] Ibid., Volume 1, page 305
[79] Ibid., Volume 4, page 279
[80] http://en.wikipedia.org/wiki/Repeated_sequence_(DNA)
[81] http://en.wikipedia.org/wiki/Sheaf_(mathematics)
[82] http://en.wikipedia.org/wiki/Singleton_pattern
[83] Special and General Trace and Log Analysis, Memory Dump Analysis Anthology, Volume 8b, page 119
[84] Memory Dump Analysis Anthology, Volume 5, page 272
[85] http://support.citrix.com/article/CTX111961
[86] http://support.citrix.com/article/CTX122741
[87] http://www.debugging.tv/Frames/0x14/DebuggingTV_Frame_0x14.pdf
[88] http://www.debugging.tv/
[89] http://www.dumpanalysis.org/blog/index.php/2007/01/17/messagehistory-20/
[90] http://www.debugging.tv/
[91] Memory Dump Analysis Anthology, Volume 5, page 276
[92] http://en.wikipedia.org/wiki/Boris_Uspensky
[93] Memory Dump Analysis Anthology, Volume 2, page 387
[94] http://en.wikipedia.org/wiki/Use_case
[95] http://en.wikipedia.org/wiki/Visitor_pattern
[96] Memory Dump Analysis Anthology, Volume 7, page 98
