Advances in Computers Atif Memon Eds Download

The document discusses the 91st volume of the 'Advances in Computers' series, edited by Atif Memon, which covers new developments in software and hardware. It includes chapters on reverse engineering, multicore processors, high-performance computing, and model-driven approaches for fault tolerance. The volume aims to provide insights into the complexities of modern computer systems and the techniques used to improve their performance and reliability.

VOLUME NINETY ONE

Advances in
COMPUTERS

Edited by

ATIF MEMON
University of Maryland
4115 A.V. Williams Building
College Park, MD 20742, USA
Email: [email protected]

Amsterdam • Boston • Heidelberg • London
New York • Oxford • Paris • San Diego
San Francisco • Singapore • Sydney • Tokyo
Academic Press is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA
525 B Street, Suite 1800, San Diego, CA 92101-4495, USA
The Boulevard, Langford Lane, Kidlington, Oxford, OX5 1GB, UK
32, Jamestown Road, London NW1 7BY, UK
Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands
First edition 2013
Copyright © 2013 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted
in any form or by any means electronic, mechanical, photocopying, recording or otherwise
without the prior written permission of the publisher.
Permissions may be sought directly from Elsevier's Science & Technology Rights
Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333;
email: [email protected]. Alternatively you can submit your request online by
visiting the Elsevier web site at http://elsevier.com/locate/permissions, and selecting
Obtaining permission to use Elsevier material.

Notices
No responsibility is assumed by the publisher for any injury and/or damage to persons
or property as a matter of products liability, negligence or otherwise, or from any use or
operation of any methods, products, instructions or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
ISBN: 978-0-12-408089-8
ISSN: 0065-2458

For information on all Academic Press publications
visit our web site at store.elsevier.com

Printed and bound in USA


13 14 15 16 17 10 9 8 7 6 5 4 3 2 1
PREFACE

This volume of Advances in Computers is the 91st in this series. This
series, which has been continuously published since 1960, presents in each
volume four to seven chapters describing new developments in software,
hardware, or uses of computers.
Thanks in part to the recent advances in computers, our computer sys-
tems today have become large, complex, and difficult to fully understand.
At the same time, we constantly demand better quality—both in terms of
performance and fault tolerance—from these systems. We have come to
increasingly rely on multiple technologies to provide this quality improve-
ment. We use hardware solutions, in the form of multicore processors to
provide parallelism, leading to better performance. We use reverse-engi-
neering techniques to better understand the behavior of our software. We
use high performance frameworks for domain-specific applications. And we
employ model-driven approaches to develop fault-tolerance in our systems.
In this volume, we touch upon all these issues.
This volume is a compilation of a set of four chapters that study issues
of reverse engineering, multicore processors, high performance comput-
ing, and model-based approaches for fault-tolerance. The authors of these
chapters are world leaders in their fields of expertise. Together their chapters
provide a view into the state-of-the-art in their respective fields of expertise.
Software systems are large and intricate, often constituting hundreds
of components, where the source code may or may not be available. Fully
understanding the run-time behavior of such a system is a daunting task.
Over the past four decades, a range of semi-automated reverse-engineering
techniques have been devised to fulfill (or assist with the fulfillment of)
this goal. Chapter 1, entitled “Reverse-Engineering Software Behavior,”
provides a broad overview of these techniques, incorporating elements of
source-code analysis, trace analysis, and model inference.
Run-time systems to mitigate memory resource contention problems on
multicore processors have recently attracted much research attention. One
critical component of these run times is the indicators to rank and classify
applications based on their contention characteristics. However, although
there has been significant research effort, application contention character-
istics remain not well understood and indicators have not been thoroughly
evaluated. Chapter 2, entitled “Understanding Application Contentiousness
and Sensitivity on Modern Multicores,” presents a thorough study of applications’
contention characteristics to develop better indicators to improve
contention-aware run-time systems.
Chapter 3 entitled, “An Outlook of High Performance Computing
Infrastructures for Scientific Computing,” presents an overview of the high
performance computing (HPC) infrastructures, both hardware and software.
The advent and ubiquitous acceptance of multi/many-core CPU chips has
brought a sea change that is pushing scientists and engineers toward a paradigm
shift in how they perceive and develop their scientific computer solutions.
A variety of well-established and mature trends and technologies for
HPC are discussed in the chapter. Parallelism, which is the most commonly
recognized approach of HPC, is firmly categorized into two forms: implicit
and explicit (from a programmer’s perspective depending upon how much
effort is required from the programmer to obtain a certain form of parallelism).
To improve the reliability of a system, one can add fault-tolerance
mechanisms, in order to tolerate faults that cannot be removed at design-
time. This, however, leads to a rise of complexity that increases the prob-
ability of software faults being introduced. Hence, unless the process is
handled carefully, adding fault tolerance may even lead to a less reliable
system. As a way to deal with the inherently high level of complexity of
fault-tolerant systems, some research groups have turned to the paradigm
of model-driven engineering. This results in a research field that crosscuts
the established fields of software engineering, system verification, depend-
ability, and distributed systems. Many works are presented in the context of
one of these traditional fields, making it difficult to get a good overview of
what is presently offered. Chapter 4 entitled, “Model-Driven Engineering
of Reliable Fault-Tolerant Systems—A State-of-the-Art Survey,” presents
a survey of 10 approaches for model-driven engineering of reliable fault-tolerant
systems and presents 13 characteristics classifying the approaches
in a manner useful for both users and developers of such approaches. The
chapter also discusses the state of the field and what the future may bring.
I hope that you find these articles of interest. If you have any suggestions
of topics for future chapters, or if you wish to be considered as an author
for a chapter, I can be reached at [email protected].

Prof. Atif M Memon, Ph.D.


College Park, MD, USA
CHAPTER ONE

Reverse-Engineering Software
Behavior
Neil Walkinshaw
Department of Computer Science, The University of Leicester, Leicester, UK

Contents
1. Introduction
2. Background
2.1 Modeling Software Behavior
2.1.1 Modeling Data
2.1.2 Modeling Sequential Behavior
2.1.3 Combining Data and Control
2.2 The Reverse-Engineering Process
2.2.1 Summary
3. Static Analysis
3.1 Why do we Need Behavioral Models if we have Source Code?
3.2 Intermediate Representations—Source Code as a Graph
3.2.1 The Control Flow Graph
3.3 Answering Questions about Behavior with Static Analysis
3.3.1 Dominance Analysis, Dependence Analysis, and Slicing
3.4 Building Behavioral Models by Static Analysis
3.4.1 Collecting “Static Traces”—Identifying Potential Program Executions
3.4.2 Deriving Models from Static Traces
3.5 Limitations of Static Analysis
4. Dynamic Analysis
4.1 Tracing and the Challenge of Inductive Inference
4.1.1 Tracing
4.1.2 The Essential Challenge of Dynamic Analysis
4.2 Practical Trace Collection Approaches
4.2.1 Selecting Inputs by the Category Partition Method
4.2.2 Selecting Inputs by Exercising the Surrounding System
4.2.3 Random Input Selection
4.3 Data Function Inference
4.3.1 Using General-Purpose Data Model Inference Algorithms
4.3.2 Inferring Pre-Conditions, Post-Conditions, and Invariants
4.4 State Machine Inference
4.5 Limitations of Dynamic Analysis

Advances in Computers, Volume 91 © 2013 Elsevier Inc.


ISSN 0065-2458, http://dx.doi.org/10.1016/B978-0-12-408089-8.00001-X All rights reserved. 1
5. Evaluating Reverse-Engineered Models
5.1 Evaluating Syntactic Model Accuracy
5.1.1 Measuring the Difference
5.2 Comparing Model Behavior
6. Conclusions and Outstanding Challenges
6.1 Accommodating Scale
6.1.1 Static Analysis Technique
6.1.2 Dynamic Analysis Technique
6.2 Factoring in Domain Knowledge
6.3 Concurrency and Time-Sensitivity
References

Abstract
Software systems are large and intricate, often constituting hundreds of components,
where the source code may or may not be available. Fully understanding the runtime
behavior of such a system is a daunting task. Over the past four decades, a range of
semi-automated reverse-engineering techniques have been devised to fulfill (or assist with the
fulfillment of) this goal. This chapter provides a broad overview of these techniques,
incorporating elements of source code analysis, trace analysis, and model inference.

1. INTRODUCTION
This chapter presents a broad introduction to the challenge of reverse-
engineering software behavior. In broad terms, the challenge is to be able
to derive useful information about the runtime dynamics of a system by
analyzing it, either in terms of its source code, or by observing it as it executes.
The nature of the reverse-engineered information can be broad. Examples
include constraints over the values of variables, the possible sequences in
which events occur or statements are executed, or the order in which input
is supplied via a GUI.
As a research topic, the challenge of reverse-engineering behavioral mod-
els is one of the longest-established in Software Engineering. The first paper
(to the best of the authors’ knowledge) on reverse-engineering state machines
(by dynamic analysis) was Moore’s 1956 paper on Gedanken Experiments
[39]. The paper that has perhaps had the biggest impact in the area and
remains among the most cited is Biermann’s 1972 paper on the k-Tails state
machine inference algorithm [3].
The topic is popular because it represents a fundamental, scientifically
interesting challenge. The topic of predicting software behavior is beset by
negative results and seemingly fatal limitations. If we want to analyze the
source code, even determining the most trivial properties can be generally
undecidable [45]. If we want to analyze software executions, there are an
infinite number of inputs, and we are confronted by the circular problem
that we need to know about the program behavior before we can select a
suitable sample of executions to analyze.
However, the topic is not only popular because it is scientifically inter-
esting. It also addresses a fundamentally important practical problem at the
heart of software-engineering. In practice, software systems are often devel-
oped and maintained on an ad hoc basis. Although there might exist an
initial model or statement of requirements, these rapidly become outdated
as requirements change and the system evolves. Ultimately, a system can reach
a point in its development where there is no point of reference about how
the system behaves or is supposed to behave, which has obvious implications
for crucial tasks such as testing and quality assurance.
This chapter aims to provide an overview of some of the key topics that
constitute this broad field. It discusses the basic goals (models of software
behavior), static and dynamic analysis techniques that aim to (at least in part)
achieve them, methods to evaluate the resulting models, and some of the
open challenges. It assumes no prior knowledge of source-code analysis
or dynamic analysis, and only a limited knowledge of the possible modeling
formalisms used to capture program behavior.
Clearly, given the breadth of the field, it is impossible to provide an
exhaustive analysis and set of references for each of these subtopics within a
single chapter. Instead this chapter aims to serve as an introduction. It aims
to provide a useful overview that, while avoiding too much detail, manages
to convey (at least intuitively) the key problems and research contributions
for the main topics involved. Although the chapter seeks to avoid technical
detail, where possible it uses applied examples and illustrations to provide
the reader with a high-level picture of how the various techniques work.
The rest of the chapter is structured as follows:

• Section 2: Background. This starts off by providing an overview of
some of the main modeling languages that can be used to capture software
behavior, providing small illustrative examples for each case. This is
followed by a high-level overview of the reverse-engineering process.
• Section 3: Static analysis. Introduces some of the key source code
analysis approaches that can be used to gain insights into program behav-
ior. It also discusses why automated techniques are necessary at all (why
manual inspection of source code alone is impractical), and finishes with a
discussion of some of the key limitations of static analysis, explaining why
it is generally very difficult to relate software syntax to runtime behavior.
• Section 4: Dynamic analysis. Introduces the general problem of infer-
ring models from program traces. Many of the fundamental limitations
and concepts are rooted in the Machine Learning area of inductive inference,
so the section starts by highlighting this relationship. This is followed
by some practical approaches to collect execution traces (inspired by software
testing), an overview of the key reverse-engineering techniques, and
some of the key limitations that hamper dynamic analysis in general.
• Section 5: Evaluation. Considers how to interpret and evaluate reverse-
engineered models. The usefulness of a model depends on its trustwor-
thiness. This is by definition difficult to establish if there is a lack of prior
knowledge about the system. This section shows how models can be evaluated
if there exists some “gold standard” model against which to compare
the model, and considers possible techniques that can be adopted from
Machine Learning research to assess the accuracy of a model when there
is no existing model to draw upon.
• Section 6: Future work. Concludes by providing an overview of some
of the key outstanding challenges. It considers ongoing problems such as
scalability, and the inability to factor in domain knowledge. It also looks
toward the key looming challenge of being able to factor in concurrency
and time-sensitivity.

2. BACKGROUND
The term “software behavior” can assume a broad range of different
meanings. In this chapter, it is defined as broadly as possible: the rules that
capture constraints or changes to the state of a program as it executes. There are
numerous different ways in which such rules can be represented and captured.
This section presents an overview of the two main facets of software
behavior: data and control. It presents an informal introduction to the key
modeling formalisms that can capture behavior in these terms, and provides
a high-level overview of the generic reverse-engineering process.
To illustrate the various types of modeling languages, we adopt a simple
running example of a bounded stack. As can be seen from Fig. 1, it has a con-
ventional interface (the push, pop, and peek functions), with the additional
feature that the bound on its size can be specified via its constructor.
    // BoundedStack.java
    import java.util.Stack;

    public class BoundedStack {
        private int lim;
        private Stack<Object> s;

        public BoundedStack(int limit){
            lim = limit;
            s = new Stack<Object>();
        }

        public boolean push(Object o){
            if(s.size()<lim){
                s.push(o);
                return true;
            }
            else
                return false;
        }

        public Object pop(){
            return s.pop();
        }

        public Object peek(){
            return s.peek();
        }
    }

    // Driver.java
    public class Driver{

        public static void main(String[] args){
            BoundedStack bs = new BoundedStack(Integer.valueOf(5));
            bs.push("object1");
            bs.push("object2");
            bs.push("object3");
            bs.push("object4");
            bs.push("object5");
            bs.pop();
            bs.pop();
            bs.pop();
            bs.peek();
            bs.push("object6");
            bs.push("object7");
            bs.push("object8");
            bs.push("object9");
            bs = new BoundedStack(Integer.valueOf(2));
            bs.push("object1");
            bs.push("object2");
            bs.push("object3");
            bs.push("object4");
            bs.push("object5");
            bs.pop();
            bs.pop();
            // only two elements were stored, so this third pop throws
            // java.util.EmptyStackException
            bs.pop();
            bs.peek();
            bs.push("object6");
            bs.push("object7");
            bs.push("object8");
            bs.push("object9");
            bs = new BoundedStack(Integer.valueOf(100));
            bs.push("object1");
            bs.push("object2");
            bs.push("object3");
            bs.push("object4");
            bs.push("object5");
            bs.pop();
            bs.pop();
            bs.pop();
            bs.peek();
            bs.push("object6");
            bs.push("object7");
            bs.push("object8");
            bs.push("object9");
        }
    }

Fig. 1. Code for BoundedStack.

Throughout this section, several notions and modeling formalisms will
be introduced. Since the purpose is merely to provide an intuition of the
different perspectives on software behavior, they will be introduced in an
informal, intuitive manner. Although notions such as state machines are commonly
associated with formal, detailed definitions in conventional research
texts, in our context these definitions are deemed redundant, and are omitted.
The rest of the section is structured as follows. Section 2.1 gives
an overview of some of the main modeling techniques, covering control,
data, and the combination of the two. This is followed by Section 2.2, which
provides an overview of the general reverse-engineering process.

2.1 Modeling Software Behavior


Modeling languages for software systems can be crudely split into three
types: (1) languages to model the data state of the system, (2) languages
to model the sequential control within the system, and (3) languages that
attempt to combine the two. This subsection provides a brief overview of
these modeling techniques (in the same order).

2.1.1 Modeling Data


Data models capture the data state of a system or component. In other words,
they model the possible values of data variables at given points during the
execution of a program, and potentially the possible data-transformations
that lead from one state to another. In practice, such models are highly
popular, and are widely used. Developers can easily use them to encode their
assumptions into the system (e.g., via assertions), and testing frameworks such
as JUnit use them to encode test-oracles.
Languages for specifying data behavior are commonly founded upon
Hoare Logic [23]. This provides a framework where specifications consist of
three parts: (1) a pre-condition, (2) the function itself, and (3) a post-condition.
Pre- and post-conditions are predicates over the variables of the program
that must hold for the behavior of the function to be correct.
Popular examples of modeling languages include Z [59], Alloy [26], and
the Object Constraint Language (OCL) [40]. Ultimately, these provide lin-
guistic constructs that can be used to express constraints over the variables of
a program. They tend to differ in their expressivity, suitability for automated
analysis, and target user-base. Most specification languages are associated with
automated reasoning tools, which can check that the generated specifications
are internally consistent.
Ultimately, regardless of the chosen formalism, a specification sets out
rules covering specific relationships between variables that must hold at certain
points during a program execution. As an example, we provide a pre-condition
and two post-conditions for the push method in our stack example, expressed in OCL:
    context: stack.push(Object):boolean
    pre notNull: s != null
    post notEmpty: s.isEmpty() == false
    post notOverUpperLimit: s.size() <= lim
The specification is relatively self-explanatory. The first line describes the
method in question. The prefixes pre and post denote pre- and post-
conditions respectively. This is followed by an identifier for the assertion,
and then the assertion itself. It is evident that these rules can be mapped
directly to source-code assertions, where pre-conditions appear before the
method body, and post-conditions are checked before the exit point of the
method.
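To make this mapping concrete, the following is a minimal sketch. It assumes a hypothetical class CheckedBoundedStack that mirrors the push method of Fig. 1, and a hypothetical helper named check; neither appears in the original listing. The three OCL rules are written as runtime checks, with the pre-condition at entry and the post-conditions just before the exit point.

```java
import java.util.Stack;

// Hypothetical variant of the BoundedStack from Fig. 1 (an assumption made
// for illustration): the OCL rules become explicit runtime checks.
class CheckedBoundedStack {
    private final int lim;
    private final Stack<Object> s = new Stack<>();

    CheckedBoundedStack(int limit) {
        lim = limit;
    }

    boolean push(Object o) {
        // pre notNull: checked before the method body runs
        check(s != null, "pre notNull");

        boolean pushed = false;
        if (s.size() < lim) {
            s.push(o);
            pushed = true;
        }

        // post-conditions: checked just before the exit point
        check(!s.isEmpty(), "post notEmpty");
        check(s.size() <= lim, "post notOverUpperLimit");
        return pushed;
    }

    int size() {
        return s.size();
    }

    // Hypothetical helper: fails loudly with the name of the violated rule.
    private static void check(boolean condition, String ruleName) {
        if (!condition) {
            throw new IllegalStateException("violated: " + ruleName);
        }
    }
}
```

Under these assumptions, a push onto a stack constructed with limit 0 violates notEmpty, because nothing is stored. This illustrates how an executable assertion can expose a case that the written rule does not cover.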

2.1.2 Modeling Sequential Behavior


Data models are concerned purely with the data states: the values of variables
at various points during execution. They ignore the sequential order in
which events occur that might cause these state changes, such as the provision
of inputs or the order in which functions are called. This facet of software
behavior is captured by “sequential models.” These are commonly repre-
sented either as state machines or as message sequence charts (StateCharts
or Sequence Diagrams in UML [40]).

2.1.2.1 State Machines


State machines represent a broad family of models that impose a partial order
on a sequence of events in the system. Essentially, a state machine has four
components: (1) a set of states, (2) a set of transitions between the states,
(3) a set of labels that annotate the transitions, and (4) a single initial state.
Depending on the type of system, states may also be associated with flags to
denote them as “final” or “accepting,” indicating that execution terminates
when these states are reached. Conventionally, transition labels are taken to
represent different events or signals in the software system. A state machine
can thus be used to model the potential sequencing of these events.
An example of a state machine for the bounded stack source in Fig. 1 is
shown in Fig. 2. Here, labels denote input-output pairs (where the output
is in response to an input). All of the states are possible final states (denoted
by the double-circle). Note that, as state machines cannot incorporate data,
we have to presume that the limit of the size of the stack is fixed to some
given value (here we choose 3). If we wanted to produce a finite state machine
(FSM) that could represent the inputs for any limit, it would require an infinite
number of states and transitions, or would need to incorporate data (more
sophisticated models to do this will be introduced later).

Fig. 2. State machine for BoundedStack where the limit is set to 3.
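The idea of a state machine with a fixed limit can be sketched in code as follows. This is a minimal illustration and not a transcription of Fig. 2: it assumes one state per stack size (0 to 3), with the empty stack as the initial state, and labels written as input/output strings such as push/true. The class name StackFsm and the label spellings are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative state machine for a stack whose limit is fixed to 3.
// States are the integers 0..3 (assumed to stand for the number of stored
// elements); every state is treated as a possible final state.
class StackFsm {
    static final int LIMIT = 3;

    // delta.get(state).get(label) is the target state, if the transition exists
    private final Map<Integer, Map<String, Integer>> delta = new HashMap<>();

    StackFsm() {
        for (int n = 0; n <= LIMIT; n++) {
            Map<String, Integer> outgoing = new HashMap<>();
            if (n < LIMIT) {
                outgoing.put("push/true", n + 1);   // room left: push succeeds
            } else {
                outgoing.put("push/false", n);      // full: push is rejected
            }
            if (n > 0) {
                outgoing.put("pop/object", n - 1);  // removes the top element
                outgoing.put("peek/object", n);     // leaves the state unchanged
            }
            delta.put(n, outgoing);
        }
    }

    // Returns true if the sequence of labels is a path from the initial state.
    boolean accepts(List<String> labels) {
        int state = 0;
        for (String label : labels) {
            Integer next = delta.get(state).get(label);
            if (next == null) {
                return false; // no such transition: sequence is not permitted
            }
            state = next;
        }
        return true;
    }
}
```

Because the limit is baked into the set of states, a different limit needs a different machine, which is the restriction the paragraph above describes.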

2.1.2.2 Message Sequence Charts and Sequence Diagrams


Whereas state machines serve to lay out every possible sequence of events,
Message Sequence Charts (MSCs) [25], or their UML variant, Sequence
Diagrams [40], capture the order of particular sequences of events (“messages”).
These are primarily used to show typical sequences of method calls
between objects within an object-oriented system, but can also be used more
generally, e.g., to model signals sent between processes in concurrent systems.
Sequences might correspond to a particular scenario, or describe the
activity that underpins a particular software feature. The entities that send or
receive messages (e.g., objects or processes) are given a box at the top of the
diagram. A vertical line below the box signifies the life-span of the object
(where time runs from top to bottom). The messages are added as arrows
from one line to the other.
Figure 3 shows an example of a possible sequence diagram for
the BoundedStack example. Each box corresponds to a class in the system,
and each arrow corresponds to a method call. As with state machines, there
are several variants of this notation that can incorporate particular
characteristics of paradigms or types of systems (cf. Live Sequence
Charts [12]).

Fig. 3. Sequence diagram for a BoundedStack scenario.

2.1.3 Combining Data and Control

One of the weaknesses of the specification methods considered so far is the
fact that they only partially capture software behavior. Data models capture
data states, but fail to capture the sequences of events that lead from
one data state to the other, and fail to show explicitly how data states affect
the possible sequencing of events during execution. FSMs and MSCs capture
the possible sequencing of events within or between components, but fail
to track the data-state of the system throughout these events. To address this
problem, a range of extensions to existing modeling techniques have been
developed that enable the two facets of data and control to be combined.
Such combined notations are generally Turing-complete. In other words, these
notations can be used to capture any sequential program that can be encoded
as a Turing Machine.

2.1.3.1 EFSMs
Several such notations have been developed, which extend the conventional
FSM in such a way that states and transitions can be decorated with guards
and data operations [19, 6, 24]. Extended Finite State Machines (EFSMs) [6]
represent one particularly popular variant. These extend conventional state
machines in three ways: (1) by adding a memory to store data variables, (2) by
enabling transitions to have data-guards (expressed in terms of conditions on
the memory contents), and (3) by associating transitions with functions that
can transform the memory contents.
An example of an EFSM model of the BoundedStack example is shown
in Fig. 4. For the memory we use the two variables that correspond to the
data members in the source code: s and lim. The labels consist of three parts
(separated by a “|”). The first part is the input (in our case the method call),
the second part is the guard condition on the memory (a boolean predicate
on s and lim), and the third part is any operation on the memory that is
executed along with the state transition. For example, if we look at state
a, there are two outgoing transitions for the function push. If lim==0,
nothing can be pushed onto the stack, and a push has no effect on the data
state (thus looping to the same state). If lim>0 however, a push leads from
state a to b, and results in an actual push of o onto the stack.

Fig. 4. An EFSM for BoundedStack.

Taken as a whole, the three states distinguish between the behavior when
the stack is empty (state a), when the stack is not empty but also not full (state b),
and when the stack is full (state c). It is important to note that there can be lots of
valid potential EFSMs for the same system, which may differ in the number
of states and the nature of the guards.
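The three-part labels can be sketched in code as follows. This is one possible EFSM and not a transcription of Fig. 4: only the two push transitions leaving state a follow the description above, and every other transition, guard, and name (StackEfsm, Transition, fire) is an assumption made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Predicate;

// Illustrative EFSM: states 'a', 'b', 'c'; memory consisting of s and lim.
class StackEfsm {
    final Deque<Object> s = new ArrayDeque<>(); // memory: stack contents
    final int lim;                              // memory: the bound
    char state = 'a';                           // initial state

    // One label: input | guard | operation, plus source and target states.
    private static final class Transition {
        final char from, to;
        final String input;
        final Predicate<StackEfsm> guard;
        final BiConsumer<StackEfsm, Object> op;

        Transition(char from, String input, Predicate<StackEfsm> guard,
                   BiConsumer<StackEfsm, Object> op, char to) {
            this.from = from; this.input = input; this.guard = guard;
            this.op = op; this.to = to;
        }
    }

    private final List<Transition> transitions = new ArrayList<>();

    StackEfsm(int limit) {
        lim = limit;
        BiConsumer<StackEfsm, Object> none = (m, o) -> { };
        BiConsumer<StackEfsm, Object> push = (m, o) -> m.s.push(o);
        BiConsumer<StackEfsm, Object> pop = (m, o) -> m.s.pop();

        // The two transitions described in the text for state a:
        add('a', "push", m -> m.lim == 0, none, 'a');
        add('a', "push", m -> m.lim > 0, push, 'b');
        // Assumed transitions for the remaining cases:
        add('b', "push", m -> m.s.size() + 1 < m.lim, push, 'b');
        add('b', "push", m -> m.s.size() + 1 == m.lim, push, 'c');
        add('b', "push", m -> m.s.size() >= m.lim, none, 'c');
        add('c', "push", m -> true, none, 'c');
        for (char q : new char[] {'b', 'c'}) {
            add(q, "pop", m -> m.s.size() > 1, pop, 'b');
            add(q, "pop", m -> m.s.size() == 1, pop, 'a');
        }
    }

    private void add(char from, String input, Predicate<StackEfsm> guard,
                     BiConsumer<StackEfsm, Object> op, char to) {
        transitions.add(new Transition(from, input, guard, op, to));
    }

    // Fires the first transition whose source, input, and guard all match.
    // Returns false if no transition is enabled in the current state.
    boolean fire(String input, Object arg) {
        for (Transition t : transitions) {
            if (t.from == state && t.input.equals(input) && t.guard.test(this)) {
                t.op.accept(this, arg);
                state = t.to;
                return true;
            }
        }
        return false;
    }
}
```

Unlike the plain state machine, the same three states serve any limit, because the guards consult the memory rather than encoding the stack size in the state.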
Models such as EFSMs are better able to capture software behavior
because they are “richer” in terms of their syntax and semantics. However,
an important downside is that this adds complexity, which makes them less
appealing to developers. For example, simple assertions in the source code are
much more widely used than EFSMs, because they can be readily produced
during development. Generating richer models such as EFSMs incurs an
overhead; the developer has to invest at least as much effort into generating
the EFSM as they might have to invest into the source code of the system
itself. This of course is a key reason for wanting to use reverse-engineering
techniques—to generate such specifications when the developer has not had
the time to, or when existing specifications have become outdated.

2.1.3.2 Algebraic Models


Algebraic specifications characterize a system in terms of the relationships
between its operations over the underlying data structure. The output of each
operation is captured in declarative terms, often recursively. As the number
of function definitions grows, they can be used to define or characterize
each other’s behavior. A chapter on the practical application and value of
algebraic specifications can be found in Sommerville’s Software Engineering
book [49].
With respect to our stack example, some simple properties might be as
follows:
pop(push(x)) -> x
pop(new BoundedStack(y)) -> Exception
peek(push(y). push(z)) -> z
These are self-explanatory. Popping a stack where an x has been previ-
ously pushed onto it should yield an x. Popping an empty stack should raise
an exception. Peeking on a stack that has been constructed by push(y)
followed by push(z) should yield a z. This concise, declarative format is
precisely what makes algebraic specifications so popular. Such models are
especially relevant to functional and logic programming languages (where the core
functionality is defined in a similar way), such as Scala, Erlang, and Prolog.
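The three properties above can also be checked mechanically against an implementation. The following is a minimal Python sketch, not the chapter's BoundedStack code from Fig. 1: the class, its method names, the chaining of push, and the use of IndexError for the "Exception" case are all assumptions made for illustration.

```python
class BoundedStack:
    """Hypothetical bounded stack; names and error handling are assumptions."""

    def __init__(self, lim):
        self.lim = lim
        self.items = []

    def push(self, o):
        # A push on a full stack has no effect on the data state.
        if len(self.items) < self.lim:
            self.items.append(o)
        return self  # returning self allows push(y).push(z) chaining

    def pop(self):
        if not self.items:
            raise IndexError("pop from empty stack")
        return self.items.pop()

    def peek(self):
        if not self.items:
            raise IndexError("peek on empty stack")
        return self.items[-1]


def check_axioms(lim=3):
    """Check the three algebraic properties for a stack with the given limit."""
    # pop(push(x)) -> x
    assert BoundedStack(lim).push("x").pop() == "x"
    # pop(new BoundedStack(y)) -> Exception
    try:
        BoundedStack(lim).pop()
        raised = False
    except IndexError:
        raised = True
    assert raised
    # peek(push(y).push(z)) -> z
    assert BoundedStack(lim).push("y").push("z").peek() == "z"
    return True
```

Under these assumptions the third property only holds when the limit is at least 2; with a limit of 1 the second push is ignored and peek returns y. This is the kind of boundary condition that the guards of an EFSM make explicit.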

2.2 The Reverse-Engineering Process


The key conceptual steps involved in the general reverse-engineering process
are shown in Fig. 5. Reverse engineering was defined by Chikofsky and
Cross as “the process of analyzing a system to . . . create representations of the system
in another form, or at a higher level of abstraction.” [7]. Although this chapter is
concerned with a more specific definition (“another form” is restricted to
“a model of software behavior”), the general process discussed here applies
more generally.
Reverse engineering starts with a process of “information gathering,”
which may involve analysis of source code, observation of program executions,
or some combination of the two. This information often has to be
abstracted—it has to be recoded in such a way that it captures the relevant
points of interest in the program at a useful level of detail, so that the final
result is readable and relevant. Finally, the collected information has to be
used to infer or deduce a behavioral model. This step is often automated, but
may involve human intervention (e.g., to incorporate domain knowledge).

Fig. 5. Schematic diagram of the key conceptual steps in the reverse-engineering process.

All reverse-engineering procedures that produce behavioral models can
be characterized in terms of these steps. Some solely rely on static analysis,
others on dynamic analysis, and some combine the two. Some incorporate
input from developers, and some are iterative—gradually refining the target
representation by soliciting additional traces, or additional inputs from the
developer at each iteration.
This schematic view of the reverse-engineering process provides a useful
reference point for the rest of this chapter. It shows where static and dynamic
analysis fit in, and how the developer can play a role in each case. It also
shows how techniques can be iterative. The final model can be fed back
to the reverse-engineer, who might seek to refine the result by considering
additional sets of traces, or by integrating other forms of source code analysis.

2.2.1 Summary
This section has presented some of the core languages with which
software behavior can be modeled. It has also presented an overview of the
general reverse-engineering process: the task of analyzing a software system
and deriving approximations of these abstract behavioral models. The next
two sections will look at some of the core tools for this process. The first
will look at static analysis techniques, and how these can obtain
information about software behavior from source code. This will be followed
by a similar overview of dynamic analysis, showing how software behavior
rules can be extracted from observations of program executions.

3. STATIC ANALYSIS
Static analysis is concerned with the analysis of the source code syntax
(i.e., without executing the program). In principle, source code can be seen
as the ultimate behavioral model of a software system. It encodes exactly
those instructions that will be executed at runtime; it imposes an order on
when events can occur, and governs the possible data values at different points
during the execution.
This section seeks to provide a relatively self-contained introduction to
the role of static analysis in reverse-engineering models of software behavior.
It starts by covering the basics—showing how source code can be interpreted
as a graph that captures the flow of control between individual expressions
and the associated flow of data values. These are presented in relative detail
here, not only because they form the basis for all more complex source code
analysis techniques, but also because they are of considerable value in their
own right. These representations alone provide a complete (albeit low level)
overview of the order in which statements can be executed, and how they
can pass data among each other.
This is followed by an overview of the analysis techniques that build upon
these notions of control and data flow to extract information about behavior
(the ability to reason about variable values at given points during the software
execution). Finally, the section concludes with an overview of the various
factors that limit the accuracy of static analysis. The descriptions of static
analysis techniques attempt to present an intuitive and informal introduction.
For more formal definitions, along with algorithms to compute the various
relations within the code, there are several comprehensive overviews [2].

3.1 Why do we Need Behavioral Models if we have Source Code?
So why not simply refer to the source code in place of a behavioral model?
Why are models necessary, and what can code analysis provide? There are
three key reasons for this:
Firstly, there is the problem of feature location. Source code that pertains
to a single facet of functionality can, depending on the language paradigm,
be spread across the system. Whereas the specific element of behavior that
we seek to reverse-engineer might only cover a small fraction of the source
code base (e.g., the code in a large drawing package that is responsible for
loading a drawing from a file), the code in question may not be localized
to a particular module or component in the system. High-level software
behavior often arises from extensive interaction between different modules
or packages [34], especially where Object-Oriented systems are concerned
[56]. So, although source code contains all of the relevant information, this
can be difficult to locate because it contains an overwhelming amount of
irrelevant information too.
Secondly, there is the related problem of abstraction. Once the relevant
source code has been localized, there is the challenge of interpreting its
functional behavior from its implementation. The way in which a particular
unit of functionality is encoded in source code depends on several factors,
such as the choice of programming language, paradigm, architecture, exter-
nal libraries, etc. Thus, a program that achieves a particular functionality in
one language or paradigm can differ substantially from another (the reader is
referred to the Rosetta Code web repository1 for some illustrative examples
of this).
Thirdly, source code is a static representation, whereas its behavior is
intrinsically dynamic. The challenge of discerning program behavior therefore
necessarily requires the developer to “mentally execute” the relevant source
code, to keep track of the values of various relevant variables, and to predict
the output. For any non-trivial source code and non-trivial functionality,
this can incur an intractable mental overhead [5].
In summary, attempting to develop a complete understanding of software
behavior from source code alone is rarely practical. Leaving aside the basic
problem that the source code might not be available in its entirety, the essen-
tial challenges listed above can be summarized as follows: There is (1) the
“horizontal” complexity of trying to discern the source code that is relevant
to the behavioral feature in question, there is (2) the “vertical” complexity of
interpreting the relevant behavior at a suitable level of abstraction, and finally
there is (3) the problem of mentally deducing the dynamic behavior of the
system, and how its state changes from one execution point to the next.

1 http://www.rosettacode.org.
3.2 Intermediate Representations—Source Code as a Graph


Most static analysis techniques can be discussed and implemented in graph-theoretical
terms. From this perspective, source code is interpreted in terms of
relations between statements. These can convey various types of information
about the program, with respect to the sequence in which instructions are
executed (the control), and the values of variables at various points (the data).
This subsection will focus on the low-level code representations that are
especially useful as a basis for extracting knowledge about program behavior.
To be clear, this section does not constitute a section on “reverse-engineering”
in its own right, as the representations produced do not constitute the sort
of behavioral models that one would usually aim for. However, the repre-
sentations nonetheless provide valuable insights into software behavior, and
form the basis for most static reverse-engineering techniques, which will
be described in the following subsection. In any case, an understanding of
these basic analyses provides a useful insight into the general nature of the
information that can feasibly be reverse-engineered, without tying us down
to specific techniques.

3.2.1 The Control Flow Graph


Individual procedures or routines in the source code can be represented as
a control-flow graph (CFG) [2]. This is a directed graph, where nodes cor-
respond to individual instructions, and edges represent the flow of control
from one instruction to the next. Each graph has an entry point (indicat-
ing the point where the program starts), predicates such as if and while
statements correspond to branches, and return statements or other termi-
nating points in the program correspond to the exit nodes of the graph.
Every possible path through the graph corresponds to a potential program
execution (where loops in the graph indicate a potentially infinite number
of executions).
It is important to note the generality of this representation. CFGs can
be used to represent control flow for programs written in any programming
language. Whether the program in question is written in BASIC, Java, or
C, or is the Assembly code post-compilation, it will contain decision points
and instructions that are executed according to some predefined sequence,
which can always be represented in terms of a CFG.
To illustrate how a CFG is constructed we build a CFG for a simple
BASIC routine (written by David Ahl [1] and reproduced here with his kind
permission). The code, shown on the left in Fig. 6, plots out the trajectory
of the bounce of a ball (shown in Fig. 7),
given a set of inputs denoting time-increment, velocity, and a coefficient
representing the elasticity of the ball.

Fig. 6. Bounce.bas BASIC routine to plot the bounce of a ball, and the corresponding CFG.

The CFG is shown on the right of the figure. It plots the possible paths
through the function (the possible sequences in which statements can be
executed). It starts and ends with the entry and exit nodes, which do not
correspond to actual statements, but merely serve to indicate the entry and
exit points for the function. Predicate-statements (e.g., if or while state-
ments) are represented by branches in the CFG, where the outgoing edges
denote the true or false evaluation of the predicate. Statements at the end of
a loop include an edge back to the predicate.
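The structure described above can be written down directly as an adjacency list. The sketch below is a Python illustration for a small hypothetical routine (it is not bounce.bas, whose listing appears only in Fig. 6); the node labels and the `paths` helper are assumptions made for the example.

```python
# CFG for a hypothetical six-statement routine (not bounce.bas):
#   1: i = 0
#   2: while i < n:
#   3:     if i % 2 == 0:
#   4:         print(i)
#   5:     i = i + 1
#   6: return
CFG = {
    "entry": ["1"],
    "1": ["2"],
    "2": ["3", "6"],   # predicate: true branch, false branch
    "3": ["4", "5"],   # predicate: true branch, false branch
    "4": ["5"],
    "5": ["2"],        # end of the loop body: edge back to the predicate
    "6": ["exit"],
    "exit": [],
}


def paths(cfg, node="entry", max_visits=2, seen=None):
    """Enumerate entry-to-exit paths, visiting each node at most max_visits times."""
    seen = dict(seen or {})          # copy, so sibling branches do not interfere
    seen[node] = seen.get(node, 0) + 1
    if node == "exit":
        return [[node]]
    result = []
    for succ in cfg[node]:
        if seen.get(succ, 0) < max_visits:
            for tail in paths(cfg, succ, max_visits, seen):
                result.append([node] + tail)
    return result
```

Because of the loop, the number of paths is unbounded unless the number of visits per node is capped; with `max_visits=2` the loop body is entered at most once, which gives three paths for this routine.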
Fig. 7. Output from bounce.bas.

3.2.1.1 Data Flow


The CFG in itself only conveys information about the possible order(s) in
which statements can be executed. To add information about the possible
flow of variable values from one statement to the other, we start by defining
two types of relationship. For a given node s in a CFG, the function def(s)
identifies the set of variables that are assigned a value (i.e., defined) at s. The
function use(s) identifies the set of variables that are used at s.
The above def and use relations can, along with the CFG, be used to
compute the reaching definitions [2]. For a given definition of a variable in
the code, the reaching definitions analysis computes the subsequent points
in the code where that specific definition is used. To make it slightly more
formal, each node s in the CFG is annotated with a set of variable-node
pairs ⟨v, n⟩ for each v ∈ use(s). These are computed such that the value of v
is defined at n, and there is a definition-free path with respect to v from n to
s. These reaching definitions are visualized as dashed edges in the extended
CFG, shown in Fig. 8.
Intuitively, this graph captures the basic elements of program behavior.
The complex interrelationships between data and control are summarized
in a single graph. The intuitive value is clear; it is straightforward to visually
trace how different variables affect each other, and how different sets of variables
feature in particular paths through the code. In practice such graphs
are rarely used as visual aids, but form the basis for more advanced analyses,
as will be shown below.

Fig. 8. CFG with reaching definitions shown as dashed lines.
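The reaching-definitions computation described in this subsection can be sketched as a standard iterative data-flow analysis. The Python below is an illustration only: the CFG, the `DEF` and `USE` tables, and the treatment of the parameter n as being defined at the entry node are assumptions for a hypothetical routine, not the bounce.bas example.

```python
# Hypothetical routine:  1: i = 0   2: while i < n:   3: if i % 2 == 0:
#                        4: print(i)   5: i = i + 1   6: return
CFG = {
    "entry": ["1"], "1": ["2"], "2": ["3", "6"], "3": ["4", "5"],
    "4": ["5"], "5": ["2"], "6": ["exit"], "exit": [],
}
DEF = {"entry": {"n"}, "1": {"i"}, "5": {"i"}}           # def(s)
USE = {"2": {"i", "n"}, "3": {"i"}, "4": {"i"}, "5": {"i"}}  # use(s)


def reaching_definitions(cfg, defs, uses):
    """For each node s, return the pairs (v, n) with v in use(s) such that
    v is defined at n and some path from n to s does not redefine v."""
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    in_ = {n: set() for n in cfg}   # definitions reaching the start of n
    out = {n: set() for n in cfg}   # definitions leaving n
    changed = True
    while changed:
        changed = False
        for n in cfg:
            in_[n] = set().union(*(out[p] for p in preds[n]))
            killed = defs.get(n, set())
            new_out = {(v, d) for (v, d) in in_[n] if v not in killed}
            new_out |= {(v, n) for v in killed}
            if new_out != out[n]:
                out[n], changed = new_out, True
    return {n: {(v, d) for (v, d) in in_[n] if v in uses.get(n, set())}
            for n in cfg}


rd = reaching_definitions(CFG, DEF, USE)
```

Each pair in `rd[s]` corresponds to one dashed edge into s in an extended CFG of the kind described above. At the loop predicate, for instance, the value of i may come either from the initialization or from the increment at the end of the previous iteration.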

3.2.1.2 Call Graphs


The above representations are solely concerned with the representation of
individual procedures or functions. In practice, for programs that exceed
the complexity of the bounce program, behavior is usually a product of
interactions between multiple functions. The possible calls from one method
or function to another can be represented as a “call graph” [47]. Such graphs
can, when used in conjunction with individual function CFGs, be used to
compute data and control-flow relationships that span multiple procedures.

Fig. 9. Call graph for code in Fig. 1.

For this chapter an intuitive description of a call graph will suffice. A
standard (context-insensitive) call graph consists of nodes that represent call
statements within methods or functions, and edges that represent possible calls
between them. An example of a call graph with respect to the BoundedStack
code is shown in Fig. 9. Due to the simplicity of the code, this call graph is
straightforward; it is tree-shaped, there are no loops, and each call has only
one possible destination.
For larger systems, call graph computation can be exceptionally challeng-
ing [18], and has been the subject of an extensive amount of research. Calls in
the source code do not necessarily have a unique, obvious target. For example,
the target of a call in an object-oriented system (because of mechanisms such
as polymorphism and runtime binding) is often only decided at execution
time. The pointer-analysis algorithms that underpin these computations [18]
lie beyond the scope of this chapter, and the possible problems of inaccuracy
will be discussed in the wider context of static analysis in Section 3.5.
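As a rough illustration of how a simple call graph can be extracted, the sketch below uses Python's standard `ast` module on a small hypothetical source string. It is a simplification under stated assumptions: it uses one node per function rather than one per call statement, and it only resolves direct calls by name; `SOURCE` and `call_graph` are names invented for the example.

```python
import ast

# Hypothetical source to analyze; not the BoundedStack code from Fig. 1.
SOURCE = '''
def push(stack, o):
    if not is_full(stack):
        stack.append(o)

def is_full(stack):
    return len(stack) >= 3

def main():
    s = []
    push(s, 1)
'''


def call_graph(source):
    """Map each function defined in `source` to the defined functions it calls."""
    tree = ast.parse(source)
    funcs = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    graph = {name: set() for name in funcs}
    for fn in ast.walk(tree):
        if not isinstance(fn, ast.FunctionDef):
            continue
        for node in ast.walk(fn):
            # Only direct calls by name are resolved; method calls such as
            # stack.append(o) are skipped because their target depends on
            # the runtime type of the receiver.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in funcs:
                    graph[fn.name].add(node.func.id)
    return graph
```

The skipped method call is exactly the case discussed above: resolving it statically would require some form of type or pointer analysis, which this sketch does not attempt.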

3.3 Answering Questions about Behavior with Static Analysis
Having covered the basics of static analysis, we consider the essential questions
that they can answer about software behavior. As discussed in Section 2,
software behavior is multifaceted. Data models capture the values of data
variables (and possibly any associated transformations) at different points
in a program. Control models consider the order in which events or data
states can arise. Combined models, such as the EFSM, capture the interplay
between these two facets.
CFGs (with or without data annotations), coupled with call graphs, can
convey information about program behavior, albeit at a low level. They make
explicit the control and data flow between individual statements and functions.
However, if we recall the three problems of feature location, abstraction,
and dynamism discussed at the beginning of this section, such a low-level
representation is of limited use.
These problems can, however, be attenuated by a selection of well-established analysis
techniques that build upon these basic representations. They can, at least
to an extent, be used to answer specific questions about the possible behavior
of a system. They can expose the dependence between statements in a pro-
gram, can determine whether certain statements can or cannot be executed,
can estimate the value ranges for variables at various points in the program,
and can even provide the constraints on input parameters that are required
to reach particular areas of the source code. The rest of this subsection will
present an overview of these individual techniques. This will be followed
by a subsection that provides an overview of existing approaches that have
composed these techniques to build comprehensive behavioral models of
software systems.

3.3.1 Dominance Analysis, Dependence Analysis, and Slicing


The possible order in which events can occur, or the chains of causality that
lead from one state to another, form a fundamental part of the analysis of
software behavior. Looking at the representations discussed in Section 2, this
notion of sequence forms a fundamental aspect of state machines, Message
Sequence Charts, and their variants. This information can be extracted from
a CFG by two types of analysis: Dominance analysis [2] and Dependence
analyses [10].

3.3.1.1 Dominance Analysis


Given a CFG, dominance analysis can be used to draw out the order in
which statements are executed. A dominance analysis will, for each state-
ment, identify other statements that must be executed beforehand, or that
can only be executed subsequently. More formally, node B post-dominates a
node A if all paths from A to the exit node must pass through B. Conversely,
node A dominates node B if all paths from the entry node to B must pass through A.
These general notions of dominance and post-dominance are not par-
ticularly concise; lots of statements can dominate and post-dominate each
other, which makes it difficult to impose a specific order on the execution.
Instead of considering all possible dominating and post-dominating nodes,
this can be narrowed down by considering only the immediate dominators
and post-dominators. A node A immediately dominates another node B if
there are no other nodes that dominate B on the path in the CFG from A to
B. Conversely, a node B immediately post-dominates another node A if there
are no other nodes on the path from A to B that post-dominate A. Thus,
each node can only have a unique immediate dominator and immediate post-dominator.
These immediate relationships can be visualized as a tree. The dominance
and post-dominance trees for the bounce program are shown in Fig. 10.
In terms of program behavior, these are interesting because, for each state-
ment, they make explicit the set of other statements that must be executed
before or after (depending on the tree). If one selects a statement in the dom-
inator tree and takes the unique path to the root, all of the statements on that
path dominate that statement, and will be executed beforehand (in the order
in which they appear in the path). Similarly for the post-dominator tree; if
one selects a statement and traces its paths to the root, all of the statements
that appear on that path must be executed after the statement (this time in
the reverse-order in which they appear in the path).
This dominance analysis provides a useful layer of information about the
control-aspects of a program. It tells us which statements must be executed
before or after each other. If we want to know, in coarse terms, the order in
which statements are executed to calculate a particular output, then analyzing
them in terms of their dominance is a good starting point.
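The dominance relations defined in this subsection can be computed with a simple fixed-point iteration. The following Python sketch is an illustration on a hypothetical CFG (not bounce.bas); the function names are assumptions, and the textbook set-based iteration is used here rather than any particular tool's algorithm. Post-dominators are obtained by running the same computation on the reversed graph, starting from the exit node.

```python
# CFG of a hypothetical routine (not bounce.bas); node names are statement labels.
# 2 is a loop predicate, 3 is an if predicate inside the loop body.
CFG = {
    "entry": ["1"], "1": ["2"], "2": ["3", "6"], "3": ["4", "5"],
    "4": ["5"], "5": ["2"], "6": ["exit"], "exit": [],
}


def reverse(graph):
    """Flip every edge; dominators of the reversed CFG are post-dominators."""
    rev = {n: [] for n in graph}
    for n, succs in graph.items():
        for s in succs:
            rev[s].append(n)
    return rev


def dominators(graph, entry):
    """Compute dom[n], the set of nodes that dominate n (including n itself).

    Assumes every node is reachable from `entry`.
    """
    preds = reverse(graph)
    dom = {n: set(graph) for n in graph}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in graph:
            if n == entry:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def immediate(dom, n):
    """The strict dominator of n that is closest to n (None for the root)."""
    strict = dom[n] - {n}
    return max(strict, key=lambda d: len(dom[d])) if strict else None


dom = dominators(CFG, "entry")
pdom = dominators(reverse(CFG), "exit")
```

In this example statement 4 does not dominate statement 5, because 5 can also be reached directly from the predicate at 3; the immediate dominator of 5 is therefore 3. Mapping every node to its `immediate` result gives the edges of the dominator tree (from `dom`) and of the post-dominator tree (from `pdom`).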

3.3.1.2 Control Dependence Analysis


Dominance trees omit an important aspect of information: conditions. Source
code is composed of logical constructs such as if-then-else blocks and
for loops. The execution of a statement is generally contingent upon the
outcome of decision points throughout the program. This conditional rela-
tionship between statements is not captured by dominance alone.
Control dependence analysis [10] builds upon dominance analysis and
data-flow analysis to capture the conditional relationships between state-
ments. A statement B is control-dependent upon a statement A if the execu-
tion of B is determined by the outcome of A, but B does not post-dominate
A. In other words, the execution of B must solely depend on the evaluation
of the predicate A; if B is executed regardless of the outcome of A, it is not
control-dependent.
These control dependence relationships can again be viewed in a graph-
ical format, referred to as the control dependence graph (CDG). The CDG for
the Bounce.bas example is shown in Fig. 11. By default, any top-level
statements that are not contained within a conditional block are control-
dependent upon the entry node to the CFG. Nested branches indicate nested
conditional blocks.
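Control dependence can be derived from post-dominance. The Python sketch below is an illustration under stated assumptions: it uses a common formulation in which B is control-dependent on A if there is an edge from A to some node S such that B post-dominates S but does not strictly post-dominate A, and it adds an artificial entry-to-exit edge so that top-level statements become dependent on the entry node. The CFG is hypothetical (not bounce.bas) and the function names are invented for the example.

```python
# Hypothetical CFG: 2 is a loop predicate, 3 is an if predicate inside the loop.
CFG = {
    "entry": ["1"], "1": ["2"], "2": ["3", "6"], "3": ["4", "5"],
    "4": ["5"], "5": ["2"], "6": ["exit"], "exit": [],
}


def reverse(graph):
    rev = {n: [] for n in graph}
    for n, succs in graph.items():
        for s in succs:
            rev[s].append(n)
    return rev


def dominators(graph, entry):
    """dom[n] = nodes dominating n; assumes all nodes are reachable from entry."""
    preds = reverse(graph)
    dom = {n: set(graph) for n in graph}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in graph:
            if n == entry:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def control_dependence(cfg, entry="entry", exit_="exit"):
    """Map each node to the set of nodes it is control-dependent upon."""
    aug = {n: list(s) for n, s in cfg.items()}
    aug[entry].append(exit_)            # artificial edge: entry acts as a predicate
    pdom = dominators(reverse(aug), exit_)
    deps = {n: set() for n in cfg}
    for a, succs in aug.items():
        for s in succs:
            for b in pdom[s]:
                if b not in pdom[a] - {a}:   # b does not strictly post-dominate a
                    deps[b].add(a)
    return deps


cdg = control_dependence(CFG)
```

For this routine the statement inside the if block depends on the if predicate, the loop body depends on the loop predicate, and the top-level statements depend on the entry node. The loop predicate also depends on itself, since whether it is evaluated again is decided by its own outcome.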
It is important to distinguish the nature of the information contained
within the dominance trees and the dependence graph. Dominance trees
contain sequential information, and can be used to answer questions of the
nature “Which statements must follow statement X?” or “Is statement Y always
preceded by statement X?”. On the other hand, the CDG ignores most of
the sequential information. It identifies only the essential orders between
statements that depend upon each other. It can identify relationships between
statements where there is not necessarily a dominance relationship, but where
one can still have a bearing on others’ execution.

Fig. 10. Dominator and post-dominator trees for bounce (shown on the left and the right respectively).

Fig. 11. Control dependence graph for bounce.bas.

3.3.1.3 The Program Dependence Graph and Slicing


Whereas dominance and control dependence solely focus on control—
whether or not (or the order in which) statements execute—the program
dependence graph (PDG) [16] combines control dependence with the dataflow
between statements. This (at least for a single routine) includes all of the
dependencies that tie statements to each other. For any pair of statements a
and b that is connected by some path a → · · · → b through the program
dependence graph, it is possible that a will either affect whether or not b is
executed, or will have some effect on the variable values used at b. In other
words, it is possible that a will affect the behavior of b. The PDG for the
bounce.bas program is shown in Fig. 12.
This ability to automatically relate statements to each other in terms of
their underlying dependencies is clearly valuable from the perspective of
analyzing software behavior. Several automated tools have been developed
that can use the program dependence graph to compute program slices [57].
A slice is defined with respect to a particular statement in the program, and
a particular set of variables at that statement (together these are referred to
as the slicing criterion). It is defined as the subset of statements in the program
that are responsible for computing the variables in the slicing criterion.
When computed in terms of the PDG, a slice simply consists of all of those
statements in the PDG that can reach the node in the graph representing
the criterion.

Fig. 12. Program dependence graph for bounce.bas.

Slices are clearly useful in their own right. As mentioned at the beginning
of this chapter, one of the core problems of understanding software behavior
is information overload. There is a lot of irrelevant source code, and the
code that is relevant to the feature in question might be spread across the
code base. Slices can (at least in theory) locate the relevant source code.
To get an intuition of how useful slicing can be, let us suppose that we are
interested in finding out which statements affect the behavior of variables
S1 and l at line 35. Extracting a slice from the dependence graph is simply
a matter of tracing back along every incoming path to node 35, marking the
statements that are traversed in the process. The resulting slice is shown in
Fig. 13, both in terms of the reduced PDG and the corresponding
source code.
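The "tracing back" step amounts to backward reachability over the PDG. The Python sketch below illustrates this on a small hypothetical program rather than on bounce.bas; the dependence edges are written by hand (a tool would derive them from control dependence and reaching definitions), and the slicing criterion is simplified to a single node, standing for all variables used there.

```python
# Hypothetical program:
#   1: total = 0          5:     count = count + 1
#   2: count = 0          6: print(total)
#   3: for x in xs:       7: print(count)
#   4:     total = total + x
#
# PDG edges a -> b mean "b depends on a" (data or control dependence).
PDG = {
    "1": ["4", "6"],   # total flows into 4 and 6
    "2": ["5", "7"],   # count flows into 5 and 7
    "3": ["4", "5"],   # loop body is control-dependent on the loop header
    "4": ["4", "6"],
    "5": ["5", "7"],
    "6": [],
    "7": [],
}


def backward_slice(pdg, criterion):
    """Return every node from which `criterion` is reachable in the PDG."""
    preds = {n: set() for n in pdg}
    for n, succs in pdg.items():
        for s in succs:
            preds[s].add(n)
    in_slice, worklist = {criterion}, [criterion]
    while worklist:
        node = worklist.pop()
        for p in preds[node]:
            if p not in in_slice:
                in_slice.add(p)
                worklist.append(p)
    return in_slice
```

Slicing on statement 6 keeps only the statements that contribute to total and drops everything concerned with count, which is the kind of reduction discussed in this subsection.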
The “headline” value of slicing is clear. In reference to the challenges
discussed in Section 3.1, it limits the amount of source code that has to
be navigated with respect to a particular comprehension task, reducing the
mental overhead on the developer. Importantly, these benefits also transfer to
other non-manual tasks, including reverse-engineering. Even though these
source code analysis techniques do not directly reverse-engineer a model,
they can reduce the amount of source code that needs to be considered
for a re-engineering task. Slicing can be used to reduce the code base to
something more manageable. Slicing is used for the same purpose in other
domains, such as source-code verification [13], and (its originally intended
purpose) debugging [58].

3.4 Building Behavioral Models by Static Analysis


As illustrated above, source code (1) provides a (partial) ordering on the
events that can occur during program execution, and (2) maps out the way
in which data can flow through the program. It is the ultimate blue-print
for program behavior. Although the low-level analyses discussed above can
provide some basic information about execution order, it remains difficult
to ascertain useful, abstract information about program behavior.
Numerous techniques have been developed that build upon the low-level
analyses presented above to generate models of software behavior. These date
back to work by Kung et al. [30], but the field has flourished only relatively
recently, with work by numerous groups around the world [50, 54, 48]. All of
these approaches share what is essentially the same underlying process. They
first derive a set of “static traces,” a tree-like representation of the possible
paths through the source code. These static traces are then combined and
abstracted, to produce an abstract model that captures the behavior of the
system as a whole.

3.4.1 Collecting “Static Traces”: Identifying Potential Program Executions
Most static analysis approaches to deriving models rely on identifying a set
of potential program execution paths through the code. These can subsequently
be mined to generate the final model. The broad task of identifying
feasible executions from source code is (as will be elaborated in Section 3.5)
generally undecidable. However, there are numerous techniques that have
emerged from the field of software verification that can at least provide an
approximation of feasible executions.

Fig. 13. Computing the slice in bounce.bas, where the slicing criterion is line 35, variables S1 and l. This shows that 18 of the 37 lines are relevant to the computation.

Fig. 14. An example of infeasible paths; the only feasible paths through this CFG are
those that take both true branches, or both false branches. Paths that mix true and false
branches are infeasible.

The challenge of identifying a feasible path through a program is best con-
templated in terms of the CFG. In a CFG, a program execution corresponds
to a path from the entry to the exit node. The problem is that not every
walk through the CFG corresponds to a feasible execution. One branch in
the graph might give rise to a configuration of data values that prohibits the
execution of certain subsequent branches. This is illustrated in Fig. 14. Only
two of the four paths through the graph are actually feasible.
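The gap between CFG paths and feasible executions can be illustrated with a small Python sketch. The routine below is hypothetical (the actual predicates in Fig. 14 are not reproduced here); it is only assumed to have the same shape, with two consecutive predicates that test the same unchanged condition.

```python
from itertools import product


def branches_taken(x):
    """Hypothetical routine with two predicates that test the same condition."""
    first = x > 0
    # ... statements that do not modify x ...
    second = x > 0
    return (first, second)


# Every entry-to-exit path in the CFG: one outcome per predicate.
cfg_paths = set(product([True, False], repeat=2))

# Paths actually exercised by a sample of inputs.
observed = {branches_taken(x) for x in range(-5, 6)}

# Paths that exist in the CFG but were never exercised.
never_observed = cfg_paths - observed
```

Sampling inputs in this way can only demonstrate that a path is feasible; it cannot prove that the remaining paths are infeasible. Here the two mixed paths are infeasible by construction, but in general that conclusion requires reasoning about the path conditions, as discussed next.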
The task of collecting “static traces” involves processing the CFG to elicit
a collection of paths that are feasible. This is generally accomplished by
adopting static-analysis techniques that are capable of systematically exploring
the state-space of the program by traversing the CFG (without actually
executing it). Commonly used techniques include abstract interpretation
[9], and symbolic execution [27] (which is a specialized form of abstract
interpretation).

Fig. 15. Small portion of a symbolic execution tree, for a portion of code taken from the
“bounce” code in Fig. 6. For the sake of readability the tree only reaches a depth of six
execution steps, which suffices to give an intuition of how it is constructed.

To provide an intuition of how such techniques work, a brief example of
symbolic execution is shown in Fig. 15. In general, abstract interpretation
approaches can vary substantially from this, so the purpose here is merely to
provide an intuition of what a possible approach looks like (this approach
is similar to those taken by Kung et al. [30] and Walkinshaw et al. [54]). A
“symbolic execution tree” is constructed by building a tree of all possible
paths through the CFG, and associating each node in the tree with a path
condition. The path condition represents the current constraints on the data
variables that must hold at that given point, as deduced from the predicates
and variable assignments that have occurred so far. The feasibility of a path
is reflected in the satisfiability of its path condition.
Even from this example, it becomes apparent that the tree expands rapidly.
The tree is limited to a depth of 6, but does not even contain the full bodies
of the nested for-loops. Each iteration of the outer for-loop would entail ten
nodes (three of which are branch-points). As the depth of the tree increases,
the number of nodes in the tree increases exponentially.
One way of limiting the expansion of the tree is to use SAT or SMT solvers.
These can prevent the expansion of any nodes with path conditions that are
unsatisfiable (and so correspond to infeasible program executions). However,
this is often of limited use. If the program in question couples loops with non-
trivial predicate conditions (e.g., involving floating-point arithmetic) it can
become impossible to determine whether particular paths are feasible or not.
If loop-termination depends on the values of unknown input parameters,
there remains no option but to expand the loop indefinitely (or, in practice,
up to some arbitrary limit).
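The pruning idea can be sketched as follows. This Python illustration builds the leaves of a tiny symbolic execution tree for a hypothetical routine with two predicates over a single integer input x. It is not the approach of any cited tool: the `satisfiable` function is a deliberately naive stand-in for a constraint solver that simply tries every value in a bounded domain, and all names are assumptions.

```python
# Hypothetical routine: two consecutive predicates over one integer input x,
# with no assignments to x in between.
PREDICATES = [
    ("x > 0", lambda x: x > 0),
    ("x > 10", lambda x: x > 10),
]

# Bounded search domain: a stand-in for a real constraint solver.
DOMAIN = range(-20, 21)


def satisfiable(condition):
    """True if some x in DOMAIN gives every predicate its required outcome."""
    return any(all(test(x) == want for test, want in condition) for x in DOMAIN)


def expand(depth=0, condition=(), label=()):
    """Return the path conditions of all feasible leaves of the execution tree.

    A node whose path condition is unsatisfiable is not expanded further.
    """
    if depth == len(PREDICATES):
        return [" and ".join(label) or "true"]
    text, test = PREDICATES[depth]
    leaves = []
    for want in (True, False):
        cond = condition + ((test, want),)
        if satisfiable(cond):
            shown = text if want else f"not ({text})"
            leaves += expand(depth + 1, cond, label + (shown,))
    return leaves


feasible_leaves = expand()
```

Of the four paths through the two predicates, the one that requires x to be at most 0 and greater than 10 is pruned. The bounded search also hints at the limitation described above: a witness outside the chosen domain would be missed, so with richer constraints a real solver is needed and may still fail to decide feasibility.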

3.4.2 Deriving Models from Static Traces


Once the possible paths through a program have been extracted, it becomes
possible to derive behavioral models. Most published approaches focus on
FSMs or EFSMs [30, 50, 54, 48]. All techniques use some form of abstract
interpretation/symbolic execution to produce static traces (as described
above), but vary according to the nature of the state machine produced.
Given the variety in approaches, there is no single generic description that
can serve to capture how all of these techniques produce their state machines.
However, to provide an intuition we focus on one approach that is shared by
the work by Kung et al. and Walkinshaw et al. [30, 54]. Both start from an
execution tree and produce a state machine using roughly the same approach
(although they have different ways of producing the execution trees and seek
to derive models for different purposes).
Given an execution tree, the techniques operate in two phases. The first
phase is to identify points in the execution tree that correspond to “state transition
points”: triggers that indicate the transfer from one state to another.
For example, these might be calls to an API (if the target model is to repre-
sent the API usage), or method-entry points if the model is to represent the
sequential behavior within a class.
In the second phase, the tree is used to construct the state machine. This can be done
very simply; for every pair of annotated states A and B in the tree such that
there is a path from A to B (without intermediate annotated states), it is
possible for execution to reach state B from state A. In other words, there is
a state transition A → B.
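The second phase can be sketched in a few lines of Python. The execution tree and its state annotations below are hypothetical, as are the names `TREE`, `STATE`, and `transitions`; the sketch only illustrates the rule just described and is not the implementation used by Kung et al. or Walkinshaw et al.

```python
# Hypothetical execution tree: node -> children.
TREE = {
    "n0": ["n1"],
    "n1": ["n2", "n3"],
    "n2": ["n4"],
    "n3": [],
    "n4": [],
}

# Phase one output: tree nodes annotated as state transition points.
# Unannotated nodes (here n1) are intermediate execution steps.
STATE = {"n0": "A", "n2": "B", "n3": "C", "n4": "A"}


def transitions(tree, state, root):
    """Collect A -> B for annotated nodes connected by a path that passes
    through no other annotated node."""
    result = set()

    def walk(node, last):
        if node in state:
            if last is not None:
                result.add((last, state[node]))
            last = state[node]
        for child in tree[node]:
            walk(child, last)

    walk(root, None)
    return result
```

Because several tree nodes can carry the same state label (n0 and n4 are both labeled A), the resulting machine can contain cycles even though the execution tree itself is acyclic.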

3.5 Limitations of Static Analysis


Deriving models of software behavior from source code is fundamentally
limited by undecidability. Seemingly simple behavioral properties, such as
whether a program will terminate, have been proven to be undecidable in
the general case. More damning is the proof of Rice’s theorem, which shows
that the same limitations that give rise to the halting problem also mean that
all non-trivial behavioral properties are undecidable [46].
Representations that form the basis for static analysis—CFGs, dataflow
relations, symbolic execution trees, etc.—are intrinsically conservative.
Wherever the feasibility of a branch or a path cannot be decided, static
analysis is forced to return a result that allows for all possibilities. Thus, any
analyses that build upon these foundations tend to be severely hobbled by
inaccuracy.
The practical value of techniques that seek to predict dynamic program
behavior from source code is severely limited. For languages such as Java,
symbolic execution environments do exist, but cannot (at the time of writing)
be used “out of the box.” Although some systems can be used in certain
circumstances at a unit-level, they cannot readily scale to larger systems.
Complex data-types and constraints exacerbate this problem even further.
Finally, there is the problem of libraries. Even a trivially small Java program
can make extensive use of built-in libraries (System.out.println,
etc.). Calls to libraries can be instrumental to the behavior of the system
under analysis. For example, the question of whether a call to a
Collections object is a mutator (changes the state of the object) or
an observer (merely returns the state) is crucial for determining whether
that call should be treated as a def or a use. This can only be determined by
analyzing the source code (or byte-code) of the class in question.
There are only two possible solutions to this. The first solution is to
develop all-encompassing static analysis techniques that pull in the respective
source code or byte-code for the libraries in question. This can severely affect
scalability. The only other option is to produce manual “stubs” to the libraries
that summarize the relevant information. This is, for example, carried out for
the JavaPathFinder model-checker (and its symbolic execution framework)
[51]. This however also has the clear downside that it is prohibitively time
consuming.
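
To give a feel for what such a manual stub might contain, the following Java
sketch summarizes a handful of java.util.List methods as mutators or
observers. It is a hypothetical format invented for this illustration and is
not the stub mechanism used by JavaPathFinder; the LibraryStubs class and the
decision to treat unknown methods as defs are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hand-written library "stub": records whether a method mutates or only observes its receiver. */
final class LibraryStubs {
    enum Effect { MUTATOR, OBSERVER }

    private static final Map<String, Effect> SUMMARY = new HashMap<>();
    static {
        // Manually summarized entries for java.util.List (illustrative, far from complete).
        SUMMARY.put("java.util.List.add", Effect.MUTATOR);
        SUMMARY.put("java.util.List.clear", Effect.MUTATOR);
        SUMMARY.put("java.util.List.size", Effect.OBSERVER);
        SUMMARY.put("java.util.List.get", Effect.OBSERVER);
    }

    /** Classifies a call as a "def" or a "use"; methods without a summary are assumed to be defs. */
    static String classify(String qualifiedMethod) {
        Effect effect = SUMMARY.get(qualifiedMethod);
        return effect == Effect.OBSERVER ? "use" : "def";
    }
}
```

Every entry has to be written and maintained by hand, which is exactly the
cost referred to above.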

4. DYNAMIC ANALYSIS
Whereas static analysis is concerned with deriving information from
the source code syntax, dynamic analysis is broadly concerned with analyz-
ing the program while it executes. This can involve keeping track of data
and control events—either as externally observable input/output, or inter-
nal information such as the execution sequence of individual statements or
values that are assigned to particular variables.
Whereas static analysis often consumes an overwhelming amount of
effort to increase precision by eliminating infeasible predictions, this is not
a problem with dynamic analysis. If anything, the challenge with dynamic
analysis is the converse [14]. We begin with a set of observed executions,
and from this have to make generalizations about program behavior, based
on inferences from the given set of executions.
As with static analysis, dynamic analysis encompasses a very broad range
of techniques with different purposes. For example, profilers can provide a
high-level view of the performance of a program, debuggers can be
used to make explicit the internal state of a program at particular points, and
testing can check that a program is returning the correct outputs. This section
focusses on a selection of dynamic analysis techniques that are particularly
useful from a reverse-engineering perspective.
This section is structured as follows. It begins by presenting the process
of tracing, the fundamental step of recording the information about program
executions, which forms the basis for the subsequent analysis. It then
continues to discuss in broad terms the general relationship between the
inference of software models, and the general Machine Learning problem
of inferring models from given examples. This is followed by the presen-
tation of the more specific challenges of inferring state machines and data
models from traces. The section is, as was the case with the static analysis
section, finished with a discussion of the intrinsic limitations of dynamic
analysis.

4.1 Tracing and the Challenge of Inductive Inference


In broad terms, the goal of dynamic analysis (in the context of reverse engi-
neering) is to produce a model that generalizes some facet of program behav-
ior from a finite sample of program executions. Program executions are
recorded by a process known as tracing. Tracing can be time and resource
consuming, and requires the considered selection of program inputs. Then
there is the challenge of inferring a model that is accurate and complete.
This subsection introduces these problems, to provide a sufficiently detailed
background for the rest of this section.

4.1.1 Tracing
Software tracing is, in broad terms, the task of recording software executions
and storing them for subsequent analysis. Intuitively, the task is simple: to
monitor program execution, and to write the name of every significant event
(such as the name of a method call), along with the values of relevant data
parameters and state variables, to a text file. Many modern programming
languages are accompanied by relatively sophisticated trace generators (cf. the
Eclipse TPTP tracing plugin for Java³, or the Erlang OTP built-in tracing
functions⁴).
A trace is, put simply, a sequence of observations of the system. Each
observation might contain an element of control (e.g., a method-entry point,
or some other location in the code), and an associated data state (variable
values, object-types, etc.). In simple terms, for the purpose of discussion here,
we envisage a trace of n + 1 observations in the following simple format:

$T = (p_0, D_0), \ldots, (p_n, D_n)$

Each observation takes the form (p, D), where p is some point in the code
(this same point may feature in several observations within the same trace),
and D represents the state of the various relevant data variables at that point.
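
One possible in-memory representation of this format is sketched below in
Java. The class and method names (Observation, Trace, record) are
hypothetical and chosen for this illustration; they do not correspond to any
particular tracing tool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One observation (p, D): a code point and the data state seen there. */
final class Observation {
    final String point;                 // p, e.g. a method-entry point
    final Map<String, Object> data;     // D, variable name -> observed value

    Observation(String point, Map<String, Object> data) {
        this.point = point;
        // Copy the map so that later changes by the caller cannot alter the recorded state.
        this.data = Collections.unmodifiableMap(new LinkedHashMap<String, Object>(data));
    }
}

/** A trace T: an ordered sequence of observations. */
final class Trace {
    private final List<Observation> observations = new ArrayList<>();

    void record(String point, Map<String, Object> data) {
        observations.add(new Observation(point, data));
    }

    List<Observation> observations() {
        return Collections.unmodifiableList(observations);
    }
}
```

The same point may be recorded many times; each call to record appends a new
observation rather than overwriting an earlier one.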
To produce a program trace, it is first necessary to determine what needs
to be traced. Tracing is an expensive process; recording information is time
consuming, trace data can consume large volumes of storage space, and traces
can be time consuming to analyze. The question of what to trace depends on
the nature of the analysis to be carried out. The question is critical because
recording too much information can lead to traces that become unman-
ageable, and can lead to unintended side-effects that affect the subsequent
behavior of the program (both of these problems are elaborated below).
The default option of recording everything at every point becomes rapidly
intractable for all but the simplest programs. To provide an intuition, we can
draw upon an example of a trace in JHotDraw (a Java drawing program
that is popular for case-studies in the Software Engineering community—
see Fig. 16). The system (version 5.2) is relatively simple, consisting of 142
3 http://www.eclipse.org/tptp/.
4 http://www.erlang.org/doc/.

Fig. 16. Screenshot from a simple JHotDraw sample application execution.

classes. However, the trace that captures the sequence of method signatures
involved in the (roughly four-second long) process of starting the program,
drawing the rectangle shown in the figure, and closing the window, contains
a total of 2426 method invocations. And this is still relatively lightweight; the
trace records only method-entry points. It omits individual statements, data
values, and object identities, which, if recorded, would lead to an
order-of-magnitude increase in the size of the trace.
This illustrates an important trade-off between the utility of the traces
and their scale. On the one hand it is important to collect all of the infor-
mation that is necessary for the analysis in question. If we are constructing
a behavioral model, this should include all of the information at each point
during the execution that is relevant to the behavior that is of interest. On
the other hand, it is important to avoid an information overload, which could
reduce the accuracy of the model, and increase the expense of the analysis.
There is also the question of which traces to collect. Programs tend to accept
inputs from an infinite domain. Given that dynamic analysis techniques are
restricted to a finite sample of these, it is vital that they are in some sense
representative of “general” or “routine” program usage. Otherwise the results
of the dynamic analysis risk becoming biased.
This tension between scale and the need for a representative set of traces
represents the core of the dynamic analysis challenge. This is exacerbated
when the analyst lacks an in-depth familiarity with the program (which
is probable in a reverse-engineering scenario). In reality it is unrealistic to
expect the developer to know exactly what information needs to be extracted
from which program points. It is also unlikely that they will be aware of (and
have the time to exhaustively execute) the inputs that are required to elicit a
“typical” set of program executions. Indeed, these factors set out the
parameters of the dynamic-analysis challenge: to make do with the given
information, and to nonetheless infer a model that is at least approximately
accurate.

4.1.2 The Essential Challenge of Dynamic Analysis


The broad task of reverse-engineering an accurate general model from a
(potentially small) selection of traces is challenging. The process of deriving
an accurate model is invariably a product of guesswork, where the accuracy of
the final result depends on several factors, such as the breadth and availability
of traces. This subsection briefly covers the essential reasons that underpin
the problem of model inference (and dynamic analysis in general).
Ultimately, a software system can usually produce an infinite number of
different executions, taking inputs from an infinite domain, and producing
an infinite range of outputs. This means that a given selection of traces is
inevitably only a partial representation of the general program behavior. A
model that is generated from these traces cannot merely parrot-back the
trace-values that have been observed. It has to somehow use the traces to
also guess any features of program behavior that might not be explicitly
available in the actual traces. In other words, the inference process must
include a process of induction—of making guesses, based on the given traces,
of the more general system behavior that the model should reflect.
As such, many dynamic analysis techniques belong to a family of techniques
from the field of Machine Learning, known as inductive inference techniques
[38]. These tend to be based on some form of statistical reasoning; they
might for example derive rules from data variables that occur frequently, or
method sequences that are repeated often. Ultimately, any technique that is
essentially founded on guesswork is at risk of making mistakes [14]. The given
sample of traces might be a poor sample, with misleading patterns that point
toward non-existent rules in the system, or that miss out crucial features.
An illustrative example of the induction problem is shown in Fig. 17.
Consider the relationship between two numerical variables, where the
observed values from a trace are plotted in a scatter plot. Based on this
information, an inference technique might guess that the data points are
produced by the straight-line function on the left when, in fact, they are
produced by the polynomial function on the right.
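
The same effect can be reproduced with a small, made-up numerical example in
Java. The two functions below are assumptions chosen for illustration and are
not the functions plotted in the figure: the cubic x^3 - 3x^2 + 3x coincides
with the straight line y = x at the inputs 0, 1, and 2, so a trace that only
contains those inputs cannot distinguish between them.

```java
final class InductionPitfall {
    /** The (hidden) function that actually produced the trace values: a cubic. */
    static long actual(long x) { return x * x * x - 3 * x * x + 3 * x; }

    /** The hypothesis an inference technique might plausibly propose: a straight line. */
    static long hypothesis(long x) { return x; }

    /** True if the hypothesis reproduces the actual output for every observed input. */
    static boolean fitsObservations(long[] observedInputs) {
        for (long x : observedInputs) {
            if (hypothesis(x) != actual(x)) return false;
        }
        return true;
    }
}
```

For the observed inputs 0, 1, and 2 the hypothesis fits perfectly, yet at
x = 4 the actual function returns 28 while the hypothesis predicts 4.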

Fig. 17. Illustration of the problem of inferring behavior from limited data points.

The challenge of inferring the correct model places a great premium on
the provision of the traces. To reduce the chances that an inference
algorithm makes mistakes, it is necessary to make the set of traces as “rich” and
“representative” of typical behavior as possible. Finding such a set of traces is
of course a significant problem in its own right. The problem of not know-
ing which inputs have to be exercised to capture every significant facet of
behavior is akin to the test-data generation problem [60].
In the broad field of inductive model inference, it is taken for granted that
the sample of observations (read “traces” in our context) has been carefully
collected to ensure that it is at least representative to some basic extent.
This is a necessary precondition for dynamic analysis too [14]. Traces must
be selected with care, otherwise it becomes difficult to make any reliable
assertions about software behavior, because the model could be skewed by
the underlying set of traces.

4.2 Practical Trace Collection Approaches


Depending on the system we seek to reverse-engineer, the challenge of col-
lecting a sufficient set of traces is potentially achievable, despite the problems
discussed previously. Two techniques that can be used are briefly presented
here. The first assumes that the subject system is an isolated component,
where we only have a vague idea of how the input parameters affect pro-
gram behavior. The second assumes that the subject system is a component
embedded within a wider system, where its usage is governed by its sur-
rounding components.

4.2.1 Selecting Inputs by the Category Partition Method


The Category-Partition (CP) method [41] is a popular test set generation
technique. The idea is that we have some software system, that accepts as
input a set of parameters, each of which has a certain bearing on the output
produced by the system. The CP Method provides a framework within
which to select “significant” inputs that ought to collectively cover every
important facet of program behavior.

Table 1 Possible categories for push method.

Input          Categories
parameter o    Null, non-null object
lim            Negative, zero, positive
s.size()       Zero, positive

For each parameter, the first task is to identify its “categories.” These
are ranges of values that (are thought to) yield the same software behavior. So,
for example, let us suppose that we want to test the BoundedStack.push(Object)
method in isolation (shown in Fig. 1). The inputs that would
affect the behavior of the method are the parameter object (the obvious
input), as well as the current state of the stack—so the value of lim, and the
size of the underlying stack s.
Table 1 shows some possible categories for the push method. A category
represents a particular value or range of values that could (at least potentially)
individually affect the output of the program. So one might wish to explore
what would happen if a parameter is null, or some arbitrary object. The limit
lim is merely an integer; we are unaware of the bounds, so it makes sense to
see what happens if it is set to a zero, negative, or positive value. With s.size(),
given that s is an array that was instantiated elsewhere, one can surmise that
its size cannot be negative. Nonetheless, it still makes sense to establish what
happens when it is empty or populated.
Once the categories have been developed, they can be combined in a
systematic way to yield “test-frames”—combinations of inputs that system-
atically exercise the input-space. For the categories in Table 1 this would
yield 2 × 3 × 2 = 12 different test frames, which capture every combination
of categories. The test set that is obtained is shown in Table 2.
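
The combination step amounts to a Cartesian product over the categories of
each parameter. A minimal Java sketch is given below; the TestFrames class and
the string labels used for the categories are assumptions made for this
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TestFrames {
    /** Returns every combination that picks one category from each parameter, in order. */
    static List<List<String>> combine(List<List<String>> categoriesPerParameter) {
        List<List<String>> frames = new ArrayList<>();
        frames.add(new ArrayList<String>());             // start with a single empty frame
        for (List<String> categories : categoriesPerParameter) {
            List<List<String>> extended = new ArrayList<>();
            for (List<String> frame : frames) {
                for (String category : categories) {
                    List<String> next = new ArrayList<>(frame);
                    next.add(category);                   // extend the frame by one choice
                    extended.add(next);
                }
            }
            frames = extended;
        }
        return frames;
    }

    public static void main(String[] args) {
        List<List<String>> frames = combine(Arrays.asList(
            Arrays.asList("o=null", "o=non-null"),
            Arrays.asList("lim<0", "lim=0", "lim>0"),
            Arrays.asList("size=0", "size>0")));
        for (List<String> frame : frames) System.out.println(frame);
    }
}
```

The number of frames is the product of the category counts, which is why
adding a parameter or a category multiplies the number of executions that have
to be traced.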
Depending on the number of input parameters, and the number of cate-
gories per parameter, systematically generating all possible combinations can
result in too many combinations to execute exhaustively. In the context of
testing, the problem generally comes down to time; too many test cases mean
that testing will take too long. However, in the context of dynamic analysis
there is an additional problem of space; each execution is being recorded
as a data-trace which, depending on the contents of the trace, can rapidly
become intractable.

Table 2 Final input combinations based on categories in Table 1.

lim         o           s.size()
Negative    Null        Zero
Negative    Not null    Zero
Negative    Null        Positive
Negative    Not null    Positive
Positive    Null        Zero
Positive    Not null    Zero
Positive    Null        Positive
Positive    Not null    Positive
Zero        Null        Zero
Zero        Not null    Zero
Zero        Null        Positive
Zero        Not null    Positive

One way to attenuate this problem is to prioritize the test cases. Certain
category combinations will be more critical than others—some may elicit a
greater variance of program behavior than others. Therefore, if it is going to
be impossible to exhaustively execute the test cases, it makes sense to
concentrate on the most important ones or, if possible, to order the combinations
according to some order of importance, so that the key combinations are
guaranteed to be executed.

4.2.2 Selecting Inputs by Exercising the Surrounding System


The category-partition approach makes the big assumption that there is a
sufficient amount of a priori knowledge about the system to identify the
key input categories. Although this is perhaps a reasonable presumption in
the context of software testing, it is improbable that this is always the case in
a reverse-engineering scenario. The system is being reverse-engineered pre-
cisely because of a lack of detailed knowledge about the system in question.
In this situation, the reverse-engineer is seemingly reduced to attempting
arbitrary inputs. However, using such an input selection strategy is unlikely
to yield a representative set of traces. Any purely random test-input
technique would require an enormous number of tests to produce a truly
representative sample of behavior, which would far exceed the time and
space limits that any practical reverse-engineer operates in (see point (2) on
random input selection below).
There is however often an alternative. Software components rarely operate
in isolation. They tend to operate within a broader software framework.
If the software we seek to reverse-engineer is perhaps an abstract data type
or a utility class, the reverse-engineer is furnished with a valuable additional
source of data: its typical runtime usage.
Obtaining runtime data in this way is straightforward. The component
can simply be left within its original environment. Its behavior can then be
traced while controlling the inputs to its host system. Whereas identifying
the necessary inputs to an isolated, anonymous low-level component can be
difficult, identifying inputs to the host system (e.g., via a GUI [20] or a set
of user-level test cases) can (1) be much more straightforward and (2)
lead to a set of relatively representative traces.
It is important to note that gathering trace-data in this way places an
important caveat on any interpretations that can be drawn from inferred
models. In giving up control over the component-level inputs, any
executions of the component are highly specific to the broader framework
within which they operate. This naturally means that any models that are
inferred from the traces are similarly biased.

4.2.3 Random Input Selection


As a last resort, in the absence of any prior knowledge about the categories
of inputs or the context in which the system is being used, it is possible to
resort to the random selection of inputs. The approach is to be used as a last
resort because of two big problems:
1. Generating random inputs: For any system that takes non-trivial input
parameter types (e.g., complex objects, sequences of button-clicks, etc.)
there is no accepted basis for producing inputs that are truly random [22].
This means that there is a danger that the traces will be biased toward those
aspects of behavior that happen to be the easiest to execute, potentially
missing out important facets of behavior that are triggered by more subtle
input combinations.
2. Number of inputs required: Even if it is possible to obtain a “truly”
random input generation procedure, it usually takes a vast number of
random test cases to exercise the full range of software behavior even
approximately. Results from
the field of Machine Learning indicate that the number of random tests
required is exponential with respect to the complexity of the system (in
terms of the state-space) [32].
Nonetheless, there are upsides to adopting this technique. There are
numerous automated quasi-random input-generation techniques that will
rapidly generate large volumes of random tests. At the time of writing, one
of the most popular implementations is the Randoop tool [42] (which is
vulnerable to the first problem mentioned above). This will, for a given
Java class, generate extensive numbers of inputs (in the form of JUnit tests),
regardless of the complexity of the input parameter objects.
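
The tendency of random inputs to miss subtle combinations can be illustrated
with a small Java sketch. It is unrelated to Randoop and makes several
assumptions for the sake of the example: the inputs to push are reduced to a
pair (lim, size), both are drawn uniformly with java.util.Random, and the
“stack full” case is taken to be size == lim.

```java
import java.util.Random;

final class RandomPushInputs {
    /** Counts how many of n random (lim, size) pairs hit the "stack full" case size == lim. */
    static int fullStackHits(int n, int maxLim, long seed) {
        Random random = new Random(seed);              // fixed seed keeps runs reproducible
        int hits = 0;
        for (int i = 0; i < n; i++) {
            int lim = random.nextInt(maxLim + 1);      // uniform in 0..maxLim
            int size = random.nextInt(lim + 1);        // uniform in 0..lim
            if (size == lim) hits++;
        }
        return hits;
    }
}
```

With a large maxLim, only a small proportion of the generated pairs would be
expected to reach the boundary case, so the behavior of push on a full stack
is likely to be under-represented in the resulting traces.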

4.3 Data Function Inference


Data function inference aims to infer data-models from traces. As discussed
in Section 1, these generally include pre-/post-conditions for functions, and
invariants (for objects or loops, etc.). Given a trace, the task is generally to
derive a separate specification for each unique point in the code p that is
recorded in the trace (i.e., for each method-entry that appears in the trace).
The trace is first processed to map each location in the code to the set of
data configurations that have been observed at that location. This produces a
one-to-many mapping. Subsequently, for each point, an inference algorithm
is used to identify a data-model that fits the data.
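
The pre-processing step that produces this one-to-many mapping can be sketched
as follows in Java. The PointIndex and Obs names are hypothetical, and each
data configuration is simply represented as a map from variable names to
values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PointIndex {
    /** Hypothetical trace element: a code point plus the variable values seen there. */
    static final class Obs {
        final String point;
        final Map<String, Object> data;

        Obs(String point, Map<String, Object> data) {
            this.point = point;
            this.data = data;
        }
    }

    /** Builds the mapping from each code point to all data configurations observed at it. */
    static Map<String, List<Map<String, Object>>> groupByPoint(List<Obs> trace) {
        Map<String, List<Map<String, Object>>> byPoint = new LinkedHashMap<>();
        for (Obs obs : trace) {
            // Create the list for a point the first time it is seen, then append to it.
            byPoint.computeIfAbsent(obs.point, p -> new ArrayList<>()).add(obs.data);
        }
        return byPoint;
    }
}
```

Each entry of the resulting map is the input to one run of the inference
algorithm, so one data-model is produced per code point.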
There are numerous inference algorithms that can be used to identify
useful data functions from multiple data observations. These include numerous
general-purpose data-mining algorithms, as well as techniques that are
especially tailored to infer models from software traces. Examples of both
are shown below.

4.3.1 Using General-Purpose Data Model Inference Algorithms


Perhaps the most straightforward way to infer a data model from traces is
to employ general-purpose data mining algorithms for the task. There are
hundreds of algorithms [38, 21]. These vary in several dimensions—from
their scalability, to their ability to cope with noise, and the readability of
the models they infer. Determining which algorithm is best for a particular
type of software system or trace is beyond the scope of this chapter. This
subsection provides an illustrative example of how a generic data mining
algorithm can be applied, to provide an intuition of how such algorithms
can be employed in general.
Let us suppose that we want to reverse-engineer the rule that governs
the behavior of the push function in the BoundedStack class in Fig. 1.
We begin by running a few executions to trace the relevant variable values
in the push method throughout. This gives rise to the data in Fig. 18. At
this point the dynamic-analysis reverse-engineering challenge is clear. How
can we obtain from the data in Fig. 18 a model that describes the general
behavior of the push method?
There are numerous general-purpose Data-Mining/Machine Learning
algorithms that can be used to analyze such data [38]. For the sake of illus-
tration we use the WEKA data-mining package [21] to analyze the trace. It

o       lim    s.size    return
obj1    5      0         TRUE
obj2    5      1         TRUE
obj3    5      2         TRUE
obj4    5      3         TRUE
obj5    5      4         TRUE
obj6    5      2         TRUE
obj7    5      3         TRUE
obj8    5      4         TRUE
obj9    5      5         FALSE
obj1    2      0         TRUE
obj2    2      1         TRUE
obj3    2      2         FALSE
obj4    2      2         FALSE
obj5    2      2         FALSE
obj6    2      0         TRUE
obj7    2      1         TRUE
obj8    2      2         FALSE
obj9    2      2         FALSE
obj1    100    0         TRUE
obj2    100    1         TRUE
obj3    100    2         TRUE
obj4    100    3         TRUE
obj5    100    4         TRUE
obj6    100    2         TRUE
obj7    100    3         TRUE
obj8    100    4         TRUE
obj9    100    5         TRUE

Fig. 18. Sample trace for the push method, containing 27 trace elements.

contains numerous data mining algorithm implementations, which can be
used to derive models from data of the sort shown in Fig. 18. The sheer
variety of algorithms is (to an extent) problematic; there are few guidelines
about which type of algorithm suits a given set of data characteristics. In
our case, we choose to use the NNge (Non-Nested generalized exemplars)
algorithm [37], which has been anecdotally shown to work well for examples
with variables of different types.
The rules that are produced by the NNge algorithm are shown in Fig. 19.
They hypothesize how the values of o, lim, and s.size affect the outcome

class TRUE IF : o in {obj9} ^ lim=100.0 ^ s.size=5.0 (1)


class TRUE IF : o in {obj3} ^ 5.0<=lim<=100.0 ^ s.size=2.0 (2)
class TRUE IF : o in {obj4} ^ 5.0<=lim<=100.0 ^ s.size=3.0 (2)
class TRUE IF : o in {obj1,obj2,obj6,obj7} ^ 2.0<=lim<=100.0 ^ 0.0<=s.size<=3.0 (12)
class TRUE IF : o in {obj5} ^ 5.0<=lim<=100.0 ^ s.size=4.0 (2)
class TRUE IF : o in {obj8} ^ 5.0<=lim<=100.0 ^ s.size=4.0 (2)
class FALSE IF : o in {obj9} ^ 2.0<=lim<=5.0 ^ 2.0<=s.size<=5.0 (2)
class FALSE IF : o in {obj3,obj4,obj5,obj8} ^ lim=2.0 ^ s.size=2.0 (4)

Fig. 19. Output from the NNge Algorithm—the numbers in parentheses after each rule
indicate the number of trace observations that support a given rule.

(true or false, representing whether an object has been pushed onto the
stack or not). For example, if o = “obj3”, 5 ≤ lim ≤ 100, and s.size = 2,
the element will be pushed. However, if o = “obj3”, lim = 2, and s.size = 2,
it will not.
Looking at the model, it becomes clear that some of the rules seem
inaccurate. They might pertain to a particular execution, but not be represen-
tative of the program behavior in general. For example, we can tell from the
source code that the actual values of o do not affect the behavior of the push
method. That is, unless o=null, but this is not captured by any of the traces.
We can also tell from the source code that the rule we are after is much
simpler than the one given here; an element is added if lim > s.size.
However, these rules fail to capture this explicit relationship between the
variables, instead focusing on the individual relative value ranges.
As discussed above, the problem of generality highlights a fundamental
problem in dynamic analysis. If the given set of traces fails to highlight an
element of program behavior (e.g., what happens when o = null), then
this cannot be taken into account by any hypothesis model. If the set of
traces is too small, there may simply not be enough evidence for a particular
behavioral feature to factor in the model (e.g., that the value of o does not
really matter).
It is also important to bear in mind that we are only able to tell that the
model is inaccurate because we have access to the source code, and can derive
this rule because the source code in question is very simple. Of course, in
reality, the source code that produces the trace could be inaccessible (e.g.,
belonging to a closed-source library), or might simply be too complex to
readily inspect and understand in this way. Thus, treating the system as a
black-box, and making no presumptions about the roles of different variables, given
these traces, this model would seem to be a reasonable guess at the rules.
The second criticism (the omission of the seemingly obvious relationship
between lim and s.size) points toward a more general characteristic of
data mining algorithms. They do not assume that the data was produced by
some computational process. The systems for which they were developed
might produce noisy meteorological data, or arbitrary heights and genders of
shoppers in a supermarket, or share-prices in a recession. They do not tend to
look for explicit, discrete relationships between variables, but instead focus on
statistical, probabilistically justified correlations and clusterings, which tend
to produce models that seem counterintuitive when the system in question
is a discrete logical software system.

4.3.2 Inferring Pre-Conditions, Post-Conditions, and Invariants


This leads us onto a slightly different class of inference techniques—purpose-
built software reverse-engineering tools. Unlike generic data-mining and
machine learning tools, these make the explicit assumption that groups of
variables will either feed into or be the product of some software system.
The most popular tool in this field is Ernst et al.’s Daikon tool [15], which
was published over a decade ago. This puts explicit functional relationships
between variables at its heart. The tool is equipped with a set of standard
rules that might be of interest to a software developer, such as x > y, or
a + b + c ≤ d, etc. (Daikon contains about 50 of these relationships).
Such rules are akin to the types of rules described in Section 2.1.1. As
such, techniques such as Daikon can be highly effective at reverse-engineering
pre-/post-conditions, or invariants that hold for objects or classes. In the
original paper, Ernst et al. argued that the inferred rules could be embedded
into the source code as assertions, to ensure that changes do not lead to
unintended changes in behavior.
Given a set of traces, Daikon uses its internal list of rules as a checklist,
and identifies all of the rules that can fit for a given observation point in
the trace. If one rule is a more specific variant of another, it will choose the
most specific variant, so that the user is given the most accurate possible set
of rules, and is not overloaded with rules that are superfluous.
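
The checklist idea can be illustrated with a toy Java sketch. It is not
Daikon's implementation or API: the five relations, the IMPLIES table used to
discard weaker rules, and the restriction to a single pair of integer
variables (x, y) are all simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

final class InvariantChecklist {
    /** Candidate relations between two integer variables x and y (a tiny checklist). */
    private static final Map<String, BiPredicate<Integer, Integer>> TEMPLATES = new LinkedHashMap<>();
    /** For each relation, the weaker relations that it implies. */
    private static final Map<String, List<String>> IMPLIES = new LinkedHashMap<>();
    static {
        TEMPLATES.put("x > y", (x, y) -> x > y);
        TEMPLATES.put("x == y", (x, y) -> x.intValue() == y.intValue());
        TEMPLATES.put("x < y", (x, y) -> x < y);
        TEMPLATES.put("x >= y", (x, y) -> x >= y);
        TEMPLATES.put("x <= y", (x, y) -> x <= y);
        IMPLIES.put("x > y", Arrays.asList("x >= y"));
        IMPLIES.put("x < y", Arrays.asList("x <= y"));
        IMPLIES.put("x == y", Arrays.asList("x >= y", "x <= y"));
    }

    /** Returns the relations that hold for every (x, y) pair, minus those implied by a stronger one. */
    static List<String> infer(int[][] observations) {
        List<String> holding = new ArrayList<>();
        for (Map.Entry<String, BiPredicate<Integer, Integer>> t : TEMPLATES.entrySet()) {
            boolean holds = true;
            for (int[] obs : observations) {
                if (!t.getValue().test(obs[0], obs[1])) { holds = false; break; }
            }
            if (holds) holding.add(t.getKey());
        }
        List<String> result = new ArrayList<>(holding);
        for (String rule : holding) {
            result.removeAll(IMPLIES.getOrDefault(rule, Collections.<String>emptyList()));
        }
        return result;
    }
}
```

Taking x = lim and y = s.size, the observations in Fig. 18 for which push
returns true leave only x > y. The observations for which it returns false
leave x == y rather than the more general x <= y, because every failed push in
that sample happens to occur on an exactly full stack; the result is only as
general as the traces allow.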
The tool is demonstrated with respect to our BoundedStack exam-
ple in Fig. 1. The output produced by Daikon is shown in Fig. 20. The
type of analysis offered by Daikon is more tailored to source code con-
structs than the more generic Machine Learning analysis discussed pre-
viously. It distinguishes between distinct exit points in the program, and
links up the entry and exit conditions. Thus, the statement this.lim ==
orig(this.lim) implies that the value of lim remains the same at the
entry and exit points. The rules are generally very simple in nature, picking
out pertinent inter-variable relationships that remain constant for particular
sets of executions.

===========================================================================
source.BoundedStack.push(java.lang.Object):::ENTER
o != null
o.getClass() == java.lang.String.class
===========================================================================
source.BoundedStack.push(java.lang.Object):::EXIT17
return == true
===========================================================================
source.BoundedStack.push(java.lang.Object):::EXIT17;condition="return == true"
===========================================================================
source.BoundedStack.push(java.lang.Object):::EXIT21
this.lim one of { 2, 5 }
return == false
===========================================================================
source.BoundedStack.push(java.lang.Object):::EXIT21;condition="not(return == true)"
===========================================================================
source.BoundedStack.push(java.lang.Object):::EXIT
this.lim == orig(this.lim)
this.s == orig(this.s)
(return == false) ==> (this.lim one of { 2, 5 })
(return == true) ==> (this.lim one of { 2, 5, 100 })
===========================================================================
source.BoundedStack.push(java.lang.Object):::EXIT;condition="return == true"
return == true
===========================================================================
source.BoundedStack.push(java.lang.Object):::EXIT;condition="not(return == true)"
this.lim one of { 2, 5 }
return == false
===========================================================================

Fig. 20. Daikon output for the push method, using the trace from Fig. 18. The points
EXIT17 and EXIT21 represent the two individual return statements. The EXIT point covers
both of these return points.

Daikon has proven itself to be a valuable tool for a range of programming
activities (notably detecting conditions that can be hard-coded as
assertions to prevent program deterioration). There are, however, downsides
too. For one, a rule will only be suggested if it belongs to the list of
provided rules. The need to contain every rule that might arise means that
the rules have to be relatively simple and generic, restricted to
elementary relationships between small numbers of variables. Although such
rules are often useful, this restriction means that Daikon can miss
important, more complex rules.
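The template-driven nature of this kind of inference can be illustrated with a
minimal sketch. It is not Daikon's implementation: the three templates, the
function name infer_invariants, the MAX_ONE_OF threshold and the sample values
are all assumptions made for illustration. Each template is instantiated over
the observed variables and kept only if it holds for every sample recorded at
one program point.

```python
from itertools import combinations

MAX_ONE_OF = 3  # assumption: report "one of" only for small value sets


def infer_invariants(samples):
    """Return the template instances that hold for every sample.

    samples: list of dicts mapping variable names to the values observed
    at a single program point (hypothetical data, not real Daikon traces).
    """
    variables = sorted(samples[0])
    invariants = []

    # Template 1: x == y, for every pair of variables
    for a, b in combinations(variables, 2):
        if all(s[a] == s[b] for s in samples):
            invariants.append(f"{a} == {b}")

    # Templates 2 and 3: x == constant, or x one of { small set of values }
    for v in variables:
        values = sorted({s[v] for s in samples}, key=repr)
        listing = ", ".join(str(x) for x in values)
        if len(values) == 1:
            invariants.append(f"{v} == {listing}")
        elif len(values) <= MAX_ONE_OF:
            invariants.append(f"{v} one of {{ {listing} }}")

    return invariants


# Hypothetical samples at an exit point of push where the call returned false.
samples = [
    {"this.lim": 2, "orig(this.lim)": 2, "return": False},
    {"this.lim": 5, "orig(this.lim)": 5, "return": False},
]
found = infer_invariants(samples)
for rule in found:
    print(rule)
```

Because only these templates are tried, a relationship outside the list (for
example, one variable always being twice another) would never be reported,
however consistently it held in the samples.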

4.4 State Machine Inference


State machines make the sequential behavior of a system explicit. A large
number of algorithms have been developed to infer such specifications from
traces. The first such approach (which was published 40 years ago [3]) is the
k-tails approach, which forms the basis for most approaches that are popular
today.

Without going into details, state machine inference algorithms tend to
operate by a process known as “state merging.” The set of traces is first
arranged into a “prefix tree”⁵, where paths from the root to a branch represent
shared prefixes, and bifurcations represent points at which the traces differ.
This in itself represents a state machine, albeit one that exactly mirrors the
sequences of events embodied in the traces. The state-merging challenge is
to identify those states in the tree that could, in fact, be equivalent to each
other, and to merge them. Each merge results in a state machine that may
contain loops, and may represent a broader range of sequences of events that
better captures the possible behavior of the underlying software system.
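The construction of the prefix tree can be sketched as follows. This is an
illustrative sketch only: the function name build_prefix_tree, the integer
state numbering and the example traces (including the pop event) are
assumptions, not taken from the chapter.

```python
def build_prefix_tree(traces):
    """Arrange traces into a prefix tree.

    States are integers (0 is the root). The result maps
    (state, event) -> next state, so shared prefixes share states and
    a new state is created only where a trace departs from the others.
    """
    transitions = {}
    next_state = 1
    for trace in traces:
        state = 0
        for event in trace:
            key = (state, event)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
    return transitions


# Hypothetical traces of calls on a stack-like object.
traces = [
    ["push", "push", "pop"],
    ["push", "pop"],
    ["push", "push", "push"],
]
tree = build_prefix_tree(traces)
for (source, event), target in sorted(tree.items()):
    print(source, event, target)
```

All three traces share the initial push, so they share the transition out of
the root; the tree branches only at the second and third events.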
Figure 21 illustrates the state-merging procedure with respect to a set
of traces from the BoundedStack system. It starts from the prefix tree
(shown at the top), and then iteratively merges pairs of states until it arrives
at the final state machine (shown at the bottom). The details of specific
algorithms are beyond the scope of this chapter⁶; it is enough to know that
all algorithms operate on the same premise of merging pairs of states with
similar subsequent behavior.
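The premise of merging states with similar subsequent behavior can be sketched
with a simplified variant of the k-tails idea mentioned earlier. This is not
the Blue-Fringe EDSM algorithm used to produce Fig. 21, and the function names,
the choice of k and the example traces are assumptions. Two states are treated
as equivalent when the sets of event sequences of length at most k that follow
them are identical.

```python
from collections import defaultdict


def build_prefix_tree(traces):
    """Map (state, event) -> next state; state 0 is the root."""
    transitions = {}
    next_state = 1
    for trace in traces:
        state = 0
        for event in trace:
            key = (state, event)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
    return transitions


def k_tails(transitions, state, k):
    """Set of event sequences of length <= k that can follow `state`."""
    tails = {()}
    if k > 0:
        for (source, event), target in transitions.items():
            if source == state:
                for tail in k_tails(transitions, target, k - 1):
                    tails.add((event,) + tail)
    return frozenset(tails)


def merge_by_k_tails(transitions, k):
    """Merge all states whose k-tails coincide.

    Returns (state, event) -> set of target states; the merged machine
    may contain loops and may be nondeterministic.
    """
    states = {0} | set(transitions.values())
    representative = {}  # k-tail set -> lowest-numbered state with that set
    block_of = {}
    for state in sorted(states):
        tails = k_tails(transitions, state, k)
        block_of[state] = representative.setdefault(tails, state)
    merged = defaultdict(set)
    for (source, event), target in transitions.items():
        merged[(block_of[source], event)].add(block_of[target])
    return dict(merged)


# Hypothetical traces; "pop" is an assumed event name.
traces = [["push", "pop", "push", "pop"], ["push", "pop"]]
merged = merge_by_k_tails(build_prefix_tree(traces), k=1)
print(merged)
```

With k = 1, the states reached after each pop (other than the final one) look
the same as the root, so the linear tree folds into a push/pop loop. The final
state has no outgoing events and stays separate, which is why the pop
transition ends up with two possible targets.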
Recently, several approaches have been developed to infer more complex
state machines than simple FSMs. A recent technique by Lorenzoli et al. [36]
shows how this technique can be combined with Daikon to produce fully
fledged EFSMs (see Section 2.1.3). Here, the Daikon specifications can act
as guards to specify when a transition can be executed.

4.5 Limitations of Dynamic Analysis


The limitations of dynamic analysis are in many ways the converse of those of
static analysis [14]. The accuracy of the results is entirely dependent on the
provision of a sufficiently “representative” set of traces. If the set of traces
is a poor sample, the result will be a behavioral model that only captures a
specific facet of program behavior.
This risk has to be weighed up against the intrinsic expense of collecting
and processing traces. Traces tend to use a large amount of storage, and
become cumbersome to process. Thus, the essential challenge is to attempt
to find a compromise, where the set of traces is sufficiently diverse without
becoming too unwieldy.

⁵ Also referred to as a “trie” in the Information Retrieval community.
⁶ The algorithm used here was an implementation of the Blue-Fringe EDSM inference
algorithm [31].

Fig. 21. Set of traces arranged into a prefix-tree at the top, and the final “merged” state
machine on the bottom.

Depending on the type of program, this might simply be impossible. If
its range of behavior is broad with lots of complex interactions between
variables and system events, the accurate inference of a model might simply
require more traces than can be obtained and processed in a tractable amount
of time. Alongside the space constraints, it is also necessary to factor in the
amount of human effort that can be required to collect such a set of traces;
merely identifying the requisite set of inputs to trigger the necessary program
executions can be prohibitively time consuming.
In practice, this limitation is well understood. It is generally accepted
that the outputs from dynamic analysis techniques such as Daikon should
(at least at first) be interpreted accordingly: as a reflection of the inputs that
were used to derive them, not a reflection of the program itself. Depending
on the developer’s familiarity with the program and confidence in the
representativeness of the traces, the results of dynamic analysis can at best be
used to corroborate or challenge hypotheses about program behavior. Without any
strong convictions about the representativeness of the set of traces,
reverse-engineered models suffer the same problems as those produced by static
analysis; they cannot be treated as exact or authoritative.

5. EVALUATING REVERSE-ENGINEERED MODELS


Having surveyed some of the key static and dynamic reverse-engineering
techniques and their limitations, one could be excused for feeling
somewhat despondent. Numerous techniques exist to generate models,
but no guarantees can be made about their accuracy. What is the point in
reverse-engineering a model without being able to attribute to it some
measure of accuracy? This section surveys some of the approaches that can be
used to assess reverse-engineered models.
Essentially there are two approaches that can be used. The first approach
is to compare the internal syntax of the model against some reference. The
alternative is to ignore the internal syntax, and to treat the model as a black-
box, evaluating a model in terms of its behavior.
For the first situation, the choice of approach clearly depends on the type
of model. Several approaches have been developed to establish syntactical
differences between state machines and sequence diagrams [35, 53, 29, 43]. The
basic challenges and techniques for doing so will be covered in Section 5.1.
The second approach of only considering models in terms of their external
behavior is more established. Comparing models to actual behavior forms
the core of Model-Based Testing [33] and Machine Learning [38]. One popular
technique from the latter domain (k-fold cross-validation [28]) is presented
in Section 5.2.
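As a rough preview of the black-box idea, the following sketch applies k-fold
cross-validation to a set of traces. It rests on several assumptions: the
"inference algorithm" is a toy stand-in that records adjacent event pairs, the
score is simply the fraction of held-out traces the model accepts, and all
function names and traces are hypothetical.

```python
def k_fold_splits(items, k):
    """Yield (training, held_out) pairs; each item is held out exactly once.

    Assumes 1 < k <= len(items), so that no fold is empty.
    """
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, folds[i]


def infer_pair_model(traces):
    """Toy stand-in for model inference: the set of adjacent event pairs."""
    return {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}


def accepts(model, trace):
    """A trace is accepted if every adjacent event pair was seen in training."""
    return all(pair in model for pair in zip(trace, trace[1:]))


def cross_validate(traces, k):
    """Average, over the k folds, of the fraction of held-out traces accepted."""
    scores = []
    for training, held_out in k_fold_splits(traces, k):
        model = infer_pair_model(training)
        accepted = sum(accepts(model, trace) for trace in held_out)
        scores.append(accepted / len(held_out))
    return sum(scores) / k


# Hypothetical traces; "pop" is an assumed event name.
traces = [
    ["push", "pop"],
    ["push", "push", "pop"],
    ["push", "pop", "push"],
    ["push", "push", "pop", "pop"],
]
score = cross_validate(traces, k=2)
print(score)
```

A low score here says that models inferred from part of the traces fail to
predict the rest, which is one symptom of an unrepresentative sample. Only
valid traces are checked in this sketch; it does not measure whether the model
wrongly accepts invalid behavior.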
and occasionally glancing at me to see if I too took in the situation.
Although I did not yet know a word of their language, I could
understand perfectly what she was saying, and I never passed an
evening that gave me a better idea of family happiness, or greater
satisfaction. When I went up to my little room I seemed, somehow,
to have gotten into a world of reality and content: a new world.
I awaked in a new world—the one I had reached the night before:
the land of hope and content—and when I came down-stairs I was
as fresh as a shriven soul, and I walked out into the street with Dix
at my heel, as though I owned the earth.
The morning was as perfect as though God had just created light.
The sky was as blue and the atmosphere as clear as though the rain
that had fallen had washed away with the smoke all impurity
whatsoever, and scoured the floor of Heaven afresh.
Elsa, with her chequered skirt turned back and a white apron about
her comely figure, was singing as she polished the outer steps,
before going to her work in a box factory, and the sun was shining
upon her bare head with its smooth hair, and upon the little rose-
bush by the door, turning the rain-drops that still hung on it into
jewels. She stopped and petted Dix, who had followed me down-
stairs, and Dix, who, like his master, loved to be petted by a pretty
woman, laid back his ears and rubbed his head against her. And, an
hour later, a group of little muddy boys with their books in their
hands had been beguiled by a broad puddle on their way to school
and were wading in the mud and laughing over the spatters and
splotches they were getting on their clothes and ruddy faces. As I
watched them, one who had been squeezed out of the fun and
stood on the sidewalk looking on and laughing, suddenly seized with
fear or envy shouted that if they did "not come on, Mith Thelly
would keep them in"; and, stricken with a sudden panic, the whole
flock of little sand-pipers started off and ran as hard as their dumpy
legs would carry them around the corner. I seemed to be
emancipated.
I made my breakfast on a one-cent loaf of bread, taking a little
street which, even in that section, was a back street, to eat it in, and
for butter amused myself watching a lot of little children (among the
last of whom I recognized my muddy boys, who must have found
another puddle) lagging in at the door of a small old frame building,
which I knew must be their school, though I could not understand
why it should be in such a shanty when all the public schools I had
seen were the most palatial structures.
I took the trouble to go by that day and look at the house on the
corner. It was as sunny as ever. And when on my way back to my
office I passed Miss Leigh, the central figure of a group of fresh
looking girls, I felt that the half shy smile of recognition which she
gave me was a shaft of light to draw my hopes to something better
than I had known. Dix was with me, and he promptly picked out his
friend and received from her a greeting which, curiously enough,
raised my hopes out of all reason. I began to feel that the dog was a
link between us.
XIX
RE-ENTER PECK
It happened that the building in which I had taken an office bore a
somewhat questionable reputation. I had selected it because it was
cheap, and it was too late when I discovered its character. I had no
money to move. The lawyers in it were a nondescript lot—criminal
practitioners, straw-bail givers, haunters of police courts, etc.; and
the other occupants were as bad—adventurers with wild-cat
schemes, ticket-scalpers, cranks, visionaries with fads, frauds,
gamblers, and thieves in one field or another, with doubtless a good
sprinkling of honest men among them.
It was an old building and rather out of the line of the best growth
of the city, but in a convenient and crowded section. The lower floor
was occupied with bucket-shops and ticket-scalpers' offices, on the
street; and at the back, in a sort of annex on an alley, was a saloon
known as Mick Raffity's; the owner being a solid, double-jointed son
of Erin, with blue eyes as keen as tacks; and over this saloon was
the gambling house where I had been saved by finding Pushkin.
On the second floor, the best offices were a suite occupied by a
lawyer named McSheen, a person of considerable distinction, after
its own kind, as was the shark created with other fish of the sea
after its kind: a lawyer of unusual shrewdness, a keen political boss,
and a successful business man. I had, as happened, rented a cubby-
hole looking out on a narrow well opposite the rear room of his
suite.
Collis McSheen was a large, brawny man, with a broad face, a big
nose, blue eyes, grizzled black hair, a tight mouth and a coarse fist.
He would have turned the scales at two hundred, and he walked
with a step as light as a sick-nurse's. The first time I ever saw him
was when I ran into him suddenly in a winding, unswept back
stairway that came down on an alley from the floor below mine and
was used mainly by those in a hurry, and I was conscious even in
the dim light that he gave me a look of great keenness. As he
appeared in a hurry I gave way to him, with a "Beg pardon" for my
unintentional jostle, to which he made no reply except a grunt. I,
however, took a good look at him as he passed along under a street
lamp, with his firm yet noiseless step—as noiseless as a cat's—and
the heavy neck and bulk gave me a sense of his brute strength,
which I never lost afterward. I soon came to know that he was a
successful jury-lawyer with a gift of eloquence, and a knack of
insinuation, and that he was among the most potent of the political
bosses of the city, with a power of manipulation unequalled by any
politician in the community. He had good manners and a ready
smile. He was the attorney or legal agent for a number of wealthy
concerns, among them the Argand estate, and had amassed a
fortune. He was also "the legal adviser" of one of the afternoon
papers, the Trumpet, in which, as I learned later, he held, though it
was not generally known, a large and potent interest. He was now
looming up as the chief candidate of the popular party for Mayor, an
office which he expected to secure a few months later. He was
interested in a part of the street-car system of the city, that part in
which "the Argand estate" held the controlling interest, and which
was, to some extent, the rival system of that known as the "West
Line," in which Mr. Leigh held a large interest. I mention these facts
because, detached as they appear, they have a strong bearing on my
subsequent relation to McSheen, and a certain bearing on my whole
future. But, on occasion he was as ready for his own purposes to
attack these interests secretly as those opposed to them. He always
played his own hand. To quote Kalender "he was deep."
My first real meeting with him gave me an impression of him which I
was never able to divest myself of. I was in my little dark cupboard
of an office very lonely and reading hard to keep my mind occupied
with some other subject than myself, when the door half opened
quietly, with or without a preliminary knock, I never could tell which,
and a large man insinuated himself in at it and, after one keen look,
smiled at me. I recalled afterward how catlike his entrance was. But
at the moment I was occupied in gauging him. Still smiling he
moved noiselessly around and took his stand with his back to the
one window.
"You are Mr. Glave?" he smiled. "Glad to see you?" He had not quite
gotten rid of the interrogation.
I expressed my appreciation of his good-will and with, I felt, even
more sincerity than his; for I was glad to see any one.
"Always pleased to see young lawyers—specially bright ones." Here I
smiled with pleasure that he should so admirably have "sized me
up," as the saying goes.
"You are a lawyer also?" I hazarded.
"Yes. Yes. I see you are studious. I always like that in a young man
—gives him breadth—scope."
I assented and explained that I had been in politics a little also, all of
which he appeared to think in my favor. And so it went on till he
knew nearly all about me. In fact, I became quite communicative. It
had been so long since I had had a lawyer to talk with. I found him
to be a remarkably well-informed man, and with agreeable, rather
insinuating manners. He knew something of books too, and he
made, I could not tell whether consciously or unconsciously, a
number of literary allusions. One of them I recall. It was a Spanish
proverb, he said: "The judge is a big man, but give your presents to
the clerk."
"Well, you'll do well here if you start right. The tortoise beats the
hare, you know—every time—every time."
I started, so apt was the allusion. I wondered if he could ever have
known Peck.
"Yes, I know that. That's what I mean to do," I said.
"Get in with the right sort of folks, then when there's any sweeping
done you'll be on the side of the handle." He was moving around
toward the door and was looking out of the window reflecting.
"I have a letter to a gentleman named Leigh," I said. "I have not yet
presented it."
"Ah!"
I turned and glanced at him casually and was struck with the
singular change that had come over his face. It was as if he had
suddenly drawn a fine mask over it. His eyes were calmly fixed on
me, yet I could hardly have said that they saw me. His countenance
was absolutely expressionless. I have seen the same detached look
in a big cat's eyes as he gazed through his bars and through the
crowd before him to the far jungle, ocean spaces away. It gave me a
sudden shiver and I may have shown that I was startled, but, as I
looked, the mask disappeared before my eyes and he was smiling as
before.
"Got a pretty daughter?" he said with a manner which offended me,
I could hardly tell why.
"I believe so; but I do not know her." I was angry with myself for
blushing, and it was plain that he saw it and did not believe me.
"You know a man 't calls himself Count Pushkin?"
"Yes, I know him."
"He knows her and she knows him."
"Does she? I know nothing about that."
"Kind o' makin' a set for him, they say?"
"Is she? I hardly think it likely, if she knows him," I said coldly. I
wondered with what malignant intuition he had read my thoughts.
"Oh! A good many people do that. They like the sound. It gives 'em
power."
"Power!"
"Yes. Power's a pretty good thing to have. You can—" He looked out
of the window and licked his lips in a sort of reverie. He suddenly
opened and closed his hand with a gesture of crushing. "Power and
money go togither?" And still smiling, with a farewell nod, he
noiselessly withdrew and closed the door.
When he was gone I was conscious of a feeling of intense relief, and
also of intense antagonism—a feeling I had never had for but one
man before—Peck: a feeling which I never got rid of.
One evening a little later I missed Dix. He usually came home even
when he strayed off, which was not often, unless as happened he
went with Elsa, for whom he had conceived a great fondness, and
who loved and petted him in return. It had come to be a great bond
between the girl and me, and I think the whole family liked me the
better for the dog's love of the daughter. But this evening he did not
appear; I knew he was not with Elsa, for I remembered he had been
in my office during the afternoon, and in consequence I spent an
unhappy night. All sorts of visions floated before my mind, from the
prize-ring to the vivisection table. I rather inclined to the former; for
I knew his powerful chest and loins and his scarred shoulders would
commend him to the fancy. I thought I remembered that he had
gone out of my office just before I left and had gone down the steps
which led to the alley I have mentioned. This he sometimes did. I
recalled that I was thinking of Miss Eleanor Leigh and had not seen
or thought of him between the office and my home.
I was so disturbed about him by bedtime that I went out to hunt for
him and returned to my office by the same street I had walked
through in the afternoon. When I reached the building in which my
office was, I turned into the alley I have mentioned and went up the
back stairway. It was now after midnight and it was as black as
pitch. When I reached my office, thinking that I might by a bare
possibility have locked him in, I opened the door and walked in,
closing it softly behind me. The window looked out on the well left
for light and air, and was open, and as I opened the door a light was
reflected through the window on my wall. I stepped up to close the
window and, accidentally looking across the narrow well to see
where the light came from, discovered that it was in the back office
of Coll McSheen, in which were seated Mr. McSheen and the sour-
looking man I had seen on the train with the silk hat and the paste
diamond studs, and of all persons in the world, Peck! The name
Leigh caught my ear and I involuntarily stopped without being aware
that I was listening. As I looked the door opened and a man I
recognized as the janitor of the building entered and with him a
negro waiter, bearing two bottles of champagne and three glasses.
For a moment I felt as though I had been dreaming. For the negro
was Jeams. I saw the recognition between him and Peck, and
Jeams's white teeth shone as Peck talked about him. I heard him
say:
"No, suh, I don' know nuthin' 't all about him. Ise got to look out for
myself. Yes, suh, got a good place an' I'm gwine to keep it!"
He had opened the bottles and poured out the wine, and McSheen
gave him a note big enough to make him bow very low and thank
him volubly. When he had withdrawn Peck said:
"You've got to look out for that rascal. He's an awfully smart
scoundrel."
"Oh! I'll own him, body and soul," said McSheen.
"I wouldn't have him around me."
"Don't worry—he won't fool me. If he does—" He opened and closed
his fist with the gesture I had seen him use the first day he paid me
a visit.
"Well, let's to business," he said when they had drained their
glasses. He looked at the other men. "What do you say, Wringman?"
"You pay me the money and I'll bring the strike all right," said the
Labor-leader, "and I'll deliver the vote, too. In ten days there won't
be a wheel turning on his road. I'll order every man out that wears a
West Line cap or handles a West Line tool."
The "West Line"! This was what the street-car line was called which
ran out into the poor section of the city where I lived, which Mr.
Leigh controlled.
"That's all right. I'll keep my part. D——n him! I want to break him.
I'll show him who runs this town. With his d——d airs."
"That's it," said Peck, leaning forward. "It's your road or his. That's
the way I figure it." He rubbed his hands with satisfaction. "I am
with you, my friends. You can count on the Poole interest backing
you."
"You'll keep the police off?" said the Labor-leader.
"Will I? Watch 'em!" McSheen poured out another glass, and offered
the bottle to Peck, who declined it.
"Then it's all right. Well, you'd better make a cash payment down at
the start," said the Labor-leader.
McSheen swore. "Do you think I have a bank in my office, or am a
faro dealer, that I can put up a pile like that at midnight? Besides,
I've always heard there're two bad paymasters—the one that don't
pay at all and the one 't pays in advance. You deliver the goods."
"Oh! Come off," said the other. "If you ain't a faro dealer, you own a
bank—and you've a bar-keeper. Mick's got it down-stairs, if you ain't.
So put up, or you'll want money sure enough. I know what that
strike's worth to you."
McSheen rose and at that moment I became aware of the
impropriety of what I was doing, for I had been absolutely absorbed
watching Peck, and I moved back, as I did so, knocking over a chair.
At the sound the light was instantly extinguished and I left my office
and hurried down the stairs, wondering when the blow was to fall.
The afternoon following my surprise of the conference in McSheen's
back room, there was a knock at my door and Peck walked into my
office. I was surprised to see what a man-of-fashion air he had
donned. He appeared really glad to see me and was so cordial that I
almost forgot my first feeling of shame that he should find me in
such manifestly straitened circumstances, especially as he began to
talk vaguely of a large case he had come out to look after, and I
thought he was on the verge of asking me to represent his client.
"You know we own considerable interests out here both in the
surface lines and in the P. D. & B. D.," he said airily.
"No, I did not know you did. I remember that Mr. Poole once talked
to me about some outstanding interests in the P. D. & B. D., and I
made some little investigation at the time; I came to the conclusion
that his interest had lapsed; but he never employed me."
"Yes, that's a part of the interests I speak of. Mr. Poole is a very
careful man."
"Very. Well, you see I have learned my lesson. I have learned
economy, at least," I laughed in reply to his question of how I was
getting along in my new home. He took as he asked it an appraising
glance at the poor little office.
"A very important lesson to learn," he said sententiously. "I am glad
I learned it early." He was so smug that I could not help saying,
"You were always economical?"
"Yes, I hope so. I always mean to be. You get much work?"
"No, not much—yet; still, you know, I always had a knack of getting
business," I said. "My trouble was that I used to disdain small things
and I let others attend to them. I know better than that now. I don't
think I have any right to complain."
"Oh—I suppose you have to put in night work, too, then?" he added,
after a pause.
This then was the meaning of his call. He wished to know whether I
had seen him in Coll McSheen's office the night before. He had
delivered himself into my hands. So, I answered lightly.
"Oh! yes, sometimes."
I had led him up to the point and I knew now he was afraid to take
a step further. He sheered off.
"Well, tell me something," he said, "if you don't mind. Do you know
Mr. Leigh?"
"What Mr. Leigh?"
"Mr. Walter Leigh, the banker."
"I don't mind telling you at all that I do not."
"Oh!"
I thought he was going to offer me a case; but Peck was
economical. He already had one lawyer.
"I had a letter of introduction to him from Mr. Poole," I said. "But
you can say to Mr. Poole that I never presented it."
"Oh! Ah! Well—I'll tell him."
"Do."
"Do you know Mr. McSheen?"
I nodded "Yes."
"Do you know him well?"
"Does any one know him well?" I parried.
"He has an office in this building?"
I could not, for the life of me, tell whether this was an affirmation or
a question. So I merely nodded, which answered in either case. But
I was pining to say to him, "Peck, why don't you come out with it
and ask me plainly what I know of your conference the other night?"
However, I did not. I had learned to play a close game.
"Oh! I saw your nigger, Jeams—ah—the other day."
"Did you? Where is he?" I wanted to find him, and asked innocently
enough.
"Back at home."
"How is he getting on?"
"Pretty well, I believe. He's a big rascal."
"Yes, but a pleasant one, and an open one."
Peck suddenly rose, "Well, I must be going. I have an engagement
which I must keep." At the door he paused. "By the way, Mrs. Peck
begged to be remembered to you."
He had a way of blinking, like a terrapin—slowly. He did so now.
He did not mean his tone to be insolent—only to be insolent himself
—but it was.
"I'm very much obliged to her. Remember me to her."
That afternoon I strolled out, hoping to get a glimpse of Miss Leigh.
I did so, but Peck was riding in a carriage with her and her father. So
he won the last trick, after all. But the rubber was not over. I was
glad that they did not see me, and I returned to my office filled with
rage and determined to unmask Peck the first chance I should have,
not because he was a trickster and a liar, but because he was
applying his trickiness in the direction of Miss Leigh.
That night the weather changed and it turned off cold. I remember it
from a small circumstance. The wind appeared to me to have shifted
when Miss Leigh's carriage drove out of sight with Peck in it. I went
home and had bad dreams. What was Peck doing with the Leighs?
Could I have been mistaken in thinking he and McSheen had been
talking of Mr. Leigh in their conference? For some time there had
been trouble on the street-car lines of the city and a number of small
strikes had taken place on a system of lines running across the city
and to some extent in competition with the West Line, which Mr.
Leigh had an interest in. According to the press the West Line, which
ran out into a new section, was growing steadily while the other line
was falling back. Could it be that McSheen was endeavoring to
secure possession of the West Line? This, too, had been intimated,
and Canter, one of the richest men of the town, was said to be
behind him. What should I do under the circumstances? Would Peck
tell Miss Leigh any lies about me? All these suggestions pestered me
and, with the loss of Dix, kept me awake, so that next morning I
was in rather a bad humor.
In my walk through the poorer quarter on my way to my office I
used to see a great deal of the children, and it struck me that one of
the saddest effects of poverty—the dire poverty of the slum—was
the debasement of the children. Cruelty appears to be the natural
instinct of the young as they begin to gain in strength. But among
the well-to-do and the well-brought-up of all classes it is kept in
abeyance and is trained out. But in the class I speak of at a certain
age it appears to flower out into absolute brutality. It was the chief
drawback to my sojourn in this quarter, for I am very fond of
children, and the effect of poverty on the children was the saddest
part of my surroundings. To avoid the ruder element, I used to walk
of a morning through the little back street where I had discovered
that morning the little school for very small children, and I made the
acquaintance of a number of the children who attended the school.
One little girl in particular interested me. She was the poorest clad of
any, but her cheeks were like apples and her chubby wrists were the
worst chapped of all; and with her sometimes was a little crippled
girl, who walked with a crutch, whom she generally led by the hand
in the most motherly way, so small that it was a wonder how she
could walk, much more study.
My little girls and I got to that point of intimacy where they would
talk to me, and Dix had made friends with them and used to walk
beside them as we went along.
The older girl's first name was Janet, but she spoke with a lisp and I
could not make out her name with a certainty. Her father had been
out of work, she said, but now was a driver, and her teacher was
"Mith Thellen." The little cripple's name was "Sissy"—Sissy Talman.
This was all the information I could get out of her. "Mith Thellen"
was evidently her goddess.
On the cool, crisp morning after the turn in the weather, I started
out rather earlier than usual, intending to hunt for Dix and also to
look up Jeams. I bought a copy of the Trumpet and was astonished
to read an account of trouble among the employees of the West
Line, for I had not seen the least sign of it. The piece went on
further to intimate that Mr. Leigh had been much embarrassed by his
extension of his line out into a thinly populated district and that a
strike, which was quite sure to come, might prove very disastrous to
him. I somehow felt very angry at the reference to Mr. Leigh and
was furious with myself for having written for the Trumpet. I walked
around through the street where the school was, though without any
definite idea whatever, as it was too early for the children. As I
passed by the school the door was wide open and I stopped and
looked in. The fire was not yet made. The stove was open; the door
of the cellar, opening outside, was also open, and at the moment a
young woman—the teacher or some one else—was backing up the
steps out of the cellar lugging a heavy coal-scuttle. One hand, and a
very small one, was supporting her against the side of the wall,
helping her push herself up. I stepped forward with a vague pity for
any woman having to lift such a weight.
"Won't you let me help you?" I asked.
"Thank you, I believe I can manage it." And she pulled the scuttle to
the top, where she planted it, and turned with quite an air of
triumph. It was she! my young lady of the sunny house: Miss Leigh!
I had not recognized her at all. Her face was all aglow and her eyes
were filled with light at a difficulty overcome. I do not know what my
face showed; but unless it expressed conflicting emotions, it belied
my feelings. I was equally astonished, delighted and embarrassed. I
hastened to say something which might put her at her ease and at
the same time prove a plea for myself, and open the way to further
conversation.
"I was on my way to my law-office, and seeing a lady struggling with
so heavy a burden, I had hoped I might have the privilege of
assisting her as I should want any other gentleman to do to my
sister in a similar case." I meant if I had had a sister.
She thanked me calmly; in fact, very calmly.
"I do it every morning; but this morning, as it is the first cold
weather, I piled it a little too high; that is all." She looked toward the
door and made a movement.
I wanted to say I would gladly come and lift it for her every
morning; that I could carry all her burdens for her. But I was almost
afraid even to ask permission again to carry it that morning. As,
however, she had given me a peg, I seized it.
"Well, at least, let me carry it this morning," I said, and without
waiting for an answer or even venturing to look at her, I caught up
the bucket and swung it into the house, when seeing the sticks all
laid in the stove, and wishing to do her further service, without
asking her anything more, I poured half the scuttleful into the stove.
"I used to be able to make a fire, when I lived in my old home," I
said tentatively; then as I saw a smile coming into her face, I added:
"But I'm afraid to try an exhibition of my skill after such boasting,"
and without waiting further, I backed out, bringing with me only a
confused apparition of an angel lifting a coal-scuttle.
I do not remember how I reached my office that day, whether I
walked the stone pavements through the prosaic streets or trod on
rosy clouds. There were no prosaic streets for me that day. I
wondered if the article I had seen in the paper had any foundation.
Could Mr. Leigh have lost his fortune? Was this the reason she
taught school? I had observed how simply she was dressed, and I
thrilled to think that I might be able to rescue her from this
drudgery.
The beggars who crossed my path that morning were fortunate. I
gave them all my change, even relieving the necessities of several
thirsty imposters who beset my way, declaring with unblushing,
sodden faces that they had not had a mouthful for days.
I walked past the little school-house that night and lingered at the
closed gate, finding a charm in the spot. The little plain house had
suddenly become a shrine. It seemed as if she might be hovering
near.
The next morning I passed through the same street, and peeped in
at the open door. There she was, bending over the open stove in
which she had already lighted her fire, little knowing of the flame
she had kindled in my heart. How I cursed myself for being too late
to meet her. And yet, perhaps, I should have been afraid to speak to
her; for as she turned toward the door, I started on with pumping
heart in quite a fright lest she should detect me looking in.
I walked by her old home Sunday afternoon. Flowers bloomed at the
windows. As I was turning away, Count Pushkin came out of the
door and down the steps. As he turned away from the step his
habitual simper changed into a scowl; and a furious joy came into
my heart. Something had gone wrong with him within there. I
wished I had been near enough to have crossed his path to smile in
his face; but I was too distant, and he passed on with clenched fist
and black brow.
After this my regular walk was through the street of the baby-school,
and when I was so fortunate as to meet Miss Leigh she bowed and
smiled to me, though only as a passing acquaintance, whilst I on my
part began to plan how I should secure an introduction to her. Her
smile was sunshine enough for a day, but I wanted the right to bask
in it and I meant to devise a plan. After what I had told Peck, I could
not present my letter; I must find some other means. It came in an
unexpected way, and through the last person I should have
imagined as my sponsor.
XX
MY FIRST CLIENT
But to revert to the morning when I made Miss Leigh's fire for her. I
hunted for Dix all day, but without success, and was so busy about it
that I did not have time to begin my search for Jeams. That evening,
as it was raining hard, I treated myself to the unwonted luxury of a
ride home on a street-car. The streets were greasy with a thick,
black paste of mud, and the smoke was down on our heads in a
dark slop. Like Petrarch, my thoughts were on Laura, and I was
repining at the rain mainly because it prevented the possibility of a
glimpse of Miss Leigh on the street: a chance I was ever on the
watch for.
I boarded an open car just after it started and just before it ran
through a short subway. The next moment a man who had run after
the car sprang on the step beside me, and, losing his footing, he
would probably have fallen and might have been crushed between
the car and the edge of the tunnel, which we at that moment were
entering, had I not had the good fortune, being on the outer seat, to
catch him and hold him up. Even as it was, his coat was torn and my
elbow was badly bruised against the pillar at the entrance. I,
however, pulled him over across my knees and held him until we had
gone through the subway, when I made room for him on the seat
beside me.
"That was a close call, my friend," I said. "Don't try that sort of thing
too often."
"It was, indeed—the closest I ever had, and I have had some pretty
close ones before. If you had not caught me, I would have been in
the morgue to-morrow morning."
This I rather repudiated, but as the sequel showed, the idea
appeared to have become fixed in his mind. We had some little talk
together and I discovered that, like myself, he had come out West to
better his fortune, and as he was dressed very plainly, I assumed
that, like myself, he had fallen on rather hard times, and I expressed
sympathy. "Where have I seen you before?" I asked him.
"On the train once coming from the East."
"Oh! yes." I remembered now. He was the man who knew things.
"You know Mr. McSheen?" he asked irrelevantly.
"Yes—slightly. I have an office in the same building."
I wondered how he knew that I knew him.
"Yes. Well, you want to look out for him. Don't let him fool you. He's
deep. What's that running down your sleeve? Why, it's blood! Where
did it come from?" He looked much concerned.
"From my arm, I reckon. I hurt it a little back there, but it is
nothing."
He refused to be satisfied with my explanation and insisted strongly
on my getting off and going with him to see a doctor. I laughed at
the idea.
"Why, I haven't any money to pay a doctor," I said.
"It won't cost you a cent. He is a friend of mine and as good a
surgeon as any in the city. He's straight—knows his business. You
come along."
So, finding that my sleeve was quite soaked with blood, I yielded
and went with him to the office of his friend, a young doctor named
Traumer, who lived in a part of the town bordering on the working
people's section, which, fortunately, was not far from where we got
off the car. Also, fortunately, we found him at home. He was a slim
young fellow with a quiet, self-assured manner and a clean-cut face,
lighted by a pair of frank, blue eyes.
"Doc," said my conductor, "here's a friend of mine who wants a little
patching up."
"That's the way with most friends of yours, Bill," said the doctor, who
had given me a single keen look. "What's the matter with him? Shot?
Or have the pickets been after him?"
"No, he's got his arm smashed saving a man's life."
"What! Well, let's have a look at it. He doesn't look very bad." He
helped me off with my coat and, as he glanced at the sleeve, gave a
little exclamation.
"Hello!"
"Whose life did he save?" he asked, as he was binding up the arm.
"That's partly a mash."
"Mine."
"Oh! I see." He went to work and soon had me bandaged up. "Well,
he's all right now. What were you doing?" he asked as he put on the
last touches.
"Jumping on a car."
"Ah!" The doctor was manifestly amused. "You observe that our
friend is laconic?" he said to me.
"What's that?" asked the other. "Don't prejudice him against me. He
don't know anything against me yet—and that's more than some
folks can say."
"Who was on that car that you were following?" asked the doctor,
with a side glance at my friend. The latter did not change his
expression a particle.
"Doc, did you ever hear what the parrot said to herself after she had
sicked the dog on, and the dog not seeing anything but her, jumped
on her?"
"No—what?"
"'Polly, you talk too d——d much.'"
The doctor chuckled and changed the subject. "What's your labor-
friend, Wringman, doing now? What did he come back here for?"
"Same old thing—dodging work."
"He seems to me to work other people pretty well."
The other nodded acquiescingly.
"He's on a new line now. McSheen's got him. Yes, he has," as the
doctor looked incredulous.
"What's he after? Who's he working for?"
"Same person—Coll McSheen. Pretty busy, too. Mr. Glave there
knows him already."
"Glave!—Glave!" repeated the doctor. "Where did I hear your name?
Oh, yes! Do you know a preacher named John Marvel?"
"John Marvel! Why, yes. I went to college with him. I knew him
well."
"You knew a good man then."
"He is that," said the other promptly. "If there were more like him I'd
be out of a job."
"You know Miss Leigh, too?"
"What Miss Leigh?" My heart warmed at the name and I forgot all
about Marvel. How did he know that I knew her?
"'The Angel of the Lost Children.'"
"'The Angel—'? Miss Eleanor Leigh?" Then as he nodded—"Slightly."
My heart was now quite warm. "Who called her so?"
"She said she knew you. I look after some of her friends for her."
"Who called her the 'Angel of the Lost Children'?"
"A friend of mine—Leo Wolffert, who works in the slums—a writer.
She's always finding and helping some one who is lost, body or
soul."
"Leo Wolffert! Do you know him?"
"I guess we all know him, don't we, Doc?" put in the other man.
"And so do some of the big ones."
"Rather."
"And the lady, too—she's a good one, too," he added.
I was so much interested in this part of the conversation that I
forgot at the moment to ask the doctor where he had known John
Marvel and Wolffert.
I, however, asked him what I owed him, and he replied,
"Not a cent. Any of Langton's friends here or John Marvel's friends,
or (after a pause) Miss Leigh's friends may command me. I am only
too glad to be able to serve them. It's the only way I can help."
"That's what I told him," said my friend, whose name I heard for the
first time. "I told him you weren't one of these Jew doctors that
appraise a man as soon as he puts his nose in the door and skin him
clean."
"I am a Jew, but I hope I am not one of that kind."
"No; but there are plenty of 'em."
I came away feeling that I had made two friends well worth making.
They were real men.
When I parted from my friend he took out of his pocket-book a card.
"For my friends," he said, as he handed it to me. When I got to the
light I read:
"Wm. Langton, Private Detective."
It was not until long afterward that I knew that the man he was
following when he sprang on the car and I saved him was myself,
and that I owed the attention to my kinsman and to Mr. Leigh, to
whom Peck had given a rather sad account of me. My kinsman had
asked him to ascertain how I lived.
I called on my new friend, Langton, earlier than he had expected. In
my distress about Dix I consulted him the very next day and he
undertook to get him back. I told him I had not a cent to pay him
with at present, but some day I should have it and then——
"You'll never owe me a cent as long as you live," he said. "Besides,
I'd like to find that dog. I remember him. He's a good one. You say
you used the back stairway at times, opening on the alley near Mick
Raffity's?"
"Yes."
He looked away out of the window with a placid expression.
"I wouldn't go down that way too often at night," he said presently.
"Why?"
"Oh! I don't know. You might stumble and break your neck. One or
two men have done it."
"Oh! I'll be careful," I laughed. "I'm pretty sure-footed."
"You need to be—there. You say your dog's a good fighter?"
"He's a paladin. Can whip any dog I ever saw. I never fought him,
but I had a negro boy who used to take him off till I stopped him."
"Well, I'll find him—that is, I'll find where he went."
I thanked him and strolled over across town to try to get a glimpse
of the "Angel of the Lost Children." I saw her in a carriage with
another young girl, and as I gazed at her she suddenly turned her
eyes and looked straight at me, quite as if she had expected to see
me, and the smile she gave me, though only that which a pleasant
thought wings, lighted my heart for a week.
A day or two later my detective friend dropped into my office.
"Well, I have found him." His face showed that placid expression
which, with him, meant deep satisfaction. "The police have him—are
holding him in a case, but you can identify and get him. He was in
the hands of a negro dog-stealer and they got him in a raid. They
pulled one of the toughest joints in town when there was a fight
going on and pinched a full load. The nigger was among them. He
put up a pretty stiff fight and they had to hammer him good before
they quieted him. He'll go down for ninety days sure. He was a
fighter, they said—butted men right and left."
"I'm glad they hammered him—you're sure it's Dix?"
"Sure; he claimed the dog; said he'd raised him. But it didn't go. I
knew he'd stolen him because he said he knew you."
"Knew me—a negro? What did he say his name was?"
"They told me—let me see—Professor Jeams—something."
"Not Woodson?"
"Yes, that's it."
"Well, for once in his life he told the truth. He sold me the dog. You
say he's in jail? I must go and get him out."
"You'll find it hard work. Fighting the police is a serious crime in this
city. A man had better steal, rob, or kill anybody else than fight an
officer."
"Who has most pull down there?"
"Well, Coll McSheen has considerable. He runs the police. He may be
next Mayor."
I determined, of course, to go at once and see what I could do to
get Jeams out of his trouble. I found him in the common ward
among the toughest criminals in the jail—a massive and forbidding
looking structure—to get into which appeared for a time almost as
difficult as to get out. But on expressing my wish to be accorded an
interview with him, I was referred from one official to another, until,
with my back to the wall, I came to a heavy, bloated, ill-looking
creature who went by the name of Sergeant Byle. I preferred my
request to him. I might as well have undertaken to argue with the
stone images which were rudely carved as Caryatides beside the
entrance. He simply puffed his big black cigar in silence, shook his
head, and looked away from me; and my urging had no other effect
than to bring a snicker of amusement from a couple of dog-faced
shysters who had entered and, with a nod to him, had sunk into
greasy chairs.
"Who do you know here?"
A name suddenly occurred to me, and I used it.
"Among others, I know Mr. McSheen," and as I saw his countenance
fall, I added, "and he is enough for the present." I looked him
sternly in the eye.
He got up out of his seat and actually walked across the room,
opened a cupboard and took out a key, then rang a bell.
"Why didn't you say you were a friend of his?" he asked surlily. "A
friend of Mr. McSheen can see any one he wants here."
I have discovered that civility will answer with nine-tenths or even
nineteen-twentieths of the world, but there is a class of intractable
brutes who yield only to force and who are influenced only by fear,
and of them was this sodden ruffian. He led the way now
subserviently enough, growling from time to time some explanation,
which I took to be his method of apologizing. When, after going
through a number of corridors, which were fairly clean and well
ventilated, we came at length to the ward where my unfortunate
client was confined, the atmosphere was wholly different: hot and
fetid and intolerable. The air struck me like a blast from some
infernal region, and behind the grating which shut off the miscreants
within from even the modified freedom of the outer court was a
mass of humanity of all ages, foul enough in appearance to have
come from hell.
At the call of the turnkey, there was some interest manifested in
their evil faces and some of them shouted back, repeating the name
of Jim Woodson; some half derisively, others with more kindliness.
At length, out of the mob emerged poor Jeams, but, like Lucifer, Oh,
how changed! His head was bandaged with an old cloth, soiled and
stained; his mien was dejected, and his face was swollen and
bruised. At sight of me, however, he suddenly gave a cry, and
springing forward tried to thrust his hands through the bars of the
grating to grasp mine. "Lord, God!" he exclaimed. "If it ain't de
Captain. Glory be to God! Marse Hen, I knowed you'd come, if you
jes' heard 'bout me. Git me out of dis, fur de Lord's sake. Dis is de
wuss place I ever has been in in my life. Dey done beat me up and
put handcuffs on me, and chain me, and fling me in de patrol-
wagon, and lock me up and sweat me, and put me through the third
degree, till I thought if de Lord didn't take mercy 'pon me, I would
be gone for sho. Can't you git me out o' dis right away?"
I explained the impossibility of doing this immediately, but assured
him that he would soon be gotten out and that I would look after his
case and see that he got justice.
"Yes, sir, that is what I want—jestice—I don't ax nothin' but jestice."
"How did you get here?" I demanded. And even in his misery, I
could not help being amused to see his countenance fall.
"Dey fetched me here in de patrol-wagon," he said evasively.
"I know that. I mean, for what?"
"Well, dey say, Captain, dat I wus desorderly an' drunk, but you
know I don' drink nothin'."
"I know you do, you fool," I said, with some exasperation. "I have
no doubt you were what they say, but what I mean is, where is Dix
and how did you get hold of him?"
"Well, you see, Marse Hen, it's dthis way," said Jeams falteringly. "I
come here huntin' fur you and I couldn' fin' you anywheres, so then
I got a place, and while I wus lookin' 'roun' fur you one day, I come
'pon Dix, an' as he wus lost, jes' like you wus, an' he didn't know
where you wus, an' you didn't know where he wus, I tuk him along
to tek care of him till I could fin' you."
"And incidentally to fight him?" I said.
Again Jeams's countenance fell. "No, sir, that I didn't," he declared
stoutly. "Does you think I'd fight dthat dog after what you tol' me?"
"Yes, I do. I know you did, so stop lying about it and tell me where
he is, or I will leave you in here to rot till they send you down to the
rockpile or the penitentiary."
"Yes, sir; yes, sir, I will. Fur God's sake, don' do dat, Marse Hen. Jes'
git me out o' here an' I will tell you everything; but I'll swear I didn't
fight him; he jes' got into a fight so, and then jist as he hed licked
de stuffin out of dat Barkeep Gallagin's dog, them d——d policemen
come in an' hammered me over the head because I didn't want
them to rake in de skads and tek Dix 'way from me."
I could not help laughing at his contradictions.
"Well, where is he now?"
"I'll swear, Marse Hen, I don' know. You ax the police. I jes' know he
ain't in here, but dey knows where he is. I prays night and day no