Week-4 (Lecture-2)

The document discusses approaches to reverse engineering, including data gathering techniques like lexical and syntactic analysis of source code as well as control flow and data flow graphing. It provides examples and explanations of these techniques.

Reverse Engineering

Week-4
Reverse Engineering
• Reverse Engineering supports understanding of a system
through identification of the components or artifacts of the
system, discovering relationships between them and
generating abstractions of that information.

• The goal of reverse engineering is not to alter the system in any way.
Reverse Engineering Activities
The three main Reverse Engineering activities:

1. Data Gathering
2. Knowledge Organization
3. Information Exploration
Reverse Engineering Activities
1. Data Gathering

Raw data is used to identify a system’s artifacts and relationships.
Data Gathering
Approaches to Automating Reverse Engineering

• A variety of automated approaches are available to assist the reverse engineer in program comprehension.
• Some of the prominent approaches include:

1. Textual, lexical and syntactic analysis
• These approaches focus on the source code itself and its representations.
• These include the use of lexical metrics (counting assignments, identifiers, etc.) and even automated parsing of the code.
• The unit of examination is the program source itself.
Data Gathering
• Textual, lexical and syntactic analysis

• Lexical analysis is the process of decomposing the sequence of characters in the source code into its constituent lexical units.

• A program performing lexical analysis is called a lexical analyzer, and it is part of a programming language’s compiler.

• Typically, it uses rules describing lexical program structures that are expressed in a mathematical notation called regular expressions.
Data Gathering
Textual, lexical and syntactic analysis
Data Gathering
Lexical and syntactic analysis:
• Tokenization: Lexical analysis involves breaking down the source code
or binary file into a stream of tokens. Tokens are the smallest units of
the code, such as keywords, identifiers, constants, and operators.

• Whitespace and Comments Handling: Lexical analysis also deals with whitespace and comments. Removing or ignoring these elements simplifies the code, making it easier to work with.
• Reverse engineering:
• Lexical analysis helps in understanding the basic structure of the code,
identifying keywords, and recognizing variables and functions.

• It aids in identifying potential vulnerabilities or suspicious code patterns by extracting relevant information.
Data Gathering
Lexical and syntactic analysis:
• Parsing: Syntactic analysis, also known as parsing, checks whether the
sequence of tokens adheres to the grammar rules of the programming
language. It builds a hierarchical structure, often represented as a parse tree
or abstract syntax tree (AST).

• Reverse engineering:
• Syntactic analysis is essential for reconstructing a high-level representation of
the code. This representation makes it easier to understand and modify the
code.

• By analyzing the AST or parse tree, reverse engineers can identify control
flow structures, data structures, and relationships between different parts of
the code.

• It helps in identifying functions and their parameters, which is crucial for understanding the code's functionality.
Data Gathering
Lexical and syntactic analysis:
• Lexeme
– A sequence of characters in the source program with the lowest level of syntactic meaning
– E.g., sum, +, -
• Token
– A category of lexemes
– A lexeme is an instance of a token
– Tokens are the basic building blocks of programs
Data Gathering

Lexical and syntactic analysis:

result = oldsum - value / 100;

What are the tokens and lexemes of this statement?
Data Gathering
Lexical and syntactic analysis:
Assignment Task
• Parse the code to understand its structure and create an AST of the
code.
Data Gathering
Approaches to Automating Reverse Engineering

2. Graphing methods
• There are a variety of graphing approaches for program understanding.
• These include, in increasing order of complexity and richness:
• graphing the control flow of the program,
• the data flow of the program,
• and program dependence graphs.

• The unit of examination is a graphical representation of the program source.


Data Gathering
Graphing Method
• Static Source Code Analysis

• Static analysis of a program is the analysis of its code without regard to its execution or input.

• What analyses are useful for understanding:

• Control flow analysis: what pieces of the code would be executed and in what sequence

• Data flow analysis: how information flows within a program and across programs
Control Flow – Introduction
• Control Flow
• Used to identify the possible paths through the program
• The flow is represented as a directed graph with splits and joins
• Identify loops

• Control Flow is represented as a graph of Basic Blocks
• A basic block is a sequence of operations with one entry and one exit (usually a sequence of statements)
• There is a unique start point where the program begins
• An edge between basic blocks shows the flow
Control Flow Analysis
• The two kinds of control flow analysis are:

1. Intraprocedural: It shows the order in which statements are executed within a subprogram.
2. Interprocedural: It shows the calling relationships among program units.

Intraprocedural analysis:
• The idea of basic blocks is central to constructing a CFG.
• A basic block is a maximal sequence of program statements such that execution
enters at the top of the block and leaves only at the bottom via a conditional or an
unconditional branch statement.
• A basic block is represented with one node in the CFG, and an arc indicates possible
flow of control from one node to another.
Control Flow Analysis
Interprocedural analysis:
• Interprocedural analysis is performed by constructing a call graph.

• Calling relationships between subroutines in a program are represented as a call graph, which is basically a directed graph.

• Specifically, a procedure in the source code is represented by a node in the graph, and an edge from node f to node g indicates that procedure f calls procedure g.

• Call graphs can be static or dynamic. A dynamic call graph is an execution trace of
the program.

• Thus, a dynamic call graph is exact, but it only describes one run of the program.

• On the other hand, a static call graph represents every possible run of the program.
Control Flow Analysis
• An approach that avoids the burden of annotations, and can capture
what a procedure actually does as used in a particular program, is
building a control flow graph for the entire program, rather than just
one procedure.

• To make this work, we handle call and return instructions specially as follows:

• We add additional edges to the control flow graph. For every call to function g,
we add an edge from the call site to the first instruction of g, and from every
return statement of g to the instruction following that call.
Control Flow Graph
Flow Graphs of various blocks
Control Flow – Example
Control Flow – Code View
• Another example of visualizing the control flow of a program is using a Control
Structure Diagram (CSD).

• CSD is an algorithmic-level graphical representation of software source code.

• It automatically documents the program flow within the source code and adds
indentation with graphical symbols

• The following notations are used:
• Sequential flow – straight line
• If/Then/Else/Switch statements – diamonds
• For/While loops – elongated loop symbol
• Loop exit – arrow
• Function – open-ended box
CSD Example
CSD Program Components
CSD Control Constructs
• The basic control constructs are grouped into the following
categories:

• Sequence
• Selection
• Iteration
• Exception Handling
CSD Control Constructs
Data Flow Graph: Data Analysis
• All control edges together form a graph called the Control Flow Graph
(CFG).
• All data edges together form a graph called the Data Flow Graph (DFG).
• A data flow graph shows what kind of information will be
• input to and output from the system,
• where the data will come from and go to,
• where the data will be stored.

• A data flow graph is information oriented.

• It passes data between components.

Example
Example
int max(int a, int b) {
    if (a > b)
        r = a;
    else
        r = b;
    return r;
}

• The data flow graph (DFG) must be derived after the control flow graph.
• The data flow graph has the same set of nodes as the control flow graph.
• The data flow graph requires identification of the data dependencies between every node.
Example
• Start by annotating, with each node, what variables are read (R) and what variables are written (W).

                            //        a   b   r
int max(int a, int b) {     // 1      W   W
    if (a > b)              // 2      R   R
        r = a;              // 3      R       W
    else
        r = b;              // 4          R   W
    return r;               // 5              R
}
Example
• Next, we draw each data dependency.
• A data dependency goes from a node that writes into a variable to
another node that reads from the variable.
• To have a valid dependency, we must identify the correct ‘write’ node
for each ‘read’ node. That is done as follows.

• Start with a node that reads from a variable. For example, node 3 in the
example reads variable a. That read operation is the endpoint of a data
dependency.

• Next, walk backward in the control flow graph until you find a node that
writes the same variable. That is the starting point of a data dependency. For
example, going backward from node 3, we visit node 2, and then node 1. Only
node 1 writes a. Therefore, the data dependency for a goes from 1 to 3.
Example
• Data Flow graph
Class Activity
Definition-Use Pairs
• A def-use (du) pair associates a point in a program where a value is produced with a point where it is used.

• Definition: where a variable gets a value
– Variable declaration
– Variable initialization
– Assignment
– Values received by a parameter
• Use: extraction of a value from a variable
– Expressions
– Conditional statements
– Parameter passing
– Returns
Definition-Use Pairs
Definition-Use Pairs
Definition-Clear & Killing

• A definition-clear path is a path along the CFG from a definition to a use of the same variable without another definition of the variable in between.

• If, instead, another definition is present on the path, then the latter definition kills the former.

• A def-use pair is formed if and only if there is a definition-clear path between the definition and the use.
Definition-Clear & Killing
(Direct) Data Dependence Graph
Control Dependence
