Compiler Design - Variants of Syntax Tree
Last Updated :
18 Feb, 2022
A syntax tree is a tree in which each leaf node represents an operand and each interior node represents an operator. A syntax tree is a condensed form of the parse tree. It is commonly used to represent a program as a tree structure.
Rules of Constructing a Syntax Tree
Each node of a syntax tree can be implemented as a record with several fields. In the node for an operator, one field identifies the operator and the remaining fields contain pointers to the operand nodes. The operator is often called the label of the node. The following functions are used to create the nodes of syntax trees for expressions with binary operators. Each function returns a pointer to the newly created node.
1. mknode(op, left, right): It creates an operator node with the label op and two fields containing the pointers left and right.
2. mkleaf(id, entry): It creates an identifier node with the label id and a field containing entry, a pointer to the identifier's symbol-table entry.
3. mkleaf(num, val): It creates a number node with the label num and a field containing val, the value of the number.
For example, the syntax tree for the expression a - 4 + c is built bottom-up by a sequence of five such calls, whose results p1, p2, ..., p5 are pointers to nodes. The entry arguments passed for 'a' and 'c' are pointers to the symbol-table entries of those identifiers.
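The three constructor functions can be sketched in Python as follows, taking the example expression to be a - 4 + c. This is a minimal illustration rather than a definitive implementation: the Node class, the symbol_table dictionary, and the entry strings "entry_a" and "entry_c" are assumptions made here and stand in for real symbol-table pointers.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    label: str                            # operator symbol, "id", or "num"
    left: Optional["Node"] = None         # left operand (operator nodes only)
    right: Optional["Node"] = None        # right operand (operator nodes only)
    value: Union[str, int, None] = None   # symbol-table entry or number (leaves only)


def mknode(op: str, left: Node, right: Node) -> Node:
    """Create an operator node labelled op with two child pointers."""
    return Node(label=op, left=left, right=right)


def mkleaf(kind: str, value: Union[str, int]) -> Node:
    """Create a leaf: kind is "id" (value is a symbol-table entry) or "num" (value is a number)."""
    return Node(label=kind, value=value)


# Hypothetical symbol table mapping each identifier to its entry.
symbol_table = {"a": "entry_a", "c": "entry_c"}

# Build the tree bottom-up; (a - 4) becomes the left operand of +.
p1 = mkleaf("id", symbol_table["a"])
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)
p4 = mkleaf("id", symbol_table["c"])
p5 = mknode("+", p3, p4)   # p5 is the root of the syntax tree
```

Each call returns a reference to the node it just created, so later calls can use earlier results as children.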
Example 1: Syntax Tree for the string a - b ∗ c + d is:
Syntax tree for example 1
Example 2: Syntax Tree for the string a * (b + c) - d / 2 is:
Syntax tree for example 2
Variants of syntax tree:
The variants of the syntax tree are covered under two headings, described below:
- Directed Acyclic Graphs for Expressions (DAG)
- The Value-Number Method for Constructing DAGs
Directed Acyclic Graphs for Expressions (DAG)
Like the syntax tree for an expression, a DAG has leaves that correspond to atomic operands and interior nodes that correspond to operators. The difference is that a node N in a DAG has more than one parent if N represents a common subexpression; in a syntax tree, the tree for the common subexpression would be duplicated as many times as the subexpression appears in the original expression. As a result, a DAG not only represents expressions more concisely but also gives the compiler important information about how to generate efficient code to evaluate them.
The Directed Acyclic Graph (DAG) is a tool that depicts the structure of basic blocks, shows the flow of values among them, and supports their optimization. A DAG makes transformations on basic blocks straightforward.
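To make the saving concrete, the sketch below counts the nodes needed by a syntax tree and by a DAG for the same expression. The expression a + a * (b - c) + (b - c) * d and the nested-tuple encoding are assumptions chosen here for illustration; they do not come from the article's figures.

```python
# Expressions as nested tuples: (op, left, right) for operators, a string for a leaf.
expr = ("+",
        ("+", "a", ("*", "a", ("-", "b", "c"))),
        ("*", ("-", "b", "c"), "d"))


def tree_size(e) -> int:
    """Nodes in the syntax tree: every occurrence of a subexpression gets its own node."""
    if isinstance(e, str):
        return 1
    _, left, right = e
    return 1 + tree_size(left) + tree_size(right)


def dag_nodes(e, seen=None) -> set:
    """Distinct subexpressions: structurally equal subtrees share a single DAG node."""
    if seen is None:
        seen = set()
    seen.add(e)
    if not isinstance(e, str):
        _, left, right = e
        dag_nodes(left, seen)
        dag_nodes(right, seen)
    return seen


print(tree_size(expr))        # 13 nodes in the syntax tree
print(len(dag_nodes(expr)))   # 9 nodes in the DAG
```

The four nodes saved are the second occurrence of the leaf a and the second copy of b - c with its two leaves.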
Properties of DAG are:
- Leaf nodes represent identifiers, names, or constants.
- Interior nodes represent operators.
- Interior nodes also represent the results of expressions, or the identifiers/names to which those values are assigned.
Examples:
T0 = a+b --- Expression 1
T1 = T0 +c --- Expression 2
Expression 1: T0 = a+b
Syntax tree for expression 1
Expression 2: T1 = T0 +c
Syntax tree for expression 2
The Value-Number Method for Constructing DAGs:
The nodes of a syntax tree or DAG are stored in an array of records. Each row of the array is one record and therefore one node. The first field of each record is an operation code, which gives the node's label. In the figure below, interior nodes have two additional fields for the left and right children, while leaves have one additional field that holds the lexical value (here, either a symbol-table pointer or a constant).
Nodes of a DAG for i = i + 10 allocated in an array
Nodes in this array are referred to by the integer index of their record. Historically, this integer has been called the value number of the node, or of the expression the node represents. In the figure, the node labeled + has value number 3, and its left and right children have value numbers 1 and 2, respectively. In practice we could use pointers to records or references to objects instead of integer indexes, but the reference to a node would still be called its "value number." Stored in a suitable data structure, value numbers help us construct expression DAGs efficiently.
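The array layout for i = i + 10 can be sketched as a Python list of tuples. This is an illustration under stated assumptions: the string "i" stands in for a symbol-table pointer, index 0 is left unused so that value numbers start at 1 as in the text, and the fourth record (for =) is an assumption about how the assignment itself is stored.

```python
# Index 0 is an unused placeholder so that value numbers start at 1.
nodes = [
    None,
    ("id", "i"),     # 1: leaf; "i" stands in for the symbol-table pointer
    ("num", 10),     # 2: leaf holding the constant 10
    ("+", 1, 2),     # 3: interior node; children are value numbers 1 and 2
    ("=", 1, 3),     # 4: interior node; assigns node 3 to node 1 (assumed layout)
]


def show(v: int) -> str:
    """Rebuild the expression text for the node with value number v."""
    record = nodes[v]
    if len(record) == 2:            # leaf: (label, lexical value)
        _, value = record
        return str(value)
    op, left, right = record        # interior node: (label, left, right)
    return f"{show(left)} {op} {show(right)}"


print(show(4))   # i = i + 10
```

Both occurrences of i refer to record 1, which is what makes this structure a DAG rather than a tree.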
- Algorithm: The value-number method for constructing the nodes of a Directed Acyclic Graph.
- INPUT: Label op, node l, and node r.
- OUTPUT: The value number of a node in the array with signature (op, l, r).
- METHOD: Search the array for a node M with label op, left child l, and right child r. If there is such a node, return the value number of M. If not, create in the array a new node N with label op, left child l, and right child r, and return its value number.
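The method above can be sketched with a plain linear search. This is a minimal sketch, not a definitive implementation: value numbers are 0-based list indexes here, and leaves are stored as (label, lexical value, None), which is an assumption made so that leaves and interior nodes share one signature shape.

```python
nodes = []   # array of records; a node's value number is its index in this list


def value_number(op, l, r=None):
    """Return the value number of the node with signature (op, l, r), creating it if absent."""
    signature = (op, l, r)
    for index, existing in enumerate(nodes):   # search the whole array
        if existing == signature:
            return index                       # reuse the existing node M
    nodes.append(signature)                    # otherwise create a new node N
    return len(nodes) - 1


# Requesting b - c twice yields the same node, so the subexpression is shared.
b = value_number("id", "b")
c = value_number("id", "c")
first = value_number("-", b, c)
second = value_number("-", b, c)
print(first == second, len(nodes))   # True 3
```

Every request scans the array from the start, so the cost of one lookup grows with the number of nodes already stored.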
Although this algorithm produces the intended result, searching the entire array every time a node is requested is time-consuming, especially if the array holds expressions from an entire program. A more efficient approach is a hash table, in which the nodes are divided into "buckets," each of which typically contains only a few nodes. The hash table is one of several data structures that support dictionaries efficiently. A dictionary is a data type that lets us add and remove elements of a set and determine whether a particular element is present in the set. A good dictionary data structure, such as a hash table, performs each of these operations in constant or near-constant time, regardless of the size of the set.
To build a hash table for the nodes of a DAG, we need a hash function h that computes the bucket index for a signature (op, l, r) in such a way that the signatures are spread across the buckets and no bucket is likely to get much more than its fair share of the nodes. The bucket index h(op, l, r) is computed deterministically from op, l, and r, so that we can repeat the calculation and always arrive at the same bucket index for node (op, l, r).
The buckets can be implemented as linked lists, as in the figure below. An array indexed by hash value holds the bucket headers, each of which points to the first cell of a list. Each cell in a bucket's linked list holds the value number of one of the nodes that hash to that bucket. That is, node (op, l, r) can be found on the list whose header is at index h(op, l, r) of the array.
Data structure for searching buckets
Given the input op, l, and r, we compute the bucket index h(op, l, r) and search the list of cells in that bucket for the given input node. There are usually enough buckets that no list has more than a few cells. We may, however, need to examine every cell in a bucket, and for each value number v found in a cell we must check whether the signature (op, l, r) of the input node matches the node with value number v (as in the figure above). If we find a match, we return v. If we find no match, we create a new cell, add it to the list of cells for bucket index h(op, l, r), and return the value number in that new cell.
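The bucket search described above can be sketched as follows. Several choices here are assumptions made for illustration: the class name DagTable, the default of 8 buckets, Python lists standing in for the linked lists of cells, and Python's built-in hash() serving as the hash function h. Note that hash() is deterministic within one run, which is all the method needs, but its values for strings may differ between runs.

```python
class DagTable:
    """Array of node records plus hash buckets that hold value numbers."""

    def __init__(self, num_buckets: int = 8):
        self.nodes = []                                   # array of records
        self.buckets = [[] for _ in range(num_buckets)]   # one list of cells per bucket

    def _h(self, signature) -> int:
        """Bucket index for a signature (op, l, r)."""
        return hash(signature) % len(self.buckets)

    def value_number(self, op, l, r=None) -> int:
        signature = (op, l, r)
        bucket = self.buckets[self._h(signature)]
        for v in bucket:                       # examine only the cells of this bucket
            if self.nodes[v] == signature:     # compare against the stored record
                return v
        self.nodes.append(signature)           # no match: create the node
        v = len(self.nodes) - 1
        bucket.append(v)                       # add a new cell holding its value number
        return v


# Build the DAG for (a + b) * (a + b); the sum is created only once.
table = DagTable()
a = table.value_number("id", "a")
b = table.value_number("id", "b")
left_sum = table.value_number("+", a, b)
right_sum = table.value_number("+", a, b)
root = table.value_number("*", left_sum, right_sum)
print(left_sum == right_sum, len(table.nodes))   # True 4
```

Compared with scanning the whole array, each lookup now inspects only the few value numbers that share a bucket with the requested signature.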