Using-SableCC


Using a Compiler Construction Tool
Georges Edouard KOUAMOU
National Advanced School of Engineering-Yaoundé
Note
• This part follows the theory of compilation techniques
• Objectives
• Specify grammars in a given framework (SableCC)
• Learn how to write semantic actions
• Generate code in an object-oriented style



Compiler Construction Tools
• Definition (what is a compiler construction tool?)
• Programs or environments that assist the developer in creating an entire compiler or parts of it
• Compiler Construction Tools (CCTs) generate
• lexical analyzers (scanners)
• syntax analyzers (parsers)
• semantic analyzers
• intermediate code
• optimized target code
• Examples
• Lex/Yacc: the oldest tools in the CCT category, originally written for Unix
• Flex/Bison: Lex/Yacc-compatible tools (Bison is the GNU counterpart of Yacc), widely used on Linux
• ANTLR: a more general LL(k) CCT that can generate code in multiple languages
• SableCC: an object-oriented CCT based on the LALR(1) parsing technique
• And many others: JLex, Java CUP, JavaCC, …



Advantages of SableCC
• SableCC is designed to make good use of the advantages of Java
• It is object-oriented and makes extensive use of class inheritance.
• With SableCC, compilation errors are easier to fix.
• SableCC generates modular software
• Each class is in a separate file.
• SableCC generates syntax trees from which atoms or code can be generated.
• SableCC can accommodate a wider class of languages than other tools such as JavaCC, JLex, and ANTLR
• The parser generators among these (JavaCC and ANTLR) permit only LL(k) grammars.



Structure of the SableCC input file
• The input to SableCC is a text file named with a .scc extension (formerly a .grammar suffix)
• Components of a .scc file
• Six sections can be distinguished for lexical analysis and parsing:
1. Package declaration
2. Helper declarations
3. States declarations
4. Token declarations
5. Ignored tokens
6. Productions
• For lexical analysis alone, only the first four of these sections are needed
• Comments may be used in any of these sections:
• single-line comments, beginning with //
• or multi-line comments, enclosed in /* ... */



Token declarations
• Tokens are typically the "words" to be recognized in the input language, such as
• numbers, identifiers, operators, keywords, ...
• A token declaration takes the form: token-name = token-definition ;
• Example: left_paren = '(' ;
• A token definition may be any of the following:
• A character in single quotes, such as 'w', '9', or '$'.
• A number, written in decimal or hexadecimal, which matches the character with that ASCII (actually Unicode) code.
• Example: the number 10 matches a newline (line feed) character (the character '\n' works as well); the number 13 matches a carriage return.



Token definition: using regular expressions
• A set of characters, specified in one of the following ways:
• A single quoted character qualifies as a set consisting of one character.
• A range of characters, with the first and last placed in brackets:
• ['a'..'z'] // all lowercase letters
• ['0'..'9'] // all decimal digits
• [9..99] // all characters whose codes are in the range 9 through 99, inclusive
• A union of two sets, specified in brackets with a plus, as in [set1 + set2].
• Example: [['a'..'z'] + ['A'..'Z']] // matches any letter
• A difference of two sets, specified in brackets with a minus, as in [set1 - set2]. This matches any character in set1 that is not also in set2.
• Example: [[0..127] - ['\t' + '\n']] // matches any ASCII character except tab and newline
• A string of characters in single quotes, such as 'while'.
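
The set notation above can be imitated in plain Java to check which characters a definition accepts. This is only an illustrative sketch, not SableCC code: the class CharSets and its methods range, union, and difference are hypothetical names, and java.util.BitSet is an assumed stand-in for whatever representation SableCC uses internally.

```java
import java.util.BitSet;

// Hypothetical helper class; it only imitates SableCC's set notation.
class CharSets {
    // [lo..hi]: every character code from lo through hi, inclusive.
    static BitSet range(int lo, int hi) {
        BitSet set = new BitSet();
        set.set(lo, hi + 1); // BitSet.set(from, to) excludes 'to'
        return set;
    }

    // [a + b]: characters that belong to a or to b.
    static BitSet union(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    // [a - b]: characters of a that are not in b.
    static BitSet difference(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.andNot(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet letter = union(range('a', 'z'), range('A', 'Z'));
        BitSet tabOrNewline = union(range('\t', '\t'), range('\n', '\n'));
        BitSet asciiExceptTabNewline = difference(range(0, 127), tabOrNewline);

        System.out.println(letter.get('q'));                    // true
        System.out.println(letter.get('7'));                    // false
        System.out.println(asciiExceptTabNewline.get('\n'));    // false
        System.out.println(asciiExceptTabNewline.cardinality()); // 126
    }
}
```

Copying before or() and andNot() keeps the operand sets unchanged, so a set such as letter can be reused in several definitions.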



Operations on regular expressions
• If p and q are token definitions, then so are the following:
• (p) parentheses may be used to determine the order of operations.
• p q the concatenation of two token definitions.
• p|q the alternation of two token definitions: matches either p or q.
• Note that the plus symbol (+) has two meanings: inside brackets it denotes the union of two character sets, and as a postfix operator it denotes repetition.
• p* the closure (Kleene star), matching 0 or more repetitions of p.
• p+ similar to closure, matches 1 or more repetitions of p.
• p? matches an optional p, i.e. 0 or 1 repetition of p.



Examples
• number = ['0'..'9']+ ;
• A number is 1 or more decimal digits.
• identifier = [['a'..'z'] + ['A'..'Z']] (['a'..'z'] | ['A'..'Z'] | ['0'..'9'] | '_')* ;
• An identifier must begin with an alphabetic character
• rel_op = ['<' + '>'] '='? | '==' | '!=' ;
• Six relational operators: <, >, <=, >=, ==, !=
• Notes
• When two token definitions match the input, the one matching the longer input string is selected.
• When two token definitions match input strings of the same length, the token definition listed first is selected.
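
The two selection rules in the notes can be tried out with a small hand-written scanner. The sketch below is an assumption-laden illustration, not code generated by SableCC: the class LongestMatchDemo, the token names, and the extra keyword token while are hypothetical, and the definitions are rewritten in java.util.regex syntax.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scanner that imitates the two selection rules.
class LongestMatchDemo {
    // Declaration order decides ties, so an insertion-ordered map is used.
    static final Map<String, Pattern> TOKENS = new LinkedHashMap<>();
    static {
        TOKENS.put("while", Pattern.compile("while"));
        TOKENS.put("identifier", Pattern.compile("[a-zA-Z][a-zA-Z0-9_]*"));
        TOKENS.put("number", Pattern.compile("[0-9]+"));
        TOKENS.put("rel_op", Pattern.compile("[<>]=?|==|!="));
        TOKENS.put("blank", Pattern.compile(" +"));
    }

    static List<String> scan(String input) {
        List<String> result = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> entry : TOKENS.entrySet()) {
                Matcher m = entry.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at the start of the region.
                // The strict '>' keeps the earlier declaration on a tie.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = entry.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("Unknown character at position " + pos);
            }
            result.add(bestName + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(scan("while whilex <= 10"));
    }
}
```

In this input, while and identifier both match the first five characters, so the definition listed first (while) wins. For whilex, identifier matches six characters against five, so the longer match wins.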



Helpers definition
• Helpers simplify the definitions of tokens
• Any helper defined in the Helpers section may be used as part of a token definition in the Tokens section
• Example: the five helpers below simplify the definitions of number, identifier, and space
• Helpers
• digit = ['0'..'9'] ;
• letter = [['a'..'z'] + ['A'..'Z']] ;
• sign = '+' | '-' ;
• newline = 10 | 13 ; // ASCII codes for line feed and carriage return
• tab = 9 ; // ASCII code for tab
• Tokens
• number = sign? digit+ ; // A number is an optional sign, followed by 1 or more digits.
• identifier = letter (letter | digit | '_')* ; // An identifier is a letter followed by 0 or more letters, digits, or underscores
• space = ' ' | newline | tab ;
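
The same idea of building token definitions from named fragments can be sketched in Java. This is an analogy under stated assumptions, not SableCC syntax: the class HelperDemo and its constants are hypothetical, and the definitions are translated into java.util.regex notation.

```java
import java.util.regex.Pattern;

// Hypothetical constants mirroring the helpers above in java.util.regex syntax.
class HelperDemo {
    // Helpers: reusable fragments that are not tokens themselves.
    static final String DIGIT = "[0-9]";
    static final String LETTER = "[a-zA-Z]";
    static final String SIGN = "[+-]";
    static final String NEWLINE = "[\\n\\r]"; // character codes 10 and 13
    static final String TAB = "\\t";           // character code 9

    // Tokens: built by combining the helpers.
    static final Pattern NUMBER = Pattern.compile(SIGN + "?" + DIGIT + "+");
    static final Pattern IDENTIFIER =
            Pattern.compile(LETTER + "(" + LETTER + "|" + DIGIT + "|_)*");
    static final Pattern SPACE = Pattern.compile(" |" + NEWLINE + "|" + TAB);

    public static void main(String[] args) {
        System.out.println(NUMBER.matcher("-42").matches());     // true
        System.out.println(NUMBER.matcher("4-2").matches());     // false
        System.out.println(IDENTIFIER.matcher("x_1").matches()); // true
        System.out.println(IDENTIFIER.matcher("1x").matches());  // false
        System.out.println(SPACE.matcher("\t").matches());       // true
    }
}
```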



Ignored tokens
• The Ignored Tokens section of the SableCC grammar file is optional.
• It declares tokens that are recognized by the lexer but ignored by the parser, so they never appear in productions.
• Typically, comments and white space are ignored.
• The declaration takes the form of a list of the ignored tokens, separated by commas and ending with a semicolon
• Ignored Tokens
• space, comment ;
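
The effect of this section can be pictured as a filtering step over the token stream. The sketch below is a simplified assumption about that effect, not the generated code: the class IgnoredTokensDemo, the {name, text} array layout, and the method withoutIgnored are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of dropping ignored tokens.
class IgnoredTokensDemo {
    // Names listed in the Ignored Tokens section of the example above.
    static final Set<String> IGNORED = new HashSet<>(Arrays.asList("space", "comment"));

    // Each token is a {name, text} pair; this layout is an assumption of the sketch.
    static List<String[]> withoutIgnored(List<String[]> scanned) {
        List<String[]> kept = new ArrayList<>();
        for (String[] token : scanned) {
            if (!IGNORED.contains(token[0])) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> scanned = Arrays.asList(
                new String[] {"number", "45"},
                new String[] {"space", " "},
                new String[] {"plus", "+"},
                new String[] {"comment", "// add"},
                new String[] {"number", "36"});
        for (String[] token : withoutIgnored(scanned)) {
            System.out.println(token[0] + " " + token[1]);
        }
    }
}
```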



Steps to create a compiler



Generated files
• SableCC generates files into four sub-packages: lexer, parser, node, and analysis.
• Each file contains either a class or an interface definition.
• The lexer package contains the Lexer and LexerException classes.
• These classes are, respectively, the generated lexer and the exception thrown in case of a lexing error.
• The parser package contains the Parser and ParserException classes.
• As expected, these classes are the parser and the exception thrown in case of a parsing error.
• The node package contains all the classes defining the typed AST.
• The analysis package contains one interface and three classes. These classes are used mainly to define AST walkers
• DepthFirstAdapter: a class with methods capable of visiting every node in the syntax tree
• Actions are implemented by extending this class and overriding the methods that correspond to rules (or tokens) in the grammar
• There is an 'in' method for each alternative, which is invoked when a node is about to be visited. In the calculator example, this includes the method public void inAMultFactor(AMultFactor node)
• The methods whose names begin with 'out' are invoked when the node and all its descendants have been visited in a depth-first traversal
• There is a 'case' method for each alternative. This method visits all the descendants of a node, and it is not normally necessary to override it. An example is public void caseAMultFactor(AMultFactor node)
• There is also a 'case' method for each token; the token name is prefixed with a 'T'. An example is public void caseTNumber(TNumber token)
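
The order in which the 'in', 'case', and 'out' methods fire can be seen on a hand-written miniature of the generated classes. The sketch below is a simplified assumption, not SableCC output: the names follow SableCC's conventions (AMultFactor for the {mult} alternative of factor in the calculator grammar, TNumber for the number token), but the node classes here have only two children and the operator token is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written miniature of a depth-first adapter; not generated by SableCC.
class VisitOrderDemo {
    interface Node {
        void apply(Adapter visitor);
    }

    static class TNumber implements Node {
        final String text;
        TNumber(String text) { this.text = text; }
        @Override public void apply(Adapter visitor) { visitor.caseTNumber(this); }
    }

    static class AMultFactor implements Node {
        final Node left;
        final Node right;
        AMultFactor(Node left, Node right) { this.left = left; this.right = right; }
        @Override public void apply(Adapter visitor) { visitor.caseAMultFactor(this); }
    }

    // Default behavior: 'in' and 'out' do nothing, 'case' drives the traversal.
    static class Adapter {
        void inAMultFactor(AMultFactor node) { }
        void outAMultFactor(AMultFactor node) { }
        void caseTNumber(TNumber token) { }
        void caseAMultFactor(AMultFactor node) {
            inAMultFactor(node);      // before any descendant is visited
            node.left.apply(this);    // visit children depth-first
            node.right.apply(this);
            outAMultFactor(node);     // after all descendants are visited
        }
    }

    // An action class overrides only the hooks it needs.
    static class Tracer extends Adapter {
        final List<String> log = new ArrayList<>();
        @Override void inAMultFactor(AMultFactor node) { log.add("in"); }
        @Override void outAMultFactor(AMultFactor node) { log.add("out"); }
        @Override void caseTNumber(TNumber token) { log.add(token.text); }
    }

    static List<String> trace() {
        // Tree for (2 * 3) * 4
        Node tree = new AMultFactor(
                new AMultFactor(new TNumber("2"), new TNumber("3")),
                new TNumber("4"));
        Tracer tracer = new Tracer();
        tree.apply(tracer);
        return tracer.log;
    }

    public static void main(String[] args) {
        System.out.println(trace()); // [in, in, 2, 3, out, 4, out]
    }
}
```

Because each 'out' entry appears after both operands, replacing "out" with the operator yields postfix order, which is why a postfix translation only needs to override 'out' methods.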



Example
A calculator



Purpose
• Given infix expressions involving addition, subtraction, multiplication, and division:
• Translate the infix expressions into postfix expressions, in which each operator is placed after both of its operands
• Write the actions to evaluate the expression
• Return the result after evaluation
• Objective:
• This example shows that different actions can be applied to a single syntax tree



SableCC source file
Package example;

/* define token */
Tokens
  number = ['0' .. '9']+;
  plus = '+';
  minus = '-';
  mult = '*';
  div = '/';
  mod = '%';
  l_par = '(';
  r_par = ')';
  blank = (' ' | 13 | 10)+;

/* Token to be ignored */
Ignored Tokens
  blank;

Productions
  expr
    = {factor} factor
    | {plus} expr plus factor
    | {minus} expr minus factor;
  factor
    = {term} term
    | {mult} factor mult term
    | {div} factor div term
    | {mod} factor mod term;
  term
    = {number} number
    | {expr} l_par expr r_par;



PostFixTranslation.java
import example.analysis.*;
import example.node.*;

public class PostFixTranslation extends DepthFirstAdapter {
    public void caseTNumber(TNumber node) {
        // When we see a number, we print it.
        System.out.print(node);
    }

    public void outAPlusExpr(APlusExpr node) {
        // out of alternative {plus} in Expr, we print the plus.
        System.out.print(node.getPlus());
    }

    public void outAMinusExpr(AMinusExpr node) {
        // out of alternative {minus} in Expr, we print the minus.
        System.out.print(node.getMinus());
    }

    public void outAMultFactor(AMultFactor node) {
        // out of alternative {mult} in Factor, we print the mult.
        System.out.print(node.getMult());
    }

    public void outADivFactor(ADivFactor node) {
        // out of alternative {div} in Factor, we print the div.
        System.out.print(node.getDiv());
    }

    public void outAModFactor(AModFactor node) {
        // out of alternative {mod} in Factor, we print the mod.
        System.out.print(node.getMod());
    }
}



Main.java
import example.lexer.*;   // package names follow "Package example;" in the grammar
import example.parser.*;  // needed for the Parser class
import example.node.*;
import java.io.*;
import java.util.Scanner;

// Note: a public class named Calculator must be saved in Calculator.java.
public class Calculator {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        try {
            System.out.print("Type an arithmetic expression: ");
            String str = new Scanner(System.in).nextLine();

            // Create a Parser instance.
            Parser p = new Parser(new Lexer(new PushbackReader(
                    new StringReader(str), 1024)));

            /* Alternative: parse a fixed expression.
            Parser p = new Parser(new Lexer(new PushbackReader(
                    new StringReader("(45 + 36/2) * 3 + 5 * 2"), 1024)));*/

            /* Alternative: read directly from standard input until EOF
               (Ctrl+D on Unix, Ctrl+Z then Enter on Windows/DOS).
            Parser p
                = new Parser(
                    new Lexer(
                        new PushbackReader(
                            new InputStreamReader(System.in), 1024)));*/

            // Parse the input and build the syntax tree.
            Start tree = p.parse();

            // Apply the translation on the syntax tree.
            System.out.print("PostFix Expression: ");
            tree.apply(new PostFixTranslation());

            // Prefix translation is disabled; PreFixTranslation is not shown here.
            // System.out.print("\nPreFix Expression: ");
            // tree.apply(new PreFixTranslation());

            // Apply a second action (evaluation) on the same syntax tree.
            tree.apply(new Evaluation());
            System.exit(0);
        } catch (Exception e) {
            System.out.println("Error occurs: " + e.getMessage());
            //e.printStackTrace();
        }
    }
}
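
Main.java applies an Evaluation class that these slides do not list. As a rough idea of the stack technique such an action could rely on, the sketch below evaluates postfix text of the kind PostFixTranslation prints. It is an assumption, not the author's Evaluation class: the class PostfixEvaluator and its methods are hypothetical, it works on a string rather than on the syntax tree, and it assumes integer arithmetic.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-alone evaluator; it does not use the SableCC-generated classes.
class PostfixEvaluator {
    // Evaluates space-separated postfix text such as "45 36 2 / +".
    static int evaluate(String postfix) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String item : postfix.trim().split("\\s+")) {
            switch (item) {
                case "+": case "-": case "*": case "/": case "%": {
                    // The right operand was pushed last, so it is popped first.
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(applyOperator(item, left, right));
                    break;
                }
                default:
                    stack.push(Integer.parseInt(item));
            }
        }
        return stack.pop();
    }

    static int applyOperator(String op, int left, int right) {
        switch (op) {
            case "+": return left + right;
            case "-": return left - right;
            case "*": return left * right;
            case "/": return left / right; // integer division
            default:  return left % right; // "%"
        }
    }

    public static void main(String[] args) {
        // Postfix form of (45 + 36/2) * 3 + 5 * 2
        System.out.println(evaluate("45 36 2 / + 3 * 5 2 * +")); // prints 199
    }
}
```

A tree-based version would do the same work inside the 'out' methods: push a value in caseTNumber, then pop two values and push the result in each outA... method.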
