Lecture 3 - Lexical Analysis

Syntax analysis consists of two parts: a lexical analyzer and a parser. The lexical analyzer is a function that is called by the parser when it needs the next token. Parsers based on BNF are easy to maintain.
Copyright
© Attribution Non-Commercial (BY-NC)
Chapter 4

Lexical and Syntax Analysis

ISBN 0-321-49362-1

Chapter 4 Topics
- Introduction
- Lexical Analysis

Copyright 2007 Addison-Wesley. All rights reserved.


Introduction
- Language implementation systems must analyze source code, regardless of the specific implementation approach
- Nearly all syntax analysis is based on a formal description of the syntax of the source language (BNF)


Syntax Analysis
The syntax analysis portion of a language processor nearly always consists of two parts:
- A low-level part called a lexical analyzer (mathematically, a finite automaton based on a regular grammar)
- A high-level part called a syntax analyzer, or parser (mathematically, a push-down automaton based on a context-free grammar, or BNF)
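
One way to see why the two parts need different machinery: a finite automaton has a fixed number of states, so it cannot track arbitrarily deep nesting, while a push-down automaton can. The sketch below is an illustration only (the class and method names are hypothetical, not from the slides). It checks parenthesis nesting with a counter, which plays the role of a stack holding a single kind of symbol.

```java
// Hypothetical illustration: nesting needs unbounded memory, which a
// finite automaton lacks and a push-down automaton's stack provides.
final class NestingDemo {
    /** Returns true if every '(' in s is closed by a later ')' in proper nesting order. */
    static boolean balanced(String s) {
        int depth = 0; // stands in for the stack of a push-down automaton
        for (char c : s.toCharArray()) {
            if (c == '(') {
                depth++;                       // push
            } else if (c == ')' && --depth < 0) {
                return false;                  // pop on an empty stack
            }
        }
        return depth == 0;                     // accept only if the stack is empty
    }

    public static void main(String[] args) {
        System.out.println(balanced("((a+b)*c)")); // true
        System.out.println(balanced("(a+b))"));    // false
    }
}
```

Token patterns such as identifiers and integers involve no nesting of this kind, which is why the simpler finite automaton is enough for the lexical analyzer.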


Advantages of Using BNF to Describe Syntax


- Provides a clear and concise syntax description
- The parser can be based directly on the BNF
- Parsers based on BNF are easy to maintain


Reasons to Separate Lexical and Syntax Analysis


- Simplicity - less complex approaches can be used for lexical analysis; separating them simplifies the parser
- Efficiency - separation allows optimization of the lexical analyzer
- Portability - parts of the lexical analyzer may not be portable, but the parser is always portable


Lexical Analysis
- A lexical analyzer is a pattern matcher for character strings
- A lexical analyzer is a front-end for the parser
- Identifies substrings of the source program that belong together - lexemes
  - Lexemes match a character pattern, which is associated with a lexical category called a token
  - sum is a lexeme; its token may be IDENT
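
The lexeme/token distinction can be sketched with Java's regular-expression library. This is a minimal illustration, not the lecture's implementation: only the IDENT category comes from the slide, while the class name, the other category names (INTLIT, OP, WS), and the patterns are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: pairs each lexeme with the token (category) whose
// character pattern it matches.
final class LexemeDemo {
    // One named group per category; the alternatives are tried left to right.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<IDENT>[A-Za-z][A-Za-z0-9]*)|(?<INTLIT>[0-9]+)|(?<OP>[-+*/=;])|(?<WS>\\s+)");

    static List<String> classify(String source) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        int pos = 0;
        while (pos < source.length()) {
            m.region(pos, source.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Illegal character at " + pos);
            }
            if (m.group("IDENT") != null) {
                out.add("IDENT:" + m.group());
            } else if (m.group("INTLIT") != null) {
                out.add("INTLIT:" + m.group());
            } else if (m.group("OP") != null) {
                out.add("OP:" + m.group());
            }
            // Whitespace separates lexemes but produces no token.
            pos = m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [IDENT:sum, OP:=, IDENT:sum, OP:+, INTLIT:42, OP:;]
        System.out.println(classify("sum = sum + 42;"));
    }
}
```

Here sum appears twice as a lexeme, and both occurrences map to the same token, IDENT.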


Lexical Analysis (continued)


- The lexical analyzer is usually a function that is called by the parser when it needs the next token
- Three approaches to building a lexical analyzer:
  - Write a formal description of the tokens and use a software tool that constructs table-driven lexical analyzers given such a description
  - Design a state diagram that describes the tokens and write a program that implements the state diagram
  - Design a state diagram that describes the tokens and hand-construct a table-driven implementation of the state diagram
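
The second approach (a program that directly implements a state diagram) might look like the following sketch. The three states, the class name, and the "CATEGORY:lexeme" string format are assumptions chosen for brevity; a real analyzer would return token objects and cover the full language.

```java
// Hypothetical sketch of a hand-coded state diagram with three states.
// nextToken() is the function a parser would call whenever it needs a token.
final class StateDiagramLexer {
    enum State { START, IN_ID, IN_INT }

    private final String input;
    private int pos = 0;

    StateDiagramLexer(String input) { this.input = input; }

    /** Returns the next token as "CATEGORY:lexeme", or "EOF" at end of input. */
    String nextToken() {
        State state = State.START;
        int start = pos;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            switch (state) {
                case START:
                    if (Character.isWhitespace(c)) { pos++; start = pos; }
                    else if (Character.isLetter(c)) { state = State.IN_ID; pos++; }
                    else if (Character.isDigit(c)) { state = State.IN_INT; pos++; }
                    else { pos++; return "OTHER:" + c; }
                    break;
                case IN_ID:
                    if (Character.isLetterOrDigit(c)) pos++;
                    else return "IDENT:" + input.substring(start, pos);
                    break;
                case IN_INT:
                    if (Character.isDigit(c)) pos++;
                    else return "INT:" + input.substring(start, pos);
                    break;
            }
        }
        // Input exhausted: emit the token in progress, if any.
        switch (state) {
            case IN_ID:  return "IDENT:" + input.substring(start, pos);
            case IN_INT: return "INT:" + input.substring(start, pos);
            default:     return "EOF";
        }
    }
}
```

Each call resumes where the previous one stopped, so for the input "x1 := 42" successive calls yield IDENT:x1, OTHER::, OTHER:=, INT:42, and then EOF.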


Lexical Analyzer Generation


Specification of Tokens (PL0.jflex) -> JFlex -> Lexical Analyzer Class (PL0Lexer.java)

Copyright 2008 Barrett R. Bryant. All rights reserved. Copyright 2007 Addison-Wesley. All rights reserved.

JFlex Example
vulcan6% cat PL0.jflex
%%
%{
  private void echo () { System . out . print (yytext ()); }

  public int position () { return yycolumn; }
%}

%class    PL0Lexer
%function nextToken
%type     Token
%unicode
%line
%column
%eofval{
  { return new Token (Token . EOF); }
%eofval}

JFlex Example (continued)


Identifier = [:letter:] ([:letter:] | [:digit:])*
Integer    = [:digit:] [:digit:]*

%%


JFlex Example (continued)


[ \t\n]   { echo (); }
";"       { echo (); return new Token (Token.SEMICOLON); }
"."       { echo (); return new Token (Token.PERIOD); }
","       { echo (); return new Token (Token.COMMA); }
"<"       { echo (); return new Token (Token.LT); }
"<="      { echo (); return new Token (Token.LE); }
">"       { echo (); return new Token (Token.GT); }
">="      { echo (); return new Token (Token.GE); }
"="       { echo (); return new Token (Token.EQ); }
"<>"      { echo (); return new Token (Token.NE); }
"("       { echo (); return new Token (Token.LPAREN); }
")"       { echo (); return new Token (Token.RPAREN); }
"+"       { echo (); return new Token (Token.PLUS); }
"-"       { echo (); return new Token (Token.MINUS); }
"*"       { echo (); return new Token (Token.TIMES); }
"/"       { echo (); return new Token (Token.SLASH); }
":="      { echo (); return new Token (Token.ASSIGN); }


JFlex Example (continued)


begin         { echo (); return new Token (Token.BEGIN); }
call          { echo (); return new Token (Token.CALL); }
const         { echo (); return new Token (Token.CONST); }
do            { echo (); return new Token (Token.DO); }
end           { echo (); return new Token (Token.END); }
if            { echo (); return new Token (Token.IF); }
odd           { echo (); return new Token (Token.ODD); }
procedure     { echo (); return new Token (Token.PROC); }
then          { echo (); return new Token (Token.THEN); }
var           { echo (); return new Token (Token.VAR); }
while         { echo (); return new Token (Token.WHILE); }
{Integer}     { echo (); return new Token (Token.INTEGER, yytext ()); }
{Identifier}  { echo (); return new Token (Token.ID, yytext ()); }
.             { echo (); ErrorMessage.print (yycolumn, "Illegal character"); }
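
The order of the rules above matters. JFlex picks the rule that matches the longest prefix of the remaining input, and when several rules match the same longest prefix, the one listed first wins. That is why the keyword rules precede {Identifier}. The sketch below imitates this selection policy with java.util.regex for three of the rules; it is an illustration under simplified ASCII patterns, not the code JFlex generates, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "longest match, then earliest rule".
final class LongestMatchDemo {
    // Rules in specification order: the keyword comes before the identifier pattern.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("BEGIN", Pattern.compile("begin"));
        RULES.put("INTEGER", Pattern.compile("[0-9]+"));
        RULES.put("ID", Pattern.compile("[A-Za-z][A-Za-z0-9]*"));
    }

    /** Returns "RULE:lexeme" for the winning rule at the start of input, or "ERROR". */
    static String match(String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Only a strictly longer match replaces the current best,
            // so on a tie the earlier rule is kept.
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best == null ? "ERROR" : best + ":" + input.substring(0, bestLen);
    }

    public static void main(String[] args) {
        System.out.println(match("begin x"));   // BEGIN:begin (tie, earlier rule wins)
        System.out.println(match("beginning")); // ID:beginning (longer match wins)
    }
}
```

If the identifier rule were listed before the keywords, every keyword would be reported as an identifier, because both rules would match the same lexeme and the first would win.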


JFlex Example (continued)


vulcan6% jflex PL0.jflex
Reading "PL0.jflex"
Constructing NFA : 144 states in NFA
Converting NFA to DFA :
....................................................................
70 states before minimization, 66 states in minimized DFA
Writing code to "PL0Lexer.java"
vulcan6% javac PL0Lexer.java
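
The DFA that JFlex writes into PL0Lexer.java is table-driven: the next state is looked up from the current state and the class of the input character. The following hand-built sketch shows the idea on a three-state DFA for identifiers and integers only. The table, the class name, and the output format are assumptions for illustration and do not reflect the layout of the generated tables.

```java
// Hypothetical table-driven DFA for two token kinds.
final class TableDrivenDfa {
    // Columns (character classes): 0 = letter, 1 = digit, 2 = anything else.
    // Rows (states): 0 = start, 1 = in identifier, 2 = in integer.
    // An entry of -1 means there is no transition.
    private static final int[][] NEXT = {
        {  1,  2, -1 },
        {  1,  1, -1 },
        { -1,  2, -1 },
    };
    // Token recognized when the DFA stops in each state (null = not accepting).
    private static final String[] ACCEPT = { null, "ID", "INTEGER" };

    private static int charClass(char c) {
        if (Character.isLetter(c)) return 0;
        if (Character.isDigit(c)) return 1;
        return 2;
    }

    /** Scans the longest prefix of s, starting at index from, that the DFA accepts. */
    static String scan(String s, int from) {
        int state = 0;
        int pos = from;
        while (pos < s.length()) {
            int next = NEXT[state][charClass(s.charAt(pos))];
            if (next < 0) break; // no transition: stop and report the current state
            state = next;
            pos++;
        }
        return ACCEPT[state] == null ? "ERROR" : ACCEPT[state] + ":" + s.substring(from, pos);
    }

    public static void main(String[] args) {
        System.out.println(scan("q1+7", 0)); // ID:q1
        System.out.println(scan("q1+7", 3)); // INTEGER:7
    }
}
```

Because every non-start state in this small table is accepting, the scanner can simply stop at the first missing transition. A DFA with non-accepting intermediate states would also need to remember the last accepting position and back up to it.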


JFlex Example (continued)


vulcan6% cat Token.java
// Token class definition

// Token is a class to represent lexical tokens in the PL/0 programming
// language, described in Algorithms + Data Structures = Programs by
// Niklaus Wirth, Prentice-Hall, 1976.

public class Token {

  // token classes

  public static final int EOF     = -1;
  public static final int BEGIN   = 0;
  public static final int CALL    = 1;
  public static final int CONST   = 2;
  public static final int DO      = 3;
  public static final int END     = 4;
  public static final int IF      = 5;
  public static final int ODD     = 6;
  public static final int PROC    = 7;
  public static final int THEN    = 8;
  public static final int VAR     = 9;
  public static final int WHILE   = 10;
  public static final int ID      = 11;
  public static final int INTEGER = 12;

JFlex Example (continued)


  public static final int ASSIGN    = 13;
  public static final int PLUS      = '+';
  public static final int MINUS     = '-';
  public static final int TIMES     = '*';
  public static final int SLASH     = '/';
  public static final int EQ        = '=';
  public static final int LT        = '<';
  public static final int GT        = '>';
  public static final int NE        = GT + 1;
  public static final int LE        = NE + 1;
  public static final int GE        = LE + 1;
  public static final int LPAREN    = '(';
  public static final int RPAREN    = ')';
  public static final int COMMA     = ',';
  public static final int PERIOD    = '.';
  public static final int SEMICOLON = ';';


JFlex Example (continued)


  private int symbol;     // current token
  private String lexeme;  // lexeme

  public Token () { }

  public Token (int symbol) {
    this (symbol, null);
  }

  public Token (int symbol, String lexeme) {
    this . symbol = symbol;
    this . lexeme = lexeme;
  }

  public int symbol () { return symbol; }

  public String lexeme () { return lexeme; }


JFlex Example (continued)


  public String toString () {
    switch (symbol) {
      case BEGIN :     return "(keyword, begin) ";
      case CALL :      return "(keyword, call) ";
      case CONST :     return "(keyword, const) ";
      case DO :        return "(keyword, do) ";
      case END :       return "(keyword, end) ";
      case IF :        return "(keyword, if) ";
      case ODD :       return "(keyword, odd) ";
      case PROC :      return "(keyword, proc) ";
      case THEN :      return "(keyword, then) ";
      case VAR :       return "(keyword, var) ";
      case WHILE :     return "(keyword, while) ";
      case ASSIGN :    return "(operator, :=) ";
      case PLUS :      return "(operator, +) ";
      case MINUS :     return "(operator, -) ";
      case TIMES :     return "(operator, *) ";
      case SLASH :     return "(operator, /) ";

JFlex Example (continued)


      case EQ :        return "(operator, =) ";
      case LT :        return "(operator, <) ";
      case GT :        return "(operator, >) ";
      case NE :        return "(operator, <>) ";
      case LE :        return "(operator, <=) ";
      case GE :        return "(operator, >=) ";
      case LPAREN :    return "(operator, () ";
      case RPAREN :    return "(operator, )) ";
      case COMMA :     return "(punctuation, ,) ";
      case PERIOD :    return "(punctuation, .) ";
      case SEMICOLON : return "(punctuation, ;) ";
      case ID :        return "(identifier, " + lexeme + ") ";
      case INTEGER :   return "(integer, " + lexeme + ") ";
      default :
        ErrorMessage . print (0, "Unrecognized token");
        return null;
    }
  }
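
The token codes in Token.java mix two numbering schemes: keywords, ID, INTEGER and ASSIGN use small integers (0 to 13), while single-character tokens reuse the character's own code. The two-character operators are then given the values just above '>'. The sketch below, with a hypothetical class name, works out that arithmetic, assuming the usual ASCII/Unicode values.

```java
// Hypothetical sketch reproducing a few constants from Token.java to show
// the numeric values they take.
final class TokenCodes {
    static final int ASSIGN = 13;  // largest of the small hand-assigned codes
    static final int LPAREN = '('; // 40, the smallest character-based code
    static final int GT = '>';     // 62
    static final int NE = GT + 1;  // 63
    static final int LE = NE + 1;  // 64
    static final int GE = LE + 1;  // 65

    public static void main(String[] args) {
        System.out.println(GT + " " + NE + " " + LE + " " + GE); // prints "62 63 64 65"
    }
}
```

Since the character-based codes start at 40 and the values 63 to 65 (the codes of '?', '@' and 'A') are not used by any single-character PL/0 token, no two token classes share a code.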


JFlex Example (continued)


vulcan6% cat ErrorMessage.java
// ErrorMessage class

// This class prints error messages.

class ErrorMessage {

  public static void print (int position, String message) {
    int i;
    System . out . println ();
    for (i = 0; i < position; i++)
      System . out . print (" ");
    System . out . println ("^");
    System . out . println ("***** Error: " + message + " *****");
    System . exit (0);
  }

}

JFlex Example (continued)


vulcan6% cat PL0Lex.java
// PL0Lex class

// This class is a PL/0 lexical analyzer which reads a PL/0 source
// program and outputs the list of tokens comprising that program.

class PL0Lex {

  private static final int MAX_TOKENS = 100;

  public static void main (String [] args) throws java.io.IOException {
    int i, n;
    Token [] token = new Token [MAX_TOKENS];
    PL0Lexer lexer = new PL0Lexer (System . in);

    System . out . println ("Source Program");
    System . out . println ("--------------");
    System . out . println ();

JFlex Example (continued)


    n = -1;
    do {
      if (n < MAX_TOKENS - 1)   // leave room for the slot written by ++n
        token [++n] = lexer . nextToken ();
      else
        ErrorMessage . print (0, "Maximum number of tokens exceeded");
    } while (token [n] . symbol () != Token . EOF);

    System . out . println ();
    System . out . println ("List of Tokens");
    System . out . println ("--------------");
    System . out . println ();
    for (i = 0; i < n; i++)
      token [i] . print ();
    System . out . println ();
  }

}
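
The driver stores tokens in a fixed array of MAX_TOKENS entries. As an alternative sketch, the same collect-until-EOF loop can use a growable list, which removes the size limit. The names below are hypothetical, and a Supplier stands in for lexer.nextToken() so that the example runs without the generated PL0Lexer class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: gather tokens into a list until the EOF marker appears.
final class TokenCollector {
    /** Pulls tokens from nextToken until eof is seen; the eof marker itself is not stored. */
    static <T> List<T> collect(Supplier<T> nextToken, T eof) {
        List<T> tokens = new ArrayList<>();
        for (T t = nextToken.get(); !eof.equals(t); t = nextToken.get()) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A canned token stream stands in for the lexer in this sketch.
        Iterator<String> canned = Arrays.asList("var", "q", ";", "EOF").iterator();
        System.out.println(collect(canned::next, "EOF")); // prints [var, q, ;]
    }
}
```

A parser would normally not collect tokens at all; it would call the next-token function on demand, one token at a time.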


JFlex Example (continued)


vulcan6% javac PL0Lex.java
vulcan6% java PL0Lex < quotrem.pl0
Source Program
--------------

var q, r, x, y;
begin
  x := 32; y := 5;
  q := 0; r := x;
  while r >= y do
  begin
    q := q + 1; r := r - y
  end
end.

JFlex Example (continued)


List of Tokens
--------------

(keyword, var) (identifier, q) (punctuation, ,) (identifier, r) (punctuation, ,) (identifier, x) (punctuation, ,) (identifier, y) (punctuation, ;) (keyword, begin) (identifier, x) (operator, :=) (number, 32) (punctuation, ;)

JFlex Example (continued)


(identifier, y) (operator, :=) (number, 5) (punctuation, ;) (identifier, q) (operator, :=) (number, 0) (punctuation, ;) (identifier, r) (operator, :=) (identifier, x) (punctuation, ;)

JFlex Example (continued)


(keyword, while) (identifier, r) (operator, >=) (identifier, y) (keyword, do) (keyword, begin) (identifier, q) (operator, :=) (identifier, q) (operator, +) (number, 1) (punctuation, ;) (identifier, r) (operator, :=) (identifier, r) (operator, -) (identifier, y) (keyword, end)

JFlex Example (continued)


(keyword, end) (punctuation, .)

