Lexical Analysis: 2/24/2018 John Roberts

The document discusses lexical analysis and the lexer project code. It describes how lexical analysis works by reading a stream of characters and generating a stream of tokens. It then provides an overview of the lexer project code, including how the tokens are defined in a tokens file and how the Symbol and Token classes are used. It explains how the lexer initializes known tokens and then lexes the user's program to generate symbols for identifiers and integers not already in the symbol table.

8. Lexical Analysis

2/24/2018

John Roberts

Overview

• Lexical Analysis
• Assignment Two
• Project Code Overview
• Lexing

Lexical Analysis

• Read a stream of characters that make up the source program, and create a stream of tokens by combining the characters appropriately

• Tokens are sometimes also referred to as lexical units, or lexemes

• Example: the characters 't', 'h', 'e', 'n' will be combined to build the then token

• Example: the characters '1', '2', '4', '7' will be combined to form an integer token with a value of 1247
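
The two examples above can be sketched in a few lines of Java. This is a minimal illustration, not the project's lexer: the class name CharGrouper and the "WORD:"/"INT:" labels are made up for this sketch, and integer overflow is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: groups a stream of characters into word and integer tokens.
class CharGrouper {
    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static List<String> group(String source) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens and is skipped
            } else if (isAsciiDigit(c)) {
                int value = 0;
                while (i < source.length() && isAsciiDigit(source.charAt(i))) {
                    value = value * 10 + (source.charAt(i) - '0'); // '1','2','4','7' -> 1247
                    i++;
                }
                tokens.add("INT:" + value);
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < source.length() && Character.isJavaIdentifierPart(source.charAt(i))) {
                    i++;
                }
                tokens.add("WORD:" + source.substring(start, i)); // 't','h','e','n' -> then
            } else {
                tokens.add("CHAR:" + c); // anything else is passed through as one character
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(group("then 1247")); // prints [WORD:then, INT:1247]
    }
}
```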
The Lexer

• We will be working with the lexer package

• Recall that the lexer’s responsibility is to generate Tokens

Token Categories (generating tokens)

Category         Tokens
Reserved Words   program int boolean if then else while return
Identifiers      <the same as Java identifiers>
Integers         <a sequence of digits>
Operators        = == != < <= + - * / | &
Separators       { } ( ) ,
Comments         // until end of line
Whitespace       <spaces> <newlines> and other Java whitespace characters

We’ll see how we use this shortly

tokens file

• The tokens are defined in a tokens file

• Each line in the file will have two strings:

  • The symbolic constant we will use in the compiler for the token

  • The actual token

    Program      program
    Int          int
    BOOLean      boolean
    If           if
    Then         then
    Else         else
    While        while
    Function     function
    Return       return
    Identifier   <id>
    INTeger      <int>
    LeftBrace    {
    RightBrace   }
    LeftParen    (
    RightParen   )
    Comma        ,
    Assign       =
    Equal        ==
    NotEqual     !=
    Less         <
    LessEqual    <=
    Plus         +
    Minus        -
    Or           |
    And          &
    Multiply     *
    Divide       /
    Comment      //
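
To make the two-strings-per-line format concrete, here is a small sketch of how such lines could be split into (symbolic constant, actual token) pairs. It is not the code in TokenSetup.java; the class name TokensFileSketch and the parse method are assumptions made for illustration, and the lines are passed in as a list rather than read from disk.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: splits tokens-file lines into (symbolic constant, actual token) pairs.
class TokensFileSketch {
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> pairs = new LinkedHashMap<>(); // keeps the file order
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) { // blank or malformed lines are skipped
                pairs.put(parts[0], parts[1]);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Program program", "LessEqual <=", "Comment //");
        System.out.println(parse(lines));
    }
}
```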

Token Setup

• TokenSetup.java will read tokens, and automatically generate the files Tokens.java and TokenType.java

• The Tokens enum is actually a class - you can add methods, instance fields, and a constructor that can only be used to construct the enumerated values

• Values are accessed as Tokens.If, etc.
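
The point that a Java enum is a class can be seen in a small sketch. The enum below is hypothetical (named SketchTokens to avoid confusion with the generated Tokens.java, whose actual contents are not shown in these slides); the lexeme field and accessor are assumptions added only to show an instance field, a constructor, and a method on an enum.

```java
// Hypothetical sketch: an enum with an instance field, a constructor, and a method.
enum SketchTokens {
    Program("program"), If("if"), LessEqual("<=");

    private final String lexeme; // instance field, one value per enumerated constant

    // Enum constructors are implicitly private: they can only build the constants above.
    SketchTokens(String lexeme) {
        this.lexeme = lexeme;
    }

    String lexeme() {
        return lexeme;
    }

    public static void main(String[] args) {
        // Values are accessed by name, as with Tokens.If in the project.
        System.out.println(SketchTokens.If + " -> " + SketchTokens.If.lexeme());
    }
}
```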

TokenSetup.java

• Examine code to ensure we understand how it works

• Execute TokenSetup and inspect Tokens.java and TokenType.java

SourceReader.java

• Examine code to ensure we understand how it works

• Note that we will be updating this file to generate better output (which we’ll see in a minute when we run Lexer)

Token.java

• Each Token contains four pieces of information

  • String of Token found in source

  • TokenType

  • Starting column from source file

  • Ending column

• The first two items are grouped as a Symbol
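
As a rough picture of that layout, the sketch below groups the string and its type into one object and adds the two columns on top. The names TokenShapeSketch, Sym, Tok, leftColumn and rightColumn are assumptions for illustration; the real Token.java and Symbol.java may be organized differently, and the token type is shown as a plain string to keep the sketch self-contained.

```java
// Hypothetical sketch of the four pieces of information carried by a token.
class TokenShapeSketch {
    // String from the source plus its type, grouped together.
    static final class Sym {
        final String text;
        final String kind;

        Sym(String text, String kind) {
            this.text = text;
            this.kind = kind;
        }
    }

    // A symbol plus the starting and ending columns in the source line.
    static final class Tok {
        final Sym symbol;
        final int leftColumn;
        final int rightColumn;

        Tok(Sym symbol, int leftColumn, int rightColumn) {
            this.symbol = symbol;
            this.leftColumn = leftColumn;
            this.rightColumn = rightColumn;
        }

        @Override
        public String toString() {
            return "Token( Symbol( \"" + symbol.text + "\", " + symbol.kind + " ), "
                    + leftColumn + ", " + rightColumn + " )";
        }
    }

    public static void main(String[] args) {
        System.out.println(new Tok(new Sym("program", "Tokens.Program"), 1, 7));
    }
}
```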

Symbol.java

(Note we’ve seen this hash pattern before…)

• String from the source, and TokenType

• All Strings (corresponding to tokens) found in the source program will be placed into the hash table in the Symbol class (the Symbol table)

• Before we begin, we place all Tokens in the Symbol hash table

• Each String will (should) be inserted exactly once

Symbol.java example

    program { int j int k
     j = j + k
    }

    Token( Symbol( "program", Tokens.Program ), 1, 7 )
    Token( Symbol( "{", Tokens.LeftBrace ), 9, 9 )
    Token( Symbol( "int", Tokens.Int ), 11, 13 )
    Token( Symbol( "j", Tokens.Identifier ), 15, 15 )
    Token( Symbol( "int", Tokens.Int ), 17, 19 )
    Token( Symbol( "k", Tokens.Identifier ), 21, 21 )
    Token( Symbol( "j", Tokens.Identifier ), 2, 2 )
    Token( Symbol( "=", Tokens.Assign ), 4, 4 )
    Token( Symbol( "j", Tokens.Identifier ), 6, 6 )
    Token( Symbol( "+", Tokens.Plus ), 8, 8 )
    Token( Symbol( "k", Tokens.Identifier ), 10, 10 )
    Token( Symbol( "}", Tokens.RightBrace ), 1, 1 )

• Symbol( String s, Tokens kind ) - insert s into the hash table with value given by kind; if the entry is already in the table, then just return the entry

Symbol.java example

• Note that we repeated a Symbol three times: Symbol( "j", Tokens.Identifier )

• For efficiency, we only want to create one instance of each Symbol, so we use the hash table to check if the Symbol has already been created. If so, re-use it; if not, create a new instance.

• Logic encapsulated in Symbol class
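
The create-once, re-use-afterwards behaviour can be sketched as follows. This is an illustration under assumptions, not the project's Symbol class: the class name InternedSymbol is invented, the kind is a plain string instead of a Tokens value, and the factory method is named symbol only to echo the Symbol.symbol call mentioned in these slides.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one shared instance per distinct source string.
class InternedSymbol {
    private static final Map<String, InternedSymbol> symbols = new HashMap<>();

    final String text;
    final String kind;

    // Private constructor: instances are only created through symbol(...).
    private InternedSymbol(String text, String kind) {
        this.text = text;
        this.kind = kind;
    }

    static InternedSymbol symbol(String text, String kind) {
        InternedSymbol existing = symbols.get(text);
        if (existing != null) {
            return existing; // already in the table: re-use the entry
        }
        InternedSymbol created = new InternedSymbol(text, kind);
        symbols.put(text, created); // first sighting: insert exactly once
        return created;
    }

    public static void main(String[] args) {
        InternedSymbol first = symbol("j", "Identifier");
        InternedSymbol again = symbol("j", "Identifier");
        System.out.println(first == again); // prints true
    }
}
```

Because every occurrence of j maps to the same object, later phases could compare symbols by reference rather than by string content.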


Performing Lexical Analysis

• Prior to processing the user’s program, we’ll create Symbol instances for all reserved words, operators, etc. so we can find them later (see TokenType.java)

• Once the lexer starts processing the user’s program, the only new symbols that will be created (added to the hash map) will be identifiers and numbers - all other symbols will already have been created

Initializing

• Insert all tokens into the HashMap<String, Symbol>

• The tokens HashMap in TokenType holds all of the known Token/Symbol pairs, e.g.

    tokens.put(
        Tokens.Program,
        Symbol.symbol( "program", Tokens.Program )
    );

• Each of these is stored in the symbol table as it is generated (see implementation for Symbol.symbol)

• At this point, Symbol.symbols.get( "program" ) yields Symbol( "program", Tokens.Program )
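
A compact sketch of this initialization step is shown below, with one map keyed by string and one keyed by token kind. Everything here is a stand-in: InitSketch, Kind, Sym and init are hypothetical names, and only a handful of tokens are registered; the generated TokenType.java is not reproduced in these slides.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: pre-loading the known tokens before lexing starts.
class InitSketch {
    enum Kind { Program, Int, If, Assign, Identifier }

    static final class Sym {
        final String text;
        final Kind kind;

        Sym(String text, Kind kind) {
            this.text = text;
            this.kind = kind;
        }
    }

    // Stands in for the symbol table keyed by source string.
    static final Map<String, Sym> symbols = new HashMap<>();
    // Stands in for the map from token kind to its symbol.
    static final Map<Kind, Sym> tokens = new EnumMap<>(Kind.class);

    // Returns the existing entry, or creates and stores a new one.
    static Sym symbol(String text, Kind kind) {
        return symbols.computeIfAbsent(text, t -> new Sym(t, kind));
    }

    static void init() {
        tokens.put(Kind.Program, symbol("program", Kind.Program));
        tokens.put(Kind.Int, symbol("int", Kind.Int));
        tokens.put(Kind.If, symbol("if", Kind.If));
        tokens.put(Kind.Assign, symbol("=", Kind.Assign));
    }

    public static void main(String[] args) {
        init();
        System.out.println(symbols.get("program").kind); // prints Program
    }
}
```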
Lexing

• Scan the program line by line (character by character), and insert symbols not already in the symbols table (identifiers and ints)

• If we look up an identifier in the symbols:

  • Reserved word (e.g. program) - found, and Symbol( "program", Tokens.Program ) is returned

  • User id not already in symbols - we don’t find it, so we put a new entry and return the new Symbol

  • User id already in symbols - return the entry
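
The three lookup cases can be traced with a small sketch. It assumes a simplified table that maps each string directly to the name of its kind; IdLookupSketch and lookupIdentifier are hypothetical names, not methods of the project's lexer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: looking up an identifier-shaped string in the symbols table.
class IdLookupSketch {
    static final Map<String, String> symbols = new HashMap<>(); // source string -> kind

    static {
        // Reserved words are assumed to have been inserted before lexing begins.
        symbols.put("program", "Program");
        symbols.put("if", "If");
    }

    static String lookupIdentifier(String text) {
        String kind = symbols.get(text);
        if (kind == null) {
            // User id seen for the first time: put a new entry and return it.
            kind = "Identifier";
            symbols.put(text, kind);
        }
        // Reserved word, or user id already present: return the existing entry.
        return kind;
    }

    public static void main(String[] args) {
        System.out.println(lookupIdentifier("program")); // prints Program
        System.out.println(lookupIdentifier("j"));       // prints Identifier
    }
}
```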

Lexing

• If we look up other tokens in symbols:

  • Numbers - put a new entry, if not already there

  • Not found - don’t do anything

• e.g. = vs. == vs. !=, / vs. // - these are either one- or two-character tokens

• e.g. x =abc + y - we can key on the character =, and save the a for the start of the next token (the abc identifier)
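
One way to handle the one-versus-two-character decision is to try the two-character candidate first and fall back to the single character. The sketch below follows that idea; OperatorSketch and operatorAt are hypothetical names, and the operator set is copied from the tokens file listed earlier in the slides.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: choosing between one- and two-character tokens with one character of lookahead.
class OperatorSketch {
    static final Set<String> OPERATORS = new HashSet<>(Arrays.asList(
            "=", "==", "!=", "<", "<=", "+", "-", "*", "/", "|", "&", "//"));

    // Returns the operator starting at index i, or null if none starts there.
    static String operatorAt(String line, int i) {
        if (i + 1 < line.length()) {
            String two = line.substring(i, i + 2);
            if (OPERATORS.contains(two)) {
                return two; // the longer match wins, e.g. == over =
            }
        }
        String one = line.substring(i, i + 1);
        // If the pair was not a token, the second character is left for the next token.
        return OPERATORS.contains(one) ? one : null;
    }

    public static void main(String[] args) {
        System.out.println(operatorAt("x =abc + y", 2)); // prints =
        System.out.println(operatorAt("a == b", 2));     // prints ==
    }
}
```

In the x =abc + y case the pair =a is not a known token, so only = is consumed and a remains available as the first character of the abc identifier.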
