Writing An Interpreter In Go
Thorsten Ball
Chapter 1
Lexing
The first transformation, from source code to tokens, is called “lexical analysis”, or “lexing” for
short. It’s done by a lexer (also called tokenizer or scanner – some use one word or the other
to denote subtle differences in behaviour, which we can ignore in this book).
Tokens themselves are small, easily categorizable data structures that are then fed to the parser,
which does the second transformation and turns the tokens into an “Abstract Syntax Tree”.
Here’s an example. This is the input one gives to a lexer:
"let x = 5 + 5;"
And what comes out of the lexer looks kinda like this:
[
    LET,
    IDENTIFIER("x"),
    EQUAL_SIGN,
    INTEGER(5),
    PLUS_SIGN,
    INTEGER(5),
    SEMICOLON
]
All of these tokens have the original source code representation attached. "let" in the case of
LET, "+" in the case of PLUS_SIGN, and so on. Some, like IDENTIFIER and INTEGER in our example,
also have the concrete values they represent attached: 5 (not "5"!) in the case of INTEGER and
"x" in the case of IDENTIFIER. But what exactly constitutes a “token” varies between different
lexer implementations. As an example, some lexers only convert the "5" to an integer in the
parsing stage, or even later, and not when constructing tokens.
A thing to note about this example: whitespace characters don’t show up as tokens. In our case
that’s okay, because whitespace length is not significant in the Monkey language. Whitespace
merely acts as a separator for other tokens. It doesn’t matter if we type this:
let x = 5;
Or if we type this:
let     x   =     5;
In other languages, like Python, the length of whitespace is significant. That means the lexer
can’t just “eat up” the whitespace and newline characters. It has to output the whitespace
characters as tokens so the parser can later on make sense of them (or output an error, of
course, if there are not enough or too many).
A production-ready lexer might also attach the line number, column number and filename to
a token. Why? For example, to later output more useful error messages in the parsing stage.
Instead of "error: expected semicolon token" it can output:
"error: expected semicolon token. line 42, column 23, program.monkey"
We’re not going to bother with that. Not because it’s too complex, but because it would take
away from the essential simplicity of the tokens and the lexer, making them harder to understand.
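Here is a small piece of Monkey source code that contains everything discussed below; it’s the same program the extended lexer test later in this chapter expects to tokenize:

let five = 5;
let ten = 10;

let add = fn(x, y) {
    x + y;
};

let result = add(five, ten);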
Let’s break this down: which types of tokens does this example contain? First of all, there are
the numbers like 5 and 10. These are pretty obvious. Then we have the variable names x, y,
add and result. And then there are also these parts of the language that are not numbers, just
words, but not variable names either, like let and fn. Of course, there are also a lot of special
characters: (, ), {, }, =, ,, ;.
The numbers are just integers and we’re going to treat them as such and give them a separate
type. In the lexer or parser we don’t care if the number is 5 or 10, we just want to know if it’s
a number. The same goes for “variable names”: we’ll call them “identifiers” and treat them
the same. Now, the other words, the ones that look like identifiers, but aren’t really identifiers,
since they’re part of the language, are called “keywords”. We won’t group these together since
it should make a difference in the parsing stage whether we encounter a let or a fn. The
same goes for the last category we identified: the special characters. We’ll treat each of them
separately, since it makes a big difference whether we have a ( or a ) in the source code.
Let’s define our Token data structure. Which fields does it need? As we just saw, we definitely
need a “type” attribute, so we can distinguish between “integers” and “right bracket” for exam-
ple. And it also needs a field that holds the literal value of the token, so we can reuse it later
and the information whether a “number” token is a 5 or a 10 doesn’t get lost.
In a new token package we define our Token struct and our TokenType type:
// token/token.go
package token
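// Reconstructed from the description below: a TokenType is just a string, and
// a Token carries its type plus the literal it was created from.

type TokenType string

type Token struct {
    Type    TokenType
    Literal string
}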
We defined the TokenType type to be a string. That allows us to use many different values
as TokenTypes, which in turn allows us to distinguish between different types of tokens. Using
string also has the advantage of being easy to debug without a lot of boilerplate and helper
functions: we can just print a string. Of course, using a string might not lead to the same
performance as using an int or a byte would, but for this book a string is perfect.
As we just saw, there is a limited number of different token types in the Monkey language. That
means we can define the possible TokenTypes as constants. In the same file we add this:
// token/token.go
const (
    ILLEGAL = "ILLEGAL"
    EOF     = "EOF"

    // Identifiers + literals
    IDENT = "IDENT" // add, foobar, x, y, ...
    INT   = "INT"   // 1343456

    // Operators
    ASSIGN = "="
    PLUS   = "+"

    // Delimiters
    COMMA     = ","
    SEMICOLON = ";"

    LPAREN = "("
    RPAREN = ")"
    LBRACE = "{"
    RBRACE = "}"

    // Keywords
    FUNCTION = "FUNCTION"
    LET      = "LET"
)
As you can see there are two special types: ILLEGAL and EOF. We didn’t see them in the example
above, but we’ll need them. ILLEGAL signifies a token/character we don’t know about and EOF
stands for “end of file”, which tells our parser later on that it can stop.
So far so good! We are ready to start writing our lexer. We’ll start with a test for its NextToken() method, in a new lexer package:
// lexer/lexer_test.go

package lexer

import (
    "testing"

    "monkey/token"
)

func TestNextToken(t *testing.T) {
    // The input isn't shown in this sample; it follows directly from the
    // expected tokens below.
    input := `=+(){},;`

    tests := []struct {
        expectedType    token.TokenType
        expectedLiteral string
    }{
        {token.ASSIGN, "="},
        {token.PLUS, "+"},
        {token.LPAREN, "("},
        {token.RPAREN, ")"},
        {token.LBRACE, "{"},
        {token.RBRACE, "}"},
        {token.COMMA, ","},
        {token.SEMICOLON, ";"},
        {token.EOF, ""},
    }

    l := New(input)

    for i, tt := range tests {
        tok := l.NextToken()

        if tok.Type != tt.expectedType {
            t.Fatalf("tests[%d] - tokentype wrong. expected=%q, got=%q",
                i, tt.expectedType, tok.Type)
        }

        if tok.Literal != tt.expectedLiteral {
            t.Fatalf("tests[%d] - literal wrong. expected=%q, got=%q",
                i, tt.expectedLiteral, tok.Literal)
        }
    }
}
Of course the test fails at this point, since none of the lexer code exists yet. Let’s start with the Lexer struct itself:
// lexer/lexer.go

package lexer
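// Reconstructed from the description below: the Lexer holds the input string,
// two positions into it, and the current character as a byte.

type Lexer struct {
    input        string
    position     int  // current position in input (points to current char)
    readPosition int  // current reading position in input (after current char)
    ch           byte // current char under examination
}

func New(input string) *Lexer {
    l := &Lexer{input: input}
    return l
}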
Most of the fields in Lexer are pretty self-explanatory. The ones that might cause some confusion
right now are position and readPosition. Both will be used to access characters in input by
using them as an index, e.g.: l.input[l.readPosition]. The reason for these two “pointers”
pointing into our input string is the fact that we will need to be able to “peek” further into
the input and look after the current character to see what comes up next. readPosition always
points to the “next” character in the input. position points to the character in the input that
corresponds to the ch byte.
A first helper method called readChar() should make the usage of these fields easier to under-
stand:
// lexer/lexer.go
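// Reconstructed from the explanation that follows:

func (l *Lexer) readChar() {
    if l.readPosition >= len(l.input) {
        // 0 signals "we haven't read anything yet" or "end of input"
        l.ch = 0
    } else {
        l.ch = l.input[l.readPosition]
    }
    l.position = l.readPosition
    l.readPosition += 1
}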
The purpose of readChar is to give us the next character and advance our position in the input
string. The first thing it does is to check whether we have reached the end of input. If that’s
the case it sets l.ch to 0, which is the ASCII code for the "NUL" character and signifies either
“we haven’t read anything yet” or “end of file” for us. But if we haven’t reached the end of
input yet it sets l.ch to the next character by accessing l.input[l.readPosition].
After that, l.position is updated to the just-used l.readPosition and l.readPosition is
incremented by one. That way, l.readPosition always points to the position we’re going
to read from next, and l.position always points to the position where we last read. This will
come in handy soon enough.
While talking about readChar it’s worth pointing out that the lexer only supports ASCII char-
acters instead of the full Unicode range. Why? Because this lets us keep things simple and
concentrate on the essential parts of our interpreter. In order to fully support Unicode and
UTF-8 we would need to change l.ch from a byte to a rune and change the way we read the
next characters, since they could be multiple bytes wide now. Using l.input[l.readPosition]
wouldn’t work anymore. And then we’d also need to change a few other methods and functions
we’ll see later on. So it’s left as an exercise to the reader to fully support Unicode (and emojis!)
in Monkey.
Let’s use readChar in our New() function so our *Lexer is in a fully working state before anyone
calls NextToken(), with l.ch, l.position and l.readPosition already initialized:
// lexer/lexer.go
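// Reconstructed from the description above: New now calls readChar once, so
// the Lexer is fully initialized before the first call to NextToken.

func New(input string) *Lexer {
    l := &Lexer{input: input}
    l.readChar()
    return l
}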
Our tests now tell us that calling New(input) doesn’t result in problems anymore, but the
NextToken() method is still missing. Let’s fix that by adding a first version:
// lexer/lexer.go

package lexer

import "monkey/token"

func (l *Lexer) NextToken() token.Token {
    var tok token.Token

    switch l.ch {
    case '=':
        tok = newToken(token.ASSIGN, l.ch)
    case ';':
        tok = newToken(token.SEMICOLON, l.ch)
    case '(':
        tok = newToken(token.LPAREN, l.ch)
    case ')':
        tok = newToken(token.RPAREN, l.ch)
    case ',':
        tok = newToken(token.COMMA, l.ch)
    case '+':
        tok = newToken(token.PLUS, l.ch)
    case '{':
        tok = newToken(token.LBRACE, l.ch)
    case '}':
        tok = newToken(token.RBRACE, l.ch)
    case 0:
        tok.Literal = ""
        tok.Type = token.EOF
    }

    l.readChar()
    return tok
}
That’s the basic structure of the NextToken() method. We look at the current character under
examination (l.ch) and return a token depending on which character it is. Before returning the
token we advance our pointers into the input so when we call NextToken() again the l.ch field
is already updated. A small function called newToken helps us with initializing these tokens.
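newToken itself isn’t spelled out here; given how it’s called, with a token type and the current byte in l.ch, a minimal version looks like this:

// lexer/lexer.go

func newToken(tokenType token.TokenType, ch byte) token.Token {
    return token.Token{Type: tokenType, Literal: string(ch)}
}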
Running the tests we can see that they pass:
$ go test ./lexer
ok monkey/lexer 0.007s
Great! Let’s now extend the test case so it starts to resemble Monkey source code.
// lexer/lexer_test.go

func TestNextToken(t *testing.T) {
    // Again the input isn't shown in the sample; it's the Monkey snippet
    // implied by the expected tokens below.
    input := `let five = 5;
let ten = 10;

let add = fn(x, y) {
    x + y;
};

let result = add(five, ten);
`

    tests := []struct {
        expectedType    token.TokenType
        expectedLiteral string
    }{
        {token.LET, "let"},
        {token.IDENT, "five"},
        {token.ASSIGN, "="},
        {token.INT, "5"},
        {token.SEMICOLON, ";"},
        {token.LET, "let"},
        {token.IDENT, "ten"},
        {token.ASSIGN, "="},
        {token.INT, "10"},
        {token.SEMICOLON, ";"},
        {token.LET, "let"},
        {token.IDENT, "add"},
        {token.ASSIGN, "="},
        {token.FUNCTION, "fn"},
        {token.LPAREN, "("},
        {token.IDENT, "x"},
        {token.COMMA, ","},
        {token.IDENT, "y"},
        {token.RPAREN, ")"},
        {token.LBRACE, "{"},
        {token.IDENT, "x"},
        {token.PLUS, "+"},
        {token.IDENT, "y"},
        {token.SEMICOLON, ";"},
        {token.RBRACE, "}"},
        {token.SEMICOLON, ";"},
        {token.LET, "let"},
        {token.IDENT, "result"},
        {token.ASSIGN, "="},
        {token.IDENT, "add"},
        {token.LPAREN, "("},
        {token.IDENT, "five"},
        {token.COMMA, ","},
        {token.IDENT, "ten"},
        {token.RPAREN, ")"},
        {token.SEMICOLON, ";"},
        {token.EOF, ""},
    }

    // [...]
}
Most notably the input in this test case has changed. It looks like a subset of the Monkey
language. It contains all the symbols we already successfully turned into tokens, but also new
things that are now causing our tests to fail: identifiers, keywords and numbers.
Let’s start with the identifiers and keywords. What our lexer needs to do is recognize whether
the current character is a letter and if so, it needs to read the rest of the identifier/keyword
until it encounters a non-letter-character. Having read that identifier/keyword, we then need
to find out if it is an identifier or a keyword, so we can use the correct token.TokenType. The
first step is extending our switch statement:
// lexer/lexer.go

import "monkey/token"

func (l *Lexer) NextToken() token.Token {
    var tok token.Token

    switch l.ch {
    // [...]
    default:
        if isLetter(l.ch) {
            tok.Literal = l.readIdentifier()
            return tok
        } else {
            tok = newToken(token.ILLEGAL, l.ch)
        }
    }

    // [...]
}

// The two helpers, reconstructed from the description below:

func isLetter(ch byte) bool {
    return 'a' <= ch && ch <= 'z' || 'A' <= ch && ch <= 'Z' || ch == '_'
}

func (l *Lexer) readIdentifier() string {
    position := l.position
    for isLetter(l.ch) {
        l.readChar()
    }
    return l.input[position:l.position]
}
We added a default branch to our switch statement, so we can check for identifiers whenever
the l.ch is not one of the recognized characters. We also added the generation of token.ILLEGAL
tokens. If we end up there, we truly don’t know how to handle the current character and declare
it as token.ILLEGAL.
The isLetter helper function just checks whether the given argument is a letter. That sounds
easy enough, but what’s noteworthy about isLetter is that changing this function has a larger
impact on the language our interpreter will be able to parse than one would expect from such
a small function. As you can see, in our case it contains the check ch == '_', which means that
we’ll treat _ as a letter and allow it in identifiers and keywords. That means we can use variable
names like foo_bar. Other programming languages even allow ! and ? in identifiers. If you
want to allow that too, this is the place to sneak it in.
readIdentifier() does exactly what its name suggests: it reads in an identifier and advances
our lexer’s positions until it encounters a non-letter-character.
In the default: branch of the switch statement we use readIdentifier() to set the Literal field
of our current token. But what about its Type? Now that we have read identifiers like let, fn
or foobar, we need to be able to tell user-defined identifiers apart from language keywords. We
need a function that returns the correct TokenType for the token literal we have. What better
place than the token package to add such a function?
// token/token.go
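// Reconstructed from the description below: keywords maps the keyword literals
// to their TokenType constants, and LookupIdent falls back to IDENT for
// everything else.

var keywords = map[string]TokenType{
    "fn":  FUNCTION,
    "let": LET,
}

func LookupIdent(ident string) TokenType {
    if tok, ok := keywords[ident]; ok {
        return tok
    }
    return IDENT
}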
LookupIdent checks the keywords table to see whether the given identifier is in fact a keyword.
If it is, it returns the keyword’s TokenType constant. If it isn’t, we just get back token.IDENT,
which is the TokenType for all user-defined identifiers.
With this in hand we can now complete the lexing of identifiers and keywords:
// lexer/lexer.go

func (l *Lexer) NextToken() token.Token {
    var tok token.Token

    switch l.ch {
    // [...]
    default:
        if isLetter(l.ch) {
            tok.Literal = l.readIdentifier()
            tok.Type = token.LookupIdent(tok.Literal)
            return tok
        } else {
            tok = newToken(token.ILLEGAL, l.ch)
        }
    }

    // [...]
}
The early exit here, our return tok statement, is necessary because when calling
readIdentifier(), we call readChar() repeatedly and advance our readPosition and position
fields past the last character of the current identifier. So we don’t need the call to readChar()
after the switch statement again.
Running our tests now, we can see that let is identified correctly but the tests still fail:
$ go test ./lexer
--- FAIL: TestNextToken (0.00s)
lexer_test.go:70: tests[1] - tokentype wrong. expected="IDENT", got="ILLEGAL"
FAIL
FAIL monkey/lexer 0.008s
The problem is the next token we want: an IDENT token with "five" in its Literal field. Instead
we get an ILLEGAL token. Why is that? Because of the whitespace character between “let” and
“five”. But in Monkey whitespace only acts as a separator of tokens and doesn’t have meaning,
so we need to skip over it entirely:
// lexer/lexer.go
func (l *Lexer) NextToken() token.Token {
    var tok token.Token
    l.skipWhitespace()

    switch l.ch {
    // [...]
    }
    // [...]
}
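skipWhitespace itself isn’t spelled out in this sample; a version matching the description below, which simply eats blanks, tabs, newlines and carriage returns by calling readChar, looks like this:

// lexer/lexer.go

func (l *Lexer) skipWhitespace() {
    for l.ch == ' ' || l.ch == '\t' || l.ch == '\n' || l.ch == '\r' {
        l.readChar()
    }
}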
This little helper function is found in a lot of parsers. Sometimes it’s called eatWhitespace and
sometimes consumeWhitespace and sometimes something entirely different. Which characters
these functions actually skip depends on the language being lexed. Some language implementa-
tions do create tokens for newline characters for example and throw parsing errors if they are
not at the correct place in the stream of tokens. We skip over newline characters to make the
parsing step later on a little easier.
With skipWhitespace() in place, the lexer trips over the 5 in the let five = 5; part of our test
input. And that’s right, it doesn’t know yet how to turn numbers into tokens. It’s time to add
this.
As we did previously for identifiers, we now need to add more functionality to the default
branch of our switch statement.
// lexer/lexer.go

func (l *Lexer) NextToken() token.Token {
    var tok token.Token

    l.skipWhitespace()

    switch l.ch {
    // [...]
    default:
        if isLetter(l.ch) {
            tok.Literal = l.readIdentifier()
            tok.Type = token.LookupIdent(tok.Literal)
            return tok
        } else if isDigit(l.ch) {
            tok.Type = token.INT
            tok.Literal = l.readNumber()
            return tok
        } else {
            tok = newToken(token.ILLEGAL, l.ch)
        }
    }

    // [...]
}
As you can see, the added code closely mirrors the part concerned with reading identifiers and
keywords. The readNumber method is exactly the same as readIdentifier except for its usage
of isDigit instead of isLetter. We could probably generalize this by passing in the character-
identifying functions as arguments, but won’t, for simplicity’s sake and ease of understanding.
The isDigit function is as simple as isLetter. It just returns whether the passed in byte is a
Latin digit between 0 and 9.
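Following that description, readNumber and isDigit can be sketched like this:

// lexer/lexer.go

func (l *Lexer) readNumber() string {
    position := l.position
    for isDigit(l.ch) {
        l.readChar()
    }
    return l.input[position:l.position]
}

func isDigit(ch byte) bool {
    return '0' <= ch && ch <= '9'
}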
With this added, our tests pass:
$ go test ./lexer
ok monkey/lexer 0.008s
I don’t know if you noticed, but we simplified things a lot in readNumber. We only read in
integers. What about floats? Or numbers in hex notation? Octal notation? We ignore them
and just say that Monkey doesn’t support this. Of course, the reason for this is again the
educational aim and limited scope of this book.
It’s time to pop the champagne and celebrate: we successfully turned the small subset of the
Monkey language we used in our test case into tokens!
With this victory under our belt, it’s easy to extend the lexer so it can tokenize a lot more of
Monkey source code.
End Of Sample
You’ve reached the end of the sample. I hope you enjoyed it. You can buy the full version of
the book (in multiple formats, including the code) online at:
https://interpreterbook.com