Haskell Tokenizer

This document discusses tokenizing strings in Haskell. It begins by defining the Token data type to represent the different types of tokens: operators, identifiers, and numbers. It then discusses enumerating the possible operators and using pattern matching to extract values from Tokens. The document also discusses representing strings as lists and defines the basic List data type to represent lists through recursive cons and empty constructs. Overall, the document provides an introduction to tokenizing strings in Haskell by defining the necessary data types and discussing techniques like pattern matching, recursion, and representing strings as lists.

Uploaded by

Neuro

5. Tokenizer: Data Types

In the previous tutorial I sketched the design of a calculator and implemented
the top-level input/output loop. This is a typical pattern in Haskell: the top
level is implemented in the IO monad (after all, the signature of main is IO ())
but, as you descend to the lower levels, you enter the realm of side-effect-free
pure functions. The first such function is tokenize with the following signature:
tokenize :: String -> [Token]

Before we can start implementing it, we have to define the Token data type and
learn more about Strings.

Haskell Data Types

There is one major difference between data in imperative languages and data
in Haskell. Haskell data is immutable. Once you construct a data item, it will
forever stay the same.
Well, it's not entirely true because of another property of Haskell: laziness.
Calling a constructor of a data type is not the same as evaluating it. It's only
when you actually peek inside a data item that the constructor is evaluated,
and only the part that you're looking at.
But for all intents and purposes, the state of a data item remains frozen after
its construction. Moreover, every data item remembers the way it's been
constructed. It remembers which constructor was used and what values were
passed to it.
But how can you write programs without mutable data? Actually, those of us
who had to deal with concurrent programming in imperative languages had to
learn (often the hard way) to eschew mutability whenever possible. The fewer
opportunities for those hard to reproduce and debug low-level data races, the
more reliable your code. This is one more reason to learn programming in
Haskell even if your job requires the use of imperative languages: You'll learn
how to solve problems without mutable variables.
In Haskell you'll often see mutation replaced by construction. Instead of
modifying one element of a data structure, you construct a copy of it with the
appropriate change in place. This trick could be prohibitively expensive if you
use the wrong data structures. We'll be steering away from such data
structures in favor of the so-called persistent data structures, which don't require
a lot of copying when they are modified. For instance, the workhorse of Haskell
data structures is the list, not the array or the vector. We'll talk more about this
later.
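
To make "construction instead of mutation" concrete, here is a minimal sketch using the built-in list. The names oldList, newList, and replaceHead are hypothetical, chosen only for this illustration, and the list and pattern syntax is explained later in this tutorial.

```haskell
oldList :: [Int]
oldList = [2, 3, 4]

-- "Adding" an element constructs a new list whose tail is oldList itself.
-- Nothing is copied, and oldList is unchanged.
newList :: [Int]
newList = 1 : oldList

-- "Changing" the first element also constructs a new list;
-- the remaining elements are shared with the original.
replaceHead :: Int -> [Int] -> [Int]
replaceHead x (_ : rest) = x : rest
replaceHead _ []         = []
```

After loading this into GHCi, replaceHead 9 oldList gives [9,3,4], while oldList itself is still [2,3,4]. Only one new list cell is built in each case, which is why prepending to a list is cheap.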

Enumerated Data Types

The simplest data types just enumerate all possible values. For
instance, Bool is an enumeration of True and False (as defined in the Prelude,
Haskell's standard library):

data Bool = True | False

A data structure definition is introduced by the keyword data. Bool is the name
of the type we are defining. The right hand side of the equal sign lists
the constructors separated by vertical bars. When you create a new Bool value,
you use one of these two constructors. Constructor names must start with a
capital letter and must be unique within a module (two data types defined in the
same module can't share a constructor name).
When you want to inspect a Bool value, you match it with one of the
constructors (remember, a value remembers how it was constructed). There are
several ways of matching values to constructors in Haskell. Let's start with the
simplest one: Defining a function using multiple equations. Instead of defining
a function with one equation, like this:
boolToInt :: Bool -> Int
boolToInt b = if b then 1 else 0

main = print $ boolToInt False

you may split it into two equations corresponding to two constructor
patterns, True and False:
boolToInt :: Bool -> Int
boolToInt True  = 1
boolToInt False = 0

main = print $ boolToInt False

Patterns are matched in order, so when boolToInt is called with False, the
runtime first tries to match it to True and fails, so it moves to the second
pattern False and succeeds. (All equations for the same function must be
consecutive.)
(Note: In order to save on parentheses I will start using the function
application operator $ that I introduced in the first tutorial. It's been a long
time, so here's a quick recap: $ separates a function call from its argument.
It's very useful when the argument is another function call, because function
application binds to the left. In our example, without the $ or parentheses, the
function calls would bind as (print boolToInt) False, and would fail to compile.
Operator $ has very low precedence, so the whole expression to its right becomes
the argument of the function on its left, and it binds to the right.)
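
The recap above can be checked with a few pure definitions. This is only a sketch: withParens, withDollar, and chained are hypothetical names, and negate and abs come from the Prelude.

```haskell
boolToInt :: Bool -> Int
boolToInt True  = 1
boolToInt False = 0

-- With parentheses: the argument of negate is the result of boolToInt True.
withParens :: Int
withParens = negate (boolToInt True)

-- With $: everything to the right of $ becomes the argument of negate.
withDollar :: Int
withDollar = negate $ boolToInt True

-- $ binds to the right, so this chain reads as
-- abs (negate (boolToInt True)).
chained :: Int
chained = abs $ negate $ boolToInt True
```

Both withParens and withDollar evaluate to -1, and chained evaluates to 1.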
Here's a useful enumeration that we will use in our project:
data Operator = Plus | Minus | Times | Div

Ex 1. Write a function that takes an Operator and returns one of the
characters, '+', '-', '*', or '/'.

data Operator = Plus | Minus | Times | Div

opToChar :: Operator -> Char
opToChar = undefined

main = print $ opToChar Plus

Solution:

data Operator = Plus | Minus | Times | Div

opToChar :: Operator -> Char
opToChar Plus  = '+'
opToChar Minus = '-'
opToChar Times = '*'
opToChar Div   = '/'

main = print $ opToChar Plus

Token
Our tokenizer should recognize operators, identifiers, and numbers. We can
enumerate the four operators, but we can't enumerate all possible identifiers
or numbers. For those tokens we need to store additional information:
a String and an Int respectively. Here's the definition of Token:
data Token = TokOp Operator
           | TokIdent String
           | TokNum Int
    deriving (Show, Eq)

All three constructors now take arguments. The TokOp constructor takes a value
of the type Operator, TokIdent takes a String, and TokNum takes an Int. For
instance, you can create a Token using (TokIdent "x"), etc.
I'll explain the deriving clause in more detail when we talk about type classes.
For now it will suffice to know that deriving Show means that there is a way to
convert any Token to a string (either by calling show or by print'ing it),
and deriving Eq means that we can compare Tokens for (in-)equality. The
compiler is clever enough to implement this functionality all by itself (if it
can't, it will issue an error).
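
Here is a small sketch of what the two derived capabilities give you. The names shown, sameIdent, and differentTokens are hypothetical. Note that Operator has to derive Show and Eq as well, otherwise the compiler cannot derive them for Token.

```haskell
data Operator = Plus | Minus | Times | Div
    deriving (Show, Eq)

data Token = TokOp Operator
           | TokIdent String
           | TokNum Int
    deriving (Show, Eq)

-- show is available because of deriving Show.
shown :: [String]
shown = [show (TokOp Plus), show (TokIdent "x"), show (TokNum 42)]

-- (==) and (/=) are available because of deriving Eq.
sameIdent :: Bool
sameIdent = TokIdent "x" == TokIdent "x"

differentTokens :: Bool
differentTokens = TokNum 1 /= TokIdent "1"
```

The derived show reproduces the constructor call as text, so the three strings in shown are "TokOp Plus", "TokIdent \"x\"", and "TokNum 42". Derived equality compares the constructor first and then the stored values, so both sameIdent and differentTokens are True.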
Pattern matching on these constructors is more interesting: We not only match
the constructor name but also the value with which it was originally called.
Here's a definition of a function showContent that uses this kind of pattern
matching:
data Token = TokOp Operator
           | TokIdent String
           | TokNum Int
    deriving (Show, Eq)

showContent :: Token -> String
showContent (TokOp op)     = opToStr op
showContent (TokIdent str) = str
showContent (TokNum i)     = show i

token :: Token
token = TokIdent "x"

main = do
    putStrLn $ showContent token
    print token

data Operator = Plus | Minus | Times | Div
    deriving (Show, Eq)

opToStr :: Operator -> String
opToStr Plus  = "+"
opToStr Minus = "-"
opToStr Times = "*"
opToStr Div   = "/"

Notice that non-trivial constructor patterns require parentheses. In these
patterns the argument to the constructor is replaced by a (lower-case)
variable that is to be bound to the value stored inside the Token. For instance,
in the (TokIdent str) pattern, str will be bound to the string that was used in
the construction of the matched token. If the token was constructed
using TokIdent "x", str will be bound to "x". (For immutable variables we prefer
to use the word "bind" rather than "assign.")

In general, constructors may take many arguments of various types, and they
can all be matched by patterns.
Ex 2. Define a data type Point with one constructor Pt that takes two Doubles,
corresponding to the x and y coordinates of a point. Write a function inc that
takes a Point and returns a new Point whose coordinates are one more than the
original coordinates. Use pattern matching.
data Point = Pt ...
    deriving Show

inc :: Point -> Point
inc ... = ...

p :: Point
p = Pt (-1) 3

main = print $ inc p

Solution:

data Point = Pt Double Double
    deriving Show

inc :: Point -> Point
inc (Pt x y) = Pt (x + 1) (y + 1)

p :: Point
p = Pt (-1) 3

main = print $ inc p

By the way, we've seen pattern matching previously applied to pairs. The
constructor of a pair is (,).
Ex 3. Solve the previous exercise using pairs rather than Points.
inc :: (Int, Int) -> (Int, Int)
inc ... = ...

p :: (Int, Int)
p = ...

main = print $ inc p

Solution:

inc :: (Int, Int) -> (Int, Int)
inc (x, y) = (x + 1, y + 1)

p :: (Int, Int)
p = (-1, 3)

main = print $ inc p

Lists and Recursion

In Haskell a String is a list of characters. Admittedly, list storage and
processing is less space/time efficient than the processing of arrays of
characters in imperative languages. However, unless your application is
string-intensive, the convenience of list manipulation overcomes these
shortcomings. And it's easy enough to replace String with the more efficient
array-based ByteString in string-intensive applications.
Since we'll be manipulating strings -- and strings are lists of characters -- we
need to learn about lists first.
First we have to ask ourselves: What is a list? If you're thinking, "Singly-linked
or doubly-linked?", you are talking about implementation, not the essence of a
list. So what's the essence of a list? Like any abstract data type, list is defined
by operations you can perform on it. The most essential operation is
the creation of a list.
One should be able to create a new list by prepending an element to an
existing list. This operation is often called "cons," a word taken from Lisp
jargon. Notice that this definition is self-referential -- you create a list from a
list. To start somewhere, you should also be able to create a list from
nothing -- an empty list. Here's a definition of a list of integers that is based just on this
description:
data List = Cons Int List | Empty

The fact that this definition is recursive shouldn't bother us in the least. The
important thing is that it lets us create arbitrary lists:
lst0, lst1, lst2 :: List
lst0 = Empty        -- empty list
lst1 = Cons 1 lst0  -- one-element list
lst2 = Cons 2 lst1  -- two-element list

This definition can also be used in pattern matching. For instance, here's a
function that checks if a list is a singleton:

data List = Cons Int List | Empty

singleton :: List -> Bool

singleton (Cons _ Empty) = True
singleton _ = False

main = do
    print $ singleton Empty
    print $ singleton $ Cons 2 Empty
    print $ singleton $ Cons 3 $ Cons 4 Empty

In this example, I made use of a wildcard pattern _. Let me remind you that this
pattern matches anything (without evaluating it). For instance, in the first
clause of singleton I'm discarding the integer stored in the list. In the second
clause I'm ignoring the whole list, because I know that the first clause, which
catches one-element lists, is tried first.
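
The remark that the wildcard matches without evaluating can be tested directly. In this sketch, lazyCheck and isEmpty are hypothetical names, and undefined is the Prelude value that crashes the program if it is ever evaluated.

```haskell
data List = Cons Int List | Empty

singleton :: List -> Bool
singleton (Cons _ Empty) = True
singleton _              = False

-- The element is matched by _ and never inspected,
-- so even an undefined element is harmless here.
lazyCheck :: Bool
lazyCheck = singleton (Cons undefined Empty)

-- The order of clauses matters: Empty is tried first,
-- and the wildcard then covers every Cons.
isEmpty :: List -> Bool
isEmpty Empty = True
isEmpty _     = False
```

Here lazyCheck is True: matching looks at the outer Cons and at the Empty tail, but never at the integer stored in the list.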
Most importantly, because list is defined recursively, it's easy to implement
recursive algorithms for it. For instance, to calculate the sum of all list elements
it's enough to say that the sum is equal to the first element plus the sum of the
rest. And, of course, the sum of an empty list is zero. So here we go:
data List = Cons Int List | Empty

sumLst :: List -> Int

sumLst (Cons i rest) = i + sumLst rest
sumLst Empty = 0

lst = Cons 2 (Cons 4 (Cons 6 Empty))

main = do
    print (sumLst lst)
    print (sumLst Empty)

But you don't want to be defining a new list type for each possible element
type. Fortunately, static polymorphism in Haskell is embarrassingly easy. No
need for the verbose template<typename T> ugliness. You just parameterize types
by specifying a type argument. You may define a generic list by
replacing Int by a type parameter a (type parameters must start with lower
case and are typically taken from the beginning of the alphabet):
data List a = Cons a (List a) | Empty

List a in this definition is a generic type; List itself is called a type constructor,
because you can use it to construct a new type by providing a type argument,
as in List Int, or List (List Char) (a list of lists of characters). To avoid
confusion, the constructors on the right hand side of a data definition are often
called data constructors, as opposed to the type constructor on the left.
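
As a sketch of how the type constructor List and the data constructors Cons and Empty work together, here is a function that accepts a list of any element type. The names lengthLst, ints, chars, and nested are hypothetical.

```haskell
data List a = Cons a (List a) | Empty

-- Works for any element type a, because the elements are never inspected.
lengthLst :: List a -> Int
lengthLst Empty         = 0
lengthLst (Cons _ rest) = 1 + lengthLst rest

-- The type constructor List applied to Int.
ints :: List Int
ints = Cons 1 (Cons 2 (Cons 3 Empty))

-- The type constructor List applied to Char.
chars :: List Char
chars = Cons 'a' (Cons 'b' Empty)

-- A list of lists of characters.
nested :: List (List Char)
nested = Cons chars (Cons Empty Empty)
```

The same lengthLst gives 3 for ints, 2 for chars, and 2 for nested, whose two elements are themselves lists.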
In reality, you don't need to define a list type -- its definition is built into the
language, and its syntax is very convenient. The type name for a list consists
of a pair of square brackets with the type variable between them; Cons is
replaced by an infix colon, :; and the Empty list is an empty pair of square
brackets, []. You may think of the built-in list type as defined by this equation:
data [a] = a : [a] | []

Let me rewrite the previous example with this new notation:

sumLst :: [Int] -> Int
sumLst (i : rest) = i + sumLst rest
sumLst [] = 0

lst = [2, 4, 6]

main = do
    print (sumLst lst)
    print (sumLst [])

There is another convenient feature: special syntax for list literals. Instead of
writing a series of constructors, 2:8:64:[], you can write [2, 8, 64].
Pattern matching may be nested. For instance, you may match the first three
elements of a list with the pattern (a : (b : (c : rest))) or, taking advantage
of the right associativity of :, simply (a : b : c : rest).
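
Here is a short sketch of nested list patterns in use. The function names hasAtLeastThree and swapFirstTwo are hypothetical helpers, not part of the calculator project.

```haskell
-- Matches only if the list has a first, second, and third element;
-- the rest of the list (possibly empty) is ignored.
hasAtLeastThree :: [a] -> Bool
hasAtLeastThree (_ : _ : _ : _) = True
hasAtLeastThree _               = False

-- Binds the first two elements and the remainder, then rebuilds the list.
-- Lists with fewer than two elements are returned as they are.
swapFirstTwo :: [a] -> [a]
swapFirstTwo (a : b : rest) = b : a : rest
swapFirstTwo short          = short

-- The list literal and both spellings with : denote the same list.
sameList :: Bool
sameList = [2, 8, 64] == (2 : 8 : 64 : [] :: [Int])
        && [2, 8, 64] == (2 : (8 : (64 : [])) :: [Int])
```

For example, swapFirstTwo [1, 2, 3] gives [2,1,3], and swapFirstTwo "ab" gives "ba", because a String is also a list.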
Finally, this is the definition of String:
type String = [Char]
String comes with some syntactic sugar of its own: When defining string
literals, you can write "Hello" instead of the more verbose
['H', 'e', 'l', 'l', 'o'].

Here, the type keyword introduces a type synonym (like the typedef in C). You can
always go back and treat a String as a list of Char -- in particular, you may
pattern match it like a list. We'll be doing a lot of this in the implementation
of tokenize. Type synonyms increase the readability of code and lead to better
error messages, but they don't create new types.
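
Because a String is just [Char], list patterns apply to it directly. The sketch below is not the tutorial's tokenize; operators is a hypothetical helper that only picks the operator characters out of a string and ignores everything else.

```haskell
data Operator = Plus | Minus | Times | Div
    deriving (Show, Eq)

-- Walks the string one character at a time using list patterns.
-- Character literals can appear in patterns just like constructors.
operators :: String -> [Operator]
operators []         = []
operators ('+' : cs) = Plus  : operators cs
operators ('-' : cs) = Minus : operators cs
operators ('*' : cs) = Times : operators cs
operators ('/' : cs) = Div   : operators cs
operators (_ : cs)   = operators cs

-- A string literal and the explicit list of characters are the same value.
sameString :: Bool
sameString = "Hello" == ['H', 'e', 'l', 'l', 'o']
```

With these definitions, operators "1 + 2*x" yields [Plus,Times]. The last clause uses a wildcard to skip any character that is not an operator, and it works only because the more specific clauses are tried first.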
In the next tutorial we'll continue to work on the tokenizer and learn about
guards and touch upon currying.

Exercises
Ex 4. Implement norm that takes a list of Doubles and returns the square root
(sqrt) of the sum of squares of its elements.
norm :: [Double] -> Double
norm lst = undefined

main = print (norm [1.1, 2.2, 3.3])

Solution:

norm :: [Double] -> Double
norm lst = sqrt (squares lst)

squares :: [Double] -> Double

squares [] = 0.0
squares (x : xs) = x * x + squares xs

main = print (norm [1.1, 2.2, 3.3])

Ex 5. Implement the function decimate that skips every other element of a list.
decimate :: [a] -> [a]
decimate = undefined

-- should print [1,3,5]
main = print (decimate [1, 2, 3, 4, 5])

Solution:

decimate :: [a] -> [a]
decimate (a:_:rest) = a : decimate rest
decimate (a:_) = [a]
decimate _ = []

main = print (decimate [1, 2, 3, 4, 5])

Ex 6. Implement a function that takes a pair of lists and returns a list of pairs.
For instance, ([1, 2, 3, 4], [1, 4, 9]) should produce [(1, 1), (2, 4), (3, 9)].
Notice that the longer of the two lists is truncated if necessary. Use nested
patterns.
zipLst :: ([a], [b]) -> [(a, b)]

zipLst = undefined

main = print $ zipLst ([1, 2, 3, 4], "Hello")

Solution:

zipLst :: ([a], [b]) -> [(a, b)]
zipLst (x : xs, y : ys) = (x, y) : zipLst (xs, ys)
zipLst (_, _) = []

main = print $ zipLst ([1, 2, 3, 4], "Hello")

Incidentally, there is a two-argument function zip in the Prelude that does the
same thing:
main = print $ zip [1, 2, 3, 4] "Hello"
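
One way to relate the two versions is the Prelude function uncurry, which turns a two-argument function into a function on a pair. The name zipPair is hypothetical, and this is only a sketch; currying itself is a topic for the next tutorial.

```haskell
zipLst :: ([a], [b]) -> [(a, b)]
zipLst (x : xs, y : ys) = (x, y) : zipLst (xs, ys)
zipLst (_, _)           = []

-- uncurry zip takes a pair of lists, just like zipLst.
zipPair :: ([a], [b]) -> [(a, b)]
zipPair = uncurry zip
```

Both zipLst and zipPair applied to ([1, 2, 3, 4], "Hello") give [(1,'H'),(2,'e'),(3,'l'),(4,'l')], the same result as zip [1, 2, 3, 4] "Hello".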
