COMP0020-2020-lecture17-ProgrammingExamplesContinued With Captions
Christopher D. Clack
2020
Lecture 17
PROGRAMMING EXAMPLES
continued
CONTENTS
• Type:
where x successively is a, b, c, d, e or f
The functions f1, f2, … and states S1, S2, … and remaining
inputs R1, R2, … are chained together as illustrated:
[Figure: f1, f2 and f3 applied in sequence, each passing its remaining input and state on to the next]
EXAMPLE 2: STATE
Example code:
    prog xs = s4
              where
              (r4, s4) = f3 (r3, s3)
              (r3, s3) = f2 (r2, s2)
              (r2, s2) = f1 (r1, s1)
              r1 = xs
              s1 = …    || the start state, whatever it is
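The state-passing chain above can be sketched in Haskell. This is a minimal sketch under stated assumptions: the lecture leaves f1, f2, f3 and the start state abstract, so here the state is taken to be a running sum of digits, each step consumes exactly one character, and step, f1, f2 and f3 are hypothetical stand-ins.

```haskell
import Data.Char (digitToInt, isDigit)

-- Hypothetical choice of types: the remaining input is a String and the
-- state is an Int (a running sum of the digits consumed so far).
type Input = String
type State = Int

-- Hypothetical step: consume one character (if any); if it is a digit,
-- add its value to the state, otherwise leave the state unchanged.
step :: (Input, State) -> (Input, State)
step ([], total) = ([], total)
step (c : rest, total)
  | isDigit c = (rest, total + digitToInt c)
  | otherwise = (rest, total)

-- Stand-ins for the lecture's f1, f2 and f3 (here all the same step).
f1, f2, f3 :: (Input, State) -> (Input, State)
f1 = step
f2 = step
f3 = step

-- Same shape as the Miranda prog: each (remaining input, state) pair is
-- passed to the next function, and the final state is the result.
prog :: Input -> State
prog xs = s4
  where
    (_r4, s4) = f3 (r3, s3)
    (r3, s3) = f2 (r2, s2)
    (r2, s2) = f1 (r1, s1)
    r1 = xs
    s1 = 0 -- hypothetical start state
```

For example, prog "1a34" consumes only three characters ('1', 'a', '3') and returns 4, leaving "4" unread as the final remaining input.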
• The overall process of converting a string into an internal format is often broken down into two steps:
    • lexer: [char] -> [token]
        • often simple and may not need to be broken down into subsidiary functions
    • parser: [token] -> structure (e.g. a tree output)
        • more complex and can use the state-passing structure previously described
• Simplistic example:
    lexer "Bob paid Jane 23.50"
    => [Name "Bob", Verb "paid", Name "Jane", Number 23.50]

This activity is often broken down into two steps:
• “lexing” (or “lexical analysis”) uses a function called a “lexer” to convert a list of characters into a list of symbols (called “tokens” or “lexemes”) representing different kinds of items in the input: verbs, nouns, punctuation, numbers, etc.
• “parsing” (or “syntactic analysis”) uses a function called a “parser” to convert a list of symbols (tokens, e.g. constructors of an algebraic type) into a structured form, sometimes called a “parse tree”.
In the simplistic example, the lexer turns the string "Bob paid Jane 23.50" into a list of four tokens, one per word.
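A lexer producing the token list in the simplistic example can be sketched in Haskell. This is a minimal sketch, not the lecture's lexer: it assumes words are separated by whitespace, that any word that reads as a number becomes a Number, and that verbs are recognised from a fixed list; knownVerbs and readNumber are hypothetical names.

```haskell
-- Token type for the lexer example; the constructor names follow the slide.
data Token = Name String | Verb String | Number Double
  deriving (Show, Eq)

-- Hypothetical: the lecture does not say how verbs are recognised, so this
-- sketch simply looks words up in a fixed list.
knownVerbs :: [String]
knownVerbs = ["paid"]

-- Read a whole word as a number, failing if anything is left over.
readNumber :: String -> Maybe Double
readNumber w = case reads w of
  [(n, "")] -> Just n
  _ -> Nothing

-- Split on whitespace, then classify each word as a Number, Verb or Name.
lexer :: String -> [Token]
lexer = map classify . words
  where
    classify w
      | Just n <- readNumber w = Number n
      | w `elem` knownVerbs = Verb w
      | otherwise = Name w
```

This lexer is a single map over the words, which matches the slide's point that lexing is often simple enough not to need subsidiary functions.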
• The top-level parser is looking for a successful parse and otherwise returns a null result.
• Imagine a parser of arithmetic expressions, using these types:
    parser :: [token] -> tree
    token ::= TNumber num | TOp op | TLbracket | TRbracket
    tree ::= NullParse | Number num | BinaryOp tree op tree | Brackets tree
    op ::= Plus | Minus | Times | Divide
• There might be more than one successful parse. E.g.
    [TNumber 3, TOp Plus, TNumber 4, TOp Plus, TNumber 5]
  can parse in two ways:
    BinaryOp (BinaryOp (Number 3) Plus (Number 4)) Plus (Number 5)
    BinaryOp (Number 3) Plus (BinaryOp (Number 4) Plus (Number 5))
• So the parser should output a list of correct results: parser :: [token] -> [tree]
• Subfunction types for use in foldr would typically have type:
    [([token],tree)] -> [([token],tree)]

Sometimes there might be more than one successful parse for a given input: in the example above, 3 + 4 + 5 can be grouped either as (3 + 4) + 5 or as 3 + (4 + 5), and both are correct. We therefore generalise the type of parser to produce a list of correct results. The new type will be [token] -> [tree]. A null parse can now also be represented by an empty output list. Subfunction types for use in foldr would typically have type [([token],tree)] -> [([token],tree)]. However, the following slides will focus on simpler subfunctions of type ([token],tree) -> [([token],tree)].
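To see why a list of results is the natural output, here is a deliberately brute-force Haskell sketch (not the combinator approach developed on the following slides) that returns every way of reading a whole token list using the token and tree types above. It assumes num is Double and treats brackets only when they enclose an entire sub-expression; parses is a hypothetical name.

```haskell
data Op = Plus | Minus | Times | Divide deriving (Show, Eq)

data Token = TNumber Double | TOp Op | TLbracket | TRbracket
  deriving (Show, Eq)

data Tree = NullParse | Number Double | BinaryOp Tree Op Tree | Brackets Tree
  deriving (Show, Eq)

-- Every way of reading the whole token list as an expression.
-- An empty result list plays the role of a null parse.
parses :: [Token] -> [Tree]
parses [TNumber n] = [Number n]
parses ts = bracketed ++ split
  where
    -- "( e )": parse the tokens between an outer pair of brackets
    bracketed = case ts of
      (TLbracket : rest)
        | not (null rest) && last rest == TRbracket ->
            map Brackets (parses (init rest))
      _ -> []
    -- "e1 op e2": try every operator position as the top-level split;
    -- both sides are strictly shorter, so the recursion terminates
    split =
      [ BinaryOp l op r
      | i <- [1 .. length ts - 2]
      , TOp op <- [ts !! i]
      , l <- parses (take i ts)
      , r <- parses (drop (i + 1) ts)
      ]
```

For the input [TNumber 3, TOp Plus, TNumber 4, TOp Plus, TNumber 5], splitting at the first Plus gives the right-nested tree and splitting at the second gives the left-nested one, so both parses from the slide appear in the result list.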
© 2020 Christopher D. Clack 16
FUNCTIONAL PROGRAMMING
DESIGN
• Further generalise the type of a parser. Instead of [token] -> [tree] use polymorphic types: [*] -> [**]
• A complex parser uses subsidiary parser functions, each of which parses only a part of the input and has type:
    ([*],**) -> [([*],**)]
• Give this type a name:
    t_parser * ** == ([*],**) -> [([*],**)]
• For any given parser we can instantiate the * and ** to be what we want. For example, using the algebraic types token and tree defined on the previous page, a subfunction f1 can have type:
    f1 :: t_parser token tree
    p2 = satisfy (='A')
• p2 ("ABC",[]) => [("BC","A")]

We want to create generic functions, and both the type of the tokens and the way that the structure is described will vary between applications. Thus for now we’ll use the polymorphic type * for the token and ** for the result (which may or may not be a tree). A top-level parser therefore can have type [*] -> [**].
Subsidiary parser functions typically return a list of 2-tuples, each containing (i) the part of the input string they have not yet processed, together with (ii) the resultant structure of what was parsed. For convenience we’ll also say that they take one such tuple, so they must have type ([*],**) -> [([*],**)]. We give this generic type the name t_parser, chosen to distinguish the type name t_parser from the function name parser. Instantiating * as token and ** as tree gives the type of f1 shown above.
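The generic type t_parser can be mirrored in Haskell with a type synonym, with the type variables a and b standing for Miranda's * and **. satisfy is not defined in this excerpt, so the version below is an assumption chosen to reproduce the slide's p2 result: it consumes one input item that passes the test and appends it to the result so far. The number-parsing f1 is likewise a hypothetical instance of t_parser token tree.

```haskell
-- Miranda:  t_parser * ** == ([*],**) -> [([*],**)]
type TParser a b = ([a], b) -> [([a], b)]

-- Assumed definition of satisfy: if the next input item passes the test,
-- consume it and append it to the result so far; otherwise fail with [].
satisfy :: (a -> Bool) -> TParser a [a]
satisfy ok (x : xs, s) | ok x = [(xs, s ++ [x])]
satisfy _ _ = []

-- The slide's p2, with * = Char and ** = String.
p2 :: TParser Char String
p2 = satisfy (== 'A')

-- The token and tree types from the previous page (num taken as Double).
data Op = Plus | Minus | Times | Divide deriving (Show, Eq)

data Token = TNumber Double | TOp Op | TLbracket | TRbracket
  deriving (Show, Eq)

data Tree = NullParse | Number Double | BinaryOp Tree Op Tree | Brackets Tree
  deriving (Show, Eq)

-- Hypothetical f1 :: t_parser token tree: parse a single number token,
-- replacing the tree built so far with a Number leaf.
f1 :: TParser Token Tree
f1 (TNumber n : rest, _) = [(rest, Number n)]
f1 _ = []
```

With this definition, p2 ("ABC", "") evaluates to [("BC", "A")], matching the slide, and p2 applied to input that does not start with 'A' returns the empty list.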
PARSER COMBINATORS
    then :: t_parser * ** -> t_parser * ** -> t_parser * **
    then p1 p2 (xs,s)
        = [ (xs2, s2) | (xs1, s1) <- p1 (xs,s); (xs2, s2) <- p2 (xs1,s1) ]

    alt :: t_parser * ** -> t_parser * ** -> t_parser * **
    alt p1 p2 (xs,s)
        = (p1 (xs,s)) ++ (p2 (xs,s))

How do we construct complex parsers from smaller, simpler parsers? A syntax specification for the input might say that the input consists of (for example) Token1 followed by Token2 followed by either Token3 or Token4. We now define two parser combinators that can be used to combine the individual parser subfunctions for each of the tokens into a complex parser subfunction for the whole sequence of tokens.
The function then is a parser combinator for sequential composition. First p1 is applied to the input (xs,s) and generates a list of results. A first loop binds (xs1, s1) to each of these results in turn, and for each such binding p2 is applied to (xs1,s1) to generate a further list of results; a second loop binds (xs2, s2) to each of those results in turn, and each time the value (xs2, s2) is added to the final list of results. The result of the function is the list of all possible tuples comprising the remaining (unprocessed) input xs2 with the associated parsed output s2.
The function alt is simpler: it represents choice and merely needs to concatenate the results of the two parse functions p1 and p2.
then and alt show two different kinds of “plumbing” of state. You can create additional combinators for different kinds of plumbing.
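The two combinators translate almost directly into Haskell. Because then is a reserved word in Haskell, this sketch names them thenP and altP. satisfy is an assumed definition (not given in this excerpt) that consumes one matching item and appends it to the result so far; the grammar in example, standing for "Token1 followed by Token2 followed by either Token3 or Token4" with the characters 'a' to 'd', is hypothetical, as is the helper complete that keeps only the parses that consumed all of the input.

```haskell
-- Miranda:  t_parser * ** == ([*],**) -> [([*],**)]
type TParser a b = ([a], b) -> [([a], b)]

-- Assumed definition: consume one item that passes the test and append it
-- to the result so far; fail with [] otherwise.
satisfy :: (a -> Bool) -> TParser a [a]
satisfy ok (x : xs, s) | ok x = [(xs, s ++ [x])]
satisfy _ _ = []

-- Sequential composition: run p2 on every result of p1.
thenP :: TParser a b -> TParser a b -> TParser a b
thenP p1 p2 (xs, s) =
  [(xs2, s2) | (xs1, s1) <- p1 (xs, s), (xs2, s2) <- p2 (xs1, s1)]

-- Choice: keep the results of both alternatives.
altP :: TParser a b -> TParser a b -> TParser a b
altP p1 p2 (xs, s) = p1 (xs, s) ++ p2 (xs, s)

-- Hypothetical grammar: 'a' then 'b' then either 'c' or 'd'.
example :: TParser Char String
example =
  satisfy (== 'a') `thenP` satisfy (== 'b')
    `thenP` (satisfy (== 'c') `altP` satisfy (== 'd'))

-- Hypothetical top-level helper: keep only results with no input left.
complete :: TParser a b -> b -> [a] -> [b]
complete p start xs = [s | (rest, s) <- p (xs, start), null rest]
```

The list comprehension in thenP is the Miranda one with the generator separator ; written as , in Haskell. Because altP simply concatenates, two alternatives that both succeed yield two results, which is how an ambiguous grammar produces more than one parse.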