Flex and Bison

This document provides an overview of Lex/Flex and Yacc/Bison, which are tools used for lexical analysis and parsing. Lex/Flex is used to generate scanners or lexical analyzers from regular expression rules. It divides input into tokens which are passed to a parser generated by Yacc/Bison. Yacc/Bison generates parsers based on grammar rules. Flex uses a specification file with regular expression patterns and actions to generate a C program that scans input and identifies tokens. Bison works with a parser generated by Flex to analyze program syntax based on a grammar.

Uploaded by

G/her

LEX/FLEX AND YACC/BISON

OVERVIEW

Department of Computer Science


Mekelle University
General Lex/Flex Information
• lex
– is a tool to generate lexical analyzers.
– It was written by Mike Lesk and Eric Schmidt (the Google guy).
– divides a stream of input characters into meaningful units
(lexemes), identifies each one (as a token), and may pass the tokens
to a parser, typically one generated by yacc.
– lex specifications are written as regular expressions.
• flex (fast lexical analyzer generator)
–Free and open source alternative.
–You’ll be using this.
General Yacc/Bison Information
• Yacc (yet another compiler compiler)
–Is a tool to generate parsers (syntactic analyzers).
–Generated parsers require a lexical analyzer.
–The original Yacc is rarely used anymore; Bison and other reimplementations have replaced it.
• Bison
–Free and open source alternative.
–You’ll be using this.
Lex/Flex and Yacc/Bison relation to a compiler tool chain
FLEX IN DETAIL
How Flex Works
• Flex uses a .l spec file to generate a tokenizer/scanner.
• The tokenizer reads an input file and chunks it into a
series of tokens which are passed to the parser.
• Flex is a program that automatically creates a scanner
in C, using rules for tokens as regular expressions.
Internal Structure of Lex/Flex

Regular expressions → NFA → DFA → Minimal DFA

The final states of the DFA are associated with actions
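
The link between final states and actions can be sketched by hand. The following is a minimal illustration, not the tables Flex actually generates: a small DFA for the two patterns [0-9]+ and [a-z]+, where each accepting state carries an action. All names (dfa_step, state_action, run_dfa, ACT_*) are hypothetical and chosen only for this example.

```c
#include <stddef.h>

/* Hypothetical actions attached to accepting (final) states. */
enum action { ACT_NONE, ACT_INTEGER, ACT_WORD };

/* States: 0 = start, 1 = inside [0-9]+, 2 = inside [a-z]+, -1 = dead. */
static int dfa_step(int state, char c)
{
    int digit = (c >= '0' && c <= '9');
    int lower = (c >= 'a' && c <= 'z');

    switch (state) {
    case 0:  return digit ? 1 : lower ? 2 : -1;
    case 1:  return digit ? 1 : -1;
    case 2:  return lower ? 2 : -1;
    default: return -1;
    }
}

/* Action associated with each state; non-final states map to ACT_NONE. */
static enum action state_action(int state)
{
    if (state == 1) return ACT_INTEGER;
    if (state == 2) return ACT_WORD;
    return ACT_NONE;
}

/* Run the DFA over the whole string and report the action of the state
   it ends in (ACT_NONE if the string is rejected). */
enum action run_dfa(const char *s)
{
    int state = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        state = dfa_step(state, s[i]);
        if (state < 0)
            return ACT_NONE;
    }
    return state_action(state);
}
```

Under these assumptions, run_dfa("123") ends in state 1 and yields ACT_INTEGER, while run_dfa("12a") falls into the dead state and yields ACT_NONE.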


Flex/Lex Structure
• Format of the input file is like …
Definitions section (1)
• There are three things that can go in the definitions section:
• C code: Any code between %{ and %} is copied verbatim to the
C file. This is typically used for defining file-scope variables, and for
prototypes of routines that are defined in the code segment.
• Definitions: A definition is very much like a #define cpp
directive. For example:
letter [a-zA-Z]
digit [0-9]
punct [,.:;!?]
nonblank [^ \t]
These definitions can be used in the rules section: one could start
a rule with {letter}+ {...
Definitions section (2)
• State definitions: If a rule depends on context, it’s possible to
introduce states and incorporate those in the rules. A state
definition looks like %s STATE, and by default a state
INITIAL is already given.
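
What "a rule depends on context" means can be sketched in plain C with an explicit state variable. This is only an analogy for Flex states, not generated scanner code: the COMMENT state and the function name strip_comments are assumptions made for this example, and the behaviour is closer in spirit to an exclusive start condition (declared with %x in Flex) than to an inclusive %s one, since ordinary text is ignored while in COMMENT.

```c
#include <stddef.h>

enum state { INITIAL, COMMENT };   /* COMMENT is a hypothetical extra state */

/* Copy src to dst, dropping C-style comments.
   dst must have room for strlen(src) + 1 bytes. */
void strip_comments(const char *src, char *dst)
{
    enum state st = INITIAL;
    size_t i = 0, j = 0;

    while (src[i] != '\0') {
        if (st == INITIAL && src[i] == '/' && src[i + 1] == '*') {
            st = COMMENT;            /* plays the role of BEGIN COMMENT */
            i += 2;
        } else if (st == COMMENT && src[i] == '*' && src[i + 1] == '/') {
            st = INITIAL;            /* plays the role of BEGIN INITIAL */
            i += 2;
        } else if (st == COMMENT) {
            i += 1;                  /* inside a comment: discard */
        } else {
            dst[j++] = src[i++];     /* normal text: copy through */
        }
    }
    dst[j] = '\0';
}
```

The same character is handled differently depending on the current state, which is the effect a state prefix on a rule has in the rules section.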
Rules section
• The rules section has a number of pattern-action
pairs.
• The patterns are regular expressions and the
actions are either a single C statement, or a
sequence enclosed in braces.
• Example:
RE          Action
\n          linenum++;
[0-9]+      printf("integer");
[a-zA-Z]    printf("letter");
Lex/Flex Regular Expression (1)
• Regular Expression contains:
– text characters (which match the corresponding characters in the strings
being compared) and
– operator characters (which specify repetitions, choices, and other
features).
• Text characters: the letters of the alphabet and the digits are
always text characters.
• Operator characters: the operator characters are:
" \ [ ] ^ - ? . * + | ( ) $ / { } % < >
• If they are to be used as text characters, an escape (\) should be used.
• The quotation mark operator (") indicates that whatever is contained
between a pair of quotes is to be taken as text characters.
Lex/Flex Regular Expression (2)
• Character classes:
• Classes of characters can be specified using the operator pair [].
• The construction [abc] matches a single character, which may be
a, b, or c.
• Within square brackets, most operator meanings are ignored.
Only three characters are special: these are \ - and ^
• The - character indicates ranges; for example, [a-z0-9]
is the character class containing all the lower case
letters and digits.
• To include the character - in a character class, it
should be first or last. Ex: [-+0-9] matches all digits and the two
signs.
Lex/Flex Regular Expression (3)
• In character classes, the ^ operator must appear as the first
character after the left bracket;
• It indicates that the resulting class is to be complemented with
respect to the computer character set.
• Example:
• [^abc] matches all characters except a, b, or c
• [^a-zA-Z] matches any character which is not a letter.
• The \ character provides the usual escapes within character
class brackets.
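
The character-class rules above (ranges with -, a literal - when first or last, complement with a leading ^) can be checked with a small matcher. This is a simplified sketch, not how Flex implements classes: the function name in_class is hypothetical, the argument is the text between the brackets, and backslash escapes are deliberately not handled.

```c
#include <stddef.h>

/* Return 1 if character c belongs to the class whose body (the text
   between '[' and ']') is given in cls, 0 otherwise.
   Simplification: backslash escapes are not handled in this sketch. */
int in_class(const char *cls, char c)
{
    int negate = 0, found = 0;
    size_t i = 0;

    if (cls[0] == '^') {        /* ^ is special only as the first character */
        negate = 1;
        i = 1;
    }

    while (cls[i] != '\0') {
        /* A '-' with a character on both sides forms a range lo-hi. */
        if (cls[i + 1] == '-' && cls[i + 2] != '\0') {
            if (c >= cls[i] && c <= cls[i + 2])
                found = 1;
            i += 3;
        } else {
            /* Single literal character, including a leading or trailing '-'. */
            if (c == cls[i])
                found = 1;
            i += 1;
        }
    }
    return negate ? !found : found;
}
```

With this sketch, in_class("-+0-9", '-') is 1 because the leading - is taken literally, and in_class("^a-zA-Z", '5') is 1 because 5 is not a letter.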
Lex/Flex Regular Expression (4)
• Arbitrary character:
• To match almost any character, use the operator character . (dot),
which is the class of all characters except newline.
• Optional Expression:
• The operator ? indicates an optional element of an
expression. Ex: ab?c matches either ac or abc
• Repeated Expressions:
• Repetitions of classes are indicated by the operators * and +
• Ex: a* is any number of consecutive a characters,
including zero, while a+ is one or more instances of a.
Lex/Flex Regular Expression (5)
• Example:
• [a-z]+ is all strings of lower case letters
• [A-Z][a-z]+ indicates strings with a first upper case
letter followed by one or more lower case letters.
• Alternation and Grouping:
• The operator | indicates alternation:
• Ex: (ab|cd) matches either ab or cd.
• Ex: (ab|cd+)?(ef)* matches such strings as abefef, efefef,
cdef, or cddd; but not abc, abcd, or abcdef
Lex/Flex Regular Expression (6)
• Repetition and Definitions:
• The operators { } specify either repetitions (if they enclose
numbers) or definition expansion (if they enclose a name).
• Ex: {digit} looks for a predefined definition named digit and
inserts it at that point in the expression. The definitions are
given in the first part of the Lex input, before the rules.
• In contrast, a{1,5} looks for 1 to 5 occurrences of a.
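
The operators ?, *, +, |, grouping and {m,n} can be tried out without Flex by using POSIX extended regular expressions, which agree with Lex on these particular operators. This is a sketch under that assumption: it uses the standard regcomp/regexec/regfree calls from <regex.h>, the helper name full_match is hypothetical, and Lex-only features such as {name}, quoted strings, x/y and <S>x are not available this way.

```c
#include <regex.h>
#include <stdio.h>

/* Return 1 if the whole of text matches the POSIX extended regular
   expression pattern, 0 if it does not, and -1 if the pattern is
   invalid or too long for the local buffer. */
int full_match(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, ok;

    /* Wrap the pattern as ^(pattern)$ so only whole-string matches count. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

For instance, full_match("(ab|cd+)?(ef)*", "abefef") returns 1 and full_match("(ab|cd+)?(ef)*", "abcdef") returns 0, in line with the examples on the slides.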
Example
Pattern   Meaning
c         The char "c"
"c"       The char "c" even if it is a special char in this table
\c        Same as "c", used to quote a single char
[cd]      The char c or the char d
[a-z]     Any single char in the range a through z
[^c]      Any char but c
.         Any char but newline
^x        The pattern x if it occurs at the beginning of a line
x$        The pattern x at the end of a line
x?        An optional x
Example
x*        Zero or more occurrences of the pattern x
x+        One or more occurrences of the pattern x
xy        The pattern x concatenated with the pattern y
x|y       An x or a y
(x)       An x
x/y       An x only if followed by y
<S>x      The pattern x when lex is in start condition S
{name}    The value of a macro from definitions section
x{m}      m occurrences of the pattern x
x{m,n}    m through n occurrences of x (takes precedence over concatenation)
Flex/Lex Predefined Variables
Name               Function
int yylex(void)    call to invoke lexer, returns token
char *yytext       pointer to matched string
yyleng             length of matched string
yylval             value associated with token
int yywrap(void)   wrapup, return 1 if done, 0 if not done
FILE *yyout        output file
FILE *yyin         input file
INITIAL            initial start condition
BEGIN condition    switch start condition
ECHO               write matched string
Flex/Lex Action
• When an expression is matched, Lex/Flex executes the
corresponding action.
• Example:
%%
[a-z]+      printf("alpha\n");
[0-9]+      printf("numeric\n");
[a-z0-9]+   printf("alphanumeric\n");
[ \t]+      printf("white space\n");
.           printf("special char\n");
\n          ;
%%
Disambiguation Rules
• If there are several patterns which match the current input,
yylex() chooses one of them according to these rules:
1. The longest match is preferred.
2. Among rules that match the same number of characters, the
rule that occurs earliest in the list is preferred.
Example
• Show the output if the input to yylex() generated by the lex
program above is abc123 abc 123?x
• Solution:
alphanumeric
white space
alpha
white space
numeric
special char
alpha
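
The solution above can be reproduced with a hand-written simulation of the two disambiguation rules. This is a sketch, not the scanner Flex would generate: each rule from the example is modelled as a hypothetical C function returning how many characters it matches at the current position, and scan() picks the longest match, keeping the earliest rule on ties. All function and type names are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

static int is_lower(char c) { return c >= 'a' && c <= 'z'; }
static int is_digit(char c) { return c >= '0' && c <= '9'; }
static int is_alnum(char c) { return is_lower(c) || is_digit(c); }
static int is_blank(char c) { return c == ' ' || c == '\t'; }

/* Length of the longest run at the start of s whose characters all
   satisfy pred; models a pattern of the form [class]+. */
static size_t run(const char *s, int (*pred)(char))
{
    size_t n = 0;
    while (s[n] != '\0' && pred(s[n]))
        n++;
    return n;
}

/* One match function per rule, in the order the rules are listed. */
static size_t m_alpha(const char *s)   { return run(s, is_lower); }
static size_t m_numeric(const char *s) { return run(s, is_digit); }
static size_t m_alnum(const char *s)   { return run(s, is_alnum); }
static size_t m_space(const char *s)   { return run(s, is_blank); }
static size_t m_any(const char *s)     { return (*s != '\0' && *s != '\n') ? 1 : 0; }
static size_t m_newline(const char *s) { return *s == '\n' ? 1 : 0; }

struct rule { size_t (*match)(const char *); const char *label; };

static const struct rule rules[] = {
    { m_alpha,   "alpha\n" },
    { m_numeric, "numeric\n" },
    { m_alnum,   "alphanumeric\n" },
    { m_space,   "white space\n" },
    { m_any,     "special char\n" },
    { m_newline, "" },               /* the \n rule prints nothing */
};

/* Scan input and append the label of each chosen rule to out.
   cap is the size of out and must be at least 1. */
void scan(const char *input, char *out, size_t cap)
{
    out[0] = '\0';
    while (*input != '\0') {
        size_t best_len = 0, best = 0;
        for (size_t r = 0; r < sizeof rules / sizeof rules[0]; r++) {
            size_t len = rules[r].match(input);
            if (len > best_len) {    /* strict '>' keeps the earliest rule on ties */
                best_len = len;
                best = r;
            }
        }
        if (best_len == 0) {         /* not reachable with these rules */
            input++;
            continue;
        }
        if (strlen(out) + strlen(rules[best].label) < cap)
            strcat(out, rules[best].label);
        input += best_len;
    }
}
```

Calling scan("abc123 abc 123?x", out, sizeof out) with a sufficiently large buffer fills out with the seven lines listed in the solution: abc123 goes to the alphanumeric rule because it is the longest match, while abc and 123 tie with [a-z0-9]+ and go to the earlier alpha and numeric rules.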
