
System Software Manual

The document describes a system software lab involving the use of Lex and Yacc tools. It outlines 10 programs to be executed using Lex that involve tasks like counting characters, recognizing expressions, and identifying keywords. It also lists 10 programs to be implemented using Yacc, including evaluating expressions, recognizing grammars, and validating code. The document then provides details on Lex including its file structure, rules, and regular expressions used. It also explains how to compile and run Lex programs.


SYSTEM SOFTWARE LAB

Part A

Execution of the following programs using LEX:

1) Program to count the number of vowels and consonants in a given string.


2) Program to count the number of characters, words, spaces and lines in a given
input file.
3) Program to count number of
a) positive and negative integers
b) positive and negative fractions
4) Program to count the number of comment lines in a given C program. Also
eliminate them and copy that program into a separate file.
5) Program to count the number of 'scanf' and 'printf' statements in a C
program and replace them with 'readf' and 'writef' statements respectively.
6) Program to recognize a valid arithmetic expression and identify the identifiers
and operators present.
7) Program to recognize whether a given sentence is simple or compound.
8) Program to recognize and count the number of identifiers in a given input file.
9) Write a lex program to identify the hyperlinks in a given input string.
10) Write a lex program to identify the capital (upper-case) strings in a given input string.
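As an illustration of the kind of specification Part A calls for, program 1 might be sketched as follows (a sketch only; the counter names are illustrative):

```lex
%{
#include <stdio.h>
int vowels = 0, consonants = 0;
%}
%%
[aeiouAEIOU]    { vowels++; }
[a-zA-Z]        { consonants++; }   /* vowels never reach here: the rule above is listed first */
.|\n            ;                   /* ignore everything else */
%%
int yywrap(void) { return 1; }
int main(void) {
    yylex();
    printf("vowels = %d, consonants = %d\n", vowels, consonants);
    return 0;
}
```

Both rules match a single character, so the match lengths are equal and the rule listed first wins for vowels, as described in the Lex notes below.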

Part B

Execution of the following programs using YACC:

1) Program to test the validity of a simple expression involving the operators
+, -, * and /.
2) Program to recognize nested IF control statements and display the number of
levels of nesting.
3) Program to recognize the grammar a^n b, where n >= 0.
4) Program to recognize a valid variable, which starts with a letter, followed by
any number of letters or digits.
5) Program to evaluate an arithmetic expression involving the operators +, -, * and /.
6) Program to recognize the strings 'aaab', 'abbb', 'ab', and 'a' using the grammar
a^m b^n, where m > 0 and n >= 0.
7) Program to recognize the grammar a^n b, where n >= 10.
8) Program to check the validity of simple if else statements.
9) Program to accept and print the name, salary and age of an employee.
10) Program to recognize the grammar a^m b^n, where m >= 0 and n > 2.
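To give the flavour of Part B, the rules for program 3 (the grammar a^n b, n >= 0) might be sketched like this (a sketch; the token names A and B are assumptions, to be returned by a companion lex file):

```yacc
%token A B
%%
s     : alist B '\n'   { printf("valid string\n"); }
      ;
alist : alist A        /* zero or more a's by left recursion */
      |                /* empty */
      ;
%%
```

A complete specification would add a literal block with #include <stdio.h>, a yyerror() routine, and a main() that calls yyparse(), as described in the Yacc notes below.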
Lex and Yacc

Lex is a tool for building lexers, or lexical analyzers. It takes an arbitrary input stream
and tokenizes it. The Lex utility generates C code, namely a yylex() function,
which can be used as an interface to Yacc. Further detail on Lex can be obtained
from its man pages; a practical approach to the fundamentals is given here.
        The general format of a Lex file consists of three sections:
            1. Definitions
            2. Rules
            3. User Subroutines
The definitions section consists of any external C definitions used in the lex actions or
subroutines (e.g. preprocessor directives such as #include and #define macros), which
are copied verbatim to the lex.yy.c file, together with Lex-specific definitions:
the lex substitution strings, start states and table-size declarations.
The rules section is the core of the specification: it lists the regular expressions and
their corresponding actions. The user subroutines section holds the definitions of the
functions used in the Lex actions.

Things to remember:
1. If no regular expression matches the input, it is copied to the standard output unchanged.
2. Lex resolves matching ambiguity by choosing the longest match; if two rules match
the same length of input, the rule listed first wins.
3. The matched text is held in yytext, whose length is yyleng.
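Rules 2 and 3 can be seen in a fragment like the following (a sketch; the patterns are illustrative):

```lex
%%
"if"        { printf("KEYWORD\n"); }
[a-z]+      { printf("IDENTIFIER\n"); }
.|\n        ;
%%
```

On the input ifx, the second rule wins because it gives the longer match; on the input if alone, both rules match two characters, so the rule listed first wins and KEYWORD is printed.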

Structure of Lex program

Definition Section
%%
Rules Section
%%
User Subroutines Section

1. Definition Section: It includes the literal block, definitions, start conditions.


i. Literal block: a C code bracketed by the lines
%{
C code, declarations
%}
ii. Definitions: allow us to give a name to all or part of a regular expression (RE),
which can then be referred to by name in the rules section.

2. Rules Section: Contains pattern lines and C code. A pattern is written using an RE; the C
code, also called the action part, acts according to the pattern specified. If the C code
exceeds one line, it must be enclosed in braces { }.

3. User Subroutine Section: This section includes routines called from the rules, for example:
int main(void)
{
yylex(); /* invoke the lexer (scanner) */
return 0;
}
A Lex specification is a set of patterns (the pattern part of the rules section) that
Lex matches against the input. Each time one of the patterns matches, the generated
program invokes the corresponding C code (the action part of the rules section), which
takes some action with the matched token.

Lex translates the specification into a file containing a C routine called yylex().
yylex() recognizes expressions in a stream and performs the specified action for
each expression as it is detected.

The pattern part of the rules section is written using regular expressions (REs). An RE
is a pattern description in a meta-language, composed of normal characters and
metacharacters. The metacharacters that form regular expressions, along
with their descriptions, are listed below:

. Matches any single character except the new line character “\n”
[] Matches any one of the characters within the brackets; also called a character
class. If the first character is a circumflex "^", the meaning changes to match any
character except those within the brackets. A range of characters is indicated with '-'.
Example:
1. [a-z0-9] is the character class containing all the lower-case
letters and the digits.
2. [^ask] matches all characters except a, s, and k.

* Matches zero or more of the preceding expression.


Ex: [A-Za-z][A-Za-z0-9]* matches ap90, a1, z23, w, ...: all alphanumeric strings
with a leading alphabetic character. This is a typical expression for recognizing
identifiers in a computer language.

+ Matches one or more of the preceding expression Ex: a+ => a, aa, aaa….
[a-z]+ is all strings of lower case letters. [ab]+ => ab, abab, ababab…..

? Indicates an optional element of an expression, i.e., matches zero or one
occurrence of the preceding RE. Ex: ab?c matches either ac or abc; here b is
optional.

$ If the very last character of an RE is $, the expression is only matched at the end of
a line, i.e., it anchors the RE to the end of the line. Ex: ab$ matches "ab" only when
it occurs at the end of a line.

{} Specifies either repetition (if the braces enclose numbers) or definition expansion (if
they enclose a name). Ex: {digit} looks for a predefined substitution named digit and inserts
it at that point in the expression; a{1,5} matches 1 to 5 occurrences of a.
| Indicates alternation Ex: (ab|cd) matches either ab or cd. i.e., matches either
the preceding RE or the following RE.
() Groups a series of REs together into a new RE. (ab|cd+)?(ef)* matches such
strings abefef, efef, cdef, cddd.

"..." Interprets everything within the quotation marks literally; metacharacters
other than C escape sequences lose their meaning. Ex: "/*" matches the two characters
/ and *.

^ As the first character of RE, it matches the beginning of a line. Also used for
negation within [].

\ Used to escape metacharacters. If the following character is a lower-case
letter, it is a C escape sequence such as \t, \n, etc.

/ Matches the preceding RE but only if followed by the following RE. Ex:0/1
matches ‘0’ in the string ‘01’ but does not match anything in the string ‘0’or ‘02’.
Only one slash is permitted per pattern.
<> A name or list of names in angle brackets at the beginning of a pattern makes
that pattern apply only in the given start states.
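The last two operators can be combined in a rules-section fragment such as this one (a sketch; the start-state name COMMENT is an assumption, and would be declared with %x COMMENT in the definitions section):

```lex
"/*"            BEGIN COMMENT;   /* enter the COMMENT start state */
<COMMENT>"*/"   BEGIN INITIAL;   /* return to the default state */
<COMMENT>.|\n   ;                /* discard everything inside the comment */
0/1             printf("0 followed by 1\n");   /* trailing context: matches the 0 only */
```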

Commands to compile and execute lex programs:

Lex programs have to be stored with a filename.l extension; there are then two
steps in compiling a lex program.
1. The Lex source must be turned into a generated program in the host general-purpose
language, i.e. C, using the command
$ lex filename.l
The lex compiler generates a C file called lex.yy.c; the literal block, the action
part of the rules section, and the user subroutine section of the lex program
(wherever valid C statements appear) are copied as-is into this C file.
lex.yy.c contains the lexer, yylex(). When the lex scanner runs, it
matches the input against the patterns in the rules section. Every time it finds a
match, it executes the C code associated with the pattern; when there is no match, lex
writes a copy of the token to the output. Lex executes the action for the longest
possible match of the current input.
2. This C file is then compiled with a C compiler and loaded, usually with a
library of lex subroutines. The command for compiling it is
$ cc lex.yy.c -ll
where -ll is the loader flag that accesses the lex library.
The resulting program is placed in the usual file a.out for later execution, or
we can create our own executable file using the command
$ cc lex.yy.c -o filename -ll
where filename is our executable file. To terminate interactive input, press Ctrl+D.

Lex source program (filename.l) -> Lex compiler -> lex.yy.c
lex.yy.c -> C compiler -> a.out
Input stream -> a.out -> sequence of tokens

Fig : Creating a Lexical Analyzer with Lex

Lex Practice

Metacharacter Matches

. any character except newline

\n newline

* zero or more copies of the preceding expression

+ one or more copies of the preceding expression

? zero or one copy of the preceding expression

^ beginning of line

$ end of line
a|b a or b

(ab)+ one or more copies of ab (grouping)

"a+b" literal "a+b" (C escapes still work)

[] character class

Table 1: Pattern Matching Primitives

Expression Matches

abc abc

abc* ab, abc, abcc, abccc, ...

abc+ abc, abcc, abccc, abcccc, ...

a(bc)+ abc, abcbc, abcbcbc, ...

a(bc)? a, abc

[abc] one of: a, b, c

[a-z] any letter, a through z

[a\-z] one of: a, -, z

[-az] one of: -, a, z

[A-Za-z0-9]+ one or more alphanumeric characters

[ \t\n]+ whitespace

[^ab] anything except: a, b

[a^b] a, ^, b

[a|b] a, |, b

a|b a, b

Table 2: Pattern Matching Examples


Regular expressions in lex are composed of metacharacters (Table 1). Pattern-matching
examples are shown in Table 2. Within a character class, normal operators lose
their meaning. Two operators allowed in a character class are the hyphen ("-") and
circumflex ("^"). When used between two characters, the hyphen represents a range of
characters. The circumflex, when used as the first character, negates the expression. If
two patterns match the same string, the longest match wins. In case both matches are
the same length, the first pattern listed is used.

... definitions ...


%%
... rules ...
%%
... subroutines ...

Input to Lex is divided into three sections, with %% dividing the sections. This is
best illustrated by example. The first example is the shortest possible lex file:

%%

Input is copied to output, one character at a time. The first %% is always required, as
there must always be a rules section. However, if we don’t specify any rules, then the
default action is to match everything and copy it to output. Defaults for input and
output are stdin and stdout, respectively. Here is the same example, with defaults
explicitly coded:

%%
/* match everything except newline */
. ECHO;
/* match newline */
\n ECHO;

%%

int yywrap(void) {
return 1;
}

int main(void) {
yylex();
return 0;
}

Two patterns have been specified in the rules section. Each pattern must begin in
column one. This is followed by whitespace (space, tab or newline), and an optional
action associated with the pattern. The action may be a single C statement, or multiple C
statements enclosed in braces. Anything not starting in column one is copied verbatim
to the generated C file. We may take advantage of this behavior to specify comments
in our lex file. In this example there are two patterns, "." and "\n", with an ECHO
action associated with each pattern. Several macros and variables are predefined by lex.
ECHO is a macro that writes the text matched by the pattern. This is the default action
for any unmatched strings. Typically, ECHO is defined as:

#define ECHO fwrite(yytext, yyleng, 1, yyout)

Variable yytext is a pointer to the matched string (NULL-terminated), and yyleng is
the length of the matched string. Variable yyout is the output file, and defaults to
stdout. Function yywrap is called by lex when input is exhausted. Return 1 if you are
done, or 0 if more processing is required. Every C program requires a main function.
In this case, we simply call yylex, the main entry-point for lex. Some implementations
of lex include copies of main and yywrap in a library, eliminating the need to code
them explicitly. This is why our first example, the shortest lex program, functioned
properly.

Name Function

int yylex(void) call to invoke lexer, returns token

char *yytext pointer to matched string

yyleng length of matched string


yylval value associated with token

int yywrap(void) wrapup, return 1 if done, 0 if not done

FILE *yyout output file

FILE *yyin input file

INITIAL initial start condition

BEGIN condition switch start condition

ECHO write matched string

Table 3: Lex Predefined Variables

Here is a program that does nothing at all. All input is matched, but no action is
associated with any pattern, so there will be no output.

%%
.
\n

The following example prepends line numbers to each line in a file. Some
implementations of lex predefine and calculate yylineno. The input file for lex is
yyin, and defaults to stdin.

%{
int yylineno;
%}
%%
^(.*)\n printf("%4d\t%s", ++yylineno, yytext);
%%
int main(int argc, char *argv[]) {
yyin = fopen(argv[1], "r");
yylex();
fclose(yyin);
return 0;
}
The definitions section is composed of substitutions, code, and start states. Code in
the definitions section is simply copied as-is to the top of the generated C file, and
must be bracketed with "%{" and "%}" markers. Substitutions simplify
pattern-matching rules. For example, we may define digits and letters:

digit [0-9]
letter [A-Za-z]
%{
int count;
%}
%%
/* match identifier */
{letter}({letter}|{digit})* count++;
%%
int main(void) {
yylex();
printf("number of identifiers = %d\n", count);
return 0;
}

Whitespace must separate the defining term and the associated expression. References
to substitutions in the rules section are surrounded by braces ({letter}) to distinguish
them from literals. When we have a match in the rules section, the associated C code
is executed. Here is a scanner that counts the number of characters, words, and lines
in a file (similar to Unix wc):

%{
int nchar, nword, nline;
%}
%%
\n { nline++; nchar++; }
[^ \t\n]+ { nword++, nchar += yyleng; }
. { nchar++; }
%%
int main(void) {
yylex();
printf("%d\t%d\t%d\n", nchar, nword, nline);
return 0;
}

Yacc(Yet another compiler compiler)

Yacc provides a general tool for imposing structure on the input to a computer
program. Yacc is the Utility which generates the function 'yyparse' which is indeed
the Parser. Yacc describes a context free , LALR(1) grammar and supports both
bottom-up and top-down parsing.The general format for the YACC file is very similar
to that of the Lex file.

          1. Declarations
          2. Grammar Rules
          3. Subroutines
In the declarations section, apart from legal C declarations, there are a few
Yacc-specific declarations, each beginning with a % sign.

          1. %union   Defines the stack type for the parser. It is a union of
                      the various data types/structures/objects used as
                      semantic values.

          2. %token   Declares the terminals returned by the yylex function
                      to yacc. A token can also have a type associated with it,
                      for better type checking and syntax-directed translation.
                      The type of a token can be specified as
                      %token <stack member> tokenName.

          3. %type    The type of a non-terminal symbol in a grammar rule
                      can be specified with this. The format is
                      %type <stack member> non-terminal.

          4. %nonassoc  Specifies that a terminal symbol has no associativity.

          5. %left    Specifies the left associativity of a terminal symbol.

          6. %right   Specifies the right associativity of a terminal symbol.

          7. %start   Specifies the L.H.S non-terminal symbol of the
                      production rule which should be taken as the
                      starting point of the grammar rules.

          8. %prec    Changes the precedence level associated with a
                      particular rule to that of the following token name
                      or literal.

The grammar rules are specified as follows:
Context-free grammar production:
    p -> A b C
Yacc rule:
    p : A b C   { /* 'C' actions */ }
The general style for coding the rules is to have all Terminals in upper-case and all
non-terminals in lower-case.
To facilitate proper syntax-directed translation, Yacc provides pseudo-variables
which form a bridge between the values of terminals/non-terminals and the actions.
These pseudo-variables are $$, $1, $2, $3, ... $$ is the L.H.S value of the rule,
$1 is the first R.H.S value, $2 the second, and so on. The default type of the
pseudo-variables is integer unless specified otherwise with %type, %token <type>, etc.
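Taken together, the type-related declarations might be sketched as follows (the member names ival and dval and the token REAL are illustrative):

```yacc
%union {
    int    ival;   /* value of integer-valued symbols */
    double dval;   /* value of real-valued symbols */
}
%token <ival> INTEGER
%token <dval> REAL
%type  <dval> expression
```

With these declarations, $$ in an action for expression automatically refers to the dval member of the value-stack entry.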
Structure of Yacc program
Declaration section
%%
Rules section
%%
User subroutine section
Declaration/Definition section:
. Includes declarations of the tokens used in the grammar. It can also include a
literal block, C code enclosed in
%{
%}
. Includes %token, %union, %start, %type, %left, %right, and %nonassoc
declarations.
Rules section:
. Contains the grammar rules and actions containing C code.
. Each rule starts with a non-terminal symbol and a colon, followed by a
possibly empty list of symbols, tokens, and actions. Ex: e : e '+' e
. Blanks, tabs, and newlines are ignored, except that they may not appear in
names or multi-character reserved symbols.
User Subroutine section:
. Yacc copies the contents of this section verbatim to the C file.
. Typically includes routines called from the actions.
int main(void)
{
yyparse(); /* invoke the parser */
return 0;
}
Compiling and executing Yacc programs:
Yacc programs must be stored with a filename.y extension; there are then two steps in
compiling a Yacc program.

 The Yacc source must be turned into a generated program in the host
general-purpose language, i.e. C, using the command
$ yacc -d filename.y
(-d produces the token definition file). The yacc compiler generates a C
file called y.tab.c; the literal block, the action part of the rules section, and
the user subroutine section of the Yacc program (wherever valid C statements
appear) are copied as-is into this C file. y.tab.c
contains the parser, yyparse(). When the Yacc parser runs, it in turn
repeatedly calls yylex, the lexical analyzer, which supplies tokens to
yacc as and when required. When an error is detected, yyparse returns the
value 1; when the lexical analyzer returns the end-marker token and the
parser accepts, yyparse returns the value 0.
 This C file is then compiled with a C compiler and loaded, usually with
libraries of yacc and lex subroutines. First the lex program is
compiled as usual, generating the C file lex.yy.c; then the Yacc
program is compiled, generating the C file y.tab.c.
Both C files are then compiled together:
$ cc lex.yy.c y.tab.c -ll -ly
where -ly is the loader flag that accesses the yacc library.
The resulting program is placed in the usual file a.out for later
execution. To terminate interactive input, press Ctrl+D.

Yacc specification (filename.y) -> Yacc compiler -> y.tab.c
y.tab.c -> C compiler -> a.out
Input stream -> a.out -> output

Fig : Parser construction using yacc

Special characters and Library routines:


1. yylex() => The scanner/lexer created by Lex has the entry point yylex(). It
scans the input. All code in the rules section is copied into yylex().

2. yytext => Whenever the lexer matches a token, the text of the token is
stored in the null-terminated string yytext. It is an array of characters whose
contents are replaced each time a new token is matched.

3. yywrap() => When a lexer encounters an end of file, it calls the routine
yywrap() to find out what to do next. If yywrap() returns 0, the scanner
continues scanning; if it returns 1, the scanner returns a zero token to report
end of file.

4. yyin, yyout => Standard input and output files of lex, like the stdin and
stdout files used in C.

5. ECHO => Writes the token to the current output file yyout. Equivalent to
fprintf(yyout, "%s", yytext);

6. input() => Provides characters to the lexer; also available as yyinput().

7. output() => Writes its argument character to the output file yyout, i.e.
putc(c, yyout); also available as yyoutput().

8. unput() => Returns a character to the input stream; also available as yyunput().

9. yyleng => Stores the length of yytext. Same as strlen(yytext).

10. yyless() => yyless(n) retains the first n characters of the current token and
pushes the rest back onto the input stream.

11. yymore() => Can be used to append more text to the token.

12. yyparse() => The entry point to the yacc generated parser. Returns zero on
success and non-zero on failure.
13. yyerror() => Simple error reporting routine, yyerror(char *msg).

14. % => Used to declare the definitions like %token, %start, %type, %left,
%right, %union.

15. $ => Introduces a value reference in actions. Ex: $3 refers to the value of the
third symbol on the RHS of the rule; in c = 12 + 89, $3 refers to the value 89.

16. ' => Used to define literal tokens. Ex: '+', '-', ...

17. ; => Each rule in the rules section ends with a semicolon.

18. | => Specifies an alternative RHS for the same LHS in a rule. Ex:
e : e '+' e | e '-' e | e '*' e

19. : => Used to separate the LHS and RHS of a rule.

20. %token => Tokens are the symbols that the lexer passes to the parser; the
parser calls yylex(), which returns the tokens it requires. All tokens must be
explicitly declared in the definitions section.

21. %left, %right, %nonassoc => Explicit means of specifying left, right, and
no associativity.

22. %start <rule name> => Specifies the rule that the parser should start with.

23. %prec => Changes the precedence level associated with a particular
grammar rule. Ex: unary minus may be given highest level of precedence,
whereas binary minus will have lower level precedence.

24. %s or %x => Declare start conditions (%s for inclusive, %x for exclusive start states).


25. %union => Symbol values may have more than one type; for example,
expressions may carry double values, and the lexer should then return the value
of a variable as a double. This is accomplished with %union.

26. %type => Sets the type for non-terminals. Ex: %union { double dval; }
%type <dval> expression.

27. YYABORT => Causes yyparse() to return immediately with a non-zero
value (failure).

28. YYACCEPT => Causes yyparse() to return immediately with a value of
zero (success).
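As a sketch of how some of these routines behave, consider the following lex fragment (illustrative only):

```lex
%%
"abcd"  { printf("matched %s, ", yytext);
          yyless(2); }              /* keep "ab", push "cd" back for rescanning */
"cd"    { printf("then %s\n", yytext); }
.|\n    ;
%%
```

On the input abcd, the first rule fires and prints "matched abcd, "; yyless(2) then pushes cd back onto the input, so the second rule fires next and prints "then cd".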

Yacc Practice, Part I

... definitions ...


%%
... rules ...
%%
... subroutines ...

Input to yacc is divided into three sections. The definitions section consists of token
declarations, and C code bracketed by "%{" and "%}". The BNF grammar is placed
in the rules section, and user subroutines are added in the subroutines section.

This is best illustrated by constructing a small calculator that can add and subtract
numbers. We’ll begin by examining the linkage between lex and yacc. Here is the
definitions section for the yacc input file:

%token INTEGER

This definition declares an INTEGER token. When we run yacc, it generates a parser
in file y.tab.c, and also creates an include file, y.tab.h:

#ifndef YYSTYPE
#define YYSTYPE int
#endif
#define INTEGER 258
extern YYSTYPE yylval;

Lex includes this file and utilizes the definitions for token values. To obtain tokens,
yacc calls yylex. Function yylex has a return type of int, and returns the token. Values
associated with the token are returned by lex in variable yylval. For example,

[0-9]+ {
yylval = atoi(yytext);
return INTEGER;
}

would store the value of the integer in yylval, and return token INTEGER to yacc.
The type of yylval is determined by YYSTYPE. Since the default type is integer, this
works well in this case. Token values 0-255 are reserved for character values. For
example, if you had a rule such as

[-+] return *yytext; /* return operator */

the character value for minus or plus is returned. Note that we placed the minus sign
first so that it wouldn’t be mistaken for a range designator. Generated token values
typically start around 258, as lex reserves several values for end-of-file and error
processing. Here is the complete lex input specification for our calculator:

%{
#include "y.tab.h"
#include <stdlib.h>
void yyerror(char *);
%}
%%

[0-9]+ {
yylval = atoi(yytext);
return INTEGER;
}

[-+\n] return *yytext;

[ \t] ; /* skip whitespace */

. yyerror("invalid character");

%%

int yywrap(void) {
return 1;
}

Internally, yacc maintains two stacks in memory: a parse stack and a value stack. The
parse stack contains terminals and nonterminals, and represents the current parsing
state. The value stack is an array of YYSTYPE elements, and associates a value with
each element in the parse stack. For example, when lex returns an INTEGER token,
yacc shifts this token to the parse stack. At the same time, the corresponding yylval is
shifted to the value stack. The parse and value stacks are always synchronized, so
finding a value related to a token on the stack is easily accomplished. Here is the yacc
input specification for our calculator:

%{
int yylex(void);
void yyerror(char *);
%}
%token INTEGER

%%

program:
program expr '\n' { printf("%d\n", $2); }
|
;

expr:
INTEGER { $$ = $1; }
| expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
;

%%

void yyerror(char *s) {


fprintf(stderr, "%s\n", s);
}

int main(void) {
yyparse();
return 0;
}

The rules section resembles the BNF grammar discussed earlier. The left-hand side of
a production, or nonterminal, is entered left-justified, followed by a colon. This is
followed by the right-hand side of the production. Actions associated with a rule are
entered in braces.
By utilizing left-recursion, we have specified that a program consists of zero or more
expressions. Each expression terminates with a newline. When a newline is detected,
we print the value of the expression. When we apply the rule

expr: expr '+' expr { $$ = $1 + $3; }

we replace the right-hand side of the production in the parse stack with the left-hand
side of the same production. In this case, we pop "expr '+' expr" and push "expr".
We have reduced the stack by popping three terms off the stack, and pushing back one
term. We may reference positions in the value stack in our C code by specifying "$1"
for the first term on the right-hand side of the production, "$2" for the second, and so
on. "$$" designates the top of the stack after reduction has taken place. The above
action adds the value associated with two expressions, pops three terms off the value
stack, and pushes back a single sum. Thus, the parse and value stacks remain
synchronized.

Numeric values are initially entered on the stack when we reduce from INTEGER to
expr. After INTEGER is shifted to the stack, we apply the rule

expr: INTEGER { $$ = $1; }

The INTEGER token is popped off the parse stack, followed by a push of expr. For
the value stack, we pop the integer value off the stack, and then push it back on again.
In other words, we do nothing. In fact, this is the default action, and need not be
specified. Finally, when a newline is encountered, the value associated with expr is
printed.

In the event of syntax errors, yacc calls the user-supplied function yyerror. If you
need to modify the interface to yyerror, you can alter the canned file that yacc
includes to fit your needs. The last function in our yacc specification is main … in case
you were wondering where it was. This example still has an ambiguous grammar.
Yacc will issue shift-reduce warnings, but will still process the grammar using shift as
the default operation.
Yacc Practice, Part II

In this section we will extend the calculator from the previous section to incorporate
some new functionality. New features include the arithmetic operators multiply and
divide. Parentheses may be used to override operator precedence, and single-character
variables may be specified in assignment statements. The following illustrates sample
input and calculator output:

user: 3 * (4 + 5)
calc: 27
user: x = 3 * (4 + 5)
user: y = 5
user: x
calc: 27
user: y
calc: 5
user: x + 2*y
calc: 37

The lexical analyzer returns VARIABLE and INTEGER tokens. For variables,
yylval specifies an index into sym, our symbol table. For this program, sym merely holds
the value of the associated variable. When INTEGER tokens are returned, yylval
contains the number scanned. Here is the input specification for lex:

%{
#include <stdlib.h>
#include "y.tab.h"
void yyerror(char *);
%}

%%

/* variables */
[a-z] {
yylval = *yytext - 'a';
return VARIABLE;
}

/* integers */
[0-9]+ {
yylval = atoi(yytext);
return INTEGER;
}

/* operators */
[-+()=/*\n] { return *yytext; }

/* skip whitespace */
[ \t] ;

/* anything else is an error */


. yyerror("invalid character");

%%

int yywrap(void) {
return 1;
}

The input specification for yacc follows. The tokens for INTEGER and VARIABLE
are utilized by yacc to create #defines in y.tab.h for use in lex. This is followed by
definitions for the arithmetic operators. We may specify %left, for left-associative, or
%right, for right-associative. The last definition listed has the highest precedence.
Thus, multiplication and division have higher precedence than addition and
subtraction. All four operators are left-associative. Using this simple technique, we
are able to disambiguate our grammar.

%token INTEGER VARIABLE


%left '+' '-'
%left '*' '/'

%{
void yyerror(char *);
int yylex(void);
int sym[26];
%}

%%

program:
program statement '\n'
|
;

statement:
expr { printf("%d\n", $1); }
| VARIABLE '=' expr { sym[$1] = $3; }
;

expr:
INTEGER
| VARIABLE { $$ = sym[$1]; }
| expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
| expr '*' expr { $$ = $1 * $3; }
| expr '/' expr { $$ = $1 / $3; }
| '(' expr ')' { $$ = $2; }
;

%%
void yyerror(char *s) {
fprintf(stderr, "%s\n", s);
}

int main(void) {
yyparse();
return 0;
}
