A3 47 Practical2
Roll No: 47
Practical No. 2
Theory:
YACC:
Yacc (Yet Another Compiler-Compiler) is a computer program for
the Unix operating system developed by Stephen C. Johnson. It is a Look-Ahead
Left-to-Right (LALR) parser generator: from a grammar written in a notation
similar to Backus–Naur Form (BNF), it generates an LALR parser, the part of
a compiler that tries to make syntactic sense of the source code. Yacc is
supplied as a standard utility on BSD and AT&T Unix. GNU-based Linux
distributions include Bison, a forward-compatible Yacc replacement.
The input to Yacc is a grammar with snippets of C code (called "actions")
attached to its rules. Its output is a shift-reduce parser in C that executes the
C snippets associated with each rule as soon as the rule is recognized.
Typical actions involve the construction of parse trees. Using an example from
Johnson, if the call node(label, left, right) constructs a binary parse tree node
with the specified label and children, then the rule
expr : expr '+' expr { $$ = node('+', $1, $3); }
recognizes summation expressions and constructs nodes for them. The
special identifiers $$, $1 and $3 refer to items on the parser's stack.
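As a rough illustration of what a node() helper could look like, here is a minimal sketch in C. The struct layout, the use of an int label and the free_tree helper are assumptions made for this example; Johnson's text only names the call node(label, left, right).

```c
#include <stdlib.h>

/* Hypothetical binary parse tree node: a label plus two children. */
typedef struct Node {
    int label;                 /* e.g. '+' for a sum, or a leaf code */
    struct Node *left;
    struct Node *right;
} Node;

/* Allocate a node with the given label and children.
   Returns NULL if memory cannot be allocated. */
Node *node(int label, Node *left, Node *right)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->label = label;
    n->left = left;
    n->right = right;
    return n;
}

/* Release a whole tree (assumed helper, not part of the original text). */
void free_tree(Node *n)
{
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```

With such a helper, the action $$ = node('+', $1, $3) would attach the two operand subtrees under a new '+' node each time a sum is reduced.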
Yacc produces only a parser (phrase analyzer). Full syntactic analysis also
requires an external lexical analyzer to perform the first stage, tokenization
(word analysis), which is then followed by the parsing stage proper. Lexical
analyzer generators, such as Lex or Flex, are widely available.
The IEEE POSIX P1003.2 standard defines the functionality and requirements
for both Lex and Yacc.
Some versions of AT&T Yacc have become open source. For
example, source code is available with the standard distributions of Plan 9.
Diagram of YACC
Basic Specifications
declarations
%%
rules
%%
Programs
Yacc turns the specification file into a C program, which parses the input
according to the specification given. The algorithm used to go from the
specification to the parser is complex, and will not be discussed here (see the
references for more information). The parser itself, however, is relatively
simple, and understanding how it works, while not strictly necessary, will
nevertheless make treatment of error recovery and ambiguities much more
comprehensible.
The parser produced by Yacc consists of a finite state machine with a stack.
The parser is also capable of reading and remembering the next input token
(called the lookahead token). The current state is always the one on the top of
the stack. The states of the finite state machine are given small integer labels;
initially, the machine is in state 0, the stack contains only state 0, and no
lookahead token has been read.
The machine has only four actions available to it, called shift, reduce, accept,
and error. A move of the parser is done as follows:
1. Based on its current state, the parser decides whether it needs a lookahead
token to decide what action should be done; if it needs one, and does not
have one, it calls yylex to obtain the next token.
2. Using the current state, and the lookahead token if needed, the parser
decides on its next action, and carries it out. This may result in states being
pushed onto the stack, or popped off of the stack, and in the lookahead token
being processed or left alone.
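The move loop described above can be sketched in C for a toy grammar. This is a hand-built illustration, not the code Yacc generates: the grammar S -> '(' S ')' | 'x', the state numbering (0 to 5), the function name parse and the limit MAX_DEPTH are all assumptions chosen for the example. Characters of the input string stand in for tokens from yylex.

```c
#include <stddef.h>

enum { MAX_DEPTH = 64 };   /* assumed limit on the state stack */

/* Toy LR parser for S -> '(' S ')' | 'x'.
   Returns 1 if the whole input is accepted, 0 on a syntax error. */
int parse(const char *input)
{
    int stack[MAX_DEPTH];  /* state stack; the top entry is the current state */
    int top = 0;
    size_t pos = 0;        /* position of the lookahead character */
    int pop;

    stack[0] = 0;          /* initially the stack holds only state 0 */
    for (;;) {
        char la;

        switch (stack[top]) {
        case 0:
        case 2:            /* start of an S expected: lookahead needed */
            la = input[pos];
            if (la != '(' && la != 'x')
                return 0;                        /* error */
            if (top + 1 >= MAX_DEPTH)
                return 0;
            stack[++top] = (la == '(') ? 2 : 3;  /* shift */
            pos++;
            continue;
        case 4:            /* seen '(' S, only ')' may follow */
            if (input[pos] != ')' || top + 1 >= MAX_DEPTH)
                return 0;                        /* error */
            stack[++top] = 5;                    /* shift */
            pos++;
            continue;
        case 1:            /* a complete S: accept only at end of input */
            return input[pos] == '\0';
        case 3:            /* reduce S -> x, no lookahead needed */
            pop = 1;
            break;
        case 5:            /* reduce S -> ( S ), no lookahead needed */
            pop = 3;
            break;
        default:
            return 0;
        }
        /* Reduce: pop one state per right-side symbol, then push the
           state reached on S from the state that is uncovered. */
        top -= pop;
        stack[top + 1] = (stack[top] == 0) ? 1 : 4;
        top++;
    }
}
```

For example, parse("((x))") goes through two shifts of '(', a shift of 'x', and then alternating reduces and shifts of ')' until state 1 sees the end of input and accepts.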
Actions
With each grammar rule, you can associate actions to be performed when the
rule is recognized. Actions can return values and can obtain the values
returned by previous actions. Moreover, the lexical analyzer can return values
for tokens, if desired.
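For example, the following is a grammar rule with an action:
A : '(' B ')'
{
hello(1, "abc" );
}
The pseudo-variable $$ holds the value returned by the action. For example,
the action
{$$ = 1;}
returns the value 1.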
To obtain the values returned by previous actions and the lexical analyzer, the
action can use the pseudo-variables $1, $2, ... $n. These refer to the values
returned by components 1 through n of the right side of a rule, with the
components being numbered from left to right. If the rule is
A: B C D ;
then $2 has the value returned by C, and $3 the value returned by D. The
following rule provides a common example:
expr : '(' expr ')' ;
You would expect the value returned by this rule to be the value of
the expr within the parentheses. Since the first component of the rule is the
literal left parenthesis, the desired logical result can be indicated by:
expr : '(' expr ')' { $$ = $2; }
By default, the value of a rule is the value of the first element in it ($1). Thus,
grammar rules of the following form frequently need not have an explicit
action:
A : B ;
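One way to picture the pseudo-variables is as positions on a value stack that the parser keeps alongside its state stack. The sketch below is an assumption-laden model, not Yacc's actual implementation: the type ValueStack and the functions vs_init, vs_push, dollar and vs_reduce are hypothetical names, and values are plain ints.

```c
enum { VS_MAX = 32 };   /* assumed capacity of the value stack */

typedef struct {
    int vals[VS_MAX];
    int top;            /* index of the most recently pushed value, -1 if empty */
} ValueStack;

void vs_init(ValueStack *s)
{
    s->top = -1;
}

/* Push the value of a shifted token; returns 0 if the stack is full. */
int vs_push(ValueStack *s, int v)
{
    if (s->top + 1 >= VS_MAX)
        return 0;
    s->vals[++s->top] = v;
    return 1;
}

/* Value of $i for a rule with n right-side components (1 <= i <= n).
   The n most recent values belong to the rule, numbered left to right. */
int dollar(const ValueStack *s, int n, int i)
{
    return s->vals[s->top - n + i];
}

/* Reduce: drop the n component values and push the rule's value ($$). */
void vs_reduce(ValueStack *s, int n, int result)
{
    s->top -= n;
    s->vals[++s->top] = result;
}
```

In this model, reducing a three-component rule with the action $$ = $2 amounts to vs_reduce(&s, 3, dollar(&s, 3, 2)), and the default action of a one-component rule amounts to vs_reduce(&s, 1, dollar(&s, 1, 1)).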
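In previous examples, all the actions came at the end of rules. Sometimes, it
is desirable to get control before a rule is fully parsed. Yacc permits an action
to be written in the middle of a rule as well as at the end. A mid-rule action
counts as a component of the rule, so in the following rule $2 is the value set
by the first action and $3 is the value returned by C; the effect is to set x
to 1 and y to the value of C: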
A : B
{
$$ = 1;
}
C {
x = $2;
y = $3;
}
;
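Questions:
1. What is the use of yyparse()?
yyparse() is the main parsing function generated by YACC. It calls yylex()
repeatedly to obtain tokens, runs the actions attached to the grammar rules,
and returns 0 if the input is accepted or a nonzero value if parsing fails.
2. What does y.tab.h contain?
It contains the definitions of the token codes declared with %token in the
grammar, together with the declarations of YYSTYPE and yylval, so that the
lexical analyzer and the parser agree on them. It is generated when yacc is
run with the -d option.
3. How are terminals, nonterminals and the start symbol declared in a yacc file?
Terminals are declared with %token and the start symbol with %start.
Nonterminals need no declaration; they are introduced by appearing on the
left side of a rule (%type is used only to give them a value type).
%token PLUS MINUS MUL DIV ID
%start input
4. Justify the need for yyerror(). Specify its syntax.
The parser calls yyerror() whenever it detects a syntax error, so the
programmer must supply it to report the error.
void yyerror(const char *s) { printf("Syntax Error: %s\n", s); }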
Practical E1
Aim: Write a YACC specification to check the syntax of a simple expression
involving the operators +, -, * and /. Also convert the arithmetic expression
to postfix.
Program:
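%{
/* Lex specification: scanner for the expression parser */
#include <stdio.h>
#include "y.tab.h"
%}
%%
[a-zA-Z_][a-zA-Z0-9_]* { snprintf(yylval.str, sizeof(yylval.str), "%s", yytext); return ID; }
"+" { return PLUS; }
"-" { return MINUS; }
"*" { return MUL; }
"/" { return DIV; }
[ \t\r\n]+ { /* skip whitespace */ }
. { return yytext[0]; }
%%
int yywrap() {
return 1;
}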
%{
/* Yacc specification: expression checker and postfix converter */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
extern FILE *yyin;
void yyerror(const char *s);
extern int yylex();
extern int yywrap();
%}
%union {
char str[100];
}
%token <str> ID
%token PLUS MINUS MUL DIV
%type <str> expr
%left PLUS MINUS
%left MUL DIV
%%
input:
expr {
printf("The Entered expression is valid.\n");
printf("The postfix notation is : %s\n", $1);
}
;
expr:
ID {
strcpy($$, $1);
}
| expr PLUS expr {
snprintf($$, sizeof($$), "%s%s+", $1, $3);
}
| expr MINUS expr {
snprintf($$, sizeof($$), "%s%s-", $1, $3);
}
| expr MUL expr {
snprintf($$, sizeof($$), "%s%s*", $1, $3);
}
| expr DIV expr {
snprintf($$, sizeof($$), "%s%s/", $1, $3);
}
;
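%%
void yyerror(const char *s) {
fprintf(stderr, "Syntax Error: %s\n", s);
}
int main() {
yyin = fopen("expression.txt", "r");
if (!yyin) { perror("expression.txt"); return 1; }
yyparse();
return 0;
}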
Output:
Input : x + y * z - w / p
The Entered expression is valid.
The postfix notation is : xyz*+wp/-
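The postfix string above can be cross-checked without Yacc. The following is a minimal sketch in plain C that applies the same precedence and left associativity as the %left declarations, using precedence climbing instead of an LALR parser. It assumes single-letter operands and no parentheses; the names Conv, emit, prec, expr and to_postfix are hypothetical and not part of the practical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Conversion state (hypothetical helper type). */
typedef struct {
    const char *src;   /* remaining input */
    char *out;         /* output buffer */
    size_t len, cap;   /* used length and capacity of out */
    int ok;            /* cleared on a syntax error or overflow */
} Conv;

static void skip_ws(Conv *c)
{
    while (isspace((unsigned char)*c->src))
        c->src++;
}

/* Append one character to the output, keeping it NUL-terminated. */
static void emit(Conv *c, char ch)
{
    if (c->len + 1 < c->cap) {
        c->out[c->len++] = ch;
        c->out[c->len] = '\0';
    } else {
        c->ok = 0;
    }
}

/* Precedence of a binary operator, 0 if ch is not an operator. */
static int prec(char ch)
{
    if (ch == '+' || ch == '-') return 1;
    if (ch == '*' || ch == '/') return 2;
    return 0;
}

/* Parse one operand, then any operators of precedence >= min_prec. */
static void expr(Conv *c, int min_prec)
{
    skip_ws(c);
    if (!isalpha((unsigned char)*c->src)) {
        c->ok = 0;
        return;
    }
    emit(c, *c->src++);
    for (;;) {
        char op;
        int p;

        skip_ws(c);
        op = *c->src;
        p = prec(op);
        if (p == 0 || p < min_prec)
            return;
        c->src++;
        expr(c, p + 1);   /* left associative: right operand binds tighter */
        if (!c->ok)
            return;
        emit(c, op);      /* operator comes after both operands */
    }
}

/* Convert infix to postfix; returns 1 on success, 0 on invalid input. */
int to_postfix(const char *infix, char *out, size_t cap)
{
    Conv c;

    if (cap == 0)
        return 0;
    c.src = infix;
    c.out = out;
    c.len = 0;
    c.cap = cap;
    c.ok = 1;
    out[0] = '\0';
    expr(&c, 1);
    skip_ws(&c);
    return c.ok && *c.src == '\0';
}
```

Calling to_postfix("x + y * z - w / p", buf, sizeof buf) fills buf with xyz*+wp/-, which matches the parser's output above.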
Practical E2
Program:
%{
#include "y.tab.h" // Include the header file generated by Bison
%}
%%
%%
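The rules section of the scanner above is empty, so as listed it never hands a token to the parser. As a rough idea of what the scanner has to deliver for the grammar that follows, here is a hand-written sketch in C. It assumes C-style lexemes (while, true, false, ||, &&, ==, !=, <=, >=, =, ;), which the practical does not specify. The TOK_ names, their numeric codes and the function next_token are hypothetical; in a real build the token codes come from y.tab.h and the function would be yylex.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes mirroring the grammar's terminals. */
enum Token {
    TOK_EOF = 0, TOK_WHILE = 258, TOK_TRUE, TOK_FALSE, TOK_NUMBER, TOK_ID,
    TOK_LPAREN, TOK_RPAREN, TOK_LBRACE, TOK_RBRACE, TOK_ASSIGN, TOK_SEMICOLON,
    TOK_PLUS, TOK_MINUS, TOK_MUL, TOK_DIV, TOK_OR, TOK_AND,
    TOK_EQ, TOK_NEQ, TOK_LT, TOK_LE, TOK_GT, TOK_GE, TOK_ERROR
};

/* Scan one token starting at *p and advance *p past it. */
int next_token(const char **p)
{
    static const struct { const char *text; int tok; } two[] = {
        {"||", TOK_OR}, {"&&", TOK_AND}, {"==", TOK_EQ},
        {"!=", TOK_NEQ}, {"<=", TOK_LE}, {">=", TOK_GE}
    };
    const char *s = *p;
    size_t i;

    while (isspace((unsigned char)*s))
        s++;
    if (*s == '\0') { *p = s; return TOK_EOF; }

    if (isdigit((unsigned char)*s)) {              /* NUMBER */
        while (isdigit((unsigned char)*s))
            s++;
        *p = s;
        return TOK_NUMBER;
    }
    if (isalpha((unsigned char)*s) || *s == '_') { /* keyword or ID */
        const char *start = s;
        size_t n;
        while (isalnum((unsigned char)*s) || *s == '_')
            s++;
        n = (size_t)(s - start);
        *p = s;
        if (n == 5 && strncmp(start, "while", 5) == 0) return TOK_WHILE;
        if (n == 4 && strncmp(start, "true", 4) == 0) return TOK_TRUE;
        if (n == 5 && strncmp(start, "false", 5) == 0) return TOK_FALSE;
        return TOK_ID;
    }
    /* Two-character operators are tried before single characters. */
    for (i = 0; i < sizeof two / sizeof two[0]; i++) {
        if (strncmp(s, two[i].text, 2) == 0) {
            *p = s + 2;
            return two[i].tok;
        }
    }
    *p = s + 1;
    switch (*s) {
    case '(': return TOK_LPAREN;
    case ')': return TOK_RPAREN;
    case '{': return TOK_LBRACE;
    case '}': return TOK_RBRACE;
    case '=': return TOK_ASSIGN;
    case ';': return TOK_SEMICOLON;
    case '+': return TOK_PLUS;
    case '-': return TOK_MINUS;
    case '*': return TOK_MUL;
    case '/': return TOK_DIV;
    case '<': return TOK_LT;
    case '>': return TOK_GT;
    default:  return TOK_ERROR;
    }
}
```

The same token set could equally be written as Lex rules, one pattern per terminal, with the keyword patterns placed before the identifier pattern.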
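%{
/* Yacc specification: parser for a while loop */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int yylex(void);
void yyerror(const char *s);
%}
%union {
int num;
char *id;
}
%token <num> NUMBER
%token <id> ID
%token WHILE TRUE FALSE LPAREN RPAREN LBRACE RBRACE ASSIGN SEMICOLON
%left OR
%left AND
%left EQ NEQ
%left LT LE GT GE
%left PLUS MINUS
%left MUL DIV
%%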
program:
while_loop { printf("Valid while loop syntax.\n"); }
;
while_loop:
WHILE LPAREN condition RPAREN LBRACE statements RBRACE
;
condition:
expression
| TRUE
| FALSE
;
expression:
NUMBER
| ID
| expression PLUS expression
| expression MINUS expression
| expression MUL expression
| expression DIV expression
| expression OR expression
| expression AND expression
| expression EQ expression
| expression NEQ expression
| expression LT expression
| expression LE expression
| expression GT expression
| expression GE expression
| LPAREN expression RPAREN
;
statements:
/* No statements */
| statement statements
;
statement:
ID ASSIGN expression SEMICOLON
;
%%
void yyerror(const char *s) {
fprintf(stderr, "Syntax Error: %s\n", s);
}
int main() {
printf("Enter a while loop to validate:\n");
yyparse();
return 0;
}
int yywrap() {
return 1;
}
Output:
Practical E3
Program:
%{
#include "y.tab.h" // Include the header file generated by Bison
%}
%%
%%
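%{
/* Yacc specification: parser for a while loop */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int yylex(void);
void yyerror(const char *s);
%}
%union {
int num;
char *id;
}
%token <num> NUMBER
%token <id> ID
%token WHILE TRUE FALSE LPAREN RPAREN LBRACE RBRACE ASSIGN SEMICOLON
%left OR
%left AND
%left EQ NEQ
%left LT LE GT GE
%left PLUS MINUS
%left MUL DIV
%%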
program:
while_loop { printf("Valid while loop syntax.\n"); }
;
while_loop:
WHILE LPAREN condition RPAREN LBRACE statements RBRACE
;
condition:
expression
| TRUE
| FALSE
;
expression:
NUMBER
| ID
| expression PLUS expression
| expression MINUS expression
| expression MUL expression
| expression DIV expression
| expression OR expression
| expression AND expression
| expression EQ expression
| expression NEQ expression
| expression LT expression
| expression LE expression
| expression GT expression
| expression GE expression
| LPAREN expression RPAREN
;
statements:
/* No statements */
| statement statements
;
statement:
ID ASSIGN expression SEMICOLON
;
%%
void yyerror(const char *s) {
fprintf(stderr, "Syntax Error: %s\n", s);
}
int main() {
printf("Enter a while loop to validate:\n");
yyparse();
return 0;
}
int yywrap() {
return 1;
}
Output: