Compiler design file part 1
AIM: Practice of the Lex/Yacc compiler-writing tools, and write a program to count tokens
THEORY:
A compiler or interpreter for a programming language is often decomposed into two parts:
1. Read the source program and discover its structure.
2. Process this structure, e.g. to generate the target program.
Lex and Yacc can generate program fragments that solve the first task. The task of
discovering the source structure again is decomposed into subtasks:
1. Split the source file into tokens (Lex).
2. Find the hierarchical structure of the program (Yacc).
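The two subtasks can be sketched in ordinary C for a hypothetical toy language of unsigned numbers separated by '+' (the language and the function names are illustrative, not from the text): one routine splits the input into tokens, and a second routine checks that the token stream has the expected hierarchical structure.

```c
#include <ctype.h>

enum Tok { NUMBER, PLUS, END, BAD };

/* Subtask 1 (Lex's job): split the source into tokens.
 * out must have room for max+1 entries (for the END marker). */
int tokenize(const char *src, enum Tok *out, int max) {
    int n = 0;
    while (*src && n < max) {
        if (isdigit((unsigned char)*src)) {
            while (isdigit((unsigned char)*src)) src++;
            out[n++] = NUMBER;            /* a run of digits is one token */
        } else if (*src == '+') {
            src++;
            out[n++] = PLUS;
        } else {
            src++;
            out[n++] = BAD;
        }
    }
    out[n] = END;
    return n;
}

/* Subtask 2 (Yacc's job): check the structure
 * expr : NUMBER ( '+' NUMBER )*  */
int well_formed(const enum Tok *t) {
    if (*t != NUMBER) return 0;
    t++;
    while (*t == PLUS) {
        t++;
        if (*t != NUMBER) return 0;
        t++;
    }
    return *t == END;
}
```

Lex and Yacc generate exactly these two kinds of routines from declarative specifications instead of hand-written loops.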
Lex – A Lexical Analyzer Generator
Lex is a program generator designed for lexical processing of character input streams. It accepts a high-level, problem-oriented specification for character string matching, and produces a program in a general-purpose language which recognizes regular expressions. The regular expressions are specified by the user in the source specifications given to Lex. The code written by Lex recognizes these expressions in an input stream and partitions the input stream into strings matching the expressions. At the boundaries between strings, program sections provided by the user are executed. The Lex source file associates the regular expressions and the program fragments. As each expression appears in the input to the program written by Lex, the corresponding fragment is executed.
Lex helps write programs whose control flow is directed by instances of regular expressions in the input stream. It is well suited for editor-script type transformations and for segmenting input in preparation for a parsing routine. Lex source is a table of regular expressions and corresponding program fragments. The table is translated to a program which reads an input stream, copying it to an output stream and partitioning the input into strings which match the given expressions. As each such string is recognized, the corresponding program fragment is executed. The recognition of the expressions is performed by a deterministic finite automaton generated by Lex. The program fragments written by the user are executed in the order in which the corresponding regular expressions occur in the input stream.
The lexical analysis programs written with Lex accept ambiguous specifications and choose the longest match possible at each input point. If necessary, substantial lookahead is performed on the input, but the input stream will be backed up to the end of the current partition, so that the user has general freedom to manipulate it. Lex can generate analyzers in either C or Ratfor, a language which can be translated automatically to portable Fortran. It is available on the PDP-11 UNIX, Honeywell GCOS, and IBM OS systems. This manual, however, will only discuss generating analyzers in C on the UNIX system, which is the only supported form of Lex under UNIX Version 7.
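The longest-match rule mentioned above can be illustrated with a small hand-written scanner (a simplified sketch, not Lex output; the patterns "if" and [a-z]+ are hypothetical examples): at each point the scanner consumes the longest run any pattern can match, so on "ifdef" the identifier pattern wins over the keyword.

```c
#include <ctype.h>
#include <string.h>

enum { T_IF, T_IDENT, T_NONE };

/* Return the token type of the longest lower-case run at src,
 * and store its length in *len. */
int next_token(const char *src, int *len) {
    int n = 0;
    while (islower((unsigned char)src[n])) n++;   /* longest [a-z]+ run */
    *len = n;
    if (n == 0) return T_NONE;
    /* the keyword rule only applies when it matches the whole run */
    if (n == 2 && strncmp(src, "if", 2) == 0) return T_IF;
    return T_IDENT;
}
```

With this rule, "if(" yields the keyword token, while "ifdef" yields a five-character identifier rather than the keyword followed by "def".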
Lex is designed to simplify interfacing with Yacc, for those with access to this compiler-compiler system. Lex generates programs to be used in simple lexical analysis of text. The input files (standard input by default) contain regular expressions to be searched for and actions written in C to be executed when the expressions are found. A C source program, lex.yy.c, is generated. This program, when run, copies unrecognized portions of the input to the output, and executes the associated C action for each regular expression that is recognized. The options have the following meanings:
-t Place the result on the standard output instead of in the file lex.yy.c.
-v Print a one-line summary of statistics of the generated analyzer.
-n Opposite of -v; -n is the default.
-9 Add code to be able to compile through the native C compilers.
EXAMPLE
This program converts upper case to lower case, removes blanks at the ends of lines, and replaces multiple blanks by single blanks:
%%
[A-Z]	putchar(yytext[0]+'a'-'A');
[ ]+$	;
[ ]+	putchar(' ');
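What the three Lex rules above do can be written as ordinary C over one line of input (a sketch for illustration, not code generated by Lex): upper case is folded to lower case, runs of blanks collapse to a single blank, and blanks at the end of the line are dropped.

```c
#include <ctype.h>
#include <string.h>

void transform_line(const char *in, char *out) {
    size_t n = 0;
    for (; *in; in++) {
        if (*in == ' ') {
            while (in[1] == ' ') in++;          /* [ ]+  -> one blank  */
            if (in[1] == '\0' || in[1] == '\n')
                continue;                       /* [ ]+$ -> deleted    */
            out[n++] = ' ';
        } else if (isupper((unsigned char)*in)) {
            out[n++] = *in + 'a' - 'A';         /* [A-Z] -> lower case */
        } else {
            out[n++] = *in;                     /* default: copy input */
        }
    }
    out[n] = '\0';
}
```

The default copy-through branch mirrors Lex's behavior of copying unmatched input to the output unchanged.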
Yacc – Yet Another Compiler Compiler
Yacc is a computer program that serves as the standard parser generator on Unix systems. The name is an acronym for "Yet Another Compiler Compiler." It generates a parser (the part of a compiler that tries to make sense of the input) based on an analytic grammar written in BNF notation. Yacc generates the code for the parser in the C programming language.
Yacc provides a general tool for imposing structure on the input to a computer program. The Yacc user prepares a specification of the input process; this includes rules describing the input structure, code to be invoked when these rules are recognized, and a low-level routine to do the basic input. Yacc then generates a function to control the input process. This function, called a parser, calls the user-supplied low-level input routine (the lexical analyzer) to pick up the basic items (called tokens) from the input stream. These tokens are organized according to the input structure rules, called grammar rules; when one of these rules has been recognized, the user code supplied for this rule, an action, is invoked; actions have the ability to return values and make use of the values of other actions.
Yacc provides a general tool for describing the input to a computer program. The Yacc user specifies the structures of his input, together with code to be invoked as each such structure is recognized. Yacc turns such a specification into a subroutine that handles the input process; frequently, it is convenient and appropriate to have most of the flow of control in the user's application handled by this subroutine. The input subroutine produced by Yacc calls a user-supplied routine to return the next basic input item. Thus, the user can specify his input in terms of individual input characters, or in terms of higher-level constructs such as names and numbers. The user-supplied routine may also handle idiomatic features such as comment and continuation conventions, which typically defy easy grammatical specification. Yacc is written in portable C. The class of specifications accepted is a very general one: LALR(1) grammars with disambiguating rules. In addition to compilers for C, APL, Pascal, RATFOR, etc., Yacc has been used for less conventional languages, including a phototypesetter language, several desk calculator languages, a document retrieval system, and a Fortran debugging system.
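The idea that an action attached to a grammar rule can return a value and use the values of other rules can be sketched without Yacc as a tiny recursive-descent evaluator (an illustration for a toy grammar of single digits and '+', not Yacc output):

```c
/* Toy grammar (hypothetical, for illustration):
 *   expr : term '+' expr    { $$ = $1 + $3; }
 *        | term             { $$ = $1; }
 *   term : [0-9]            { $$ = value of the digit; }
 */
int eval_expr(const char **p) {
    int left = *(*p)++ - '0';        /* term: a single digit         */
    if (**p == '+') {                /* rule  expr : term '+' expr   */
        (*p)++;
        return left + eval_expr(p);  /* the action adds both values  */
    }
    return left;                     /* rule  expr : term            */
}
```

Each return value here plays the role of a rule's $$ in a Yacc action; Yacc automates exactly this plumbing for LALR(1) grammars.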
Yacc converts a context-free grammar and translation code into a set of tables for an LR(1) parser and translator. The grammar may be ambiguous; specified precedence rules are used to break ambiguities. The output file, y.tab.c, must be compiled by the C compiler to produce a program yyparse. This program must be loaded with a lexical analyzer function, yylex(void) (often generated by lex(1)), with a main(int argc, char *argv[]) program, and with an error handling routine, yyerror(char*). The options have the following meanings:
-o output Direct output to the specified file instead of y.tab.c.
-D n Create file y.debug, containing diagnostic messages.
-v Create file y.output, containing a description of the parsing tables and of conflicts arising from ambiguities in the grammar.
-d Create file y.tab.h, containing #define statements that associate yacc-assigned token codes with user-declared token names. Include it in source files other than y.tab.c to give access to the token codes.
-s stem Change the prefix y of the file names y.tab.c, y.tab.h, y.debug, and y.output to stem.
-S Write a parser that uses Stdio instead of the print routines in libc.
/* Lex program to count the number of words */
%{
#include <stdio.h>
#include <string.h>
int i = 0;
%}
/* Rules Section */
%%
[a-zA-Z0-9]+ {i++;} /* Rule for counting the number of words */
%%
int yywrap(void) { return 1; }
int main()
{
	/* The function that starts the analysis */
	yylex();
	printf("Number of words: %d\n", i);
	return 0;
}
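The count that the Lex rule performs can also be written as a plain C function (a sketch equivalent for comparison, not generated code): a "word" is a maximal run of letters and digits, matching the pattern [a-zA-Z0-9]+.

```c
#include <ctype.h>

int count_words(const char *s) {
    int count = 0, in_word = 0;
    for (; *s; s++) {
        if (isalnum((unsigned char)*s)) {
            if (!in_word) { count++; in_word = 1; }  /* run starts */
        } else {
            in_word = 0;                             /* run ends   */
        }
    }
    return count;
}
```

The in_word flag plays the role of the DFA state Lex would generate for this pattern.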
PRACTICAL - 2
AIM : Write a program to check whether a string belongs to the grammar or not
PROGRAM:
#include <stdio.h>
#include <string.h>
/* The original listing was incomplete and did not state the grammar;
 * as an assumed example, this checks membership in the language of
 * the grammar S -> aSb | ab, i.e. { a^n b^n | n >= 1 }. */
int main() {
	char string[50];
	int flag = 1, len, i;
	printf("Enter a string: ");
	if (fgets(string, sizeof(string), stdin) == NULL)
		return 1;
	string[strcspn(string, "\n")] = '\0'; /* strip the trailing newline */
	len = strlen(string);
	if (len == 0 || len % 2 != 0)
		flag = 0;
	for (i = 0; flag && i < len; i++) {
		char expected = (i < len / 2) ? 'a' : 'b';
		if (string[i] != expected)
			flag = 0;
	}
	if (flag)
		printf("The string belongs to the grammar\n");
	else
		printf("The string does not belong to the grammar\n");
	return 0;
}
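When the grammar in question is regular, the membership check can be done with an explicit finite automaton instead of ad hoc counting. As a hypothetical example (a different grammar than the program above), the grammar S -> aS | b, i.e. the language a*b, becomes a small state machine:

```c
/* DFA for the (assumed, illustrative) regular grammar S -> aS | b.
 * States: 0 = start, 1 = accept (saw the final 'b'), 2 = dead. */
int belongs(const char *s) {
    int state = 0;
    for (; *s; s++) {
        if (state == 0 && *s == 'a') state = 0;  /* stay: more a's */
        else if (state == 0 && *s == 'b') state = 1;
        else state = 2;                          /* no valid move  */
    }
    return state == 1;
}
```

This table-driven style is exactly what Lex generates internally from a regular-expression specification.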