UNIT II: Syntax Analysis

By design, every programming language has precise rules that prescribe the syntactic structure of well-formed programs. In C, for example, a program is made up of functions, a function out of declarations and statements, a statement out of expressions, and so on. The syntax of programming language constructs can be specified by context-free grammars or BNF (Backus-Naur Form) notation. Grammars offer significant benefits for both language designers and compiler writers.

* A grammar gives a precise, yet easy-to-understand, syntactic specification of a programming language.
* From certain classes of grammars, we can automatically construct an efficient parser that determines the syntactic structure of a source program. As a side benefit, the parser-construction process can reveal syntactic ambiguities and trouble spots that might have slipped through the initial design phase of a language.
* The structure imparted to a language by a properly designed grammar is useful for translating source programs into correct object code and for detecting errors.
* A grammar allows a language to be evolved or developed iteratively, by adding new constructs to perform new tasks. These new constructs can be integrated more easily into an implementation that follows the grammatical structure of the language.

The Role of the Parser

In the compiler model, the parser obtains a string of tokens from the lexical analyzer and verifies that the string of token names can be generated by the grammar for the source language. We expect the parser to report any syntax errors in an intelligible fashion and to recover from commonly occurring errors to continue processing the remainder of the program. Conceptually, for well-formed programs, the parser constructs a parse tree and passes it to the rest of the compiler for further processing.
In fact, the parse tree need not be constructed explicitly, since checking and translation actions can be interspersed with parsing, as we shall see. Thus, the parser and the rest of the front end could well be implemented by a single module. There are three general types of parsers for grammars: universal, top-down, and bottom-up. The input to the parser is scanned from left to right, one symbol at a time.

[Figure: the parser in the compiler model. The source program enters the Lexical Analyzer, which supplies one token at a time to the Parser on request ("get next token"); the Parser produces a parse tree for the Rest of the Front End, which emits an intermediate representation. All phases share the Symbol Table.]

Most programming language specifications do not describe how a compiler should respond to errors; error handling is left to the compiler designer. Planning the error handling right from the start can both simplify the structure of a compiler and improve its handling of errors. Common programming errors can occur at many different levels.

Lexical errors include misspellings of identifiers, keywords, or operators (e.g., the use of an identifier elipseSize instead of ellipseSize) and missing quotes around text intended as a string.

Syntactic errors include misplaced semicolons or extra or missing braces, that is, "{" or "}". As another example, in C or Java, the appearance of a case statement without an enclosing switch is a syntactic error (however, this situation is usually allowed by the parser and caught later in the processing, as the compiler attempts to generate code).

Semantic errors include type mismatches between operators and operands, e.g., the return of a value in a Java method with result type void.

Logical errors can be anything from incorrect reasoning on the part of the programmer to the use in a C program of the assignment operator = instead of the comparison operator ==. The program containing = may be well formed; however, it may not reflect the programmer's intent.
The error handler in a parser has goals that are simple to state but challenging to realize:

* Report the presence of errors clearly and accurately.
* Recover from each error quickly enough to detect subsequent errors.
* Add minimal overhead to the processing of correct programs.

Error-Recovery Strategies

This section is devoted to the following recovery strategies: panic-mode, phrase-level, error-productions, and global-correction.

Panic-Mode Recovery

On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found. The synchronizing tokens are usually delimiters, such as the semicolon or }, whose role in the source program is clear and unambiguous. The compiler designer must select the synchronizing tokens appropriate for the source language. While panic-mode correction often skips a considerable amount of input without checking it for additional errors, it has the advantage of simplicity, and it is guaranteed not to go into an infinite loop.

Phrase-Level Recovery

On discovering an error, a parser may perform local correction on the remaining input; that is, it may replace a prefix of the remaining input by some string that allows the parser to continue. A typical local correction is to replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon. The choice of the local correction is left to the compiler designer. Phrase-level replacement has been used in several error-repairing compilers, as it can correct any input string. Its major drawback is the difficulty it has in coping with situations in which the actual error has occurred before the point of detection.

Error Productions

By anticipating common errors that might be encountered, a compiler designer can augment the grammar for the language at hand with productions that generate the erroneous constructs.
A parser constructed from a grammar augmented by these error productions detects the anticipated errors when an error production is used during parsing. The parser can then generate appropriate error diagnostics about the erroneous construct that has been recognized in the input.

Global Correction

Ideally, a compiler should make as few changes as possible in processing an incorrect input string. There are algorithms for choosing a minimal sequence of changes to obtain a globally least-cost correction. Given an incorrect input string x and grammar G, these algorithms will find a parse tree for a related string y, such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible. Unfortunately, these methods are in general too costly to implement in terms of time and space, so these techniques are currently only of theoretical interest.

Context-Free Grammars

The Formal Definition of a Context-Free Grammar

A context-free grammar (grammar for short) consists of terminals, nonterminals, a start symbol, and productions.

1. Terminals are the basic symbols from which strings are formed. The term "token name" is a synonym for "terminal". Terminals are the first components of the tokens output by the lexical analyzer (e.g., the keywords if and else).

2. Nonterminals are syntactic variables that denote sets of strings. The sets of strings denoted by nonterminals help define the language generated by the grammar. Nonterminals impose a hierarchical structure on the language that is key to syntax analysis and translation.

3. In a grammar, one nonterminal is distinguished as the start symbol, and the set of strings it denotes is the language generated by the grammar. Conventionally, the productions for the start symbol are listed first.

4. The productions of a grammar specify the manner in which the terminals and nonterminals can be combined to form strings.
Each production consists of:

(a) A nonterminal called the head or left side of the production; this production defines some of the strings denoted by the head.

(b) The symbol →. Sometimes ::= has been used in place of the arrow.

(c) A body or right side consisting of zero or more terminals and nonterminals. The components of the body describe one way in which strings of the nonterminal at the head can be constructed.

For example, in the following grammar for arithmetic expressions, the terminal symbols are id + - * / ( ); the nonterminal symbols are expression, term, and factor; and expression is the start symbol:

expression → expression + term
expression → expression - term
expression → term
term → term * factor
term → term / factor
term → factor
factor → ( expression )
factor → id

Derivations

The construction of a parse tree can be made precise by taking a derivational view, in which productions are treated as rewriting rules. Beginning with the start symbol, each rewriting step replaces a nonterminal by the body of one of its productions. The production E → -E signifies that if E denotes an expression, then -E must also denote an expression. The replacement of a single E by -E will be described by writing E ⇒ -E, which is read, "E derives -E." The production E → ( E ) can be applied to replace any instance of E in any string of grammar symbols by (E), e.g., E * E ⇒ (E) * E or E * E ⇒ E * (E). We can take a single E and repeatedly apply productions in any order to get a sequence of replacements. For example,

E ⇒ -E ⇒ -(E) ⇒ -(id)

We call such a sequence of replacements a derivation of -(id) from E. This derivation provides a proof that the string -(id) is one particular instance of an expression. For a general definition of derivation, consider a nonterminal A in the middle of a sequence of grammar symbols, as in αAβ, where α and β are arbitrary strings of grammar symbols. Suppose A → γ is a production. Then, we write αAβ ⇒ αγβ.
The symbol ⇒ means, "derives in one step." When a sequence of derivation steps α₁ ⇒ α₂ ⇒ ⋯ ⇒ αₙ rewrites α₁ to αₙ, we say α₁ derives αₙ. Often, we wish to say, "derives in zero or more steps." For this purpose, we can use the symbol ⇒*. Thus,

1. α ⇒* α, for any string α, and
2. if α ⇒* β and β ⇒ γ, then α ⇒* γ.

Likewise, ⇒+ means, "derives in one or more steps." If S ⇒* α, where S is the start symbol of a grammar G, we say that α is a sentential form of G. Note that a sentential form may contain both terminals and nonterminals, and may be empty. A sentence of G is a sentential form with no nonterminals. The language generated by a grammar is its set of sentences. Thus, a string of terminals w is in L(G), the language generated by G, if and only if w is a sentence of G (or S ⇒* w). A language that can be generated by a grammar is said to be a context-free language. If two grammars generate the same language, the grammars are said to be equivalent.

To understand how parsers work, we shall consider derivations in which the nonterminal to be replaced at each step is chosen as follows:

1. In leftmost derivations, the leftmost nonterminal in each sentential form is always chosen. If α ⇒ β is a step in which the leftmost nonterminal in α is replaced, we write α ⇒lm β.
2. In rightmost derivations, the rightmost nonterminal is always chosen; we write α ⇒rm β in this case.

For example, the string -(id + id) has the following leftmost and rightmost derivations:

Leftmost:  E ⇒lm -E ⇒lm -(E) ⇒lm -(E + E) ⇒lm -(id + E) ⇒lm -(id + id)
Rightmost: E ⇒rm -E ⇒rm -(E) ⇒rm -(E + E) ⇒rm -(E + id) ⇒rm -(id + id)

Parse Trees

A parse tree is a graphical representation of a derivation that filters out the order in which productions are applied to replace nonterminals. Each interior node of a parse tree represents the application of a production. The interior node is labeled with the nonterminal A in the head of the production; the children of the node are labeled, from left to right, by the symbols in the body of the production by which this A was replaced during the derivation.
The leaves of a parse tree are labeled by nonterminals or terminals and, read from left to right, constitute a sentential form, called the yield or frontier of the tree.

[Figure: a parse tree for -(id + id), together with the sequence of partial trees built as the productions of the derivation E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id) are applied one at a time.]
