Chapter 1: Introduction To Compiling: 1.1: Language Processors
Other analyzers and synthesizers

Other compiler-like applications also use analysis and synthesis. Some examples include:

1. Pretty printer: can be considered a real compiler whose target language is a formatted version of the source.
2. Interpreter: the synthesis traverses the intermediate code and executes the operation at each node (rather than generating machine code to do so).

Multiple Phases

The front and back end are themselves each divided into multiple phases. Conceptually, the input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.

The diagram is definitely not drawn to scale, in terms of effort or lines of code. In practice, the optimizers dominate.

Conceptually, there are three phases of analysis, with the output of one phase being the input of the next. Each of these phases changes the representation of the program being compiled. The phases are:

1. Lexical analysis or scanning, which transforms the program from a string of characters to a string of tokens.
2. Syntax analysis or parsing, which transforms the program into some kind of syntax tree.
3. Semantic analysis, which decorates the tree with semantic information.

Note that the above classification is conceptual; in practice more efficient representations may be used. For example, instead of having all the information about the program in the tree, tree nodes may point to symbol table entries. Thus the information about the variable counter is stored once and pointed to at each occurrence.
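The interpreter-style synthesis and the shared symbol table entry described above can be illustrated with a minimal Python sketch. The class names (SymbolEntry, Num, Var, Add, Assign) and the function execute are hypothetical, chosen only for illustration; a real compiler or interpreter would have many more node kinds and a richer table.

```python
from dataclasses import dataclass


@dataclass
class SymbolEntry:
    """One symbol table entry (hypothetical layout): stored once, shared by all uses."""
    name: str
    value: float = 0.0


@dataclass
class Num:
    value: float


@dataclass
class Var:
    entry: SymbolEntry  # the node points to the table entry; it holds no copy


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Assign:
    target: SymbolEntry
    expr: object


def execute(node):
    """Interpreter-style synthesis: perform the operation found at each node."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Var):
        return node.entry.value
    if isinstance(node, Add):
        return execute(node.left) + execute(node.right)
    if isinstance(node, Assign):
        node.target.value = execute(node.expr)
        return node.target.value
    raise TypeError(f"unknown node: {node!r}")


# Tree for: counter = counter + 1
# Both occurrences of counter refer to the same SymbolEntry object.
counter = SymbolEntry("counter", 41)
tree = Assign(counter, Add(Var(counter), Num(1)))
result = execute(tree)
```

Because both occurrences of counter point to one entry, the assignment updates the single stored value, and any later node that refers to counter sees the new value. A compiler would walk the same tree but emit target code at each node instead of performing the operation.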
For example, the statement x3 = y + 3; written with any amount of blank space between its symbols, but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.

A token is a <token-name,attribute-value> pair. For example:

1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to gather information about the identifiers and to pass this information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.

Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. That does not mean the blanks are unnecessary. Consider
int x;
intx;
The blank between int and x is clearly necessary, but it does not become part of any token. Blanks inside strings are an exception: they are part of the token (or, more likely, of the table entry pointed to by the second component of the token).

Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).

1. What are language processors?
2. What two-step process does a compiler require in order to run a program?
3. How can Java be both a compiled and an interpreted language?
4. What are the front end and the back end of a compiler?
5. What are some applications of the analysis and synthesis phases of a compiler?
6. How does a lexical analyzer process the source language?
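As a worked illustration of the scanning step discussed in this section, here is a minimal Python sketch of a lexical analyzer for statements such as x3 = y + 3;. The names TOKEN_PATTERN and scan are hypothetical, and two simplifications are assumptions of the sketch rather than anything the notes prescribe: the attribute of a number token is simply its lexeme, and a symbol token carries None as its ignored second component. Note that the pattern uses only regular-expression alternation and repetition, with no recursion.

```python
import re

# One alternative per kind of lexeme; named groups record which kind matched.
TOKEN_PATTERN = re.compile(r"""
    (?P<id>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<number>[0-9]+)
  | (?P<symbol>[=+;])
  | (?P<blank>\s+)
""", re.VERBOSE)


def scan(source):
    """Group characters into lexemes and map each lexeme to a (name, attribute) token."""
    symbol_table = []  # position i-1 holds the identifier for token <id,i>
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "blank":
            continue  # blanks separate lexemes but never become tokens
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", lexeme))  # assumption: keep the lexeme text
        else:
            tokens.append((lexeme, None))  # second component is ignored
    return tokens, symbol_table


tokens, table = scan("x3 = y + 3;")
```

With this sketch, scan("x3=y+3;") yields the same tokens as scan("x3 = y + 3;"), while scan("x 3 = y + 3;") does not, because the blank splits x3 into an identifier x followed by the number 3.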