The document summarizes compiler construction and lexical analysis. It surveys the major phases of compilation: analysis, synthesis, and supporting phases. Analysis comprises lexical, syntax, and semantic analysis, which decompose the source code. Synthesis generates intermediate code, performs optimizations, and produces the final target code. Lexical analysis identifies tokens by breaking the input into lexemes matched against patterns using finite automata, represented as transition diagrams or transition tables. Regular expressions define the patterns for tokens, which are the building blocks of lexical analysis.


Compiler Construction (Slides) Summary

Chapter 1
 Compilers viewed from many perspectives:
o Construction:
 Single pass.
 Multi pass.
 Load & go.
o Functional:
 Debugging.
 Optimizing.
 Compilers have two fundamental parts:
o Analysis: decompose source code into intermediate representation.
o Synthesis: target code generation from representation
 Important software tools in analysis:
o Structure / syntax directed editors: force "syntactically" correct
code to be entered.
o Pretty printers: standardized version for program structure.
o Static checkers: a quick compilation to detect rudimentary errors.
o Interpreters: real time execution of code (line at a time).
o Text formatters: e.g., LaTeX and troff.
o Silicon compilers: take input and generate circuit design.
 Analysis task for compilation:
o Lexical Analysis:
 Left-to-right scan to identify tokens.
 Tokens: sequences of characters that have a collective meaning.
 Linear action (not recursive).
 Identify only individual "words" that are the tokens of the
language.
o Hierarchical (Syntax) Analysis:
 Grouping of tokens into meaningful collection
 Verify that the "words" are correctly assembled into
"sentences"
 Recursion is required to identify structure of an expression.
o Semantic Analysis:
 Checking to ensure correctness of components.
 Determine whether the sentences have one and only one
unambiguous interpretation.
 Provide type checking (legality of operands).
 Supporting phases for the analysis phase:
o Symbol table creation:
 A data structure that contains info about the tokens created by
the lexical analyzer.
 Updated during the analysis phase and used during the synthesis
phases (a minimal sketch follows this list).
o Error Handling:
 Detection of the different errors that correspond to each phase.
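As a concrete illustration of the lexical-analysis and symbol-table bullets above, here is a minimal Python sketch; the token patterns, the tokenize function, and the dictionary-based symbol table are all illustrative assumptions, not taken from the slides.

    import re

    # Illustrative token patterns (assumed): identifiers, integer
    # literals, single-character operators, and whitespace.
    TOKEN_SPEC = [
        ("ID",  r"[A-Za-z_]\w*"),
        ("NUM", r"\d+"),
        ("OP",  r"[+\-*/=]"),
        ("WS",  r"\s+"),          # filtered out, as the notes describe
    ]
    MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

    def tokenize(source, symtab):
        """Left-to-right scan: yield (token, lexeme) pairs, skip whitespace,
        and enter each identifier into the symbol table."""
        for m in MASTER.finditer(source):
            kind, lexeme = m.lastgroup, m.group()
            if kind == "WS":
                continue                          # filter whitespace
            if kind == "ID":
                symtab.setdefault(lexeme, {"token": "ID"})
            yield kind, lexeme

    symtab = {}
    print(list(tokenize("count = count + 1", symtab)))
    print(symtab)   # {'count': {'token': 'ID'}}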
 Synthesis task for compilation:
o Intermediate code generation:
 Abstract machine version of code (independent of
architecture).
o Code optimization:
 Find more efficient ways to execute code.
 Replace code with more optimal statements.
 Has two approaches: peephole and high-level language optimization.
o Final code generation:
 Generate relocatable machine dependent code.
 Grammar: set of rules which govern the interdependencies & structure
among the tokens.
 Assembly code: names are used for instructions, and names are used for
memory addresses.
 Loader: takes relocatable machine code, alters the addresses, and
places the altered instructions into memory.
 Link-editor: takes many (relocatable) machine code programs (with cross-
references) and produces a single file.
o Need to keep track of correspondence between variable names and
corresponding addresses in each piece of code.
 Compiler construction tools:
o Parser Generators : Produce Syntax Analyzers
o Scanner Generators (LEX) : Produce Lexical Analyzers
o Syntax-directed Translation Engines (YACC): Generate
Intermediate Code
o Automatic Code Generators : Generate Actual Code
o Data-Flow Engines : Support Optimization
Chapter 2
 A Context-free Grammar is utilized to describe the syntactic structure of the
language.
 A CFG is characterized by:
o A set of Tokens or Terminal symbols.
o A set of Non-Terminals.
o A set of Production rules.
o A Non-Terminal designated as the start symbol.
 A parse tree for a CFG has the following properties:
o Root is labeled with the start symbol.
o Leaf node is a token or epsilon.
o Interior node is a Non-Terminal.
 An ambiguous grammar does not enforce associativity.
o A non-ambiguous grammar enforcing left associativity has a parse
tree that grows to the left (e.g., expr → expr + digit).
o A non-ambiguous grammar enforcing right associativity has a parse
tree that grows to the right (e.g., expr → digit + expr).
 Syntax-Directed Translation:
o Associate attributes with grammar rules & constructs and translate as
parsing occurs.
o Each production has a set of semantic rules.
o Each grammar symbol has a set of attributes.
 The type of tree traversal that is being performed during semantic rules is
postorder depth-first traversal.
 Semantic actions are added into the right sides of the productions.
o Example: expr → result | digit { print("action"); } (see the sketch below).
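To make the idea concrete, here is a minimal Python sketch of semantic actions firing in postorder during parsing, using the classic infix-to-postfix translation over an assumed grammar (not the slides' example): expr → expr op digit { print(op) } | digit { print(digit) }, with op in {+, -}; the left recursion is replaced by a loop.

    def translate(s):
        """Translate e.g. '9-5+2' to postfix '95-2+' by executing the
        embedded semantic actions while parsing."""
        pos = 0
        out = []

        def digit():
            nonlocal pos
            ch = s[pos]
            assert ch.isdigit(), f"digit expected at position {pos}"
            pos += 1
            out.append(ch)        # action: { print(digit) }

        digit()                   # expr -> digit ...
        while pos < len(s) and s[pos] in "+-":
            op = s[pos]
            pos += 1
            digit()
            out.append(op)        # action fires after both operands: postorder
        return "".join(out)

    print(translate("9-5+2"))     # 95-2+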
 Parse tree / derivation of a token string occurs in a top-down fashion.
o Uses a grammar to check the structure of tokens.
o Can be recursive descent or predictive parsing.
o Parser operates by attempting to match tokens in the input stream
(a predictive-parsing sketch follows).
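A minimal predictive-parsing sketch in Python; the grammar is assumed for illustration (list → digit rest; rest → ',' digit rest | ε), and one token of lookahead selects which production to apply.

    def parse_list(tokens):
        """Predictive parse of a comma-separated digit list, e.g. ['7', ',', '3']."""
        pos = 0

        def lookahead():
            return tokens[pos] if pos < len(tokens) else None

        def match(expected):
            nonlocal pos
            if lookahead() != expected:
                raise SyntaxError(f"expected {expected!r}, got {lookahead()!r}")
            pos += 1

        def digit():
            tok = lookahead()
            if tok is None or not tok.isdigit():
                raise SyntaxError(f"digit expected, got {tok!r}")
            match(tok)

        def rest():
            if lookahead() == ",":    # lookahead selects: rest -> ',' digit rest
                match(",")
                digit()
                rest()
            # otherwise: rest -> epsilon, consume nothing

        digit(); rest()               # list -> digit rest
        if lookahead() is not None:
            raise SyntaxError("trailing input")

    parse_list(["7", ",", "3", ",", "1"])   # accepted silently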
 Lexical Analysis process functional responsibilities:
o Input token string is broken down.
o Whitespace and comments are filtered out.
o Individual tokens with associated values are identified.
o Symbol table is initialized and entries are constructed for each
"appropriate" token.
 Reserved words are placed into the symbol table for easy lookup.
 Consider A → α
o FIRST(α) = the set of leftmost tokens that appear in α or in strings
generated by α (a computation sketch follows).
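A sketch of computing FIRST sets by fixed-point iteration over the productions; the grammar encoding (dict of nonterminal → list of right-hand sides, with 'eps' marking ε) is an assumption for illustration.

    def first_sets(grammar, nonterminals):
        """Iterate until no FIRST set grows; tokens are any symbols
        that are not nonterminals, and FIRST(token) = {token}."""
        FIRST = {A: set() for A in nonterminals}
        changed = True
        while changed:
            changed = False
            for A, rhss in grammar.items():
                for rhs in rhss:
                    for X in rhs:
                        f = FIRST[X] if X in FIRST else {X}
                        before = len(FIRST[A])
                        FIRST[A] |= f - {"eps"}
                        changed |= len(FIRST[A]) != before
                        if "eps" not in f:
                            break      # X cannot vanish: stop here
                    else:
                        # every symbol of rhs can derive eps
                        if "eps" not in FIRST[A]:
                            FIRST[A].add("eps")
                            changed = True
        return FIRST

    # Assumed toy grammar: E -> T Ep ; Ep -> '+' T Ep | eps ; T -> 'id'
    g = {"E": [["T", "Ep"]], "Ep": [["+", "T", "Ep"], ["eps"]], "T": [["id"]]}
    print(first_sets(g, {"E", "Ep", "T"}))
    # {'E': {'id'}, 'Ep': {'+', 'eps'}, 'T': {'id'}}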
Chapter 3
 Separating lexical analysis from parsing presents a simpler conceptual
model, as it emphasizes:
o High cohesion and low coupling.
o A well-specified interface, enabling parallel implementation.
o Increased compiler efficiency (I/O techniques to enhance lexical
analysis).
o Improved portability.
 Major terms in Lexical Analysis:
o Token:
 A classification for a common set of strings.
 Examples: <Identifier>, <number>.
o Pattern:
 The rules which characterize the set of strings for a token.
 Examples: recall files and OS wildcards ([A-Z]*.*).
o Lexeme:
 The actual sequence of characters that matches a pattern and is
classified by a token.
 Examples: identifiers such as x, count, name, etc. (see the sketch
below).
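The three terms map directly onto code; in this small Python sketch (patterns assumed for illustration), each lexeme is reported with the token whose pattern it matches.

    import re

    # token -> pattern (the rule characterizing the token's strings)
    PATTERNS = {
        "identifier": re.compile(r"[A-Za-z_]\w*$"),
        "number":     re.compile(r"\d+(\.\d+)?$"),
    }

    for lexeme in ["x", "count", "name", "3.14"]:
        token = next((t for t, p in PATTERNS.items() if p.match(lexeme)), None)
        print(f"lexeme {lexeme!r} matches token <{token}>")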
 Error handling in lexical analysis is very localized with respect to the
input source.
o Errors occur when prefix of remaining input doesn't match any defined
token.
o Possible error recovery actions:
 Deleting or inserting input characters.
 Replacing or transposing characters.
 Lexical Analyzer construction techniques:
o Lexical analyzer generator.
o Hand-code / High-level Language (I/O facilitated by the language).
o Hand-code / Assembly Language (Explicitly manage I/O)
 Language: any set of strings over a fixed alphabet.
 Regular Expression: a set of rules/techniques used for constructing
sequences of symbols (strings) from an alphabet.
o For a fixed alphabet Σ:
 ε is a regular expression denoting {ε}
 If a is in Σ, a is a regular expression that denotes {a}
 All operators are left-associative. Parentheses are dropped as allowed by
precedence rules (a sketch of these denotations follows).
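Viewed operationally, each rule builds a set of strings; the Python sketch below (all function names assumed) denotes ε, single symbols, union, concatenation, and Kleene star, truncating to a maximum length so the star's set stays finite.

    def epsilon():                 # the RE eps denotes {eps}
        return {""}

    def sym(a):                    # for a in the alphabet, the RE a denotes {a}
        return {a}

    def union(r, s):               # r | s
        return r | s

    def concat(r, s, maxlen=4):    # r s, truncated for finiteness
        return {x + y for x in r for y in s if len(x + y) <= maxlen}

    def star(r, maxlen=4):         # r*, as a fixed point up to maxlen
        out = {""}
        while True:
            nxt = out | concat(out, r, maxlen)
            if nxt == out:
                return out
            out = nxt

    # (a|b) a*  over the alphabet {a, b}, strings of length <= 4:
    print(sorted(concat(union(sym("a"), sym("b")), star(sym("a")))))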
 Transition Diagrams (TD): used to represent the tokens.
o A TD attempts to match a lexeme to a pattern.
o Each TD has:
 States: represented by circles.
 Actions: represented by arrows between states.
 Start state: the beginning of a pattern (arrowhead).
 Final state(s): the end of a pattern (concentric circles).
o Each TD is deterministic (a table-driven sketch follows).
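A transition diagram runs naturally from a table; below is a minimal Python sketch for the usual identifier pattern letter (letter | digit)*, with state names and the table layout assumed.

    # States: 0 = start, 1 = in identifier (final). A missing entry
    # means the diagram has no move on that input.
    TABLE = {
        (0, "letter"): 1,
        (1, "letter"): 1,
        (1, "digit"):  1,
    }
    FINAL = {1}

    def classify(ch):
        if ch.isalpha() or ch == "_":
            return "letter"
        return "digit" if ch.isdigit() else "other"

    def accepts(s):
        state = 0
        for ch in s:
            state = TABLE.get((state, classify(ch)))
            if state is None:
                return False          # no move: reject
        return state in FINAL

    print(accepts("count1"))   # True
    print(accepts("1count"))   # False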
 The lexical analyzer first matches all keywords/reserved words as ids.
o After the match, the symbol table or a special keyword table is
consulted.
o The keyword table contains string versions of all keywords and their
associated token values.
o When a match is found, the keyword token is returned, along with its
symbolic value.
o If a match is not found, then it is assumed that an id has been
discovered (see the sketch below).
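A sketch of that lookup in Python; the keyword set and the token codes are assumed for illustration.

    # String versions of keywords mapped to their token values.
    KEYWORDS = {"if": "IF", "then": "THEN", "else": "ELSE", "while": "WHILE"}

    def token_for(lexeme):
        """Everything is first matched as an id; the keyword table decides."""
        if lexeme in KEYWORDS:
            return KEYWORDS[lexeme], lexeme   # keyword token + symbolic value
        return "ID", lexeme                   # no match: assume an id

    print(token_for("while"))     # ('WHILE', 'while')
    print(token_for("counter"))   # ('ID', 'counter')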
 Finite Automata: a recognizer that takes an input string and determines
whether it is a valid sentence of the language.
o Deterministic: has at most one action for a given input symbol.
 Complex but more precise.
o Non-Deterministic: has more than one alternative action for the same
input symbol.
 Easy but less precise.
o Both types are used to recognize regular expressions
 Each NFA consists of:
o S, a set of states.
o Σ, the symbols of the input alphabet.
o δ, the transition function.
o s0, the start state.
o F, a set of final or accepting states (see the sketch below).
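The five components translate directly into a data structure; this Python sketch encodes an assumed example NFA (strings over {a, b} that end in "ab") and simulates it by tracking the set of states it could be in.

    from itertools import chain

    # N = (S, Sigma, delta, s0, F), here for "strings ending in ab".
    S     = {0, 1, 2}
    SIGMA = {"a", "b"}
    DELTA = {                     # nondeterministic: values are SETS of states
        (0, "a"): {0, 1},
        (0, "b"): {0},
        (1, "b"): {2},
    }
    S0, F = 0, {2}

    def nfa_accepts(s):
        current = {S0}
        for ch in s:
            current = set(chain.from_iterable(
                DELTA.get((q, ch), set()) for q in current))
        return bool(current & F)

    print(nfa_accepts("aab"))   # True
    print(nfa_accepts("aba"))   # False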
 Problems in NFA:
o Valid input might not be accepted.
o NFA may behave differently on the same input.
 Relationship of NFAs to Compilation:
o Regular Expressions are "Recognized" by NFA
o Regular Expressions are "Patterns" for "Tokens"
o Tokens are building blocks for lexical analysis.
o Lexical analyzer can be described by a collection of NFAs. Each
NFA is for a language token.
 Transition diagrams are the states (circles), arcs, and final states.
 Transition tables are more suitable to representation within a computer.
 Each state in the DFA corresponds to a SET of states of the NFA (the same
input can follow multiple paths in the NFA); the subset-construction sketch
below makes this concrete.
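The subset construction makes that correspondence concrete; a Python sketch (reusing the "ends in ab" NFA from above, with all names assumed):

    def nfa_to_dfa(delta, sigma, s0, final):
        """Each DFA state is a frozenset of NFA states."""
        start = frozenset({s0})
        dfa, worklist = {}, [start]
        while worklist:
            state = worklist.pop()
            if state in dfa:
                continue
            dfa[state] = {}
            for ch in sigma:
                nxt = frozenset(q2 for q in state
                                for q2 in delta.get((q, ch), set()))
                dfa[state][ch] = nxt
                if nxt not in dfa:
                    worklist.append(nxt)
        dfa_final = {s for s in dfa if s & final}
        return dfa, start, dfa_final

    DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
    dfa, start, dfa_final = nfa_to_dfa(DELTA, {"a", "b"}, 0, {2})
    print(len(dfa), "DFA states")   # 3: {0}, {0,1}, {0,2}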
 The syntax of the regular expression is the determining factor for the
NFA's construction and structure.
 Let r be a regular expression, with NFA N(r); then:
o N(r) has a number of states ≤ 2 × (#symbols + #operators) of r.
o N(r) has exactly one start state and one accepting state.
o Each state of N(r) has at most one outgoing edge labeled a ∈ Σ, or at
most two outgoing ε-edges.
o Each state must have a unique name (a Thompson-construction sketch
follows).
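A Python sketch of Thompson's construction honoring those invariants: one start, one accepting state, and each state carrying either a single Σ-edge or at most two ε-edges (None stands for ε; the representation is assumed).

    import itertools

    fresh = itertools.count()          # unique name for every state

    def basic(a):
        """N(a): start --a--> accept."""
        s, f = next(fresh), next(fresh)
        return s, f, [(s, a, f)]

    def alt(n1, n2):
        """N(r|s): new start fans out on eps; both accepts feed a new accept."""
        s1, f1, t1 = n1
        s2, f2, t2 = n2
        s, f = next(fresh), next(fresh)
        return s, f, t1 + t2 + [(s, None, s1), (s, None, s2),
                                (f1, None, f), (f2, None, f)]

    def seq(n1, n2):
        """N(rs): eps-edge from r's accept into s's start."""
        s1, f1, t1 = n1
        s2, f2, t2 = n2
        return s1, f2, t1 + t2 + [(f1, None, s2)]

    def star(n):
        """N(r*): eps loop back, plus an eps bypass for the empty string."""
        s1, f1, t = n
        s, f = next(fresh), next(fresh)
        return s, f, t + [(s, None, s1), (s, None, f),
                          (f1, None, s1), (f1, None, f)]

    # N((a|b)*): 8 states <= 2 * (2 symbols + 2 operators), as claimed above.
    start, accept, trans = star(alt(basic("a"), basic("b")))
    states = {q for (p, _, r) in trans for q in (p, r)}
    print(len(states), "states for (a|b)*")   # 8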

 Designing Lexical Analyzer Generator steps:
o From regular expression to NFA.
o From NFA to DFA.
o DFA simulation for the lexical analyzer.
 Each pattern recognizes a set of lexemes.
 Each pattern is described by a regular expression.
