Chapter 1
(CSEg4306)
Introduction
What is a Computer Program?
ØA collection of instructions that perform a specific task when executed by a
computer.
4. Standardization
ØStandardization of both the language and a broad set of libraries is the only truly
effective way to ensure the portability of code across platforms.
Cont’d
What makes a language successful?
5. Open Source.
ØMost programming languages today have at least one open-source
compiler or interpreter.
ØLinux, the leading open-source operating system, is written in C.
6. Excellent Compilers
ØFortran owes much of its success to its extremely good compilers.
ØIn part, this is a matter of language design.
Cont’d
What makes a language successful?
7. Economics, Patronage and Inertia
ØThere are factors other than technical merit that greatly influence success.
ØThe backing of a powerful sponsor.
ØPL/I, at least to first approximation, owes its life to IBM.
ØCobol and, more recently, Ada owe their lives to the U.S. Department of Defense.
ØC#, despite its technical merits, would probably not have received the attention it
has without the backing of Microsoft.
Why Study Programming Languages?
1. To have a good understanding of language design and implementation.
ØHelps one choose the most appropriate language for any given task.
ØMost languages are better for some things than for others.
Øshould one choose C, C++, or C# for systems programming?
ØFortran or C for scientific computations?
ØPHP or Ruby for a web-based application?
ØAda or C for embedded systems?
ØVisual Basic or Java for a graphical user interface?
Cont’d
Why Study Programming Languages?
2. Make it easier to learn new languages.
ØMany languages are closely related.
ØJava and C# are easier to learn if you already know C++;
ØCommon Lisp if you already know Scheme;
ØHaskell if you already know ML.
3. To learn basic concepts that underlie all programming languages.
ØTypes, control (iteration, selection, recursion, nondeterminacy, concurrency),
abstraction, and naming.
ØThinking in terms of these concepts makes it easier to assimilate the syntax (form) and
semantics (meaning) of new languages, compared to picking them up in a vacuum.
Programming Language Categories
• The many existing languages can be classified into families based on their model of
computation
Two common language groups:
Imperative (focus is on how the computer should do it)
Øvon Neumann (Fortran, Pascal, Basic, C)
Øobject-oriented (Smalltalk, Eiffel, C++, Java)
Øscripting languages (Perl, Python, JavaScript, PHP)
Declarative (focus is on what the computer is to do). This imperative/declarative split is the top-level division.
Øfunctional (Scheme, ML, pure Lisp, FP)
Ølogic, constraint-based (Prolog, VisiCalc, RPG/Report Program Generator)
Compilation and Interpretation
Compilation
ØThe compiler translates the high-level source program into an equivalent target
program (typically in machine language), and then goes away.
ØThe compiler is the locus of control during compilation; the target program is the
locus of control during its own execution.
ØThe compiler is itself a machine language program, presumably created by
compiling some other high-level program.
ØWhen written to a file in a format understood by the operating system, machine
language is commonly known as object code.
ØSemantic error checking can be performed statically
Cont’d
Compilation and Interpretation
ØA just-in-time compiler translates byte code into machine language immediately
before each execution of the program.
Cont’d
Implementation strategies
ØA Pascal compiler, written in Pascal, that would generate output in P-code, a stack-
based language similar to the byte code of modern Java compilers
ØThe same compiler, already translated into P-code
ØA P-code interpreter, written in Pascal
[Diagram: source program → front end of the compiler → back end of the compiler → target program]
Cont’d
An Overview of Compilation
ØThe first few phases (up through semantic analysis) serve to figure out the meaning of
the source program. They are sometimes called the front end of the compiler.
ØThe last few phases serve to construct an equivalent target program. They are sometimes
called the back end of the compiler.
ØCompilation is a series of passes (phases). If desired, a pass may be written as a separate
program, reading its input from a file and writing its output to a file.
ØCompilers are commonly divided into passes so that the front end may be shared by
compilers for more than one machine (target language) .
ØScanning and parsing serve to recognize the structure of the program, without regard to
its meaning.
Cont’d
Scanner (lexical analysis)
ØDone by the lexer, also known as the scanner.
ØIdentifies lexemes (e.g., runs of alphanumeric characters) within the source code
ØWhite space and comments are discarded
ØGenerates a stream of tokens (such as constants, identifiers, operators, reserved words, and separators)
ØEach token passed to the parser on request
ØCreates a symbol table of 'names' used in the source code
ØMay also create a strings table
ØVery limited error checking at this stage
Ø Tokens: smallest meaningful units of the program.
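The scanning steps above can be sketched in a few lines of Python. The token set here (NUMBER, IDENT, OP, parentheses) is a hypothetical simplification for illustration, not the chapter's:

```python
import re

# Minimal scanner sketch: split source text into (kind, lexeme) tokens.
# The token categories below are our own illustrative choices.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),           # white space is discarded, as the slide says
]
MASTER = re.compile("|".join(f"(?P<{k}>{p})" for k, p in TOKEN_SPEC))

def scan(source):
    """Generate the stream of tokens handed to the parser on request."""
    tokens = []
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens
```

For example, `scan("x = y + 42")` yields IDENT, OP, IDENT, OP, NUMBER tokens. A real scanner would also classify reserved words and build the symbol table.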
Parser (syntax analysis)
• Parsing organizes tokens into a parse tree that represents higher-level constructs (statements,
expressions, subroutines, and so on) in terms of their constituents.
• Checks that the rules of the language have been followed correctly.
• Syntax can be defined using a context-free grammar.
• Backus-Naur Form (BNF) is a well-known notation for communicating grammars. The grammar is applied
programmatically to build a parse tree.
• The parse tree is refined to become an abstract syntax tree (AST), which is traversed several times
during semantic analysis.
• The symbol table is frequently accessed and updated; it is often implemented as a hash table.
• Polish (+ab) and reverse Polish (ab+) notations represent expressions without parentheses. Expressions can
also be represented using trees or directed acyclic graphs.
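Polish and reverse Polish notation fall directly out of traversing an expression tree, as a short sketch shows. The `Node` class is our own minimal encoding, not from the slides:

```python
# Expression tree for (a + b) * c; the traversal order alone determines
# whether we get Polish (prefix) or reverse Polish (postfix) notation,
# with no parentheses needed in either case.
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def prefix(n):
    """Polish notation: operator before its operands."""
    if n.left is None:
        return n.value
    return n.value + prefix(n.left) + prefix(n.right)

def postfix(n):
    """Reverse Polish notation: operator after its operands."""
    if n.left is None:
        return n.value
    return postfix(n.left) + postfix(n.right) + n.value

tree = Node("*", Node("+", Node("a"), Node("b")), Node("c"))
```

Here `prefix(tree)` gives `*+abc` and `postfix(tree)` gives `ab+c*`, both unambiguous without parentheses.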
Semantic analysis
ØSemantic analysis is the discovery of meaning in a program.
ØUsing the symbol table, the semantic analyzer enforces a large variety of rules that are not
captured by the hierarchical structure of the context-free grammar and the parse tree.
ü Undeclared variables
ü Multiple declarations within the same scope
ü Misuse of reserved identifiers
ü Attempting to access a variable that is out of scope
ü Type mismatches
ü Parameter type mismatches when calling functions and procedures
ü Function return value must match the specified return type
ü Arithmetic operators must operate on numeric types
ü The condition in an If statement must evaluate to true or false
ü The exit condition of a loop must evaluate to true or false
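Two of the checks listed above (undeclared variables and arithmetic type mismatches) can be sketched over a tiny expression form. The symbol table is a plain dict here, and the AST encoding is a hypothetical simplification:

```python
# Assumed symbol table mapping names to types (our own example data).
symbol_table = {"x": "int", "flag": "bool"}

def check(expr):
    """Return the type of expr, raising on a static semantic error.
    expr is an int literal, an identifier string, or (op, lhs, rhs)."""
    if isinstance(expr, int):
        return "int"
    if isinstance(expr, str):                       # identifier
        if expr not in symbol_table:
            raise NameError(f"undeclared variable: {expr}")
        return symbol_table[expr]
    op, lhs, rhs = expr                             # e.g. ("+", "x", 1)
    lt, rt = check(lhs), check(rhs)
    if lt != "int" or rt != "int":                  # arithmetic needs numeric types
        raise TypeError(f"'{op}' needs numeric operands, got {lt} and {rt}")
    return "int"
```

So `check(("+", "x", 1))` passes, while `check(("+", "flag", 1))` is rejected: `flag` is declared but is not a numeric type.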
Cont’d
Semantic analysis
ØNot all semantic rules can be checked at compile time.
qThose that can be checked at compile time are referred to as the static semantics of the
language.
q Those that must be checked at run time are referred to as the dynamic semantics
of the language.
ØStatic objects are constructs (identifiers, statements, expressions, etc.); dynamic
objects are (instances of) values, locations, and the like.
ØC has very little in the way of dynamic checks.
Cont’d
Semantic analysis
ØExamples of rules that other languages enforce at run time include the following.
✔ Variables are never used in an expression unless they have been given a value.
✔ Pointers are never dereferenced unless they refer to a valid object.
✔ Array subscript expressions lie within the bounds of the array.
✔ Arithmetic operations do not overflow.
ØIn the process of checking static semantic rules, the semantic analyzer typically
transforms the parse tree into an abstract syntax tree (otherwise known as an AST, or
simply a syntax tree) by removing most of the “artificial” nodes in the tree’s interior.
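The parse-tree-to-AST transformation can be illustrated by collapsing single-child chains such as expr → term → factor, which are "artificial" interior nodes. The `(label, children...)` tuple encoding is our own assumption:

```python
# Sketch: derive an AST from a parse tree by removing interior nodes
# that merely wrap a single child (grammar artifacts like expr -> term).
def to_ast(node):
    if not isinstance(node, tuple):
        return node                      # leaf token, kept as-is
    label, *children = node
    children = [to_ast(c) for c in children]
    if len(children) == 1:               # artificial single-child chain
        return children[0]
    return (label, *children)

parse_tree = ("expr", ("term", ("factor", "x")))
```

Here `to_ast(parse_tree)` collapses the whole chain down to the leaf `"x"`, while nodes with several children (real operators) are preserved.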
Cont’d
Intermediate form
ØAn intermediate form (IF) is produced after semantic analysis (if the program passes all checks).
ØIFs are often chosen for machine independence, ease of optimization, or compactness.
ØThey often resemble machine code for some imaginary idealized machine.
ØMany compilers actually move the code through more than one IF
qWrite three-address code for the following
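As a worked example of what such an exercise asks for (the expression `a + b * c` is our own choice; the slide leaves its input unspecified), three-address code can be generated from an expression tree so that each instruction has one operator and at most three addresses:

```python
# Sketch of three-address code generation from (op, lhs, rhs) tuples.
# Each emitted instruction assigns one operation's result to a fresh temporary.
counter = 0
code = []

def gen(node):
    """Emit three-address instructions for node; return its result's name."""
    global counter
    if isinstance(node, str):
        return node                         # variable name: already an address
    op, lhs, rhs = node
    left, right = gen(lhs), gen(rhs)
    counter += 1
    temp = f"t{counter}"
    code.append(f"{temp} := {left} {op} {right}")
    return temp

gen(("+", "a", ("*", "b", "c")))            # a + b * c
```

This produces the two instructions `t1 := b * c` and `t2 := a + t1`, resembling machine code for an idealized machine as described above.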
Optimization / Code generation phase
ØOptimization takes an intermediate-code program and produces another one that does
the same thing faster, or in less space.
qThe optimization phase is optional
qRemoves unnecessary code lines and rearranges the sequence of statements
ØThe code generation phase produces assembly language or (sometimes) relocatable
machine language.
qThus code improvement often appears as two additional phases of compilation, one
immediately after semantic analysis and intermediate code generation, the other
immediately after target code generation.
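One classic machine-independent improvement of the first kind is constant folding: an intermediate-code instruction whose operands are both constants is evaluated at compile time. The `(dest, op, arg1, arg2)` instruction format is our own simplification:

```python
# Sketch of constant folding over a list of three-address instructions.
# Instructions whose operands are both integer constants are replaced
# by a plain assignment of the precomputed result.
def fold_constants(instrs):
    out = []
    for dest, op, a, b in instrs:
        if isinstance(a, int) and isinstance(b, int):
            value = {"+": a + b, "-": a - b, "*": a * b}[op]
            out.append((dest, ":=", value, None))   # folded at compile time
        else:
            out.append((dest, op, a, b))            # left for run time
    return out
```

For instance, `("t1", "*", 2, 3)` is rewritten to assign the constant 6, so the multiplication never happens at run time; instructions with variable operands pass through unchanged.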
Section wide Group Project
Objective
Design and Implementation of a New Programming Language.
Define Purpose and Features
What will this language be used for?
Interpreted or Compiled?
What programming paradigm will it follow?
What are the basic data types, control structures, and unique features?
Choose a name for the language.