PPL Unit1
Preliminary Concepts
Background
Frankly, we didn't have the vaguest idea how the thing [the FORTRAN language and
compiler] would work out in detail. … We struck out simply to optimize the object
program, the running time, because most people at that time believed you
couldn't do that kind of thing. They believed that machine-coded programs
would be so inefficient that it would be impractical for many applications.
– John Backus
Unexpected successes are common; the web browser is another example of an
unexpected success.
1.1 Reasons for Studying Concepts of Programming Languages
Increased ability to express ideas
Improved background for choosing appropriate languages
Increased ability to learn new languages
Better understanding of the significance of implementation
Overall advancement of computing
1.2 Programming Domains
Scientific applications
– Large number of floating point computations
– Fortran
Business applications
– Produce reports, use decimal numbers and characters
– COBOL
Artificial intelligence
– Symbolic rather than numeric computation; linked lists as the main data structure
– LISP
1.3 Language Evaluation Criteria
Readability: the ease with which programs can be read and understood
Writability: the ease with which a language can be used to create programs
Reliability: conformance to specifications (i.e., performs to its specifications)
Cost: the ultimate total cost
Others
Readability
The following sections describe characteristics that contribute to the readability of a
programming language
Table: Language evaluation criteria and the characteristics that affect them
Overall simplicity
– A manageable set of features and constructs
– Minimal feature multiplicity: there should not be many different ways to
accomplish the same operation. For example, in Java/C/C++, a user can
increment a simple integer variable in four different ways:
a = a + 1;  a++;  a += 1;  ++a;
As stand-alone statements, all four have the same effect, but a++ and ++a
have slightly different meanings from each other (and from the other two)
when used inside larger expressions.
– Minimal operator overloading
Orthogonality
– A relatively small set of primitive constructs can be combined in a relatively
small number of ways
– Every possible combination of constructs is legal and meaningful
For example, to add two 32-bit integers and replace one of the two with the sum,
the IBM mainframe requires two different instructions, depending on where the
operands reside: A Reg1, memory_cell (register/memory) and AR Reg1, Reg2
(register/register).
Control statements
– The presence of well-known control structures (e.g., while statement)
Data types and structures
– The presence of adequate facilities for defining data structures
Syntax considerations
– Identifier forms: flexible composition
– Special words and methods of forming compound statements
- Form and meaning: self-descriptive constructs, meaningful keywords
Writability
Simplicity and Orthogonality
– Few constructs, a small number of primitives, a small set of rules for
combining them
Support for abstraction
– The ability to define and use complex structures or operations in ways that
allow details to be ignored
Expressivity
– A set of relatively convenient ways of specifying operations
– Example: the inclusion of for statement in many modern languages
Reliability
Type checking
– Testing for type errors
Exception handling
– Intercept run-time errors and take corrective measures
Aliasing
– Presence of two or more distinct referencing methods for the same memory
location
Readability and writability
– A language that does not support “natural” ways of expressing an algorithm forces
“unnatural” approaches, and hence reduces reliability
Cost
Training programmers to use the language
Writing programs (closeness to particular applications)
Compiling programs
Executing programs
Language implementation system: availability of free compilers
Reliability: poor reliability leads to high costs
Maintaining programs
Others
Portability
– The ease with which programs can be moved from one implementation to
another
Generality
– The applicability to a wide range of applications
Well-definedness
– The completeness and precision of the language's official definition
1.4 Influences on Language Design
Computer Architecture
– Languages are developed around the prevalent computer architecture,
known as the von Neumann architecture
Programming Methodologies
– New software development methodologies (e.g., object-oriented software
development) led to new programming paradigms and by extension, new
programming languages
Computer Architecture
Well-known computer architecture: Von Neumann
Imperative languages are the most dominant, because of von Neumann computers
– Data and programs stored in memory
– Memory is separate from CPU
– Instructions and data are piped from memory to CPU
– Basis for imperative languages
Variables model memory cells
Assignment statements model piping
Iteration is efficient
Programming Methodologies
1950s and early 1960s: Simple applications; worry about machine efficiency
Late 1960s: People efficiency became important; readability, better control
structures
– structured programming
– top-down design and step-wise refinement
Late 1970s: Process-oriented to data-oriented
– data abstraction
Middle 1980s: Object-oriented programming
– Data abstraction + inheritance + polymorphism
Imperative
– Central features are variables, assignment statements, and iteration
– Examples: C, Pascal
Functional
– Main means of making computations is by applying functions to
given parameters
– Examples: LISP, Scheme
Logic
– Rule-based (rules are specified in no particular order)
– Example: Prolog
Object-oriented
– Data abstraction, inheritance, late binding
– Examples: Java, C++
Markup
– New; not a programming language per se, but used to specify the layout of
information in Web documents
– Examples: XHTML, XML
We now briefly describe several programming environments.
Figure 1.2 Layered View of Computer: The operating system and language implementation are layered over
the machine interface of a computer
In the above diagram, the lexical analyzer converts characters in the source program into lexical
units; it also skips comments, because the compiler has no use for them. The syntax analyzer takes
the lexical units from the lexical analyzer and uses them to construct hierarchical structures called
parse trees.
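As an illustration of what a lexical analyzer does, the following minimal Python tokenizer (a sketch, not any particular compiler's implementation; the token names and patterns are assumptions) discards comments and converts a character stream into lexical units:

```python
import re

# Token patterns, tried in order; whitespace and comments are skipped.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),      # comments are discarded by the lexer
    ("NUMBER",  r"\d+"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"[+\-*/=;()]"),
    ("SKIP",    r"\s+"),
]

def tokenize(source):
    """Convert a character stream into a list of (kind, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        for kind, pattern in TOKEN_SPEC:
            m = re.match(pattern, source[pos:])
            if m:
                if kind not in ("SKIP", "COMMENT"):
                    tokens.append((kind, m.group()))
                pos += m.end()
                break
        else:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
    return tokens
```

The resulting (kind, lexeme) pairs are exactly what the syntax analyzer consumes to build parse trees.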
The linking operation connects the user program to the system programs by placing the addresses
of the entry points of the system programs in the calls to them in the user program. The user and
system code together are sometimes called a load module, or executable image. The process of
collecting system programs and linking them to user programs is called linking and loading, or
sometimes just linking. It is accomplished by a systems program called a linker.
Pure Interpretation
Here, programs are interpreted by another program called an interpreter, with no translation.
The interpreter program acts as a software simulation of a machine whose fetch-execute cycle deals
with high-level language program statements rather than machine instructions. This software
simulation obviously provides a virtual machine for the language.
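The fetch-execute cycle of a pure interpreter can be sketched in a few lines. This toy interpreter (a hypothetical illustration for a language of simple assignments, not a real system) fetches one high-level statement at a time, decodes it, and executes it, keeping enough source information to report errors at the source level:

```python
def interpret(statements):
    """Pure interpretation: fetch, decode, and execute source statements."""
    env = {}
    for line_no, stmt in enumerate(statements, start=1):   # fetch
        var, expr = (s.strip() for s in stmt.split("="))   # decode
        try:
            env[var] = eval(expr, {}, dict(env))           # execute
        except NameError as e:
            # Error messages can refer to source-level units (the line number).
            raise RuntimeError(f"line {line_no}: {e}") from None
    return env
```

Because the source statement and its line number are still in hand when an error occurs, source-level error reporting is trivial, which is exactly the debugging advantage described below.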
Advantages:
1. Pure interpretation allows easy implementation of many source-level debugging operations,
because all run-time error messages can refer to source-level units. For example, if an array
index is found to be out of range, the error message can easily indicate the source line of the
error and the name of the array.
Disadvantages:
1. Execution is very slow (10 to 100 times slower than compiled code).
2. It requires more space.
3. A symbol table must be present during interpretation.
Hybrid Implementation Systems
A compromise between compilers and pure interpreters
A high-level language program is translated to an intermediate language that
allows easy interpretation
Faster than pure interpretation
Examples
– Perl programs are partially compiled to detect errors before interpretation
– Initial implementations of Java were hybrid; the intermediate form, byte code,
provides portability to any machine that has a byte code interpreter and a runtime
system (together, these are called Java Virtual Machine)
Some language implementation systems are a compromise between compilers and pure
interpreters; they translate high-level language programs to an intermediate language designed to
allow easy interpretation. This method is faster than pure interpretation because the source language
statements are decoded only once. Such implementations are called hybrid implementation
systems.
The process used in a hybrid implementation system is shown in the above Figure. Instead of
translating intermediate language code to machine code, it simply interprets the intermediate code.
Perl is implemented with a hybrid system. Perl programs are partially compiled to detect errors
before interpretation and to simplify the interpreter.
Initial implementations of Java were all hybrid. Its intermediate form, called byte code,
provides portability to any machine that has a byte code interpreter and an associated run-time
system.
Programming Environment
A programming environment is a collection of tools used in the development of software. This
collection may consist of only a file system, a text editor, a pre-processor, a linker, and a compiler. All
programs pass through a number of important stages before they are executed by the computer. The
figure shows a typical programming environment.
Editor:
It is a program that allows the user to enter and edit his/her program. It may be a line editor or a
graphical editor; this affects, to a large extent, the programming environment. Typically, it provides the
following facilities:
• Entering and editing the program
• Loading program into memory
• Compiling the program
• Debugging the program
• Running the program
Preprocessor
It is a program that processes the source code before compilation, handling directives such as
file inclusion and macro expansion.
Compiler
It is a program that accepts the preprocessed source code as input, analyzes it for
syntax errors, and produces one or more possible outputs.
• If syntax error(s) is/are found, an error listing is provided
• If the program is free of syntax errors, it is converted to object code.
Note: If the preprocessed code is converted to assembly code, then an assembler is required
to convert it into machine code.
Linker:
A linker (linkage editor) is a program that combines all object code of a program with other necessary
external items to form an executable code.
Syntax and Semantics
Introduction
Syntax: the form or structure of the expressions, statements, and program units
Semantics: the meaning of the expressions, statements, and program units
Syntax and semantics provide a language's definition
– Users of a language definition
– Other language designers
– Implementers
– Programmers (the users of the language)
Example: consider the statement a = 2 * b + 5;
Lexemes: a, =, 2, *, b, +, 5, ;
Tokens: identifiers: a, b
operators: =, * and +
constants: 2 and 5
Language Recognizers
– A recognition device reads input strings of the language and decides whether the
input strings belong to the language
– Example: syntax analysis part of a compiler
Language Generators
– A device that generates sentences of a language
– One can determine if the syntax of a particular sentence is correct by
comparing it to the structure of the generator
1.10 Formal Methods of Describing Syntax
A formal language-generation mechanism called a grammar is used to describe the syntax
of programming languages. There are several forms for describing the syntax of a programming
language.
1.10.1 Backus-Naur Form and Context-Free Grammars
BNF (Backus-Naur Form) is the most widely used method for describing programming language
syntax. It is a metalanguage: a language used to describe another language. BNF uses abstractions
to represent syntactic structures, and it is a notation for context-free grammars.
• The symbol ::= (or →) is read as “may derive”.
• The symbol | separates alternative strings on the right-hand side.
Example 1:
E ::= E + T | T
T ::= T * F | F
F ::= id | constant | (E)
where E is Expression, T is Term, and F is Factor
Example 2:
<assign> ::= <var> = <expression> (a simple assignment statement)
<program> ::= { <statement> }
<statement> ::= <assignment> | <condition> | <loop>
<condition> ::= if <expr> { <statement> }
<loop> ::= while <expr> { <statement> }
<expr> ::= <identifier> | <number> | <expr>
Here, the LHS of the arrow is the abstraction being defined, and the RHS consists of a
combination of tokens, lexemes, and references to other abstractions. The abstractions in a BNF
description, or grammar, are often called nonterminal symbols, or simply nonterminals, and the
lexemes and tokens of the rules are called terminal symbols, or simply terminals.
BNF Rules:
a. Non-terminals are BNF abstractions and Terminals are lexemes and tokens
b. A rule has a LHS and a RHS, which consists of terminal and nonterminal symbols
c. A grammar is a finite nonempty set of rules
d. An abstraction (or nonterminal symbol) can have more than one RHS
<ident_list> → identifier | identifier, <ident_list>
<stmt> → <single_stmt> | begin <stmt_list> end
letter → A | B | … | Z | a | b | … | z
digit → 0 | 1 | … | 9
1.10.1.1 Describing Lists
Syntactic lists are described using recursion
<ident_list> → ident
| ident, <ident_list>
A derivation is a repeated application of rules, starting with the start symbol and
ending with a sentence (all terminal symbols)
An Example Grammar
<program> → <stmts>
<stmts> → <stmt> | <stmt> ; <stmts>
<stmt> → <var> = <expr>
<var> → a | b | c | d
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const
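The example grammar above maps directly onto a recursive-descent recognizer: one function per nonterminal, each consuming the tokens its rule derives. The following Python sketch (an illustration; the token representation is an assumption) accepts exactly the sentences of this grammar:

```python
def parse(tokens):
    """Recursive-descent recognizer for the example grammar above."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def var():                       # <var> -> a | b | c | d
        if peek() in ("a", "b", "c", "d"):
            expect(peek())
        else:
            raise SyntaxError(f"expected a variable, got {peek()!r}")

    def term():                      # <term> -> <var> | const
        if peek() == "const":
            expect("const")
        else:
            var()

    def expr():                      # <expr> -> <term> + <term> | <term> - <term>
        term()
        if peek() in ("+", "-"):
            expect(peek())
        else:
            raise SyntaxError("expected '+' or '-'")
        term()

    def stmt():                      # <stmt> -> <var> = <expr>
        var()
        expect("=")
        expr()

    def stmts():                     # <stmts> -> <stmt> | <stmt> ; <stmts>
        stmt()
        if peek() == ";":
            expect(";")
            stmts()

    stmts()                          # <program> -> <stmts>
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return True
```

Note that, faithful to the grammar, an expression must contain exactly one + or - operator; a bare `a = b` is rejected.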
1.10.1.2 Grammars and Derivations
BNF is a generative device for defining language. The sentences of the language are
generated through a sequence of application of the rules, beginning with a nonterminal of the grammar
called the Start symbol. A sentence generation is called a derivation.
Every string of symbols in the derivation is a sentential form
A sentence is a sentential form that has only terminal symbols
A leftmost derivation is one in which the leftmost nonterminal in each sentential form
is the one that is expanded
A derivation may be neither leftmost nor rightmost
Example:
<program> → <stmts> → <stmt>
→ <var> = <expr> → a = <expr>
→ a = <term> + <term>
→ a = <var> + <term>
→ a = b + <term>
→ a = b + const
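The derivation above can be mechanized. In this sketch (an illustration; the rule encoding is an assumption), the grammar is a table of productions, and each step replaces the leftmost nonterminal in the sentential form with the chosen right-hand side, ending in the sentence a = b + const:

```python
# Grammar rules as lists of symbols; angle-bracketed names are nonterminals.
RULES = {
    "<program>": [["<stmts>"]],
    "<stmts>":   [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>":    [["<var>", "=", "<expr>"]],
    "<var>":     [["a"], ["b"], ["c"], ["d"]],
    "<expr>":    [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>":    [["<var>"], ["const"]],
}

def leftmost_derive(choices):
    """Apply the given rule choices, always expanding the leftmost nonterminal."""
    form = ["<program>"]                      # start symbol
    for choice in choices:
        # Find the leftmost nonterminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in RULES)
        form[i:i + 1] = RULES[form[i]][choice]
    return form
```

Each intermediate value of `form` is a sentential form; the final one, containing only terminals, is a sentence.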
Left and Right Derivations:
Operator Precedence:
An operator generated lower in the parse tree has precedence over an operator generated
higher up in the tree; a grammar can be written to exploit this.
Example: Unambiguous Grammar: Every derivation with an unambiguous grammar has a
unique parse tree, although that tree can be represented by different derivations.
<expr> → <expr> - <term>|<term>
<term> → <term> / const|const
Fig: An unambiguous expression grammar
Fig: Parse tree showing operator associativity
Associativity of Operators
Operator associativity can also be indicated by a grammar
<expr> → <expr> + <expr> | const (ambiguous)
<expr> → <expr> + const | const (unambiguous)
In EBNF, where braces denote zero or more repetitions, the same idea can be written as:
E ::= T { + T }
T ::= F { * F }
F ::= id | constant | (E)
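The EBNF above translates directly into a loop-based recursive-descent evaluator. In this Python sketch (an illustration that handles constants and parentheses only, omitting id for brevity), the `while` loops implement the braces and make + and * left-associative, with * binding tighter because it is parsed at a lower level:

```python
def evaluate(tokens):
    """Evaluate an expression following E ::= T { + T }, T ::= F { * F }."""
    pos = [0]                        # mutable cursor shared by the helpers

    def next_is(tok):
        return pos[0] < len(tokens) and tokens[pos[0]] == tok

    def factor():                    # F ::= constant | ( E )
        if next_is("("):
            pos[0] += 1
            value = expression()
            pos[0] += 1              # consume ')'
            return value
        value = int(tokens[pos[0]])
        pos[0] += 1
        return value

    def term():                      # T ::= F { * F }
        value = factor()
        while next_is("*"):          # the braces: zero or more repetitions
            pos[0] += 1
            value = value * factor() # left-associative accumulation
        return value

    def expression():                # E ::= T { + T }
        value = term()
        while next_is("+"):
            pos[0] += 1
            value = value + term()
        return value

    return expression()
```

Because `term` is called from `expression`, every * is reduced before the surrounding +, which is the precedence the grammar encodes.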
Example: if statement rules
The symbol | (logical OR) is used to define multiple rules on one line. For example, an if
statement can be described with the rules
<if_stmt> → if ( <logic_expr> ) <stmt>
<if_stmt> → if ( <logic_expr> ) <stmt> else <stmt>
or, equivalently, with one rule:
<if_stmt> → if ( <logic_expr> ) <stmt> | if ( <logic_expr> ) <stmt> else <stmt>
Note:
A variable-length list is usually written with an ellipsis (…). BNF does not include the ellipsis;
instead, it uses recursion. A rule is recursive if its LHS appears in its RHS.
Example:
<ident_list> → identifier
| identifier, <ident_list>
This defines <ident_list> as either a single token (identifier) or an identifier followed by a comma and
another instance of <ident_list>.
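The recursive rule maps directly onto a recursive function. This Python sketch (an illustration; it takes the list already split into tokens) recognizes an <ident_list> by the same two cases as the grammar:

```python
def is_ident(tok):
    """A terminal: any legal identifier token."""
    return tok.isidentifier()

def is_ident_list(tokens):
    """<ident_list> -> identifier | identifier , <ident_list>"""
    if len(tokens) == 1:                       # base case: a single identifier
        return is_ident(tokens[0])
    return (len(tokens) >= 3 and is_ident(tokens[0])
            and tokens[1] == ","
            and is_ident_list(tokens[2:]))     # recursive case
```

The base case of the function corresponds to the non-recursive alternative of the rule, and the recursive call to the recursive alternative.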
1.11 Attribute Grammars
Definition
An attribute grammar is a context-free grammar G = (S, N, T, P) with the
following additions:
– For each grammar symbol x there is a set A(x) of attribute values
– Each rule has a set of functions that define certain attributes of the
nonterminals in the rule
– Each rule has a (possibly empty) set of predicates to check for attribute
consistency
– Let X0 → X1 ... Xn be a rule
– Functions of the form S(X0) = f(A(X1), ..., A(Xn)) define synthesized attributes
– Functions of the form I(Xj) = f(A(X0), ..., A(Xn)), for 1 <= j <= n, define
inherited attributes
– Initially, there are intrinsic attributes on the leaves
Example
Syntax
<assign> → <var> = <expr>
<expr> → <var> + <var> | <var>
<var> → A | B | C
actual_type: synthesized for <var> and <expr>
expected_type: inherited for <expr>
Syntax rule: <expr> → <var>[1] + <var>[2]
Semantic rule: <expr>.actual_type ← <var>[1].actual_type
Predicate: <var>[1].actual_type == <var>[2].actual_type
<expr>.expected_type == <expr>.actual_type
Syntax rule: <var> → id
Semantic rule: <var>.actual_type ← lookup(<var>.string)
How are attribute values computed?
– If all attributes were inherited, the tree could be decorated in top-down order.
– If all attributes were synthesized, the tree could be decorated in bottom-up
order.
– In many cases, both kinds of attributes are used, and it is some combination of
top-down and bottom-up that must be used.
<expr>.expected_type ← inherited from parent
<var>[1].actual_type ← lookup(A)
<var>[2].actual_type ← lookup(B)
<var>[1].actual_type =? <var>[2].actual_type
<expr>.actual_type ← <var>[1].actual_type
<expr>.actual_type =? <expr>.expected_type
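The decoration sequence above can be sketched as ordinary code. Here the symbol table that `lookup` consults is a stand-in supplied for illustration, and the two predicate checks mirror the attribute grammar:

```python
def check_assign(lhs, rhs1, rhs2, symbol_table):
    """Decorate an 'lhs = rhs1 + rhs2' tree and check the type predicates."""
    lookup = symbol_table.get            # intrinsic attributes come from here
    var1_type = lookup(rhs1)             # <var>[1].actual_type <- lookup(rhs1)
    var2_type = lookup(rhs2)             # <var>[2].actual_type <- lookup(rhs2)
    if var1_type != var2_type:           # predicate on the addition
        return False
    expr_type = var1_type                # synthesized: <expr>.actual_type
    expected = lookup(lhs)               # inherited from the LHS <var>
    return expr_type == expected         # final predicate
```

The synthesized attribute flows bottom-up from the lookups, while the expected type is inherited from the assignment's left-hand side, illustrating the mixed top-down/bottom-up order described above.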
Categories of Attributes
There are two types of attributes namely Synthesized Attributes and Inherited Attributes.
– Its usefulness in describing the meaning of a programming language is limited for
language users or compiler writers
Example
Evaluation of Denotational Semantics
1.12.5 Expressions
Map expressions onto Z ∪ {error}, where Z is the set of integers.
We assume expressions are decimal numbers, variables, or binary expressions having one
arithmetic operator and two operands, each of which can be an expression.
Assignment Statements
– Maps state sets to state sets
Logical Pretest Loops
– Maps state sets to state sets
The meaning of the loop is the value of the program variables after the statements
in the loop have been executed the prescribed number of times, assuming there have
been no errors
In essence, the loop has been converted from iteration to recursion, where the recursive
control is mathematically defined by other recursive state mapping functions
Recursion, when compared to iteration, is easier to describe with mathematics.
IMPORTANT QUESTIONS: UNIT – I
PART – A
1. What are general-purpose languages? Give examples.
2. List out the language categories
3. What are the factors influencing the writability of a language?
4. What are the fundamental features of imperative languages?
5. Explain generic methods
6. What is the purpose of the assignment statement?
7. Define syntax and semantics
8. Define derivation and parse tree
9. Give an attribute grammar for a simple assignment statement
10. What are the difficulties in using an attribute grammar to describe all of the syntax
and static semantics of a contemporary programming language?
11. Differentiate between static and dynamic semantics
12. Write EBNF description for the C union
13. Describe the approach of using axiomatic semantics to prove the correctness of a
given program.
PART – B
1. Give the parse tree and leftmost derivation for A = A * (B + (C * A)) and A = A * (B + (C)).
Ans:
The reasons why a language would distinguish between uppercase and lowercase in its identifiers are:
Variable identifiers may look different from identifiers that are names for constants. For this reason, in the C
language we use uppercase for constant names and lowercase for variable names.
Variable names can have their first letter distinguished. If a language did not
distinguish between uppercase and lowercase in identifiers, programs would be less readable. For
example, the words SUM and Sum look very similar but are completely different.
Ans: The value of an intrinsic attribute is supplied from outside the attribute evaluation process,
usually from the lexical analyzer. The value of a synthesized attribute is computed by an attribute
evaluation function.
Ans:
4. Using the grammar show a parse tree and a leftmost derivation for each of the following
statements:
a. A=A*(B + (C * A))
b. A = (A + B) * C
Ans: (a) A = A * (B + (C * A))
Ans: (b) A = (A + B) * C
Ans:
The following two distinct parse trees for the same string prove that the grammar is ambiguous.
5. Describe, in English, the language defined by the following grammar:
<S> → <A> <B> <C>
<A> → a <A> | a
<B> → b <B> | b
<C> → c <C> | c
Ans: One or more a's followed by one or more b's followed by one or more c's.
Ans:
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> (+ | *) <expr>
| (<expr>)
| <id>
7. Compute the weakest precondition for each of the following assignment statements and
postconditions:
a. a = 2 * (b - 1) - 1 {a > 0}
b. b = (c + 10) / 3 {b > 6}
c. a = a + 2 * b - 1 {a > 1}
d. x = 2 * y + x - 1 {x > 11}
Ans:
(a) a = 2 * (b - 1) - 1 {a > 0}
2 * (b - 1) - 1 > 0
2 * b - 2 - 1 > 0
2 * b > 3
b > 3 / 2
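The precondition just derived can be sanity-checked numerically. This quick Python check (an illustration, not part of the exercise) runs the assignment for values above and at the boundary b = 3/2, confirming that values satisfying b > 3/2 establish the postcondition while the boundary itself does not:

```python
def postcondition_holds(b):
    a = 2 * (b - 1) - 1        # the assignment statement
    return a > 0               # the postcondition {a > 0}

# Every b with b > 3/2 establishes the postcondition...
assert all(postcondition_holds(3 / 2 + eps) for eps in (0.001, 1, 100))
# ...and the boundary value b = 3/2 does not, so b > 3/2 is weakest.
assert not postcondition_holds(3 / 2)
```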
8. Compute the weakest precondition for each of the following sequences of assignment statements
and their postconditions:
a. a = 2 * b + 1;
b = a - 3
{b < 0}
b. a = 3 * (2 * b + a);
b = 2 * a - 1
{b > 5}
Ans:
a. a = 2 * b + 1
b = a - 3 {b < 0}
a - 3 < 0
a < 3
Now, we have:
a = 2 * b + 1 {a < 3}
2 * b + 1 < 3
2 * b < 2
b < 1
b. a = 3 * (2 * b + a);
b = 2 * a - 1 {b > 5}
2 * a - 1 > 5
2 * a > 6
a > 3
Now we have:
a = 3 * (2 * b + a) {a > 3}
3 * (2 * b + a) > 3
6 * b + 3 * a > 3
2 * b + a > 1 (divide both sides by 3)
b > (1 - a) / 2
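Again a quick numeric check (an illustration, not part of the exercise): starting values with b > (1 - a)/2 should leave b > 5 after running both assignments, while a boundary pair with b = (1 - a)/2 should not:

```python
def run(a, b):
    a = 3 * (2 * b + a)        # first assignment
    b = 2 * a - 1              # second assignment
    return b > 5               # the postcondition {b > 5}

# A pair satisfying the computed precondition b > (1 - a)/2 ...
assert run(0, 1)               # b = 1 > (1 - 0)/2, postcondition holds
# ... and a boundary pair with b == (1 - a)/2, where it fails.
assert not run(1, 0)           # b = 0 == (1 - 1)/2
```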