CSC 4405 Survey of Programming Languages Lecture 3
FACULTY OF SCIENCE
DEPARTMENT OF COMPUTER SCIENCE
PART THREE
Language Description
3.0 Introduction
The task of providing a concise yet understandable description of a programming language is difficult
but essential to the language's success. Some languages have suffered from having many
slightly different dialects, a result of a simple but informal and imprecise definition. One of the
problems in describing a language is the diversity of the people who must understand the description.
Among these are initial evaluators, implementers, and users. Most new programming languages are
subjected to a period of scrutiny by potential users, often people within the organization that employs
the language's designer, before their designs are completed. These are the initial evaluators.
Programming language implementers obviously must be able to determine how the expressions,
statements, and program units of a language are formed, and also their intended effect when
executed. The difficulty of the implementers' job is, in part, determined by the completeness and
precision of the language description.
Language users must be able to determine how to encode software solutions by referring to a language
reference manual. Textbooks and courses enter into this process, but language manuals are usually
the only authoritative printed information source about a language. The study of programming
languages, like the study of natural languages, can be divided into examinations of syntax and
semantics. The syntax of a programming language is the form of its expressions, statements, and
program units. Its semantics is the meaning of those expressions, statements, and program units.
Lexical syntax defines the rules for basic symbols such as identifiers, literals, punctuators
and operators.
Concrete syntax specifies the actual representation of programs in terms of lexical
symbols, such as the language's alphabet.
Abstract syntax conveys only the essential program information.
The syntax of a programming language describes the structure of programs without considering
their meaning. It is concerned with the form and layout of a program as it appears on the page, and
consists of a collection of rules that determine which sequences of symbols and instructions are valid
in a program. In general, languages can be formally defined in two distinct ways: by recognition and by
generation.
3.1.1 Language Recognizer
The syntax analysis part of a compiler is a recognizer for the language the compiler translates. In this
role, the recognizer need not test all possible strings of characters from some set to determine whether
each is in the language. Rather, it need only determine whether given programs are in the language.
In effect then, the syntax analyzer determines whether the given programs are syntactically correct.
A language recognizer is like a filter, separating legal sentences from those that are incorrectly formed.
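As a minimal sketch of a recognizer, the Python function below accepts or rejects strings of a toy language: sequences of balanced parentheses. The toy language and the name is_balanced are illustrative assumptions, not part of any real compiler.

```python
def is_balanced(text: str) -> bool:
    """Recognizer for a toy language: strings of '(' and ')' that balance."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a ')' with no matching '('
                return False
        else:
            return False       # symbol outside the alphabet
    return depth == 0


print(is_balanced("(()())"))   # True
print(is_balanced("(()"))      # False
```

Like the syntax analyzer described above, the function only answers yes or no for the string it is given; it does not enumerate the language.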
3.1.2 Language Generator
A language generator is a device that can be used to generate the sentences of a language. Because the
particular sentence it produces at any moment is unpredictable, a generator seems to be a device of
limited usefulness as a language descriptor. However, people prefer certain
forms of generators over recognizers because they can more easily read and understand them. By
contrast, the syntax-checking portion of a compiler (a language recognizer) is not as useful a language
description for a programmer because it can be used only in trial-and-error mode. For example, to
determine the correct syntax of a particular statement using a compiler, the programmer can only
submit a speculated version and note whether the compiler accepts it.
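To make the idea of a generator concrete, here is a small sketch that enumerates the sentences of a toy grammar, shortest derivations first. The grammar <S> -> a <S> b | a b and the names GRAMMAR and generate are assumptions chosen for illustration only.

```python
from collections import deque

# Hypothetical toy grammar: <S> -> a <S> b | a b
GRAMMAR = {"<S>": [["a", "<S>", "b"], ["a", "b"]]}


def generate(start: str, max_len: int):
    """Yield every sentence of at most max_len symbols."""
    queue = deque([[start]])
    while queue:
        form = queue.popleft()
        # Find the leftmost nonterminal in the sentential form.
        index = next((i for i, s in enumerate(form) if s in GRAMMAR), None)
        if index is None:
            yield " ".join(form)          # only terminals left: a sentence
            continue
        for rhs in GRAMMAR[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            if len(new_form) <= max_len:  # prune forms that are already too long
                queue.append(new_form)


print(list(generate("<S>", 6)))
# ['a b', 'a a b b', 'a a a b b b']
```

Reading the two grammar rules tells a programmer what every legal sentence looks like, which is the advantage of generators mentioned above.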
3.1.3 Parsing
In linguistics, parsing is the process of analyzing a text, made
of a sequence of tokens (for example, words), to determine
its grammatical structure with respect to a given (more or
less) formal grammar. Parsing can also be used as a linguistic
term, especially in reference to how phrases are divided up in
garden path sentences. Fig. 3.1 shows an overview of the parsing
process.
3.1.3.1 Parse Trees
One of the most attractive features of grammars is that they
naturally describe the hierarchical syntactic structure of the
sentences of the languages they define. These hierarchical
structures are called parse trees. For example, the parse tree
in Figure 3.2 shows the structure of the assignment
statement A = B * (A + C). Every internal node of a parse tree is
labelled with a nonterminal symbol; every leaf is labelled with
a terminal symbol. Every subtree of a parse tree describes
one instance of an abstraction in the sentence.
Figure 3.1: Parsing process
Figure 3.2: Parse tree for the simple statement A = B * (A + C)
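The parse tree for A = B * (A + C) can be sketched as a nested data structure. The grammar assumed here (<assign> -> <id> = <expr>, with <expr> built from identifiers, +, * and parentheses) is a common textbook grammar and an assumption on our part, since the figure itself is not reproduced in these notes. The names TREE and leaves are illustrative.

```python
# Each internal node is (nonterminal, child, child, ...); each leaf is a terminal string.
TREE = (
    "<assign>",
    ("<id>", "A"),
    "=",
    ("<expr>",
        ("<id>", "B"),
        "*",
        ("<expr>",
            "(",
            ("<expr>",
                ("<id>", "A"),
                "+",
                ("<expr>", ("<id>", "C"))),
            ")")),
)


def leaves(node):
    """Return the terminals of a parse tree, read left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node[1:]:      # node[0] is the nonterminal label
        result.extend(leaves(child))
    return result


print(" ".join(leaves(TREE)))   # A = B * ( A + C )
```

Reading the leaves from left to right gives back the original sentence, while the internal nodes record which abstraction each part belongs to.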
3.1.3.2 Parser
In computing, a parser is one of the components in an interpreter or compiler, which checks for
correct syntax and builds a data structure (often some kind of parse tree, abstract syntax tree or
other hierarchical structure) implicit in the input tokens. The parser often uses a separate lexical
analyzer to create tokens from the sequence of input characters. Parsers may be programmed by
hand or may be (semi-)automatically generated (in some programming languages) by a tool.
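A hand-written parser of this kind can be sketched in a few dozen lines. The example below assumes a toy grammar for assignment statements (<assign> -> <id> = <expr>; <expr> -> ( <expr> ) | <id> | <id> + <expr> | <id> * <expr>); the grammar and the names tokenize, Parser and parse are hypothetical and chosen only for illustration.

```python
import re


def tokenize(source: str):
    """Lexical analyzer: split the input characters into tokens."""
    tokens = re.findall(r"[A-Za-z_]\w*|[=+*()]", source)
    if "".join(tokens) != "".join(source.split()):
        raise SyntaxError("illegal character in input")
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, predicate, what):
        token = self.peek()
        if token is None or not predicate(token):
            raise SyntaxError(f"expected {what}, found {token!r}")
        self.pos += 1
        return token

    def assign(self):            # <assign> -> <id> = <expr>
        target = self.ident()
        self.expect(lambda t: t == "=", "'='")
        tree = ("<assign>", target, "=", self.expr())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):              # <expr> -> ( <expr> ) | <id> [ (+|*) <expr> ]
        if self.peek() == "(":
            self.pos += 1
            inner = self.expr()
            self.expect(lambda t: t == ")", "')'")
            return ("<expr>", "(", inner, ")")
        left = self.ident()
        if self.peek() in ("+", "*"):
            op = self.tokens[self.pos]
            self.pos += 1
            return ("<expr>", left, op, self.expr())
        return ("<expr>", left)

    def ident(self):             # <id> -> any identifier token
        return ("<id>", self.expect(str.isidentifier, "identifier"))


def parse(source: str):
    """Check the syntax of an assignment and return its parse tree."""
    return Parser(tokenize(source)).assign()


print(parse("A = B * (A + C)"))
```

The parser both checks syntax (it raises SyntaxError on input such as "A = * B") and builds the hierarchical structure implicit in the tokens.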
When multiplication and division occur together in an expression, each operation is evaluated as it
occurs from left to right. Likewise, when addition and subtraction occur together in an expression, each
operation is evaluated in order of appearance from left to right. In Visual Basic, the string
concatenation operator (&) is not an arithmetic operator, but in precedence it falls after all arithmetic
operators and before all comparison operators. The Is operator is an object reference comparison
operator. It does not compare objects or their values; it checks only whether two object references
refer to the same object.
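The left-to-right rule and the difference between reference comparison and value comparison can be checked directly. The sketch below uses Python rather than the language the rules above were written for; Python's is operator plays the role of the Is operator, and the variable names are illustrative.

```python
# Operators of equal precedence are applied left to right.
left_grouped = 8 / 4 * 2        # (8 / 4) * 2
right_grouped = 8 / (4 * 2)     # parentheses force the other grouping
print(left_grouped, right_grouped)   # 4.0 1.0
print(10 - 4 + 3, 10 - (4 + 3))      # 9 3

# Reference comparison versus value comparison.
a = [1, 2]
b = [1, 2]      # equal value, different object
c = a           # second reference to the same object
print(a == b, a is b, a is c)        # True False True
```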
Parsing only verifies that the program consists of tokens arranged in a syntactically valid combination.
Now we'll move forward to semantic analysis, where we delve even deeper to check whether they
form a sensible set of instructions in the programming language. Whereas any old noun phrase
followed by some verb phrase makes a syntactically correct English sentence, a semantically correct
one has subject-verb agreement, proper use of gender, and the components go together to express an
idea that makes sense. For a program to be semantically valid, all variables, functions, classes, etc.
must be properly defined, expressions and variables must be used in ways that respect the type
system, access control must be respected, and so forth. Semantic analysis is the front end's
penultimate phase and the compiler's last chance to weed out incorrect programs. We need to ensure
the program is sound enough to carry on to code generation.
i. Operational
The meaning of a program is defined in terms of the computation steps that an idealized execution
would take. Some definitions use structural operational semantics, in which intermediate states
are described in terms of the language itself; others use abstract machines built from more ad hoc
mathematical constructions. An operational semantics of a programming language is usually
understood as a set of rules describing how its expressions, statements, programs, etc. are evaluated or
executed. These rules tell how a possible implementation of the language should
work, and it is not difficult to implement an interpreter for the language in any
programming language simply by following its operational semantics and translating the rules into
the implementation language.
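As a rough sketch of the operational style, the interpreter below evaluates arithmetic expressions one reduction step at a time and records every intermediate state. The expression encoding (an integer, or a tuple of operator, left operand and right operand) and the names step and evaluate are assumptions made for this example.

```python
def step(expr):
    """Perform one reduction step on a non-value expression."""
    op, left, right = expr
    if not isinstance(left, int):
        return (op, step(left), right)       # reduce the left operand first
    if not isinstance(right, int):
        return (op, left, step(right))       # then the right operand
    return left + right if op == "+" else left * right


def evaluate(expr):
    """Apply step repeatedly, recording every intermediate state."""
    trace = [expr]
    while not isinstance(expr, int):
        expr = step(expr)
        trace.append(expr)
    return trace


for state in evaluate(("+", 1, ("*", 2, 3))):
    print(state)
# ('+', 1, ('*', 2, 3))
# ('+', 1, 6)
# 7
```

Each intermediate state is itself an expression of the language, which is the idea behind structural operational semantics.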
ii. Denotational
The meaning of a program is given by mapping it to mathematical objects, e.g. by treating each
construct of the programming language as a mathematical function.
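In the denotational style, each expression is mapped to the mathematical function it denotes. The sketch below assumes a tiny expression language (integers, variable names, + and *) and uses Python functions from an environment to a number as the mathematical objects; the name meaning is illustrative.

```python
def meaning(expr):
    """Map an expression to the function (environment -> number) it denotes."""
    if isinstance(expr, int):
        return lambda env: expr
    if isinstance(expr, str):
        return lambda env: env[expr]
    op, left, right = expr
    f, g = meaning(left), meaning(right)    # meanings of the parts
    if op == "+":
        return lambda env: f(env) + g(env)
    return lambda env: f(env) * g(env)


denotation = meaning(("*", "B", ("+", "A", "C")))
print(denotation({"A": 1, "B": 2, "C": 3}))   # 8
```

The meaning of a compound expression is built only from the meanings of its parts, with no reference to execution steps.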
iii. Axiomatic or logical
The meaning of a program is defined indirectly, by giving logical axioms about the properties
of the program. Compare with specification and verification.
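An axiomatic description states what must hold before and after a statement runs. As a loose illustration, the sketch below writes the precondition x >= 0 and the postcondition y > 0 for the statement y = x + 1 as runtime assertions. Assertions only test particular inputs; an axiomatic proof would establish the property for every input. The function name increment is hypothetical.

```python
def increment(x: int) -> int:
    assert x >= 0          # precondition  {x >= 0}
    y = x + 1              # statement     y = x + 1
    assert y > 0           # postcondition {y > 0}
    return y


# Spot-check the claim on a few inputs; a proof would cover all x >= 0.
for x in range(5):
    increment(x)
```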
Static semantics
The static semantics defines restrictions on the structure of valid texts that are hard or impossible to
express in standard syntactic formalisms. For compiled languages, static semantics essentially include
those semantic rules that can be checked at compile time. Examples include checking that every
identifier is declared before it is used (in languages that require such declarations) or that the labels
on the arms of a case statement are distinct. Many important restrictions of this type, like checking
that identifiers are used in the appropriate context (e.g. not adding an integer to a function name), or
that subroutine calls have the appropriate number and type of arguments, can be enforced by defining
them as rules in a logic called a type system. Other forms of static analyses like data flow analysis may
also be part of static semantics. Newer programming languages like Java and C# have definite
assignment analysis, a form of data flow analysis, as part of their static semantics.
Dynamic semantics
Once data has been specified, the machine must be instructed to perform operations on the data. For
example, the semantics may define the strategy by which expressions are evaluated to values, or the
manner in which control structures conditionally execute statements. The dynamic semantics (also
known as execution semantics) of a language defines how and when the various constructs of a
language should produce a program behavior. There are many ways of defining execution semantics.
Natural language is often used to specify the execution semantics of languages commonly used in
practice. A significant amount of academic research went into formal semantics of programming
languages, which allow execution semantics to be specified in a formal manner. Results from this field
of research have seen limited application to programming language design and implementation
outside academia.
The semantic analyzer uses the syntax tree and symbol table to check whether the given program is
semantically consistent with the language definition. It gathers type information and stores it in either
the syntax tree or the symbol table. This type information is subsequently used by the compiler during
intermediate-code generation.
Some of the semantic errors that the semantic analyzer is expected to recognize:
Type mismatch
Undeclared variable
Reserved identifier misuse
Multiple declaration of a variable in a scope
Accessing an out-of-scope variable
Actual and formal parameter mismatch
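Three of these checks can be sketched with nothing more than a symbol table. The statement encoding below (tuples of kind, name and argument) and the function name analyze are assumptions made for illustration; a real semantic analyzer works on the syntax tree and handles nested scopes.

```python
def analyze(statements):
    """Return a list of semantic errors found in a flat list of statements."""
    symbols = {}                      # symbol table: name -> declared type
    errors = []
    for kind, name, arg in statements:
        if kind == "declare":         # arg is the declared type
            if name in symbols:
                errors.append(f"multiple declaration of {name}")
            else:
                symbols[name] = arg
        elif kind == "assign":        # arg is a literal value
            if name not in symbols:
                errors.append(f"undeclared variable {name}")
            elif type(arg).__name__ != symbols[name]:
                errors.append(f"type mismatch assigning to {name}")
    return errors


program = [
    ("declare", "s", "int"),
    ("assign", "s", "Seven"),   # str assigned to an int variable
    ("assign", "t", 3),         # t was never declared
    ("declare", "s", "int"),    # s declared twice in the same scope
]
print(analyze(program))
```

Every statement in this program is syntactically well formed; the errors are found only by consulting the symbol table.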
1 Type Checking
Ensures that data types are used in a way consistent with their definition.
2 Label Checking
Checks that control structures are used properly (for example, no break statement
outside a loop).
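The break example can be sketched as a walk over a nested statement list. The encoding (a loop is the tuple ("loop", body), any other statement is a one-element tuple) and the name check_breaks are hypothetical.

```python
def check_breaks(statements, inside_loop=False):
    """Return True if every break statement is nested inside a loop."""
    for stmt in statements:
        kind = stmt[0]
        if kind == "break" and not inside_loop:
            return False
        if kind == "loop" and not check_breaks(stmt[1], inside_loop=True):
            return False
    return True


print(check_breaks([("loop", [("print",), ("break",)])]))   # True
print(check_breaks([("print",), ("break",)]))               # False
```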
3.2.6 Fundamental Semantic Issues of Variables, Nature of Names and Special Words in
Programming Languages
A variable in a program describes how data is represented, which can range from a very simple value
to a complex one. The value a variable contains can change depending on conditions. In many languages,
when creating a variable we also need to declare the data type it contains, because the program will use
different types of data in different ways. Programming languages define data types differently. Data
can be a very simple value, like a person's age, or something very complex, like a record of a
student's performance over a whole year. A variable is a symbolic name given to some known or unknown
quantity or information, for the purpose of allowing the name to be used independently of the
information it represents. Compilers have to replace variables' symbolic names with the actual
locations of the data. While the variable name, type, and location generally remain fixed, the data
stored in the location may be altered during program execution.
For example, almost all languages differentiate between 'integers' (or whole numbers, e.g. 12),
'non-integers' (numbers with decimals, e.g. 0.24), and 'characters' (letters of the alphabet or words). In
programming languages, we can distinguish between different type levels which, from the user's point
of view, form a hierarchy of complexity, i.e. each level allows new data types or operations of greater
complexity.
Abstract level: Programmer-defined abstract data types are sets of data objects with
declared operations on these data objects. The implementation or internal representation of
abstract data types is hidden from the users of these types to avoid uncontrolled manipulation
of the data objects (i.e. the concept of encapsulation).
2 Naming conventions
Unlike their mathematical counterparts, programming variables and constants commonly take
multiple-character names, e.g. COST or total. Single-character names are most commonly used only
for auxiliary variables; for instance, i, j, k for array index variables. Some naming conventions are
enforced at the language level as part of the language syntax and involve the format of valid identifiers.
In almost all languages, variable names cannot start with a digit (0-9) and cannot contain whitespace
characters. Whether, which, and when punctuation marks are permitted in variable names varies from
language to language; many languages only permit the underscore (_) in variable names and forbid all
other punctuation. In some programming languages, specific (often punctuation) characters (known
as sigils) are prefixed or appended to variable identifiers to indicate the variable's type. Case-sensitivity
of variable names also varies between languages and some languages require the use of a certain case
in naming certain entities; most modern languages are case-sensitive; some older languages are not.
Some languages reserve certain forms of variable names for their own internal use; in many languages,
names beginning with two underscores ("__") often fall under this category.
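Python's own identifier rules illustrate several of these conventions. The sketch below uses the standard str.isidentifier method and the keyword module; the helper name is_valid_name and the sample names are illustrative.

```python
import keyword


def is_valid_name(name: str) -> bool:
    """True if name is a legal, non-reserved Python identifier."""
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["total", "COST", "_count", "2nd", "my var", "class"]:
    print(candidate, is_valid_name(candidate))
# total True
# COST True
# _count True
# 2nd False      (starts with a digit)
# my var False   (contains whitespace)
# class False    (reserved word)
```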
3.2.7 Binding
Binding describes how a variable is created and used (or "bound") by and within the given program,
and, possibly, by other programs as well. There are two types of binding: dynamic and static.
3.2.8 Scope
The scope of a variable describes where in a program's text, the variable may be used, while the extent
(or lifetime) describes when in a program's execution a variable has a (meaningful) value. Scope is a
lexical aspect of a variable. Most languages define a specific scope for each variable (as well as any
other named entity), which may differ within a given program. The scope of a variable is the portion of
the program code for which the variable's name has meaning and for which the variable is said to be
"visible". Scope is also of two types: static and dynamic.
3.2.9 Referencing
The referencing environment is the collection of variables that can be used. In a statically scoped
language, one can only reference the variables in the static referencing environment. A function in a
statically scoped language does have dynamic ancestors (i.e. its callers), but it cannot reference any
variables declared in those ancestors.
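Python is statically scoped, so it can be used to illustrate this point. In the sketch below, callee is called by caller but cannot see caller's local variable; the names callee, caller and secret are hypothetical.

```python
def callee():
    try:
        return secret          # not in callee's static referencing environment
    except NameError:
        return "not visible"


def caller():
    secret = 42                # local to caller, the dynamic ancestor of callee
    return callee()


print(caller())                # not visible
```

Under dynamic scoping the lookup of secret would instead search the callers and find the value 42.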
Finally
Difference Between Syntax and Semantics
1. Syntax refers to the structure of a program written in a programming language. Semantics, on the
other hand, describes the relationship between the meaning of the program and the computational model.
2. Syntactic errors are detected at compile time. By contrast, semantic errors are harder to find
and are often encountered only at runtime.
3. For example, in C++ a variable “s” declared as “int s;” must be initialized with an integer
value. Suppose that instead of an integer we initialize it with “Seven”. This declaration and
initialization is syntactically correct but semantically incorrect, because “Seven” is not an integer.
4. A syntactic representation must have some distinct meaning, while a semantic
component is always associated with a syntactic representation.
ASSIGNMENT 3
Briefly explain the following