
CSC 4405 Survey of Programming Languages Lecture 3

Uploaded by

Nuhu Adamu

YOBE STATE UNIVERSITY, DAMATURU

FACULTY OF SCIENCE
DEPARTMENT OF COMPUTER SCIENCE

PART THREE
Language Description
CSC4405:
SURVEY OF PLs

3.0 Introduction
The task of providing a concise yet understandable description of a programming language is difficult
but essential to the language's success. Some languages have suffered from having many
slightly different dialects, a result of a simple but informal and imprecise definition. One of the
problems in describing a language is the diversity of the people who must understand the description.
Among these are initial evaluators, implementers, and users. Most new programming languages are
subjected to a period of scrutiny by potential users, often people within the organization that employs
the language's designer, before their designs are completed. These are the initial evaluators.
Programming language implementers must be able to determine how the expressions,
statements, and program units of a language are formed, and also their intended effect when
executed. The difficulty of the implementers' job is, in part, determined by the completeness and
precision of the language description.
Language users must be able to determine how to encode software solutions by referring to a language
reference manual. Textbooks and courses enter into this process, but language manuals are usually
the only authoritative printed information source about a language. The study of programming
languages, like the study of natural languages, can be divided into examinations of syntax and
semantics. The syntax of a programming language is the form of its expressions, statements, and
program units. Its semantics is the meaning of those expressions, statements, and program units.

3.1 FUNDAMENTAL SYNTACTIC ANALYSIS CONCEPTS UNDERLYING MODERN PROGRAMMING LANGUAGES
A language, whether natural (such as English, Arabic or Hausa) or artificial (such as C, C++ or Java), is a
set of strings of characters from some alphabet. The strings of a language are called sentences or
statements. The syntax rules of a language specify which strings of characters from the language's
alphabet are in the language. English, for example, has a large and complex collection of rules for
specifying the syntax of its sentences. By comparison, even the largest and most complex programming
languages are syntactically very simple.
Syntax is the set of rules that define which combinations of symbols form valid programs; what those
combinations mean is the concern of semantics. Syntax tells the computer how to read the code: it fixes
a very specific set of words and a very specific order for those words when we give the computer
instructions. This strict structure is what enables us to communicate effectively with a computer. Syntax
is to code what grammar is to English or any other natural language. A big difference, though, is that
computers are far more exacting about how we structure that grammar. This syntax is why we call
programming coding. Each programming language uses different words in a different structure to
give the computer its instructions. Syntax analysis is a task
performed by a compiler, which examines whether the program has a proper derivation tree (parse tree)
or not. The syntax of a programming language can be described at the following levels:

• Lexical syntax defines the rules for the basic symbols: identifiers, literals, punctuators
and operators.
• Concrete syntax specifies the actual representation of programs in terms of lexical
symbols drawn from the language's alphabet.
• Abstract syntax conveys only the essential structural information of a program.
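
As a rough illustration of lexical syntax, the Python sketch below splits a statement into its basic symbols. The token categories and the patterns in TOKEN_SPEC are assumptions chosen for this example, not the lexical rules of any particular language.

```python
import re

# Hypothetical token categories for a toy language (assumed for illustration).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),   # literals such as 12 or 0.24
    ("IDENT", r"[A-Za-z_]\w*"),     # identifiers such as total or A
    ("OP", r"[+\-*/=]"),            # operators
    ("PUNCT", r"[();]"),            # punctuators
    ("SKIP", r"\s+"),               # whitespace carries no meaning here
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; raise ValueError on an illegal character."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"illegal character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("A = B * (A + C)"))
```

The concrete syntax would then say how these tokens may be arranged, while an abstract syntax would keep only the operators and operands and drop the parentheses.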
The syntax of a programming language describes the structure of programs without considering
their meaning. It is concerned with the form and layout of a program, and
consists of a collection of rules that determine which sequences of symbols and instructions are valid in a
program. In general, languages can be formally defined in two distinct ways: by recognition and by
generation.
3.1.1 Language Recognizer
The syntax analysis part of a compiler is a recognizer for the language the compiler translates. In this
role, the recognizer need not test all possible strings of characters from some set to determine whether
each is in the language. Rather, it need only determine whether given programs are in the language.
In effect then, the syntax analyzer determines whether the given programs are syntactically correct.
A language recognizer is like a filter, separating legal sentences from those that are incorrectly formed.
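
A minimal sketch of a recognizer, in Python, may make the idea concrete. It assumes a toy language of simple assignment statements (an identifier, an equals sign, then operands separated by arithmetic operators); the names OPERAND, STATEMENT and recognizes are hypothetical and chosen only for this example.

```python
import re

# Assumed toy language: <identifier> = <operand> { <operator> <operand> }
OPERAND = r"(?:[A-Za-z_]\w*|\d+)"
STATEMENT = re.compile(
    rf"\s*[A-Za-z_]\w*\s*=\s*{OPERAND}(?:\s*[+\-*/]\s*{OPERAND})*\s*"
)


def recognizes(sentence):
    """Return True if the sentence belongs to the toy language, else False."""
    return STATEMENT.fullmatch(sentence) is not None


for candidate in ["total = a + 12", "total = + a", "x = y * z / 2"]:
    print(candidate, "->", recognizes(candidate))
```

Like the filter described above, the function only answers yes or no; it says nothing about why a sentence was rejected.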

3.1.2 Language Generator
A language generator is a device that can be used to generate the sentences of a language. Because the
particular sentence it produces on any given run is unpredictable, a generator
seems to be a device of limited usefulness as a language descriptor. However, people prefer certain
forms of generators over recognizers because they can more easily read and understand them. By
contrast, the syntax-checking portion of a compiler (a language recognizer) is not as useful a language
description for a programmer because it can be used only in trial-and-error mode. For example, to
determine the correct syntax of a particular statement using a compiler, the programmer can only
submit a speculated version and note whether the compiler accepts it.
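
The following Python sketch shows a generator in this sense: a grammar, written as a table of rewriting rules, from which sentences are derived at random. The grammar in GRAMMAR and the depth limit max_depth are assumptions made for the example, not something prescribed by the lecture.

```python
import random

# Assumed toy grammar: each nonterminal maps to a list of alternative rules.
GRAMMAR = {
    "<assign>": [["<id>", "=", "<expr>"]],
    "<expr>": [
        ["<id>", "+", "<expr>"],
        ["<id>", "*", "<expr>"],
        ["(", "<expr>", ")"],
        ["<id>"],
    ],
    "<id>": [["A"], ["B"], ["C"]],
}


def generate(symbol="<assign>", rng=None, depth=0, max_depth=4):
    """Derive one sentence (a list of terminal symbols) starting from symbol."""
    if rng is None:
        rng = random.Random()
    if symbol not in GRAMMAR:          # a terminal symbol stands for itself
        return [symbol]
    rules = GRAMMAR[symbol]
    if depth >= max_depth:             # force the shortest rules so derivation ends
        shortest = min(len(rule) for rule in rules)
        rules = [rule for rule in rules if len(rule) == shortest]
    sentence = []
    for part in rng.choice(rules):
        sentence.extend(generate(part, rng, depth + 1, max_depth))
    return sentence


for seed in range(3):
    print(" ".join(generate(rng=random.Random(seed))))
```

Reading the rule table is usually enough to see what a legal statement looks like, which is why generators in this form are preferred as language descriptions.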

3.1.3 Parsing
In linguistics, parsing is the process of analyzing a text, made
of a sequence of tokens (for example, words), to determine
its grammatical structure with respect to a given (more or
less) formal grammar. Parsing can also be used as a linguistic
term, especially in reference to how phrases are divided up in
garden path sentences. Fig. 3.1 shows an overview of the parsing
process.
3.1.3.1 Parse Trees
One of the most attractive features of grammars is that they
naturally describe the hierarchical syntactic structure of the
sentences of the languages they define. These hierarchical
structures are called parse trees. For example, the parse tree
in Figure 3.2 shows the structure of the assignment
statement A = B * (A + C). Every internal node of a parse tree is
labelled with a nonterminal symbol; every leaf is labelled with
a terminal symbol. Every subtree of a parse tree describes
one instance of an abstraction in the sentence.

Figure 3.1: Parsing process

Figure 3.2: Parse tree for the simple statement A = B * (A + C)

3.1.3.2 Parser

In computing, a parser is one of the components in an interpreter or compiler, which checks for
correct syntax and builds a data structure (often some kind of parse tree, abstract syntax tree or
other hierarchical structure) implicit in the input tokens. The parser often uses a separate lexical
analyzer to create tokens from the sequence of input characters. Parsers may be programmed by
hand or may be (semi-)automatically generated (in some programming languages) by a tool.
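
To make this concrete, here is a small hand-written (recursive-descent) parser in Python. It is only a sketch: the grammar in the comments is an assumed, unambiguous grammar for statements such as A = B * (A + C), and the class and function names are hypothetical. The parse tree is built from nested tuples whose first element is the nonterminal label.

```python
import re

# Assumed grammar:
#   <assign> -> <id> = <expr>
#   <expr>   -> <term> { + <term> }
#   <term>   -> <factor> { * <factor> }
#   <factor> -> ( <expr> ) | <id>


class Parser:
    def __init__(self, text):
        # A very small lexical analyzer: identifiers and single-character symbols.
        self.tokens = re.findall(r"[A-Za-z_]\w*|[=+*()]", text)
        if "".join(self.tokens) != re.sub(r"\s+", "", text):
            raise SyntaxError("illegal character in input")
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, found {token!r}")
        self.pos += 1
        return token

    def identifier(self):
        token = self.take()
        if not (token[0].isalpha() or token[0] == "_"):
            raise SyntaxError(f"expected an identifier, found {token!r}")
        return ("<id>", token)

    def assign(self):
        target = self.identifier()
        self.take("=")
        tree = ("<assign>", target, "=", self.expr())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = ("<expr>", self.term())
        while self.peek() == "+":
            self.take("+")
            node = ("<expr>", node, "+", self.term())
        return node

    def term(self):
        node = ("<term>", self.factor())
        while self.peek() == "*":
            self.take("*")
            node = ("<term>", node, "*", self.factor())
        return node

    def factor(self):
        if self.peek() == "(":
            self.take("(")
            inner = self.expr()
            self.take(")")
            return ("<factor>", "(", inner, ")")
        return ("<factor>", self.identifier())


def parse(text):
    """Return the parse tree of an assignment statement, or raise SyntaxError."""
    return Parser(text).assign()


def leaves(tree):
    """Return the terminal symbols of a parse tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    result = []
    for child in tree[1:]:          # tree[0] is the nonterminal label
        result.extend(leaves(child))
    return result


tree = parse("A = B * (A + C)")
print(tree)
print(leaves(tree))
```

Reading the leaves from left to right gives back the original statement, while the nesting of the tuples records its hierarchical structure.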

3.1.4 Syntactic Ambiguity


Syntactic ambiguity is a property of sentences which may be reasonably interpreted in more than one
way, or reasonably interpreted to mean more than one thing. Ambiguity may or may not involve one
word having two parts of speech or homonyms. Syntactic ambiguity arises not from the range of
meanings of single words, but from the relationship between the words and clauses of a sentence, and
the sentence structure implied thereby. When a reader can reasonably interpret the same sentence as
having more than one possible structure, the text is equivocal and meets the definition of syntactic
ambiguity. In programming languages, a grammar is ambiguous if it generates some sentence that has
two or more distinct parse trees.
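
The effect of ambiguity on a program can be sketched in Python. The example assumes a deliberately ambiguous expression grammar, E -> E + E | E * E | number, and enumerates every parse tree it allows for 2 + 3 * 4; the function names are hypothetical.

```python
def all_trees(tokens):
    """Return every parse tree for tokens under E -> E op E | number."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    # Any operator (odd position) may be taken as the root of the tree.
    for i in range(1, len(tokens), 2):
        for left in all_trees(tokens[:i]):
            for right in all_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees


def evaluate(tree):
    """Compute the value that a given parse tree implies."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b


trees = all_trees([2, "+", 3, "*", 4])
for tree in trees:
    print(tree, "=>", evaluate(tree))
```

The same sentence has two structures, and the two structures compute different values, which is why language designers remove such ambiguity, for example with precedence rules.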

3.1.5 Operator Precedence


When several operations occur in an expression, each part is evaluated and resolved in a
predetermined order called operator precedence. The rules described here are those of the Visual Basic
family of languages (for example, VBScript); other languages follow the same general idea but differ in
detail. Parentheses can be used to override the order of
precedence and force some parts of an expression to be evaluated before other parts. Operations
within parentheses are always performed before those outside. Within parentheses, however, normal
operator precedence is maintained. When expressions contain operators from more than one
category, arithmetic operators are evaluated first, comparison operators are evaluated next, and
logical operators are evaluated last. Comparison operators all have equal precedence; that is, they are
evaluated in the left-to-right order in which they appear. Arithmetic and logical operators are
evaluated in the following order of precedence:
Table 1: Operators (each column lists its operators in order of precedence, highest first)

Arithmetic                            Comparison                           Logical
------------------------------------  -----------------------------------  -------
Exponentiation (^)                    Equality (=)                         Not
Negation (-)                          Inequality (<>)                      And
Multiplication and division (*, /)    Less than (<)                        Or
Integer division (\)                  Greater than (>)                     Xor
Modulus arithmetic (Mod)              Less than or equal to (<=)           Eqv
Addition and subtraction (+, -)       Greater than or equal to (>=)        Imp
String concatenation (&)              Is
When multiplication and division occur together in an expression, each operation is evaluated as it
occurs from left to right. Likewise, when addition and subtraction occur together in an expression, each
operation is evaluated in order of appearance from left to right. The string concatenation operator (&)
is not an arithmetic operator, but in precedence it does fall after all arithmetic operators and before
all comparison operators. The Is operator is an object reference comparison operator. It does not
compare objects or their values; it checks only to determine if two object references refer to the same
object.
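
The same ideas can be tried out in Python. Note the assumption: the sketch below relies on Python's own precedence rules, which follow the same general pattern (arithmetic, then comparison, then logical operators) but use different operator symbols, such as ** for exponentiation.

```python
# Multiplication binds more tightly than addition.
a = 2 + 3 * 4

# Parentheses override the default order.
b = (2 + 3) * 4

# Operators of equal precedence are evaluated from left to right.
c = 8 / 4 * 2
d = 10 - 4 + 3

# Exponentiation is applied before negation.
e = -2 ** 2

# Arithmetic first, then comparison, then logical operators.
f = 1 + 2 > 2 and not 3 * 2 < 5

print(a, b, c, d, e, f)
```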

3.2 FUNDAMENTAL SEMANTIC ANALYSIS CONCEPTS UNDERLYING MODERN PROGRAMMING LANGUAGES

Parsing only verifies that the program consists of tokens arranged in a syntactically valid combination.
Now we'll move forward to semantic analysis, where we delve even deeper to check whether the tokens
form a sensible set of instructions in the programming language. Whereas any old noun phrase
followed by some verb phrase makes a syntactically correct English sentence, a semantically correct
one has subject-verb agreement, proper use of gender, and components that go together to express an
idea that makes sense. For a program to be semantically valid, all variables, functions, classes, etc.
must be properly defined, expressions and variables must be used in ways that respect the type
system, access control must be respected, and so forth. Semantic analysis is the front end's
penultimate phase and the compiler's last chance to weed out incorrect programs. We need to ensure
the program is sound enough to carry on to code generation.

3.2.1 Semantic Analysis


The semantics of a programming language describes the relationship between the syntax
and the model of computation. It emphasizes the interpretation of a program, so that the
programmer can understand it easily and predict the outcome of program execution. An
approach known as syntax-directed semantics is used to map syntactic constructs to the
computational model with the help of a function.
Semantic analysis checks that the declarations and statements of a program are semantically
correct. There are three main styles of formal semantics.

i. Operational

Operational semantics defines the meaning of a program in terms of the computation steps that an
idealized execution would perform. Some definitions use structural operational semantics, in which
intermediate states are described in terms of the language itself; others use abstract machines built
from more ad hoc mathematical constructions. An operational semantics of a programming language is
usually understood as a set of rules for how its expressions, statements, programs, etc., are evaluated or
executed. These rules tell how a possible implementation of the language should
work, and it is often not difficult to write an interpreter for the language in any
programming language simply by following and translating its operational semantics.

ii. Denotational

Denotational semantics defines the meaning of a program as an element of some abstract mathematical
structure, e.g. by regarding each construct of the programming language as denoting a mathematical
function.

iii. Axiomatic or logical

Axiomatic semantics defines the meaning of a program indirectly, by providing logical axioms that
describe the properties of the program. Compare with specification and verification.

3.2.2 Types of Semantic Analysis


Semantic analysis distinguishes two kinds of semantics: static and dynamic.

Static semantics

The static semantics defines restrictions on the structure of valid texts that are hard or impossible to
express in standard syntactic formalisms. For compiled languages, static semantics essentially include
those semantic rules that can be checked at compile time. Examples include checking that every
identifier is declared before it is used (in languages that require such declarations) or that the labels
on the arms of a case statement are distinct. Many important restrictions of this type, like checking
that identifiers are used in the appropriate context (e.g. not adding an integer to a function name), or
that subroutine calls have the appropriate number and type of arguments, can be enforced by defining
them as rules in a logic called a type system. Other forms of static analyses like data flow analysis may
also be part of static semantics. Newer programming languages like Java and C# have definite
assignment analysis, a form of data flow analysis, as part of their static semantics.

Dynamic semantics

Once data has been specified, the machine must be instructed to perform operations on the data. For
example, the semantics may define the strategy by which expressions are evaluated to values, or the
manner in which control structures conditionally execute statements. The dynamic semantics (also
known as execution semantics) of a language defines how and when the various constructs of a
language should produce a program behavior. There are many ways of defining execution semantics.
Natural language is often used to specify the execution semantics of languages commonly used in
practice. A significant amount of academic research went into formal semantics of programming
languages, which allow execution semantics to be specified in a formal manner. Results from this field
of research have seen limited application to programming language design and implementation
outside academia.

3.2.3 Semantic Analyzer

The semantic analyzer uses the syntax tree and the symbol table to check whether the given program is
semantically consistent with the language definition. It gathers type information and stores it in either
the syntax tree or the symbol table. This type information is subsequently used by the compiler during
intermediate-code generation.

3.2.4 Semantic Errors

Some of the semantic errors that the semantic analyzer is expected to recognize:

• Type mismatch.
• Undeclared variable.
• Reserved identifier misuse.
• Multiple declaration of a variable in a scope.
• Accessing an out-of-scope variable.
• Actual and formal parameter mismatch.
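
A toy semantic analyzer, sketched in Python, shows how a symbol table helps to detect several of these errors. Everything here is an assumption made for illustration: the statement format (tuples such as ("declare", "int", "s")), the RESERVED set and the analyze function are hypothetical, and only a single scope is modelled.

```python
RESERVED = {"if", "while", "for", "int", "return"}   # assumed reserved words


def analyze(statements):
    """Check a list of toy statements and return a list of error messages."""
    symbols = {}    # symbol table: variable name -> declared type name
    errors = []
    for line, stmt in enumerate(statements, start=1):
        if stmt[0] == "declare":
            _, type_name, name = stmt
            if name in RESERVED:
                errors.append(f"line {line}: reserved word {name!r} used as identifier")
            elif name in symbols:
                errors.append(f"line {line}: multiple declaration of {name!r}")
            else:
                symbols[name] = type_name
        elif stmt[0] == "assign":
            _, name, value = stmt
            actual = type(value).__name__
            if name not in symbols:
                errors.append(f"line {line}: undeclared variable {name!r}")
            elif actual != symbols[name]:
                errors.append(
                    f"line {line}: type mismatch, {name!r} is declared "
                    f"{symbols[name]} but assigned a {actual}"
                )
    return errors


program = [
    ("declare", "int", "s"),
    ("assign", "s", "Seven"),     # type mismatch
    ("assign", "t", 3),           # undeclared variable
    ("declare", "int", "s"),      # multiple declaration
    ("declare", "int", "while"),  # reserved identifier misuse
    ("assign", "s", 7),           # fine
]
for message in analyze(program):
    print(message)
```

Every statement in the list is syntactically well formed; the errors are found only by consulting the information collected in the symbol table.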

3.2.5 Functions of Semantic Analysis

1 Type Checking

Ensures that data types are used in a way consistent with their definition.

2 Label Checking

Ensures that every label referenced in the program is actually defined.

3 Flow Control Check

Checks that control structures are used in a proper manner (for example, no break statement
outside a loop).
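
The flow control check can be observed in Python itself. In the sketch below, the built-in compile function is asked to translate two small programs without running them; CPython happens to report a misplaced break as a SyntaxError, although conceptually it is the kind of check described here. The helper name compiles is hypothetical.

```python
def compiles(source):
    """Return (True, None) if Python accepts the source, else (False, message)."""
    try:
        compile(source, "<example>", "exec")
        return True, None
    except SyntaxError as error:
        return False, error.msg


inside_loop = "while True:\n    break\n"
outside_loop = "break\n"

print(compiles(inside_loop))
print(compiles(outside_loop))
```

The second program is rejected before any statement is executed, which is what distinguishes a compile-time check from a run-time failure.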
3.2.6 Fundamental Semantic Issues of Variables, Nature of Names and Special Words in Programming Languages

Attributes of variables, including type, address and value, are discussed below.


1 Variables

A variable is a symbolic name given to some known or unknown quantity or information, for the
purpose of allowing the name to be used independently of the information it represents. The data a
variable holds can range from a very simple value, such as a person's age, to something complex,
such as a record of a student's performance over a whole year, and the value can change depending on
conditions during execution. When creating a variable, we usually also need to declare the data type it
contains, because the program uses different types of data in different ways. Programming languages
define data types differently. Compilers have to replace variables' symbolic names with the actual
locations of the data. While the variable name, type, and location generally remain fixed, the data
stored in the location may be altered during program execution.

For example, almost all languages differentiate between 'integers' (or whole numbers, e.g. 12),
'non-integers' (numbers with decimals, e.g. 0.24), and 'characters' (letters of the alphabet or words). In
programming languages, we can distinguish between different type levels which, from the user's point
of view, form a hierarchy of complexity, i.e. each level allows new data types or operations of greater
complexity.

• Elementary level: Elementary (sometimes also called basic or simple) types, such as integers,
reals, booleans, and characters, are supported by nearly every programming language. Data
objects of these types can be manipulated by well-known operators, like +, -, *, or /, on the
programming level. It is the task of the compiler to translate the operators onto the correct
machine instructions, e.g. fixed-point and floating-point operations.

• Structured level: Most high-level programming languages allow the definition of structured
types, which are based on simple types. We distinguish between static and dynamic structures.
Static structures are arrays, records, and sets, while dynamic structures are a bit more
complicated, since they are recursively defined and may vary in size and shape during the
execution of a program. Lists and trees are dynamic structures.

• Abstract level: Programmer-defined abstract data types are a set of data objects with
declared operations on these data objects. The implementation or internal representation of
abstract data types is hidden from the users of these types to avoid uncontrolled manipulation
of the data objects (i.e. the concept of encapsulation).
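
The abstract level can be illustrated with a small stack type in Python. This is a sketch under stated assumptions: the class Stack and its operations are hypothetical, and Python hides the internal list only by convention (the leading underscore), whereas some languages enforce the hiding.

```python
class Stack:
    """A stack abstract data type: users see only the declared operations."""

    def __init__(self):
        self._items = []    # internal representation, hidden by convention

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self):
        return not self._items


stack = Stack()
stack.push(12)       # an elementary value
stack.push([1, 2])   # a structured value
print(stack.pop(), stack.peek(), stack.is_empty())
```

Because callers use only push, pop, peek and is_empty, the list could later be replaced by another representation without changing any code that uses the type.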

2 Naming conventions
Unlike their mathematical counterparts, programming variables and constants commonly take
multiple-character names, e.g. COST or total. Single-character names are most commonly used only
for auxiliary variables; for instance, i, j, k for array index variables. Some naming conventions are
enforced at the language level as part of the language syntax and involve the format of valid identifiers.
In almost all languages, variable names cannot start with a digit (0-9) and cannot contain whitespace
characters. Whether, which, and when punctuation marks are permitted in variable names varies from
language to language; many languages only permit the underscore (_) in variable names and forbid all
other punctuation. In some programming languages, specific (often punctuation) characters (known
as sigils) are prefixed or appended to variable identifiers to indicate the variable's type. Case-sensitivity
of variable names also varies between languages and some languages require the use of a certain case
in naming certain entities; most modern languages are case-sensitive; some older languages are not.
Some languages reserve certain forms of variable names for their own internal use; in many languages,
names beginning with two underscores ("__") often fall under this category.

3.2.7 Binding
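A binding is an association between an entity and one of its attributes, for example between a
variable and its type or value, or between a call and the code it executes. Binding describes how a
variable is created and used (or "bound") by and within the given program and, possibly, by other
programs as well. There are two types of binding: dynamic and static.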

3.2.7.1 Dynamic Binding


Dynamic binding (also known as dynamic dispatch) is the process of mapping a message to a specific
sequence of code (a method) at run time. This is done to support the cases where the appropriate
method cannot be determined at compile time. A binding is dynamic if it first occurs during execution
or can change during execution of the program.

3.2.7.2 Static Binding


A binding is static if it first occurs before run time and remains unchanged throughout program
execution.
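
Dynamic binding can be sketched in Python, where the method that a call executes is chosen at run time from the object's class. The classes Shape, Circle and Square are hypothetical examples. Python is used here only as an illustration: it binds almost everything dynamically, whereas a statically typed language would fix many of these bindings (for instance a variable's type) before run time.

```python
import math


class Shape:
    def area(self):
        raise NotImplementedError


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


def total_area(shapes):
    # Which area() runs is decided at run time, separately for each object.
    return sum(shape.area() for shape in shapes)


print(total_area([Square(2), Square(3), Circle(1)]))

# Type binding is also dynamic in Python: the same name may be rebound.
value = 7
value = "Seven"
```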

3.2.8 Scope
The scope of a variable describes where in a program's text, the variable may be used, while the extent
(or lifetime) describes when in a program's execution a variable has a (meaningful) value. Scope is a
lexical aspect of a variable. Most languages define a specific scope for each variable (as well as any
other named entity), which may differ within a given program. The scope of a variable is the portion of
the program code for which the variable's name has meaning and for which the variable is said to be
"visible". There are two kinds of scope: static and dynamic.

3.2.8.1 Static Scope


The static scope of a variable is the most immediately enclosing block, excluding any enclosed blocks
where the variable has been re-declared. The static scope of a variable in a program can be determined
by simply studying the text of the program. Static scope is not affected by the order in which
procedures are called during the execution of the program.
3.2.8.2 Dynamic Scope
The dynamic scope of a variable extends to all the procedures called thereafter during program
execution, until the first procedure to be called that re-declares the variable.
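
The difference between the two rules can be sketched in Python. Python itself is statically scoped, so the first half shows real behaviour; the second half only simulates dynamic scoping with an explicit stack of environments. The names dynamic_env, lookup, show_dynamic and caller_dynamic are hypothetical.

```python
x = "global"


def show():
    return x                     # static scoping: the x visible in the program text


def caller():
    x = "local to caller"        # a different variable, invisible to show()
    return show()


static_result = caller()

# Simulated dynamic scoping: names are looked up through the chain of callers.
dynamic_env = [{"x": "global"}]  # stack of environments, most recent call last


def lookup(name):
    for frame in reversed(dynamic_env):
        if name in frame:
            return frame[name]
    raise NameError(name)


def show_dynamic():
    return lookup("x")


def caller_dynamic():
    dynamic_env.append({"x": "local to caller"})   # re-declares x for its callees
    try:
        return show_dynamic()
    finally:
        dynamic_env.pop()


dynamic_result = caller_dynamic()
print(static_result, "|", dynamic_result)
```

Under static scoping the answer can be read off the program text; under the simulated dynamic scoping it depends on which procedure happened to call show_dynamic.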

3.2.9 Referencing
The referencing environment of a statement is the collection of all variables that are visible, and can
therefore be used, at that point in the program. In a statically scoped language, one can only reference
the variables in the static referencing environment. A function in a statically scoped language does have
dynamic ancestors (i.e. its callers), but it cannot reference variables that are declared only in those
ancestors.

Finally
Difference Between Syntax and Semantics
1. Syntax refers to the structure of a program written in a programming language. On the other hand,
semantics describes the relationship between the sense of the program and the computational model.

2. Syntactic errors are detected at compile time. Semantic errors are harder to find: static semantic
errors (such as type mismatches) are also detected at compile time, but others are encountered only at
run time.

3. For example, in C++ a variable “s” is declared as “int s;”, so it must be initialized with an integer
value. Suppose that, instead of an integer, we initialize it with “Seven”. This declaration and
initialization is syntactically correct but semantically incorrect, because “Seven” is not an integer value.

4. Every syntactically valid construct must be given a distinct meaning, and every semantic
component is associated with a syntactic representation.
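
A Python analogue of point 3 is sketched below, as an assumption-laden illustration rather than a translation of the C++ case: Python has no declarations of the form “int s;”, so the example converts the text explicitly, and the error appears at run time rather than at compile time.

```python
import ast

source = 's = int("Seven")'

# Syntax: the statement parses without error, so it is syntactically valid.
tree = ast.parse(source)

# Semantics: executing it fails, because "Seven" does not denote an integer.
try:
    exec(compile(tree, "<example>", "exec"))
    outcome = "ok"
except ValueError as error:
    outcome = f"semantic error at run time: {error}"

print(outcome)
```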

ASSIGNMENT 3
Briefly explain the following

1. Define language recognizer, language generator and parsing.
2. What are the advantages and disadvantages of dynamic scoping?
3. Define static binding and dynamic binding
4. In what ways are reserved words better than keywords?
5. Define lifetime, scope, static scope, and dynamic scope.
