
Part 1 - PPL

This document provides an overview of the origins and development of programming languages. It discusses how early computer programmers had to manually reconfigure computer hardware to run programs. The development of machine language allowed programmers to store and run programs by entering binary codes. Assembler languages were then created to make machine language more user-friendly by using symbolic codes and instructions. FORTRAN was one of the earliest high-level languages and helped popularize algebraic notation for scientists. ALGOL further improved abstraction and defined structures like loops and conditionals that became standard in most later languages.


Part 1

Principles of Programming Languages

Topics / Content

Overview
The Origins of Programming Languages
Abstractions in Programming Languages
Computational Paradigms
Language Definition
Language Translation

Reference:
Kenneth C. Louden and Kenneth A. Lambert, Programming
Languages: Principles and Practice, 3rd Edition, Cengage Learning.
Pages 3 to 19
And Supplementary Instructional Manual

The Origins of Programming Languages


A vocabulary and set of grammatical rules for instructing
a computer to perform specific tasks.
The term programming language usually refers to high-level languages, such as BASIC, C, C++, COBOL, FORTRAN, Ada, and Pascal.
Each language has a unique set of keywords (words that
it understands) and a special syntax for organizing
program instructions.
High-level programming languages, while simple
compared to human languages, are more complex than
the languages the computer actually understands, called
machine languages.
Each different type of CPU has its own unique machine
language.

Origins of Programming

The Origins of Programming Languages


Before the middle of the 1940s, computer operators
hardwired their programs.
That is, they set switches to adjust the internal wiring of
a computer to perform the requested tasks.
This effectively communicated to the computer what
computations were desired, but programming, if it could
be called that, consisted of the expensive and error-prone
activity of taking down the hardware to restructure it.

Machine Language and the First Stored Programs

A major advance in computer design occurred in the late
1940s, when John von Neumann had the idea that a
computer should be permanently hardwired with a small
set of general-purpose operations.
The operator could then input into the computer a series
of binary codes that would organize the basic hardware
operations to solve more-specific problems.

Machine Language and the First Stored Programs


Instead of turning off the computer to reconfigure its
circuits, the operator could flip switches to enter these
codes, expressed in machine language, into computer
memory.
At this point, computer operators became the first true
programmers, who developed software, the machine
code, to solve problems with computers.

Machine Language and the First Stored Programs


To decode or interpret an instruction, the programmer
(and the hardware) must recognize the first 4 bits of the
line of code as an opcode, which indicates the type of
operation to be performed.
Instruction Format
The general format for a machine language instruction is:
opcode, followed by operands. The operands can be a
memory address, a register, or a value.

Machine Language and the First Stored Programs

Machine Language and the First Stored Programs

Example


1. Each line of code contains 16 bits, or binary digits.
2. A line of 16 bits represents either a single machine language
instruction or a single data value.
3. The last three lines of code happen to represent data values
(the integers 5, 6, and 0) using 16-bit two's complement
notation.
4. The first five lines of code represent program instructions.
Program execution begins with the first line of code, which
is fetched from memory, decoded (interpreted), and
executed.
5. Control then moves to the next line of code, and the process
is repeated, until a special halt instruction is reached.

Machine Language and the First Stored Programs

Machine language programming is not for the meek.
Despite the improvement on the earlier method of
reconfiguring the hardware, programmers were still faced
with the tedious and error-prone tasks of manually
translating their designs for solutions to binary machine
code and loading this code into computer memory.

Assembler and Machine


Assembler language is the easy way to write
machine language.
Each line of an assembler program generates one
machine language instruction.
The assembler allows you to use variable names
instead of numerical addresses and instruction
mnemonics instead of numerical operation codes.
A program called an assembler translates the
symbolic assembly language code to binary
machine code. For example, let's say that the first
instruction in the program is the following.

Assembly Language

Example

LD R1, FIRST
in assembly language. The mnemonic symbol LD (short
for load) translates to the binary opcode 0010 seen in
line 1 of Figure 1.2. The symbols R1 and FIRST translate
to the register number 001 and the data address offset
000000100, respectively. After translation, another
program, called a loader, automatically loads the
machine code for this instruction into computer
memory.

Assembler and Machine


Programmers also used a pair of new input devices: a
keypunch machine to type their assembly language
codes and a card reader to read the resulting punched
cards into memory for the assembler.
These two devices were the forerunners of today's
software text editors. These new hardware and software
tools made it much easier for programmers to develop
and modify their programs.

Assembler and Machine


The assembler and loader would then update all of the
address references in the program, a task that machine
language programmers once had to perform manually.
Moreover, the assembler was able to catch some errors,
such as incorrect instruction formats and incorrect
address calculations, which could not be discovered until
run time in the pre-assembler era.

Assembler and Machine


A second major shortcoming of assembly language is
due to the fact that each particular type of computer
hardware architecture has its own machine language
instruction set, and thus requires its own dialect of
assembly language.
Therefore, any assembly language program has to be
rewritten to port it to different types of machines.
The first assembly languages appeared in the 1950s.
They are still used today, whenever very low-level system
tools must be written, or whenever code segments must
be optimized by hand for efficiency.

Fortran and Algebraic Notation


One of the precursors of these high-level
languages was FORTRAN, an acronym for
FORmula TRANslation language.
John Backus developed FORTRAN in the early
1950s for a particular type of IBM computer. In
some respects, early FORTRAN code was
similar to assembly language.
It reflected the architecture of a particular type
of machine and lacked the structured control
statements and data structures of later high-level languages.

Fortran and Algebraic Notation


However, FORTRAN did appeal to
scientists and engineers, who enjoyed its
support for algebraic notation and
floating-point numbers.
The language has undergone numerous
revisions in the last few decades, and
now supports many of the features that
are associated with other languages
descending from its original version.

Fortran and Algebraic Notation

The ALGOL Family


Soon after FORTRAN was introduced, programmers
realized that languages with still higher levels of abstraction
would improve their ability to write concise,
understandable instructions.
Moreover, they wished to write these high-level
instructions for different machine architectures with no
changes.
In the late 1950s, an international committee of computer
scientists (which included John Backus) agreed on a
definition of a new language whose purpose was to satisfy
both of these requirements.

The ALGOL Family


This language became ALGOL (an acronym for
ALGOrithmic Language). Its first incarnation, ALGOL-60,
was released in 1960.
ALGOL provided first of all a standard notation for
computer scientists to publish algorithms in journals. As
such, the language included notations for structured control
statements for sequencing (begin-end blocks), loops (the for
loop), and selection (the if and if-else statements).

The ALGOL Family


These types of statements have appeared in more or less
the same form in every high-level language since.
Likewise, elegant notations for expressing data of different
numeric types (integer and float) as well as the array data
structure were available. Finally, support for procedures,
including recursive procedures, was provided.

Abstractions in Programming
Languages

Abstractions in Programming
Languages
To abstract means simply to hide
something.

Two classes of abstraction mechanisms
are distinguished:
Control abstraction: provides the
programmer the ability to hide
procedural detail.
Data abstraction: allows the
definition and use of sophisticated data
types without referring to how such
types will be implemented.

Abstractions in Programming
Languages
Control abstraction:
Examples: loops, conditional statements,
and procedure calls.
Data abstraction:
Examples: numbers, character strings,
and search trees.

The process of picking out (abstracting)
common features of objects and
procedures.
A programmer would use abstraction,
for example, to note that two functions
perform almost the same task and can
be combined into a single function.

Abstraction is one of the most important
techniques in software engineering and
is closely related to two other important
techniques:
encapsulation
information hiding
All three techniques are used to reduce
complexity.

#include <iostream>
using namespace std;

class Add
{
private:
    int x, y, r;
public:
    int Addition(int x, int y)
    {
        r = x + y;
        return r;
    }
    void show()
    {
        cout << "The sum is::" << r << "\n";
    }
};

int main()
{
    Add s;
    s.Addition(10, 4);
    s.show();
    return 0;
}

The integer values x, y, r of the class Add
can be accessed only through the function
Addition. These integer values are
encapsulated inside the class Add.

abstract class Bank {
    abstract int getRateOfInterest();
}
class SBI extends Bank {
    int getRateOfInterest() { return 7; }
}
class PNB extends Bank {
    int getRateOfInterest() { return 8; }
}
class TestBank {
    public static void main(String args[]) {
        Bank b = new SBI();  // or: new PNB()
        int interest = b.getRateOfInterest();
        System.out.println("Rate of Interest is: " + interest + "%");
    }
}

Abstractions
Levels of Abstraction
Measures of the amount of information
contained (and hidden) in the abstraction:
Basic abstractions collect the most
localized machine information.
Structured abstractions collect
intermediate information about the
structure of a program.
Unit abstractions collect large-scale
information in a program.

Data: Basic Abstractions


Hides the internal representation of
common data values in a computer.
a = b + c;
Two values are fetched from two locations, added, and the
result is stored in a new location.
What happens beneath?
There are registers, instruction sets, program counters, storage
units, etc. involved. There are PUSH and POP operations
happening. The high-level language we use abstracts away
those complex details.

Data: Basic Abstractions


Variables: The use of symbolic names to
hide locations in computer memory that
contain data values.
Data Type: Values are usually given type
names that are variations of the
corresponding mathematical names:
int, double, float, etc.

Data: Basic Abstractions


C declaration:
int x;
(data type first, then variable)

var x : integer
(variable first, then data type)

Data: Structured Abstractions


Data structure
The principal method for collecting related data values into a single
unit.
Employee record
Contains:
Name
Surname
Given Name
Address
Permanent
Temporary
Phone number
Mobile
Landline

Data: Structured Abstractions


A group of items, all of which have the same data type and
which need to be kept together for purposes of sorting or
searching.
Array: collects data into a sequence of individually indexed
items. A variable can name a data structure in a declaration,
as in the C declaration:
int a[10];
which establishes the variable a as the name of an array of
ten integer values.

Data: Structured Abstractions

Data: Unit Abstractions


Often associated with the concept of an
abstract data type:
a set of data values and the operations on
those values.
Its main characteristic is the separation
of an interface (the set of operations
available to the user) from its
implementation (the internal
representation of data values and
operations).
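The interface/implementation separation above can be sketched in Java. This is an illustrative example, not from the text; the names IntStack, ArrayIntStack, and AdtExample are invented for this sketch:

```java
import java.util.ArrayList;
import java.util.List;

// The interface: the set of operations available to the user.
interface IntStack {
    void push(int value);
    int pop();
    boolean isEmpty();
}

// One possible implementation; users of IntStack never see
// the internal representation (a List) hidden inside.
class ArrayIntStack implements IntStack {
    private final List<Integer> items = new ArrayList<>();
    public void push(int value) { items.add(value); }
    public int pop() { return items.remove(items.size() - 1); }
    public boolean isEmpty() { return items.isEmpty(); }
}

public class AdtExample {
    public static void main(String[] args) {
        IntStack s = new ArrayIntStack();
        s.push(1);
        s.push(2);
        System.out.println(s.pop()); // prints 2
    }
}
```

Client code depends only on the IntStack operations, so the internal list could be replaced (say, by an array) without changing any caller.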

Data: Unit Abstractions


Examples
The package mechanism of Lisp, Ada, and Java.
In Java:
java.lang - bundles the fundamental classes.
java.io - classes for input/output functions are bundled in this
package.

Data: Unit Abstractions


An additional property of a unit data abstraction has become
increasingly important:
Reusability: the ability to reuse the data abstraction in
different programs, thus saving the cost of writing abstractions
from scratch for each program.
Typically, such data abstractions represent components
(operationally complete pieces of a program or user interface) and
are entered into a library of available components.

Data: Unit Abstractions

As such, unit data abstractions become the basis for language
library mechanisms (the library mechanism itself, as well as
certain standard libraries, may or may not be part of the
language itself).
The combination of units (their interoperability) is enhanced
by providing standard conventions for their interfaces.

Data: Unit Abstractions

When programmers are given a new software resource to use,
they typically study its application programming interface
(API).
An API gives the programmer only enough information about
the resource's classes, methods, functions, and performance
characteristics to be able to use that resource effectively.

Data: Unit Abstractions

An example of an API is Java's Swing Toolkit for graphical user
interfaces, as defined in the package javax.swing.
The set of APIs of a modern programming language, such as
Java or Python, is usually organized for easy reference in a set of
Web pages called a doc.
When Java or Python programmers develop a new library or
package of code, they create the API for that resource
using a software tool specifically designed to generate a doc.

Control Abstraction

Control : Basic Abstractions


SUM = FIRST + SECOND

The code fetches the values of the variables FIRST and
SECOND, adds these values, then stores the result in a
memory location called SUM.

(Figure: FIRST, SECOND, and SUM as locations in memory.)

Typical basic control abstractions are those statements in a
language that combine a few machine instructions into an
abstract statement that is easier to understand than the
machine instructions.

Control : Basic Abstractions


Syntactic sugar
A simpler, shorthand notation: replacing a longer notation
with a shorter one.
Example:
The operation x += 10 is the same as x = x + 10
in C, Java, and Python.

Control : Basic Abstractions

Control: Structured Abstractions


Selection and iteration are accomplished by the use of branch
instructions to memory locations other than the next one.
Example:

Control : Structured Abstractions


Comments have been added to aid the reader.
If the comments were not included, even a competent LC-3
programmer probably would not be able to tell at a glance what this
algorithm does.
Compare this assembly language code with the use of the structured
if and for statements in the functionally equivalent C++ or Java code.

Structured Abstractions

Structured control abstractions divide a program into groups of
instructions that are nested within tests that govern their
execution.
They thus help the programmer to express the logic of the
primary control structures of sequencing, selection, and iteration
(loops).
At the machine level, the processor executes a sequence of
instructions simply by advancing a program counter through the
instructions' memory addresses.

Control: Structured Abstractions


Another structured form of iteration is provided by an iterator.
Iterator
Typically found in object-oriented languages.
An object that is associated with a collection, such as an array, a
list, a set, or a tree.
The programmer opens an iterator on a collection and then visits
all of its elements by running the iterator's methods in the
context of a loop.

Control: Structured Abstractions


Example:
The Java code segment uses an iterator to print the contents of a
list, called exampleList, of strings
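A minimal sketch of what such a code segment might look like (the class name IteratorExample and the list contents are invented for illustration):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorExample {
    // Collect the elements visited by an explicit iterator into one string
    static String traverse(List<String> items) {
        Iterator<String> it = items.iterator(); // open an iterator on the collection
        StringBuilder sb = new StringBuilder();
        while (it.hasNext()) {                  // visit every element in order
            sb.append(it.next()).append(" ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        List<String> exampleList = Arrays.asList("alpha", "beta", "gamma");
        System.out.println(traverse(exampleList)); // prints: alpha beta gamma
    }
}
```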

The iterator-based traversal of a collection is such a common loop
pattern that some languages, such as Java, provide syntactic sugar
for it, in the form of an enhanced for loop.

Control: Structured Abstractions


Example:
We can use this type of loop to further simplify the Java code for
computing the sum of either an array or a list of integers
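A possible version of that simplification, shown here as a sketch with invented sample data:

```java
import java.util.Arrays;
import java.util.List;

public class EnhancedForExample {
    // Sum an array of integers using the enhanced for loop
    static int sumArray(int[] a) {
        int sum = 0;
        for (int n : a) sum += n;
        return sum;
    }

    // The same loop form works unchanged on a List
    static int sumList(List<Integer> list) {
        int sum = 0;
        for (int n : list) sum += n;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumArray(new int[]{1, 2, 3}));    // prints 6
        System.out.println(sumList(Arrays.asList(4, 5, 6))); // prints 15
    }
}
```

The enhanced for loop hides the iterator (or index) entirely; the compiler generates the equivalent explicit traversal.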

Control : Structured Abstractions

Control : Structured Abstractions


Another powerful mechanism for structuring control
is the procedure, sometimes also called a
subprogram or subroutine.
This allows a programmer to consider a sequence of
actions as a single action that can be called or invoked
from many other points in a program.

Control : Structured Abstractions


Procedural abstraction involves two things.
First, a procedure must be defined by giving it a name
and associating with it the actions that are to be
performed. This is called procedure declaration, and
it is similar to variable and type declaration, mentioned
earlier.
Second, the procedure must actually be called at the
point where the actions are to be performed.
This is sometimes also referred to as procedure
invocation or procedure activation.

Control : Structured Abstractions


As an example, consider the sample code fragment that
computes the greatest common divisor of integers u and v.
We can make this into a procedure in Ada:

procedure gcd(u, v: in integer; x: out integer) is
  y, t, z: integer;
begin
  z := u;
  y := v;
  loop
    exit when y = 0;
    t := y;
    y := z mod y;
    z := t;
  end loop;
  x := z;
end gcd;

Parameters: things that change from call to call.

Control : Structured Abstractions


The system implementation of a procedure call
is a more complex mechanism than selection or
looping, since it requires the storing of
information about the condition of the program
at the point of the call and the way the called
procedure operates. Such information is stored
in a runtime environment.
An abstraction mechanism closely related to
procedures is the function, which can be
viewed simply as a procedure that returns a
value or result to its caller.

Control : Structured Abstractions


For example, the Ada code for the gcd procedure
can more appropriately be written as a function, as
given below. Note that the gcd function uses a
recursive strategy to eliminate the loop that
appeared in the earlier version. The use of
recursion further exploits the abstraction
mechanism of the subroutine to simplify the code.
Simplified Ada Code:

function gcd(u, v: in integer) return integer is
begin
  if v = 0 then
    return u;
  else
    return gcd(v, u mod v);
  end if;
end gcd;

Javascript Example:
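A JavaScript version of the same recursive gcd might look like the following (a sketch mirroring the Ada function above, not necessarily the slide's original code):

```javascript
// Recursive greatest common divisor, mirroring the Ada function
function gcd(u, v) {
  if (v === 0) {
    return u;
  } else {
    return gcd(v, u % v);
  }
}

console.log(gcd(48, 18)); // prints 6
```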

Control: Unit Abstraction


Control can also be abstracted to include a
collection of procedures that provide
logically related services to other parts of
a program and that form a unit, or
stand-alone, part of the program.
One kind of control abstraction that is
difficult to fit into any one abstraction
level is that of parallel programming
mechanisms.

Control: Unit Abstraction


Many modern computers have several
processors or processing elements and are
capable of processing different pieces of
data simultaneously.
A number of programming languages
include mechanisms that allow for the
parallel execution of parts of programs, as
well as providing for synchronization and
communication among such program
parts.

Control: Unit Abstraction


Java has mechanisms for declaring threads
(separately executed control paths within the Java system)
and processes (other programs executing outside the
Java system).
Ada provides the task mechanism for parallel
execution.
Ada's tasks are essentially a unit abstraction,
whereas Java's threads and processes are classes and
so are structured abstractions, albeit part of the
standard java.lang package.
Other languages provide different levels of parallel
abstractions, even down to the statement level.

Control: Unit Abstraction

Note that even if the threads are started in sequence (1, 2, 3, etc.)
they may not execute sequentially, meaning thread 1 may not be
the first thread to write its name to System.out.
This is because the threads are in principle executing in parallel
and not sequentially.
The JVM and/or operating system determines the order in which
the threads are executed. This order does not have to be the same
order in which they were started.
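The point above can be sketched in a small Java program (the class name ThreadExample and the helper method are invented for illustration). Each thread records its name in a shared list, and the main thread joins on all of them; the recording order is up to the scheduler, not the start order:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ThreadExample {
    // Start n threads that each record their name; the recording order
    // depends on the scheduler, not on the order the threads were started.
    static List<String> runThreads(int n) throws InterruptedException {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            final int id = i;
            Thread t = new Thread(() -> log.add("Thread " + id));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join(); // wait for all threads to finish
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        // All three names appear, but possibly in any order
        System.out.println(runThreads(3));
    }
}
```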

Principles of Programming
Languages
Part 2

Computational Paradigms

Programming languages began by imitating and
abstracting the operations of a computer.
It is not surprising that the kind of computer for which
they were written had a significant effect on their
design. In most cases, the computer in question was
the von Neumann model: a single central processing
unit that sequentially executes instructions that
operate on values stored in memory.
These are typical features of a language based on the
von Neumann model:
variables represent memory locations
assignment allows the program to operate on these
memory locations.

A programming language that is characterized
by these three properties (the sequential
execution of instructions, the use of variables
representing memory locations, and the use of
assignment to change the values of variables) is
called an imperative language, because its
primary feature is a sequence of statements that
represent commands, or imperatives.

Imperative language: a programming
language that is characterized by these three
properties:
the sequential execution of instructions
the use of variables representing memory
locations
the use of assignment to change the values of
variables

It is called imperative because its primary feature
is a sequence of statements that represent
commands, or imperatives.

Most programming languages today are imperative, but it is not
necessary for a programming language to describe computation
in this way.
Indeed, the requirement that computation be described as a
sequence of instructions, each operating on a single piece of data,
is sometimes referred to as the von Neumann bottleneck.
This bottleneck restricts the ability of a language to provide
either parallel computation, that is, computation that can be
applied to many different pieces of data simultaneously, or
nondeterministic computation, computation that does not
depend on order. Thus, it is reasonable to ask if there are ways
to describe computation that are less dependent on the von
Neumann model of a computer.
Imperative programming languages actually represent only one
paradigm, or pattern, for programming languages.

Von Neumann Bottleneck

Put simply, the bottleneck is the traffic of data between the CPU
and memory.

Alternative Paradigm aside from Imperative

Two alternative paradigms for describing computation
come from mathematics.
1. The functional paradigm is based on the abstract
notion of a function as studied in the lambda calculus.
2. The logic paradigm is based on symbolic logic.
The importance of these paradigms is their correspondence
to mathematical foundations, which allows them to
describe program behaviour abstractly and precisely.
This, in turn, makes it much easier to determine if a
program will execute correctly (even without a complete
theoretical analysis), and makes it possible to write concise
code for highly complex tasks.

A fourth programming paradigm, the object-oriented
paradigm, has acquired enormous importance over the
last 20 years.
Object-oriented languages allow programmers to write
reusable code that operates in a way that mimics the
behavior of objects in the real world; as a result,
programmers can use their natural intuition about the
world to understand the behavior of a program and
construct appropriate code.

In a sense, the object-oriented paradigm is an extension
of the imperative paradigm, in that it relies primarily on
the same sequential execution with a changing set of
memory locations, particularly in the implementation of
objects.
The difference is that the resulting programs
consist of a large number of very small pieces
whose interactions are carefully controlled and
yet easily changed.
Moreover, at a higher level of abstraction, the
interaction among objects via message passing
can map nicely to the collaboration of parallel
processors, each with its own area of memory.
The object-oriented paradigm has essentially
become a new standard.

Language Definition

Documentation for the early programming languages was
written in an informal way, in ordinary English.
However, programmers soon became aware of the need for
more precise descriptions of a language, to the point of
needing formal definitions of the kind found in
mathematics.
For example, without a clear notion of the meaning of
programming language constructs, a programmer has no clear
idea of what computation is actually being performed.
Moreover, it should be possible to reason mathematically
about programs, and to do this requires formal verification or
proof of the behavior of a program.
Without a formal definition of a language this is impossible.

The best way to achieve the need for formal
definition and the need for machine or
implementation independence is through
standardization, which requires an independent
and precise language definition that is
universally accepted.
Standards organizations such as ANSI
(American National Standards Institute) and
ISO (International Organization for
Standardization) have published definitions for
many languages, including C, C++, Ada,
Common Lisp, and Prolog.

A further reason for a formal definition is that,
inevitably in the programming process, difficult
questions arise about program behaviour and
interaction.
Programmers need an adequate way to answer
such questions besides the often-used trial-and-error
process: it can happen that such questions
need to be answered already at the design stage
and may result in major design changes.
Finally, the requirements of a formal definition
ensure discipline when a language is being
designed.

Often a language designer will not realize the
consequences of design decisions until he or she
is required to produce a clear definition.
Language definition can be loosely divided into
two parts:
1. syntax, or structure, and
2. semantics, or meaning.

Language Syntax
The syntax of a programming language is in many
ways like the grammar of a natural language. It is the
description of the ways different parts of the language
may be combined to form phrases and, ultimately,
sentences.

Language Syntax
The description of language syntax is one of the areas
where formal definitions have gained acceptance, and
the syntax of all languages is now given using a
grammar. For example, a grammar rule for the C if
statement can be written as follows:
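One way such a rule might be sketched, in a BNF-style notation (the book's exact formulation may differ):

```
<if-statement> ::= if ( <expression> ) <statement>
                 | if ( <expression> ) <statement> else <statement>
```

The rule says an if statement is the keyword if, a parenthesized expression, and a statement, optionally followed by else and a second statement.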

Language Syntax

Language Syntax
If else Statement in Java

Language Syntax
If else Statement in Visual Basic 6.0

Language Syntax
If else Statement in C

Language Syntax
If else Statement in PHP

Language Syntax
The lexical structure of a programming language is
the structure of the language's words, which are
usually called tokens.
Thus, lexical structure is similar to spelling in a
natural language.
In the example of a C if statement, the words if and else
are tokens. Other tokens in programming languages
include identifiers (or names), symbols for operations,
such as + and *, and special punctuation symbols such
as the semicolon (;) and the period (.).

Language Syntax

Language Syntax

Language Syntax
Tokens in C:

Language Syntax
Tokens in Java:

Language Semantics
Syntax represents only the surface structure of a
language and, thus, is only a small part of a language
definition. The semantics, or meaning, of a language
is much more complex and difficult to describe
precisely.
The first difficulty is that meaning can be defined in
many different ways. Typically, describing the
meaning of a piece of code involves describing the
effects of executing the code, but there is no standard
way to do this.

Language Semantics
Moreover, the meaning of a particular mechanism
may involve interactions with other mechanisms in
the language, so that a comprehensive description of
its meaning in all contexts may become extremely
complex.

Language Semantics
This description itself points out some of the
difficulty in specifying semantics, even for a simple
mechanism such as the if statement.
The description makes no mention of what happens if
the condition evaluates to 0, but there is no else part
(presumably nothing happens; that is, the program
continues at the point after the if statement).

Language Semantics
Another important question is whether the if
statement is safe in the sense that there are no
other language mechanisms that may permit the
statements inside an if statement to be executed
without the corresponding evaluation of the if
expression.
If so, then the if-statement provides adequate
protection from errors during execution, such as
division by zero:
if (x != 0) y = 1 / x;

Language Semantics
Otherwise, additional protection mechanisms may be
necessary (or at least the programmer must be aware
of the possibility of circumventing the if expression).
The alternative to this informal description of
semantics is to use a formal method. However, no
generally accepted method, analogous to the use of
context-free grammars for syntax, exists here either.

Language Semantics
Indeed, it is still not customary for a formal definition
of the semantics of a programming language to be
given at all.
Nevertheless, several notational systems for formal
definitions have been developed and are increasingly
in use. These include:
operational semantics
denotational semantics
axiomatic semantics.

Language Translation

Language Translation
For a programming language to be useful, it must have
a translator, that is, a program that accepts other
programs written in the language in question and that
either executes them directly or transforms them into
a form suitable for execution.
A translator that executes a program directly is called
an interpreter, while a translator that produces an
equivalent program in a form suitable for execution is
called a compiler.

Language Translation

An interpreter can be viewed as a simulator for a
machine whose machine language is the language
being translated.

Language Translation
Compilation, on the other hand, is at least a two-step
process:

1. the original program (or source program) is input to
the compiler,
2. and a new program (or target program) is output
from the compiler.

This target program may then be executed, if it is in a
form suitable for direct execution (i.e., in machine
language).

Language Translation
More commonly, the target language is assembly
language, and the target program must be translated
by an assembler into an object program, and then
linked with other object programs, and loaded into
appropriate memory locations before it can be
executed.
Sometimes the target language is even another
programming language, in which case a compiler for
that language must be used to obtain an executable
object program.

Alternatively, the target language is a form of
low-level code known as byte code.
After a compiler translates a program's source
code to byte code, the byte code version of the
program is executed by an interpreter.

This interpreter, called a virtual machine, is
written differently for different hardware
architectures, whereas the byte code, like the
source language, is machine-independent.
Languages such as Java and Python compile to
byte code and execute on virtual machines,
whereas languages such as C and C++ compile to
native machine code and execute directly on
hardware.

It is important to keep in mind that a language and the
translator for that language are two different things. It is
possible for a language to be defined by the behaviour of a
particular interpreter or compiler (a so-called
definitional translator), but this is not the usual case.

More often, a language definition exists independently, and a translator may or
may not adhere closely to the language definition (one hopes the former).
When writing programs one must always be aware of those features and
properties that depend on a specific translator and are not part of the language
definition.
There are significant advantages to be gained from avoiding nonstandard
features as much as possible.

Principles of Programming
Languages
Part 3

Language Design

What is good programming language design?
By what criteria do we judge it?
The success or failure of a language often
depends on complex interactions among many
language mechanisms.

Practical matters not directly connected to language
definition also have a major effect on the success or failure
of a language.
These include the availability, price, and quality of
translators. Politics, geography, timing, and markets also
have an effect.
The C programming language has been a success at least
partially because of the success of the UNIX operating system,
which supported its use.
COBOL, though chiefly ignored by the computer science
community, continues as a significant language because of its
use in industry, and because of the large number of legacy
applications (old applications that continue to be maintained).

The language Ada achieved immediate influence
because of its required use in certain U.S. Defense
Department projects.
Java and Python have achieved importance through
the growth of the Internet and the free distribution
of these languages and their programming
environments.
The Smalltalk language never came into widespread
use, but most successful object-oriented languages
borrowed a large number of features from it.

Languages succeed for as many different reasons
as they fail.
Some language designers argue that an
individual or small group of individuals have a
better chance of creating a successful language
because they can impose a uniform design
concept.
This was true, for example, with Pascal, C, C++,
APL, SNOBOL, and LISP, but languages
designed by committees, such as COBOL, Algol,
and Ada, have also been successful.

When creating a new language, it's essential to decide on an overall goal for the
language, and then keep that goal in mind throughout the entire design
process.
This is particularly important for special purpose languages, such as database
languages, graphics languages, and real-time languages, because the particular
abstractions for the target application area must be built into the language
design.
However, it is true for general-purpose languages as well.
For example, the designers of FORTRAN focused on efficient execution, whereas
the designers of COBOL set out to provide an English-like nontechnical
readability.
Algol60 was designed to provide a block-structured language for describing
algorithms, and Pascal was designed to provide a simple instructional language to
promote top-down design.
Finally, the designer of C++ focused on the user's needs for greater abstraction
while preserving efficiency and compatibility with C.

Nevertheless, it is still extremely difficult to describe good
programming language design.
Even noted computer scientists and successful language designers offer
conflicting advice.
Niklaus Wirth, the designer of Pascal, advises that simplicity is
paramount.
C. A. R. Hoare, a prominent computer scientist and co-designer of a
number of languages, emphasizes the design of individual language
constructs.
Bjarne Stroustrup, the designer of C++, notes that a language cannot
be merely a collection of neat features.
Fred Brooks, a computer science pioneer, maintains that language
design is similar to any other design problem, such as designing a
building.

Historical Overview
In the early days of programming, machines were extremely slow and memory
was scarce. Program speed and memory usage were, therefore, the prime
concerns.
Also, some programmers still did not trust compilers to produce efficient
executable code (code that required the fewest number of machine instructions
and the smallest amount of memory).
Thus, one principal design criterion really mattered: efficiency of
execution.
For example, FORTRAN was specifically designed to allow the programmer to
generate compact code that executed quickly.
Indeed, with the exception of algebraic expressions, early FORTRAN code more or
less directly mapped to machine code, thus minimizing the amount of translation
that the compiler would have to perform.

Historical Overview
Judging by today's standards, creating a high-level programming language
that required the programmer to write code nearly as complicated as machine
code might seem counterproductive.
After all, the whole point of a high-level programming language is to make
life easier for the programmer.
In the early days of programming, however, writability (the quality of a
language that enables a programmer to use it to express a computation
clearly, correctly, concisely, and quickly) was always subservient to efficiency.
Moreover, at the time that FORTRAN was developed, programmers were less
concerned about creating programs that were easy for people to read and
write, because programs at that time tended to be short, written by one or a
few programmers, and rarely revised or updated except by their creators.

Historical Overview
By the time COBOL and Algol60 came on the scene, in
the 1960s, languages were judged by other criteria
than simply the efficiency of the compiled code.
For example, Algol60 was designed to be suitable for
expressing algorithms in a logically clear and concise
way; in other words, unlike FORTRAN, it was
designed for easy reading and writing by people.

Historical Overview
To achieve this design goal, Algol60's designers
incorporated block structure, structured control
statements, a more structured array type, and
recursion.
These features of the language were very effective.
For example, C. A. R. Hoare understood how to express his
QUICKSORT algorithm clearly only after learning Algol60.

Historical Overview
COBOL's designers attempted to improve the readability of
programs by trying to make them look like ordinary written
English.
In fact, the designers did not achieve their goal. Readers
were not able to easily understand the logic or behavior of
COBOL programs.
They tended to be so long and verbose that they were harder
to read than programs written in more formalized code.
But human readability was, perhaps for the first time, a
clearly stated design goal.

Chapter 3 - Language Design Principles
Louden


The language design problem


Language design is difficult, and success is hard
to predict:
Pascal a success, Modula-2 a failure
Algol60 a success, Algol68 a failure
FORTRAN a success, PL/I a failure

Conflicting advice

Efficiency


The first goal (FORTRAN): execution efficiency.


Still an important goal in some settings (C++, C).
Many other criteria can be interpreted from the
point of view of efficiency:
programming efficiency: writability or
expressiveness (ability to express complex
processes and structures)
reliability (security).
maintenance efficiency: readability.
(first seen as a goal in COBOL)


Other kinds of efficiency


efficiency of execution (optimizable)
efficiency of translation. Are there features
which are extremely difficult to check at compile
time (or even run time)? E.g., Algol prohibits
assignment to dangling pointers
Implementability (cost of writing translator)


Features that aid efficiency of execution


Static data types allow efficient allocation and
access.
Manual memory management avoids
overhead of garbage collection.
Simple semantics allow for simple structure
of running programs (simple environments; see Chapter 8).


Note conflicts with efficiency


Writability, expressiveness: no static data
types (variables can hold anything, no need
for type declarations). [harder to maintain]
Reliability, writability, readability: automatic
memory management (no need for pointers).
[runs slower]
Expressiveness, writability, readability: more
complex semantics, allowing greater
abstraction. [harder to translate]

Regularity
Internal consistency of a language design.

Regularity is a measure of how well a language
integrates its features, so that there are no
unusual restrictions, interactions, or behaviors.
Easy to remember.
Regularity issues can often be placed in
subcategories:
Generality: are constructs general enough? (Or too
general?)
Orthogonality: are there strange interactions?
Uniformity: Do similar things look the same, and do
different things look different?


Generality deficiencies
In Pascal, procedures can be passed as
parameters, but there are no procedure variables.
Pascal has no variable-length arrays: the length is
defined as part of the array's definition (even when
it is a parameter).


Orthogonality: independence
Not context-sensitive.
Seems similar to generality, but more of an odd
decision than a limitation.
For example, if I buy a sweater, I may have the
following choices:
short sleeve, long sleeve, or sleeveless
small, medium, or large
red, green, or blue


Limitations to sweater example:


If it is not possible to get sleeveless sweaters,
that may be a lack of generality.
If any combination of any attributes can be used
together, it is orthogonal.
If red sweaters cannot be purchased in a small
size, but other sweaters can, it is non-orthogonal.

Orthogonality

A relatively small set of primitive constructs can be
combined in a relatively small number of ways. Every
possible combination is legal.
For example - in IBM assembly language there are
different instructions for adding memory to register
or register to register (non-orthogonal).
On the VAX, a single add instruction can have arbitrary
operands.
Closely related to simplicity - the more orthogonal, the
fewer rules to remember.


For examples of non-orthogonality, consider
C++:
We can convert from integer to float by simply
assigning an integer to a float, but not vice versa.
(Not a question of generality, but of
the way it is done.)
Arrays are passed by reference, while integers are
passed by value.
A switch statement works with integers,
characters, or enumerated types, but not doubles
or Strings.

Regularity examples from C++

Functions are not general: there are no local
functions (simplicity of environment).
Declarations are not uniform: data
declarations must be followed by a semicolon;
function declarations must not.
Lots of ways to increment: a lack of uniformity
(++i, i++, i = i + 1).
i=j and i==j look the same, but are different:
a lack of uniformity.


What about Java?


Are function declarations non-general?
There are no functions, so a non-issue. (Well, what
about static methods?)

Are class declarations non-general?
No multiple inheritance (but there is a reason:
complexity of environment).
Java has a good replacement: interface inheritance.

Do declarations require semicolons?
Local variables do, but is that an issue? (Not
really; they look like statements.)


Java regularity, continued


Are some parameters references, others not?
Yes: objects are references, simple data are copies.
This is a result of the non-uniformity of data in Java,
in which not every piece of data is an object.
The reason is efficiency: simple data have fast access.

What is the worst non-regularity in Java?
My vote: arrays. But there are excuses.


Other design principles


Simplicity: make things as simple as possible, but
not simpler (Einstein). (Pascal, C)
We can make things so simple that it doesn't
work well: no string handling, no reasonable
I/O.
Can be cumbersome to use or inefficient.


Other design principles


Expressiveness: make it possible to express
conceptual abstractions directly and simply.
(Scheme)
Helps you to think about the problem.
Perl, for example, allows you to return multiple
values:
($a,$b) = swap($a,$b);


Other design principles


Extensibility: allow the programmer to extend
the language in various ways. (Scheme, C++)
Types, operators
Security: programs cannot do unexpected
damage. (Java)
discourages errors
allows errors to be discovered
type checking


Other design principles (cont.)


Preciseness: having a definition that can answer
programmers' and implementors' questions.
(Most languages today, but only one has a
mathematical definition: ML.)
If it isn't clear, there will be differences.
Example: is a declaration in a local scope (e.g., a
for loop) known or unknown after exit?
Example: implementation of the switch statement.
Example: constants - are they expressions or not?
Example: how much accuracy for floats?


Other design principles (cont.)


Machine-independence: should run the same on
any machine. (Java: a big effort.)
Consistency with accepted notations: easy to
learn and understand for experienced
programmers. (Most languages today, but not
Smalltalk and Perl.)
Restrictability: a programmer can program
effectively in a subset of the full language. (C++:
avoids runtime penalties.)


C++ case study


Thanks to Bjarne Stroustrup, C++ is not only a
great success story, but also the best-documented
language development effort in history:
1997: The C++ Programming Language, 3rd
Edition (Addison-Wesley).
1994: The Design and Evolution of C++ (Addison-Wesley).
1993: A History of C++: 1979-1991, SIGPLAN
Notices 28(3).


Major C++ design goals


OO features: class, inheritance
Strong type checking for better compile-time
debugging
Efficient execution
Portable
Easy to implement
Good interfaces with other tools


Supplemental C++ design goals


C compatibility (but not an absolute goal: no
gratuitous incompatibilities)
Incremental development based on experience.
No runtime penalty for unused features.
Multiparadigm
Stronger type checking than C
Learnable in stages
Compatibility with other languages and systems


C++ design errors


Too big?
C++ programs can be hard to understand and
debug
Not easy to implement
Defended by Stroustrup: multiparadigm features
are worthwhile

No standard library until late (and even then
lacking major features).
Stroustrup agrees this has been a major problem.
