L01 Introduction to Programming Languages
Paul Fodor
CSE260, Computer Science B: Honors
Stony Brook University
http://www.cs.stonybrook.edu/~cse260
Objectives
A History of Programming Languages
Why study programming languages?
Classifications of Programming Languages
Compilation vs. Interpretation
Implementation strategies
Programming Environment Tools
An Overview of Compilation
History of Programming Languages
At the beginning there was only machine language:
a sequence of bits (binary, entered with electric switches
on and off, or punched cards) that directly controls a
processor, causing it to add, compare, and move data
from one place to another, etc.
Example: GCD program in x86 machine language:
History of Programming Languages
Assembly languages were invented to allow machine-
level/processor operations to be expressed with
mnemonic abbreviations
For example, to add two numbers, you might write an
instruction in assembly code like this:
ADDF3 R1, R2, R3
A program called an assembler is used to convert
assembly language programs into machine code
[Figure: an assembler translates an assembly source file (e.g., ADDF3 R1, R2, R3) into a machine code file (e.g., 1101101010011010)]
Example: GCD program in x86
assembly:
History of Programming Languages
Assemblers were eventually augmented with elaborate
“macro expansion” facilities to permit programmers to
define parameterized abbreviations for common
sequences of instructions
Problem: each different kind of computer had to be
programmed in its own assembly language
People began to wish for machine-independent
languages
These wishes led in the mid-1950s to the development of
standard higher-level languages, translated for different
architectures by compilers
History of Programming Languages
Today there are thousands of high-level programming
languages, and new ones continue to emerge
Why study programming languages?
Why do we have programming languages?
What is a language for?
way of thinking -- way of expressing algorithms
way of specifying what you want (see declarative
languages; e.g., constraint-solving languages, where
one declares constraints while the system finds models
that satisfy all constraints)
access special features of the hardware
Why study programming languages?
What makes a language successful?
easy to learn (BASIC, Pascal, LOGO, Scratch, Python)
easy to express things, i.e., easy to use once fluent (C,
Java, Common Lisp, Perl, APL, Algol-68)
easy to deploy (JavaScript, BASIC, Forth)
possible to compile to very good (fast/small) code (C,
Fortran)
backing of a powerful sponsor that makes them "free"
(Java, Visual Basic, COBOL, PL/1, Ada)
wide dissemination at minimal cost (Python, Pascal,
Turing, Erlang)
Why study programming languages?
Help you choose a language for specific tasks:
C vs. C++ for systems programming (e.g., OS kernels,
drivers, file systems)
Matlab vs. Python vs. R for numerical computations
C vs. Python vs. Android vs. Swift vs. Objective-C for
embedded systems
Python vs. Perl vs. Ruby vs. Common Lisp vs. Scheme vs. ML for
symbolic data manipulation
Java RPC vs. C/CORBA vs. REST for networked programs
Python vs. Perl vs. Java for scripting and string
manipulation
Why study programming languages?
Make it easier to learn new languages
programming languages are similar (they share ways of doing
things), so it is easy to walk down the family tree
Important: concepts have even more similarity: if
you think in terms of iteration, recursion, abstraction
(method and class), then you will find it easier to
assimilate the syntax and semantic details of a new
language than if you try to pick it up in a vacuum
Think of an analogy to human languages: a good grasp of
grammar makes it easier to pick up new languages (at least
within the Indo-European family)
Why study programming languages?
Help you make better use of whatever language you use
understand implementation costs: choose between
alternative ways of doing things, based on knowledge of what
will be done underneath:
use x*x instead of x**2
avoid call by value with large data items in Pascal
avoid the use of call by name in Algol 60
choose between computation and table lookup
understanding "obscure" features:
In C, it will help you understand pointers (including arrays and
strings), unions, and setjmp/longjmp (C's analogue of catch and throw)
In Common Lisp, it will help you understand first-class functions,
closures, streams
Why study programming languages?
figure out how to do things in languages that don't
support them explicitly:
lack of recursion in early Fortran, CSP
unfold the recursive algorithm to mechanically eliminate recursion
and write a non-recursive algorithm, even for things that aren't
quite tail recursive (see the sketch after this list)
lack of suitable structures in Fortran
use comments and programmer discipline
lack of named constants and enumerations in Fortran
use identifiers with upper-case letters only that are initialized
once, then never changed
lack of modules in Pascal
include module name in method name and use comments and
programmer discipline
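For instance, here is a minimal C sketch of mechanically eliminating recursion (C stands in for Fortran here, and the names fact_recursive and fact_iterative are invented for illustration); the non-tail-recursive call is simulated with an explicit stack, the way a programmer in a recursion-less language would have to:

#include <stdio.h>

long fact_recursive(int n) {
    return n <= 1 ? 1 : n * fact_recursive(n - 1);
}

long fact_iterative(int n) {
    int stack[64], top = 0;                  /* explicit stack replaces the call stack */
    while (n > 1) stack[top++] = n--;        /* "recurse": push pending multipliers */
    long result = 1;
    while (top > 0) result *= stack[--top];  /* "return": unwind the stack */
    return result;
}

int main(void) {
    printf("%ld %ld\n", fact_recursive(10), fact_iterative(10));  /* both print 3628800 */
    return 0;
}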
Classifications of Programming Languages
Many classifications group languages as:
imperative
procedural (von Neumann/Turing-based) (Fortran, C,
Pascal, Basic)
object-oriented imperative (Smalltalk, C++, Eiffel, Java)
scripting languages (Perl, Python, JavaScript, PHP)
declarative
functional (Lisp, Scheme, ML, F#)
also adopted by JavaScript for call-back methods
logic and constraint-based (Prolog, Flora2, clingo, Zinc)
Many more classes: markup languages, assembly
languages, query languages, etc.
Classification of PL
GCD program in different languages: C, Python, SML, and Prolog.
In C:
int main() {
    int i = getint(), j = getint();
    while (i != j) {
        if (i > j) i = i - j;
        else j = j - i;
    }
    putint(i);
}
In Python:
def gcd(a, b):
    if a == b:
        return a
    else:
        if a > b:
            return gcd(a-b, b)
        else:
            return gcd(a, b-a)
In SML:
fun gcd(m,n):int = if m=n then n
    else if m>n then gcd(m-n,n)
    else gcd(m,n-m);
In Prolog:
gcd(A,A,A).
gcd(A,B,G) :- A > B, C is A-B, gcd(C,B,G).
gcd(A,B,G) :- A < B, C is B-A, gcd(C,A,G).
Compilation vs. Interpretation
Compilation vs. interpretation
not opposites
not a clear-cut distinction
Pure Compilation:
The compiler translates the high-level source
program into an equivalent target program
(typically in machine language), and then goes away:
Compilation vs. Interpretation
Pure Interpretation:
The interpreter stays around for the execution
of the program
Compilation vs. Interpretation
Compilation
Better performance!
Interpretation:
Greater flexibility: real-time
development
Better diagnostics (error messages)
Compilation vs. Interpretation
Most modern language implementations
include a mixture of both compilation and
interpretation
Compilation followed by interpretation:
Compilation vs. Interpretation
Note that compilation does NOT have to
produce machine language for some sort of
hardware
Compilation is translation from one language into
another, with full analysis of the meaning of the
input
Compilation entails semantic understanding of
what is being processed; pre-processing does not
A pre-processor will often let errors through
Compilation vs. Interpretation
Many compiled languages have interpreted
pieces, e.g., formats in Fortran or C
Most compiled languages use “virtual
instructions”
set operations in Pascal
string manipulation in Basic
Implementation strategies
The Preprocessor:
Removes comments and white space
Expands abbreviations in the style of a macro assembler (see the sketch after this list)
Conditional compilation: if-else directives #if, #ifdef,
#ifndef, #else, #elif and #endif – example:
#ifdef __unix__
# include <unistd.h>
#elif defined _WIN32
# include <windows.h>
#endif
Groups characters into tokens (keywords, identifiers,
numbers, symbols)
Identifies higher-level syntactic structures (loops,
subroutines)
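As an illustration of the macro-expansion bullet above, here is a minimal C sketch (the macro names PI and SQUARE are invented for the example); the preprocessor rewrites each use purely textually, before the compiler proper ever sees the program:

#include <stdio.h>

#define PI 3.14159               /* simple abbreviation */
#define SQUARE(x) ((x) * (x))    /* parameterized abbreviation */

int main(void) {
    double r = 2.0;
    /* After preprocessing, the next line reads:
       printf("%f\n", 3.14159 * ((r) * (r))); */
    printf("%f\n", PI * SQUARE(r));
    return 0;
}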
Preprocessor
The C Preprocessor:
removes comments
expands macros
conditional compilation
Implementation strategies
Library of Routines and Linking
The compiler uses a linker program to merge the
appropriate library of subroutines (e.g., math
functions such as sin, cos, log, etc.) into the final
program:
Implementation strategies
Post-compilation Assembly: the compiler's output is
assembly instead of machine code
Facilitates debugging (assembly language is easier for people to read)
Isolates the compiler from changes in the format of machine-language
files (only the assembler must be changed, and it is shared by many compilers)
Implementation strategies
Source-to-Source Translation
C++ implementations based on the early AT&T compiler
generated an intermediate program in C, instead of
assembly language
Implementation strategies
Bootstrapping: many compilers are self-hosting: they
are written in the language they compile
How does one compile the compiler in the first
place?
One starts with a minimal implementation of the language,
often an interpreter (which could be written in assembly
language) that handles a core language (parsing, semantic
analysis, and execution).
Then one successively uses this small implementation to
compile expanded versions of the compiler.
Implementation strategies
Bootstrapping:
Assemblers were the first language tools to bootstrap
themselves
Java's compiler is self-hosting; so are compilers for Basic, C,
C++, C#, OCaml, Perl 6, Python, and XSB.
It is a form of dogfooding ("eating your own dog food", i.e.,
using your own product):
The language is able to reproduce itself.
Developers only need to know the language being compiled; they
are also users of the language and take part in bug reporting.
Improvements to the compiler improve the compiler itself.
Implementation strategies
Bootstrapping is related to self-hosting:
Ken Thompson started development on Unix in 1968 by
writing the initial Unix kernel, a command interpreter, an
editor, an assembler, and a few utilities on a GE-635.
Then the Unix operating system became self-hosting:
programs could be written and tested on Unix itself.
Development of the Linux kernel was initially hosted on a
Minix system.
When sufficient packages, like GCC, GNU bash, and other
utilities, were ported over, developers could work on new
versions of the Linux kernel based on older versions of
itself (like building kernel 3.21 on a machine running
kernel 3.18).
Implementation strategies
Compilation of Interpreted Languages (e.g., Python, Lisp):
Compilers exist for some interpreted languages, but
they aren't pure:
they selectively compile the compilable pieces and leave
sophisticated language features to an interpreter kept
around at run time
Implementation strategies
Dynamic and Just-in-Time Compilation:
In some cases, a programming system may
deliberately delay compilation until the last possible
moment
Lisp or Prolog invoke the compiler on the fly, to translate
newly created sources into machine language, or to
optimize the code for a particular input set (e.g.,
dynamic indexing in Prolog)
Implementation strategies
Microcode
Code written in low-level instructions
(microcode or firmware), which are stored in
read-only memory and executed by the hardware.
Implementation strategies
Unconventional compilers:
text formatters: TeX/LaTeX and troff are actually
compilers
silicon compilers: translate hardware descriptions into
circuit designs
page-description languages: laser printers themselves
incorporate interpreters for the PostScript page-description
language
query language processors for database systems are
also compilers: they translate languages like SQL into
primitive operations (e.g., tuple relational calculus
and domain relational calculus)
Programming Environment Tools
Compilers and interpreters do not exist in isolation
Programmers are assisted by tools and IDEs
An Overview of Compilation
Phases of Compilation
An Overview of Compilation
Scanning is recognition of a regular language, e.g.,
via a DFA (deterministic finite automaton)
it divides the program into "tokens", which are the
smallest meaningful units; this saves time, since
character-by-character processing is slow
you can design a parser to take characters instead of
tokens as input, but it isn't pretty
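A minimal sketch of such a scanner in C, assuming a toy language with only identifiers, integer literals, and single-character operators (the Token type and next_token function are invented for illustration); each call to next_token runs a small DFA whose states correspond to the if/else branches:

#include <ctype.h>
#include <stdio.h>

typedef enum { TOK_ID, TOK_NUM, TOK_OP, TOK_EOF } TokenKind;

typedef struct {
    TokenKind kind;
    char text[64];
} Token;

Token next_token(const char **p) {
    Token t = { TOK_EOF, "" };
    int n = 0;
    while (isspace((unsigned char)**p)) (*p)++;        /* skip white space */
    if (**p == '\0') return t;
    if (isalpha((unsigned char)**p)) {                 /* identifier state */
        t.kind = TOK_ID;
        while (isalnum((unsigned char)**p) && n < 63) t.text[n++] = *(*p)++;
    } else if (isdigit((unsigned char)**p)) {          /* number state */
        t.kind = TOK_NUM;
        while (isdigit((unsigned char)**p) && n < 63) t.text[n++] = *(*p)++;
    } else {                                           /* single-character operator */
        t.kind = TOK_OP;
        t.text[n++] = *(*p)++;
    }
    t.text[n] = '\0';
    return t;
}

int main(void) {
    const char *src = "i = i - j;";
    for (Token t = next_token(&src); t.kind != TOK_EOF; t = next_token(&src))
        printf("token: %s\n", t.text);                 /* prints: i = i - j ; */
    return 0;
}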
An Overview of Compilation
Example, take the GCD Program (in C):
int main() {
    int i = getint(), j = getint();
    while (i != j) {
        if (i > j) i = i - j;
        else j = j - i;
    }
    putint(i);
}
An Overview of Compilation
Lexical and Syntax Analysis
GCD Program Tokens
Scanning (lexical analysis) and parsing recognize the structure of the
program and group characters into tokens, the smallest meaningful units
of the program
int main ( ) {
int i = getint ( ) , j = getint ( ) ;
while ( i != j ) {
if ( i > j ) i = i - j ;
else j = j - i ;
}
putint ( i ) ;
}
An Overview of Compilation
Parsing is recognition of a context-free
language, e.g., via PDA (Pushdown
automaton)
Parsing discovers the "context free"
structure of the program
Informally, it finds the structure you can
describe with syntax diagrams (e.g., the
"circles and arrows" in a language manual)
An Overview of Compilation
Context-Free Grammar and Parsing
Grammar Example for while loops in C:
while-iteration-statement → while ( expression ) statement
statement, in turn, is often a list enclosed in braces:
statement → compound-statement
compound-statement → { block-item-list-opt }
where
block-item-list-opt → block-item-list
or
block-item-list-opt → ϵ
and
block-item-list → block-item
block-item-list → block-item-list block-item
block-item → declaration
block-item → statement
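A minimal recursive-descent sketch of these rules in C, assuming a toy token stream where 'w' stands for the keyword while, 'e' for an already-scanned expression, and the punctuation characters for themselves (expect and the per-nonterminal functions are invented for illustration); each nonterminal becomes one function, and the C call stack plays the role of the PDA's stack:

#include <stdio.h>
#include <stdlib.h>

static const char *p;                       /* cursor into the token string */

static void expect(char c) {
    if (*p != c) { fprintf(stderr, "syntax error at '%c'\n", *p); exit(1); }
    p++;
}

static void statement(void);

static void compound_statement(void) {      /* { block-item-list-opt } */
    expect('{');
    while (*p != '}') statement();          /* block-item-list */
    expect('}');
}

static void statement(void) {
    if (*p == 'w') {                        /* while ( expression ) statement */
        p++; expect('('); expect('e'); expect(')');
        statement();
    } else if (*p == '{') {
        compound_statement();
    } else {                                /* expression statement: e ; */
        expect('e'); expect(';');
    }
}

int main(void) {
    p = "w(e){e;e;}";                       /* while (e) { e; e; } */
    statement();
    puts("parsed OK");
    return 0;
}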
An Overview of Compilation
Context-Free Grammar and Parsing
GCD Program Parse Tree:
next slide
An Overview of Compilation
Context-Free Grammar and Parsing (continued)
An Overview of Compilation
Semantic analysis is the discovery of meaning
in the program
The compiler actually does what is called
STATIC semantic analysis: the meaning that
can be figured out at compile time
Some things (e.g., array subscript out of
bounds) can't be figured out until run time;
things like that are part of the program's
DYNAMIC semantics
An Overview of Compilation
Symbol table: all phases rely on a symbol
table that keeps track of all the identifiers
in the program and what the compiler
knows about them
This symbol table may be retained (in some
form) for use by a debugger, even after
compilation has completed
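A minimal sketch of such a table in C, assuming a toy compiler that records only each identifier's name and type (declare and lookup are invented names); real symbol tables also track scope and storage location, and usually hash the names instead of searching linearly:

#include <stdio.h>
#include <string.h>

typedef struct {
    char name[32];
    char type[16];
} Symbol;

static Symbol table[256];
static int nsymbols;

void declare(const char *name, const char *type) {
    strncpy(table[nsymbols].name, name, 31);
    strncpy(table[nsymbols].type, type, 15);
    nsymbols++;
}

Symbol *lookup(const char *name) {
    for (int i = 0; i < nsymbols; i++)      /* linear search, for brevity */
        if (strcmp(table[i].name, name) == 0) return &table[i];
    return NULL;                            /* identifier never declared */
}

int main(void) {
    declare("i", "int");                    /* from: int i = getint(); */
    declare("j", "int");
    Symbol *s = lookup("i");
    printf("%s has type %s\n", s->name, s->type);
    return 0;
}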
An Overview of Compilation
Semantic Analysis and Intermediate Code Generation
Semantic analysis is the discovery of meaning in a program
tracks the types of both identifiers and expressions
builds and maintains a symbol table data structure that maps each
identifier to the information known about it
context checking
Every identifier is declared before it is used
No identifier is used in an inappropriate context (e.g., adding a
string to an integer)
Subroutine calls provide the correct number and types of
arguments.
Labels on the arms of a switch statement are distinct constants.
Any function with a non-void return type returns a value explicitly
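As an illustration, here is a minimal C sketch of one such context check, declared-before-use, assuming a toy table of declared names (is_declared is an invented helper); a real analyzer would walk the syntax tree and consult a scoped symbol table:

#include <stdio.h>
#include <string.h>

static const char *declared[] = { "i", "j" };    /* from: int i, j; */

int is_declared(const char *name) {
    for (unsigned k = 0; k < sizeof declared / sizeof declared[0]; k++)
        if (strcmp(declared[k], name) == 0) return 1;
    return 0;
}

int main(void) {
    const char *uses[] = { "i", "j", "k" };      /* identifiers used in the body */
    for (unsigned k = 0; k < 3; k++)
        if (!is_declared(uses[k]))
            printf("error: '%s' used before declaration\n", uses[k]);
    return 0;
}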
An Overview of Compilation
Semantic analysis implementation
semantic action routines are invoked by the parser when it
realizes that it has reached a particular point within a
grammar rule.
Not all semantic rules can be checked at compile
time: only the static semantics of the language
the dynamic semantics of the language must be checked
at run time
Array subscript expressions lie within the bounds of the
array
Arithmetic operations do not overflow
An Overview of Compilation
Semantic Analysis and Intermediate Code Generation
The parse tree is very verbose: once we know
that a token sequence is valid, much of the
information in the parse tree is irrelevant to
further phases of compilation
The semantic analyzer typically transforms the parse tree into
an abstract syntax tree (AST or simply a syntax tree) by
removing most of the “artificial” nodes in the tree’s interior
The semantic analyzer also annotates the remaining nodes
with useful information, such as pointers from identifiers to
their symbol table entries
The annotations attached to a particular node are known as its
attributes
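A minimal sketch of an AST node in C, assuming a toy language with only identifiers and binary operators (the Ast/AstKind names are invented for illustration); the name field stands in for the pointer-to-symbol-table-entry annotation mentioned above:

#include <stdio.h>
#include <stdlib.h>

typedef enum { AST_ID, AST_BINOP } AstKind;

typedef struct Ast {
    AstKind kind;
    const char *name;            /* AST_ID: would point into the symbol table */
    char op;                     /* AST_BINOP: '-', '+', ... */
    struct Ast *left, *right;    /* AST_BINOP: operand subtrees */
} Ast;

Ast *ident(const char *name) {
    Ast *n = calloc(1, sizeof *n);
    n->kind = AST_ID; n->name = name;
    return n;
}

Ast *binop(char op, Ast *l, Ast *r) {
    Ast *n = calloc(1, sizeof *n);
    n->kind = AST_BINOP; n->op = op; n->left = l; n->right = r;
    return n;
}

int main(void) {
    /* The tree for i - j, as it might appear inside the GCD program's AST */
    Ast *t = binop('-', ident("i"), ident("j"));
    printf("(%s %c %s)\n", t->left->name, t->op, t->right->name);
    return 0;
}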
An Overview of Compilation
GCD Syntax Tree (AST)
An Overview of Compilation
In many compilers, the annotated syntax tree
constitutes the intermediate form that is passed
from the front end to the back end.
In other compilers, semantic analysis ends with a
traversal of the tree that generates some other
intermediate form
One common such form consists of a control
flow graph whose nodes resemble fragments of
assembly language for a simple idealized
machine
An Overview of Compilation
Intermediate form (IF) generation takes place after semantic
analysis (if the program passes all checks)
IFs are often chosen for machine independence, ease
of optimization, or compactness (these are
somewhat contradictory)
They often resemble machine code for some
imaginary idealized machine; e.g. a stack
machine, or a machine with arbitrarily many
registers
Many compilers actually move the code through
more than one IF
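As a hedged illustration, here is what such an IF might look like in C: three-address "quadruples" for the GCD loop on an idealized machine with arbitrarily many virtual registers (the Op and Quad names and the instruction set are invented; this is one plausible encoding, not the book's):

#include <stdio.h>

typedef enum { OP_SUB, OP_BGT, OP_BNE, OP_JMP } Op;

typedef struct {
    Op op;
    const char *dst, *src1, *src2;   /* virtual registers or branch labels */
} Quad;

int main(void) {
    /* while (i != j) { if (i > j) i = i - j; else j = j - i; } */
    Quad code[] = {
        { OP_BNE, "L1",   "i", "j" },    /* top:  if i != j goto L1 */
        { OP_JMP, "done", 0,   0   },    /*       goto done         */
        { OP_BGT, "L2",   "i", "j" },    /* L1:   if i > j goto L2  */
        { OP_SUB, "j",    "j", "i" },    /*       j := j - i        */
        { OP_JMP, "top",  0,   0   },    /*       goto top          */
        { OP_SUB, "i",    "i", "j" },    /* L2:   i := i - j        */
        { OP_JMP, "top",  0,   0   },    /*       goto top          */
    };
    for (unsigned k = 0; k < sizeof code / sizeof code[0]; k++)
        printf("%d %s %s %s\n", code[k].op, code[k].dst,
               code[k].src1 ? code[k].src1 : "-",
               code[k].src2 ? code[k].src2 : "-");
    return 0;
}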
An Overview of Compilation
Target Code Generation:
The code generation phase of a compiler translates
the intermediate form into the target language
To generate assembly or machine language, the code
generator traverses the symbol table to assign
locations to variables, and then traverses the
intermediate representation of the program,
generating loads and stores for variable references,
interspersed with appropriate arithmetic operations,
tests, and branches
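A minimal C sketch of such a traversal, reusing the toy AST shape from the earlier sketch and targeting an idealized machine with registers r1, r2, ... (gen is an invented name; a real generator would first assign each variable a memory location via the symbol table):

#include <stdio.h>

typedef enum { AST_ID, AST_BINOP } AstKind;

typedef struct Ast {
    AstKind kind;
    const char *name;            /* AST_ID */
    char op;                     /* AST_BINOP: '-' or '+' */
    struct Ast *left, *right;
} Ast;

static int next_reg = 1;

/* Walk the expression tree, print idealized instructions, and return the
   number of the register holding the expression's value. */
int gen(Ast *n) {
    if (n->kind == AST_ID) {
        int r = next_reg++;
        printf("  load r%d, %s\n", r, n->name);       /* variable reference */
        return r;
    }
    int rl = gen(n->left), rr = gen(n->right);
    printf("  %s r%d, r%d, r%d\n",
           n->op == '-' ? "sub" : "add", rl, rl, rr); /* arithmetic operation */
    return rl;
}

int main(void) {
    Ast i = { AST_ID, "i", 0, 0, 0 }, j = { AST_ID, "j", 0, 0, 0 };
    Ast sub = { AST_BINOP, 0, '-', &i, &j };          /* the tree for i - j */
    printf("  store i, r%d\n", gen(&sub));            /* i = i - j */
    return 0;
}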
An Overview of Compilation
Target Code Generation:
Naive x86 assembly language for the GCD program
An Overview of Compilation
Code improvement (optimization):
Some improvements are machine independent
Other improvements require an understanding of
the target machine
Code improvement often appears as two phases
of compilation, one immediately after semantic
analysis and intermediate code generation, the
other immediately after target code generation