
CS 3723 - Programming Language: 1. Introductory Stuff

This document provides an overview of key concepts in programming languages including:
- The differences between syntax, semantics, runtime, and compile-time.
- What constitutes a formal language and how finite state machines like DFAs and NFAs are used to recognize languages.
- Common tokens like identifiers, keywords, constants, and operators.
- The components of a context-free grammar including terminal symbols, non-terminal symbols, replacement rules, and a start symbol.
- How to derive sentences and construct parse trees from grammars to determine if a grammar is ambiguous.
- How reverse Polish notation writes expressions without parentheses by placing operators after their operands.

Uploaded by

Lara Jane Lopega
Copyright
© All Rights Reserved

CS 3723 – Programming Language

1. Introductory Stuff.

The three contrasts:

● Syntax versus semantics.
● Run-time versus compile-time.
● Translation versus interpretation.

What a language is.

A language is a set of sentences (strings of symbols over some alphabet).

Question: Given a simple DFA or NFA, describe the language recognized by it.

NFA

NFA for language L


This automaton is called non-deterministic because there is at least one case
with two possible choices for which arrow to follow.

DFA


DFA for language L


This automaton is called deterministic because in each case there is exactly one
choice for which arrow to follow. The process is uniquely determined. If you end
up in the terminal state 3 when you have reached the end of the string, you
accept the input string as belonging to L; otherwise you reject it.

2. Finite State Machines.

Question: Given a simple DFA, say how to simulate it.

Simulating a DFA: In a language with labels and gotos, this is easy.


Each state becomes a labeled location in a simulation program, and each arrow
becomes a goto between the two labeled locations that correspond to the two
states. You start at the location corresponding to the start state. You accept if you
are in an accepting state (or terminal state) at the end of the string being
processed. All this is illustrated with programs to recognize C-style
comments (the "Comments" link in the original notes). That link also shows the
use of a while-switch in case gotos are not available or not allowed (say, by an
instructor in a course).
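
The while-switch idea can be sketched in Python, which has no goto. This is only an illustration, not the program from the notes: the state numbers, the DEAD state, and the function name is_c_comment are assumptions, and the DFA accepts a string that is exactly one C-style comment.

```python
DEAD = -1  # hypothetical trap state: once here, the string is rejected


def is_c_comment(text):
    """Simulate a DFA that accepts exactly one C-style comment, /* ... */."""
    state = 0  # start state
    for ch in text:
        if state == 0:            # expecting the opening '/'
            state = 1 if ch == "/" else DEAD
        elif state == 1:          # saw '/', expecting '*'
            state = 2 if ch == "*" else DEAD
        elif state == 2:          # inside the comment body
            state = 3 if ch == "*" else 2
        elif state == 3:          # saw '*' inside the body
            if ch == "/":
                state = 4
            elif ch == "*":
                state = 3
            else:
                state = 2
        elif state == 4:          # comment already closed: extra input
            state = DEAD
        if state == DEAD:
            break
    return state == 4             # state 4 is the only accepting state


if __name__ == "__main__":
    for sample in ["/* hi */", "/**/", "/* open", "/* a */ b"]:
        print(repr(sample), is_c_comment(sample))
```

Each branch of the if-chain plays the role of one labeled location, and each assignment to state plays the role of one goto.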

Question: Given a simple NFA, say how to simulate it.

Simulating an NFA: As you process each input character, your simulating program
should keep track of the set of all possible states that you might be in. You start
out with the singleton set {start state}. At the end of the input string, if your set of
states includes a terminal state, then you accept. Otherwise you reject. This same
process is essentially the "subset algorithm" that lets one take an NFA and
construct a DFA that accepts the same language as the NFA.
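
The set-of-states simulation can be sketched as follows. The NFA used here is a hypothetical example (it is not the automaton pictured in the notes): it accepts strings over {a, b} that end in "ab", and it has no empty-string moves. The names nfa_accepts, DELTA, START, and ACCEPTING are assumptions made for the illustration.

```python
# Hypothetical NFA: state 0 loops on a and b, and may also "guess" on an a
# that the final "ab" has begun. Missing entries mean "no arrow".
DELTA = {
    (0, "a"): {0, 1},   # the non-deterministic choice: two arrows labeled a
    (0, "b"): {0},
    (1, "b"): {2},
}
START = 0
ACCEPTING = {2}


def nfa_accepts(delta, start, accepting, text):
    """Track the set of all states the NFA might be in after each character."""
    current = {start}
    for ch in text:
        current = {
            target
            for state in current
            for target in delta.get((state, ch), set())
        }
    # Accept if at least one possible state is a terminal state.
    return bool(current & accepting)


if __name__ == "__main__":
    for sample in ["ab", "aab", "aba", ""]:
        print(repr(sample), nfa_accepts(DELTA, START, ACCEPTING, sample))
```

Each distinct value that current can take corresponds to one state of the DFA produced by the subset algorithm.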

3. Lexical Analysis.

What tokens are.

A token is a sequence of basic units (characters) that the scanner treats as a
single item.

Question: Give examples of typical tokens.

● Identifier: This is usually defined as an initial letter, followed by any
sequence of letters or digits. The formal definition often includes other
possibilities, such as an underscore character ( _ ) in the same places where
a letter can occur.
● Keyword or reserved word: Certain strings of letters look just like an
identifier, but are actually used as keywords in the language and cannot be
used as identifiers.
● Constant: Depending on the language, there are a number of different
types of constants: integer, floating point, boolean, string, character, and
others.
● Operator: Operators are often one or two special characters, though they
can be more complicated. They are usually infix, prefix, or postfix. The parts of
more complex operators, such as the infamous ternary operator ?: (present in
C, C++, Java, and even Ruby), are treated by the scanner as separate tokens.
● Special character: A number of special characters are used to separate
other tokens from one another, such as ( ) [ ] { } : ; ,
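
The token classes above can be illustrated with a small scanner. This is a minimal sketch under stated assumptions: the keyword set, the operator set, and the names KEYWORDS, TOKEN_RE, and tokenize are all hypothetical and do not describe any particular language. Only integer and floating point constants are handled.

```python
import re

KEYWORDS = {"if", "else", "while", "return"}  # hypothetical keyword set

# One named alternative per token class; whitespace is matched and skipped.
TOKEN_RE = re.compile(r"""
    (?P<constant>\d+(?:\.\d+)?)
  | (?P<identifier>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<operator>==|<=|>=|!=|[-+*/<>=?])
  | (?P<special>[()\[\]{}:;,])
  | (?P<space>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (kind, lexeme) pairs for the given source text."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "space":
            continue
        # A keyword looks just like an identifier, so check it afterwards.
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, lexeme))
    return tokens


if __name__ == "__main__":
    for token in tokenize("if (count1 <= 3.5) x = a ? b : c;"):
        print(token)
```

Note that the two parts of the ternary operator come out as separate tokens: ? is scanned as an operator and : as a special character.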

4. Context-free Grammars.

What a CF grammar looks like. The names of its parts.

A context-free grammar (CFG) is a set of recursive replacement rules
(or rewriting rules, or productions, or just rules) that are used to generate
patterns of strings.

More formally, a CFG consists of the following components:


● A set of terminal symbols, which are the characters of the alphabet that appear
in the strings generated by the grammar.
● A set of non-terminal symbols, which are placeholders for the patterns of
terminal symbols that can be generated from them.
● A set of replacement rules, which are rules for replacing (or rewriting)
non-terminal symbols (on the left side of the production) in a string with other
non-terminal or terminal symbols (on the right side of the production).
● A start symbol, which is a special non-terminal symbol that appears in the initial
string generated by the grammar.

Question: Given a CF grammar and a sentence, produce a derivation sequence
deriving the sentence.

Grammar: Arith. Exp.

E ----> E + E
E ----> E * E
E ----> ( E )
E ----> a | b | c | ...

Sentence: (a+b)*c

Leftmost derivation sequence:

E ===> E * E ===> ( E ) * E ===> ( E + E ) * E ===> ( a + E ) * E
  ===> ( a + b ) * E ===> ( a + b ) * c

Rightmost derivation sequence:

E ===> E * E ===> E * c ===> ( E ) * c ===> ( E + E ) * c ===> ( E + b ) * c
  ===> ( a + b ) * c
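
Both derivation sequences can be reproduced mechanically: at each step, replace the leftmost (or rightmost) E with the right side of a chosen rule. The sketch below assumes the grammar above with operands limited to a, b, c; the names PRODUCTIONS and derive are hypothetical.

```python
# Right-hand sides of the rules for E, written with spaces between symbols.
PRODUCTIONS = {"E + E", "E * E", "( E )", "a", "b", "c"}


def derive(choices, rightmost=False):
    """Apply the chosen right-hand sides in order; return every sentential form."""
    form = ["E"]                      # start symbol
    steps = ["E"]
    for rhs in choices:
        if rhs not in PRODUCTIONS:
            raise ValueError(f"not a rule of the grammar: E ----> {rhs}")
        if "E" not in form:
            raise ValueError("no non-terminal left to replace")
        if rightmost:
            index = len(form) - 1 - form[::-1].index("E")
        else:
            index = form.index("E")
        form[index:index + 1] = rhs.split()
        steps.append(" ".join(form))
    return steps


if __name__ == "__main__":
    left = derive(["E * E", "( E )", "E + E", "a", "b", "c"])
    right = derive(["E * E", "c", "( E )", "E + E", "b", "a"], rightmost=True)
    print(" ===> ".join(left))
    print(" ===> ".join(right))
```

The two calls use the same rules in a different order, which is why they end at the same sentence.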

5. Ambiguous Grammars.

Recognize ambiguity when you see it.

Question: Given a grammar, show that it is ambiguous. (Show the two distinct
parse trees)

Ambiguity: There are other sentences derived from E above that have more than
one parse tree, and correspondingly more than one leftmost and more than one
rightmost derivation. For example, the very simple sentence a + b * c. The table
looks at leftmost derivations and parse trees:
1st Leftmost Der.             2nd Leftmost Der.

E ===> E + E                  E ===> E * E
  ===> a + E                    ===> E + E * E
  ===> a + E * E                ===> a + E * E
  ===> a + b * E                ===> a + b * E
  ===> a + b * c                ===> a + b * c

1st Parse Tree                2nd Parse Tree

    E                                 E
   /|\                               /|\
  / | \                             / | \
 E  +  E                           E  *  E
 |    /|\                         /|\    |
 |   / | \                       / | \   |
 a  E  *  E                     E  +  E  c
    |     |                     |     |
    b     c                     a     b

Grammar: Arith. Exp.

E ----> E + E
E ----> E * E
E ----> ( E )
E ----> a | b | c

Even if some sentences have a unique parse tree, if any sentence at all has more
than one parse tree, then the grammar is called ambiguous. In a programming
language it is not acceptable to have more than one possible reading of a
construct. We can't flip a coin to decide which parse tree to use. There are
several ways around this problem:
1. Rewrite the grammar so that it is no longer ambiguous yet still accepts
exactly the same language. This is not always possible.
2. Introduce extra rules that allow the program to decide which of multiple
parse trees to use. These are called disambiguating rules. (Ah, yes,
"disambiguating", one of my favorite words.)
3. An ambiguous grammar may signal problems with language design, and the
programming language itself might be changed.
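
One way to check a sentence for ambiguity is to count its parse trees. The sketch below is only an illustration for the Arith. Exp. grammar with operands a, b, c; the function name count_parse_trees is hypothetical, and the method (try every rule and every split point, with memoization) is a brute-force count, not a general parsing algorithm from the notes.

```python
from functools import lru_cache

OPERANDS = ("a", "b", "c")


def count_parse_trees(tokens):
    """Count the parse trees that derive the token sequence from E."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] in OPERANDS else 0      # E ----> a | b | c
        total = 0
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                  # E ----> ( E )
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):                   # E ----> E + E | E * E
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


if __name__ == "__main__":
    print(count_parse_trees("a + b * c".split()))      # 2
    print(count_parse_trees("( a + b ) * c".split()))  # 1
```

A count above 1 for any single sentence is enough to show that the grammar is ambiguous.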

6. Unambiguous CF Grammars.

Question: Given a sentence, construct the leftmost derivation for it and the parse
tree (both unique).

Grammar: Arith. Exp.

E ----> E + E
E ----> E * E
E ----> ( E )
E ----> a | b | c | ...

Leftmost derivation sequence:

E ===> E * E ===> ( E ) * E ===> ( E + E ) * E ===> ( a + E ) * E
  ===> ( a + b ) * E ===> ( a + b ) * c

Parse Tree:

The sentence ( a + b ) * c has a unique leftmost derivation, a unique (different)
rightmost derivation and the unique parse tree shown below:

Parse Tree: ( a + b ) * c

          E
         /|\
        / | \
       E  *  E
      /|\    |
     / | \   |
    (  E  )  c
      /|\
     / | \
    E  +  E
    |     |
    a     b

7. Reverse Polish Notation.

What it is.

RPN is a parenthesis-free notation for expressions, in which each operator comes
after (to the right of) its operands.

Question: Give the value of an RPN expression involving integer constants.

We might use arithmetic expressions with operators + - * / ^, with parentheses
( ), and with integers, floats (or even identifiers) as operands. Suppose one
wants to evaluate (find the value of) such an expression, say 3+4*5 or 3*4+5. This
topic is often covered in beginning programming or in data structures. The
method used is a combination of two algorithms:
1. Translate the arithmetic expression to RPN. The examples above become
3 4 5 * + and 3 4 * 5 +, respectively.
2. Evaluate the RPN. In the first example, push the first three operands on a
stack, apply * to the top two operands to get 4*5=20, and push this. Then
apply + to the top two operands (there are only two) to get 3+20=23, the
final result. In the second example, push the first two operands and apply *
to them (there are only two), leaving 12 on the stack. Then push the
remaining operand 5 and apply + to both stack elements to get 12+5=17.
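
The two algorithms can be sketched as below. The notes do not name the translation method; this sketch uses the standard operator-stack (shunting-yard) approach, and it assumes the usual precedences, a right-associative ^, integer operands, and tokens already separated by spaces. The names to_rpn, eval_rpn, PRECEDENCE, RIGHT_ASSOC, and APPLY are hypothetical.

```python
import operator

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOC = {"^"}  # assumption: ^ groups to the right
APPLY = {"+": operator.add, "-": operator.sub, "*": operator.mul,
         "/": operator.truediv, "^": operator.pow}


def to_rpn(tokens):
    """Translate a list of infix tokens to a list of RPN tokens."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators that must be applied before this one.
            while stack and stack[-1] in PRECEDENCE and (
                PRECEDENCE[stack[-1]] > PRECEDENCE[tok]
                or (PRECEDENCE[stack[-1]] == PRECEDENCE[tok]
                    and tok not in RIGHT_ASSOC)
            ):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()                      # discard the "("
        else:
            output.append(tok)               # operand
    while stack:
        output.append(stack.pop())
    return output


def eval_rpn(tokens):
    """Evaluate RPN tokens with a stack; operands are integer constants."""
    stack = []
    for tok in tokens:
        if tok in APPLY:
            right = stack.pop()
            left = stack.pop()
            stack.append(APPLY[tok](left, right))
        else:
            stack.append(int(tok))
    return stack.pop()


if __name__ == "__main__":
    for text in ["3 + 4 * 5", "3 * 4 + 5"]:
        rpn = to_rpn(text.split())
        print(text, "->", " ".join(rpn), "=", eval_rpn(rpn))
```

The sketch does no error checking, so unbalanced parentheses or a malformed RPN string will fail with an IndexError rather than a helpful message.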

8. Shift-Reduce Parsers

Question: Given a grammar, an S-R table, and a sentence in the language
generated by the grammar, use the grammar and the table to construct a
rightmost derivation backwards.

Consider the following grammar:

Grammar: Arithmetic Expressions

P ---> E          (P is the start symbol)
E ---> E+T | T
T ---> T*F | F
F ---> ( E ) | id

Use the following table for this grammar:


Parser: Shift-Reduce Table
(rows: symbol on top of the stack; columns: current input symbol)

     | id  |  *  |  +  |  (  |  )  |  $  |
-----+-----+-----+-----+-----+-----+-----+
P    |     |     |     |     |     | acc |
E    |     |     |  s  |     |  s  |  r  |
T    |     |  s  |  r  |     |  r  |  r  |
F    |     |  r  |  r  |     |  r  |  r  |
id   |     |  r  |  r  |     |  r  |  r  |
*    |  s  |     |     |  s  |     |     |
+    |  s  |     |     |  s  |     |     |
(    |  s  |     |     |  s  |     |     |
)    |     |  r  |  r  |     |  r  |  r  |
$    |  s  |     |     |  s  |     |     |
-----+-----+-----+-----+-----+-----+-----+
s = "shift", r = "reduce", acc = "accept"

The table below shows the shift-reduce parse of the following sentence, showing the
stack, current symbol, remaining symbols, and next action to take at each stage. (This
sentence has the extra artificial symbol $ added at the beginning and the end.)
Input Sentence

$ ( id + id ) * id $

(You should initially shift the starting $.)

Shift-Reduce Actions

Stack             Curr   Rest of Input       Action
(top at right)    Sym
--------------------------------------------------------------------------
$                 (      id + id ) * id $    shift
$ (               id     + id ) * id $       shift
$ ( id            +      id ) * id $         reduce: F ---> id
$ ( F             +      id ) * id $         reduce: T ---> F
$ ( T             +      id ) * id $         reduce: E ---> T
$ ( E             +      id ) * id $         shift
$ ( E +           id     ) * id $            shift
$ ( E + id        )      * id $              reduce: F ---> id
$ ( E + F         )      * id $              reduce: T ---> F
$ ( E + T         )      * id $              reduce: E ---> E + T
$ ( E             )      * id $              shift
$ ( E )           *      id $                reduce: F ---> ( E )
$ F               *      id $                reduce: T ---> F
$ T               *      id $                shift
$ T *             id     $                   shift
$ T * id          $                          reduce: F ---> id
$ T * F           $                          reduce: T ---> T * F
$ T               $                          reduce: E ---> T
$ E               $                          reduce: P ---> E
$ P               $                          accept

Notice that the sequence of reductions gives the following rightmost derivation in
reverse:
Rightmost Derivation
( id + id ) * id
P ===> E
  ===> T
  ===> T * F
  ===> T * id
  ===> F * id
  ===> ( E ) * id
  ===> ( E + T ) * id
  ===> ( E + F ) * id
  ===> ( E + id ) * id
  ===> ( T + id ) * id
  ===> ( F + id ) * id
  ===> ( id + id ) * id
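
The table-driven parse can be sketched in a few lines. Two things here are assumptions rather than part of the notes: on a reduce, the sketch tries the longest right-hand sides first (so E + T is chosen over T), and it accepts only when the stack is exactly $ P. The names TABLE, RULES, and sr_parse are hypothetical.

```python
REDUCE = {"*": "r", "+": "r", ")": "r", "$": "r"}
SHIFT = {"id": "s", "(": "s"}
# Rows: symbol on top of the stack. Columns: current input symbol.
TABLE = {
    "P": {"$": "acc"},
    "E": {"+": "s", ")": "s", "$": "r"},
    "T": {"*": "s", "+": "r", ")": "r", "$": "r"},
    "F": REDUCE, "id": REDUCE, ")": REDUCE,
    "*": SHIFT, "+": SHIFT, "(": SHIFT, "$": SHIFT,
}
# Longest right-hand sides first, so a full handle wins over a suffix of it.
RULES = [
    ("E", ["E", "+", "T"]), ("T", ["T", "*", "F"]), ("F", ["(", "E", ")"]),
    ("E", ["T"]), ("T", ["F"]), ("F", ["id"]), ("P", ["E"]),
]


def sr_parse(tokens):
    """Parse tokens (with $ at both ends); return the reductions in order."""
    stack = [tokens[0]]               # initially shift the starting $
    rest = list(tokens[1:])
    reductions = []
    while rest:
        action = TABLE.get(stack[-1], {}).get(rest[0])
        if action == "s":
            stack.append(rest.pop(0))
        elif action == "r":
            for lhs, rhs in RULES:
                if stack[-len(rhs):] == rhs:
                    stack[-len(rhs):] = [lhs]
                    reductions.append(f"{lhs} ---> {' '.join(rhs)}")
                    break
            else:
                raise SyntaxError(f"no rule matches the stack {stack}")
        elif action == "acc" and stack == ["$", "P"]:
            return reductions
        else:
            raise SyntaxError(f"stuck with stack {stack}, input {rest}")
    raise SyntaxError("input must end with $")


if __name__ == "__main__":
    steps = sr_parse("$ ( id + id ) * id $".split())
    # Reading the reductions backwards gives the rightmost derivation.
    for step in reversed(steps):
        print(step)
```

Printing the reductions in reverse order lists the rules in the order a rightmost derivation would apply them.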

9. Semantic Actions

The table below shows the shift-reduce parse of the same sentence, showing the
stack, current symbol, remaining symbols, and next action to take at each stage.
The semantic tags are shown below the id items and the stack items (in red in
the original).
Shift-Reduce Actions (tag field below stack)

Stack             Curr   Rest of Input       Action
(top at right)    Sym
--------------------------------------------------------------------------
$                 (      id + id ) * id $    shift
$ (               id     + id ) * id $       shift
$ ( id            +      id ) * id $         reduce: F ---> id
    a
$ ( F             +      id ) * id $         reduce: T ---> F
    a
$ ( T             +      id ) * id $         reduce: E ---> T
    a
$ ( E             +      id ) * id $         shift
    a
$ ( E +           id     ) * id $            shift
    a
$ ( E + id        )      * id $              reduce: F ---> id
    a   b
$ ( E + F         )      * id $              reduce: T ---> F
    a   b
$ ( E + T         )      * id $              reduce: E ---> E + T
    a   b                                    [output("t1 = a + b;");]
$ ( E             )      * id $              shift
    t1
$ ( E )           *      id $                reduce: F ---> ( E )
    t1
$ F               *      id $                reduce: T ---> F
  t1
$ T               *      id $                shift
  t1
$ T *             id     $                   shift
  t1
$ T * id          $                          reduce: F ---> id
  t1  c
$ T * F           $                          reduce: T ---> T * F
  t1  c                                      [output("t2 = t1 * c;");]
$ T               $                          reduce: E ---> T
  t2
$ E               $                          reduce: P ---> E
  t2
$ P               $                          accept
  t2                                         [output("print(t2);");]
Arithmetic expression: $ ( a + b ) * c $
Translation to intermediate code:
t1 = a + b;
t2 = t1 * c;
print(t2);
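
The tagging scheme can be sketched by storing a tag next to each stack symbol. This is an illustration under assumptions, not the course's implementation: any token other than $ ( ) + * is treated as an id whose tag is its own name, longest right-hand sides are tried first on a reduce, and the names TABLE, RULES, and translate are hypothetical.

```python
REDUCE = {"*": "r", "+": "r", ")": "r", "$": "r"}
SHIFT = {"id": "s", "(": "s"}
# Rows: symbol on top of the stack. Columns: current input symbol.
TABLE = {
    "P": {"$": "acc"},
    "E": {"+": "s", ")": "s", "$": "r"},
    "T": {"*": "s", "+": "r", ")": "r", "$": "r"},
    "F": REDUCE, "id": REDUCE, ")": REDUCE,
    "*": SHIFT, "+": SHIFT, "(": SHIFT, "$": SHIFT,
}
RULES = [
    ("E", ["E", "+", "T"]), ("T", ["T", "*", "F"]), ("F", ["(", "E", ")"]),
    ("E", ["T"]), ("T", ["F"]), ("F", ["id"]), ("P", ["E"]),
]
PUNCTUATION = {"$", "(", ")", "+", "*"}


def translate(tokens):
    """Shift-reduce parse with semantic tags; return intermediate code lines."""
    code = []
    stack = [("$", None)]             # (symbol, tag) pairs; $ already shifted
    rest = list(tokens[1:])
    temps = 0
    while rest:
        curr = rest[0]
        symbol = curr if curr in PUNCTUATION else "id"
        action = TABLE.get(stack[-1][0], {}).get(symbol)
        if action == "s":
            rest.pop(0)
            stack.append((symbol, curr if symbol == "id" else None))
        elif action == "r":
            for lhs, rhs in RULES:
                top = stack[-len(rhs):]
                if [sym for sym, _ in top] != rhs:
                    continue
                tags = [tag for _, tag in top]
                if len(rhs) == 3 and rhs[1] in ("+", "*"):
                    temps += 1        # semantic action: emit one instruction
                    tag = f"t{temps}"
                    code.append(f"{tag} = {tags[0]} {rhs[1]} {tags[2]};")
                elif rhs[0] == "(":
                    tag = tags[1]     # F ---> ( E ) passes the tag of E up
                else:
                    tag = tags[0]     # single-symbol rules pass the tag up
                stack[-len(rhs):] = [(lhs, tag)]
                break
            else:
                raise SyntaxError("no rule matches the top of the stack")
        elif action == "acc" and [sym for sym, _ in stack] == ["$", "P"]:
            code.append(f"print({stack[-1][1]});")
            return code
        else:
            raise SyntaxError(f"stuck at input symbol {curr!r}")
    raise SyntaxError("input must end with $")


if __name__ == "__main__":
    print("\n".join(translate("$ ( a + b ) * c $".split())))
```

The only rules that emit code are the two with an operator; every other reduction simply copies a tag upward, which matches the tag rows in the table.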

10. Recursive-Descent Parsers

A recursive-descent parser is a top-down parser, so called because it builds a parse tree
from the top (the start symbol) down, and from left to right, using an input sentence as
a target as it is scanned from left to right. The actual tree is not constructed but is
implicit in a sequence of function calls.
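
A minimal sketch of such a parser follows, with one function per non-terminal. It assumes the arithmetic grammar from the shift-reduce section with its left recursion replaced by loops (E is a T followed by any number of + T, and T is an F followed by any number of * F), since a recursive-descent function cannot call itself first without looping forever. The class and method names are hypothetical, and the sketch returns a nested tuple only to make the implicit tree visible.

```python
class Parser:
    """Recursive-descent parser: one method per non-terminal."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks the end of the input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def parse_E(self):                       # E: T { + T }
        node = self.parse_T()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.parse_T())
        return node

    def parse_T(self):                       # T: F { * F }
        node = self.parse_F()
        while self.peek() == "*":
            self.pos += 1
            node = ("*", node, self.parse_F())
        return node

    def parse_F(self):                       # F: ( E ) | id
        token = self.peek()
        if token == "(":
            self.pos += 1
            node = self.parse_E()
            self.expect(")")
            return node
        if token.isalnum():
            self.pos += 1
            return token
        raise SyntaxError(f"unexpected token {token!r}")


def parse(tokens):
    parser = Parser(tokens)
    tree = parser.parse_E()
    parser.expect("$")                       # all input must be consumed
    return tree


if __name__ == "__main__":
    print(parse("( a + b ) * c".split()))
    print(parse("a + b * c".split()))
```

The chain of calls parse_E, parse_T, parse_F mirrors the path from the root of the parse tree down to each leaf.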
