Language Translation Principles PT 1

The document discusses principles of language translation, including: 1) The key attributes of a language are its syntax and semantics: syntax defines the rules for combining tokens, while semantics defines their logical meaning. 2) Tools such as parsers and code generators translate a program from a high-level to a low-level language by checking its syntax and replacing it with equivalent low-level code. 3) Grammars and regular expressions can be used to formally specify the syntax of a language; a grammar defines a language through a set of production rules.

Uploaded by Adella Rosanauli Aritonang


Language Translation Principles
Part 1: Language Specification
Attributes of a language
• Syntax: rules describing use of language
tokens
• Semantics: logical meaning of
combinations of tokens
• In a programming language, “tokens”
include identifiers, keywords, and
punctuation
Linguistic correctness
• A syntactically correct program is one in
which the tokens are arranged so that the
code can be successfully translated into a
lower-level language
• A semantically correct program is one
that produces correct results
Language translation tools
• Parser: scans source code, compares with
established syntax rules
• Code generator: replaces high level
source code with semantically equivalent
low level code
Techniques to describe syntax of a
language
• Grammars: specify how you combine atomic
elements of language (characters) to form legal
strings (words, sentences)
• Finite State Machines: specify the syntax of a
language with a state-transition diagram, in which
states are connected by transitions labeled with characters
• Regular Expressions: symbolic representation of
patterns describing strings; applications include
forming search queries as well as language
specification
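A minimal sketch of the regular-expression technique, using Python's standard `re` module (an assumption; the slides do not name a tool). The two patterns describe toy versions of languages that are specified with grammars later in these notes: signed integers, and identifiers over the letters a-c and digits 1-3. The names `SIGNED_INTEGER`, `IDENTIFIER`, and `matches` are hypothetical.

```python
import re

# Optional sign followed by one or more decimal digits, e.g. "14", "-7", "+305".
SIGNED_INTEGER = re.compile(r"[+-]?[0-9]+")

# A letter from {a, b, c}, then any mix of those letters and the digits 1-3.
IDENTIFIER = re.compile(r"[abc][abc123]*")


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the whole of `text` is described by `pattern`."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for s in ["14", "-7", "+", "7-"]:
        print(s, matches(SIGNED_INTEGER, s))
    for s in ["a12bc", "2a"]:
        print(s, matches(IDENTIFIER, s))
```

`fullmatch` is used so that the entire string, not just a prefix, must fit the pattern.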
Elements of a Language
• Alphabet: finite, non-empty set of characters
– not precisely the same thing we mean when we
speak of natural language alphabet
– for example, the alphabet of C++ includes the upper-
and lowercase letters of the English alphabet, the
digits 0-9, and the following punctuation symbols:
{ } [ ] ( ) + - * / % = > < ! & | ' " , . : ; _ \
– Pep/8 alphabet is similar, but uses fewer punctuation symbols
– Language of real numbers has its own alphabet; the
set of characters {0,1,2,3,4,5,6,7,8,9,+,-,.}
Language as ADT
• A language is an example of an Abstract
Data Type (ADT)
• An ADT has these characteristics:
– Set of possible values (an alphabet)
– Set of operations on those values
• One of the operations on the set of values
in a language is concatenation
Concatenation
• Concatenation is the joining of two or more
characters to form a string
• Many programming language tokens are formed
this way; for example:
– > and = form >=
– & and & form &&
– 1, 2, 3 and 4 form 1234
• Concatenation always involves two operands –
either one can be a string or a single character
String characteristics
• The number of characters in a string is the
string’s length
• An empty string is a string with length 0;
we denote the empty string with the
symbol ε
• ε is the identity element for
concatenation; if x is a string, then:
εx = xε = x
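Python strings can illustrate these definitions directly: in this sketch, `+` plays the role of concatenation and `""` plays the role of ε. The helper `concat` and the constant `EPSILON` are hypothetical names introduced only for illustration.

```python
EPSILON = ""  # the empty string, written ε on the slides


def concat(x: str, y: str) -> str:
    """Concatenation: two operands, each a single character or a string."""
    return x + y


if __name__ == "__main__":
    print(concat(">", "="))                             # >=
    print(concat("&", "&"))                             # &&
    print(concat(concat(concat("1", "2"), "3"), "4"))   # 1234
    x = "a12bc"
    print(len(x), len(EPSILON))                         # 5 0
    print(concat(EPSILON, x) == x == concat(x, EPSILON))  # True
```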
Closure of an alphabet
• The set of all possible strings that can be formed by
concatenating elements from an alphabet is the
alphabet’s closure, denoted T* for some
alphabet T
• The closure of an alphabet includes strings that
are not valid tokens in the language; it is an
infinite set
• For example, if R is the real number alphabet,
then R* includes:
-0.092 and 563.18 but also
.0.0.- and 2-4-2.9..-5.
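Because T* is infinite, a program can only enumerate it lazily. The sketch below (the helper names `closure` and `in_closure` are hypothetical) lists strings in order of increasing length and checks membership in R*; malformed strings such as .0.0.- are still members of R*, because membership only requires every character to come from the alphabet.

```python
from itertools import count, islice, product
from typing import Iterable, Iterator


def closure(alphabet: Iterable[str]) -> Iterator[str]:
    """Yield the strings of T* (T = alphabet) in order of length.

    T* is infinite, so this is a lazy generator: first ε (the empty string),
    then every string of length 1, then length 2, and so on. The alphabet
    must be non-empty, as the definition of an alphabet requires.
    """
    symbols = sorted(set(alphabet))
    for length in count(0):
        for combo in product(symbols, repeat=length):
            yield "".join(combo)


def in_closure(s: str, alphabet: Iterable[str]) -> bool:
    """s is in T* exactly when every character of s belongs to T."""
    allowed = set(alphabet)
    return all(ch in allowed for ch in s)


if __name__ == "__main__":
    print(list(islice(closure("01"), 7)))  # ['', '0', '1', '00', '01', '10', '11']
    R = "0123456789+-."                    # the real-number alphabet
    for s in ["-0.092", "563.18", ".0.0.-", "2-4-2.9..-5."]:
        print(s, in_closure(s, R))          # all True: in R*, valid or not
```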
Languages & Grammars
• A language is a subset of the closure of an
alphabet
• A grammar specifies how to concatenate
symbols from an alphabet to form legal
strings in a language
Parts of a grammar
• N: a nonterminal alphabet; each element
of N represents a group of characters
from T
• T: a terminal alphabet
• P: a set of rules for string production; uses
nonterminals to describe language structure
• S: the start symbol, an element of N
Terminal vs. non-terminal symbols
• A non-terminal symbol is used to describe
or represent a set of terminal symbols
• For example, the following standard data
types are terminal symbols in C++ and
Java: int, double, float, char
• The non-terminal symbol <type-specifier>
could be used to represent any or all of
these
Valid strings
• S (the start symbol) is a single symbol, not
a set
• Given S and P (rules for production), you
can decide whether a string of symbols is a
valid string in the language
• Conversely, if you can start from S and
generate a string of terminal symbols using P,
that string is a valid string in the language
Productions
A -> w
• A is a non-terminal
• -> means “produces”
• w is a string of terminals & non-terminals
Derivations
• A grammar specifies a language through
the derivation process:
– begin with the start symbol
– substitute for non-terminals using rules of
production until you get a string of terminals
Example: a grammar for identifiers
(a toy example)
• N = {<identifier>, <letter>, <digit>}
• T = {a, b, c, 1, 2, 3}
• P = the productions (-> means “produces”):
1. <identifier> -> <letter>
2. <identifier> -> <identifier><letter>
3. <identifier> -> <identifier><digit>
4. <letter> -> a
5. <letter> -> b
6. <letter> -> c
7. <digit> -> 1
8. <digit> -> 2
9. <digit> -> 3
• S = <identifier>
Example: deriving a12bc (=> means “derives in one step”):
<identifier> => <identifier><letter> (rule 2)
=> <identifier>c (rule 6)
=> <identifier><letter>c (rule 2)
=> <identifier>bc (rule 5)
=> <identifier><digit>bc (rule 3)
=> <identifier>2bc (rule 8)
=> <identifier><digit>2bc (rule 3)
=> <identifier>12bc (rule 7)
=> <letter>12bc (rule 1)
=> a12bc (rule 4)
Closure of derivation
• The symbol =>* means “derives in 0 or more
steps”
• A language specified by a grammar consists of
all strings derivable from the start symbol using
the rules of production
– provides operational test for membership in the
language
– if a string can’t be derived using production rules, it
isn’t in the language
Example: attempting to derive 2a
<identifier> => <identifier><letter>
=> <identifier>a
• Since there is no <identifier> -> <digit> rule
among the productions, the remaining <identifier>
can never be replaced by the digit 2, so we can’t
proceed any further
• This means that 2a isn’t a valid string in our
language
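The rules of the toy grammar can be read backwards to get a recognizer: strip a trailing letter or digit (rules 2 and 3) until a single letter remains (rule 1). This is a minimal sketch; the names `is_identifier`, `ID_LETTERS`, and `ID_DIGITS` are hypothetical, and the recursion depth grows with the string length, which is fine for short tokens.

```python
ID_LETTERS = {"a", "b", "c"}   # rules 4-6: <letter> -> a | b | c
ID_DIGITS = {"1", "2", "3"}    # rules 7-9: <digit>  -> 1 | 2 | 3


def is_identifier(s: str) -> bool:
    """Decide whether s can be derived from <identifier> in the toy grammar.

    Works backwards through the rules:
      rule 1: <identifier> -> <letter>             (a single letter)
      rule 2: <identifier> -> <identifier><letter> (strip a trailing letter)
      rule 3: <identifier> -> <identifier><digit>  (strip a trailing digit)
    """
    if len(s) == 1:
        return s in ID_LETTERS                 # only rule 1 can finish
    if len(s) > 1 and (s[-1] in ID_LETTERS or s[-1] in ID_DIGITS):
        return is_identifier(s[:-1])           # undo rule 2 or rule 3
    return False                               # empty string or bad symbol


if __name__ == "__main__":
    for s in ["a12bc", "2a", "c3", "a4"]:
        print(s, is_identifier(s))
```

For 2a, stripping the trailing a leaves the single symbol 2, which is not a letter, so no derivation exists.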
A grammar for signed integers
• N = {I, F, M}
– I means integer
– F means first symbol; optional sign
– M means magnitude
• T = {+,-,d} (d means digit 0-9)
• P = the productions:
1. I -> FM
2. F -> +
3. F -> -
4. F -> ε (means +/- is optional)
5. M -> dM
6. M -> d
• S=I
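A minimal recursive-descent sketch of the signed-integer grammar, with one helper per nonterminal; it accepts exactly the strings I can derive, with d taken to be any decimal digit. The function names and `DECIMAL_DIGITS` are hypothetical, and M's recursion (M -> dM) is written as a loop, which accepts the same strings.

```python
from typing import Optional

DECIMAL_DIGITS = set("0123456789")   # the terminal d stands for any one of these


def parse_F(s: str, i: int) -> int:
    """F -> + | - | ε : skip an optional sign and return the new position."""
    if i < len(s) and s[i] in "+-":
        return i + 1          # rule 2 (F -> +) or rule 3 (F -> -)
    return i                  # rule 4 (F -> ε): consume nothing


def parse_M(s: str, i: int) -> Optional[int]:
    """M -> dM | d : consume one or more digits; None if there are none."""
    if i >= len(s) or s[i] not in DECIMAL_DIGITS:
        return None           # both M rules need at least one d
    i += 1
    while i < len(s) and s[i] in DECIMAL_DIGITS:
        i += 1                # rule 5 (M -> dM), written as a loop
    return i                  # rule 6 (M -> d) ends the magnitude


def is_signed_integer(s: str) -> bool:
    """I -> FM : valid only if F followed by M consumes the whole string."""
    return parse_M(s, parse_F(s, 0)) == len(s)


if __name__ == "__main__":
    for s in ["14", "-7", "+305", "-", "7-"]:
        print(s, is_signed_integer(s))
```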
Examples
• Deriving 14:
I => FM
=> εM
=> dM
=> dd
=> 14
• Deriving -7:
I => FM
=> -M
=> -d
=> -7
Recursive rules
• Both of the previous examples (identifiers,
integers) have rules in which a
nonterminal is defined in terms of itself:
– <identifier> -> <identifier><letter> and
– M -> dM
• Such rules produce languages with infinite
sets of legal sentences
Context-sensitive grammar
• A grammar in which the production rules
may contain more than one symbol (terminal
or non-terminal) on the left side
• By contrast, grammars whose production rules
are restricted to a single non-terminal on the
left side (all of the examples we have seen
thus far) are known as context-free
grammars
Example
• N = {A,B,C}
• T = {a,b,c}
• P = the productions:
1. A -> aABC
2. A -> abC
3. CB -> BC
4. bB -> bb
5. bC -> bc (this rule is context-sensitive: C can be
substituted with c only if C is immediately
preceded by b)
6. cC -> cc
• S=A
Context-sensitive grammar
• N = {A, B, C}
• T = {a, b, c}
• P = the productions:
1. A -> aABC
2. A -> abC
3. CB -> BC
4. bB -> bb
5. bC -> bc
6. cC -> cc
• S = A
Example: aaabbbccc is a valid string by:
A => aABC (1)
=> aaABCBC (1)
=> aaabCBCBC (2)
=> aaabBCCBC (3)
=> aaabBCBCC (3)
=> aaabBBCCC (3)
=> aaabbBCCC (4)
=> aaabbbCCC (4)
=> aaabbbcCC (5)
Here, we substituted c for C; this is allowable only if C has b in front of it
=> aaabbbccC (6)
=> aaabbbccc (6)
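To check a derivation like the one above mechanically, each numbered rule can be applied as a string rewrite. This sketch assumes every grammar symbol is a single character (upper case for nonterminals) and that each step rewrites the leftmost occurrence of the rule's left side, which happens to reproduce the slide's steps; the names `RULES`, `apply_rule`, and `derive` are hypothetical.

```python
from typing import Dict, List, Tuple

# Productions of the context-sensitive example, numbered as on the slide.
# Assumption: every symbol is one character; upper case marks nonterminals.
RULES: Dict[int, Tuple[str, str]] = {
    1: ("A", "aABC"),
    2: ("A", "abC"),
    3: ("CB", "BC"),
    4: ("bB", "bb"),
    5: ("bC", "bc"),
    6: ("cC", "cc"),
}


def apply_rule(form: str, rule: int) -> str:
    """Rewrite the leftmost occurrence of the rule's left side."""
    lhs, rhs = RULES[rule]
    pos = form.find(lhs)
    if pos < 0:
        raise ValueError(f"rule {rule} ({lhs} -> {rhs}) does not apply to {form!r}")
    return form[:pos] + rhs + form[pos + len(lhs):]


def derive(start: str, steps: List[int]) -> List[str]:
    """Return every sentential form produced by applying `steps` in order."""
    forms = [start]
    for rule in steps:
        forms.append(apply_rule(forms[-1], rule))
    return forms


if __name__ == "__main__":
    for form in derive("A", [1, 1, 2, 3, 3, 3, 4, 4, 5, 6, 6]):
        print(form)
```

Because rule 5 needs the two-symbol left side bC, `apply_rule` raises an error if C is not immediately preceded by b, which mirrors the context restriction described on the slide.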
Valid & invalid strings from the
previous example:
• Valid:
– abc
– aabbcc
– aaabbbccc
– aaaabbbbcccc
• Invalid:
– aabc
– cba
– bbbccc

The grammar describes a language consisting of strings that start with some number n of a’s, followed by exactly n b’s and then exactly n c’s; this language can be defined mathematically as:

$L = \{a^n b^n c^n \mid n > 0\}$

Note: $a^n$ means the concatenation of n a’s
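Membership in L can also be tested directly from the set definition, without the grammar. The function below is a hypothetical helper, not a parser: it compares the string with n a's, n b's, and n c's, where n is one third of the string's length.

```python
def in_language(s: str) -> bool:
    """True iff s consists of n a's, then n b's, then n c's, for some n > 0."""
    n, remainder = divmod(len(s), 3)
    return n > 0 and remainder == 0 and s == "a" * n + "b" * n + "c" * n


if __name__ == "__main__":
    for s in ["abc", "aabbcc", "aaabbbccc", "aaaabbbbcccc", "aabc", "cba", "bbbccc"]:
        print(s, in_language(s))
```

It agrees with the valid and invalid strings listed on the previous slide.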

A grammar for expressions
N = {E, T, F} where:
E: expression
T: term
F: factor
T = {+, *, (, ), a} (the terminal alphabet)
P: the productions:
1. E -> E + T
2. E -> T
3. T -> T * F
4. T -> F
5. F -> (E)
6. F -> a
S=E
Applying the grammar
• If a string is not valid, no sequence of
substitutions can derive it, but the opposite is
not true: a valid string can still lead to a dead
end if you choose the wrong rules
• For example, suppose we want to parse the string
(a * a) + a using the grammar we just saw
• First attempt:
E => T (by rule 2)
=> F (by rule 4)
… and we’re stuck, because F can only produce (E) or
a, and neither can match (a * a) + a; so we reach a
dead end, even though the string is valid
Applying the grammar
• Here’s a parse that works for (a*a)+a:
E => E+T (rule 1)
=> T+T (rule 2)
=> F+T (rule 4)
=> (E)+T (rule 5)
=> (T)+T (rule 2)
=> (T*F)+T (rule 3)
=> (T*a)+T (rule 6)
=> (F*a)+F (rule 4 applied twice)
=> (a*a)+a (rule 6 applied twice)
Deriving a valid string from a
grammar
• Arbitrarily pick a nonterminal in the current
intermediate string & select rules for
substitution until you get a string of terminals
• Automatic translators have a more difficult
problem:
– given a string of terminals, determine whether the
string is valid, then produce matching object code
– the only way to determine string validity is to
derive it from the start symbol of the grammar;
this is called parsing
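One systematic alternative to picking rules arbitrarily is to explore every choice. The sketch below runs a breadth-first search over leftmost derivations of the expression grammar and collects every sentence up to a length bound; the grammar encoding (one character per symbol, upper case for nonterminals) and the name `sentences` are assumptions. The search terminates because no rule shortens a sentential form, so only finitely many forms fit under the bound, and the `seen` set prevents revisiting any of them.

```python
from collections import deque
from typing import Dict, List, Set

# Expression grammar from the slides; E, T, F are the nonterminals.
GRAMMAR: Dict[str, List[str]] = {
    "E": ["E+T", "T"],
    "T": ["T*F", "F"],
    "F": ["(E)", "a"],
}


def sentences(start: str, max_len: int) -> Set[str]:
    """All terminal strings of length <= max_len derivable from `start`.

    Breadth-first search over leftmost derivations: at each step the
    leftmost nonterminal is replaced by each of its right-hand sides.
    """
    found: Set[str] = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        idx = next((i for i, ch in enumerate(form) if ch in GRAMMAR), None)
        if idx is None:              # no nonterminals left: a sentence
            found.add(form)
            continue
        for rhs in GRAMMAR[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # No rule shrinks a form, so forms over the bound are pruned.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


if __name__ == "__main__":
    print(sorted(sentences("E", 3)))
    print("(a*a)+a" in sentences("E", 7))
```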
The parsing problem
• Automatic translators aren’t at liberty to
pick rules randomly (as illustrated by the
first attempt to parse the preceding
expression)
• A parsing algorithm must search for the right
sequence of substitutions to derive a
proposed string
• The translator must also be able to prove that
no derivation exists if the proposed string is
not valid
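A minimal sketch of one parsing algorithm, recursive descent, used as a recognizer for the expression grammar. This is a swapped-in technique (the slides have not yet named an algorithm), and it replaces the left-recursive rules with equivalent loops, since E -> E + T would make a naive recursive-descent parser call itself forever. The names `is_expression` and `ParseError` are hypothetical.

```python
class ParseError(Exception):
    """Raised when the input cannot be derived from E."""


def is_expression(text: str) -> bool:
    """Return True iff `text` can be derived from the start symbol E.

    Uses the equivalent iterative rules
        E -> T { + T }    T -> F { * F }    F -> ( E ) | a
    which accept exactly the same strings as the left-recursive grammar.
    """
    s = text.replace(" ", "")   # blanks are not part of the grammar
    pos = 0

    def peek() -> str:
        return s[pos] if pos < len(s) else ""

    def expect(ch: str) -> None:
        nonlocal pos
        if peek() != ch:
            raise ParseError(f"expected {ch!r} at position {pos}")
        pos += 1

    def parse_E() -> None:
        parse_T()
        while peek() == "+":
            expect("+")
            parse_T()

    def parse_T() -> None:
        parse_F()
        while peek() == "*":
            expect("*")
            parse_F()

    def parse_F() -> None:
        if peek() == "(":
            expect("(")
            parse_E()
            expect(")")
        else:
            expect("a")

    try:
        parse_E()
    except ParseError:
        return False          # no derivation exists
    return pos == len(s)      # every character must be consumed
```

Unlike the arbitrary first attempt on the earlier slide, the parser decides between F -> (E) and F -> a by looking at the next input character, so it never has to guess.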
Syntax tree
• A parse (the derivation of a string) can be
represented as a tree
– start symbol is the root
– interior nodes are nonterminal symbols
– leaf nodes are terminal symbols
– children of an interior node are symbols from
right side of production rule substituted for
parent node in derivation
[Figure: syntax tree for (a*a)+a]
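Since the figure is not reproduced here, this sketch computes the syntax tree instead, following the tree rules above: each interior node is a nonterminal whose children are the right side of the rule applied to it, and the leaves are terminals. It handles the left-recursive rules with loops that still build left-nested E -> E + T and T -> T * F nodes. The names `syntax_tree`, `leaves`, `show`, and `Node` are hypothetical.

```python
from typing import List, Tuple, Union

# An interior node is (nonterminal, children); a leaf is a terminal character.
Node = Union[str, Tuple[str, List["Node"]]]


def syntax_tree(text: str) -> Node:
    """Build the syntax tree of `text` for the expression grammar (start symbol E)."""
    s = text.replace(" ", "")
    pos = 0

    def take(ch: str) -> str:
        nonlocal pos
        if pos >= len(s) or s[pos] != ch:
            raise ValueError(f"expected {ch!r} at position {pos}")
        pos += 1
        return ch

    def E() -> Node:
        node: Node = ("E", [T()])                      # rule 2: E -> T
        while pos < len(s) and s[pos] == "+":
            node = ("E", [node, take("+"), T()])       # rule 1: E -> E + T
        return node

    def T() -> Node:
        node: Node = ("T", [F()])                      # rule 4: T -> F
        while pos < len(s) and s[pos] == "*":
            node = ("T", [node, take("*"), F()])       # rule 3: T -> T * F
        return node

    def F() -> Node:
        if pos < len(s) and s[pos] == "(":
            return ("F", [take("("), E(), take(")")])  # rule 5: F -> (E)
        return ("F", [take("a")])                      # rule 6: F -> a

    tree = E()
    if pos != len(s):
        raise ValueError(f"unexpected {s[pos]!r} at position {pos}")
    return tree


def leaves(node: Node) -> str:
    """Concatenate the leaves from left to right."""
    if isinstance(node, str):
        return node
    return "".join(leaves(child) for child in node[1])


def show(node: Node, depth: int = 0) -> None:
    """Print one node per line, indented by its depth below the root."""
    if isinstance(node, str):
        print("  " * depth + node)
        return
    label, children = node
    print("  " * depth + label)
    for child in children:
        show(child, depth + 1)


if __name__ == "__main__":
    show(syntax_tree("(a*a)+a"))
```

The root is the start symbol E, and reading the leaves from left to right with `leaves` gives back the input string.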
Grammar for a programming
language
• A grammar for a subset of the C++
language is laid out on pages 340-341 of
the textbook
• A sampling (suitable for either C++ or
Java) is given on the next couple of slides
Rules for declarations
<declaration> -> <type-specifier><declarator-list>;
<type-specifier> -> char | int | double
(remember, this is a subset of the actual language)
<declarator-list> -> <identifier> |
<declarator-list> , <identifier>
<identifier> -> <letter> |
<identifier><letter> |
<identifier><digit>
<letter> -> a|b|c| … |z|A|B|…|Z
<digit> -> 0|1|2|3|4|5|6|7|8|9
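A minimal sketch of a checker for the declaration rules above. It assumes a simple tokenizer in which tokens are separated by optional whitespace (the grammar itself says nothing about blanks), and it writes the left-recursive <declarator-list> rule as a loop over comma-separated identifiers. `TYPE_SPECIFIERS`, `IDENT`, `TOKEN`, `tokenize`, and `is_declaration` are hypothetical names.

```python
import re
from typing import List

TYPE_SPECIFIERS = {"char", "int", "double"}       # the subset on the slide
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")       # the <identifier> rules
TOKEN = re.compile(r"\s*([A-Za-z0-9]+|[,;]|\S)")  # words, ',' ';', or any other char


def tokenize(text: str) -> List[str]:
    """Split text into words and punctuation; whitespace only separates tokens."""
    return TOKEN.findall(text)


def is_declaration(text: str) -> bool:
    """<declaration> -> <type-specifier><declarator-list>;"""
    tokens = tokenize(text)
    if not tokens or tokens[0] not in TYPE_SPECIFIERS:
        return False
    i = 1
    # <declarator-list> -> <identifier> { , <identifier> }
    while True:
        if i >= len(tokens) or not IDENT.fullmatch(tokens[i]):
            return False
        i += 1
        if i < len(tokens) and tokens[i] == ",":
            i += 1
            continue
        break
    return tokens[i:] == [";"]   # exactly one ';' and nothing after it


if __name__ == "__main__":
    for s in ["int a, b1, count;", "double x;", "float f;", "int a b;"]:
        print(s, is_declaration(s))
```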
Rules for control structures
<selection-statement> ->
if (<expression>) <statement> |
if (<expression>) <statement>
else <statement>
<iteration-statement> ->
while (<expression>) <statement> |
do <statement> while (<expression>) ;
Rules for expressions
<expression-statement> -> <expression> ;
<expression> -> <relational-expression>
| <identifier> = <expression>
<relational-expression> ->
<additive-expression> |
<relational-expression> < <additive-expression> |
<relational-expression> > <additive-expression> |
<relational-expression> <= <additive-expression> |
<relational-expression> >= <additive-expression>
etc.
Backus-Naur Form (BNF)
• BNF is a standard notation for specifying
a programming language by its rules of
production
• In BNF, the -> operator is written ::=
• ALGOL-60 first popularized the form
BNF described in terms of itself
(from Wikipedia)
<syntax> ::= <rule> | <rule> <syntax>
<rule> ::= <opt-whitespace> "<" <rule-name> ">" <opt-whitespace> "::=" <opt-whitespace> <expression> <line-end>
<opt-whitespace> ::= " " <opt-whitespace> | ""
<expression> ::= <list> | <list> "|" <expression>
<line-end> ::= <opt-whitespace> <EOL> | <line-end> <line-end>
<list> ::= <term> | <term> <opt-whitespace> <list>
<term> ::= <literal> | "<" <rule-name> ">"
<literal> ::= '"' <text> '"' | "'" <text> "'"
