PL Lec 2 Syntax and Semantics

This document covers the concepts of syntax and semantics in programming, explaining their differences and the compilation process from source code to machine language. It discusses lexical analysis, tokenization, parsing, and error detection in compilers, as well as the role of regular expressions in pattern matching. Additionally, it addresses semantic analysis and common semantic errors encountered during program execution.

Lesson 2 – Describing Syntax and Semantics

Objectives:
• Differentiate syntax and semantics
• Explain the process/phases of compiling source code
• Perform tokenization in Java
Syntax – the form of the expressions, statements, and program units.
Semantics – the meaning of the expressions, statements, and program units.

• Ex: the syntax of a Java while statement is

  while (boolean_expr) statement

• The semantics of this statement form is that when the current value of the Boolean expression is true, the embedded statement is executed.
• The form of a statement should strongly suggest what the statement is meant to accomplish.
• A lexeme is the lowest-level syntactic unit of a language. It includes identifiers, literals, operators, and special words (e.g., *, sum, begin).
• A program is a string of lexemes.
• A token is a category of lexemes (e.g., identifier).
• An identifier is a token that has lexemes, or instances, such as sum and total.
• Ex: index = 2 * count + 17;

  Lexemes   Tokens
  index     identifier
  =         equal_sign
  2         int_literal
  *         mult_op
  count     identifier
  +         plus_op
  17        int_literal
  ;         semicolon
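As a minimal sketch (our addition, not part of the original slides), the following Java program uses java.util.regex to split that statement into exactly the lexeme/token pairs listed above. The token names come from the table; since Java group names cannot contain underscores, int_literal becomes intliteral, and so on:

  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  public class Tokenizer {
      public static void main(String[] args) {
          String input = "index = 2 * count + 17;";
          // One alternative per token category, named after the table above.
          Pattern p = Pattern.compile(
              "(?<identifier>[A-Za-z_][A-Za-z0-9_]*)"
              + "|(?<intliteral>[0-9]+)"
              + "|(?<equalsign>=)"
              + "|(?<multop>\\*)"
              + "|(?<plusop>\\+)"
              + "|(?<semicolon>;)");
          Matcher m = p.matcher(input);
          while (m.find()) {  // find() skips the whitespace between lexemes
              for (String name : new String[] {"identifier", "intliteral",
                      "equalsign", "multop", "plusop", "semicolon"}) {
                  if (m.group(name) != null) {
                      System.out.println(m.group(name) + "\t" + name);
                  }
              }
          }
      }
  }

Running it prints one lexeme/token pair per line, reproducing the table.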
What Happens When You Run a Program

• A computer doesn’t actually understand the phrase ‘Hello, world!’, and it doesn’t know how to display it on screen. It only understands on and off. So to actually run a command like print 'Hello, world!', it has to translate all the code in a program into a series of ons and offs that it can understand.

• To do that, a number of things happen:
  1. The source code is translated into assembly language.
  2. The assembly code is translated into machine language.
  3. The machine language is directly executed as binary code.

• The language implementation first has to translate the source code into assembly language, a very low-level language that uses words and numbers to represent binary patterns. Depending on the language, this may be done with an interpreter (where the program is translated line by line) or with a compiler (where the program is translated as a whole).
• The assembly code is then sent to the computer’s assembler, which converts it into the machine language that the computer can understand and execute directly as binary code.
Lexical Analysis

Lexical analysis is the first phase of a compiler. It takes the modified source code from language preprocessors, written in the form of sentences. The lexical analyzer breaks these syntaxes into a series of tokens, removing any whitespace or comments in the source code.

If the lexical analyzer finds a token invalid, it generates an error. The lexical analyzer works closely with the syntax analyzer. It reads character streams from the source code, checks for legal tokens, and passes the data to the syntax analyzer when it demands it.
Tokens

Lexemes are said to be a sequence of characters (alphanumeric) in a token. There are some predefined rules for every lexeme to be identified as a valid token. These rules are defined by grammar rules, by means of a pattern. A pattern explains what can be a token, and these patterns are defined by means of regular expressions.
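For instance (a short sketch of our own, not from the slides), the pattern for an identifier token can be written as a regular expression in Java:

  import java.util.regex.Pattern;

  public class IdentifierPattern {
      public static void main(String[] args) {
          // An identifier: a letter or underscore, followed by letters, digits, or underscores.
          Pattern identifier = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
          System.out.println(identifier.matcher("sum").matches());    // true
          System.out.println(identifier.matcher("2count").matches()); // false: starts with a digit
      }
  }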
What is a regular expression?

A regular expression is a sequence of characters that forms a search pattern. When you search for data in a text, you can use this search pattern to describe what you are searching for.

A regular expression can be a single character or a more complicated pattern. Regular expressions can be used to perform all types of text search and text replace operations.

Java does not have a built-in Regular Expression class, but we can import the java.util.regex package to work with regular expressions. The package includes the following classes:

•Pattern Class - Defines a pattern (to be used in a search)


•Matcher Class - Used to search for the pattern
•PatternSyntaxException Class - Indicates syntax error in a
regular expression pattern
Find out if there are any occurrences of the word
"w3schools" in a sentence:
Flags

Flags in the compile() method change how the search is performed. Here are a few of them:

• Pattern.CASE_INSENSITIVE - The case of letters will be ignored when performing a search.
• Pattern.LITERAL - Special characters in the pattern will not have any special meaning and will be treated as ordinary characters when performing a search.
• Pattern.UNICODE_CASE - Use it together with the CASE_INSENSITIVE flag to also ignore the case of letters outside of the English alphabet.
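The "w3schools" example itself is missing from this copy of the slides; the following is a reconstruction (our sketch, matching the standard java.util.regex usage that the next section explains step by step):

  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  public class Main {
      public static void main(String[] args) {
          // Compile the pattern; the flag makes the search case-insensitive.
          Pattern pattern = Pattern.compile("w3schools", Pattern.CASE_INSENSITIVE);
          Matcher matcher = pattern.matcher("Visit W3Schools!");
          boolean matchFound = matcher.find();
          if (matchFound) {
              System.out.println("Match found");
          } else {
              System.out.println("Match not found");
          }
      }
  }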
Example Explained

In this example, the word "w3schools" is searched for in a sentence.

First, the pattern is created using the Pattern.compile() method. The first parameter indicates which pattern is being searched for, and the second parameter has a flag indicating that the search should be case-insensitive. The second parameter is optional.

The matcher() method is used to search for the pattern in a string. It returns a Matcher object which contains information about the search that was performed.

The find() method returns true if the pattern was found in the string and false if it was not found.
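As a further sketch (our addition), flags can be combined with a bitwise OR; the non-English word "äpfel" is our own choice to show the effect of UNICODE_CASE:

  import java.util.regex.Pattern;

  public class FlagsExample {
      public static void main(String[] args) {
          // Default case folding covers only US-ASCII letters, so ä does not match Ä.
          System.out.println(Pattern.compile("äpfel", Pattern.CASE_INSENSITIVE)
                  .matcher("ÄPFEL").find());                          // false
          // Adding UNICODE_CASE also ignores case for letters outside the English alphabet.
          System.out.println(Pattern.compile("äpfel",
                  Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                  .matcher("ÄPFEL").find());                          // true
      }
  }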
Regular Expression Patterns

The first parameter of the Pattern.compile() method is the pattern. It describes what is being searched for.

Brackets are used to find a range of characters: for example, [abc] finds one character from the options between the brackets, [^abc] finds one character NOT between the brackets, and [0-9] finds one character from the range 0 to 9.
Metacharacters

Metacharacters are characters with a special meaning: for example, | finds a match for any one of the patterns it separates, . matches any single character, ^ matches the beginning of a string, $ matches the end of a string, and \d finds a digit.
In a programming language, keywords, constants, identifiers, strings, numbers, operators, and punctuation symbols can be considered tokens. For example, in the C language, the variable declaration line

  int value = 100;

contains the tokens: int (keyword), value (identifier), = (operator), 100 (constant), and ; (symbol).
Specifications of Tokens

Let us understand how language theory defines the following terms:

Alphabets
Any finite set of symbols is an alphabet: {0,1} is the set of binary alphabets, {0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is the set of hexadecimal alphabets, and {a-z, A-Z} is the set of English-language alphabets.

Strings
Any finite sequence of alphabets is called a string. The length of a string is the total number of occurrences of alphabets in it; e.g., the length of the string tutorialspoint is 14, denoted |tutorialspoint| = 14. A string having no alphabets, i.e., a string of zero length, is known as an empty string and is denoted by ε (epsilon).
Special Symbols
A typical high-level language contains the following symbols:

  Arithmetic Symbols    Addition (+), Subtraction (-), Modulo (%), Multiplication (*), Division (/)
  Punctuation           Comma (,), Semicolon (;), Dot (.), Arrow (->)
  Assignment            =
  Special Assignment    +=, /=, *=, -=
  Comparison            ==, !=, <, <=, >, >=
  Preprocessor          #
  Location Specifier    &
  Logical               &, &&, |, ||, !
  Shift Operator        >>, >>>, <<, <<<
Longest Match Rule

• When the lexical analyzer reads the source code, it scans the code letter by letter; when it encounters a whitespace, an operator symbol, or a special symbol, it decides that a word is completed.
• For example: int intvalue;
• While scanning up to ‘int’, the lexical analyzer cannot determine whether it is the keyword int or the initials of the identifier intvalue.
• The Longest Match Rule states that the lexeme scanned should be determined based on the longest match among all the tokens available.
• The lexical analyzer also follows rule priority, where a reserved word, e.g., a keyword, of a language is given priority over user input. That is, if the lexical analyzer finds a lexeme that exactly matches an existing reserved word, it classifies it as that reserved word rather than as an identifier.
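A rough sketch (our illustration, not from the slides) of both rules in Java: greedy regex matching yields the longest lexeme, and a keyword table applies rule priority afterwards:

  import java.util.Set;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  public class LongestMatch {
      static final Set<String> KEYWORDS = Set.of("int", "if", "while");
      static final Pattern WORD = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

      static String classify(String input) {
          Matcher m = WORD.matcher(input);
          if (m.lookingAt()) {                 // greedy: longest match from position 0
              String lexeme = m.group();
              // Rule priority: an exact reserved-word match wins over identifier.
              return lexeme + " -> " + (KEYWORDS.contains(lexeme) ? "keyword" : "identifier");
          }
          return input + " -> no match";
      }

      public static void main(String[] args) {
          System.out.println(classify("int"));      // int -> keyword
          System.out.println(classify("intvalue")); // intvalue -> identifier (not 'int' + 'value')
      }
  }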
Syntax Analysis / Parsing

Syntax analysis or parsing is the second phase of a compiler. In this section, we shall learn the basic concepts used in the construction of a parser.

We have seen that a lexical analyzer can identify tokens with the help of regular expressions and pattern rules. But a lexical analyzer cannot check the syntax of a given sentence due to the limitations of regular expressions. Regular expressions cannot check balancing tokens, such as parentheses. Therefore, this phase uses context-free grammar (CFG), which is recognized by push-down automata.

CFG, on the other hand, is a superset of Regular Grammar. This implies that every Regular Grammar is also context-free, but there exist some problems that are beyond the scope of Regular Grammar. CFG is a helpful tool for describing the syntax of programming languages.
Context-Free Grammar

We will first see the definition of context-free grammar and introduce the terminology used in parsing technology. A context-free grammar has four components:

• A set of non-terminals (V). Non-terminals are syntactic variables that denote sets of strings. The non-terminals define sets of strings that help define the language generated by the grammar.
• A set of tokens, known as terminal symbols (Σ). Terminals are the basic symbols from which strings are formed.
• A set of productions (P). The productions of a grammar specify the manner in which the terminals and non-terminals can be combined to form strings. Each production consists of a non-terminal called the left side of the production, an arrow, and a sequence of tokens and/or non-terminals, called the right side of the production.
• One of the non-terminals is designated as the start symbol (S), from where the production begins.

Strings are derived from the start symbol by repeatedly replacing a non-terminal (initially the start symbol) by the right side of a production for that non-terminal.
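For example, the four components of the expression grammar used in the derivation examples later in this lesson can be written as:

  Non-terminals (V): { E }
  Terminals (Σ):     { id, +, * }
  Productions (P):   E -> E + E
                     E -> E * E
                     E -> id
  Start symbol (S):  E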
Syntax Analyzers
A syntax analyzer or parser takes the input from a lexical analyzer in the form of token streams. The parser analyzes the source code (token stream) against the production rules to detect any errors in the code. The output of this phase is a parse tree.

This way, the parser accomplishes two tasks: parsing the code while looking for errors, and generating a parse tree as the output of the phase.
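As a sketch (our addition, using an unambiguous variant of the expression grammar rather than the ambiguous E -> E + E form shown in the next topic), a tiny recursive-descent parser that consumes a token stream and reports syntax errors:

  // Grammar: E -> T { + T },  T -> F { * F },  F -> id
  public class MiniParser {
      private final String[] tokens;
      private int pos = 0;

      MiniParser(String[] tokens) { this.tokens = tokens; }

      private String peek() { return pos < tokens.length ? tokens[pos] : "<eof>"; }

      private void expect(String t) {
          if (!peek().equals(t))
              throw new RuntimeException("syntax error: expected " + t + ", found " + peek());
          pos++;
      }

      void parseE() { parseT(); while (peek().equals("+")) { pos++; parseT(); } }
      void parseT() { parseF(); while (peek().equals("*")) { pos++; parseF(); } }
      void parseF() { expect("id"); }

      public static void main(String[] args) {
          // Parses the token stream for: id + id * id
          new MiniParser(new String[] {"id", "+", "id", "*", "id"}).parseE();
          System.out.println("parsed successfully");
      }
  }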
Derivation

A derivation is basically a sequence of production rules applied in order to get the input string. During parsing, we take two decisions for some sentential form of the input:
• Deciding which non-terminal is to be replaced.
• Deciding the production rule by which the non-terminal will be replaced.

To decide which non-terminal to replace with a production rule, we have two options.

Left-most Derivation
• If the sentential form of an input is scanned and replaced from left to right, it is called left-most derivation. The sentential form derived by the left-most derivation is called the left-sentential form.

Right-most Derivation
• If we scan and replace the input with production rules from right to left, it is known as right-most derivation. The sentential form derived from the right-most derivation is called the right-sentential form.
Input string: id + id * id
Note: Follow MDAS (multiplication and division before addition and subtraction).

• Input = id + id * id
• Grammar (N, T, P, S):
  E -> E + E
  E -> E * E
  E -> id

• Left-Most Derivation:
  E -> E + E
    -> id + E
    -> id + E * E
    -> id + id * E
    -> id + id * id

• Right-Most Derivation:
  E -> E + E
    -> E + E * E
    -> E + E * id
    -> E + id * id
    -> id + id * id

Exercises:
• Input 1: id + id * id / id (give the Left-Most Derivation)
• Input 2: id * (id + id) - id (give the Right-Most Derivation)
Parse Tree

A parse tree is a graphical depiction of a derivation. It is a convenient way to see how strings are derived from the start symbol. The start symbol of the derivation becomes the root of the parse tree. Let us see this with an example from the last topic.

We take the left-most derivation of a + b * c.
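The tree itself is a figure in the original slides and is not reproduced in this copy; the following is our sketch of it. Using the grammar E -> E + E | E * E | id, with a, b, and c as identifiers, the left-most derivation is E => E + E => a + E => a + E * E => a + b * E => a + b * c, giving the tree:

            E
          / | \
         E  +  E
         |   / | \
         a  E  *  E
            |     |
            b     c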
Error Detection

• A parser should be able to detect and report any error in the program. It is expected that when an error is encountered, the parser should be able to handle it and carry on parsing the rest of the input. The parser is mostly expected to check for errors, but errors may be encountered at various stages of the compilation process. A program may have the following kinds of errors at various stages:
• Lexical: the name of some identifier typed incorrectly
• Syntactical: a missing semicolon or unbalanced parentheses
• Semantical: an incompatible value assignment
• Logical: unreachable code, an infinite loop
• There are four common error-recovery strategies that can be implemented in the parser to deal with errors in the code.
Panic mode
When a parser encounters an error anywhere in the statement, it ignores the rest of the statement by not processing input from the erroneous token up to a delimiter, such as a semicolon. This is the easiest way of error recovery, and it also prevents the parser from falling into infinite loops.
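Extending the MiniParser sketch from the Syntax Analyzers section (our illustration), panic-mode recovery can be as simple as skipping tokens until the delimiter:

  // On a syntax error, discard input up to the next ';' and resume parsing there.
  private void recover() {
      while (!peek().equals(";") && !peek().equals("<eof>")) {
          pos++;                      // ignore the erroneous input
      }
      if (peek().equals(";")) pos++;  // consume the delimiter and carry on
  }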
Statement mode
When a parser encounters an error, it tries to take corrective measures so that the rest of the statement's input allows the parser to parse ahead; for example, inserting a missing semicolon, or replacing a comma with a semicolon. Parser designers have to be careful here, because one wrong correction may lead to an infinite loop.
Error productions
Some common errors that may occur in the code are known to the compiler designers. The designers can create an augmented grammar to be used, with productions that generate erroneous constructs when these errors are encountered.
Global correction
The parser considers the program in hand as a whole, tries to figure out what the program is intended to do, and tries to find the closest match for it that is error-free. When an erroneous input (statement) X is fed, it creates a parse tree for some closest error-free statement Y. This may allow the parser to make minimal changes in the source code, but due to the complexity (time and space) of this strategy, it has not been implemented in practice yet.
Semantic Analysis

The semantics of a language provide meaning to its constructs, like tokens and syntax structure. Semantics help interpret symbols, their types, and their relations with each other. Semantic analysis judges whether the syntax structure constructed in the source program derives any meaning or not.
Semantic Errors
Some of the semantic errors that the semantic analyzer is expected to recognize:
• Type mismatch
• Undeclared variable
• Reserved identifier misuse
• Multiple declaration of a variable in a scope
• Accessing an out-of-scope variable
• Actual and formal parameter mismatch
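A few Java lines (our illustration) whose form is syntactically legal but which contain the errors above; each would be rejected by the compiler's semantic checks:

  int count = "seven";   // type mismatch: a String assigned to an int
  total = 10;            // undeclared variable: total was never declared
  int while = 3;         // reserved identifier misuse: 'while' is a keyword
  int count = 0;         // multiple declaration of count in the same scope
  Math.max(1);           // actual/formal parameter mismatch: max needs two arguments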
By runtime, we mean a program in execution. The runtime environment is a state of the target machine, which may include software libraries, environment variables, etc., that provides services to the processes running in the system.

The runtime support system is a package, mostly generated with the executable program itself, that facilitates communication between the process and the runtime environment. It takes care of memory allocation and de-allocation while the program is being executed.
