Syntax and Semantics
The study of programming languages can be divided into the examination of syntax and semantics.
Together with the alphabet (the set of symbols used to build the words of a language) and the
lexis (the dictionary, or set of words the language offers its users), these two make up a
language. A language, whether natural (English) or artificial (Python), is a set of strings of
characters from its alphabet.
In a natural language such as English, syntax is the set of rules that determines whether a
certain string of words forms a valid sentence. In a programming language, syntax is the form of
its expressions, statements, and program units. It defines which combinations of symbols are
well formed, and so describes which strings of characters constitute a valid program. Every
programming language uses a different set of words in different orders, so each language has its
own syntax. This is why programmers must adhere strictly to the syntax of the language they are
using: any deviation can cause a syntax error, which leaves the computer unable to run the
source code. Source code is a program written in a high-level programming language, while the
source file is the file containing the source code.
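As a small illustration of a syntax error, the sketch below asks Python to parse two short pieces of source code without running them. The helper name check_syntax and the sample strings are made up for this example; compile() and SyntaxError are part of standard Python.

```python
def check_syntax(source: str) -> str:
    """Return 'ok' if source parses as Python, otherwise a short error note."""
    try:
        # compile() parses the text and reports syntax errors without executing it
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        return f"SyntaxError on line {err.lineno}"
    return "ok"


valid = "total = 1 + 2\n"
invalid = "total = 1 +\n"  # the right-hand operand is missing

print(check_syntax(valid))
print(check_syntax(invalid))
```

The second string deviates from the language's syntax, so it is rejected before any of it is executed.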
Components of Syntax
There are several key components to understanding syntax in programming languages:
• Keywords – reserved words that have a special meaning in the language. They form the basic
building blocks for writing programs. Examples include if, else, for, and while.
• Operators – symbols that perform operations on variables and values. Examples include
arithmetic operators like + and - and logical operators like && and ||.
• Punctuation – symbols such as commas, semicolons, and braces that help structure the program.
They indicate the end of statements or enclose blocks of code.
• Identifiers – names that identify user-defined items such as variables, functions, and arrays.
BNF (Backus-Naur Form)
BNF is a natural notation for describing syntax. Though not immediately accepted, it became the
most popular method of concisely describing programming language syntax. It is also considered a
metalanguage for programming languages. A metalanguage is a language or set of terms used to
describe another language. Computer scientists use BNF to describe the syntax of a programming
language because it allows them to write a detailed description of the language's grammar.
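The components of syntax listed earlier (keywords, operators, punctuation, and identifiers) can be observed by splitting a line of code into tokens. The following is a minimal sketch using Python's standard tokenize and keyword modules; the function name classify and the sample statement are assumptions made for illustration. Python's tokenizer reports operators and punctuation under a single OP category, so the sketch labels them together.

```python
import io
import keyword
import token
import tokenize


def classify(source: str) -> list[tuple[str, str]]:
    """Label each token as keyword, identifier, operator/punctuation, or literal."""
    labels = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == token.NAME:
            # NAME covers both reserved words and user-defined names
            kind = "keyword" if keyword.iskeyword(tok.string) else "identifier"
        elif tok.type == token.OP:
            kind = "operator/punctuation"
        elif tok.type in (token.NUMBER, token.STRING):
            kind = "literal"
        else:
            continue  # skip NEWLINE, INDENT, DEDENT, ENDMARKER, and similar
        labels.append((tok.string, kind))
    return labels


sample = "if count > 0:\n    total = total + count\n"
for text, kind in classify(sample):
    print(f"{text!r:10} {kind}")
```

Note that Python spells its logical operators as the keywords and, or, and not, rather than && and ||, so they would be labeled as keywords here.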
BNF rules are created by combining terminals and nonterminals. Terminals are literal symbols
that appear in the text itself, while nonterminals are named placeholders that are defined by
other rules. BNF rules, also called production rules, are the core components of a BNF grammar,
which is a set of BNF rules that specifies the grammar of a language. For example, Python's
syntax is defined by a grammar written as a set of BNF-style rules, against which any piece of
Python code is validated. If the code does not satisfy these rules, a SyntaxError is raised.
Knowing how to write BNF rules and which symbols to use makes it possible to create new rules.
Here is an example of a BNF rule.
Grammar for a Full Name
Assume that a BNF rule needs to be defined for how users should input a person's full name: a
first, middle, and family name, with whitespace between the components and the middle name
treated as optional.
This rule can be defined as:
<full_name> ::= <first_name> " " (<middle_name> " ")? <family_name>
The left-hand part of this BNF rule is a nonterminal, <full_name>, that identifies the person's
full name. The ::= indicates that <full_name> will be replaced with the right-hand part of the
rule. The right-hand part has several components. First, the first name is defined by the
<first_name> nonterminal. Next comes a space that separates the first name from the following
component. This space is written as a terminal, which consists of a space character between
quotes. After that, a middle name can be accepted, followed by another space: the <middle_name>
nonterminal and the " " terminal. These two elements are enclosed in parentheses to group them,
and because both are optional, a question mark (?) follows the group. Finally comes the family
name, which is defined by another nonterminal, <family_name>. This gives one BNF rule, but not
yet a working grammar: it is only the root rule.
The rules for <first_name>, <middle_name>, and <family_name> must be defined to complete the
grammar. The following requirements must be met to do this:
- Each component must accept only letters.
- Each component must start with a capital letter and continue with lowercase letters.
Two rules are added first, one for each kind of letter. The first, <uppercase_letter>, accepts
all the ASCII letters from uppercase A to Z, while the second, <lowercase_letter>, accepts all
the lowercase ASCII letters from a to z. This means that accented or other non-ASCII letters are
not supported.
The <first_name> rule starts with the <uppercase_letter> nonterminal, which expresses that the
first letter must be uppercase, and continues with the <lowercase_letter> nonterminal followed
by an asterisk (*):
<first_name> ::= <uppercase_letter> <lowercase_letter>*
The asterisk indicates that the first name accepts zero or more lowercase letters after the
initial uppercase letter. The same definition can be applied to the <middle_name> and
<family_name> rules.
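The full-name grammar described above can be checked mechanically. The sketch below translates it into a Python regular expression. This is one possible encoding, not the only one, and the nonterminal names in the comments (full_name, first_name, middle_name, family_name, uppercase_letter, lowercase_letter) as well as the function name is_full_name are assumed for illustration.

```python
import re

# <uppercase_letter> <lowercase_letter>*  (ASCII letters only)
NAME = r"[A-Z][a-z]*"

# <full_name> ::= <first_name> " " (<middle_name> " ")? <family_name>
FULL_NAME = re.compile(rf"{NAME} (?:{NAME} )?{NAME}")


def is_full_name(text: str) -> bool:
    """Return True if the whole string is derivable from the full-name grammar."""
    return FULL_NAME.fullmatch(text) is not None


for candidate in ["John Smith", "John Ronald Smith", "john Smith", "José Smith"]:
    print(f"{candidate!r}: {is_full_name(candidate)}")
```

The optional group (?: ... )? plays the role of the parenthesized group followed by the question mark in the BNF rule, and * has the same "zero or more" meaning in both notations.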
Unlike regular BNF, the BNF variant used to describe Python does not use angle brackets (<>) to
enclose nonterminal symbols. It uses only the nonterminal name or identifier, which makes the
rules cleaner and more readable. Square brackets ([]) are also used differently. In some BNF
variants they enclose sets of characters, like [a-z], but in Python's notation they mean that
the enclosed element is optional. To define something like [a-z], Python's notation uses
"a"..."z" instead.
For example, the grammar rule for Python's return statement is:
return_stmt ::= "return" [expression_list]
This rule consists of the rule's name, return_stmt, the ::= symbol, and a terminal symbol
consisting of the word return. The second component on the right-hand side is an optional list
of expressions, expression_list, enclosed in square brackets, which signify optionality in
Python's BNF notation.
Syntax Tree
A syntax tree, also called an Abstract Syntax Tree (AST), is a tree representation of the
syntactic structure of the source code. Each node in the tree denotes a construct occurring in
the source code. The compiler uses the syntax tree to understand the hierarchical structure of
the source code, which is crucial for code analysis and optimization. Understanding syntax helps
ensure that code is written in a form that the compiler or interpreter can translate and run.
For instance, writing loops or recursive functions well in a given language requires a solid
understanding of that language's syntax.
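To make the idea of a syntax tree concrete, the sketch below uses Python's standard ast module to parse one assignment statement and list the kinds of nodes in the resulting tree. The sample statement and variable names are chosen only for illustration.

```python
import ast

source = "total = price * 2"

# Parse the source text into an abstract syntax tree without running it
tree = ast.parse(source)

# Each node in the tree corresponds to a construct in the source code
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)

# An indented dump shows the hierarchy: the assignment contains a binary
# operation, which in turn contains a name and a constant
print(ast.dump(tree, indent=2))
```

The tree records the structure of the statement (an assignment whose value is a multiplication) rather than its exact spelling, which is what later analysis and optimization steps work with.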
Semantics is the meaning of the expressions, statements, and program units that syntax forms.
While syntax refers to the set of rules that define the structure of a programming language,
semantics is concerned with the meaning behind that structure.
Static and Dynamic Semantics
Semantics in programming languages can be broadly divided into two categories: static semantics
and dynamic semantics.
Static semantics involves rules checked at compile time, such as type checking and scope
resolution. These rules ensure that certain errors are caught before the program runs, enhancing
reliability and robustness. For instance, if a variable is used without being declared, the
compiler of a statically typed language will flag an error before execution.
Dynamic semantics refers to the behavior of a program when it is run. It includes the execution
of expressions, control structures like loops and conditionals, and the manipulation of data
through functions and procedures. Dynamic semantics defines how the state of a program changes
as it executes, which is critical for understanding and predicting program behavior.
Formal Semantics
Formal methods are often employed to define a programming language's semantics. These include:
• Operational Semantics – describes the behavior of a program in terms of abstract machine
execution steps. For instance, it outlines how each expression or statement is evaluated
step-by-step, making it ideal for simulating program execution.
• Denotational Semantics – maps syntactic constructs to mathematical objects, offering an
abstract, mathematical description of their meaning. This approach helps in reasoning about
program correctness and equivalence. For example, a function in programming might be represented
as a mathematical mapping between input and output values.
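The two formal approaches above can be contrasted on a tiny expression language. The sketch below is an illustrative toy, not a standard formalization: the tuple encoding of expressions and the names step, run, and denote are all assumptions. The step function gives an operational view by rewriting an expression one step at a time, while denote gives a denotational view by mapping an expression to a Python function from variable bindings to a number.

```python
# Expressions: ("num", n) | ("var", name) | ("add", e1, e2) | ("mul", e1, e2)

def step(expr, env):
    """Operational view: perform one evaluation step, leftmost first."""
    tag = expr[0]
    if tag == "var":
        return ("num", env[expr[1]])  # look up the variable's value
    _, left, right = expr
    if left[0] != "num":
        return (tag, step(left, env), right)
    if right[0] != "num":
        return (tag, left, step(right, env))
    a, b = left[1], right[1]
    return ("num", a + b if tag == "add" else a * b)


def run(expr, env):
    """Apply step repeatedly and record every intermediate expression."""
    trace = [expr]
    while expr[0] != "num":
        expr = step(expr, env)
        trace.append(expr)
    return trace


def denote(expr):
    """Denotational view: map an expression to a function env -> number."""
    tag = expr[0]
    if tag == "num":
        return lambda env: expr[1]
    if tag == "var":
        return lambda env: env[expr[1]]
    f, g = denote(expr[1]), denote(expr[2])
    if tag == "add":
        return lambda env: f(env) + g(env)
    return lambda env: f(env) * g(env)


program = ("add", ("var", "x"), ("mul", ("num", 2), ("num", 3)))  # x + 2 * 3
env = {"x": 4}

trace = run(program, env)
for state in trace:
    print(state)

meaning = denote(program)
print(meaning(env))
```

The trace shows how the program is evaluated, one step per line, whereas meaning is a single mathematical-style mapping that can be applied to any set of bindings. Both views agree on the final value for the same bindings.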