Python Lexical Structure

Uploaded by vivosa4272
Copyright © All Rights Reserved
Chapter 4. The Python Language

This chapter is a quick guide to the Python language. To learn Python from scratch, I
suggest you start with Learning Python, by Mark Lutz and David Ascher (O’Reilly). If
you already know other programming languages and just want to learn the specifics
of Python, this chapter is for you. I’m not trying to teach Python here, so we’re going
to cover a lot of ground at a pretty fast pace.

Lexical Structure
The lexical structure of a programming language is the set of basic rules that govern
how you write programs in that language. It is the lowest-level syntax of the
language and specifies such things as what variable names look like and what
characters are used for comments. Each Python source file, like any other text file, is
a sequence of characters. You can also usefully see it as a sequence of lines,
tokens, or statements. These different syntactic views complement and reinforce
each other. Python is very particular about program layout, especially with regard to
lines and indentation, so you’ll want to pay attention to this information if you are
coming to Python from another language.

Lines and Indentation

A Python program is composed of a sequence of logical lines, each made up of
one or more physical lines. Each physical line may end with a comment. A pound
sign (#) that is not inside a string literal begins a comment. All characters after
the # and up to the physical line end are part of the comment, and the Python
interpreter ignores them. A line containing only whitespace, possibly with a
comment, is called a blank line, and is ignored by the interpreter. In an interactive
interpreter session, you must enter an empty physical line (without any whitespace
or comment) to terminate a multiline statement.
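Here is a minimal sketch of the comment rule: the first # below sits inside a string literal and so belongs to the string, while the second # starts a comment that runs to the end of the physical line. The indented comment-only lines are blank lines, so their indentation does not matter.

```python
# A '#' inside a string literal is part of the string, not a comment.
label = "item #1"  # this '#' starts a real comment

        # A line holding only whitespace and a comment is a blank line,
        # so the interpreter ignores it, indentation included.
count = len(label)
```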

In Python, the end of a physical line marks the end of most statements. Unlike in
other languages, Python statements are not normally terminated with a delimiter,
such as a semicolon (;). When a statement is too long to fit on a single physical line,
you can join two adjacent physical lines into a logical line by ensuring that the first
physical line has no comment and ends with a backslash (\). Python also joins
adjacent physical lines into one logical line if an open parenthesis '(', bracket '[', or
brace '{' has not yet been closed. Triple-quoted string literals can also span physical
lines. Physical lines after the first one in a logical line are known
as continuation lines. The indentation issues covered next do not apply to
continuation lines, but only to the first physical line of each logical line.
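The three ways of continuing a logical line can be sketched as follows. The variable names are illustrative only; note that a comment is allowed inside the brackets, but not on a line ending with a backslash.

```python
# Explicit joining: the backslash must be the last character on the line.
total = 1 + 2 + \
        3 + 4

# Implicit joining: an unclosed bracket lets the logical line continue,
# and the continuation lines may be indented freely.
values = [
    10,  # a comment is allowed here, inside the brackets
    20,
    30,
]

# A triple-quoted string literal can span physical lines.
text = """first line
second line"""
```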

Python uses indentation to express the block structure of a program. Unlike many other
languages, Python does not use braces or begin/end delimiters around blocks of
statements: indentation is the only way to indicate such blocks. Each logical line in a
Python program is indented by the whitespace on its left. A block is a contiguous
sequence of logical lines, all indented by the same amount; the block is ended by a
logical line with less indentation. All statements in a block must have the same
indentation, as must all clauses in a compound statement. Standard Python style is
to use four spaces per indentation level. The first statement in a source file must
have no indentation (i.e., it must not begin with any whitespace). Additionally,
statements typed at the interactive interpreter prompt >>> (covered in Chapter 3)
must have no indentation.
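A short sketch of indentation-based blocks, run under Python 3. The helper rejects_indentation is a hypothetical name; it uses the built-in compile() to show that a missing indented body, or an indented first statement, is rejected with IndentationError.

```python
def classify(n):
    # Each block is a run of logical lines sharing one indentation level
    # (four spaces per level, following standard style).
    if n < 0:
        return "negative"
    elif n == 0:          # clauses of one compound statement align
        return "zero"
    else:
        return "positive"


def rejects_indentation(source):
    """Return True if compiling source raises IndentationError."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError:
        return True
    return False


missing_block = rejects_indentation("if True:\nprint('x')\n")  # body not indented
leading_space = rejects_indentation("  x = 1\n")              # first statement indented
```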

A tab is logically replaced by up to 8 spaces, so that the next character after the tab
falls into logical column 9, 17, 25, etc. Don’t mix spaces and tabs for indentation,
since different tools (e.g., editors, email systems, printers) treat tabs differently.
The -t and -tt options to the Python interpreter (covered in Chapter 3) guard
against inconsistent tab and space usage in Python source code. You can configure
any good editor to expand tabs to spaces so that all Python source code you write
contains only spaces, not tabs. You then know that all tools, including Python itself,
are going to be consistent in handling the crucial matter of indentation in your source
files.
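The 8-column tab stops can be observed with str.expandtabs(), whose default tab size is 8. The sketch also assumes a Python 3 interpreter, which always rejects indentation whose meaning depends on the width of a tab by raising TabError, so no command-line option is involved there.

```python
# expandtabs() uses tab stops every 8 columns by default.
first = "a\tb".expandtabs()
second = "abcdefgh\tX".expandtabs()

# 0-based index 8 is logical column 9; index 16 is logical column 17.
b_index = first.index("b")
x_index = second.index("X")

# One line indented with a tab, the next with 8 spaces: the meaning
# depends on the tab width, so Python 3 refuses to compile it.
mixed = "if True:\n\tx = 1\n        y = 2\n"
try:
    compile(mixed, "<example>", "exec")
    rejected = False
except TabError:
    rejected = True
```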

Tokens
Python breaks each logical line into a sequence of elementary lexical components,
called tokens. Each token corresponds to a substring of the logical line. The normal
token types are identifiers, keywords, operators, delimiters, and literals, as covered
in the following sections. Whitespace may be freely used between tokens to
separate them. Some whitespace separation is needed between logically adjacent
identifiers or keywords; otherwise, they would be parsed as a single, longer identifier.
For example, printx is a single identifier; to write the keyword print followed by the
identifier x, you need to insert some whitespace (e.g., print x).
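You can watch the tokenizer at work with the standard tokenize module. This sketch assumes Python 3, where print is an ordinary name rather than a keyword, so it shows up as a NAME token; the helper token_strings is a hypothetical name. The point carries over: without whitespace, printx is one token, and with it, print and x are two.

```python
import io
import token
import tokenize


def token_strings(source):
    """Return (token type name, token text) pairs for one line of source."""
    readline = io.StringIO(source).readline
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        # Drop the end-of-line and end-of-input markers for readability.
        if tok.type not in (token.NEWLINE, token.ENDMARKER)
    ]


glued = token_strings("printx = 42\n")  # [('NAME', 'printx'), ('OP', '='), ('NUMBER', '42')]
split = token_strings("print x\n")      # [('NAME', 'print'), ('NAME', 'x')]
```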

Identifiers

An identifier is a name used to identify a variable, function, class, module, or other
object. An identifier starts with a letter (A to Z or a to z) or underscore (_) followed by
zero or more letters, underscores, and digits (0 to 9). Case is significant in Python:
lowercase and uppercase letters are distinct. Punctuation characters such as @, $,
and % are not allowed in identifiers.
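A quick way to test candidate names is str.isidentifier(), available in Python 3. Python 3 also accepts non-ASCII letters in identifiers, unlike the version described here, so this sketch sticks to ASCII examples.

```python
# str.isidentifier() applies the identifier rules to a string.
checks = {
    "total": "total".isidentifier(),
    "_private": "_private".isidentifier(),
    "x2": "x2".isidentifier(),
    "2x": "2x".isidentifier(),        # False: starts with a digit
    "cost$": "cost$".isidentifier(),  # False: '$' is not allowed
}

# Case is significant: these are two different variables.
name = "lower"
Name = "upper"
```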

Normal Python style is to start class names with an uppercase letter and other
identifiers with a lowercase letter. Starting an identifier with a single leading
underscore indicates by convention that the identifier is meant to be private. Starting
an identifier with two leading underscores indicates a strongly private identifier; if the
identifier also ends with two trailing underscores, the identifier is a language-defined
special name. The identifier _ (a single underscore) is special in interactive
interpreter sessions: the interpreter binds _ to the result of the last expression
statement evaluated interactively, if any.
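Inside a class body, Python enforces the two-leading-underscore convention through name mangling: the attribute is stored under the name _ClassName__attr. The class Account and its attributes in this sketch are hypothetical; a single leading underscore, by contrast, changes nothing at runtime.

```python
class Account:
    def __init__(self):                # __init__ ends with two underscores:
                                       # a special name, not mangled
        self._balance = 0              # single underscore: private by convention only
        self.__pin = "1234"            # two underscores: stored as _Account__pin


acct = Account()
balance = acct._balance                # still directly accessible
plain_name_exists = hasattr(acct, "__pin")
mangled = acct._Account__pin           # the mangled attribute name
```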

Keywords

Python has 28 keywords (29 in Python 2.3 and later), which are identifiers that
Python reserves for special syntactic uses. Keywords are composed of lowercase
letters only. You cannot use keywords as regular identifiers. Some keywords begin
simple statements or clauses of compound statements, while other keywords are
used as operators. All the keywords are covered in detail in this book, either later in
this chapter or in Chapter 5, Chapter 6, or Chapter 7. The keywords in Python are:
and       del       for       is        raise
assert    elif      from      lambda    return
break     else      global    not       try
class     except    if        or        while
continue  exec      import    pass      yield[a]
def       finally   in        print

[a] Only in Python 2.3 and later (or Python 2.2 with from __future__ import generators).
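The standard keyword module reports the keywords of whichever interpreter runs it. Under Python 3, which this sketch assumes, the list differs from the table above; for example, print and exec are no longer keywords. The sketch also confirms that a keyword cannot serve as an ordinary identifier.

```python
import keyword

# keyword.kwlist and keyword.iskeyword() describe the running interpreter.
yield_is_keyword = keyword.iskeyword("yield")
print_is_keyword = keyword.iskeyword("print")   # False on Python 3

# A keyword cannot be used as a regular identifier.
try:
    compile("for = 1\n", "<example>", "exec")
    accepted = True
except SyntaxError:
    accepted = False
```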

Operators

Python uses non-alphanumeric characters and character combinations as operators.
Python recognizes the following operators, which are covered in detail later in this
chapter:

+ - * / % ** // << >> &
| ^ ~ < <= > >= <> != ==
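The sketch below exercises most of these operators under Python 3, which differs from the version covered here in two ways: / always performs true division (in Python 2.2 and 2.3 it truncates integer operands unless you use from __future__ import division), and the <> spelling was removed, leaving only !=. The 0b binary literals are also newer syntax, used here only to make the bit patterns visible.

```python
quotient = 7 / 2        # 3.5 under Python 3 (true division)
floored = -7 // 2       # -4: // rounds toward negative infinity
remainder = 7 % 3       # 1
power = 2 ** 10         # 1024

bits_and = 0b1100 & 0b1010   # 0b1000 == 8
bits_or = 0b1100 | 0b1010    # 0b1110 == 14
bits_xor = 0b1100 ^ 0b1010   # 0b0110 == 6
inverted = ~5                # -6
shifted = 1 << 4             # 16

# Comparisons produce boolean results.
comparisons = (3 < 4, 3 <= 3, 4 > 3, 4 >= 5, 3 != 4, 3 == 3)
```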

Delimiters

Python uses the following symbols and symbol combinations as delimiters in
expressions, lists, dictionaries, various aspects of statements, and strings, among
other purposes:

( ) [ ] { }
, : . ` = ;
+= -= *= /= //= %=
&= |= ^= >>= <<= **=

The period (.) can also appear in floating-point and imaginary literals. A sequence of
three periods (...) has a special meaning in slices. The last two rows of the table list
the augmented assignment operators, which serve lexically as delimiters but also
perform an operation. I’ll discuss the syntax for the various delimiters when I
introduce the objects or statements with which they are used.
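A few delimiters in action: colons and the three-period ellipsis inside a subscript, and augmented assignment. The class Probe is a hypothetical helper that simply returns the subscript it receives, which makes visible what the delimiters produce.

```python
class Probe:
    """Hypothetical helper: returns whatever subscript it receives."""
    def __getitem__(self, key):
        return key


p = Probe()
key = p[1:10:2, ...]     # colons build a slice object; ... is Ellipsis

counter = 10
counter += 5             # same effect as counter = counter + 5 -> 15
counter //= 4            # 15 // 4 -> 3

items = [1]
items += [2, 3]          # a list is extended in place -> [1, 2, 3]
```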

The following characters have special meanings as part of other tokens:

' " # \
The characters @, $, and ?, all control characters except whitespace, and all
characters with ISO codes above 126 (i.e., non-ASCII characters, such as accented
letters), can never be part of the text of a Python program except in comments or
string literals.

Literals

A literal is a data value that appears directly in a program. The following are all
literals in Python:

42 # Integer literal
3.14 # Floating-point literal
1.0J # Imaginary literal
'hello' # String literal
"world" # Another string literal
"""Good
night""" # Triple-quoted string literal

Using literals and delimiters, you can create data values of other types:

[ 42, 3.14, 'hello' ] # List
( 100, 200, 300 ) # Tuple
{ 'x':42, 'y':3.14 } # Dictionary

The syntax for literals and other data values is covered in detail later in this chapter,
when we discuss the various data types supported by Python.
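As a preview, this sketch (run under Python 3) asks each literal from the examples above for its type. Note that the line break inside the triple-quoted literal becomes part of the string.

```python
# Each literal produces an object of a built-in type.
literal_types = {
    "42": type(42).__name__,
    "3.14": type(3.14).__name__,
    "1.0J": type(1.0J).__name__,
    "'hello'": type('hello').__name__,
}

greeting = """Good
night"""                 # contains a newline character

containers = ([ 42, 3.14, 'hello' ], ( 100, 200, 300 ), { 'x':42, 'y':3.14 })
container_types = [type(c).__name__ for c in containers]
```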

Statements

You can consider a Python source file as a sequence of simple and compound
statements. Unlike many other languages, Python has no declarations or other top-level
syntax elements.

Simple statements

A simple statement is one that contains no other statements. A simple statement lies
entirely within a logical line. As in other languages, you may place more than one
simple statement on a single logical line, with a semicolon (;) as the separator.
However, one statement per line is the usual Python style, as it makes programs
more readable.
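Both layouts below are legal and have the same effect; the second is the usual Python style.

```python
x = 1; y = 2; z = x + y   # three simple statements on one logical line

# Preferred: one simple statement per line.
a = 1
b = 2
c = a + b
```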

Any expression can stand on its own as a simple statement; we’ll discuss
expressions in detail later in this chapter. The interactive interpreter shows the result
of an expression statement entered at the prompt (>>>), and also binds the result to
a variable named _. Apart from interactive sessions, expression statements are
useful only to call functions (and other callables) that have side effects (e.g., that
perform output or change global variables).
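Outside an interactive session, an expression statement matters only for its side effects. In this sketch, log is a hypothetical function whose side effect is appending to a list; the bare 3 + 4 is also a legal expression statement, but its result is simply discarded.

```python
calls = []


def log(message):
    # Side effect: record the message in a module-level list.
    calls.append(message)


log("started")   # expression statement, useful for its side effect
3 + 4            # legal expression statement; the value 7 is discarded
```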

An assignment is a simple statement that assigns a value to a variable, as we'll
discuss later in this chapter. Unlike in some other languages, an assignment in
Python is a statement, and therefore can never be part of an expression.
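The compiler enforces this: an = inside an if condition is a syntax error rather than an assignment. The helper is_rejected is a hypothetical name. (Python 3.8 later added a separate := operator for assignment expressions, but the ordinary = assignment statement still cannot appear inside an expression.)

```python
def is_rejected(source):
    """Return True if compiling source raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False


in_condition = is_rejected("if (x = 1): pass\n")        # '=' inside an expression
as_statement = is_rejected("x = 1\nif x == 1: pass\n")  # assignment as a statement
```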

Compound statements

A compound statement contains other statements and controls their execution. A
compound statement has one or more clauses, aligned at the same indentation.
Each clause has a header that starts with a keyword and ends with a colon (:),
followed by a body, which is a sequence of one or more statements. When the body
contains multiple statements, also known as a block, these statements should be
placed on separate logical lines after the header line and indented rightward from the
header line. The block terminates when the indentation returns to that of the clause
header (or further left from there). Alternatively, the body can be a single simple
statement, following the : on the same logical line as the header. The body may also
be several simple statements on the same line with semicolons between them, but
as I’ve already indicated, this is not good Python style.
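The hypothetical function describe below contains an if statement with two clauses: the if clause has a block body indented under its header, while the else clause puts its body on the header line as two semicolon-separated simple statements, which is legal but not good style.

```python
def describe(n):
    if n % 2 == 0:              # clause header: keyword ... colon
        kind = "even"           # body: an indented block
        half = n // 2
    else: kind = "odd"; half = None   # single-line body: legal, poor style
    return kind, half           # indentation back at the def level ends the if
```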
