Python Module
Modules
If you quit from the Python interpreter and enter it again, the definitions you have made (functions and variables) are lost.
Therefore, if you want to write a somewhat longer program, you are better off using a text editor to prepare the input for the
interpreter and running it with that file as input instead. This is known as creating a script. As your program gets longer, you may
want to split it into several files for easier maintenance. You may also want to use a handy function that you’ve written in several
programs without copying its definition into each program.
To support this, Python has a way to put definitions in a file and use them in a script or in an interactive instance of the
interpreter. Such a file is called a module; definitions from a module can be imported into other modules or into the main module
(the collection of variables that you have access to in a script executed at the top level and in calculator mode).
A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended.
Within a module, the module’s name (as a string) is available as the value of the global variable __name__. For instance, use
your favorite text editor to create a file called fibo.py in the current directory with the following contents:
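The listing for fibo.py is missing from this copy of the text. A minimal sketch that is consistent with the interpreter session shown further on might look like the following. It is a reconstruction rather than the original file, and it prints through a joined string (an assumption made here) so that it runs unchanged on both Python 2 and Python 3.

```python
# fibo.py -- Fibonacci numbers module (reconstructed sketch)

def fib2(n):
    """Return a list containing the Fibonacci series up to n."""
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a + b
    return result

def fib(n):
    """Write the Fibonacci series up to n on a single line."""
    print(' '.join(str(x) for x in fib2(n)))
```

Building fib() on top of fib2() keeps the two functions in agreement; a version that prints inside its own loop would work just as well.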
Now enter the Python interpreter and import this module with the command import fibo.
This does not enter the names of the functions defined in fibo directly in the current symbol table; it only enters the module name
fibo there. Using the module name you can access the functions:
>>> fibo.fib(1000)
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
>>> fibo.fib2(100)
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
>>> fibo.__name__
'fibo'
If you intend to use a function often, you can assign it to a local name, for example fib = fibo.fib.
A module can contain executable statements as well as function definitions. These statements are intended to initialize the
module. They are executed only the first time the module is imported somewhere. [1]
Each module has its own private symbol table, which is used as the global symbol table by all functions defined in the module.
Thus, the author of a module can use global variables in the module without worrying about accidental clashes with a user’s
global variables. On the other hand, if you know what you are doing you can touch a module’s global variables with the same
notation used to refer to its functions, modname.itemname.
Modules can import other modules. It is customary but not required to place all import statements at the beginning of a module
(or script, for that matter). The imported module names are placed in the importing module’s global symbol table.
There is a variant of the import statement that imports names from a module directly into the importing module’s symbol table.
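For example, from fibo import fib, fib2 makes fib and fib2 available directly, without the fibo prefix.
This does not introduce the module name from which the imports are taken in the local symbol table (so in the example, fibo is
not defined).
There is even a variant to import all names that a module defines: from fibo import *.
This imports all names except those beginning with an underscore (_).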
Note that in general the practice of importing * from a module or package is frowned upon, since it often causes poorly readable
code. However, it is okay to use it to save typing in interactive sessions.
Note
For efficiency reasons, each module is only imported once per interpreter session. Therefore, if you change your modules, you
must restart the interpreter – or, if it’s just one module you want to test interactively, use reload(), e.g. reload(modulename).
When you run a Python module as a script, for example with python fibo.py <arguments>, the code in the module will be
executed, just as if you imported it, but with the __name__ set to "__main__". That means that by
adding this code at the end of your module:
if __name__ == "__main__":
    import sys
    fib(int(sys.argv[1]))
you can make the file usable as a script as well as an importable module, because the code that parses the command line only
runs if the module is executed as the “main” file:
$ python fibo.py 50
1 1 2 3 5 8 13 21 34
This is often used either to provide a convenient user interface to a module, or for testing purposes (running the module as a
script executes a test suite).
6.1.2. The Module Search Path
When a module named spam is imported, the interpreter searches for a file named spam.py in the current directory, and then in
the list of directories specified by the environment variable PYTHONPATH. This has the same syntax as the shell variable
PATH, that is, a list of directory names. When PYTHONPATH is not set, or when the file is not found there, the search
continues in an installation-dependent default path; on Unix, this is usually .:/usr/local/lib/python.
Actually, modules are searched in the list of directories given by the variable sys.path, which is initialized from the directory
containing the input script (or the current directory), PYTHONPATH and the installation-dependent default. This allows
Python programs that know what they’re doing to modify or replace the module search path. Note that because the directory
containing the script being run is on the search path, it is important that the script not have the same name as a standard module,
or Python will attempt to load the script as a module when that module is imported. This will generally be an error. See section
Standard Modules for more information.
As an important speed-up of the start-up time for short programs that use a lot of standard modules, if a file called spam.pyc
exists in the directory where spam.py is found, this is assumed to contain an already-“byte-compiled” version of the module
spam. The modification time of the version of spam.py used to create spam.pyc is recorded in spam.pyc, and the .pyc file is
ignored if these don’t match.
Normally, you don’t need to do anything to create the spam.pyc file. Whenever spam.py is successfully compiled, an attempt is
made to write the compiled version to spam.pyc. It is not an error if this attempt fails; if for any reason the file is not written
completely, the resulting spam.pyc file will be recognized as invalid and thus ignored later. The contents of the spam.pyc file are
platform independent, so a Python module directory can be shared by machines of different architectures.
• When the Python interpreter is invoked with the -O flag, optimized code is generated and stored in .pyo files. The
optimizer currently doesn’t help much; it only removes assert statements. When -O is used, all bytecode is optimized;
.pyc files are ignored and .py files are compiled to optimized bytecode.
• Passing two -O flags to the Python interpreter (-OO) will cause the bytecode compiler to perform optimizations that
could in some rare cases result in malfunctioning programs. Currently only __doc__ strings are removed from the
bytecode, resulting in more compact .pyo files. Since some programs may rely on having these available, you should
only use this option if you know what you’re doing.
• A program doesn’t run any faster when it is read from a .pyc or .pyo file than when it is read from a .py file; the only
thing that’s faster about .pyc or .pyo files is the speed with which they are loaded.
• When a script is run by giving its name on the command line, the bytecode for the script is never written to a .pyc or
.pyo file. Thus, the startup time of a script may be reduced by moving most of its code to a module and having a small
bootstrap script that imports that module. It is also possible to name a .pyc or .pyo file directly on the command line.
• It is possible to have a file called spam.pyc (or spam.pyo when -O is used) without a file spam.py for the same module.
This can be used to distribute a library of Python code in a form that is moderately hard to reverse engineer.
• The module compileall can create .pyc files (or .pyo files when -O is used) for all modules in a directory.
Python comes with a library of standard modules, described in a separate document, the Python Library Reference (“Library
Reference” hereafter). Some modules are built into the interpreter; these provide access to operations that are not part of the core
of the language but are nevertheless built in, either for efficiency or to provide access to operating system primitives such as
system calls. The set of such modules is a configuration option which also depends on the underlying platform. For example, the
winreg module is only provided on Windows systems. One particular module deserves some attention: sys, which is built into
every Python interpreter. The variables sys.ps1 and sys.ps2 define the strings used as primary and secondary prompts:
These two variables are only defined if the interpreter is in interactive mode.
The variable sys.path is a list of strings that determines the interpreter’s search path for modules. It is initialized to a default path
taken from the environment variable PYTHONPATH, or from a built-in default if PYTHONPATH is not set. You can modify
it using standard list operations:
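The example that belongs here is missing from this copy. As a minimal sketch, appending a directory to the search path could look like this; the directory name is a made-up placeholder, not a path taken from the text.

```python
import sys

# Hypothetical directory holding extra modules; replace with a real path.
extra_dir = '/tmp/my_python_modules'

if extra_dir not in sys.path:
    sys.path.append(extra_dir)   # searched after the existing entries

print(sys.path[-1])
```

Using append() places the new directory last, so it cannot hide standard modules; sys.path.insert(0, ...) would give it priority instead.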
The built-in function dir() is used to find out which names a module defines. It returns a sorted list of strings:
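The interpreter session for this step is not present in this copy. A small sketch of the idea, using the standard math module as an arbitrary stand-in, might be:

```python
import math

names = dir(math)               # the names defined by the math module
print(names[:5])                # first few entries; exact contents vary by version
print('sqrt' in names)
print(names == sorted(names))   # dir() returns the names in sorted order
```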
Without arguments, dir() lists the names you have defined currently:
>>> a = [1, 2, 3, 4, 5]
>>> import fibo
>>> fib = fibo.fib
>>> dir()
['__builtins__', '__doc__', '__file__', '__name__', 'a', 'fib', 'fibo', 'sys']
Note that it lists all types of names: variables, modules, functions, etc.
dir() does not list the names of built-in functions and variables. If you want a list of those, they are defined in the standard
module __builtin__:
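The listing is missing here as well. The sketch below looks the names up in a way that works both on the Python 2 described in this text and on Python 3, where the module was renamed to builtins; the try/except fallback is an addition made for this example.

```python
try:
    import __builtin__ as builtins   # Python 2 name, as used in this text
except ImportError:
    import builtins                  # Python 3 renamed the module

names = dir(builtins)
print('len' in names)                # built-in functions are listed here
print('ValueError' in names)         # ...and so are built-in exceptions
```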
6.4. Packages
Packages are a way of structuring Python’s module namespace by using “dotted module names”. For example, the module name
A.B designates a submodule named B in a package named A. Just like the use of modules saves the authors of different modules
from having to worry about each other’s global variable names, the use of dotted module names saves the authors of
multi-module packages like NumPy or the Python Imaging Library from having to worry about each other’s module names.
Suppose you want to design a collection of modules (a “package”) for the uniform handling of sound files and sound data. There
are many different sound file formats (usually recognized by their extension, for example: .wav, .aiff, .au), so you may need to
create and maintain a growing collection of modules for the conversion between the various file formats. There are also many
different operations you might want to perform on sound data (such as mixing, adding echo, applying an equalizer function,
creating an artificial stereo effect), so in addition you will be writing a never-ending stream of modules to perform these
operations. Here’s a possible structure for your package (expressed in terms of a hierarchical filesystem):
When importing the package, Python searches through the directories on sys.path looking for the package subdirectory.
The __init__.py files are required to make Python treat the directories as containing packages; this is done to prevent directories
with a common name, such as string, from unintentionally hiding valid modules that occur later on the module search path. In the
simplest case, __init__.py can just be an empty file, but it can also execute initialization code for the package or set the __all__
variable, described later.
Users of the package can import individual modules from the package, for example:
import sound.effects.echo
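This loads the submodule sound.effects.echo. It must be referenced with its full name, as in sound.effects.echo.echofilter.
An alternative way of importing the submodule is from sound.effects import echo.
This also loads the submodule echo, and makes it available without its package prefix, so it can be used as echo.echofilter.
Yet another variation is to import the desired function or variable directly, with from sound.effects.echo import echofilter.
Again, this loads the submodule echo, but this makes its function echofilter() directly available.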
Note that when using from package import item, the item can be either a submodule (or subpackage) of the package, or some
other name defined in the package, like a function, class or variable. The import statement first tests whether the item is defined
in the package; if not, it assumes it is a module and attempts to load it. If it fails to find it, an ImportError exception is raised.
Contrarily, when using syntax like import item.subitem.subsubitem, each item except for the last must be a package; the last item
can be a module or a package but can’t be a class or function or variable defined in the previous item.
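To make the three import forms concrete, the following sketch builds a throwaway sound package in a temporary directory and imports from it. The body of echofilter() and its parameters are invented for illustration; only the package and module names come from the text.

```python
import os
import sys
import tempfile

# Build a tiny throwaway "sound" package on disk (hypothetical contents).
root = tempfile.mkdtemp()
effects_dir = os.path.join(root, 'sound', 'effects')
os.makedirs(effects_dir)

def write(path, text):
    with open(path, 'w') as f:
        f.write(text)

write(os.path.join(root, 'sound', '__init__.py'), '')   # marks a package
write(os.path.join(effects_dir, '__init__.py'), '')     # marks a subpackage
write(os.path.join(effects_dir, 'echo.py'),
      'def echofilter(samples, delay=1):\n'
      '    """Toy stand-in: prepend `delay` zero samples."""\n'
      '    return [0] * delay + list(samples)\n')

# Forget any previously imported "sound" so the sketch is repeatable.
for cached in [m for m in sys.modules if m == 'sound' or m.startswith('sound.')]:
    del sys.modules[cached]
sys.path.insert(0, root)          # make the package importable

# Form 1: the submodule must be referenced by its full name.
import sound.effects.echo
r1 = sound.effects.echo.echofilter([1, 2])

# Form 2: the submodule is available without its package prefix.
from sound.effects import echo
r2 = echo.echofilter([1, 2])

# Form 3: the function itself is bound directly.
from sound.effects.echo import echofilter
r3 = echofilter([1, 2])

print(r1 == r2 == r3)
```

All three forms load the same module object; they differ only in which name ends up in the importing namespace.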
Now what happens when the user writes from sound.effects import *? Ideally, one would hope that this somehow goes out to the
filesystem, finds which submodules are present in the package, and imports them all. This could take a long time and importing
sub-modules might have unwanted side-effects that should only happen when the sub-module is explicitly imported.
The only solution is for the package author to provide an explicit index of the package. The import statement uses the following
convention: if a package’s __init__.py code defines a list named __all__, it is taken to be the list of module names that should be
imported when from package import * is encountered. It is up to the package author to keep this list up-to-date when a new
version of the package is released. Package authors may also decide not to support it, if they don’t see a use for importing * from
their package. For example, the file sound/effects/__init__.py could contain the assignment __all__ = ["echo", "surround", "reverse"].
This would mean that from sound.effects import * would import the three named submodules of the sound package.
If __all__ is not defined, the statement from sound.effects import * does not import all submodules from the package
sound.effects into the current namespace; it only ensures that the package sound.effects has been imported (possibly running any
initialization code in __init__.py) and then imports whatever names are defined in the package. This includes any names defined
(and submodules explicitly loaded) by __init__.py. It also includes any submodules of the package that were explicitly loaded by
previous import statements. Consider this code:
import sound.effects.echo
import sound.effects.surround
from sound.effects import *
In this example, the echo and surround modules are imported in the current namespace because they are defined in the
sound.effects package when the from...import statement is executed. (This also works when __all__ is defined.)
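The effect of __all__ can be checked with a small experiment. This sketch creates a temporary sound.effects package with three submodules but lists only two of them in __all__ (a choice made here to show the filtering), then runs the star import inside a scratch namespace via exec() so the imported names can be inspected.

```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()
effects_dir = os.path.join(root, 'sound', 'effects')
os.makedirs(effects_dir)

def write(path, text):
    with open(path, 'w') as f:
        f.write(text)

write(os.path.join(root, 'sound', '__init__.py'), '')
# The package author lists the submodules that "import *" should load.
write(os.path.join(effects_dir, '__init__.py'),
      '__all__ = ["echo", "surround"]\n')
for name in ('echo', 'surround', 'reverse'):
    write(os.path.join(effects_dir, name + '.py'), 'NAME = %r\n' % name)

# Forget any previously imported "sound" so the sketch is repeatable.
for cached in [m for m in sys.modules if m == 'sound' or m.startswith('sound.')]:
    del sys.modules[cached]
sys.path.insert(0, root)

namespace = {}                                   # scratch global namespace
exec('from sound.effects import *', namespace)
imported = sorted(k for k in namespace if not k.startswith('__'))
print(imported)                                  # ['echo', 'surround']
```

The reverse module exists on disk but is neither imported nor bound, because it is not named in __all__.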
Although certain modules are designed to export only names that follow certain patterns when you use import *, it is still
considered bad practice in production code.
Remember, there is nothing wrong with using from Package import specific_submodule! In fact, this is the recommended
notation unless the importing module needs to use submodules with the same name from different packages.
The submodules often need to refer to each other. For example, the surround module might use the echo module. In fact, such
references are so common that the import statement first looks in the containing package before looking in the standard module
search path. Thus, the surround module can simply use import echo or from echo import echofilter. If the imported module is not
found in the current package (the package of which the current module is a submodule), the import statement looks for a
top-level module with the given name.
When packages are structured into subpackages (as with the sound package in the example), you can use absolute imports to
refer to submodules of sibling packages. For example, if the module sound.filters.vocoder needs to use the echo module in the
sound.effects package, it can use from sound.effects import echo.
Starting with Python 2.5, in addition to the implicit relative imports described above, you can write explicit relative imports with
the from module import name form of import statement. These explicit relative imports use leading dots to indicate the current
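and parent packages involved in the relative import. From the surround module, for example, you might use from . import echo
to reach a module in the same package, or from ..filters import vocoder to reach a module in a sibling package.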
Note that both explicit and implicit relative imports are based on the name of the current module. Since the name of the main
module is always "__main__", modules intended for use as the main module of a Python application should always use absolute
imports.
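Explicit relative imports only work inside a package, so the sketch below writes a small sound package to a temporary directory and then imports its surround module. The function names echofilter(), vocode() and describe() and their bodies are hypothetical; the module layout follows the names used in the text.

```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()

def write(relpath, text):
    """Create root/relpath, making parent directories as needed."""
    path = os.path.join(root, *relpath.split('/'))
    if not os.path.isdir(os.path.dirname(path)):
        os.makedirs(os.path.dirname(path))
    with open(path, 'w') as f:
        f.write(text)

write('sound/__init__.py', '')
write('sound/effects/__init__.py', '')
write('sound/filters/__init__.py', '')
write('sound/effects/echo.py', 'def echofilter():\n    return "echo"\n')
write('sound/filters/vocoder.py', 'def vocode():\n    return "vocoder"\n')
write('sound/effects/surround.py',
      'from . import echo              # module in the current package\n'
      'from ..filters import vocoder   # module in a sibling package\n'
      'def describe():\n'
      '    return echo.echofilter() + "+" + vocoder.vocode()\n')

# Forget any previously imported "sound" so the sketch is repeatable.
for cached in [m for m in sys.modules if m == 'sound' or m.startswith('sound.')]:
    del sys.modules[cached]
sys.path.insert(0, root)

from sound.effects import surround
print(surround.describe())             # echo+vocoder
```

Running surround.py directly as a script would fail on the relative imports, since its name would then be "__main__" rather than sound.effects.surround.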
Packages support one more special attribute, __path__. This is initialized to be a list containing the name of the directory holding
the package’s __init__.py before the code in that file is executed. This variable can be modified; doing so affects future searches
for modules and subpackages contained in the package.
While this feature is not often needed, it can be used to extend the set of modules found in a package.
Footnotes
[1] In fact function definitions are also ‘statements’ that are ‘executed’; the execution of a module-level function definition
enters the function name in the module’s global symbol table.
9. Classes
Compared with other programming languages, Python’s class mechanism adds classes with a minimum of new syntax and
semantics. It is a mixture of the class mechanisms found in C++ and Modula-3. Python classes provide all the standard features
of Object Oriented Programming: the class inheritance mechanism allows multiple base classes, a derived class can override any
methods of its base class or classes, and a method can call the method of a base class with the same name. Objects can contain
arbitrary amounts and kinds of data. As is true for modules, classes partake of the dynamic nature of Python: they are created at
runtime, and can be modified further after creation.
In C++ terminology, normally class members (including the data members) are public (except see below Private Variables), and
all member functions are virtual. As in Modula-3, there are no shorthands for referencing the object’s members from its
methods: the method function is declared with an explicit first argument representing the object, which is provided implicitly by
the call. As in Smalltalk, classes themselves are objects. This provides semantics for importing and renaming. Unlike C++ and
Modula-3, built-in types can be used as base classes for extension by the user. Also, like in C++, most built-in operators with
special syntax (arithmetic operators, subscripting etc.) can be redefined for class instances.
(Lacking universally accepted terminology to talk about classes, I will make occasional use of Smalltalk and C++ terms. I would
use Modula-3 terms, since its object-oriented semantics are closer to those of Python than C++, but I expect that few readers
have heard of it.)
9.1. A Word About Names and Objects
Objects have individuality, and multiple names (in multiple scopes) can be bound to the same object. This is known as aliasing in
other languages. This is usually not appreciated on a first glance at Python, and can be safely ignored when dealing with
immutable basic types (numbers, strings, tuples). However, aliasing has a possibly surprising effect on the semantics of Python
code involving mutable objects such as lists, dictionaries, and most other types. This is usually used to the benefit of the
program, since aliases behave like pointers in some respects. For example, passing an object is cheap since only a pointer is
passed by the implementation; and if a function modifies an object passed as an argument, the caller will see the change — this
eliminates the need for two different argument passing mechanisms as in Pascal.
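A short sketch may make the aliasing effect easier to see. The function name append_zero() is made up for this example.

```python
def append_zero(seq):
    """Modify the list passed in; no copy is made."""
    seq.append(0)

a = [1, 2, 3]
b = a                # b is a second name for the same list object
append_zero(b)

print(a)             # [1, 2, 3, 0] -- the change is visible through both names
print(a is b)        # True

t = (1, 2, 3)
u = t
u = u + (0,)         # builds a new tuple and rebinds u; t is unaffected
print(t)             # (1, 2, 3)
```

With the immutable tuple, the only way to "change" u is to rebind it to a new object, which is why aliasing can be ignored for such types.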
Before introducing classes, I first have to tell you something about Python’s scope rules. Class definitions play some neat tricks
with namespaces, and you need to know how scopes and namespaces work to fully understand what’s going on. Incidentally,
knowledge about this subject is useful for any advanced Python programmer.
A namespace is a mapping from names to objects. Most namespaces are currently implemented as Python dictionaries, but that’s
normally not noticeable in any way (except for performance), and it may change in the future. Examples of namespaces are: the
set of built-in names (containing functions such as abs(), and built-in exception names); the global names in a module; and the
local names in a function invocation. In a sense the set of attributes of an object also forms a namespace. The important thing to
know about namespaces is that there is absolutely no relation between names in different namespaces; for instance, two different
modules may both define a function maximize without confusion — users of the modules must prefix it with the module name.
By the way, I use the word attribute for any name following a dot — for example, in the expression z.real, real is an attribute of
the object z. Strictly speaking, references to names in modules are attribute references: in the expression modname.funcname,
modname is a module object and funcname is an attribute of it. In this case there happens to be a straightforward mapping
between the module’s attributes and the global names defined in the module: they share the same namespace! [1]
Attributes may be read-only or writable. In the latter case, assignment to attributes is possible. Module attributes are writable:
you can write modname.the_answer = 42. Writable attributes may also be deleted with the del statement. For example, del
modname.the_answer will remove the attribute the_answer from the object named by modname.
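As an illustration, the following sketch writes and deletes an attribute on a module object. It uses an empty module created with types.ModuleType as a stand-in for a real imported module; that stand-in is an assumption of this example.

```python
import types

modname = types.ModuleType('modname')   # empty stand-in module (hypothetical)

modname.the_answer = 42                 # module attributes are writable
print(modname.the_answer)               # 42
print('the_answer' in vars(modname))    # True: stored in the module's namespace

del modname.the_answer                  # writable attributes can be deleted
print(hasattr(modname, 'the_answer'))   # False
```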
Namespaces are created at different moments and have different lifetimes. The namespace containing the built-in names is
created when the Python interpreter starts up, and is never deleted. The global namespace for a module is created when the
module definition is read in; normally, module namespaces also last until the interpreter quits. The statements executed by the
top-level invocation of the interpreter, either read from a script file or interactively, are considered part of a module called
__main__, so they have their own global namespace. (The built-in names actually also live in a module; this is called
__builtin__.)
The local namespace for a function is created when the function is called, and deleted when the function returns or raises an
exception that is not handled within the function. (Actually, forgetting would be a better way to describe what actually happens.)
Of course, recursive invocations each have their own local namespace.
A scope is a textual region of a Python program where a namespace is directly accessible. “Directly accessible” here means that
an unqualified reference to a name attempts to find the name in the namespace.
Although scopes are determined statically, they are used dynamically. At any time during execution, there are at least three
nested scopes whose namespaces are directly accessible:
• the innermost scope, which is searched first, contains the local names
• the scopes of any enclosing functions, which are searched starting with the nearest enclosing scope, contain non-local,
but also non-global names
• the next-to-last scope contains the current module’s global names
• the outermost scope (searched last) is the namespace containing built-in names
If a name is declared global, then all references and assignments go directly to the middle scope containing the module’s global
names. Otherwise, all variables found outside of the innermost scope are read-only (an attempt to write to such a variable will
simply create a new local variable in the innermost scope, leaving the identically named outer variable unchanged).
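The difference between a plain assignment and one made under a global statement can be sketched as follows. The names counter, shadow() and rebind() are invented for this example, and the code is meant to be run at module level.

```python
counter = 0          # module-level (global) name

def shadow():
    counter = 10     # creates a new local name; the global is untouched
    return counter

def rebind():
    global counter   # assignments now go to the module's global name
    counter = 20

local_value = shadow()
after_shadow = counter
rebind()
after_rebind = counter

print('%d %d %d' % (local_value, after_shadow, after_rebind))   # 10 0 20
```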
Usually, the local scope references the local names of the (textually) current function. Outside functions, the local scope
references the same namespace as the global scope: the module’s namespace. Class definitions place yet another namespace in
the local scope.
It is important to realize that scopes are determined textually: the global scope of a function defined in a module is that module’s
namespace, no matter from where or by what alias the function is called. On the other hand, the actual search for names is done
dynamically, at run time — however, the language definition is evolving towards static name resolution, at “compile” time, so
don’t rely on dynamic name resolution! (In fact, local variables are already determined statically.)
A special quirk of Python is that – if no global statement is in effect – assignments to names always go into the innermost scope.
Assignments do not copy data — they just bind names to objects. The same is true for deletions: the statement del x removes the
binding of x from the namespace referenced by the local scope. In fact, all operations that introduce new names use the local
scope: in particular, import statements and function definitions bind the module or function name in the local scope. (The global
statement can be used to indicate that particular variables live in the global scope.)
Classes introduce a little bit of new syntax, three new object types, and some new semantics.
class ClassName:
    <statement-1>
    .
    .
    .
    <statement-N>
Class definitions, like function definitions (def statements) must be executed before they have any effect. (You could
conceivably place a class definition in a branch of an if statement, or inside a function.)
In practice, the statements inside a class definition will usually be function definitions, but other statements are allowed, and
sometimes useful — we’ll come back to this later. The function definitions inside a class normally have a peculiar form of
argument list, dictated by the calling conventions for methods — again, this is explained later.
When a class definition is entered, a new namespace is created, and used as the local scope — thus, all assignments to local
variables go into this new namespace. In particular, function definitions bind the name of the new function here.
When a class definition is left normally (via the end), a class object is created. This is basically a wrapper around the contents of
the namespace created by the class definition; we’ll learn more about class objects in the next section. The original local scope
(the one in effect just before the class definition was entered) is reinstated, and the class object is bound here to the class name
given in the class definition header (ClassName in the example).
Class objects support two kinds of operations: attribute references and instantiation.
Attribute references use the standard syntax used for all attribute references in Python: obj.name. Valid attribute names are all
the names that were in the class’s namespace when the class object was created. So, if the class definition looked like this:
class MyClass:
    """A simple example class"""
    i = 12345
    def f(self):
        return 'hello world'
then MyClass.i and MyClass.f are valid attribute references, returning an integer and a function object, respectively. Class
attributes can also be assigned to, so you can change the value of MyClass.i by assignment. __doc__ is also a valid attribute,
returning the docstring belonging to the class: "A simple example class".
Class instantiation uses function notation. Just pretend that the class object is a parameterless function that returns a new instance
of the class. For example (assuming the above class):
x = MyClass()
creates a new instance of the class and assigns this object to the local variable x.
The instantiation operation (“calling” a class object) creates an empty object. Many classes like to create objects with instances
customized to a specific initial state. Therefore a class may define a special method named __init__(), like this:
def __init__(self):
    self.data = []
When a class defines an __init__() method, class instantiation automatically invokes __init__() for the newly-created class
instance. So in this example, a new, initialized instance can be obtained by:
x = MyClass()
Of course, the __init__() method may have arguments for greater flexibility. In that case, arguments given to the class
instantiation operator are passed on to __init__(). For example,
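The example is missing from this copy. A minimal sketch of an __init__() method that takes arguments might look like this; the class name Complex and its attribute names are chosen for illustration.

```python
class Complex:
    def __init__(self, realpart, imagpart):
        # Arguments given at instantiation arrive here after self.
        self.r = realpart
        self.i = imagpart

x = Complex(3.0, -4.5)
print(x.r)    # 3.0
print(x.i)    # -4.5
```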
Now what can we do with instance objects? The only operations understood by instance objects are attribute references. There
are two kinds of valid attribute names, data attributes and methods.
Data attributes correspond to “instance variables” in Smalltalk, and to “data members” in C++. Data attributes need not be
declared; like local variables, they spring into existence when they are first assigned to. For example, if x is the instance of
MyClass created above, the following piece of code will print the value 16, without leaving a trace:
x.counter = 1
while x.counter < 10:
    x.counter = x.counter * 2
print x.counter
del x.counter
The other kind of instance attribute reference is a method. A method is a function that “belongs to” an object. (In Python, the
term method is not unique to class instances: other object types can have methods as well. For example, list objects have
methods called append, insert, remove, sort, and so on. However, in the following discussion, we’ll use the term method
exclusively to mean methods of class instance objects, unless explicitly stated otherwise.)
Valid method names of an instance object depend on its class. By definition, all attributes of a class that are function objects
define corresponding methods of its instances. So in our example, x.f is a valid method reference, since MyClass.f is a function,
but x.i is not, since MyClass.i is not. But x.f is not the same thing as MyClass.f — it is a method object, not a function object.
Usually, a method is called right after it is bound:
x.f()
In the MyClass example, this will return the string 'hello world'. However, it is not necessary to call a method right away: x.f is a
method object, and can be stored away and called at a later time. For example:
xf = x.f
while True:
    print xf()
What exactly happens when a method is called? You may have noticed that x.f() was called without an argument above, even
though the function definition for f() specified an argument. What happened to the argument? Surely Python raises an exception
when a function that requires an argument is called without any — even if the argument isn’t actually used...
Actually, you may have guessed the answer: the special thing about methods is that the object is passed as the first argument of
the function. In our example, the call x.f() is exactly equivalent to MyClass.f(x). In general, calling a method with a list of n
arguments is equivalent to calling the corresponding function with an argument list that is created by inserting the method’s
object before the first argument.
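The equivalence can be checked directly. This sketch repeats the MyClass definition from earlier so that it is self-contained.

```python
class MyClass:
    """A simple example class"""
    i = 12345
    def f(self):
        return 'hello world'

x = MyClass()
via_method = x.f()             # the instance is supplied implicitly
via_function = MyClass.f(x)    # the instance is supplied explicitly

print(via_method == via_function)   # True
```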
If you still don’t understand how methods work, a look at the implementation can perhaps clarify matters. When an instance
attribute is referenced that isn’t a data attribute, its class is searched. If the name denotes a valid class attribute that is a function
object, a method object is created by packing (pointers to) the instance object and the function object just found together in an
abstract object: this is the method object. When the method object is called with an argument list, a new argument list is
constructed from the instance object and the argument list, and the function object is called with this new argument list.
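The two objects packed into a method object can be inspected through its __self__ and __func__ attributes. The sketch below is only an illustration of the description above, not a view into the interpreter's actual implementation.

```python
class MyClass:
    def f(self):
        return 'hello world'

x = MyClass()
xf = x.f                        # a method object

print(xf.__self__ is x)         # True: the instance packed into the method
func = xf.__func__              # the underlying function object
print(func(x))                  # hello world: function called with the instance
print('f' in vars(x))           # False: f lives on the class, not the instance
```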
Data attributes override method attributes with the same name; to avoid accidental name conflicts, which may cause hard-to-find
bugs in large programs, it is wise to use some kind of convention that minimizes the chance of conflicts. Possible conventions
include capitalizing method names, prefixing data attribute names with a small unique string (perhaps just an underscore), or
using verbs for methods and nouns for data attributes.
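The kind of conflict this paragraph warns about can be reproduced in a few lines. The class Sensor and its reading() method are hypothetical names used only for this sketch.

```python
class Sensor:                   # hypothetical class for illustration
    def reading(self):
        return 42

s = Sensor()
before = s.reading()            # the method is found on the class

s.reading = 7                   # data attribute with the same name
shadowed = s.reading            # now yields the data attribute, not a method

try:
    s.reading()                 # fails: the integer 7 is not callable
    error = None
except TypeError as exc:
    error = exc

del s.reading                   # removing it uncovers the method again
after = s.reading()

print(before, shadowed, after)
```

A naming convention such as nouns for data and verbs for methods would have kept the two names apart in the first place.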
Data attributes may be referenced by methods as well as by ordinary users (“clients”) of an object. In other words, classes are not
usable to implement pure abstract data types. In fact, nothing in Python makes it possible to enforce data hiding — it is all based
upon convention. (On the other hand, the Python implementation, written in C, can completely hide implementation details and
control access to an object if necessary; this can be used by extensions to Python written in C.)
Clients should use data attributes with care — clients may mess up invariants maintained by the methods by stamping on their
data attributes. Note that clients may add data attributes of their own to an instance object without affecting the validity of the
methods, as long as name conflicts are avoided — again, a naming convention can save a lot of headaches here.
There is no shorthand for referencing data attributes (or other methods!) from within methods. I find that this actually increases
the readability of methods: there is no chance of confusing local variables and instance variables when glancing through a
method.
Often, the first argument of a method is called self. This is nothing more than a convention: the name self has absolutely no
special meaning to Python. Note, however, that by not following the convention your code may be less readable to other Python
programmers, and it is also conceivable that a class browser program might be written that relies upon such a convention.
Any function object that is a class attribute defines a method for instances of that class. It is not necessary that the function
definition is textually enclosed in the class definition: assigning a function object to a local variable in the class is also ok. For
example:
# Function defined outside the class
def f1(self, x, y):
    return min(x, x+y)

class C:
    f = f1
    def g(self):
        return 'hello world'
    h = g
Now f, g and h are all attributes of class C that refer to function objects, and consequently they are all methods of instances of C
— h being exactly equivalent to g. Note that this practice usually only serves to confuse the reader of a program.
Methods may call other methods by using method attributes of the self argument:
class Bag:
    def __init__(self):
        self.data = []
    def add(self, x):
        self.data.append(x)
    def addtwice(self, x):
        self.add(x)
        self.add(x)
Methods may reference global names in the same way as ordinary functions. The global scope associated with a method is the
module containing the class definition. (The class itself is never used as a global scope.) While one rarely encounters a good
reason for using global data in a method, there are many legitimate uses of the global scope: for one thing, functions and modules
imported into the global scope can be used by methods, as well as functions and classes defined in it. Usually, the class
containing the method is itself defined in this global scope, and in the next section we’ll find some good reasons why a method
would want to reference its own class.
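As an illustration of these legitimate uses, the sketch below has a method that relies on an imported module, on a module-level function, and on its own class, all found through the global scope. The names Circle and normalize are hypothetical.

```python
import math

def normalize(value):
    # Module-level helper; visible to methods through the global scope.
    return round(value, 2)

class Circle(object):
    """Hypothetical class; names are illustrative."""
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        # math and normalize are resolved in the module's global scope,
        # not in the class body.
        return normalize(math.pi * self.radius ** 2)

    def doubled(self):
        # The class name Circle is itself a global of this module.
        return Circle(self.radius * 2)

small = Circle(1)
area_of_small = small.area()
large = small.doubled()
```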
Each value is an object, and therefore has a class (also called its type). It is stored as object.__class__.
9.5. Inheritance
Of course, a language feature would not be worthy of the name “class” without supporting inheritance. The syntax for a derived
class definition looks like this:
class DerivedClassName(BaseClassName):
    <statement-1>
    .
    .
    .
    <statement-N>
The name BaseClassName must be defined in a scope containing the derived class definition. In place of a base class name, other
arbitrary expressions are also allowed. This can be useful, for example, when the base class is defined in another module:
class DerivedClassName(modname.BaseClassName):
Execution of a derived class definition proceeds the same as for a base class. When the class object is constructed, the base class
is remembered. This is used for resolving attribute references: if a requested attribute is not found in the class, the search
proceeds to look in the base class. This rule is applied recursively if the base class itself is derived from some other class.
There’s nothing special about instantiation of derived classes: DerivedClassName() creates a new instance of the class. Method
references are resolved as follows: the corresponding class attribute is searched, descending down the chain of base classes if
necessary, and the method reference is valid if this yields a function object.
Derived classes may override methods of their base classes. Because methods have no special privileges when calling other
methods of the same object, a method of a base class that calls another method defined in the same base class may end up calling
a method of a derived class that overrides it. (For C++ programmers: all methods in Python are effectively virtual.)
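A minimal sketch of this behaviour, with hypothetical classes Base and Derived: greet() is defined only in Base, yet it picks up the name() override when called on a Derived instance.

```python
class Base(object):
    def greet(self):
        # name() is looked up on self, so a derived override is used.
        return 'hello from ' + self.name()

    def name(self):
        return 'Base'

class Derived(Base):
    def name(self):
        return 'Derived'

base_greeting = Base().greet()
derived_greeting = Derived().greet()
```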
An overriding method in a derived class may in fact want to extend rather than simply replace the base class method of the same
name. There is a simple way to call the base class method directly: just call BaseClassName.methodname(self, arguments). This
is occasionally useful to clients as well. (Note that this only works if the base class is accessible as BaseClassName in the global
scope.)
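For example, a derived class might extend a formatting method instead of replacing it. The classes Logger and TimestampLogger below are hypothetical, and the fixed '[t=0]' prefix stands in for a real timestamp.

```python
class Logger(object):
    def format(self, message):
        return 'log: ' + message

class TimestampLogger(Logger):
    def format(self, message):
        # Extend rather than replace: call the base class method directly,
        # passing self explicitly.
        base_text = Logger.format(self, message)
        return '[t=0] ' + base_text

line = TimestampLogger().format('started')
```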
Python has two built-in functions that work with inheritance:
• Use isinstance() to check an instance’s type: isinstance(obj, int) will be True only if obj.__class__ is int or some class
derived from int.
• Use issubclass() to check class inheritance: issubclass(bool, int) is True since bool is a subclass of int. However,
issubclass(unicode, str) is False since unicode is not a subclass of str (they only share a common ancestor, basestring).
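Both functions can be tried on a small hierarchy. Shape and Square are hypothetical classes; the results are collected in a dictionary so they can be inspected afterwards.

```python
class Shape(object):
    pass

class Square(Shape):
    pass

sq = Square()

checks = {
    # isinstance() accepts the class itself or any base class.
    'square_is_shape': isinstance(sq, Shape),
    'square_is_int': isinstance(sq, int),
    'true_is_int': isinstance(True, int),
    # issubclass() works on classes, not instances.
    'bool_sub_int': issubclass(bool, int),
    'square_sub_shape': issubclass(Square, Shape),
    'shape_sub_square': issubclass(Shape, Square),
}
```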
Python supports a limited form of multiple inheritance as well. A class definition with multiple base classes looks like this:
class DerivedClassName(Base1, Base2, Base3):
    <statement-1>
    .
    .
    .
    <statement-N>
For old-style classes, the only rule is depth-first, left-to-right. Thus, if an attribute is not found in DerivedClassName, it is
searched in Base1, then (recursively) in the base classes of Base1, and only if it is not found there, it is searched in Base2, and so
on.
(To some people breadth first — searching Base2 and Base3 before the base classes of Base1 — looks more natural. However,
this would require you to know whether a particular attribute of Base1 is actually defined in Base1 or in one of its base classes
before you can figure out the consequences of a name conflict with an attribute of Base2. The depth-first rule makes no
difference between direct and inherited attributes of Base1.)
For new-style classes, the method resolution order changes dynamically to support cooperative calls to super(). This approach is
known in some other multiple-inheritance languages as call-next-method and is more powerful than the super call found in
single-inheritance languages.
With new-style classes, dynamic ordering is necessary because all cases of multiple inheritance exhibit one or more diamond
relationships (where at least one of the parent classes can be accessed through multiple paths from the bottommost class).
For example, all new-style classes inherit from object, so any case of multiple inheritance provides more than one path to reach
object. To keep the base classes from being accessed more than once, the dynamic algorithm linearizes the search order in a way
that preserves the left-to-right ordering specified in each class, that calls each parent only once, and that is monotonic (meaning
that a class can be subclassed without affecting the precedence order of its parents). Taken together, these properties make it
possible to design reliable and extensible classes with multiple inheritance. For more detail, see
http://www.python.org/download/releases/2.3/mro/.
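The linearized order can be inspected through the __mro__ attribute of a new-style class. The diamond below is a minimal sketch with hypothetical classes A, B, C and D; each who() method cooperates by calling super(), so the call chain follows the same order.

```python
class A(object):
    def who(self):
        return ['A']

class B(A):
    def who(self):
        return ['B'] + super(B, self).who()

class C(A):
    def who(self):
        return ['C'] + super(C, self).who()

class D(B, C):
    def who(self):
        return ['D'] + super(D, self).who()

# The search order: left-to-right, each class exactly once.
mro_names = [cls.__name__ for cls in D.__mro__]
# Cooperative super() calls visit the classes in that same order.
call_chain = D().who()
```

Note that A appears after both B and C even though it is a base class of B: this is what keeps the shared ancestor from being visited twice.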
“Private” instance variables that cannot be accessed except from inside an object don’t exist in Python. However, there is a
convention that is followed by most Python code: a name prefixed with an underscore (e.g. _spam) should be treated as a non-
public part of the API (whether it is a function, a method or a data member). It should be considered an implementation detail
and subject to change without notice.
Since there is a valid use-case for class-private members (namely to avoid clashes between names and names defined by
subclasses), there is limited support for such a mechanism, called name mangling. Any identifier of the form __spam (at least
two leading underscores, at most one trailing underscore) is textually replaced with _classname__spam, where classname is the
current class name with leading underscore(s) stripped. This mangling is done without regard to the syntactic position of the
identifier, as long as it occurs within the definition of a class.
Note that the mangling rules are designed mostly to avoid accidents; it still is possible to access or modify a variable that is
considered private. This can even be useful in special circumstances, such as in the debugger.
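Name mangling helps when a subclass overrides a method that the base class calls internally. In the sketch below (the class names and the items_list attribute are illustrative), Mapping keeps a private reference to its own update() so that MappingSubclass can change the method's signature without breaking __init__().

```python
class Mapping(object):
    def __init__(self, iterable):
        self.items_list = []
        self.__update(iterable)   # stored and looked up as _Mapping__update

    def update(self, iterable):
        for item in iterable:
            self.items_list.append(item)

    __update = update             # private copy of the original update()

class MappingSubclass(Mapping):
    def update(self, keys, values):
        # New signature; does not break Mapping.__init__().
        for item in zip(keys, values):
            self.items_list.append(item)

m = MappingSubclass(['a', 'b'])
m.update(['x'], [1])

# The mangled name is still reachable from outside the class.
mangled_present = hasattr(m, '_Mapping__update')
plain_absent = not hasattr(m, '__update')
```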
Notice that code passed to exec, eval() or execfile() does not consider the classname of the invoking class to be the current class;
this is similar to the effect of the global statement, the effect of which is likewise restricted to code that is byte-compiled
together. The same restriction applies to getattr(), setattr() and delattr(), as well as when referencing __dict__ directly.
Sometimes it is useful to have a data type similar to the Pascal “record” or C “struct”, bundling together a few named data items.
An empty class definition will do nicely:
class Employee:
    pass
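An instance of such a class can then be filled in field by field. The field names and values below are sample data chosen for illustration.

```python
class Employee(object):
    pass

john = Employee()  # create an empty employee record

# Fill the fields of the record.
john.name = 'John Doe'
john.dept = 'computer lab'
john.salary = 1000

# vars() returns the instance's attribute dictionary.
record = vars(john)
```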
A piece of Python code that expects a particular abstract data type can often be passed a class that emulates the methods of that
data type instead. For instance, if you have a function that formats some data from a file object, you can define a class with
methods read() and readline() that get the data from a string buffer instead, and pass it as an argument.
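A minimal sketch of that idea follows. Both count_nonblank_lines() and StringSource are hypothetical names: the function only assumes that its argument has a readline() method returning an empty string at the end of the data, so a string-backed class can stand in for a file object.

```python
def count_nonblank_lines(source):
    """Hypothetical consumer: needs only a readline() method."""
    count = 0
    while True:
        line = source.readline()
        if line == '':
            break              # empty string signals end of data
        if line.strip():
            count += 1
    return count

class StringSource(object):
    """Hypothetical stand-in for a file object, backed by a string."""
    def __init__(self, text):
        self.lines = text.splitlines(True)   # keep line endings
        self.pos = 0

    def readline(self):
        if self.pos >= len(self.lines):
            return ''
        line = self.lines[self.pos]
        self.pos += 1
        return line

    def read(self):
        # Return everything not yet consumed by readline().
        rest = ''.join(self.lines[self.pos:])
        self.pos = len(self.lines)
        return rest

nonblank = count_nonblank_lines(StringSource('a\n\nb\nc'))
```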
Instance method objects have attributes, too: m.im_self is the instance object with the method m(), and m.im_func is the function
object corresponding to the method.
User-defined exceptions are identified by classes as well. Using this mechanism it is possible to create extensible hierarchies of
exceptions.
There are two new valid (semantic) forms for the raise statement:
raise Class, instance
raise instance
In the first form, instance must be an instance of Class or of a class derived from it. The second form is a shorthand for:
raise instance.__class__, instance
A class in an except clause is compatible with an exception if it is the same class or a base class thereof (but not the other way
around — an except clause listing a derived class is not compatible with a base class). For example, the following code will print
B, C, D in that order:
class B:
    pass
class C(B):
    pass
class D(C):
    pass
Note that if the except clauses were reversed (with except B first), it would have printed B, B, B — the first matching except
clause is triggered.
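The matching order can be sketched as follows. Two things here are assumptions made for the sketch: the classes derive from Exception so that the code also runs on interpreters that require it, and the matched names are collected in lists instead of being printed. The helper functions first_match() and first_match_reversed() are hypothetical.

```python
class B(Exception):
    pass

class C(B):
    pass

class D(C):
    pass

def first_match(exc_class):
    # Most derived class listed first.
    try:
        raise exc_class()
    except D:
        return 'D'
    except C:
        return 'C'
    except B:
        return 'B'

def first_match_reversed(exc_class):
    # Base class listed first: it matches every class in the hierarchy.
    try:
        raise exc_class()
    except B:
        return 'B'
    except C:
        return 'C'
    except D:
        return 'D'

derived_first = [first_match(c) for c in [B, C, D]]
base_first = [first_match_reversed(c) for c in [B, C, D]]
```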
When an error message is printed for an unhandled exception, the exception’s class name is printed, then a colon and a space,
and finally the instance converted to a string using the built-in function str().
9.9. Iterators
By now you have probably noticed that most container objects can be looped over using a for statement:
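A few such loops, as a sketch with illustrative sample values; the elements are appended to a list so that the order of iteration can be inspected afterwards:

```python
collected = []

for element in [1, 2, 3]:        # list
    collected.append(element)
for element in (1, 2, 3):        # tuple
    collected.append(element)
for key in {'one': 1}:           # dictionary: iterates over keys
    collected.append(key)
for char in '123':               # string: iterates over characters
    collected.append(char)
```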
This style of access is clear, concise, and convenient. The use of iterators pervades and unifies Python. Behind the scenes, the for
statement calls iter() on the container object. The function returns an iterator object that defines the method next() which accesses
elements in the container one at a time. When there are no more elements, next() raises a StopIteration exception which tells the
for loop to terminate. This example shows how it all works:
>>> s = 'abc'
>>> it = iter(s)
>>> it
<iterator object at 0x00A1DB50>
>>> it.next()
'a'
>>> it.next()
'b'
>>> it.next()
'c'
>>> it.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
    it.next()
StopIteration
Having seen the mechanics behind the iterator protocol, it is easy to add iterator behavior to your classes. Define an __iter__()
method which returns an object with a next() method. If the class defines next(), then __iter__() can just return self:
class Reverse:
    "Iterator for looping over a sequence backwards"
    def __init__(self, data):
        self.data = data
        self.index = len(data)
    def __iter__(self):
        return self
    def next(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]
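A usage sketch for this class follows. It makes one assumption beyond the text: the method is defined as __next__() with next as an alias, because later versions of Python look up __next__() instead of next(). With the alias, the same class works under either spelling.

```python
class Reverse(object):
    "Iterator for looping over a sequence backwards"
    def __init__(self, data):
        self.data = data
        self.index = len(data)

    def __iter__(self):
        return self

    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]

    next = __next__   # the spelling used in the text above

# A for loop (here inside a list comprehension) drives the iterator.
letters = [char for char in Reverse('spam')]
```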
9.10. Generators
Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield
statement whenever they want to return data. Each time next() is called, the generator resumes where it left off (it remembers all
the data values and which statement was last executed). An example shows that generators can be trivially easy to create:
def reverse(data):
    for index in range(len(data)-1, -1, -1):
        yield data[index]
Anything that can be done with generators can also be done with class based iterators as described in the previous section. What
makes generators so compact is that the __iter__() and next() methods are created automatically.
Another key feature is that the local variables and execution state are automatically saved between calls. This makes the function
easier to write and much clearer than an approach using instance variables like self.index and self.data.
In addition to automatic method creation and saving program state, when generators terminate, they automatically raise
StopIteration. In combination, these features make it easy to create iterators with no more effort than writing a regular function.
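These three features can be observed on the reverse() generator. The sketch uses the built-in next() function (available from Python 2.6) instead of calling the method directly, and the sample string is illustrative.

```python
def reverse(data):
    for index in range(len(data) - 1, -1, -1):
        yield data[index]

gen = reverse('golf')

# The iterator methods were created automatically.
is_own_iterator = iter(gen) is gen

first = next(gen)   # runs the body up to the first yield
rest = list(gen)    # resumes where it left off, state preserved

# Once the function body finishes, StopIteration is raised automatically.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
```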
Some simple generators can be coded succinctly as expressions using a syntax similar to list comprehensions but with
parentheses instead of brackets. These expressions are designed for situations where the generator is used right away by an
enclosing function. Generator expressions are more compact but less versatile than full generator definitions and tend to be more
memory friendly than equivalent list comprehensions.
Examples:
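The following sketch shows a few generator expressions, each consumed immediately by an enclosing function. The sample vectors and string are illustrative values.

```python
# Sum of squares, without building an intermediate list.
total_squares = sum(i * i for i in range(10))

# Dot product of two vectors.
xvec = [10, 20, 30]
yvec = [7, 5, 3]
dot_product = sum(x * y for x, y in zip(xvec, yvec))

# Characters of a string in reverse order.
data = 'golf'
reversed_letters = list(data[i] for i in range(len(data) - 1, -1, -1))
```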
Footnotes
[1] Except for one thing. Module objects have a secret read-only attribute called __dict__ which returns the dictionary used to
implement the module’s namespace; the name __dict__ is an attribute but not a global name. Obviously, using this violates
the abstraction of namespace implementation, and should be restricted to things like post-mortem debuggers.