A Hands-On Introduction To Using Python in The Atmospheric and Oceanic Sciences
http://www.johnny-lin.com/pyintro
2012
Chapter 7
An Introduction to OOP Using Python: Part I, Basic Principles and Syntax
7.1
7.1.1
Procedural programs have data and functions as separate entities.
Real world objects have states and behaviors.
7.1.2
Objects are made up of attributes and methods.
Object instances are specific realizations of a class.
What do objects consist of? An object in programming is an entity or variable
that has two kinds of things attached to it: data and functions that act on
that data. The data are called attributes of the object, and the functions
attached to the object that can act on the data are called methods of the
object. Importantly, you design these methods to act on the attributes; they
aren't random functions someone has attached to the object. In contrast, in
procedural programming, variables have only one set of data, the value of the
variable, with no functions attached to the variable.
How are objects defined? In the real world, objects are usually examples or specific realizations of some class or type. For instance, individual
people are specific realizations of the class of human beings. The specific realizations, or instances, differ from one another in details but have the same
pattern. For people, we all have the same general shape, organ structure, etc.
In OOP, the specific realizations are called object instances, while the common pattern is called a class. In Python, this common pattern or template is
defined by the class statement.
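As a minimal sketch of these ideas, the following defines one class and creates two instances of it. The Person class, its attributes, and its method are hypothetical names chosen only for illustration, and the code is written so that it also runs under Python 3.

```python
class Person(object):
    """Hypothetical class: the common pattern shared by all instances."""

    def __init__(self, name, height_cm):
        # Attributes: the data attached to each instance.
        self.name = name
        self.height_cm = height_cm

    def height_in_meters(self):
        # Method: a function designed to act on the instance's attributes.
        return self.height_cm / 100.0


# Two instances: the same pattern, but different details.
alice = Person("Alice", 170)
bob = Person("Bob", 185)

print(alice.name)
print(alice.height_in_meters())
print(bob.height_in_meters())
```

Both instances have the same attributes and methods because they share one class definition, but each holds its own values.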
7.2
Python strings (like nearly everything else in Python) are objects. Thus, built
into Python, there (implicitly) is a class definition of the string class, and
every time you create a string, you are using that definition as your template.
That template defines both attributes and methods for all string objects, so
whatever string you've created, you have that set of data and functions
attached to your string which you can use. Let's look at a specific case:
Example 45 (Viewing attributes and methods attached to strings and
trying out a few methods):
In the Python interpreter, type in:
a = "hello"
Now type: dir(a). What do you see? Type a.title() and a.upper()
and see what you get.
Solution and discussion: The dir(a) command gives a list of (nearly)
all the attributes and methods attached to the object a, which is the string
"hello". Note that there is more data attached to the object than just the
word "hello", e.g., the attributes a.__doc__ and a.__class__ also show up in
the dir listing.
Methods can act on the data in the object. Thus, a.title() applies the
title method to the data of a and returns the string "hello" in title case
(i.e., the first letter of the word capitalized); a.upper() applies the upper
method to the data of a and returns the string all in uppercase. Notice these
methods do not require additional input arguments between the parentheses,
because all the data needed is already in the object (i.e., "hello").
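The interpreter session described above can also be written as a short script. This is only a sketch; the variable name names is an assumption introduced here to hold the dir listing.

```python
a = "hello"

# dir(a) returns a list of the names of a's attributes and methods.
names = dir(a)
print("title" in names)      # the title method is in the listing
print("__doc__" in names)    # so is the __doc__ attribute

# Methods act on the data already stored in the object,
# so no arguments are needed between the parentheses.
print(a.title())
print(a.upper())
```

Note that neither method changes a itself; each returns a new string.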
The dir command shows an object's attributes and methods.
7.3
The upper, isupper, and count string methods.
7.4
While lists have their uses, in scientific computing, arrays are the central
object. Most of our discussion of arrays has focused on functions that create
and act on arrays. Arrays, however, are objects like any other object and have
attributes and methods built into them; arrays are more than just a sequence
of numbers. Let's look at an example list of all the attributes and methods of
an array object:
Example 46 (Examining array object attributes and methods):
In the Python interpreter, type in:
a = N.reshape(N.arange(12), (4,3))
Now type: dir(a). What do you see? Based on their names, and your
understanding of what arrays are, what do you think some of these attributes
and methods do?
Solution and discussion: The dir command should give you a list of a
lot of stuff. I'm not going to list all the output here but instead will discuss
the output in general terms.
We first notice that there are two types of attribute and method names:
those with double-underscores in front and in back of the name and those
without any pre- or post-pended double-underscores. We consider each type
of name in turn.
A very few double-underscore names sound like data. The a.__doc__
variable is one such attribute and refers to documentation of the object. Most
of the double-underscore names suggest operations on or with arrays (e.g.,
__add__, __div__, etc.), which is what they are: Those names are of the
methods of the array object that define what Python will do to your data when
the interpreter sees a "+", "/", etc. Thus, if you want to redefine how
operators operate on arrays, you can do so. It is just a matter of redefining
that method of the object.
That being said, I do not, in general, recommend you do so. In Python,
the double-underscore in front means that attribute or method is very private. (A variable with a single underscore in front is private, but not as
private as a double-underscore variable.) That is to say, it is an attribute or
method that normal users should not access, let alone redefine. Python does
not, however, do much to prevent you from doing so, so advanced users who
need to access or redefine those attributes and methods can do so.
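To make the link between operators and double-underscore methods concrete, here is a minimal sketch using a small class instead of an array. The Celsius class is hypothetical and exists only for this illustration. It uses only __add__, which is spelled the same way in Python 2 and Python 3.

```python
class Celsius(object):
    """Hypothetical class used only to illustrate operator methods."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __add__(self, other):
        # Python calls this method when it sees: self + other
        return Celsius(self.degrees + other.degrees)


warm = Celsius(20.0)
rise = Celsius(2.5)

# The + operator and an explicit call to __add__ do the same thing.
print((warm + rise).degrees)
print(warm.__add__(rise).degrees)

# Built-in objects work the same way.
print((3).__add__(4) == 3 + 4)
```

Defining __add__ in your own class is the ordinary, supported use of these names; the caution in the text applies to redefining them on existing objects such as arrays.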
Double-underscore attribute and method names.
Single-underscore attribute and method names.
And now let's look at some examples of accessing and using array object
attributes and methods:
Example 47 (Using array attributes and methods):
In the Python interpreter, type in:
a = N.reshape(N.arange(12), (4,3))
print a.astype('c')
print a.shape
print a.cumsum()
print a.T
What does each of the print lines do? Are you accessing an attribute or
method of the array?
How to tell whether you are accessing an attribute or a method.
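One way to tell attributes from methods is sketched below, using the same array as in the example. The use of the built-in callable function is a choice made for this sketch, not something the example itself calls for, and the print calls are written so that they also run under Python 3.

```python
import numpy as N

a = N.reshape(N.arange(12), (4, 3))

# Attributes are accessed without parentheses: they are data.
print(a.shape)
print(a.T.shape)

# Methods are called with parentheses: they are functions acting on a.
print(a.cumsum())

# callable() reports whether a name refers to something you can call.
print(callable(a.cumsum))
print(callable(a.shape))
```

Here shape and T are attributes, while cumsum is a method, which is why only cumsum is followed by parentheses.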
While it's nice to have a bunch of array attributes and methods attached to
the array object, in practice, I find I seldom access array attributes and find
it easier to use NumPy functions instead of the corresponding array methods.
One exception with regard to attributes is the dtype.char attribute; that's
very useful since it tells you the type of the elements of the array (see
Example 30 for more on dtype.char).
7.5
ravel, resize, and round function and methods.
7.6
Defining a class using class.
Defining methods and the self argument.
The __init__ method.
We had said that all objects are instances of a class, and in the preceding
examples, we looked at what made up string and array instances, which tells
us something about the class definitions for those two kinds of objects. How
would we go about creating our own class definitions?
Class definitions start with a class statement. The block following the
class line is the class definition. Within the definition, you refer to the
instance of the class as self. So, for example, the instance attribute data
is called self.data in the class definition, and the instance method named
calculate is called self.calculate in the class definition (i.e., it is called
by self.calculate(), if it does not take any arguments).
Methods are defined using the def statement. The first argument in any
method is self; this syntax is how Python tells a method to make use of all
the previously defined attributes and methods in this instance. However,
you never type self when you call the method.
Usually, the first method you define will be the __init__ method. This
method is called whenever you create an instance of the class, and so you
usually put code that handles the arguments present when you create (or
instantiate) an instance of a class and conducts any kind of initialization for
the object instance. The arguments list of __init__ is the list of arguments
passed in to the constructor of the class, which is called when you use the
class name with calling syntax.
Whew! This is all very abstract. We need an example! Here's one:
This statement is not entirely correct. If you do set another variable, by assignment, to
such a method call, that lefthand-side variable will typically be set to None.
class Book(object):
    def __init__(self, authorlast, authorfirst, \
                 title, place, publisher, year):
        self.authorlast = authorlast
        self.authorfirst = authorfirst
        self.title = title
        self.place = place
        self.publisher = publisher
        self.year = year

    def write_bib_entry(self):
        return self.authorlast \
            + ', ' + self.authorfirst \
            + ', ' + self.title \
            + ', ' + self.place \
            + ': ' + self.publisher + ', ' \
            + self.year + '.'
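The following sketch shows how an instance of such a class might be created and used. The class is repeated so that the block runs on its own, and the bibliographic details passed to Book are hypothetical placeholders, not entries taken from the text.

```python
class Book(object):
    def __init__(self, authorlast, authorfirst,
                 title, place, publisher, year):
        self.authorlast = authorlast
        self.authorfirst = authorfirst
        self.title = title
        self.place = place
        self.publisher = publisher
        self.year = year

    def write_bib_entry(self):
        return (self.authorlast
                + ', ' + self.authorfirst
                + ', ' + self.title
                + ', ' + self.place
                + ': ' + self.publisher + ', '
                + self.year + '.')


# Hypothetical book. Using the class name with calling syntax runs
# __init__; the year is a string so that it can be concatenated.
example = Book("Doe", "Jane", "A Hypothetical Title",
               "Anytown", "Example Press", "2012")

print(example.title)               # attribute: no parentheses
print(example.write_bib_entry())   # method: self is supplied automatically
```

Notice that six arguments are passed when the instance is created, although __init__ lists seven: Python fills in self itself.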
7.7
Me Press, 2012.
2. The entire Book class definition, with the new method (and line continuations added to fit the code on the page), is:
class Book(object):
    def __init__(self, authorlast, authorfirst, \
                 title, place, publisher, year):
        self.authorlast = authorlast
        self.authorfirst = authorfirst
        self.title = title
        self.place = place
        self.publisher = publisher
        self.year = year

    def make_authoryear(self):
        self.authoryear = self.authorlast \
            + '(' + self.year + ')'

    def write_bib_entry(self):
        return self.authorlast \
            + ', ' + self.authorfirst \
            + ', ' + self.title \
            + ', ' + self.place \
            + ': ' + self.publisher + ', ' \
            + self.year + '.'
class Article(object):
    def __init__(self, authorlast, authorfirst, \
                 articletitle, journaltitle, \
                 volume, pages, year):
        self.authorlast = authorlast
        self.authorfirst = authorfirst
        self.articletitle = articletitle
        self.journaltitle = journaltitle
        self.volume = volume
        self.pages = pages
        self.year = year

    def make_authoryear(self):
        self.authoryear = self.authorlast \
            + '(' + self.year + ')'

    def write_bib_entry(self):
        return self.authorlast \
            + ', ' + self.authorfirst \
            + ' (' + self.year + '): ' \
            + '"' + self.articletitle + '," ' \
            + self.journaltitle + ', ' \
            + self.volume + ', ' \
            + self.pages + '.'
This code looks nearly the same as that for the Book class, with these
exceptions: some attributes differ between the two classes (books, for
instance, do not have journal titles) and the method write_bib_entry
is different between the two classes (to accommodate the different formatting
between article and book bibliography entries). See bibliog.py
in course_files/code_files for the code.
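Because both classes provide a method with the same name, code that uses them does not need to know which kind of entry it is handling. The sketch below illustrates this with condensed copies of the two classes (make_authoryear is left out to keep the block short). The entries are hypothetical, and the exact spacing inside the quoted strings is an assumption.

```python
class Book(object):
    def __init__(self, authorlast, authorfirst,
                 title, place, publisher, year):
        self.authorlast = authorlast
        self.authorfirst = authorfirst
        self.title = title
        self.place = place
        self.publisher = publisher
        self.year = year

    def write_bib_entry(self):
        return (self.authorlast + ', ' + self.authorfirst
                + ', ' + self.title + ', ' + self.place
                + ': ' + self.publisher + ', ' + self.year + '.')


class Article(object):
    def __init__(self, authorlast, authorfirst,
                 articletitle, journaltitle, volume, pages, year):
        self.authorlast = authorlast
        self.authorfirst = authorfirst
        self.articletitle = articletitle
        self.journaltitle = journaltitle
        self.volume = volume
        self.pages = pages
        self.year = year

    def write_bib_entry(self):
        return (self.authorlast + ', ' + self.authorfirst
                + ' (' + self.year + '): '
                + '"' + self.articletitle + '," '
                + self.journaltitle + ', '
                + self.volume + ', ' + self.pages + '.')


# Hypothetical entries of two different kinds.
sources = [
    Book("Doe", "Jane", "A Hypothetical Title",
         "Anytown", "Example Press", "2012"),
    Article("Roe", "Richard", "A Hypothetical Article",
            "Journal of Examples", "12", "34-56", "2011"),
]

# The loop never checks which class each entry belongs to.
lines = [entry.write_bib_entry() for entry in sources]
for line in lines:
    print(line)
```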
7.8
Summary of introduction to OOP.
7.9
The Book and Article classes we wrote earlier manage information related
to books and articles. In this case study, we make use of Book and Article
to help us implement one common use of book and article information: the
creation of a bibliography. In particular, we'll write a Bibliography class
that will manage a bibliography, given instances of Book and Article objects.
7.9.1
 1  import operator
 2
 3  class Bibliography(object):
 4      def __init__(self, entrieslist):
 5          self.entrieslist = entrieslist
 6
 7      def sort_entries_alpha(self):
 8          tmp = sorted(self.entrieslist,
 9                       key=operator.attrgetter('authorlast',
10                                               'authorfirst'))
11          self.entrieslist = tmp
12          del tmp
Let's talk about what this code does. In the __init__ method, there is
only a single argument, entrieslist. This is the list of Book and Article
instances that are being passed into an instance of the Bibliography class.
The __init__ method assigns the entrieslist argument to an attribute of
the same name.
Lines 7-12 define the sort_entries_alpha method, which sorts the
entrieslist attribute and replaces the old entrieslist attribute with the
sorted version. The method uses the built-in sorted function, which takes a
keyword parameter key that gives the key used for sorting the argument of
sorted.
How is that key generated? The attrgetter function, which is part of
the operator module, gets the attributes of the names listed as arguments
to attrgetter out of the elements of the item being sorted. (Note that the
attribute names passed into attrgetter are strings, and thus you refer to the
attributes of interest by their string names, not by typing in their names. This
makes the program much easier to write.) In our example, attrgetter has
two arguments; sorted orders self.entrieslist first by the attribute named
in attrgetter's first argument and then, for entries that tie, by the attribute
named in the second.
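To see what attrgetter hands to sorted, consider this small sketch. The Entry class and the names in the list are hypothetical stand-ins for Book and Article instances.

```python
import operator


class Entry(object):
    """Hypothetical stand-in for a Book or Article instance."""

    def __init__(self, authorlast, authorfirst):
        self.authorlast = authorlast
        self.authorfirst = authorfirst


entries = [Entry("Smith", "Zoe"), Entry("Jones", "Bob"), Entry("Smith", "Amy")]

# attrgetter builds a function that returns a tuple of the named attributes.
getkey = operator.attrgetter("authorlast", "authorfirst")
print(getkey(entries[0]))

# sorted compares those tuples: last name first, then first name.
ordered = sorted(entries, key=getkey)
names = [(e.authorlast, e.authorfirst) for e in ordered]
print(names)
```

The two Smith entries tie on the first key, so the second key decides their order.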
The attrgetter function and sorted.
if __name__ == '__main__':
which is true only if the module is being run as a main program, i.e., by
the python command. If you import the module for use in another module, by
using import, the variable __name__ will not have the string value
'__main__', and the diagnostics will not execute.
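A minimal sketch of a module that uses this test might look like the following. The function name run_diagnostics and its return value are hypothetical, standing in for whatever diagnostic code a module carries.

```python
def run_diagnostics():
    # Hypothetical stand-in for a module's diagnostic or test code.
    return "diagnostics ran"


if __name__ == "__main__":
    # Executes only when the file is run directly by the python command,
    # not when the file is imported from another module.
    print(run_diagnostics())
```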
7.9.2
Let's pause to think for a moment about the method sort_entries_alpha.
What have we just done? First, we sorted a list of items that are totally
differently structured from each other based on two shared types of data
(attributes). Second, we did the sort using a sorting function that does not
care about the details of the items being sorted, only that they had these
two shared types of data. In other words, the sorting function doesn't care
about the source type (e.g., article, book), only that all source types have the
attributes authorlast and authorfirst.
Comparing OOP vs. procedural for a sorting example.
This doesn't seem that big a deal, but think about how we would have had
to do it in traditional procedural programming. First, each instance would
have been an array, with a label of what kind of source it is, for instance:
7.9.3
 1  import operator
 2
 3  class Bibliography(object):
 4      def __init__(self, entrieslist):
 5          self.entrieslist = entrieslist
 6
 7      def sort_entries_alpha(self):
 8          tmp = sorted(self.entrieslist,
 9                       key=operator.attrgetter('authorlast',
10                                               'authorfirst'))
11          self.entrieslist = tmp
12          del tmp
13
14      def write_bibliog_alpha(self):
15          self.sort_entries_alpha()
16          output = ''
17          for ientry in self.entrieslist:
18              output = output \
19                  + ientry.write_bib_entry() + '\n\n'
20          return output[:-2]
The only code that has changed compared to what we had previously is
the write_bibliog_alpha method; let's talk about what it does. Line 14
defines the method; because self is the only argument, the method is called
with an empty argument list. The next line calls the sort_entries_alpha
method to make sure the list that is stored in the entrieslist attribute
is alphabetized. Next, we initialize the output string output as an empty
string, so that when the + operator is later used on it, Python performs
string concatenation. Lines 17-19 run a for loop to go through all elements
in the list entrieslist. The output of write_bib_entry is added one entry at a
time, along with two linebreaks after it. Finally, the entire string is returned
except for the final two linebreaks. (Remember that strings can be manipulated
using list slicing syntax.)
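The concatenate-then-trim pattern can be tried on its own with plain strings, as in this sketch. The three entry strings are hypothetical, and the comparison with the string join method is an addition made here for illustration, not part of the method described above.

```python
entries = ["First entry.", "Second entry.", "Third entry."]

# Build the string as in write_bibliog_alpha: append, then trim the tail.
output = ""
for ientry in entries:
    output = output + ientry + "\n\n"
trimmed = output[:-2]

# The join method produces the same string without the trimming step.
joined = "\n\n".join(entries)

print(trimmed == joined)
```

Slicing with [:-2] drops the last two characters, which are the two linebreaks added after the final entry.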
7.9.4
Here too, let's ask: how would we have written a function that wrote out an
alphabetized bibliography in procedural programming? Probably something
like the following sketch:
def write_bibliog_function(arrayofentries):
    [open output file]
    for i in xrange(len(arrayofentries)):
        ientryarray = arrayofentries[i]
        if ientryarray[0] == "article":
            [call function for bibliography entry
             for an article, and save to output file]
        elif ientryarray[0] == "book":
            [call function for bibliography entry
             for a book, and save to output file]
        [...]
    [close output file]
7.10
I think the bibliography example in Section 7.9 does a good job of illustrating
what object-oriented programming gives you that procedural programming
[[0 1 2 3 4]
 [0 1 2 3 4]
 [0 1 2 3 4]
 [0 1 2 3 4]]
[[0 0 0 0 0]
 [1 1 1 1 1]
 [2 2 2 2 2]
 [3 3 3 3 3]]
Solution and discussion: The two solutions described below (with the
second solution commented out) are in course_files/code_files in the file
surface_domain.py. Here's the solution using for loops:
 1  import numpy as N
 2
 3  class SurfaceDomain(object):
 4      def __init__(self, lon, lat):
 5          self.lon = N.array(lon)
 6          self.lat = N.array(lat)
            [...]
Lines 5-6 guarantee that lon and lat are NumPy arrays, in case lists or
tuples are passed in.
Using meshgrid.
And here's a simpler and faster solution using the meshgrid function in
NumPy instead of the for loops:
import numpy as N

class SurfaceDomain(object):
    def __init__(self, lon, lat):
        self.lon = N.array(lon)
        self.lat = N.array(lat)
        [xall, yall] = N.meshgrid(self.lon, self.lat)
        self._lonall = xall
        self._latall = yall
        del xall, yall
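To check what meshgrid produces, the sketch below compares its output with arrays filled in by explicit for loops. The input values (five longitudes and four latitudes) and the names lonloop and latloop are assumptions made for this illustration. The loops are one possible way to build the arrays and are not necessarily how the book's own for-loop solution does it.

```python
import numpy as N

lon = N.arange(5)
lat = N.arange(4)

# meshgrid returns 2-D arrays giving the longitude and the latitude
# at every grid point; rows follow lat and columns follow lon.
[lonall, latall] = N.meshgrid(lon, lat)

# The same arrays built with explicit for loops.
lonloop = N.zeros((len(lat), len(lon)), dtype=lon.dtype)
latloop = N.zeros((len(lat), len(lon)), dtype=lat.dtype)
for i in range(len(lat)):
    for j in range(len(lon)):
        lonloop[i, j] = lon[j]
        latloop[i, j] = lat[i]

print(lonall.shape)
print(N.all(lonall == lonloop) and N.all(latall == latloop))
```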
So, what does this SurfaceDomain class illustrate about OOP applied to
the geosciences? Pretend you have multiple SurfaceDomain instances that
you want to communicate to each other, where the bounds of one are taken
from (or interpolated with) the bounds of another, e.g., calculations for each
domain instance are farmed out to a separate processor, and you're stitching
domains together:
Comparing OOP vs. procedural for a subdomain management example.
In the above schematic, gray areas are SurfaceDomain instances and the
thick, dark lines are the overlapping boundaries between the domain instances.
In procedural programming, to manage this set of overlapping domains,
you might create a grand domain encompassing all points in all the domains
to make an index that keeps track of which domains abut one another. The
index records who contributes data to these boundary regions. Alternately,
you might create a function that processes only the neighboring domains, but
this function will be called from a scope that has access to all the domains
(e.g., via a common block).
But, to manage this set of overlapping domains, you don't really need
such a global view nor access to all domains. In fact, a global index or a
common block means that if you change your domain layout, you have to
hand-code a change to your index/common block. Rather, what you actually
need is only to be able to interact with your neighbor. So why not just write a
method that takes your neighboring SurfaceDomain instances as arguments
and alters the boundaries accordingly? That is, why not add the following to
the SurfaceDomain class definition (see footnote 2):
class SurfaceDomain(object):
    [...]
    def syncbounds(self, northobj, southobj,
                   eastobj, westobj):
        [...]
Such a method will propagate to all SurfaceDomain instances automatically,
once written in the class definition. Thus, you only have to write one
(relatively) small piece of code that can then affect any number of layouts
of SurfaceDomain instances. Again, object-oriented programming enables
you to push the level at which you code to solve a problem down to a lower
level than procedural programming easily allows. As a result, you can write
smaller, better tested bits of code; this makes your code more robust and
flexible.
7.11
Summary
You could, I think, fairly summarize this chapter as addressing one big
question: Why should an atmospheric or oceanic scientist bother with
object-oriented programming? In answer, I suggest two reasons. First, code
written using OOP is less prone to error. OOP enables you to mostly eliminate
lengthy argument lists, and it is much more difficult for a function to
accidentally process data it should not process. Additionally, OOP deals with
long series of conditional tests much more compactly; there is no need to
duplicate if tests in multiple places. Finally, objects enable you to test
smaller pieces of your program (e.g., individual attributes and methods), which
makes your tests more productive and effective.
Second, programs written using OOP are more easily extended. New
cases are easily added by creating new classes that have the interface methods defined for them. Additional functionality is also easily added by just
adding new methods/attributes. Finally, any changes to class definitions automatically propagate to all instances of the class.
Procedural for short programs; OOP for everything else.
For short, quick-and-dirty programs, procedural programming is still the
better option; there is no reason to spend the time coding the additional OOP
infrastructure. But for many atmospheric and oceanic sciences applications,
things can very quickly become complex. As soon as that happens, the object
decomposition can really help. Here's the rule-of-thumb I use: For a one-off,
short program, I write it procedurally, but for any program I may extend
someday (even if it is a tentative "may"), I write it using objects.
2. Christian Dieterich's PyOM pythonized OM3 ocean model does a similar kind
of domain-splitting handling in Python.