H.deitel Python - How To Program
Table of Contents
1 Introduction to Computers, Internet and World Wide Web
Objectives
• To understand basic computer concepts.
• To become familiar with different types of
programming languages.
• To become familiar with the history of the Python
programming language.
• To preview the remaining chapters of the book.
Things are always at their best in their beginning.
Blaise Pascal
High thoughts must have high language.
Aristophanes
Our life is frittered away by detail…Simplify, simplify.
Henry David Thoreau
pythonhtp1_01.fm Page 2 Monday, December 10, 2001 12:13 PM
Outline
1.1 Introduction
1.2 What Is a Computer?
1.3 Computer Organization
1.4 Evolution of Operating Systems
1.5 Personal Computing, Distributed Computing and Client/Server
Computing
1.6 Machine Languages, Assembly Languages and High-Level
Languages
1.7 Structured Programming
1.8 Object-Oriented Programming
1.9 Hardware Trends
1.10 History of the Internet and World Wide Web
1.11 World Wide Web Consortium (W3C)
1.12 Extensible Markup Language (XML)
1.13 Open-Source Software Revolution
1.14 History of Python
1.15 Python Modules
1.16 General Notes about Python and This Book
1.17 Tour of the Book
1.18 Internet and World Wide Web Resources
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
1.1 Introduction
Welcome to Python! We have worked hard to create what we hope will be an informative
and entertaining learning experience for you. The manner in which we approached this top-
ic created a book that is unique among Python textbooks for many reasons. For instance,
we introduce early in the text the use of Python with the Common Gateway Interface (CGI)
for programming Web-based applications. We do this so that we can demonstrate a variety
of dynamic, Web-based applications in the remainder of the book. This text also introduces
a range of topics, including object-oriented programming (OOP), the Python database ap-
plication programming interface (DB-API), graphics, the Extensible Markup Language
(XML), security and an appendix on Web accessibility that addresses programming and
technologies relevant to people with impairments. Whether you are a novice or an experi-
enced programmer, there is much here to inform, entertain and challenge you.
Python How to Program is designed to be appropriate for readers at all levels, from
practicing programmers to individuals with little or no programming experience. How can
one book appeal to both novices and skilled programmers? The core of this book empha-
sizes achieving program clarity through proven techniques of structured programming and
object-based programming. Nonprogrammers learn basic skills that underlie good pro-
gramming; experienced programmers receive a rigorous explanation of the language and
may improve their programming styles. To aid beginning programmers, we have written
this text in a clear and straightforward manner, with abundant illustrations. Perhaps most
importantly, the book presents hundreds of complete working Python programs and shows
the outputs produced when those programs are run on a computer. We call this our Live-
Code™ approach. All of the book’s examples are available on the CD-ROM that accom-
panies this book and on our Web site, www.deitel.com.
Most people are at least somewhat familiar with the exciting capabilities of computers.
Using this textbook, you will learn how to command computers to exercise those capabil-
ities. It is software (i.e., the instructions you write to command the computer to perform
actions and make decisions) that controls computers (often referred to as hardware).
Computer use is increasing in almost every field. In an era of steadily rising costs, the
expense of owning a computer has been decreasing dramatically due to rapid developments
in both hardware and software technology. Computers that filled large rooms and cost mil-
lions of dollars 25 to 30 years ago now are inscribed on the surfaces of silicon chips smaller
than a fingernail and that cost perhaps a few dollars each. Silicon is one of the most abun-
dant materials on the earth—it is an ingredient in common sand. Silicon-chip technology
has made computing so economical that hundreds of millions of general-purpose com-
puters are in use worldwide, helping people in business, industry, government and their per-
sonal lives. Given the current rate of technological development, this number could easily
double over the next few years.
In beginning to study this text, you are starting on a challenging and rewarding educa-
tional path. As you proceed, if you would like to communicate with us, please send us e-mail
at [email protected] or browse our World Wide Web sites at www.deitel.com,
www.prenhall.com/deitel and www.InformIT.com/deitel. We hope you
enjoy learning Python with Python How to Program.
opment costs, however, have been rising steadily, as programmers develop ever more pow-
erful and complex applications without being able to improve significantly the technology of
software development. In this book, you will learn proven software-development methods
that can reduce software-development costs—top-down, stepwise refinement, functionaliza-
tion and object-oriented programming. Object-oriented programming is widely believed to be
the significant breakthrough that can greatly enhance programmer productivity.
normally hold programs or data that other units are not actively using; the computer
then can retrieve this information when it is needed—hours, days, months or even
years later. Information in secondary storage takes much longer to access than does
information in primary memory. However, the price per unit of secondary storage
is much less than the price per unit of primary memory. Secondary storage is usu-
ally nonvolatile—it retains information even when the computer is off.
language program, which adds overtime pay to base pay and stores the result in gross pay,
demonstrates the incomprehensibility of machine language to the human reader.
+1300042774
+1400593419
+1200274027
As the popularity of computers increased, machine-language programming proved to be
excessively slow, tedious and error prone. Instead of using the strings of numbers that com-
puters could directly understand, programmers began using English-like abbreviations to rep-
resent the elementary operations of the computer. These abbreviations formed the basis of
assembly languages. Translator programs called assemblers convert assembly language pro-
grams to machine language at computer speeds. The following section of an assembly-lan-
guage program also adds overtime pay to base pay and stores the result in gross pay, but
presents the steps more clearly to human readers than does its machine-language equivalent:
LOAD BASEPAY
ADD OVERPAY
STORE GROSSPAY
Such code is clearer to humans but incomprehensible to computers until translated into ma-
chine language.
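To make the three assembly-language steps concrete, the following minimal sketch simulates a hypothetical single-accumulator machine in Python. The run function, the dictionary used as memory and the pay values are illustrative assumptions, not part of any real instruction set, and the code assumes a current Python 3 interpreter.

```python
# A toy accumulator machine (hypothetical; for illustration only).
def run(program, memory):
    """Execute (operation, operand) pairs against a dict used as memory."""
    accumulator = 0
    for operation, operand in program:
        if operation == "LOAD":      # copy a memory cell into the accumulator
            accumulator = memory[operand]
        elif operation == "ADD":     # add a memory cell to the accumulator
            accumulator += memory[operand]
        elif operation == "STORE":   # copy the accumulator back to memory
            memory[operand] = accumulator
        else:
            raise ValueError("unknown operation: " + operation)
    return memory


program = [("LOAD", "BASEPAY"), ("ADD", "OVERPAY"), ("STORE", "GROSSPAY")]
memory = run(program, {"BASEPAY": 800, "OVERPAY": 150})
print(memory["GROSSPAY"])  # prints 950
```

Each tuple corresponds to one line of the assembly-language listing; an assembler performs a comparable one-to-one translation into numeric machine instructions.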
Although computer use increased rapidly with the advent of assembly languages, these
languages still required many instructions to accomplish even the simplest tasks. To speed
up the programming process, high-level languages, in which single statements accomplish
substantial tasks, were developed. Translation programs called compilers convert high-
level-language programs into machine language. High-level languages enable program-
mers to write instructions that look almost like everyday English and contain common
mathematical notations. A payroll program written in a high-level language might contain
a statement such as
grossPay = basePay + overTimePay
Obviously, programmers prefer high-level languages to either machine languages or as-
sembly languages. C, C++, C# (pronounced “C sharp”), Java, Visual Basic, Perl and Py-
thon are among the most popular high-level languages.
Compiling a high-level language program into machine language can require a considerable
amount of time. Interpreter programs were developed to address this problem: they execute
high-level language programs directly, bypassing the compilation step, so a program can
start running immediately without “suffering” a compilation delay.
Although programs that are already compiled execute faster than interpreted programs, inter-
preters are popular in program-development environments. In these environments, devel-
opers change programs frequently as they add new features and correct errors. Once a
program is fully developed, a compiled version can be produced so that the program runs at
maximum efficiency. As we will see throughout this book, interpreted languages—like
Python—are particularly popular for implementing World Wide Web applications.
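The immediacy that interpreters offer can be sketched with Python's built-in compile and exec functions, which translate and run a source string in one step with no separate build phase. The statement and pay values below are hypothetical, and the code assumes a current Python 3 interpreter. Note that the standard Python implementation translates source code to an internal bytecode before executing it, a detail this section does not cover.

```python
# Run a high-level statement held in a string, much as an interactive
# interpreter session would: no separate compile-and-link step is needed.
source = "gross_pay = base_pay + overtime_pay"

namespace = {"base_pay": 800, "overtime_pay": 150}   # hypothetical values
code = compile(source, "<payroll>", "exec")  # translate the statement
exec(code, namespace)                        # execute it immediately

print(namespace["gross_pay"])  # prints 950
```

Changing the source string and rerunning takes effect at once, which is why this style suits environments where programs change frequently.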
the finished products were unreliable. People began to realize that software development
was a far more complex activity than they had imagined. Research activity, intended to ad-
dress these issues, resulted in the evolution of structured programming—a disciplined ap-
proach to the creation of programs that are clear, demonstrably correct and easy to modify.
One of the more tangible results of this research was the development of the Pascal
programming language in 1971. Pascal, named after the seventeenth-century mathemati-
cian and philosopher Blaise Pascal, was designed for teaching structured programming in
academic environments and rapidly became the preferred introductory programming lan-
guage in most universities. Unfortunately, because the language lacked many features
needed to make it useful in commercial, industrial and government applications, it was not
widely accepted in these environments. By contrast, C, which also arose from research on
structured programming, did not have the limitations of Pascal, and became extremely
popular.
The Ada programming language was developed under the sponsorship of the United
States Department of Defense (DOD) during the 1970s and early 1980s. Hundreds of pro-
gramming languages were being used to produce DOD’s massive command-and-control
software systems. DOD wanted a single language that would meet its needs. Pascal was
chosen as a base, but the final Ada language is quite different from Pascal. The language
was named after Lady Ada Lovelace, daughter of the poet Lord Byron. Lady Lovelace is
generally credited with writing the world’s first computer program, in the mid-1800s (for
the Analytical Engine mechanical computing device designed by Charles Babbage). One
important capability of Ada is multitasking, which allows programmers to specify that
many activities are to occur in parallel. As we will see in Chapters 18–19, Python offers
process management and multithreading—two capabilities that enable programs to specify
that various activities are to proceed in parallel.
What are objects, and why are they special? Object technology is a packaging scheme
that facilitates the creation of meaningful software units. These units are large and focused
on particular application areas. There are date objects, time objects, paycheck objects,
invoice objects, audio objects, video objects, file objects, record objects and so on. In fact,
almost any noun can be reasonably represented as a software object. Objects have proper-
ties (i.e., attributes, such as color, size and weight) and perform actions (i.e., behaviors,
such as moving, sleeping or drawing). Classes represent groups of related objects. For
example, all cars belong to the “car” class, even though individual cars vary in make,
model, color and options packages. A class specifies the general format of its objects; the
properties and actions available to an object depend on its class.
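The relationship between a class and its objects can be sketched in Python as follows. The Car class, its attribute names and the sample makes and models are illustrative assumptions chosen to match the car example above; the code assumes a current Python 3 interpreter.

```python
class Car:
    """A class describes the general format shared by all car objects."""

    def __init__(self, make, model, color):
        # Properties (attributes) that vary from one car to another.
        self.make = make
        self.model = model
        self.color = color
        self.speed = 0

    def accelerate(self, amount):
        """An action (behavior) that every car object can perform."""
        self.speed += amount
        return self.speed


# Two distinct objects of the same class.
family_car = Car("Volvo", "V70", "blue")
sports_car = Car("Porsche", "911", "red")
sports_car.accelerate(30)

print(family_car.color, family_car.speed)   # prints: blue 0
print(sports_car.color, sports_car.speed)   # prints: red 30
```

Both objects share the same set of properties and actions because they belong to the same class, yet each keeps its own values.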
We live in a world of objects. Just look around you—there are cars, planes, people, ani-
mals, buildings, traffic lights, elevators and so on. Before object-oriented languages
appeared, procedural programming languages (such as Fortran, Pascal, BASIC and C)
focused on actions (verbs) rather than things or objects (nouns). We live in a world of
objects, but earlier programming languages forced individuals to program primarily with
verbs. This mismatch made program writing a bit awkward. However, with the advent
of popular object-oriented languages, such as C++, Java, C# and Python, programmers can
program in an object-oriented manner that reflects the way in which they perceive the
world. This process, which seems more natural than procedural programming, has resulted
in significant productivity gains.
One of the key problems with procedural programming is that the program units cre-
ated do not mirror real-world entities effectively and therefore are not particularly reusable.
Programmers often write and rewrite similar software for various projects. This wastes pre-
cious time and money as people repeatedly “reinvent the wheel.” With object technology,
properly designed software entities (called objects) can be reused on future projects. Using
libraries of reusable componentry can greatly reduce the amount of effort required to imple-
ment certain kinds of systems (as compared to the effort that would be required to reinvent
these capabilities in new projects).
Some organizations report that software reusability is not, in fact, the key benefit of
object-oriented programming. Rather, they indicate that object-oriented programming
tends to produce software that is more understandable because it is better organized and has
fewer maintenance requirements. As much as 80 percent of software costs are not associ-
ated with the original efforts to develop the software, but instead are related to the con-
tinued evolution and maintenance of that software throughout its lifetime. Object
orientation allows programmers to abstract the details of software and focus on the “big pic-
ture.” Rather than worrying about minute details, the programmer can focus on the behav-
iors and interactions of objects. A roadmap that showed every tree, house and driveway
would be difficult, if not impossible, to read. When such details are removed and only the
essential information (roads) remains, the map becomes easier to understand. In the same
way, a program that is divided into objects is easy to understand, modify and update
because it hides much of the detail. It is clear that object-oriented programming will be the
key programming methodology for at least the next decade.
ly with regard to the costs of hardware supporting these technologies. For many decades,
hardware costs have fallen rapidly, if not precipitously, and this trend is expected to continue
into the foreseeable future. Every year or two, the capacities of computers approximately double.1
This is especially true in relation to the amount of memory that computers have for programs, the
amount of secondary storage (such as disk storage) computers have to hold programs and
data over longer periods of time and their processor speeds—the speeds at which computers
execute their programs (i.e., do their work). Similar improvements have occurred in the
communications field, in which costs have plummeted as enormous demand for bandwidth
(i.e., information-carrying capacity of communication lines) has attracted tremendous com-
petition. We know of no other fields in which technology moves so quickly and costs fall
so rapidly. Such phenomenal improvement in the computing and communications fields is
truly fostering the so-called Information Revolution.
When computer use exploded in the 1960s and 1970s, many people discussed the dra-
matic improvements in human productivity that computing and communications would
cause. However, these improvements did not materialize. Organizations were spending
vast sums of capital on computers and employing them effectively, but without fully real-
izing the expected productivity gains. The invention of microprocessor chip technology
and its wide deployment in the late 1970s and 1980s laid the groundwork for the produc-
tivity improvements that individuals and businesses have achieved in recent years.
The network was designed to operate without centralized control. This meant that, if a
portion of the network should fail, the remaining working portions would still be able to
route data packets from senders to receivers over alternative paths.
The protocol (i.e., set of rules) for communicating over the ARPAnet became known
as the Transmission Control Protocol (TCP). TCP ensured that messages were properly
routed from sender to receiver and that those messages arrived intact.
In parallel with the early evolution of the Internet, organizations worldwide were
implementing their own networks to facilitate both intra-organization (i.e., within the orga-
nization) and inter-organization (i.e., between organizations) communication. A huge
variety of networking hardware and software appeared. One challenge was to enable these
diverse products to communicate with each other. ARPA accomplished this by developing
the Internet Protocol (IP), which created a true “network of networks,” the current archi-
tecture of the Internet. The combined set of protocols is now commonly called TCP/IP.
Initially, use of the Internet was limited to universities and research institutions; later,
the military adopted the technology. Eventually, the government decided to allow access to
the Internet for commercial purposes. When this decision was made, there was resentment
among the research and military communities—it was felt that response times would
become poor as “the Net” became saturated with so many users.
In fact, the opposite has occurred. Businesses rapidly realized that, by making effective
use of the Internet, they could refine their operations and offer new and better services to
their clients. Companies started spending vast amounts of money to develop and enhance
their Internet presence. This generated fierce competition among communications carriers
and hardware and software suppliers to meet the increased infrastructure demand. The
result is that bandwidth on the Internet has increased tremendously, while hardware costs
have plummeted. It is widely believed that the Internet played a significant role in the eco-
nomic growth that many industrialized nations experienced over the last decade.
The World Wide Web (WWW) allows computer users to locate and view multimedia-based
documents (i.e., documents with text, graphics, animations, audio and/or video)
on almost any subject. Even though the Internet was developed more than three decades
ago, the introduction of the World Wide Web was a relatively recent event. In 1989, Tim
Berners-Lee of CERN (the European Organization for Nuclear Research) began to
develop a technology for sharing information via hyperlinked text documents. Basing the
new language on the well-established Standard Generalized Markup Language
(SGML)—a standard for business data interchange—Berners-Lee called his invention the
HyperText Markup Language (HTML). He also wrote communication protocols to form
the backbone of his new hypertext information system, which he referred to as the World
Wide Web.
Historians will surely list the Internet and the World Wide Web among the most impor-
tant and profound creations of humankind. In the past, most computer applications ran on
“stand-alone” computers (computers that were not connected to one another). Today’s
applications can be written to communicate among the world’s hundreds of millions of
computers. The Internet and World Wide Web merge computing and communications
technologies, expediting and simplifying our work. They make information instantly and
conveniently accessible to large numbers of people. They enable individuals and small
businesses to achieve worldwide exposure. They are profoundly changing the way we do
business and conduct our personal lives. People can search for the best prices on virtually
any product or service. Special-interest communities can stay in touch with one another.
Researchers can be made instantly aware of the latest breakthroughs worldwide.
We have written two books for academic courses that convey fundamental principles
of computing in the context of Internet and World Wide Web programming—Internet and
World Wide Web How to Program: Second Edition and e-Business and e-Commerce How
to Program.
www.w3.org/Consortium/Process/Process-19991111/process.html#RecsCR
sible Linking Language (XLink) combines ideas from HyTime and the Text Encoding
Initiative (TEI), to provide extensible linking of resources.
Data independence, the separation of content from its presentation, is the essential
characteristic of XML. Because an XML document describes data, any application con-
ceivably can process an XML document. Recognizing this, software developers are inte-
grating XML into their applications to improve Web functionality and interoperability.
XML’s flexibility and power make it perfect for the middle tier of client/server systems,
which must interact with a wide variety of clients. Much of the processing that was once
limited to server computers now can be performed by client computers, because XML’s
semantic and structural information enables it to be manipulated by any application that can
process text. This reduces server loads and network traffic, resulting in a faster, more effi-
cient Web.
XML is not limited to Web applications. Increasingly, XML is being employed in data-
bases—the structure of an XML document enables it to be integrated easily with database
applications. As applications become more Web enabled, it seems likely that XML will
become the universal technology for data representation. All applications employing XML
would be able to communicate, provided that they could understand each other’s XML
markup, or vocabulary.
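As a minimal sketch of data independence, the following Python code reads one XML document and uses its data in two different ways. The contacts vocabulary, the names and the addresses are hypothetical; the parsing calls come from the standard library module xml.etree.ElementTree, and the code assumes a current Python 3 interpreter.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML vocabulary describing contacts: data only, no layout.
document = """
<contacts>
    <contact><name>Ada</name><email>ada@example.com</email></contact>
    <contact><name>Blaise</name><email>blaise@example.com</email></contact>
</contacts>
""".strip()

root = ET.fromstring(document)

# One use of the data: a list of names.
names = [contact.findtext("name") for contact in root.findall("contact")]

# Another use of the same data: a lookup table of addresses.
emails = {c.findtext("name"): c.findtext("email") for c in root.iter("contact")}

print(names)            # prints ['Ada', 'Blaise']
print(emails["Ada"])    # prints ada@example.com
```

Because the markup describes only what the data is, neither use depends on how the other presents it.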
Simple Object Access Protocol (SOAP) is a technology for the distribution of objects
(marked up as XML) over the Internet. Developed primarily by Microsoft and Develop-
Mentor, SOAP provides a framework for expressing application semantics, encoding that
data and packaging it in modules. SOAP has three parts: The envelope, which describes the
content and intended recipient of a SOAP message; the SOAP encoding rules, which are
XML-based; and the SOAP Remote Procedure Call (RPC) representation for commanding
other computers to perform a task. SOAP is supported by many platforms, because of its
foundations in XML and HTTP. We discuss XML in Chapter 15, Extensible Markup Lan-
guage (XML) and in Chapter 16, XML Processing.
2. The Open Source Initiative’s definition includes nine requirements to which software must
comply before it is considered “open source.” To view the entire definition, visit
<www.opensource.org/docs/definition.html>.
3. <www.opensource.org>.
nature of most commercial software and programmers’ frustrations with the lack of
responsiveness from closed-source vendors, open-source software regained popularity. Today,
Python is part of a growing open-source software community, which includes the Linux
operating system, the Perl scripting language, the Apache Web server and hundreds of
other software projects.
Some people in the computer industry equate open-source with “free” software. In most
cases, this is true. However, “free” in the context of open-source software is thought of most
appropriately as “freedom”—the freedom for any developer to modify source code, to
exchange ideas, to participate in the software-development process and to develop new
software programs based on existing open-source software. Most open-source software is
copyrighted and licenses are associated with the use of the software. Open-source licenses
vary in their terms; some impose few restrictions (e.g., the Artistic license4), whereas others
require many restrictions on the manner in which the software may be modified and used.
Usually, either an individual developer or an organization maintains the software copyrights.
To view an example of a license, visit www.python.org/2.2/license.html to read
the Python agreement.
Typically, the source code for open-source products is available for download over the
Internet. This enables developers to learn from, validate and modify the source code to meet
their own needs. With a community of developers, more people review the code so issues
such as performance and security problems are detected and resolved faster than they
would be in closed-source software development. Additionally, a larger community of
developers can contribute more features. Often, code fixes are available within hours, and
new versions of open-source software are available more frequently than are versions of
closed-source software. Open-source licenses often require that developers publish any
enhancements they make so that the open-source community can continue to evolve those
products. For example, Python developers participate in the comp.lang.python news-
group to exchange ideas regarding the development of Python. Python developers also can
document and submit their modifications to the Python Software Foundation through
Python Enhancement Proposals (PEPs), which enable the Python group to evaluate the
proposed changes and incorporate the ones they choose in future releases.5
Many companies (e.g., IBM, Red Hat and Sun) support open-source developers and
projects. Sometimes companies take open-source applications and sell them commercially
(this depends on software licensing). For-profit companies also provide services such as sup-
port, custom-made software and training. Developers can offer their services as consultants
or trainers to businesses implementing the software.6 For more information about open-
source software, visit the Open Source Initiative’s Web site at www.opensource.org.
4. <www.opensource.org/licenses/artistic-license.html>.
5. <www.python.org>.
6. <www-106.ibm.com/developerworks/opensource/library/license.html?dwzone=opensource>.
Amoeba distributed operating system. To create this new language, he drew heavily from
All Basic Code (ABC)—a high-level teaching language—for syntax, and from Modula-3, a
systems programming language, for error-handling techniques. However, one major short-
coming of ABC was its lack of extensibility; the language was not open to improvements
or extensions. So, van Rossum decided to create a language that combined many of the el-
ements he liked from existing languages, but one that could be extended through classes
and programming interfaces. He named this language Python, after the popular comic
troupe Monty Python.
Since Python’s public release in early 1991, a growing community of developers and
users has improved it into a mature and well-supported programming language.
Python has been used to develop a variety of applications, from creating online e-mail pro-
grams to controlling underwater vehicles, configuring operating systems and creating ani-
mated films. In 2001, the core Python development team moved to Digital Creations, the
creators of Zope—a Web application server written in Python. It is expected that Python
will continue to grow and expand into new programming realms.
Chapter 1—Introduction to Computers, the Internet and the World Wide Web
In this chapter, we discuss what computers are, how they work and how they are pro-
grammed. The chapter introduces structured programming and explains why this set of
techniques has fostered a revolution in the way programs are written. A brief history of the
development of programming languages, from machine languages to assembly languages
to high-level languages, is included. We present some historical information about
computers and computer programming and introductory information about the Internet and
the World Wide Web. We discuss the origins of the Python programming language and
give an overview of the concepts introduced in the remaining chapters of the book.
Chapter 4—Functions
Chapter 4 discusses the design and construction of functions. Python’s function-related ca-
pabilities include built-in functions, programmer-defined functions and recursion. The
techniques presented in Chapter 4 are essential for creating properly structured programs—
especially the larger programs and software that system programmers and application pro-
grammers are likely to develop in real-world applications. The “divide and conquer” strat-
egy is presented as an effective means for solving complex problems by dividing them into
simpler interacting components. We begin by introducing modules as containers for groups
of useful functions. We introduce module math and discuss the many mathematics-related
functions the module contains. Students enjoy the treatment of random numbers and simu-
lation, and they are entertained by a study of the dice game, craps, which makes elegant use
of control structures. The chapter illustrates how to solve Fibonacci and factorial problems
using a programming technique called recursion in which a function calls itself. Scope
rules are discussed in the context of an example that examines local and global variables.
The chapter also discusses the various ways a program can import a module and its ele-
ments and how the import statement affects the program’s namespace. Python functions
can specify default arguments and keyword arguments. We discuss both ways of passing
information to functions and illustrate some common programming errors in an interactive
session. The exercises present traditional mathematics and computer-science problems, including
how to solve the famous Towers of Hanoi problem using recursion. Another exercise asks the
reader to display the prime numbers from 2 to 100.
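Several of the function-related topics named above can be sketched briefly in Python. The function names and the box_volume example are our own illustrative assumptions rather than the chapter's programs, and the code assumes a current Python 3 interpreter.

```python
import math


def factorial(n):
    """Recursive factorial: n! = n * (n - 1)!, with 0! = 1! = 1."""
    if n <= 1:
        return 1
    return n * factorial(n - 1)


def fibonacci(n):
    """Recursive Fibonacci with fibonacci(0) = 0 and fibonacci(1) = 1."""
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


def box_volume(length=1, width=1, height=1):
    """Default arguments let callers omit any of the three dimensions."""
    return length * width * height


print(factorial(5))              # prints 120
print(fibonacci(10))             # prints 55
print(box_volume())              # prints 1
print(box_volume(2, height=5))   # prints 10 (keyword argument skips width)
print(math.sqrt(16))             # prints 4.0 (function from module math)
```

Each recursive function has a base case that stops the recursion; without one, the function would call itself until Python raised an error.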
name in a Web browser. We then focus on how to send user input to a CGI script by using
an XHTML form to pass data between the client and the CGI program on the server. We
demonstrate how to use module cgi to process form data. The chapter contains descrip-
tions of various HTTP headers used with CGI. We conclude by integrating the CGI mate-
rial into a Web portal case study that allows the user to log in to a fictional travel Web site
and to view information about special offers.
destructors in base classes and derived classes, and software engineering with inheritance.
This chapter compares various object-oriented relationships, such as inheritance and com-
position. Inheritance leads to programming techniques that highlight one of Python’s most
powerful built-in features—polymorphism. When many classes are related through inher-
itance to a common base class, each derived-class object may be treated as a base-class in-
stance. This enables programs to be written in a general manner independent of the specific
types of the derived-class objects. New kinds of objects can be handled by the same pro-
gram, thus making systems more extensible. This style of programming is commonly used
to implement today’s popular graphical user interfaces (GUIs). The chapter concludes
with a discussion of the new object-oriented programming techniques available in Python
version 2.2.
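Polymorphism through inheritance can be sketched as follows. The Shape, Rectangle and Circle classes are hypothetical examples of a base class and two derived classes, and the code assumes a current Python 3 interpreter.

```python
class Shape:
    """Base class: defines the interface shared by all shapes."""

    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError("derived classes must override area")

    def describe(self):
        # Calls whichever area method the actual object provides.
        return "%s with area %.2f" % (self.name, self.area())


class Rectangle(Shape):
    def __init__(self, width, height):
        Shape.__init__(self, "rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Circle(Shape):
    def __init__(self, radius):
        Shape.__init__(self, "circle")
        self.radius = radius

    def area(self):
        return 3.14159 * self.radius ** 2


# The loop treats every object as a Shape, whatever its derived class.
shapes = [Rectangle(2, 3), Circle(1)]
for shape in shapes:
    print(shape.describe())
```

The loop never checks which derived class it is handling, so a new kind of shape could be added without changing it.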
(save to disk) arbitrary Python objects. We present an example that uses module cPickle
to store a Python dictionary to disk for later use.
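Storing a dictionary to disk and reading it back can be sketched as follows. Module cPickle belongs to Python 2; this sketch assumes a current Python 3 interpreter, where the standard pickle module provides the same dump and load functions. The dictionary contents and the file name are hypothetical.

```python
import os
import pickle
import tempfile

# A hypothetical dictionary to preserve between program runs.
contacts = {"Ada": "ada@example.com", "Blaise": "blaise@example.com"}

# Write the dictionary to a file in a temporary directory (binary mode).
path = os.path.join(tempfile.mkdtemp(), "contacts.dat")
with open(path, "wb") as outfile:
    pickle.dump(contacts, outfile)

# Later (possibly in another run), read it back.
with open(path, "rb") as infile:
    restored = pickle.load(infile)

print(restored == contacts)   # prints True
```

Pickled data should be loaded only from trusted sources, because loading a pickle can execute arbitrary code.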
remove, update and find contacts in the database. The exercises ask the reader to modify
these programs to provide more functionality, such as verifying that the database does not
contain identical entries.
Chapter 19—Multithreading
This chapter introduces threads, which are “light-weight processes.” They often are more
efficient than full-fledged processes created as a result of commands like fork presented
in the previous chapter. We examine basic threading concepts, including the various states
in which a thread can exist throughout its life. We discuss how to include threads in a pro-
gram by subclassing threading.Thread and overriding method run. The latter half
of the chapter contains examples that address the classic producer/consumer relationship.
We develop several solutions to this problem and introduce the concept of thread synchro-
nization and resource allocation. We introduce threading control primitives, such as locks,
condition variables, semaphores and events. The final solution uses module Queue to pro-
tect access to shared data stored in a queue. The examples demonstrate the hazards of
threaded programs and show how to avoid these hazards. Our solution also demonstrates
the value of writing classes for reuse. We reuse our producer and consumer classes to ac-
cess various synchronized and unsynchronized data types. After completing this chapter,
the reader will have many of the tools necessary to write substantial, extensible and profes-
sional programs in Python.
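The producer/consumer relationship described above might be sketched as follows. This is an illustration in current Python 3, in which the queue module is spelled with a lowercase q; it is not the chapter's own solution. The class names Producer and Consumer, the queue size and the use of None as an end marker are assumptions made for the sketch.

```python
import queue
import threading


class Producer(threading.Thread):
    """Puts the integers 1..count into a shared queue."""

    def __init__(self, shared, count):
        super().__init__()
        self.shared = shared
        self.count = count

    def run(self):                      # overridden method run
        for value in range(1, self.count + 1):
            self.shared.put(value)      # blocks while the queue is full
        self.shared.put(None)           # None marks the end of the data


class Consumer(threading.Thread):
    """Sums the values it takes from the shared queue."""

    def __init__(self, shared):
        super().__init__()
        self.shared = shared
        self.total = 0

    def run(self):
        while True:
            value = self.shared.get()   # blocks while the queue is empty
            if value is None:
                break
            self.total += value


shared = queue.Queue(maxsize=2)         # the queue synchronizes access itself
producer = Producer(shared, 10)
consumer = Consumer(shared)
producer.start()
consumer.start()
producer.join()
consumer.join()
print(consumer.total)
```

Because the queue performs its own locking, neither class needs an explicit lock or condition variable, which is why a queue-based solution tends to be the shortest of the alternatives.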
Chapter 20—Networking
In this chapter, we explore applications that can communicate over computer networks. A
major benefit of a high-level language like Python is that potentially complex topics can be
presented and discussed easily through small, working examples. We discuss basic net-
working concepts and present two examples—a CGI program that displays a chosen Web
page in a browser and a GUI example that displays page content (e.g., XHTML) in a text
area. We also discuss client-server communication over sockets. The programs in this sec-
tion demonstrate how to send and receive messages over the network, using connectionless
and connection-based protocols. A key feature of the chapter is the live-code implementa-
tion of a collaborative client/server Tic-Tac-Toe game in which two clients play Tic-Tac-
Toe by interacting with a multithreaded server that maintains the state of the game. As part
of the exercises, readers will write programs that send and receive messages and files. We
ask the reader to modify the Tic-Tac-Toe game to determine when a player wins the game.
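To give a flavor of connection-based communication over sockets, the sketch below runs a tiny server and client in one program on the local machine. It assumes current Python 3; the function name serve_once and the uppercase "echo" behavior are hypothetical choices for this illustration and are unrelated to the chapter's Tic-Tac-Toe server.

```python
import socket
import threading


def serve_once(server):
    """Accept one connection and send back what it receives, uppercased."""
    connection, _address = server.accept()
    with connection:
        data = connection.recv(1024)
        connection.sendall(data.upper())


# Connection-based (TCP) server socket on the local machine; port 0 asks the
# operating system to pick any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

worker = threading.Thread(target=serve_once, args=(server,), daemon=True)
worker.start()

with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

worker.join()
server.close()
print(reply)
```

A connectionless version would create the sockets with socket.SOCK_DGRAM and exchange messages with sendto and recvfrom, without calling listen or accept.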
Chapter 21—Security
This chapter discusses Web programming security issues. Web programming allows the
rapid creation of powerful applications, but it also exposes computers to outside attack. We
focus on defensive programming techniques that help the programmer prevent security
problems by using certain techniques and tools. One of those tools is encryption. We pro-
vide an example of encryption and decryption with module rotor, which acts as a substi-
tution cipher. Another tool is module sha, which is used to hash values. A third tool is
Python’s restricted-access (rexec) module, which creates a restricted environment in
which untrusted code can execute without damaging the local computer. This chapter ex-
amines technologies, such as Public Key Cryptography, Secure Socket Layer (SSL), digital
signatures, digital certificates, digital steganography and biometrics, which provide net-
work security. Other types of network security, such as firewalls and antivirus programs,
are also covered, and common security threats including cryptanalytic attacks, viruses,
worms and Trojan horses are discussed.
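Hashing a value, as mentioned above for module sha, can be sketched with module hashlib, which provides SHA-1 in current Python 3. The helper name sha1_hex and the sample strings are hypothetical. SHA-1 appears here only because the chapter names it; newer code usually prefers a stronger hash such as SHA-256.

```python
import hashlib


def sha1_hex(text):
    """Return the SHA-1 hash of text as a string of hexadecimal digits."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


digest = sha1_hex("secret password")       # made-up sample value
print(len(digest))                         # number of hexadecimal digits
print(sha1_hex("secret password") == digest)   # same input, same hash
print(sha1_hex("secret passw0rd") == digest)   # changed input, different hash
```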
Chapter 24—Multimedia
This chapter presents Python’s capabilities for making computer applications come alive.
It is remarkable that students in entry-level programming courses will be writing Python
applications with all these capabilities. Some exciting multimedia applications include
PyOpenGL, a module that binds Python to the OpenGL API to create colorful, interactive
graphics; Alice, an environment for creating and manipulating 3D graphical worlds in an
object-oriented manner; and Pygame, a large collection of Python modules for creating
cross-platform, multimedia applications, such as interactive games. In our PyOpenGL examples,
we create rotating objects and three-dimensional shapes. In the Alice example, we create a
graphical game version of a popular riddle. The world we create contains a fox, a chicken
and a plant. The goal is to move all three objects across a river, without leaving a predator-
prey pair alone at any one time. Our first Pygame example combines Tkinter and Pyg-
ame to create a GUI compact disc player. The second example illustrates how to play an
MPEG movie. The final Pygame example creates a video game where the user steers a
spaceship through an asteroid field to gather energy cells. We discuss many graphics pro-
gram pitfalls and techniques in the context of this example. With many other programming
languages, these projects would be too complex or detailed to present in a book such as this.
However, Python’s high-level nature, simple syntax and ample modules enable us to
present these exciting examples all in the same chapter!
viewing resumes, and can minimize travel expenses for distance recruiting and interviewing.
In this chapter, we explore career services on the Web from the perspectives of job seekers
and employers. We introduce comprehensive job sites, industry-specific sites (including sites
geared specifically for Python programmers) and contracting opportunities, as well as addi-
tional resources and career services designed to meet the needs of a variety of individuals.
Appendix F—Unicode®
This appendix introduces the Unicode Standard, an encoding scheme that assigns unique
numeric values to the characters of most of the world’s languages. It includes a Python pro-
gram that uses Unicode encoding to print a welcome message in 10 different languages.
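The idea that Unicode assigns a unique numeric value to each character can be seen in a short sketch. This is not the appendix's program; it assumes current Python 3, in which every string is a Unicode string, and the helper name code_points and the sample greetings are made up for the illustration.

```python
def code_points(text):
    """Return the Unicode value of each character as a U+XXXX string."""
    return ["U+%04X" % ord(character) for character in text]


# A few sample greetings chosen for this sketch.
greetings = ["Welcome", "Bienvenue", "¡Bienvenido!", "欢迎"]

for greeting in greetings:
    utf8_bytes = greeting.encode("utf-8")
    # Only ASCII text is printed, so the output does not depend on the
    # character encoding of the terminal.
    print(code_points(greeting), len(utf8_bytes), "bytes in UTF-8")
```

The number of bytes can exceed the number of characters because UTF-8 stores characters outside the ASCII range in two or more bytes.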
tion headers, body text, links, etc.). This separation of structure from content allows greater
manageability and makes changing the style of the document easier and faster.
Appendix L—Accessibility
This appendix discusses how to design accessible Web sites. Currently, the World Wide
Web presents challenges to people with various disabilities. Multimedia-rich Web sites
hinder text readers and other programs designed to help people with visual impairments, and
the increasing amount of audio on the Web is inaccessible to people with hearing impairments.
To rectify this situation, the federal government has enacted several key pieces of legislation
that address Web accessibility. For example, the Americans with Disabilities Act (ADA) pro-
hibits discrimination on the basis of a disability. The W3C started the Web Accessibility Ini-
tiative (WAI), which provides guidelines describing how to make Web sites accessible to
people with various impairments. This appendix provides a description of these methods, such
as use of the headers attribute to make tables more accessible to page readers, use of the alt
attribute of the <img> tag to describe images, and the proper use of XHTML and related
technologies to ensure that a page can be viewed on any type of display or reader. VoiceXML
also can increase accessibility with speech synthesis and recognition.
SUMMARY
[Note: Because Section 1.17 is primarily a summary of the rest of the book, we do not provide
summary bullets for that section.]
• Software controls computers (often referred to as hardware).
• A computer is a device capable of performing computations and making logical decisions at
speeds millions, even billions, of times faster than human beings can.
• Computers process data under the control of sets of instructions called computer programs. These
computer programs guide the computer through orderly sets of actions specified by people called
computer programmers.
• The various devices that comprise a computer system (such as the keyboard, screen, disks, mem-
ory and processing units) are referred to as hardware.
• The computer programs that run on a computer are referred to as software.
• The input unit is the “receiving” section of the computer. It obtains information (data and comput-
er programs) from various input devices and places this information at the disposal of the other
units so that the information may be processed.
• The output unit is the “shipping” section of the computer. It takes information processed by the
computer and places it on output devices to make it available for use outside the computer.
• The memory unit is the rapid access, relatively low-capacity “warehouse” section of the computer.
It retains information that has been entered through the input unit so that the information may be
made immediately available for processing when it is needed and retains information that has al-
ready been processed until that information can be placed on output devices by the output unit.
• The arithmetic and logic unit (ALU) is the “manufacturing” section of the computer. It is respon-
sible for performing calculations such as addition, subtraction, multiplication and division and for
making decisions.
• The central processing unit (CPU) is the “administrative” section of the computer. It is the com-
puter’s coordinator and is responsible for supervising the operation of the other sections.
• The secondary storage unit is the long-term, high-capacity “warehousing” section of the computer.
Programs or data not being used by the other units are normally placed on secondary storage de-
vices (such as disks) until they are needed, possibly hours, days, months or even years later.
• Early computers were capable of performing only one job or task at a time. This form of computer
operation often is called single-user batch processing.
• Software systems called operating systems were developed to help make it more convenient to use
computers. Early operating systems managed the smooth transition between jobs and minimized
the time it took for computer operators to switch between jobs.
• Multiprogramming involves the “simultaneous” operation of many jobs on the computer—the
computer shares its resources among the jobs competing for its attention.
• Timesharing is a special case of multiprogramming in which dozens or even hundreds of users
share a computer through terminals. The computer runs a small portion of one user’s job, then
moves on to service the next user. The computer does this so quickly that it might provide service
to each user several times per second, so programs appear to run simultaneously.
• An advantage of timesharing is that the user receives almost immediate responses to requests rath-
er than having to wait long periods for results, as with previous modes of computing.
• In 1977, Apple Computer popularized the phenomenon of personal computing.
• In 1981, IBM introduced the IBM Personal Computer, legitimizing personal computing in busi-
ness, industry and government organizations.
• Although early personal computers were not powerful enough to timeshare several users, these
machines could be linked together in computer networks, sometimes over telephone lines and
sometimes in local area networks (LANs) within an organization. This led to the phenomenon of
distributed computing, in which an organization’s computing is distributed over networks to the
sites at which the real work of the organization is performed.
• Today, information is shared easily across computer networks, where some computers called file
servers offer a common store of programs and data that may be used by client computers distrib-
uted throughout the network—hence the term client/server computing.
• Computer languages may be divided into three general types: machine languages, assembly lan-
guages and high-level languages.
• Any computer can directly understand only its own machine language. Machine languages gener-
ally consist of strings of numbers (ultimately reduced to 1s and 0s) that instruct computers to per-
form their most elementary operations one at a time. Machine languages are machine dependent.
• English-like abbreviations formed the basis of assembly languages. Translator programs called as-
semblers convert assembly-language programs to machine language at computer speeds.
• Compilers translate high-level language programs into machine-language programs. High-level
languages (like Python) contain English words and conventional mathematical notations.
• Interpreter programs directly execute high-level language programs without the need for first com-
piling those programs into machine language.
• Although compiled programs execute much faster than interpreted programs, interpreters are pop-
ular in program-development environments in which programs are recompiled frequently as new
features are added and errors are corrected. Interpreters are also popular for developing Web-based
applications.
• Objects are essentially reusable software components that model items in the real world. Modular,
object-oriented design and implementation approaches make software-development groups more
productive than is possible with previous popular programming techniques. Object-oriented pro-
grams are often easier to understand, correct and modify than programs developed with earlier
methodologies.
• FORTRAN (FORmula TRANslator) was developed by IBM Corporation between 1954 and 1957
for scientific and engineering applications that require complex mathematical computations.
• COBOL (COmmon Business Oriented Language) was developed in 1959 by a group of computer
manufacturers and government and industrial computer users. COBOL is used primarily for com-
mercial applications that require precise and efficient manipulation of large amounts of data.
• C evolved from two previous languages, BCPL and B, as a language for writing operating-systems
software and compilers.
• Both BCPL and B were “typeless” languages—every data item occupied one “word” in memory
and the burden of typing variables fell on the shoulders of the programmer. The C language was
evolved from B by Dennis Ritchie at Bell Laboratories.
• Pascal was designed at about the same time as C. It was created by Professor Niklaus Wirth and
was intended for academic use.
• Structured programming is a disciplined approach to writing programs that are clearer than un-
structured programs, easier to test and debug and easier to modify.
• The Ada language was developed under the sponsorship of the United States Department of De-
fense (DOD) during the 1970s and early 1980s. One important capability of Ada is called multi-
tasking; this allows programmers to specify that many activities are to occur in parallel.
• Most high-level languages generally allow the programmer to write programs that perform only
one activity at a time. Python, through techniques called process management and multithreading,
enables programmers to write programs with parallel activities.
• Object technology dates back at least to the mid-1960s. The C++ programming language,
developed at AT&T by Bjarne Stroustrup in the early 1980s, is based on C and Simula 67.
• In the early 1990s, researchers at Sun Microsystems® developed a purely object-oriented lan-
guage called Java.
• In the late 1960s, the Advanced Research Projects Agency of the Department of Defense (ARPA)
rolled out the blueprints for networking the main computer systems of about a dozen ARPA-fund-
ed universities and research institutions. ARPA proceeded to implement what quickly became
called the ARPAnet, the grandparent of today’s Internet.
• Originally designed to connect the main computer systems of about a dozen universities and research
organizations, the Internet today is accessible by hundreds of millions of computers worldwide.
• One of ARPA’s primary goals for the network was to allow multiple users to send and receive in-
formation at the same time over the same communications paths (such as phone lines). The net-
work operated with a technique called packet switching (still in wide use today), in which digital
data are sent in small packages called packets. The packets contain data, address information, er-
ror-control information and sequencing information. The address information routes the packets
of data to their destination. The sequencing information helps reassemble the packets (which—be-
cause of complex routing mechanisms—can actually arrive out of order) into their original order
for presentation to the recipients.
• The protocol for communicating over the ARPAnet became known as TCP—Transmission Con-
trol Protocol. TCP ensured that messages were routed properly from sender to receiver and that
those messages arrived intact.
• Bandwidth is the information-carrying capacity of communications lines.
• In 1990, Tim Berners-Lee of CERN (the European Laboratory for Particle Physics) developed the
World Wide Web and several communication protocols that form its backbone.
• The Web allows computer users to locate and view multimedia-intensive documents over the In-
ternet.
• Browsers view HTML (Hypertext Markup Language) documents on the World Wide Web.
• Python is a modular extensible language; Python can incorporate new modules (reusable pieces of
software).
• The primary distribution center for Python source code, modules and documentation is the Python
Web site—www.python.org—with plans to develop a site dedicated solely to maintaining Py-
thon modules.
• Python is portable, practical and extensible.
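The packet-switching bullet above says that sequencing information lets the receiver restore packets that arrive out of order. A minimal sketch of that idea follows, assuming each packet is simply a pair of a sequence number and a piece of the message; real packets also carry address and error-control information. The function names to_packets and reassemble are hypothetical.

```python
import random


def to_packets(message, size):
    """Split message into (sequence_number, data) packets of at most size characters."""
    return [(number, message[start:start + size])
            for number, start in enumerate(range(0, len(message), size))]


def reassemble(packets):
    """Put the packets back in sequence order and join their data."""
    return "".join(data for _number, data in sorted(packets))


packets = to_packets("Packets may arrive out of order.", 5)
random.shuffle(packets)            # simulate arrival in an arbitrary order
print(reassemble(packets))
```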
TERMINOLOGY
Ada                               hardware platform
ALU                               high-level language
arithmetic and logic unit (ALU)   input unit
assembler                         input/output (I/O)
assembly language                 interpreter
batch processing                  Java
C                                 machine dependent
C++                               machine independent
central processing unit (CPU)     machine language
clarity                           memory
client                            memory unit
client/server computing           multiprocessor
COBOL                             multiprogramming
computer                          multitasking
computer program                  object-oriented programming
computer programmer               output unit
data                              Pascal
distributed computing             Python
file server                       personal computer
FORTRAN                           portability
function                          primary memory
functionalization                 programming language
hardware                          run a program
screen                            terminal
software                          timesharing
software reusability              top-down, stepwise refinement
stored program                    translator program
structured programming            UNIX
supercomputer                     workstation
task
SELF-REVIEW EXERCISES
1.1 Fill in the blanks in each of the following statements:
a) The company that popularized the phenomenon of personal computing was ________.
b) The computer that made personal computing legitimate in business and industry was the
________.
c) Computers process data under the control of sets of instructions called computer
________.
d) The six key logical units of the computer are the ________, ________, ________,
________, ________ and the ________.
e) Python can incorporate new ________ (reusable pieces of software), which can be
written by any Python developer.
f) The three classes of languages discussed in the chapter are ________, ________ and
________.
g) The programs that translate high-level language programs into machine language are
called ________.
h) C is widely known as the development language of the ________ operating system.
i) In 2001, the core Python development team moved to Digital Creations, the creators of
________, a Web application server written in Python.
j) The Department of Defense developed the Ada language with a capability called
________, which allows programmers to specify activities that can proceed in parallel.
1.2 State whether each of the following is true or false. If false, explain why.
a) Hardware refers to the instructions that command computers to perform actions and
make decisions.
b) The re regular-expression module provides pattern-based text manipulation in Python.
c) The ALU provides temporary storage for data that has been entered through the input
unit.
d) Software systems called batches manage the transition between jobs.
e) Assemblers convert high-level language programs to assembly language at computer
speeds.
f) Interpreter programs compile high-level language programs into machine language faster
than compilers.
g) Structured programming is a disciplined approach to writing programs that are clear and
easy to modify.
h) Unlike other programming languages, Python is non-extensible.
i) Objects are reusable software components that model items in the real world.
j) Several Canvas components include Label, Button, Entry, Checkbutton and
Radiobutton.
EXERCISES
1.3 Categorize each of the following items as either hardware or software:
a) CPU.
b) ALU.
c) Input unit.
d) A word-processor program.
e) Python modules.
1.4 Translator programs, such as assemblers and compilers, convert programs from one language
(referred to as the source language) to another language (referred to as the object language). Deter-
mine which of the following statements are true and which are false:
a) A compiler translates high-level language programs into object language.
b) An assembler translates source-language programs into machine-language programs.
c) A compiler converts source-language programs into object-language programs.
d) High-level languages are generally machine dependent.
e) A machine-language program requires translation before it can be run on a computer.
1.5 Fill in the blanks in each of the following statements:
a) Python can provide information about itself, a technique called ________.
b) A computer program that converts assembly-language programs to machine-language
programs is called ________.
c) The logical unit of the computer that receives information from outside the computer for
use by the computer is called ________.
d) The process of instructing the computer to solve specific problems is called ________.
e) Three high-level Python data types are: ________, ________ and ________.
f) ________ is the logical unit of the computer that sends information that has already
been processed by the computer to various devices so that the information may be used
outside the computer.
g) The general name for a program that converts programs written in a certain computer
language into machine language is ________.
1.6 Fill in the blanks in each of the following statements:
a) ________ is the logical unit of the computer that retains information.
b) ________ is the logical unit of the computer that makes logical decisions.
c) The commonly used abbreviation for the computer's control unit is ________.
d) The level of computer language most convenient to the programmer for writing programs
quickly and easily is ________.
e) ________ are “mappable” types; keys are stored with their associated values.
f) The only language that a computer can understand directly is called that computer's
________.
g) The ________ is the logical unit of the computer that coordinates the activities of all
the other logical units.
1.7 What does each of the following acronyms stand for?
a) W3C.
b) XML.
c) DB-API.
d) CGI.
e) XHTML.
f) TCP/IP.
g) PSP.
h) Tcl/Tk.
i) SSL.
j) HMD.
1.8 State whether each of the following is true or false. If false, explain your answer.
a) Inheritance is a form of software reusability in which new classes are developed quickly
and easily by absorbing the capabilities of existing classes and adding appropriate new
capabilities.
b) Pmw is a module that provides an interface to the popular Tcl/Tk graphical-user-interface
toolkit.
c) Like other high-level languages, Python is generally considered to be machine-indepen-
dent.
pythonhtp1_02.fm Page 34 Wednesday, December 12, 2001 12:12 PM
2
Introduction to Python
Programming
Objectives
• To understand a typical Python program-development
environment.
• To write simple computer programs in Python.
• To use simple input and output statements.
• To become familiar with fundamental data types.
• To use arithmetic operators.
• To understand the precedence of arithmetic operators.
• To write simple decision-making statements.
High thoughts must have high language.
Aristophanes
Our life is frittered away by detail…Simplify, simplify.
Henry Thoreau
My object all sublime
I shall achieve in time.
W.S. Gilbert
Outline
2.1 Introduction
2.2 First Program in Python: Printing a Line of Text
2.3 Modifying our First Python Program
2.3.1 Displaying a Single Line of Text with Multiple Statements
2.3.2 Displaying Multiple Lines of Text with a Single Statement
2.4 Another Python Program: Adding Integers
2.5 Memory Concepts
2.6 Arithmetic
2.7 String Formatting
2.8 Decision Making: Equality and Relational Operators
2.9 Indentation
2.10 Thinking About Objects: Introduction to Object Technology
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
2.1 Introduction
Python facilitates a disciplined approach to computer-program design. In this first pro-
gramming chapter, we introduce Python programming and present several examples that
illustrate important features of the language. To understand each example, we analyze the
code one statement at a time. After presenting basic concepts in this chapter, we examine
the structured programming approach in Chapters 3–5. At the same time that we explore
introductory Python topics, we also begin our discussion of object-oriented program-
ming—the key programming methodology presented throughout this text. For this reason,
we conclude this chapter with Section 2.10, Thinking About Objects.
Welcome to Python!
1. The resources for this book, including step-by-step instructions for installing Python on Windows
and Unix/Linux platforms, are posted at www.deitel.com.
This program illustrates several important features of the Python language. Let us con-
sider each line of the program. Each program we present in this book has line numbers
included for the reader’s convenience; line numbers are not part of actual Python programs.
Line 4 does the “real work” of the program, namely displaying the phrase Welcome to
Python! on the screen. However, let us consider each line in order.
Lines 1–2 begin with the pound symbol (#), which indicates that the remainder of each
line is a comment. Programmers insert comments to document programs and to improve
program readability. Comments also help other programmers read and understand your
program. Comments do not cause the computer to perform any action when the program is
run—Python ignores comments. We begin every program with a comment indicating the
figure number and the file name in which that program is stored (line 1). We can place any
text we choose in comments. All of the Python programs for this book are included on the
enclosed CD and also are available free for download at www.deitel.com.
A comment that begins with # is called a single-line comment, because the comment
terminates at the end of the current line. A # comment also can begin in the middle of a line
and continue until the end of that line. Such a comment typically documents the Python
code that appears at the beginning of that line. Unlike other programming languages,
Python does not have a separate symbol for a multiple-line comment, so each line of a
multiple-line comment must start with the # symbol. The comment text “Printing a line
of text in Python.” describes the purpose of the program (line 2).
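The program that this discussion walks through is not reproduced at this point in the text. From the description, it has roughly the following shape. This sketch assumes current Python 3, where print is a function called with parentheses; the book's Python 2.2 listing uses the print statement, written without them. The exact wording of the first comment and the trailing comment on the last line are assumptions, the latter added only to show a comment that begins in the middle of a line.

```python
# Fig. 2.1: fig02_01.py
# Printing a line of text in Python.

print("Welcome to Python!")  # a comment can also follow code on the same line
```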
Good Programming Practice 2.1
Place abundant comments throughout a program. Comments help other programmers un-
derstand the program, assist in debugging a program (i.e., discovering and removing errors
in a program) and list useful information. Comments also help you understand your
programs when you revisit the code for modifications or updates.
Line 3 is simply a blank line. Programmers use blank lines and space characters to
make programs easier to read. Together, blank lines, space characters and tab characters are
known as white space. (Space characters and tabs are known specifically as white-space
characters.) Blank lines are ignored by Python.
Good Programming Practice 2.3
Use blank lines to enhance program readability.
The Python print command (line 4) instructs the computer to display the string of
characters contained between the quotation marks. A string is a sequence of characters con-
tained inside double quotes. The entire line is called a statement. In some programming lan-
guages, like C++ and Java, statements must end with a semicolon. In Python, most
statements simply end when the lines on which they are written end. When the statement
on line 4 executes, it displays the message Welcome to Python! on the screen. Note
that the double quotes that delineate the string do not appear in the output.
Output (i.e., displaying information) and input (i.e., receiving information) in Python
are accomplished with streams of characters. When the preceding statement executes, it
sends the stream of characters Welcome to Python! to the standard output stream. The
standard output stream is the channel through which an application presents information to
the user—this information typically is displayed on the screen, but may be printed on a
printer, written to a file, etc. It may even be spoken or issued to braille devices, so users
with visual impairments can receive the outputs.
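The notion of a standard output stream can be made concrete with a short sketch. It assumes current Python 3, where print is a function that accepts a file argument; the variable names message and captured are made up for the illustration.

```python
import io
import sys

message = "Welcome to Python!"

print(message)                      # sent to the standard output stream
sys.stdout.write(message + "\n")    # the same stream, written to directly

# Any file-like object can take the place of the screen. Here the characters
# are collected in memory instead of being displayed.
captured = io.StringIO()
print(message, file=captured)
print(len(captured.getvalue()))     # the message plus one newline character
```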
Python statements can be executed two ways. The first is by typing statements into an
editor to create a program and saving the file with a .py extension (as in Fig. 2.1). Python
files typically end with .py, although other extensions (e.g., .pyw on Windows) can be used.
To use the Python interpreter to execute (run) the program in the file, type
python file.py
at the DOS or Unix shell command line, in which file.py is the name of the Python file.
The shell command line is a text “terminal” in which the user can type commands that cause
the computer system to respond. [Note: To invoke Python, the system path variable must
be set properly to include the python executable—a file containing the Python interpreter
program that can be run. The resources for this book—posted at our Web site www.dei-
tel.com—include instructions on how to set the appropriate system path variable.]
When the Python interpreter runs a program stored in the file, the interpreter starts at
the first line of the file and executes statements until the end of the file. The output box in
Fig. 2.1 contains the results of the Python interpreter running fig02_01.py.
The second way to execute Python statements is interactively. Typing
python
at the shell command line runs the Python interpreter in interactive mode. With this mode,
the programmer types statements directly to the interpreter, which executes these state-
ments one at a time.
Testing and Debugging Tip 2.1
In interactive mode, Python statements are entered and interpreted one at a time. This mode
often is useful when debugging a program.
Figure 2.2 shows Python 2.2 running in interactive mode on Windows. The first three
lines display information about the version of Python being used (2.2b2 means “version 2.2
beta 2”). The fourth line contains the Python prompt (>>>). When a programmer types a
statement at the Python prompt and presses the Enter key (sometimes labeled the Return
key), the interpreter executes the statement.
The print statement on the fifth line of Fig. 2.2 displays the text Welcome to
Python! to the screen (note, again, that the double quotes delineating the string do not
print). After printing the text to the screen, the interpreter waits for the user to enter the next
statement. We exit interactive mode by typing the Ctrl-Z end-of-file character (on
Microsoft Windows systems) and pressing the Enter key. Figure 2.3 lists the keyboard
combinations for the end-of-file character for various computer systems.
Fig. 2.2 Interactive mode. (Python interpreter software Copyright © 2001 Python
Software Foundation.)
Fig. 2.3 End-of-file key combinations for various popular computer systems.
Welcome to Python!
Welcome
to
Python!
Escape sequence   Description
\n                Newline. Move the screen cursor to the beginning of the next line.
\t                Horizontal tab. Move the screen cursor to the next tab stop.
\r                Carriage return. Move the screen cursor to the beginning of the current line; do not advance to the next line.
\b                Backspace. Move the screen cursor back one space.
\a                Alert. Sound the system bell.
\\                Backslash. Print a backslash character.
\"                Double quote. Print a double quote character.
\'                Single quote. Print a single quote character.
In addition to a name and value, each object has a type. An object’s type identifies the
kind of information (e.g., integer, string, etc.) stored in the object. Integers are whole numbers
that encompass negative numbers (–14), zero (0) and positive numbers (6). In languages like
C++ and Java, the programmer must declare (state) the object type before using the object in
the program. However, Python uses dynamic typing, which means that Python determines an
object’s type during program execution. For example, if object a is initialized to 2, then the
object is of type “integer” (because the number 2 is an integer). Similarly, if object b is ini-
tialized to "Python", then the object is of type “string.” Function raw_input returns
values of type “string,” so the object referenced by integer1 (line 5) is of type “string.”
To perform integer addition on the value referenced by integer1, the program must
convert the string value to an integer value. Python function int (line 6) converts a string
or a number to an integer value and returns the new value. If we do not obtain an integer
value for variable integer1, we will not achieve the desired results—the program would
combine the two strings instead of adding two integers. Figure 2.8 demonstrates this with
an interactive session.
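The difference between combining strings and adding integers can be sketched as follows. The string literals stand in for the values that raw_input would return, and the names concatenated and total are illustrative only; the print calls are written so that they also run on a current Python 3 interpreter.

```python
# Strings stand in for what the user would type; raw_input always returns strings.
integer1 = "45"
integer2 = "72"

# Without conversion, + joins the two strings end to end.
concatenated = integer1 + integer2

# After conversion with int, + performs integer addition.
total = int(integer1) + int(integer2)

print(concatenated)  # 4572
print(total)         # 117
```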
The assignment statement (line 11 of Fig. 2.7) calculates the sum of the variables
integer1 and integer2 and assigns the result to variable sum, using the assignment
symbol =. The statement is read as, “sum references the value of integer1 +
integer2.” Most calculations are performed through assignment statements.
The + symbol is an operator—a special symbol that performs a specific operation. In
this case, the + operator performs addition. The + operator is called a binary operator,
because it has two operands (values) on which it performs its operation. In this example,
the operands are integer1 and integer2. [Note: In Python, the = symbol is not an
operator. Rather, it is referred to as the assignment symbol.]
Common Programming Error 2.1
Trying to access a variable that has not been given a value is a run-time error.
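A minimal sketch of this error, assuming a current Python interpreter (the except ... as form is newer than Python 2.2). The name unassigned_total is hypothetical and is deliberately never given a value; the interpreter reports the problem as a NameError when the statement executes.

```python
# unassigned_total is a hypothetical name that is never assigned a value.
try:
    print(unassigned_total)
except NameError as error:
    # The error is raised only when the statement actually runs.
    message = str(error)
    print("Run-time error: %s" % message)
```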
Line 13 displays the string "Sum is" followed by the numerical value of variable
sum. Items we want to output are separated by commas (,). Note that this print state-
ment outputs values of different types, namely a string and an integer.
Calculations also can be performed in output statements. We could have combined the
statements in lines 11 and 13 into the statement
print "Sum is", integer1 + integer2
thus eliminating the need for variable sum. You should make such combinations only if
you feel it makes your programs clearer.
integer1 "45"
Fig. 2.9 Memory location showing value of a variable and the name bound to
the value.
"45"
integer1 45
Fig. 2.10 Memory location showing the name and value of a variable.
execute, suppose the user enters the string "72". After the program converts this value to
the integer value 72 and places the value into a memory location to which integer2 is
bound, memory appears as in Fig. 2.11. Note that the locations of these objects are not nec-
essarily adjacent in memory.
Once the program has obtained values for integer1 and integer2, the program
adds these values and assigns the sum to variable sum. After the statement
sum = integer1 + integer2
performs the addition, memory appears as in Fig. 2.12. Note that the values of integer1
and integer2 appear exactly as they did before they were used in the calculation of
sum. These values were used, but not modified, as the computer performed the calcula-
tion. Thus, when a value is read out of a memory location, the value is not changed.
integer1 45
integer2 72
Fig. 2.11 Memory locations after values for two variables have been input.
integer1 45
integer2 72
sum 117
Figure 2.13 demonstrates that each Python object has a location, a type and a value and
that these object properties are accessed through an object’s name. This program is iden-
tical to the program in Fig. 2.7, except that we have added statements that display the
memory location, type and value for each object at various points in the program.
Line 6 prints integer1’s location, type and value after the call to raw_input.
Python function id returns the interpreter’s representation of the variable’s location. Func-
tion type returns the type of the variable. We print these values again (line 8), after con-
verting the string value in integer1 to an integer value. Notice that both the type and the
location of variable integer1 change as a result of the statement
integer1 = int( integer1 )
The change underscores the fact that a program cannot change a variable’s type. Instead,
the statement causes Python to create a new integer value in a new location and assigns the
name integer1 to this location. The location to which integer1 previously referred
is no longer accessible. The remainder of the program prints the location, type and value for
variables integer2 and sum in a similar manner.
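The effect of the conversion on an object's location and type can be sketched as follows. This is not the program of Fig. 2.13; the string literal stands in for user input, and the extra name original is an assumption added only to keep the string object alive so that the two locations are guaranteed to differ.

```python
original = "45"            # stands in for the string raw_input returns
integer1 = original

# Record the location and type while integer1 refers to a string.
string_location, string_type = id(integer1), type(integer1)

integer1 = int(integer1)   # binds the name to a new integer object

# Record the location and type again after the conversion.
integer_location, integer_type = id(integer1), type(integer1)

print("%d %s" % (string_location, string_type))
print("%d %s" % (integer_location, integer_type))
```

The numbers printed by id vary from run to run; only the fact that they differ is significant.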
2.6 Arithmetic
Many programs perform arithmetic calculations. Figure 2.14 summarizes the arithmetic
operators. Note the use of various special symbols not used in algebra. The asterisk (*) in-
dicates multiplication and the percent sign (%) is the modulus operator that we discuss
shortly. The arithmetic operators in Fig. 2.14 are binary operators, (i.e., operators that take
two operands). For example, the expression integer1 + integer2 contains the binary
operator + and the two operands integer1 and integer2.
Python is an evolving language, and as such, some of its features change over time.
Starting with Python 2.2, the behavior of the / division operator will begin to change from
“floor division” to “true division.” Floor division (sometimes called integer division),
divides the numerator by the denominator and returns the highest integer value that is not
greater than the result. For example, dividing 7 by 4 with floor division yields 1 and
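dividing 17 by 5 with floor division yields 3. Note that, for positive operands, any fractional
part in floor division is simply discarded (i.e., truncated); no rounding occurs. A negative
result is lowered to the next integer (e.g., –7 floor-divided by 4 yields –2). True division yields the precise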
floating-point (i.e., numbers with a decimal point such as 7.0, 0.0975 and 100.12345) result
of dividing the numerator by the denominator. For example, dividing 7 by 4 with true divi-
sion yields 1.75.
Python operation   Arithmetic operator      Algebraic expression                  Python expression
Addition           +                        f + 7                                 f + 7
Subtraction        -                        p – c                                 p - c
Multiplication     *                        bm                                    b * m
Exponentiation     **                       $x^{y}$                               x ** y
Division           /                        x / y or $\frac{x}{y}$ or x ÷ y       x / y
                   // (new in Python 2.2)                                         x // y
Modulus            %                        r mod s                               r % s
In prior versions, Python contained only one operator for division—the / operator.
The behavior (i.e., floor or true division) of the operator is determined by the type of the
operands. If the operands are both integers, the operator performs floor division. If one or
both of the operands are floating-point numbers, the operator performs true division.
The language designers and many programmers disliked the ambiguity of the / oper-
ator and decided to create two operators for version 2.2—one for each type of division. The
/ operator performs true division and the // operator performs floor division. However,
this decision could introduce errors into programs that use older versions of Python. There-
fore, the designers came up with a compromise: Starting with Python 2.2 all future 2.x ver-
sions will include two operators, but if a program author wants to use the new behavior, the
programmer must state their intention explicitly with the statement
from __future__ import division
After Python sees this statement, the / operator performs true division and the // operator
performs floor division. The interactive session in Fig. 2.15 demonstrates floor division and
true division.
We first evaluate the expression 3 / 4. This expression evaluates to the value 0,
because the default behavior of the / operator with integer operands is floor division. The
expression 3.0 / 4.0 evaluates to 0.75. In this case, we use floating-point operands,
so the / operator performs true division. The expressions 3 // 4 and 3.0 // 4.0
evaluate to 0 and 0.0, respectively, because the // operator always performs floor divi-
sion, regardless of the types of the operands. Then, in line 13 of the interactive session, we
change the behavior of the / operator with the special import statement. In effect, this
statement turns on the true division behavior for operator /. Now the expression 3 / 4
evaluates to 0.75. [Note: In this text, we use only the default 2.2 behavior for the / oper-
ator, namely floor division for integers (lines 5–6 of Fig. 2.15) and true division for
floating-point numbers (lines 7–8 of Fig. 2.15).]
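The same comparisons can be sketched for a current Python 3 interpreter, where / already performs true division for integers (the behavior that the session turns on with its special import statement). Under the default Python 2.2 behavior described above, 3 / 4 would instead yield 0. The variable names are illustrative only.

```python
# On Python 3, / always performs true division.
true_int = 3 / 4           # 0.75
true_float = 3.0 / 4.0     # 0.75

# // performs floor division regardless of the operand types.
floor_int = 3 // 4         # 0
floor_float = 3.0 // 4.0   # 0.0

# Floor division returns the highest integer not greater than the quotient.
print(7 // 4)    # 1
print(17 // 5)   # 3
print(-7 // 4)   # -2
```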
Python provides the modulus operator (%), which yields the remainder after integer
division. The expression x % y yields the remainder after x is divided by y. Thus, 7 % 4
yields 3 and 17 % 5 yields 2. This operator is most commonly used with integer operands,
but also can be used with other arithmetic types. In later chapters, we discuss many inter-
esting applications of the modulus operator, such as determining whether one number is a
multiple of another. (A special case of this is determining whether a number is odd or even.)
[Note: The modulus operator can be used with both integer and floating-point numbers.]
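A short sketch of the modulus operator, including the multiple-of and odd-or-even tests mentioned above. The variable names and sample values are assumptions chosen for illustration.

```python
print(7 % 4)     # 3
print(17 % 5)    # 2

# A number is a multiple of another when the remainder is zero.
twelve_is_multiple_of_four = (12 % 4 == 0)

# Special case: a number is even when it is a multiple of 2.
seven_is_even = (7 % 2 == 0)

# The operator also accepts floating-point operands.
float_remainder = 7.5 % 2

print(twelve_is_multiple_of_four)  # True
print(seven_is_even)               # False
print(float_remainder)             # 1.5
```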
Arithmetic expressions in Python must be entered into the computer in straight-line
form. Thus, expressions such as “a divided by b” must be written as a / b, so that all con-
stants, variables and operators appear in a straight line. The algebraic notation
$\frac{a}{b}$
a * (b + c)
Not all expressions with several pairs of parentheses contain nested parentheses. For
example, the expression
a * (b + c) + c * (d + e)
does not contain nested parentheses. Rather, the parentheses in this expression are said to
be “on the same level.”
When we say that certain operators are applied from left to right, we are referring to
the associativity of the operators. For example, in the expression
a + b + c
the addition operators (+) associate from left to right. We will see that some operators as-
sociate from right to left.
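Associativity only becomes visible when the grouping changes the result, so the following sketch uses subtraction rather than addition for the left-to-right case, and exponentiation (**) as an operator that associates from right to left. The variable names are illustrative.

```python
# Subtraction associates from left to right: (10 - 4) - 3.
left_to_right = 10 - 4 - 3
forced_right = 10 - (4 - 3)

# Exponentiation associates from right to left: 2 ** (3 ** 2).
right_to_left = 2 ** 3 ** 2
forced_left = (2 ** 3) ** 2

print(left_to_right)   # 3
print(forced_right)    # 9
print(right_to_left)   # 512
print(forced_left)     # 64
```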
Figure 2.16 summarizes these rules of operator precedence. This table will be
expanded as additional Python operators are introduced. A complete precedence chart is
included in the appendices.
Now let us consider several expressions in light of the rules of operator precedence.
Each example lists an algebraic expression and its Python equivalent. The following is an
example of an arithmetic mean (average) of five terms:
Algebra: $m = \frac{a + b + c + d + e}{5}$
Python: m = ( a + b + c + d + e ) / 5
The parentheses are required because division has higher precedence than addition
and, hence, the division will be applied first. The entire quantity ( a + b + c + d + e ) is
to be divided by 5. If the parentheses are erroneously omitted, we obtain a + b + c + d +
e / 5, which evaluates incorrectly as
$a + b + c + d + \frac{e}{5}$
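The effect of omitting the parentheses can be checked with sample values. The five values below are assumptions chosen so that the two results are easy to compare.

```python
a, b, c, d, e = 10, 20, 30, 40, 50   # sample values, chosen for illustration

# Parentheses force the additions to happen before the division.
mean = (a + b + c + d + e) / 5

# Without parentheses, only e is divided by 5.
not_the_mean = a + b + c + d + e / 5

print(mean)          # the average of the five values
print(not_the_mean)  # a + b + c + d plus one fifth of e
```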
Algebra: y = mx + b
Python: y = m * x + b
No parentheses are required. The multiplication is applied first, because multiplication has
a higher precedence than addition.
The following example contains modulus (%), multiplication, division, addition and
subtraction operations:
Python: z = p * r % q + w / x - y
              1   2   4   3   5
The circled numbers under the statement indicate the order in which Python applies the
operators. The multiplication, modulus and division are evaluated first, in left-to-right or-
der (i.e., they associate from left to right) because they have higher precedence than ad-
dition and subtraction. The addition and subtraction are applied next. These are also
applied left to right. Once the expression has been evaluated, Python assigns the result to
variable z.
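The order of evaluation can be traced by performing one operation per statement and comparing the outcome with the single-line expression. The sample values and the step names are assumptions introduced for this sketch.

```python
p, r, q, w, x, y = 2, 7, 5, 9, 3, 1   # sample values, chosen for illustration

z = p * r % q + w / x - y

# The same calculation, one operator at a time, in the order Python applies them.
step1 = p * r          # multiplication
step2 = step1 % q      # modulus
step3 = w / x          # division
step4 = step2 + step3  # addition
step5 = step4 - y      # subtraction

print(z == step5)  # True
```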
To develop a better understanding of the rules of operator precedence, consider how a
second-degree polynomial is evaluated:
y = a * x ** 2 + b * x + c
      2   1    4   3   5
The circled numbers under the statement indicate the order in which Python applies the op-
erators.
Suppose variables a, b, c and x are initialized as follows: a = 2, b = 3, c = 7 and
x = 5. Figure 2.17 illustrates the order in which the operators are applied in the preceding
second-degree polynomial.
The preceding assignment statement can be parenthesized with unnecessary paren-
theses, for clarity, as
y = ( a * ( x ** 2 ) ) + ( b * x ) + c
Step 1. y = 2 * 5 ** 2 + 3 * 5 + 7
5 ** 2 is 25 (Exponentiation)
Step 2. y = 2 * 25 + 3 * 5 + 7
2 * 25 is 50 (Leftmost multiplication)
Step 3. y = 50 + 3 * 5 + 7
3 * 5 is 15 (Multiplication before addition)
Step 4. y = 50 + 15 + 7
50 + 15 is 65 (Leftmost addition)
Step 5. y = 65 + 7
65 + 7 is 72 (Last addition)
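The steps above can be confirmed with the values given in the text (a = 2, b = 3, c = 7 and x = 5). The sketch also checks that the fully parenthesized form produces the same value; the name y_parenthesized is illustrative.

```python
a, b, c, x = 2, 3, 7, 5

# Precedence rules alone determine the order of evaluation.
y = a * x ** 2 + b * x + c

# Redundant parentheses make the same order explicit.
y_parenthesized = (a * (x ** 2)) + (b * x) + c

print(y)                # 72
print(y_parenthesized)  # 72
```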
Python also supports triple-quoted strings (lines 8–10). Triple-quoted strings are
useful for programs that output strings with special characters, such as quote characters.
Single- or double-quote characters inside a triple-quoted string do not need to use the
escape sequence. Triple-quoted strings also are used for large blocks of text, because triple-
quoted strings can span multiple lines. We use triple-quoted strings in this book when we
write programs that output large blocks of text for the Web.
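A brief sketch of both uses of triple-quoted strings. The text of the strings and the variable names are invented for illustration.

```python
# Quote characters need no escape sequences inside a triple-quoted string.
quoted = """She said "Python's strings are flexible" and smiled."""

# A triple-quoted string can span several lines.
block = """Line one
Line two
Line three"""

print(quoted)
print(block)
```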
Python strings support simple, but powerful, output formatting. We can create strings
that format output in several ways:
1. Rounding floating-point values to an indicated number of decimal places.
2. Representing floating-point numbers in exponential notation.
3. Aligning a column of numbers with decimal points appearing one above the other.
4. Right-justifying and left-justifying outputs.
5. Inserting characters or strings at precise locations in a line of output.
6. Displaying all types of data with fixed-size field widths and precision.
The program in Fig. 2.19 demonstrates basic string-formatting capabilities.
9 floatValue = 123456.789
10 print "Float", floatValue
11 print "Default float %f" % floatValue
12 print "Default exponential %e\n" % floatValue
13
14 print "Right justify integer (%8d)" % integerValue
15 print "Left justify integer (%-8d)\n" % integerValue
16
17 stringValue = "String formatting"
18 print "Force eight digits in integer %.8d" % integerValue
19 print "Five digits after decimal in float %.5f" % floatValue
20 print "Fifteen and five characters allowed in string:"
21 print "(%.15s) (%.5s)" % ( stringValue, stringValue )
Integer 4237
Decimal integer 4237
Hexadecimal integer 108d
Float 123456.789
Default float 123456.789000
Default exponential 1.234568e+005
Lines 4–7 demonstrate how to represent integers in a string. Line 5 displays the value
of variable integerValue without string formatting. The % formatting operator inserts
the value of a variable in a string (line 6). The value to the left of the operator is a string that
contains one or more conversion specifiers—place holders for values in the string. Each
conversion specifier begins with a percent sign (%)—not to be confused with the % format-
ting operator—and ends with a conversion-specifier symbol. Conversion-specifier symbol
d indicates that we want to place an integer within the current string at the specified point.
Figure 2.20 lists several conversion-specifier symbols for use in string formatting. [Note:
See Appendix C, Number Systems, for a discussion of numeric terminology in Fig. 2.20.]
The value to the right of the % formatting operator specifies what replaces the place-
holders in the strings. In line 6, we specify the value integerValue to replace the %d
placeholder in the string. Line 7 inserts the hexadecimal representation of the value
assigned to variable integerValue into the string.
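The two integer conversions can be sketched on their own as follows, using the value 4237 that appears in the sample output. The names decimal_text and hexadecimal_text are assumptions added so that the formatted strings can be inspected.

```python
integerValue = 4237

# %d inserts the value as a decimal integer.
decimal_text = "Decimal integer %d" % integerValue

# %x inserts the hexadecimal representation of the same value.
hexadecimal_text = "Hexadecimal integer %x" % integerValue

print(decimal_text)      # Decimal integer 4237
print(hexadecimal_text)  # Hexadecimal integer 108d
```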
Lines 9–12 demonstrate how to insert floating-point values in a string. The f conver-
sion specifier acts as a place holder for a floating-point value (line 11). To the right of the
% formatting operator, we use variable floatValue as the value to be displayed. The e
conversion specifier acts as a place holder for a floating-point value in exponential notation.
Exponential notation is the computer equivalent of scientific notation used in mathematics.
For example, the value 150.4582 is represented in scientific notation as $1.504582 \times 10^{2}$
and is represented in exponential notation as 1.504582E+002 by the computer.
This notation indicates that 1.504582 is multiplied by 10 raised to the second power
(E+002). The E stands for “exponent.”
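A sketch of the f and e conversion specifiers applied to the value used in the program. The number of exponent digits that the e specifier prints depends on the platform and Python version, so a current interpreter may show fewer exponent digits than the three seen in the book's Windows output. The names default_float and exponential are illustrative.

```python
floatValue = 123456.789

# %f shows six digits after the decimal point by default.
default_float = "Default float %f" % floatValue

# %e shows one digit before the decimal point, followed by an exponent.
exponential = "Default exponential %e" % floatValue

print(default_float)
print(exponential)
```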
Lines 14–15 demonstrate string formatting with field widths. A field width is the min-
imum size of a field in which a value is printed. If the field width is larger than the value
being printed, the data is normally right-justified within the field. To use field widths, place
an integer representing the field width between the percent sign and the conversion-speci-
fier symbol. Line 14 right-justifies the value of variable integerValue in a field width
of size eight. To left-justify a value, specify a negative integer as the field width (line 15).
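Field widths can be sketched as follows; the parentheses in the format strings make the padding visible. The names right_justified and left_justified are illustrative.

```python
integerValue = 4237

# A field width of eight pads the four-digit value on the left.
right_justified = "(%8d)" % integerValue

# A negative field width pads on the right instead.
left_justified = "(%-8d)" % integerValue

print(right_justified)  # (    4237)
print(left_justified)   # (4237    )
```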
Lines 17–21 demonstrate string formatting with precision. Precision has different
meaning for different data types. When used with integer conversion specifiers, precision
indicates the minimum number of digits to be printed. If the printed value contains fewer
digits than the specified precision, zeros are prefixed to the printed value until the total
number of digits is equivalent to the precision. To use precision, place a decimal point (.) fol-
lowed by an integer representing the precision between the percent sign and the conversion
specifier. Line 18 prints the value of variable integerValue with eight digits of precision.
When precision is used with a floating-point conversion specifier, the precision is the
number of digits to appear after the decimal point. Line 19 prints the value of variable
floatValue with five digits of precision.
When used with a string-conversion specifier, the precision is the maximum number
of characters to be written from the string. Line 21 prints the value of variable
stringValue twice—once with a precision of fifteen and once with a precision of five.
Notice that the conversion specifications are contained within parentheses. When the string
to the left of the % formatting operator contains more than one conversion specifier, the
value to the right of the operator must be a comma-separated sequence of values. This
sequence is contained within parentheses and must have the same number of values as the
string has conversion specifiers. Python constructs the string from left to right by matching
a placeholder with the next value specified between parentheses and replacing the format-
ting character with that value.
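The three meanings of precision, and the use of a parenthesized sequence of values, can be sketched as follows. The variable names on the left of each assignment are assumptions added so that the results can be inspected.

```python
integerValue = 4237
floatValue = 123456.789
stringValue = "String formatting"

# Integer precision: minimum number of digits, padded with leading zeros.
eight_digits = "%.8d" % integerValue

# Floating-point precision: number of digits after the decimal point.
five_decimals = "%.5f" % floatValue

# String precision: maximum number of characters taken from the string.
# Two conversion specifiers require two values in parentheses.
truncated = "(%.15s) (%.5s)" % (stringValue, stringValue)

print(eight_digits)   # 00004237
print(five_decimals)  # 123456.78900
print(truncated)      # (String formatti) (Strin)
```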
Python strings support even more powerful string-formatting capabilities through
string methods, which we discuss in detail in Chapter 13, String Manipulation and Regular
Expressions.
Standard algebraic operator   Python operator   Example condition   Meaning
Relational operators
>                             >                 x > y               x is greater than y
<                             <                 x < y               x is less than y
≥                             >=                x >= y              x is greater than or equal to y
≤                             <=                x <= y              x is less than or equal to y
Equality operators
=                             ==                x == y              x is equal to y
≠                             !=, <>            x != y, x <> y      x is not equal to y
The following example uses six if structures to compare two user-entered numbers.
If the condition in any of these if structures is true, the print statement associated
with that if structure executes. The user inputs two values, and the program converts the
input values to integers and assigns them to variables number1 and number2. Then, the
program compares the numbers and displays the results of the comparisons. Figure 2.22
shows the program and sample executions.
Fig. 2.22 Equality and relational operators used to determine logical relationships.
The program uses Python functions raw_input and int to input two integers (lines
8–14). First a value is obtained for variable number1, then a value is obtained for variable
number2.
The if structure in lines 16–17 compares the values of variables number1 and
number2 to test for equality. If the values are equal, the statement displays a line of text
indicating that the numbers are equal (line 17). If the conditions are met in one or more of
the if structures starting at lines 19, 22, 25, 28 and 31, the corresponding print statement
displays a line of text.
Each if structure consists of the word if, the condition to be tested and a colon (:).
An if structure also contains a body (called a suite). Notice that each if structure in
Fig. 2.22 has a single statement in its body and that each body is indented. Some languages,
like C++, Java and C#, use braces, { }, to denote the body of if structures; Python requires
indentation for this purpose. We discuss indentation in the next section.
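The following sketch is written in the spirit of the program described above; it is not the listing of Fig. 2.22. Fixed sample values take the place of raw_input, the message wording is assumed, and the messages are collected in a list (a feature not yet introduced at this point in the text) only so that the outcome is easy to check.

```python
number1 = 1   # sample values standing in for user input
number2 = 2
results = []

# Each if structure has a condition, a colon and an indented one-statement suite.
if number1 == number2:
    results.append("%d is equal to %d" % (number1, number2))

if number1 != number2:
    results.append("%d is not equal to %d" % (number1, number2))

if number1 < number2:
    results.append("%d is less than %d" % (number1, number2))

if number1 > number2:
    results.append("%d is greater than %d" % (number1, number2))

if number1 <= number2:
    results.append("%d is less than or equal to %d" % (number1, number2))

if number1 >= number2:
    results.append("%d is greater than or equal to %d" % (number1, number2))

print("\n".join(results))
```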
In Python, syntax evaluation is dependent on white space; thus, the inconsistent use of
white space can cause syntax errors. For instance, splitting a statement over multiple lines can
result in a syntax error. If a statement is long, the statement can be spread over multiple lines
using the \ line-continuation character. Some Python interpreters use "..." to denote a con-
tinuing line. The interactive session in Fig. 2.23 demonstrates the line-continuation character.
Good Programming Practice 2.9
A lengthy statement may be spread over several lines with the \ continuation character. If a
single statement must be split across lines, choose breaking points that make sense, such as
after a comma in a print statement or after an operator in a lengthy expression.
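A small sketch of the \ line-continuation character, breaking the statement after an operator as recommended above. The expression and the name total are invented for illustration.

```python
# The backslash tells Python that the statement continues on the next line.
total = 1 + 2 + 3 + \
        4 + 5

print(total)  # 15
```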
Figure 2.24 shows the precedence of the operators introduced in this chapter. The oper-
ators are shown from top to bottom in decreasing order of precedence. Notice that all these
operators, except exponentiation, associate from left to right.
Testing and Debugging Tip 2.3
Refer to the operator-precedence chart when writing expressions containing many opera-
tors. Confirm that the operators in the expression are performed in the order you expect. If
you are uncertain about the order of evaluation in a complex expression, break the expres-
sion into smaller statements or use parentheses to force the order, exactly as you would do
in an algebraic expression. Be sure to observe that some operators, such as exponentiation
(**), associate from right to left rather than from left to right.
2.9 Indentation
Python uses indentation to delimit (distinguish) sections of code. Other programming lan-
guages often use braces to delimit sections of code. A suite is a section of code that corre-
sponds to the body of a control structure. We study blocks in the next chapter. The Python
programmer chooses the number of spaces to indent a suite or block, and the number of
spaces must remain consistent for each statement in the suite or block. Python recognizes
new suites or blocks when there is a change in the number of indented spaces.
Common Programming Error 2.7
If a single section of code contains lines of code that are not uniformly indented, the Python
interpreter reads those lines as belonging to other sections, causing syntax or logic errors.
Figure 2.25 contains a modified version of the code in Fig. 2.22 to illustrate improper
indentation. Lines 21–22 show the improper indentation of an if statement. Even though
the program does not produce an error, it skips an equality operator. The
if number1 != number2:
statement (line 21) executes only if the if number1 == number2: statement (line 16)
executes. In this case, the if statement in line 21 never executes, because two equal numbers
will never be unequal (i.e., 2 will never be unequal to 2). Thus, the output of Fig. 2.25 does
not state that 1 is not equal to 2 as it should.
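The logic error can be reproduced in a few lines. This sketch is not the listing of Fig. 2.25; the sample values, the message text and the two list names are assumptions, and lists are used only so that the two outcomes can be compared.

```python
number1 = 1
number2 = 2

# Proper indentation: two independent tests at the same level.
proper = []
if number1 == number2:
    proper.append("equal")
if number1 != number2:
    proper.append("not equal")

# Improper indentation: the second test has become part of the first suite,
# so it runs only when the numbers are equal and therefore never succeeds.
improper = []
if number1 == number2:
    improper.append("equal")
    if number1 != number2:
        improper.append("not equal")

print(proper)    # ['not equal']
print(improper)  # []
```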
images on a computer screen as objects such as people, planes, trees and mountains, rather
than as individual dots of color. We can, if we wish, think in terms of beaches rather than
grains of sand, forests rather than trees and buildings rather than bricks.
We might be inclined to divide objects into two categories—animate objects and inan-
imate objects. Animate objects are “alive” in some sense. They move around and do things.
Inanimate objects, like towels, seem not to do much at all. They just “sit around.” All these
objects, however, do have some things in common. They all have attributes, like size,
shape, color and weight, and they all exhibit behaviors (e.g., a ball rolls, bounces, inflates
and deflates; a baby cries, sleeps, crawls, walks and blinks; a car accelerates, brakes and
turns; a towel absorbs water).
Humans learn about objects by studying their attributes and observing their behaviors.
Different objects can have similar attributes and can exhibit similar behaviors. Compari-
sons can be made, for example, between babies and adults and between humans and chim-
panzees. Cars, trucks, little red wagons and roller skates have much in common.
Object-oriented programming (OOP) models real-world objects using software coun-
terparts. It takes advantage of class relationships, where objects of a certain class—such as
a class of vehicles—have the same characteristics. It takes advantage of inheritance rela-
tionships, and even multiple inheritance relationships, where newly created classes of
objects are derived by absorbing characteristics of existing classes and adding unique char-
acteristics of their own. An object of class “convertible” certainly has the characteristics of
the more general class “automobile,” but a convertible’s roof goes up and down.
Object-oriented programming gives us a more natural and intuitive way to view the
programming process, by modeling real-world objects, their attributes and their behaviors.
OOP also models communications between objects. Just as people send messages to one
another (e.g., a sergeant commanding a soldier to stand at attention), objects communicate
via messages.
OOP encapsulates data (attributes) and functions (behavior) into packages called
objects; the data and functions of an object are intimately tied together. Objects have the
property of information hiding. This means that, although objects may know how to com-
municate with one another, objects normally are not allowed to know how other objects are
implemented—implementation details are hidden within the objects themselves. Surely it
is possible to drive a car effectively without knowing the details of how engines, transmis-
sions and exhaust systems work internally. We will see why information hiding is so crucial
to good software engineering.
In C and other procedural programming languages, programming tends to be action-
oriented; in Python, programming is object-oriented (ideally). The function is the unit of
programming in procedural programming. In object-oriented programming, the unit of pro-
gramming is the class from which objects are eventually instantiated (a fancy term for “cre-
ated”). Python classes contain functions (that implement class behaviors) and data (that
implements class attributes).
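The automobile and convertible example can be sketched as two small classes. All class, attribute and method names here are hypothetical and chosen only to mirror the prose; classes are covered properly in later chapters.

```python
class Automobile:
    """General class: data (attributes) plus functions (behaviors)."""

    def __init__(self, color):
        self.color = color   # attribute
        self.speed = 0       # attribute

    def accelerate(self, amount):
        """Behavior shared by every automobile."""
        self.speed += amount


class Convertible(Automobile):
    """Derived class: inherits from Automobile and adds a roof."""

    def __init__(self, color):
        Automobile.__init__(self, color)
        self.roof_up = True

    def lower_roof(self):
        """Behavior unique to convertibles."""
        self.roof_up = False


car = Convertible("red")   # instantiate an object of class Convertible
car.accelerate(30)         # inherited behavior
car.lower_roof()           # behavior added by the derived class
print("%s %d %s" % (car.color, car.speed, car.roof_up))
```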
Procedural programmers concentrate on writing functions. Groups of actions that per-
form some task are formed into functions, and functions are grouped to form programs.
Data is certainly important in procedural programming, but the view is that data exists pri-
marily in support of the actions that functions perform. The verbs in a system specification
help the procedural programmer determine the set of functions that will work together to
implement the system.
SUMMARY
• Programmers insert comments to document programs and to improve program readability. Com-
ments also help other programmers read and understand your program. In Python, comments are
denoted by the pound symbol (#).
• A comment that begins with # is called a single-line comment, because the comment terminates
at the end of the current line.
• Comments do not cause the computer to perform any action when the program is run. Python ig-
nores comments.
• Programmers use blank lines and space characters to make programs easier to read. Together,
blank lines, space characters and tab characters are known as white space. (Space characters and
tabs are known specifically as white-space characters.)
• Blank lines are ignored by Python.
• The standard output stream is the channel by which information is presented to the user by an
application. This information typically is displayed on the screen, but may be printed on a printer,
written to a file, etc. It may even be spoken or issued to braille devices, so users with visual
impairments can receive the outputs.
• The print statement instructs the computer to display the string of characters contained between
the quotation marks. A string is a Python data type that contains a sequence of characters.
• A print statement normally sends a newline character to the screen. After a newline character is
sent, the next string displayed on the screen appears on the line below the previous string. Howev-
er, a comma (,) tells Python not to send the newline character to the screen. Instead, Python adds
a space after the string, and the next string printed to the screen appears on the same line.
• Output (i.e., displaying information) and input (i.e., receiving information) in Python are accom-
plished with streams of characters.
• Python files typically end with .py, although other extensions (e.g., .pyw on Windows) can be
used.
• When the Python interpreter executes a program, the interpreter starts at the first line of the file
and executes statements until the end of the file.
• The backslash (\) is an escape character. It indicates that a “special” character is to be output.
When a backslash is encountered in a string of characters, the next character is combined with the
backslash to form an escape sequence.
• The escape sequence \n means newline. Each occurrence of a \n (newline) escape sequence caus-
es the screen cursor to position to the beginning of the next line.
• A built-in function is a piece of code provided by Python that performs a task. The task is per-
formed when the function is invoked or called. After performing its task, a function may return a
value that represents the end result of the task.
• In Python, variables are more specifically referred to as objects. An object resides in the comput-
er’s memory and contains information used by the program. The term object normally implies that
attributes (data) and behaviors (methods) are associated with the object. The object’s methods use
the attributes to perform tasks.
• A variable name consists of letters, digits and underscores (_) and does not begin with a digit.
• Python is case sensitive—uppercase and lowercase letters are different, so a1 and A1 are different
variables.
• An object can have multiple names, called identifiers. Each identifier (or variable name) referenc-
es (points to) the object (or variable) in memory.
• Each object has a type. An object’s type identifies the kind of information (e.g., integer, string,
etc.) stored in the object.
• In Python, every object has a type, a size, a value and a location.
• Function type returns the type of an object. Function id returns a number that represents the ob-
ject’s location.
• In languages like C++ and Java, the programmer must declare the object type before using the ob-
ject in the program. In Python, the type of an object is determined automatically, as the program
executes. This approach is called dynamic typing.
• Binary operators take two operands. Examples of binary operators are + and -.
• Starting with Python version 2.2, the behavior of the / division operator will change from “floor
division” to “true division.”
• Floor division (sometimes called integer division) divides the numerator by the denominator and
returns the highest integer value that is not greater than the result. For positive results, any fractional
part is simply discarded (i.e., truncated); for negative results, the value moves down to the next
lower integer (e.g., -7 // 4 yields -2).
• True division yields the precise floating-point result of dividing the numerator by the denominator.
• The behavior (i.e., floor or true division) of the / operator is determined by the type of the oper-
ands. If the operands are both integers, the operator performs floor division. If one or both of the
operands are floating-point numbers, the operator performs true division.
• The // operator performs floor division.
• Programmers can change the behavior of the / operator to perform true division with the statement
from __future__ import division.
• In Python version 3.0, the only behavior of the / operator will be true division. After the release
of version 3.0, all programs are expected to have been updated to compensate for the new be-
havior.
• Python provides the modulus operator (%), which yields the remainder after integer division. The
expression x % y yields the remainder after x is divided by y. Thus, 7 % 4 yields 3 and 17 % 5
yields 2. This operator is most commonly used with integer operands, but also can be used with
other arithmetic types.
• The modulus operator can be used with both integer and floating-point numbers.
• Arithmetic expressions in Python must be entered into the computer in straight-line form. Thus,
expressions such as “a divided by b” must be written as a / b, so that all constants, variables and
operators appear in a straight line.
• Parentheses are used in Python expressions in much the same manner as in algebraic expressions.
For example, to multiply a times the quantity b + c, we write a * (b + c).
• Python applies operators in arithmetic expressions in a precise sequence determined by the rules
of operator precedence, which are generally the same as those followed in algebra.
• When we say that certain operators are applied from left to right, we are referring to the associa-
tivity of the operators.
• Python provides strings as a built-in data type and can perform powerful text-based operations.
• Strings can be created using the single-quote (') and double-quote characters ("). Python also sup-
ports triple-quoted strings. Triple-quoted strings are useful for programs that output strings with
quote characters or large blocks of text. Single- or double-quote characters inside a triple-quoted
string do not need to use the escape sequence, and triple-quoted strings can span multiple lines.
• A field width is the minimum size of a field in which a value is printed. If the field width is larger
than that needed by the value being printed, the data normally is right-justified within the field. To
use field widths, place an integer representing the field width between the percent sign and the con-
version-specifier symbol.
• Precision has different meaning for different data types. When used with integer conversion spec-
ifiers, precision indicates the minimum number of digits to be printed. If the printed value contains
fewer digits than the specified precision, zeros are prefixed to the printed value until the total num-
ber of digits is equivalent to the precision.
• When used with a floating-point conversion specifier, the precision is the number of digits to ap-
pear to the right of the decimal point.
• When used with a string-conversion specifier, the precision is the maximum number of characters
to be written from the string.
• Exponential notation is the computer equivalent of scientific notation used in mathematics. For
example, the value 150.4582 is represented in scientific notation as 1.504582 × 10² and is
represented in exponential notation as 1.504582E+002 by the computer. This notation indicates
that 1.504582 is multiplied by 10 raised to the second power (E+002). The E stands for
“exponent.”
• An if structure allows a program to make a decision based on the truth or falsity of a condition.
If the condition is true (i.e., the condition is met), the statement in the body of the if structure is
executed. If the condition is not met, the body statement is not executed.
• Conditions in if structures can be formed with equality and relational operators. The relational
operators all have the same level of precedence and associate from left to right. The equality operators
both have the same level of precedence, which is lower than the precedence of the relational op-
erators. The equality operators also associate from left to right.
• Each if structure consists of the word if, the condition to be tested and a colon (:). An if struc-
ture also contains a body (called a suite).
• Python uses indentation to delimit (distinguish) sections of code. Other programming languages
often use braces to delimit sections of code. A suite is a section of code that corresponds to the
body of a control structure. We study blocks in the next chapter.
• The Python programmer chooses the number of spaces to indent a suite or block, and the number
of spaces must remain consistent for each statement in the suite or block.
• Splitting a statement over two lines can also cause a syntax error. If a statement is long, the state-
ment can be spread over multiple lines using the \ line-continuation character.
• Object-oriented programming (OOP) models real-world objects with software counterparts. It
takes advantage of class relationships where objects of a certain class—such as a class of vehi-
cles—have the same characteristics.
• OOP takes advantage of inheritance relationships, and even multiple-inheritance relationships,
where newly created classes of objects are derived by absorbing characteristics of existing classes
and adding unique characteristics of their own.
• Object-oriented programming gives us a more natural and intuitive way to view the programming
process, namely, by modeling real-world objects, their attributes and their behaviors. OOP also
models communication between objects.
• OOP encapsulates data (attributes) and functions (behavior) into packages called objects; the data
and functions of an object are intimately tied together.
• Objects have the property of information hiding. Although objects may know how to communicate
with one another across well-defined interfaces, objects normally are not allowed to know how
other objects are implemented—implementation details are hidden within the objects themselves.
• In Python, programming can be object-oriented. In object-oriented programming, the unit of pro-
gramming is the class from which instances are eventually created. Python classes contain meth-
ods (that implement class behaviors) and data (that implements class attributes).
• Object-oriented programmers create their own user-defined types called classes and components.
Each class contains both data and the set of functions that manipulate the data. The data compo-
nents of a class are called data members or attributes.
• The functional components of a class are called methods (or member functions, in some other ob-
ject-oriented languages).
• The focus of attention in object-oriented programming is on classes rather than on functions. The
nouns in a system specification help the object-oriented programmer determine the set of classes
that will be used to create the instances that will work together to implement the system.
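The arithmetic and formatting points in this summary can be checked with a short sketch. It assumes a current Python 3 interpreter, where / always performs true division and // performs floor division; the variable names are illustrative only.

```python
# Assumes Python 3: / is true division and // is floor division.
floor_result = 7 // 4         # floor division
true_result = 7 / 4           # true division
negative_floor = -7 // 4      # floor division of a negative result
remainder = 17 % 5            # modulus operator

# Field width: right-justify an integer in a field of four characters.
width_demo = "%4d" % 42
# Integer precision: at least four digits, padded with leading zeros.
int_precision = "%.4d" % 42
# Floating-point precision: two digits to the right of the decimal point.
float_precision = "%.2f" % 150.4582
# String precision: at most three characters of the string.
string_precision = "%.3s" % "Python"
# Exponential notation.
exponential = "%e" % 150.4582

print(floor_result, true_result, negative_floor, remainder)
print(repr(width_demo), int_precision, float_precision, string_precision, exponential)
```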
TERMINOLOGY
abstraction
alert escape sequence (\a)
argument
arithmetic operator
assignment statement
assignment symbol (=)
SELF-REVIEW EXERCISES
2.1 Fill in the blanks in each of the following:
a) The ________ statement instructs the computer to display information on the screen.
b) A ________ is a Python data type that contains a sequence of characters.
c) ________ are simply names that reference objects.
d) The ________ is the modulus operator.
e) ________ are used to document a program and improve its readability.
f) Each if structure consists of the word ________, the ________ to be tested, a
________ and a ________.
g) The ________ function converts non-integer values to integer values.
h) A Python statement can be spread over multiple lines using the ________.
i) Arithmetic expressions enclosed in ________ are evaluated first.
j) An object’s ________ describes the information stored in the object.
2.2 State whether each of the following is true or false. If false, explain why.
a) The Python function get_input requests input from the user.
b) A valid Python arithmetic expression with no parentheses is evaluated left to right.
c) The following are invalid variable names: 3g, 87 and 2h.
d) The operator != is an example of a relational operator.
e) A variable name identifies the kind of information stored in the object.
f) In Python, the programmer must declare the object type before using the object in the
program.
g) If parentheses are nested, the expression in the innermost pair is evaluated first.
h) Python treats the variable names, a1 and A1, as the same variable.
i) The backslash character is called an escape sequence.
j) The relational operators all have the same level of precedence and evaluate left to right.
EXERCISES
2.3 State the order of evaluation of the operators in each of the following Python statements and
show the value of x after each statement is performed.
a) x = 7 + 3 * 6 / 2 - 1
b) x = 2 % 2 + 2 * 2 - 2 / 2
c) x = ( 3 * 9 * ( 3 + ( 9 * 3 / ( 3 ) ) ) )
2.4 Write a program that requests the user to enter two numbers and prints the sum, product, dif-
ference and quotient of the two numbers.
2.5 Write a program that reads in the radius of a circle and prints the circle’s diameter, circum-
ference and area. Use the constant value 3.14159 for π. Do these calculations in output statements.
2.6 Write a program that prints a box, an oval, an arrow and a diamond, as shown:
*********      ***        *         *
*       *    *     *     ***       * *
*       *   *       *   *****     *   *
*       *   *       *     *      *     *
*       *   *       *     *     *       *
*       *   *       *     *      *     *
*       *   *       *     *       *   *
*       *    *     *      *        * *
*********      ***        *         *
2.7 Write a program that reads in two integers and determines and prints whether the first is a
multiple of the second. (Hint: Use the modulus operator.)
2.8 Give a brief answer to each of the following “object think” questions:
a) Why does this text choose to discuss structured programming in detail before proceeding
with an in-depth treatment of object-oriented programming?
b) What aspects of an object need to be determined before an object-oriented program can
be built?
c) How is inheritance exhibited by human beings?
d) What kinds of messages do people send to one another?
e) Objects send messages to one another across well-defined interfaces. What interfaces
does a car radio (object) present to its user (a person object)?
pythonhtp1_03.fm Page 68 Saturday, December 8, 2001 9:34 AM
3
Control Structures
Objectives
• To understand basic problem-solving techniques.
• To develop algorithms through the process of top-
down, stepwise refinement.
• To use the if, if/else and if/elif/else
structures to select appropriate actions.
• To use the while and for repetition structures to
execute statements in a program repeatedly.
• To understand counter-controlled and sentinel-
controlled repetition.
• To use augmented assignment symbols and logical
operators.
• To use the break and continue program control
statements.
Let’s all move one place on.
Lewis Carroll
The wheel is come full circle.
William Shakespeare, King Lear
Who can control his fate?
William Shakespeare, Othello
The used key is always bright.
Benjamin Franklin
Outline
3.1 Introduction
3.2 Algorithms
3.3 Pseudocode
3.4 Control Structures
3.5 if Selection Structure
3.6 if/else and if/elif/else Selection Structures
3.7 while Repetition Structure
3.8 Formulating Algorithms: Case Study 1 (Counter-Controlled
Repetition)
3.9 Formulating Algorithms with Top-Down, Stepwise Refinement: Case
Study 2 (Sentinel-Controlled Repetition)
3.10 Formulating Algorithms with Top-Down, Stepwise Refinement: Case
Study 3 (Nested Control Structures)
3.11 Augmented Assignment Symbols
3.12 Essentials of Counter-Controlled Repetition
3.13 for Repetition Structure
3.14 Using the for Repetition Structure
3.15 break and continue Statements
3.16 Logical Operators
3.17 Structured-Programming Summary
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises
3.1 Introduction
Before writing a program to solve a particular problem, it is essential to have a thorough
understanding of the problem and a carefully planned approach to solving the problem.
When writing a program, it is equally essential to understand the types of building blocks
that are available and to use proven program-construction principles. In this chapter, we
discuss these issues in our presentation of the theory and principles of structured program-
ming. The techniques that you learn are applicable to most high-level languages, including
Python. When we begin our treatment of object-oriented programming in Chapter 7, we use
the control structures presented in this chapter to build and manipulate objects.
3.2 Algorithms
Any computing problem can be solved by executing a series of actions in a specified order.
An algorithm is a procedure for solving a problem in terms of
1. actions to be executed and
2. the order in which these actions are to be executed.
The following example demonstrates that specifying the order in which the actions are to
be executed is important.
Consider the “rise-and-shine” algorithm followed by one junior executive for getting
out of bed and going to work: (1) Get out of bed, (2) take off pajamas, (3) take a shower,
(4) get dressed, (5) eat breakfast, (6) carpool to work. This routine gets the executive to
work to make critical decisions.
Suppose that the same steps are performed in a slightly different order: (1) Get out of
bed, (2) take off pajamas, (3) get dressed, (4) take a shower, (5) eat breakfast, (6) carpool
to work. In this case, our junior executive shows up for work soaking wet.
Specifying the order in which statements are to be executed in a computer program is
called program control. In this chapter, we investigate Python’s program-control capabilities.
3.3 Pseudocode
Pseudocode is an artificial and informal language that helps programmers develop algo-
rithms. Pseudocode consists of descriptions of executable statements—those that are exe-
cuted when the program has been converted from pseudocode to Python. The pseudocode
we present here is useful for developing algorithms that will be converted to Python pro-
grams. Pseudocode is similar to everyday English; it is convenient and user-friendly, al-
though it is not an actual computer programming language.
Pseudocode programs are not executed on computers. Rather, pseudocode helps the
programmer “plan” a program before attempting to write it in a programming language,
such as Python. In this chapter, we provide several examples of how pseudocode can be
used effectively in developing Python programs.
Software Engineering Observation 3.1
Pseudocode often is used to “think out” a program during the program design process. Then
the pseudocode program is converted to Python.
The research of Bohm and Jacopini¹ demonstrated that programs could be written
without any goto statements. The challenge then became for programmers to alter their
programming styles to “goto-less programming.” When programmers began to take struc-
tured programming seriously beginning in the 1970s, the notion of structured programming
became almost synonymous with goto elimination. Since then, the results have been
impressive, as software development groups have reported reduced development times,
more frequent on-time delivery of systems and more frequent within-budget completion of
software projects. Structured programming has enabled these improvements because struc-
tured programs are clearer, easier to debug and modify and more likely to be bug-free in
the first place.
Bohm and Jacopini’s work demonstrated that all programs could be written in terms of
three control structures—namely, the sequence structure, the selection structure and the
repetition structure. The sequence structure is built into Python. Unless directed otherwise,
the computer executes Python statements sequentially. The flowchart segment of Fig. 3.1
illustrates a typical sequence structure in which two calculations are performed sequen-
tially. A flowchart is a tool that provides graphical representation of an algorithm or a por-
tion of an algorithm.
Flowcharts are drawn using certain special-purpose symbols, such as rectangles, dia-
monds, ovals and small circles; these symbols are connected by arrows called flowlines,
which indicate the order in which the actions of the algorithm execute. Like pseudocode,
flowcharts aid in the development and representation of algorithms. Although most pro-
grammers prefer pseudocode, flowcharts illustrate clearly how control structures operate.
The reader should carefully compare the pseudocode and flowchart representations of each
control structure.
The flowchart segment for the sequence structure in Fig. 3.1 uses the rectangle
symbol, called the action symbol, to indicate an action (e.g., a calculation or an input/output
operation). The flowlines in the figure indicate the order in which the actions are to be
performed—first, grade is added to total, then 1 is added to counter. Python allows
us to have as many actions as we want in a sequence structure—anywhere a single action
may be placed, we can place several actions in sequence.
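The two actions described for Fig. 3.1 can be sketched as two statements that execute one after the other. This is only an illustration: the starting values of total, counter and grade are hypothetical, because the figure shows just the two actions.

```python
# Hypothetical starting values; the flowchart shows only the two actions below.
total = 0
counter = 0
grade = 85

# Sequence structure: the statements execute in the order in which they appear.
total = total + grade      # first action: add grade to total
counter = counter + 1      # second action: add 1 to counter

print(total, counter)
```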
1. Bohm, C., and G. Jacopini, “Flow Diagrams, Turing Machines, and Languages with Only Two
Formation Rules,” Communications of the ACM, Vol. 9, No. 5, May 1966, pp. 336–371.
In a flowchart that represents a complete algorithm, an oval symbol containing the word
“Begin” represents the start of the flowchart; an oval symbol containing the word “End” rep-
resents the end of the flowchart. When drawing a portion of an algorithm, as in Fig. 3.1, the
oval symbols are omitted in favor of small circle symbols, also called connector symbols.
Perhaps the most important flowchart symbol is the diamond symbol, also called the
decision symbol, which indicates a decision is to be made. We discuss the diamond symbol
in the next section. The pseudocode we present here is useful for developing algorithms that
will be converted to structured Python programs.
Python provides three types of selection structures: if, if/else and if/elif/
else. We discuss each of these in this chapter. The if selection structure either performs
(selects) an action if a condition (predicate) is true or skips the action if the condition is
false. The if/else selection structure performs an action if a condition is true or performs
a different action if the condition is false. The if/elif/else selection structure performs
one of many different actions, depending on the truth or falsity of several conditions.
The if selection structure is a single-selection structure because it selects or ignores a
single action. The if/else selection structure is a double-selection structure because it
selects between two different actions. The if/elif/else selection structure is a multiple-
selection structure because it selects the action to perform from many different actions.
Python provides two types of repetition structures: while and for. The if, elif,
else, while and for structures are Python keywords. These keywords are reserved by
the language to implement various Python features, such as control structures. Keywords
cannot be used as identifiers (i.e., variable names). Figure 3.2 lists all Python keywords.²
Common Programming Error 3.1
Using a keyword as an identifier is a syntax error.
In all, Python has only six control structures: the sequence structure, three types of
selection structures and two types of repetition structures. Each Python program is formed
by combining as many control structures as is appropriate for the algorithm the program
implements. As with the sequence structure shown in Fig. 3.1, we will see that each control
structure is flowcharted with two small circle symbols, one at the entry point to the control
structure and one at the exit point.
Python keywords
2. Python 2.3 will introduce the keyword yield among others. Visit the Python Web site
(www.python.org) to view a tentative list of such keywords, and avoid using them as identifi-
ers.
Notice that the Python code corresponds closely to the pseudocode. This similarity is the
reason that pseudocode is a useful program development tool. The statement in the body of
the if structure outputs the character string "Passed".
The flowchart of Fig. 3.3 illustrates the single-selection if structure and the diamond
symbol. The decision symbol contains an expression, such as a condition, that can be either
true or false. The diamond has two flowlines emerging from it: One indicates the direction
to follow when the expression in the symbol is true; the other indicates the direction to
follow when the expression is false. We learned, in Chapter 2, Introduction to Python Pro-
gramming, that decisions can be based on conditions containing relational or equality oper-
ators. Actually, a decision can be based on any expression. For instance, if an expression
evaluates to zero, it is treated as false, and if an expression evaluates to nonzero, it is treated
as true.
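The single-selection structure and the zero/nonzero rule described above can be sketched as follows. The sketch assumes a current Python 3 interpreter, where print is a function, and the function names passed_message and is_treated_as_true are hypothetical helpers introduced only for this illustration.

```python
def passed_message(grade):
    """Return "Passed" for a grade of 60 or more, otherwise an empty string."""
    message = ""
    if grade >= 60:            # condition followed by a colon
        message = "Passed"     # indented body (suite) of the if structure
    return message


def is_treated_as_true(value):
    """Report how an if structure treats an arbitrary numeric expression."""
    if value:                  # zero is treated as false, nonzero as true
        return True
    return False


print(passed_message(75))
print(is_treated_as_true(0), is_treated_as_true(-3))
```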
Note that the if structure is a single-entry/single-exit structure. We will soon learn
that the flowcharts for the remaining control structures also contain (besides small circle
symbols and flowlines) rectangle symbols that indicate the actions to be performed and dia-
mond symbols that indicate decisions to be made. This type of flowchart emphasizes the
action/decision model of programming.
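Fig. 3.3 Flowchart of the if single-selection structure: the decision grade >= 60 leads on its
true branch to the action print “Passed”; the false branch bypasses the action.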
The flowchart of Fig. 3.4 illustrates the flow of control in the if/else structure. Once
again, note that (besides small circles and arrows) the symbols in the flowchart are rectan-
gles (for actions) and diamonds (for decisions). We continue to emphasize this action/deci-
sion model of computing. Imagine again a bin containing empty double-selection
structures. The programmer’s job is to assemble these selection structures (by stacking and
nesting) with other control structures required by the algorithm and to fill in the rectangles
and diamonds with actions and decisions appropriate to the algorithm being implemented.
Nested if/else structures test for multiple cases by placing if/else selection
structures inside other if/else selection structures. For example, the following
pseudocode statement prints A for exam grades greater than or equal to 90, B for grades 80–
89, C for grades 70–79, D for grades 60–69 and F for all other grades.
If student’s grade is greater than or equal to 90
    Print “A”
else
    If student’s grade is greater than or equal to 80
        Print “B”
    else
        If student’s grade is greater than or equal to 70
            Print “C”
        else
            If student’s grade is greater than or equal to 60
                Print “D”
            else
                Print “F”
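Fig. 3.4 Flowchart of the if/else double-selection structure: the decision grade >= 60 has a
true branch and a false branch, each leading to its own action.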
If grade is greater than or equal to 90, the first four conditions are met, but only the
print statement after the first test executes. After that print executes, the else part of
the “outer” if/else statement is skipped.
Performance Tip 3.1
A nested if/else structure is faster than a series of single-selection if structures because
the testing of conditions terminates after one of the conditions is satisfied.
thus replacing the double-selection if/else structure with the multiple-selection if/elif/
else structure. The two forms are equivalent. The latter form is popular because it avoids
the deep indentation of the code to the right. Such indentation often leaves little room on a
line, forcing lines to be split over multiple lines and decreasing program readability.
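The equivalence of the two forms can be checked with a short sketch. It assumes a current Python 3 interpreter; the function names letter_nested and letter_elif are hypothetical, and the functions return the letter rather than printing it so that the two forms can be compared.

```python
def letter_nested(grade):
    """Nested if/else form: each else suite contains the next if/else."""
    if grade >= 90:
        return "A"
    else:
        if grade >= 80:
            return "B"
        else:
            if grade >= 70:
                return "C"
            else:
                if grade >= 60:
                    return "D"
                else:
                    return "F"


def letter_elif(grade):
    """if/elif/else form: same tests, without the deepening indentation."""
    if grade >= 90:
        return "A"
    elif grade >= 80:
        return "B"
    elif grade >= 70:
        return "C"
    elif grade >= 60:
        return "D"
    else:
        return "F"


print(letter_nested(85), letter_elif(85))
```

In both forms, testing stops at the first condition that is true, so a grade of 95 is never compared with 80, 70 or 60.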
Each elif can have one or more actions. The flowchart in Fig. 3.5 shows the general
if/elif/else multiple-selection structure. The flowchart indicates that, after an if or
elif statement executes, control immediately exits the if/elif/else structure. Again,
note that (besides small circles and arrows) the flowchart contains rectangle symbols and
diamond symbols. Imagine that the programmer has access to a deep bin of empty if/
elif/else structures—as many as the programmer might need to stack and nest with
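Fig. 3.5 Flowchart of the if/elif/else multiple-selection structure: the condition of the if
statement is tested first and each elif condition is tested in turn; a true condition leads to that
case’s action(s), and if every condition is false, the else statement’s default action(s) execute.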
The if selection structure can contain several statements in its body (suite), and all
these statements must be indented. The following example includes a suite in the else part
of an if/else structure that contains two statements. A suite that contains more than one
statement is sometimes called a compound statement.
In this case, if grade is less than 60, the program executes both statements in the body of
the else and prints
Failed.
You must take this course again.
Notice that both statements of the else suite are indented. If the statement that prints
“You must take this course again.” were not indented, it would execute regardless of whether
the grade is less than 60. This is an example of a logic error.
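The effect of indentation on the else suite can be sketched as follows. This is an illustration rather than the book's listing: it assumes a current Python 3 interpreter, the function names report and report_misindented are hypothetical, the "Passed." text for the if branch is assumed, and the messages are collected in a list instead of being printed so that the two versions can be compared.

```python
def report(grade):
    """Correct version: both statements belong to the else suite."""
    lines = []
    if grade >= 60:
        lines.append("Passed.")
    else:
        lines.append("Failed.")
        lines.append("You must take this course again.")
    return lines


def report_misindented(grade):
    """Logic error: the last message is outside the else suite."""
    lines = []
    if grade >= 60:
        lines.append("Passed.")
    else:
        lines.append("Failed.")
    lines.append("You must take this course again.")  # always executes
    return lines


print(report(75))
print(report_misindented(75))
```

The interpreter accepts both versions; only the second one tells a passing student to repeat the course.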
A programmer can introduce two major types of errors into a program: syntax errors
and logic errors. A syntax error violates the rules of the programming language. Examples
of syntax errors include using a keyword as an identifier or forgetting the colon (:) after an
if statement. The interpreter catches a syntax error and displays an error message.
A logic error causes the program to produce unexpected results and may not be caught
by the interpreter. A fatal logic error causes a program to fail and terminate prematurely.
For fatal errors, Python prints an error message called a traceback and exits. A nonfatal
logic error allows a program to continue executing, but produces incorrect results.
Common Programming Error 3.3
Forgetting to indent all the statements in a suite can lead to syntax or logic errors in a
program.
The interactive session in Fig. 3.6 attempts to divide two user-entered values and dem-
onstrates one syntax error and two logic errors. The syntax error is contained in the line
print value1 +
The + operator needs a right-hand operand, so the interpreter indicates a syntax error.
The first logic error is contained in the line
The intention of this line is to print the sum of the two user-entered integer values. How-
ever, the strings were not converted to integers, thus the statement does not produce the de-
sired result. Instead, the statement produces the concatenation of the two strings—formed
by linking the two strings together. Notice that the interpreter does not display any messag-
es because the statement is legal.
The second logic error occurs in the line
The program does not check whether the second user-entered value is 0, so the program
attempts to divide by zero. Dividing by zero is a fatal logic error.
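The two logic errors can be reproduced in a short sketch. It is not the interactive session of Fig. 3.6: it assumes a current Python 3 interpreter, the values "45" and "0" are hypothetical stand-ins for user input, safe_divide is a hypothetical helper, and the try/except statement is used here only to observe the fatal error without ending the program.

```python
# User input arrives as strings; these two values are hypothetical.
value1 = "45"
value2 = "0"

# First logic error: + on two strings concatenates them instead of adding.
concatenated = value1 + value2            # "450", not a sum
summed = int(value1) + int(value2)        # 45, the intended result

# Second logic error: dividing without checking for a zero denominator.
try:
    int(value1) / int(value2)
    division_failed = False
except ZeroDivisionError:
    division_failed = True


def safe_divide(numerator, denominator):
    """Return the quotient, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


print(concatenated, summed, division_failed, safe_divide(45, 0))
```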
Just as multiple statements can be placed anywhere a single statement can be placed,
it is possible to have no statements at all (i.e., empty statements). The empty statement is
represented by placing keyword pass where a statement normally resides (Fig. 3.7).
Common Programming Error 3.5
All control structures must contain at least one statement. A control structure that contains
no statements causes a syntax error.
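A minimal sketch of the empty statement follows. It is not the listing of Fig. 3.7: it assumes a current Python 3 interpreter, the grade values are hypothetical, and the for loop and list used here are covered later in the book. The call to the built-in compile function is included only to show that a suite with no statement is rejected as a syntax error.

```python
failing = []
for grade in [98, 57, 76]:     # hypothetical grades
    if grade >= 60:
        pass                   # empty statement: nothing to do for a passing grade
    else:
        failing.append(grade)

# A control structure with no statement at all does not compile.
try:
    compile("if True:\n", "<example>", "exec")
    empty_suite_rejected = False
except SyntaxError:
    empty_suite_rejected = True

print(failing, empty_suite_rejected)
```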
describes the repetition that occurs during a shopping trip. The condition, “there are more
items on my shopping list” is either true or false. If it is true, the program performs the ac-
tion “Purchase next item and cross it off my list.” This action is performed repeatedly while
the condition remains true.
The statement(s) contained in the while repetition structure constitute the body (suite) of
the while. The while structure body can consist of a single statement or multiple statements.
Eventually, the condition should evaluate to false (in the above example, when the last item
on the shopping list has been purchased and crossed off the list). At this point, the repetition
terminates, and the program executes the first statement after the repetition structure.
Common Programming Error 3.6
A logic error, called an infinite loop (the repetition structure never terminates), occurs when
an action that causes the condition in the while structure to become false is missing from
the body of a while structure.
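Flowchart of the while repetition structure: the decision product <= 1000 leads on its true
branch to the action product = 2 * product, which flows back to the decision; the false branch
exits the structure.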
Imagine a bin of empty while structures that can be stacked and nested with other con-
trol structures to implement an algorithm’s flow of control. The empty rectangles and dia-
monds are then filled in with appropriate actions and decisions. The flowchart shows the
repetition. The flowline emerging from the rectangle wraps back to the decision that is tested
each time through the loop until the decision becomes false. Then, the while structure exits
and control passes to the next statement in the program.
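The loop in the flowchart can be sketched as follows. The condition and the body are taken from the flowchart; the starting value of 2 and the iterations counter are assumptions added for illustration.

```python
product = 2          # assumed starting value
iterations = 0       # hypothetical counter, added to show how often the body runs

while product <= 1000:         # decision, tested before every pass through the loop
    product = 2 * product      # action; eventually makes the condition false
    iterations = iterations + 1

print(product, iterations)
```

With this starting value, the body executes nine times and the loop exits when product reaches 1024, the first power of 2 greater than 1000.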
Enter grade: 98
Enter grade: 76
Enter grade: 71
Enter grade: 87
Enter grade: 83
Enter grade: 90
Enter grade: 57
Enter grade: 79
Enter grade: 82
Enter grade: 94
Class average is 81
Lines 5–6 are assignment statements that initialize total to 0 and gradeCounter
to 1. Line 9 indicates that the while structure should continue as long as grade-
Counter’s value is less than or equal to 10.
Lines 10–11 correspond to the pseudocode statement Input the next grade. Function
raw_input displays the prompt “Enter grade:” on the screen and accepts user input.
Line 11 converts the user-entered string to an integer.
Next, the program updates total with the new grade entered by the user—line 12
adds grade to the previous value of total and assigns the result to total.
Then, the program increments the variable gradeCounter to indicate that a grade
has been processed. Line 13 increments gradeCounter by one, allowing the condition
in the while structure to evaluate to false and terminate the loop.
Line 16 executes after the while structure terminates and assigns the results of the
average calculation to variable average. Line 17 displays the string "Class average
is", followed by a space (inserted by print), followed by the value of variable
average.
Note that the averaging calculation in the program produces an integer result. Actually,
the sum of the grades in this example is 817, which, when divided by 10, yields 81.7—a
number with a decimal point. We discuss how to deal with floating-point numbers in the
next section.
In Fig. 3.10, if line 16 used gradeCounter rather than 10 for the calculation, the
output for this program would display an incorrect value, 74, because gradeCounter
contains the value 11 after the termination of the while loop. Fig. 3.11 uses an interactive
session to demonstrate the value of gradeCounter after the while loop iterates ten times.
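The counter-controlled calculation and the off-by-one pitfall can be sketched as follows. This is not the listing of Fig. 3.10, so its line numbers do not match the discussion above: it assumes a current Python 3 interpreter, reads the ten grades of the sample run from a list instead of calling raw_input, and uses the // operator to obtain the integer result. The name wrongAverage is hypothetical.

```python
# The ten grades from the sample run, supplied as a list instead of user input.
grades = [98, 76, 71, 87, 83, 90, 57, 79, 82, 94]

total = 0            # running sum of the grades
gradeCounter = 1     # number of the grade to be processed next

while gradeCounter <= 10:
    grade = grades[gradeCounter - 1]
    total = total + grade
    gradeCounter = gradeCounter + 1

average = total // 10                 # floor division gives the integer result
wrongAverage = total // gradeCounter  # pitfall: gradeCounter is 11 after the loop

print("Class average is", average)
print(gradeCounter, wrongAverage)
```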
The preceding Software Engineering Observation often is all you need for the first
refinement in the top-down process. To proceed to the next level of refinement (i.e., the
second refinement), we commit to specific variables. The program needs to maintain a run-
ning total of the numbers, a count of how many numbers have been processed, a variable
that contains the value of each grade and a variable that contains the calculated average.
The pseudocode statement
Initialize variables
can be refined as follows:
Initialize total to zero
Initialize counter to zero
The pseudocode statement
Input, sum and count the quiz grades
requires a repetition structure (i.e., a loop) that successively inputs each grade. We do not
know how many grades will be entered, so we use sentinel-controlled repetition. The user
inputs legitimate grades successively. After the last legitimate grade has been entered, the
user inputs the sentinel value. The program tests for the sentinel value after each grade is
input and terminates the loop when it has been entered. The second refinement of the pre-
ceding pseudocode statement is
In Fig. 3.9 and Fig. 3.12, we included some blank lines in the pseudocode to improve
the readability of the pseudocode. The blank lines separate these statements into their var-
ious phases.
The pseudocode algorithm in Fig. 3.12 solves the more general class-averaging
problem. This algorithm was developed after two refinements; sometimes more refine-
ments are necessary.
Software Engineering Observation 3.6
The programmer terminates the top-down, stepwise refinement process when the pseudocode
algorithm is specified in sufficient detail for the programmer to convert the pseudocode to
Python. After this step, implementing the Python program normally is straightforward.
Figure 3.13 shows the Python program and a sample execution. Although each grade
is an integer, the averaging calculation is likely to produce a number with a decimal point
(i.e., a real number). The integer data type cannot represent real numbers. The program
uses the floating-point data type to handle numbers with decimal points and introduces
function float, which forces the averaging calculation to produce a floating-point
numeric result.
In this example, we see that control structures can be stacked on top of one another (in
sequence) just as a child stacks building blocks. The while structure (lines 12–16) is
immediately followed by an if/else structure (lines 19–23) in sequence. Much of the
code in this program is identical to the code in Fig. 3.10, so in this section, we will concen-
trate on the new features and issues.
Line 6 initializes the variable gradeCounter to 0, because no grades have been
entered. To keep an accurate record of the number of grades entered, variable grade-
Counter is incremented only when a grade value is entered.
Good Programming Practice 3.5
In a sentinel-controlled loop, the prompts requesting data entry should explicitly remind the
user of the sentinel value.
Study the difference between the program logic for sentinel-controlled repetition in
Fig. 3.13 and counter-controlled repetition in Fig. 3.10. In counter-controlled repetition,
the program reads a value from the user during each pass of the while structure, for a
specified number of passes. In sentinel-controlled repetition, the program reads one value
(lines 9–10) before the program reaches the while structure. This value determines
whether the program’s flow of control should enter the body of the while structure. If the
while structure condition is false (i.e., the user has already typed the sentinel), the pro-
gram does not execute the while loop (no grades were entered). On the other hand, if the
condition is true, the program executes the while loop and processes the value entered by
the user (i.e., adds the grade to total). After processing the grade, the program
requests the user to enter another grade. After executing the last (indented) line of the
while loop (line 16), execution continues with the next test of the while structure con-
dition, using the new value just entered by the user to determine whether the while struc-
ture’s body should execute again. Notice that the program requests the next value before
evaluating the while structure. This allows for determining whether the value just entered
by the user is the sentinel value before processing the value (i.e., adding it to total). If
the value entered is the sentinel value, the while structure terminates, and the value is not
added to total.
Lines 9–10 and 15–16 contain identical lines of code. In Section 3.15, we introduce
programming constructs that help the programmer avoid repeating code.
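The sentinel-controlled logic described above can be sketched as follows. This is not the listing of Fig. 3.13: it is a Python 3 approximation in which a hypothetical function, classAverage, takes the entered values from a list instead of calling raw_input, so that it can run unattended. The sentinel value of -1 and the sample grades are assumptions made for illustration.

```python
def classAverage(entries, sentinel=-1):
    """Average the values in entries that precede the sentinel.

    Assumes the sentinel appears somewhere in entries.
    Returns None when no grades precede the sentinel.
    """
    values = iter(entries)
    total = 0            # sum of grades
    gradeCounter = 0     # number of grades entered

    grade = int(next(values))        # read one value before the loop
    while grade != sentinel:
        total = total + grade
        gradeCounter = gradeCounter + 1
        grade = int(next(values))    # read the next value at the end of the body

    if gradeCounter != 0:
        return float(total) / gradeCounter
    return None


print(classAverage([75, 94, 97, 88, 70, 64, 83, 89, -1]))   # 82.5
print(classAverage([-1]))                                   # None
```

Note that the read appears twice, once before the loop and once at the end of its body, which mirrors the duplicated lines the text points out.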
Averages do not always evaluate to integer values. Often, an average is a value that
contains a fractional part, such as 7.2 or –93.5. These values are referred to as floating-point
numbers.
The calculation total / gradeCounter results in an integer, because total and
gradeCounter contain integer values. Dividing two integers results in integer division, in which
any fractional part of the calculation is discarded (i.e., truncated). The calculation is performed
first, and the fractional part is discarded before the result is assigned to average. To
produce a floating-point calculation with integer values, convert one (or both) of the values
to a floating-point value with function float. Recall that functions are pieces of code that
accomplish a task; in line 20, function float converts the integer value of variable total
to a floating-point value. The calculation now consists of a floating-point value divided by
the integer gradeCounter.
The Python interpreter knows how to evaluate expressions in which the data types of
the operands are identical. To ensure that the operands are of the same type, the interpreter
performs an operation called promotion (also called implicit conversion) on selected oper-
ands. For example, in an expression containing integer and floating-point data, integer
operands are promoted to floating point. In our example, the value of gradeCounter is
promoted to a floating-point number. Then, the calculation is performed, and the result of
the floating-point division is assigned to variable average.
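A short sketch of the conversion and promotion just described. It assumes Python 3, where / always performs floating-point division, so // is used to show the integer division that the book's Python 2 performs for two integers. The total of 817 and the count of 10 are the values quoted earlier in the text.

```python
total = 817
gradeCounter = 10

integerAverage = total // gradeCounter     # integer division: fraction discarded
average = float(total) / gradeCounter      # gradeCounter is promoted to float

print(integerAverage)    # 81
print(average)           # 81.7
print(type(average))     # <class 'float'>
print(3 + 1.5)           # 4.5: the integer 3 is promoted before the addition
```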
Common Programming Error 3.9
Assuming that all floating-point numbers are precise can lead to incorrect results. Most
computers approximate floating-point numbers.
Despite the fact that floating-point numbers are not precise, they have numerous appli-
cations. For example, when we speak of a “normal” body temperature of 98.6, we do not
need to be precise to a large number of digits. When we view the temperature on a ther-
mometer and read it as 98.6, it may actually be 98.5999473210643. The point here is that
calling this number simply 98.6 is adequate for most applications.
Another way floating-point numbers develop is through division. When we divide 10
by 3, the result is 3.3333333…, with the sequence of 3s repeating infinitely. The computer
allocates a fixed amount of space to hold such a value, so the stored floating-point value
only can be an approximation.
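The approximation can be observed directly. The following Python 3 sketch is illustrative only; the comparison of 0.1 + 0.2 with 0.3 is an example that does not appear in the text.

```python
import math

third = 10 / 3.0
print(third)                           # a finite approximation of 3.3333...

# Stored values are approximations, so exact equality tests can fail.
print(0.1 + 0.2 == 0.3)                # False
print(math.isclose(0.1 + 0.2, 0.3))    # True: compare within a tolerance instead

# Displaying fewer digits is adequate for most applications.
print("%.1f" % 98.5999473210643)       # 98.6
```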
3. Two counters are used—one to count the number of students who passed the exam
and one to count the number of students who failed the exam.
4. After the program has processed all the results, it must decide if more than eight
students passed the exam.
Let us proceed with top-down, stepwise refinement. We begin with a pseudocode rep-
resentation of the top:
Analyze exam results and decide if tuition should be raised
Once again, it is important to emphasize that the top is a complete representation of the pro-
gram, but several refinements are likely to be needed before the pseudocode can evolve nat-
urally into a Python program. Our first refinement is
Initialize variables
Input the ten exam grades and count passes and failures
Print a summary of the exam results and decide if tuition should be raised
Here, too, even though we have a complete representation of the entire program, further re-
finement is necessary. We now commit to specific variables. We need counters to record
the passes and failures, a counter to control the looping process and a variable to store the
user input. The pseudocode statement
Initialize variables
can be refined as follows:
Initialize passes to zero
Initialize failures to zero
Initialize student counter to one
Notice that only the counters for the number of passes, number of failures and number of
students are initialized. The pseudocode statement
Input the ten exam grades and count passes and failures
requires a loop that successively inputs the result of each exam. Here it is known in advance
that there are precisely ten exam results, so counter-controlled looping is appropriate. In-
side the loop (i.e., nested within the loop), a double-selection structure determines whether
each exam result is a pass or a failure and increments the appropriate counter accordingly.
The refinement of the preceding pseudocode statement is
While student counter is less than or equal to ten
    Input the next exam result

    If the student passed
        Add one to passes
    else
        Add one to failures

    Add one to student counter
Notice the use of blank lines to set off the If/else control structure to improve program read-
ability. The pseudocode statement
Print a summary of the exam results and decide if tuition should be raised
13
14     if result == 1:
15        passes = passes + 1
16     else:
17        failures = failures + 1
18
19     studentCounter = studentCounter + 1
20
21  # termination phase
22  print "Passed", passes
23  print "Failed", failures
24
25  if passes > 8:
26     print "Raise tuition"
Note that line 14 uses the equality operator (==) to test whether the value of variable
result equals 1. Be careful not to confuse the equality operator with the assignment
symbol (=). Such confusion can cause syntax or logic errors in Python.
Common Programming Error 3.10
Using the = symbol for equality in a conditional statement is a syntax error.
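The refined pseudocode can be sketched as one runnable unit. This is a Python 3 approximation, not the book's listing: the hypothetical function analyzeResults takes the ten results from a list instead of prompting the user, and the sample data (with 2 standing for a failure) is made up for illustration.

```python
def analyzeResults(results):
    """Count passes (1) and failures (any other value) in ten exam results."""
    # initialization phase
    passes = 0
    failures = 0
    studentCounter = 1

    # processing phase: counter-controlled loop with a nested if/else
    while studentCounter <= 10:
        result = results[studentCounter - 1]

        if result == 1:
            passes = passes + 1
        else:
            failures = failures + 1

        studentCounter = studentCounter + 1

    return passes, failures, passes > 8


sampleResults = [1, 1, 1, 2, 1, 1, 1, 1, 1, 1]   # hypothetical data
passes, failures, raiseTuition = analyzeResults(sampleResults)

# termination phase
print("Passed", passes)      # Passed 9
print("Failed", failures)    # Failed 1
if raiseTuition:
    print("Raise tuition")
```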
c = c + 3
c += 3
The += symbol adds the value of the expression on the right of the += sign to the value of
the variable on the left of the sign and stores the result in the variable on the left of the sign.
Any statement of the form
variable = variable operator expression
where operator is a binary operator, such as +, -, **, *, /, or %, can be written in the form
variable operator= expression
Assignment symbol   Sample expression   Explanation    Assigns
Assume: c = 3, d = 5, e = 4, f = 2, g = 6, h = 12
+=                  c += 7              c = c + 7      10 to c
-=                  d -= 4              d = d - 4      1 to d
*=                  e *= 5              e = e * 5      20 to e
**=                 f **= 3             f = f ** 3     8 to f
/=                  g /= 3              g = g / 3      2 to g
%=                  h %= 9              h = h % 9      3 to h
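The table entries can be verified with a few statements. The sketch assumes Python 3, where /= produces a floating-point result, so //= is substituted to reproduce the integer value the table shows for g.

```python
c, d, e, f, g, h = 3, 5, 4, 2, 6, 12

c += 7     # c = c + 7
d -= 4     # d = d - 4
e *= 5     # e = e * 5
f **= 3    # f = f ** 3
g //= 3    # integer division; the table's g /= 3 under Python 2
h %= 9     # h = h % 9

print(c, d, e, f, g, h)   # 10 1 20 8 2 3
```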
[Program output, shown for two listings: the integers 0 through 9, one per line.]
Function range can take one, two or three arguments. If we pass one argument to the
function (as in Fig. 3.19), that argument, called end, is one greater than the upper bound
(highest value) of the sequence. In this case, range returns a sequence in the range:
0–( end-1 )
If we pass two arguments, the first argument, called start, is the lower bound—the
lowest value in the returned sequence—and the second argument is end. In this case,
range returns a sequence in the range:
( start )–( end-1 )
If we pass three arguments, the first two arguments are start and end, respectively,
and the third argument, called increment, is the increment value. The sequence produced
by a call to range with an increment value progresses from start toward end in steps
of the increment value. If increment is positive, the last value in the sequence is
the largest value of the form start + n * increment that is less than end. The following
three calls to range produce the same sequence as in Fig. 3.19.
range( 10 )
range( 0, 10 )
range( 0, 10, 1 )
The increment value of range also can be negative. In this case, it is a decrement and
the sequence produced progresses downwards from start toward end in steps of the
increment value. The last value in the sequence is the smallest value of the form
start + n * increment that is greater than end (Fig. 3.20).
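The one-, two- and three-argument forms can be compared in a short sketch. It assumes Python 3, where range returns a lazy range object and not a list, so each call is wrapped in list to display the values. The step values 3 and -2 are arbitrary examples.

```python
print(list(range(10)))           # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(range(0, 10)))        # same sequence: start defaults to 0
print(list(range(0, 10, 1)))     # same sequence: increment defaults to 1

print(list(range(0, 10, 3)))     # [0, 3, 6, 9]: stops below end
print(list(range(10, 0, -2)))    # [10, 8, 6, 4, 2]: stops above end
```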
The sequence used in a for structure does not have to be generated using the range
function. The general format of the for structure is
for element in sequence:
   statement(s)
where sequence is a set of items (sequences are explained in detail in Chapter 5). At the first
iteration of the loop, variable element is assigned the first item in the sequence and state-
ment is executed. At each subsequent iteration of the loop, variable element is assigned the
next item in the sequence before the execution of statement. Once the loop has been exe-
cuted once for each item in the sequence, the loop terminates. In most cases, the for struc-
ture can be represented by an equivalent while structure, as in
initialization
while loopContinuationTest:
   statement(s)
   increment
where the initialization expression initializes the loop’s control variable, loopContinua-
tionTest is the loop-continuation condition and increment increments the control variable.
Common Programming Error 3.16
Creating a for structure that contains no body statements is a syntax error.
If the sequence part of the for structure is empty (i.e., the sequence contains no
values), the program does not perform the body of the for structure. Instead, execution
proceeds with the statement following the for structure.
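The correspondence between the two loop forms, and the behavior with an empty sequence, can be sketched as follows. The function names visitWithFor and visitWithWhile are hypothetical, and the while version uses an index as its control variable, which is one possible translation and not the only one.

```python
def visitWithFor(sequence):
    """Collect each item using a for structure."""
    visited = []
    for element in sequence:
        visited.append(element)
    return visited


def visitWithWhile(sequence):
    """Collect each item using an equivalent while structure."""
    visited = []
    index = 0                              # initialization
    while index < len(sequence):           # loop-continuation test
        visited.append(sequence[index])    # statement(s)
        index = index + 1                  # increment
    return visited


print(visitWithFor(["a", "b", "c"]))      # ['a', 'b', 'c']
print(visitWithWhile(["a", "b", "c"]))    # ['a', 'b', 'c']
print(visitWithFor([]))                   # []: the body never executes
```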
The flowchart of the for structure is similar to that of the while structure.
Figure 3.21 illustrates the flowchart of the following for statement
for x in y:
   print x
The flowchart shows the initialization and the update processes. Note that update occurs
each time after the program performs the body statement. Besides small circles and arrows,
the flowchart contains only rectangle symbols and a diamond symbol. The programmer
fills the rectangles and diamonds with actions and decisions appropriate to the algorithm.
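Fig. 3.21 Flowchart of the for repetition structure. The flowchart establishes the initial
value of the control variable (x = first item in y), then determines whether the final value of
the control variable has been processed (more items to process). If true, the body of the loop
executes (print x; this may be many statements) and the control variable is updated
(x = next item in y; Python does this automatically) before the test is repeated. If false,
the loop terminates.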
Sum is 2550
The output statement before the for loop (line 7) and the output statement in the for
loop (line 11) combine to print the values of the variables year and amount with the for-
matting specified by the % formatting operator specifications. The characters %4d specify
that the year column is printed with a field width of four (i.e., the value is printed with at
least four character positions). If the value to be output is fewer than four character posi-
tions wide, the value is right justified in the field by default. If the value to be output is more
than four character positions wide, the field width is extended to accommodate the entire
value.
The characters %21.2f indicate that variable amount is printed as a floating-point value
(specified with the character f) with a decimal point. The column has a total field width of
21 character positions and two digits of precision to the right of the decimal point; the total
field width includes the decimal point and the two digits to its right, hence 18 of the 21 posi-
tions appear to the left of the decimal point.
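The two conversion specifications can be tried in isolation. In this Python 3 sketch the values of year and amount are made up for illustration; only the format strings %4d and %21.2f come from the text.

```python
year = 1
amount = 1050.0    # hypothetical value

line = "%4d%21.2f" % (year, amount)
print(repr(line))       # the quotes make the padding visible
print(len(line))        # 25: a field of 4 followed by a field of 21

# A value wider than the field extends the field instead of being cut off.
print("%4d" % 12345)    # 12345
```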
Notice that the variables amount, principal and rate are floating point values.
We did this for simplicity, because we are dealing with fractional parts of dollars and thus
need a type that allows decimal points in its values. Unfortunately, this can cause trouble.
Here is an example of what can go wrong when using floating point values to represent
dollar amounts (assuming that dollar amounts are displayed with two digits to the right of
the decimal point): Two dollar amounts stored in the machine could be 14.234 (which
would normally be rounded to 14.23 for display purposes) and 18.673 (which would nor-
mally be rounded to 18.67 for display purposes). When these amounts are added, they pro-
duce the internal sum 32.907, which would normally be rounded to 32.91 for display
purposes. Thus, your printout could appear as
  14.23
+ 18.67
-------
  32.91
but a person adding the individual numbers as printed would expect the sum to be 32.90.
You have been warned!
Good Programming Practice 3.10
Be careful when using floating-point values to perform monetary calculations. Rounding
errors may lead to undesired results.
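The discrepancy in the printout can be reproduced with the two amounts from the text. The sketch assumes Python 3 and uses the % formatting operator with two digits of precision.

```python
first = 14.234
second = 18.673

print("%.2f" % first)               # 14.23
print("%.2f" % second)              # 18.67
print("%.2f" % (first + second))    # 32.91: the internal sum 32.907, rounded

# Adding the amounts as displayed gives a different total.
print("%.2f" % (14.23 + 18.67))     # 32.90
```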
Note that the body of the for structure contains the calculation 1.0 + rate (line 10).
In fact, this calculation produces the same result each time through the loop, so repeating
the calculation is wasteful. A better solution would be to define a variable (e.g.,
finalRate) that references the value of 1.0 + rate before the start of the for structure. Then,
replace the calculation 1.0 + rate (line 10) with variable finalRate.
Performance Tip 3.3
Avoid placing expressions whose values do not change inside loops.
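A sketch of the suggested change, in Python 3. The starting values for principal and rate, the ten-year range and the compound-interest formula are assumptions made for illustration; only the names principal, rate, amount, year and finalRate and the expression 1.0 + rate come from the text.

```python
principal = 1000.0    # assumed starting principal
rate = 0.05           # assumed interest rate

finalRate = 1.0 + rate    # computed once, before the loop starts

amounts = []
for year in range(1, 11):
    amount = principal * finalRate ** year    # no repeated 1.0 + rate
    amounts.append(amount)
    print("%4d%21.2f" % (year, amount))
```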
1 2 3 4
Broke out of loop at x = 5
statement. In this case, the increment is not executed before the repetition-continuation
condition is tested, and the while does not execute in the same manner as the for.
Figure 3.26 uses the continue statement in a for structure to skip the output statement
in the structure and begin the next iteration of the loop.
Good Programming Practice 3.11
Some programmers feel that break and continue violate structured programming. Be-
cause the effects of these statements can be achieved by structured programming techniques
we discuss, these programmers do not use break and continue.
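The effect of the two statements can be sketched as follows. This is a Python 3 approximation consistent with the sample outputs shown in this section, not the book's listings: the hypothetical functions collect the values that would be printed, and the range of 1 through 10 is inferred from those outputs.

```python
def printedBeforeBreak():
    """Return the values output before break ends the loop, and the final x."""
    printed = []
    for x in range(1, 11):
        if x == 5:
            break            # leave the loop entirely
        printed.append(x)
    return printed, x


def printedWithContinue():
    """Return the values output when continue skips the value 5."""
    printed = []
    for x in range(1, 11):
        if x == 5:
            continue         # skip the rest of the body for this value only
        printed.append(x)
    return printed


values, last = printedBeforeBreak()
print(values, "Broke out of loop at x =", last)
print(printedWithContinue())
```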
This if statement contains two simple conditions. The condition gender == "Female"
is evaluated here to determine whether a person is a female. The condition age >= 65 is
evaluated to determine whether a person is a senior citizen. The simple condition to the left
of the and operator is evaluated first, because the precedence of == is higher than the pre-
cedence of and. If necessary, the simple condition to the right of the and operator is eval-
uated next, because the precedence of >= is higher than the precedence of and (as we will
discuss shortly, the right side of a logical AND expression is evaluated only if the left side
is true). The if statement then considers the combined condition:
gender == "Female" and age >= 65
1 2 3 4 6 7 8 9 10
Used continue to skip printing the value 5
This condition is true only if both of the simple conditions are true. Finally, if this combined
condition is indeed true, then the count of seniorFemales is incremented by 1. If either
or both of the simple conditions are false, then the program skips the incrementing and pro-
ceeds to the statement following the if. The preceding combined condition can be made
more readable by adding redundant parentheses:
( gender == "Female" ) and ( age >= 65 )
The table of Fig. 3.27 summarizes the and operator. The table shows all four possible
combinations of false and true values for expression1 and expression2. Such
tables are often called truth tables.
Python evaluates to false or true all expressions that include relational operators and
equality operators. A simple condition (e.g., age >= 65 ) that is false evaluates to the
integer value 0; a simple condition that is true evaluates to the integer value 1. A Python
expression that evaluates to the value 0 is false; a Python expression that evaluates to a non-
zero integer value is true. The interactive session of Fig. 3.28 demonstrates these concepts.
Lines 5–10 of the interactive session demonstrate that the value 0 is false. Lines 11–18
show that any non-zero integer value is true. The simple condition in line 19 evaluates to
true (line 20). The combined conditions in lines 21 and 23 demonstrate the return values of
the and operator. If a combined condition evaluates to false (line 21), the and operator
returns the first value which evaluated to false (line 22). Conversely, if the combined con-
dition evaluates to true (line 23), the and operator returns the last value in the condition
(line 24).
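The behavior described for Fig. 3.28 can be approximated in a few lines. One caveat: the sketch assumes Python 3, in which a comparison evaluates to the values True and False. These behave as the integers 1 and 0 that the text describes. The sample values are arbitrary.

```python
age = 70

print(age >= 65)         # True in Python 3
print(int(age >= 65))    # 1: the integer value the text describes

print(bool(0))           # False: zero is false
print(bool(7))           # True: any non-zero integer is true
print(bool(-3))          # True

print(0 and 5)           # 0: the first value that evaluated to false
print(3 and 5)           # 5: all values are true, so the last one is returned
```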
Now let us consider the or (logical OR) operator. Suppose we wish to ensure at some
point in a program that either one or both of two conditions are true before we choose a
certain path of execution. In this case, we use the or operator, as in the following program
segment:
if semesterAverage >= 90 or finalExam >= 90:
print "Student grade is A"
Fig. 3.27 Truth table for the and (logical AND) operator.
The preceding condition also contains two simple conditions. The simple condition
semesterAverage >= 90 is evaluated to determine whether the student deserves an
“A” in the course because of a solid performance throughout the semester. The simple con-
dition finalExam >= 90 is evaluated to determine whether the student deserves an “A”
in the course because of an outstanding performance on the final exam. The if statement
then considers the combined condition
semesterAverage >= 90 or finalExam >= 90
and awards the student an “A” if either one or both of the simple conditions are true. Note
that the message Student grade is A is not printed when both of the simple conditions
are false. Fig. 3.29 is a truth table for the logical OR operator (or).
If a combined condition evaluates to true, the or operator returns the first value which
evaluated to true. Conversely, if the combined condition evaluates to false, the or operator
returns the last value in the condition.
The and operator has a higher precedence than the or operator. Both operators asso-
ciate from left to right. An expression containing and or or operators is evaluated until its
truth or falsity is known. This is called short circuit evaluation. Thus, evaluation of the
expression
gender == "Female" and age >= 65
will stop immediately if gender is not equal to "Female" (i.e., the entire expression is
false), but continue if gender is equal to "Female" (i.e., the entire expression could still
be true, if the condition age >= 65 is true).
Performance Tip 3.4
In expressions using operator and, if the separate conditions are independent of one anoth-
er, make the condition that is more likely to be false the left-most condition. In expressions
using operator or, make the condition that is more likely to be true the left-most condition.
This approach can reduce a program’s execution time.
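Short-circuit evaluation, and the values returned by or, can be made visible by recording which operands are evaluated. The function evaluate and its helper operand are hypothetical names introduced only for this Python 3 sketch.

```python
def evaluate(operator, leftValue, rightValue):
    """Return the result of the expression and the operands that were evaluated."""
    evaluated = []

    def operand(name, value):
        evaluated.append(name)    # record that this operand was evaluated
        return value

    if operator == "and":
        result = operand("left", leftValue) and operand("right", rightValue)
    else:
        result = operand("left", leftValue) or operand("right", rightValue)
    return result, evaluated


print(evaluate("and", 0, 1))    # (0, ['left']): the right side is skipped
print(evaluate("and", 1, 0))    # (0, ['left', 'right'])
print(evaluate("or", 4, 9))     # (4, ['left']): the first true value is returned
print(evaluate("or", 0, 0))     # (0, ['left', 'right']): the last value is returned
```

Placing the operand most likely to decide the outcome on the left means the right operand is evaluated less often, which is the saving the Performance Tip describes.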
Figure 3.30 is a truth table for the logical negation operator. In many cases, the programmer
can avoid using logical negation by expressing the condition differently with an appropriate
relational or equality operator. For example, the preceding statement can also be written as
follows:
if grade != sentinelValue:
   print "The next grade is", grade
This flexibility can often help a programmer express a condition in a more “natural” or con-
venient manner.
expression    not expression
false         true
true          false
Figure 3.31 shows the precedence and associativity of the Python operators introduced
to this point. The operators are shown from top to bottom, in decreasing order of prece-
dence.
[Figure: flowcharts of Python's control structures. Sequence. Selection: if structure
(single selection), if/else structure (double selection), if/elif/else structure (multiple
selection). Repetition: while structure, for structure.]
3) Any rectangle (action) can be replaced by any control structure (sequence, if, if/else,
if/elif/else, while or for).
4) Rules 2 and 3 can be applied as often as you like and in any order.
Applying the rules of Fig. 3.33 always results in a structured flowchart with a neat,
building-block appearance. For example, repeatedly applying rule 2 to the simplest flowchart
results in a structured flowchart containing many rectangles in sequence (Fig. 3.35). Notice
that rule 2 generates a stack of control structures, so let us call rule 2 the stacking rule.
Rule 3 is called the nesting rule. Repeatedly applying rule 3 to the simplest flowchart
results in a flowchart with neatly nested control structures. For example, in Fig. 3.36, the
rectangle in the simplest flowchart is first replaced with a double-selection (if/else)
structure. Then rule 3 is applied again to both of the rectangles in the double-selection struc-
ture, replacing each of these rectangles with double-selection structures. The dashed boxes
around each of the double-selection structures represent the rectangles that were replaced.
Fig. 3.35 Applying (repeatedly) rule 2 of Fig. 3.33 to the simplest flowchart.
Rule 4 generates larger, more involved and more deeply nested structures. The flow-
charts that emerge from applying the rules in Fig. 3.33 constitute the set of all possible
structured flowcharts and hence the set of all possible structured programs.
The beauty of the structured approach is that we use only six simple single-entry/
single-exit pieces, and we assemble them in only two simple ways. Figure 3.37 shows the
kinds of stacked building blocks that emerge from applying rule 2 and the kinds of nested
building blocks that emerge from applying rule 3. The figure also shows the kind of over-
lapped building blocks that cannot appear in structured flowcharts (because of the elimina-
tion of the goto statement).
If the rules in Fig. 3.33 are followed, an unstructured flowchart (such as that in
Fig. 3.38) cannot be created. If you are uncertain of whether a particular flowchart is struc-
tured, apply the rules of Fig. 3.33 in reverse to try to reduce the flowchart to the simplest
flowchart. If the flowchart is reducible to the simplest flowchart, the original flowchart is
structured; otherwise, it is not.
Structured programming promotes simplicity. Bohm and Jacopini have given us the
result that only three forms of control are needed:
• Sequence
• Selection
• Repetition
Sequence is trivial. Selection is implemented in one of three ways:
• if structure (single selection)
• if/else structure (double selection)
• if/elif/else structure (multiple selection)
In fact, it is straightforward to prove that the simple if structure is sufficient to provide any
form of selection—everything that can be done with the if/else structure and the if/
elif/else structure can be implemented by combining if structures (although perhaps
not as clearly and efficiently).
Repetition is implemented in one of two ways:
• while structure
• for structure
It is straightforward to prove that the while structure is sufficient to provide any form of
repetition. Everything that can be done with the for structure can be done with the while
structure (although perhaps not as smoothly).
Combining these results illustrates that any form of control ever needed in a Python
program can be expressed in terms of the following:
• sequence
• if structure (selection)
• while structure (repetition)
Also, these control structures can be combined in only two ways—stacking and nesting. In-
deed, structured programming promotes simplicity.
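The sufficiency claim can be illustrated by writing one small task twice: once with the for and if/else structures, and once with only sequence, if and while. Both functions and their names are hypothetical, written for Python 3, and the second version is deliberately less convenient, as the text anticipates.

```python
def countWithForAndIfElse(results):
    """Count passes (1) and failures using for and if/else."""
    passes = 0
    failures = 0
    for result in results:
        if result == 1:
            passes += 1
        else:
            failures += 1
    return passes, failures


def countWithWhileAndIf(results):
    """Same task using only sequence, if and while."""
    passes = 0
    failures = 0
    index = 0
    while index < len(results):        # replaces the for structure
        passed = results[index] == 1
        if passed:                     # two single-selection ifs
            passes += 1                # replace the if/else structure
        if not passed:
            failures += 1
        index += 1
    return passes, failures


sample = [1, 2, 1, 1, 2]    # made-up results
print(countWithForAndIfElse(sample))    # (3, 2)
print(countWithWhileAndIf(sample))      # (3, 2)
```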
In this chapter, we discussed how to compose programs from control structures con-
taining actions and decisions. In Chapter 4, Functions, we introduce another program-
structuring unit, called the function. We learn to compose large programs by combining
functions that, in turn, are composed of control structures. We also discuss how functions
promote software reusability. In Chapter 7, Object-Oriented Programming, we introduce
Python’s other program-structuring unit, called the class. We then create objects from
classes and proceed with our treatment of object-oriented programming (OOP).
SUMMARY
• Any computing problem can be solved by executing a series of actions in a specified order. An
algorithm solves problems in terms of the actions to be executed and the order in which these ac-
tions are executed.
• Specifying the order in which statements execute in a computer program is called program control.
• Pseudocode is an artificial and informal language that helps programmers develop algorithms.
Pseudocode is similar to everyday English; it is convenient and user-friendly, although it is not an
actual computer programming language.
• A carefully prepared pseudocode program can be converted easily to a corresponding Python pro-
gram. In many cases, this is done simply by replacing pseudocode statements with their Python
equivalents.
• Normally, statements in a program execute successively in the order in which they appear. This is
called sequential execution. Various Python statements enable the programmer to specify that the
next statement to be executed may be other than the next one in sequence. This is called transfer
of control.
• The goto statement allows a programmer to specify a transfer of control to one of a wide range
of possible destinations in a program.
• The research of Bohm and Jacopini demonstrated that programs could be written without any
goto statements. The challenge of the era became for programmers to shift their styles to “goto-
less programming.”
• Bohm and Jacopini demonstrated that all programs could be written in terms of only three control
structures—the sequence, selection and repetition structures.
• The sequence structure is built into Python. Unless directed otherwise, the computer executes Py-
thon statements sequentially.
• A flowchart is a graphical representation of an algorithm or of a portion of an algorithm. Flow-
charts are drawn using certain special-purpose symbols, such as rectangles, diamonds, ovals and
small circles; these symbols are connected by arrows called flowlines.
• Like pseudocode, flowcharts aid in the development and representation of algorithms. Although
most programmers prefer pseudocode, flowcharts nicely illustrate how control structures operate.
• The rectangle symbol, also called the action symbol, indicates an action, including a calculation or
an input/output operation. Python allows for as many actions as necessary in a sequence structure.
• Perhaps the most important flowchart symbol is the diamond symbol, also called the decision sym-
bol, which indicates a decision is to be performed.
• Python provides three types of selection structures: if, if/else and if/elif/else.
• The if selection structure either performs (selects) an action if a condition (predicate) is true or
skips the action if the condition is false.
• The if/else selection structure performs an action if a condition is true or performs a different
action if the condition is false.
• The if/elif/else selection structure performs one of many different actions, depending on the
validity of several conditions.
• The if selection structure is a single-selection structure—it selects or ignores a single action. The
if/else selection structure is a double-selection structure—it selects between two different ac-
tions. The if/elif/else selection structure is a multiple-selection structure—it selects from
many possible actions.
• Python provides two types of repetition structures: while and for.
• The words if, elif, else, while and for are Python keywords. These keywords are reserved
by the language to implement various Python features, such as control structures. Keywords can-
not be used as identifiers (e.g., variable names).
• Python has six control structures: sequence, three types of selection and two types of repetition.
Each Python program is formed by combining as many control structures of each type as is appro-
priate for the algorithm the program implements.
• Single-entry/single-exit control structures make it easy to build programs—the control structures
are attached to one another by connecting the exit point of one control structure to the entry point
of the next. This is similar to the way a child stacks building blocks; hence, the term control-struc-
ture stacking.
• Indentation emphasizes the inherent structure of structured programs and, unlike in most other
programming languages, is actually required in Python.
• Nested if/else structures test for multiple cases by placing if/else selection structures inside
other if/else selection structures.
• Nested if/else structures and the multiple-selection if/elif/else structure are equivalent.
The latter form is popular because it avoids deep indentation of the code. Such indentation often
leaves little room on a line, forcing lines to be split over multiple lines and decreasing program
readability.
• The else block of the if/elif/else structure is optional. However, most programmers in-
clude an else block at the end of a series of elif blocks to handle any condition that does not
match the conditions specified in the elif statements. If an if/elif statement specifies an
else block, the else block must be the last block in the statement.
• The if selection structure can contain several statements in the body of an if statement, and all
these statements must be indented. A set of statements contained within an indented code block is
called a suite.
• A fatal logic error causes a program to fail and terminate prematurely. For fatal errors, Python
prints an error message called a traceback and exits. A nonfatal logic error allows a program to
continue executing, but might produce incorrect results.
• Just as multiple statements can be placed anywhere a single statement can be placed, it is possible
to have no statements at all (i.e., empty statements). The empty statement is represented by placing
keyword pass where a statement normally resides.
• A repetition structure allows the programmer to specify that a program should repeat an action
while some condition remains true.
• Counter-controlled repetition uses a variable called a counter to control the number of times a set
of statements executes. Counter-controlled repetition often is called definite repetition because the
number of repetitions must be known before the loop begins executing.
• A sentinel value (also called a signal value, a dummy value or a flag value) indicates “end of data
entry.” Sentinel-controlled repetition often is called indefinite repetition because the number of
repetitions is not known before the start of the loop.
• In top-down, stepwise refinement, which is essential to the development of well-structured pro-
grams, the top is a single statement that conveys the overall function of the program. As such, the
top is, in effect, a complete representation of a program. Thus, it is necessary to divide (refine) the
top into a series of smaller tasks and list these in the order in which they need to be performed.
• Floating-point numbers contain a decimal point, as in 7.2 or –93.5.
• Dividing two integers results in integer division, in which any fractional part of the calculation is
discarded (i.e., truncated).
• To produce a floating-point calculation with integer values, convert one (or both) of the values to
a floating-point value with function float.
• The Python interpreter evaluates expressions in which the data types of the operands are identical.
To ensure that the operands are of the same type, the interpreter performs an operation called pro-
motion (also called implicit conversion) on selected operands.
• Python provides several augmented assignment symbols for abbreviating assignment expressions.
• Any statement of the form variable = variable operator expression where operator is a binary
operator, such as +, -, **, *, /, and %, can be written in the form variable operator= expression.
• Function range can take one, two or three arguments. If we pass one argument to the function,
that argument, called end, is one greater than the upper bound (highest value) of the sequence.
• If we pass two arguments, the first argument, called start, is the lower bound—the lowest value
in the returned sequence—and the second argument is end.
• If we pass three arguments, the first two arguments are start and end, respectively, and the third
argument, called increment, is the increment value. The sequence produced by a call to range
with an increment value progresses from start to end in multiples of the increment value. If
increment is positive, the last value in the sequence is the largest multiple less than end.
• The increment value of range also can be negative. In this case, it is a decrement and the se-
quence produced progresses downwards from start to end in multiples of the increment value.
The last value in the sequence is the smallest multiple greater than end.
• The break statement, when executed in a while or for structure, causes immediate exit from
that structure. Program execution continues with the first statement after the structure.
• The continue statement, when executed in a while or a for structure, skips the remaining
statements in the body of that structure and proceeds with the next iteration of the loop.
• Python provides logical operators that form more complex conditions by combining simple con-
ditions. The logical operators are and (logical AND), or (logical OR) and not (logical NOT, also
called logical negation).
• Python evaluates to false or true all expressions that include relational operators and equality op-
erators. A simple condition (e.g., age >= 65 ) that is false evaluates to the integer value 0; a
simple condition that is true evaluates to the integer value 1. A Python expression that evaluates
to the value 0 is false; a Python expression that evaluates to a non-zero integer value is true.
• If a combined condition evaluates to false, the and operator returns the first value which evaluated
to false. Conversely, if the combined condition evaluates to true, the and operator returns the last
value in the condition.
• If a combined condition evaluates to true, the or operator returns the first value which evaluated
to true. Conversely, if the combined condition evaluates to false, the or operator returns the last
value in the condition.
• The and operator has a higher precedence than the or operator. Both operators associate from left
to right. An expression containing and or or operators is evaluated until its truth or falsity is
known. This is called short circuit evaluation.
• The not (logical negation) operator enables a programmer to “reverse” the meaning of a condi-
tion. Unlike the and and or operators, which combine two conditions (binary operators), the log-
ical negation operator has a single condition as an operand (i.e., not is a unary operator).
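The range and logical-operator points in the summary above can be checked with a short sketch. It assumes a current Python 3 interpreter, where print is a function and range is lazy (hence the list calls); in the Python 2.2 used by this book, range returns a list directly. All variable names are illustrative.

```python
# range with one, two and three arguments; the end value is never included
one_arg = list(range(5))             # 0 up to, but not including, 5
two_args = list(range(2, 6))         # start = 2, end = 6
stepped = list(range(0, 10, 3))      # increment = 3
downward = list(range(10, 0, -4))    # negative increment counts down

print(one_arg)     # [0, 1, 2, 3, 4]
print(two_args)    # [2, 3, 4, 5]
print(stepped)     # [0, 3, 6, 9]
print(downward)    # [10, 6, 2]

# and/or return one of their operands rather than a separate truth value
first_false = 0 and "never examined"   # and stops at the first false operand
last_value = 5 and "last"              # all operands true: the last one is returned
first_true = "" or "fallback"          # or stops at the first true operand
all_false = 0 or ""                    # all operands false: the last one is returned

print(repr(first_false), repr(last_value), repr(first_true), repr(all_false))
```

Because evaluation stops as soon as the result is known, the string "never examined" is never tested for truth.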
TERMINOLOGY
action/decision model of programming
action symbol
algorithm
and (logical AND) operator
augmented addition assignment symbol
augmented assignment statement
augmented assignment symbol
break statement
compound statement
connector symbols
continue statement
control structure
control-structure nesting
control-structure stacking
counter
counter-controlled repetition
decision symbol
default condition
definite repetition
double-selection structure
diamond symbol
dummy value
empty statement
end argument of range function
exception handling
fatal logic error
first refinement
flag value
float function
flowchart
for repetition structure
function
goto elimination
goto statement
if selection structure
if/elif/else selection structure
if/else selection structure
implicit conversion
increment argument of range function
increment value
indefinite repetition
initialization phase
int function
keyword
list
logic error
logical negation
logical operator
loop-continuation test
lower bound
multiple-selection structure
nested if/else structure
nesting
nesting rule
nonfatal logic error
not (logical NOT) operator
off-by-one error
or (logical OR) operator
oval symbol
pass keyword
procedure
processing phase
SELF-REVIEW EXERCISES
3.1 Fill in the blanks in each of the following statements:
a) The if/elif/else structure is a ________ structure.
b) The words if and else are examples of reserved words called Python ________.
c) Sentinel-controlled repetition is called ________ because the number of repetitions is
not known before the loop begins executing.
d) The augmented assignment symbol *= performs ________.
e) Function ________ creates a sequence of integers.
f) A procedure for solving a problem is called a(n) ________.
g) The keyword ________ represents an empty statement.
h) A set of statements within an indented code block in Python is called a ________.
i) All programs can be written in terms of three control structures, namely, ________,
________ and ________.
j) A ________ is a graphical representation of an algorithm.
3.2 State whether each of the following is true or false. If false, explain why.
a) Pseudocode is a simple programming language.
b) The if selection structure performs an indicated action when the condition is true.
c) The if/else selection structure is a single-selection structure.
d) A fatal logic error causes a program to execute and produce incorrect results.
e) A repetition structure performs the statements in its body while some condition remains
true.
f) Function float converts its argument to a floating-point value.
g) The exponentiation operator ** associates left to right.
h) Function call range( 1, 10 ) returns the sequence 1 to 10, inclusive.
i) Sentinel-controlled repetition uses a counter variable to control the number of times a set
of instructions executes.
j) The symbol = tests for equality.
ANSWERS TO SELF-REVIEW EXERCISES
3.2 a) False. Pseudocode is an artificial and informal language that helps programmers develop
algorithms. b) True. c) False. The if/else selection structure is a double-selection structure; it
selects between two different actions. d) False. A fatal logic error causes a program to terminate.
e) True. f) True. g) False. The exponentiation operator associates from right to left. h) False. Function
call range( 1, 10 ) returns the sequence 1–9, inclusive. i) False. Counter-controlled repetition uses
a counter variable to control the number of repetitions; sentinel-controlled repetition waits for a sentinel
value to stop repetition. j) False. The operator == tests for equality; the symbol = is for assignment.
EXERCISES
3.3 Drivers are concerned with the mileage obtained by their automobiles. One driver has kept
track of several tankfuls of gasoline by recording miles driven and gallons used for each tankful. De-
velop a Python program that prompts the user to input the miles driven and gallons used for each tank-
ful. The program should calculate and display the miles per gallon obtained for each tankful. After
processing all input information, the program should calculate and print the combined miles per gallon
obtained for all tankfuls (i.e., total miles driven divided by total gallons used).
3.4 A palindrome is a number or a text phrase that reads the same backwards or forwards. For
example, each of the following five-digit integers is a palindrome: 12321, 55555, 45554 and 11611.
Write a program that reads in a five-digit integer and determines whether it is a palindrome. (Hint:
Use the division and modulus operators to separate the number into its individual digits.)
3.5 Input an integer containing 0s and 1s (i.e., a “binary” integer) and print its decimal equiva-
lent. Appendix C, Number Systems, discusses the binary number system. (Hint: Use the modulus and
division operators to pick off the “binary” number’s digits one at a time from right to left. Just as in
the decimal number system, where the rightmost digit has the positional value 1 and the next digit
leftward has the positional value 10, then 100, then 1000, etc., in the binary number system, the right-
most digit has a positional value 1, the next digit leftward has the positional value 2, then 4, then 8,
etc. Thus, the decimal number 234 can be interpreted as 2 * 100 + 3 * 10 + 4 * 1. The decimal equiv-
alent of binary 1101 is 1 * 8 + 1 * 4 + 0 * 2 + 1 * 1.)
3.6 The factorial of a nonnegative integer n is written n! (pronounced “n factorial”) and is defined
as follows:
n! = n · (n - 1) · (n - 2) · … · 1 (for values of n greater than or equal to 1)
and
n! = 1 (for n = 0).
For example, 5! = 5 · 4 · 3 · 2 · 1, which is 120. Factorials increase in size very rapidly. What is the
largest factorial that your program can calculate before leading to an overflow error?
a) Write a program that reads a nonnegative integer and computes and prints its factorial.
b) Write a program that estimates the value of the mathematical constant e by using the
formula [Note: Your program can stop after summing 10 terms.]
$$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots$$
c) Write a program that computes the value of $e^x$ by using the formula [Note: Your program
can stop after summing 10 terms.]
$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$
3.7 Write a program that prints the following patterns separately, one below the other, with each pattern
separated from the next by one blank line. Use for loops to generate the patterns. All asterisks
(*) should be printed by a single statement of the form
print '*',
(which causes the asterisks to print side by side separated by a space). (Hint: The last two patterns
require that each line begin with an appropriate number of blanks.) Extra credit: Combine your code
from the four separate problems into a single program that prints all four patterns side by side by
making clever use of nested for loops. For all parts of this program—minimize the numbers of
asterisks and spaces and the number of statements that print these characters.
3.8 (Pythagorean Triples) A right triangle can have sides that are all integers. The set of three
integer values for the sides of a right triangle is called a Pythagorean triple. These three sides must
satisfy the relationship that the sum of the squares of two of the sides is equal to the square of the
hypotenuse. Find all Pythagorean triples for side1, side2 and hypotenuse all no larger than 20.
Use a triple-nested for-loop that tries all possibilities. This is an example of “brute force” comput-
ing. You will learn in more advanced computer science courses that there are many interesting prob-
lems for which there is no known algorithmic approach other than sheer brute force.
4
Functions
Objectives
• To understand how to construct programs modularly
from small pieces called functions.
• To create new functions.
• To understand the mechanisms of exchanging
information between functions.
• To introduce simulation techniques using random
number generation.
• To understand how the visibility of identifiers is
limited to specific regions of programs.
• To understand how to write and use recursive
functions, i.e., functions that call themselves.
• To introduce default and keyword arguments.
Form ever follows function.
Louis Henri Sullivan
E pluribus unum.
(One composed of many.)
Virgil
O! call back yesterday, bid time return.
William Shakespeare
Richard II
When you call me that, smile.
Owen Wister
Outline
4.1 Introduction
4.2 Program Components in Python
4.3 Functions
4.4 Module math Functions
4.5 Function Definitions
4.6 Random-Number Generation
4.7 Example: A Game of Chance
4.8 Scope Rules
4.9 Keyword import and Namespaces
4.9.1 Importing one or more modules
4.9.2 Importing identifiers from a module
4.9.3 Binding names for modules and module identifiers
4.10 Recursion
4.11 Example Using Recursion: The Fibonacci Series
4.12 Recursion vs. Iteration
4.13 Default Arguments
4.14 Keyword Arguments
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
4.1 Introduction
Most computer programs that solve real-world problems are larger than the programs pre-
sented in the previous chapters. Experience has shown that the best way to develop and
maintain a large program is to construct it from smaller pieces or components, each of
which is more manageable than the original program. This technique is called divide and
conquer. This chapter describes many features of the Python language that facilitate the de-
sign, implementation, operation and maintenance of large programs.
actual statements defining the function are written only once, but may be called upon “to
do their job” from many points throughout a program. Thus functions are a fundamental
unit of software reuse in Python because functions allow us to reuse program code.
Python modules provide functions that perform such common tasks as mathematical
calculations, string manipulations, character manipulations, Web programming, graphics
programming and many other operations. These functions simplify the programmer’s
work, because the programmer does not have to write new functions to perform common
tasks. A collection of modules, the standard library, is provided as part of the core Python
language. These modules are located in the library directory of the Python installation (e.g.,
/usr/lib/python2.2 or /usr/local/lib/python2.2 on Unix/Linux; \Python\Lib or
\Python22\Lib on Windows).
Just as a module groups related definitions, a package groups related modules. The
package as a whole provides tools to help the programmer accomplish a general task (e.g.,
graphics or audio programming). Each module in the package defines classes, functions or
data that perform specific, related tasks (e.g., creating colors, processing .wav files and the
like). This text introduces many available Python packages, but creating a robust package
is a software engineering exercise beyond the scope of the text.
Good Programming Practice 4.1
Familiarize yourself with the collection of functions and classes in the core Python modules.
A function is invoked (i.e., made to perform its designated task) by a function call.
The function call specifies the function name and provides information (as arguments)
that the called function needs to perform its job. A common analogy for this is the hierar-
chical form of management. A boss (the calling function or caller) requests a worker (the
called function) to perform a task and return (i.e., report back) the results after performing
the task. The boss function is unaware of how the worker function performs its designated
tasks. The worker might call other worker functions, yet the boss is unaware of this deci-
sion. We will discuss how “hiding” implementation details promotes good software engi-
neering. Figure 4.1 shows the boss function communicating with worker functions
worker1, worker2 and worker3 in a hierarchical manner. Note that worker1 acts
as a boss function to worker4 and worker5. The boss function when calling
worker1 need not know about worker1’s relationship with worker4 and worker5.
Relationships among functions might not always be a hierarchical structure like the one in
this figure.
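The boss/worker analogy can be sketched in a few lines. The function names follow the figure described above, but the arithmetic each worker performs is invented purely for illustration, and the code assumes a Python 3 interpreter.

```python
def worker4(value):
    """Hypothetical task: double the value."""
    return value * 2


def worker5(value):
    """Hypothetical task: add one to the value."""
    return value + 1


def worker1(value):
    # worker1 acts as a boss to worker4 and worker5;
    # the top-level boss never sees these calls
    return worker5(worker4(value))


def worker2(value):
    """Hypothetical task: subtract one from the value."""
    return value - 1


def worker3(value):
    """Hypothetical task: square the value."""
    return value * value


def boss(value):
    # boss knows only each worker's name, argument and returned result
    return worker1(value) + worker2(value) + worker3(value)


result = boss(3)
print(result)   # 7 + 2 + 9 = 18
```

Changing how worker1 computes its result, for instance by no longer calling worker4, would require no change to boss as long as the returned value keeps the same meaning.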
[Fig. 4.1: diagram of the hierarchical boss-function/worker-function relationship.]
4.3 Functions
Functions allow the programmer to modularize a program. All variables created in function
definitions are local variables—they are known only to the function in which they are de-
clared. Most functions have a list of parameters (which are also local variables) that pro-
vide the means for communicating information between functions.
There are several motivations for “functionalizing” a program. The divide-and-conquer
approach makes program development more manageable. Another motivation is software
reusability—using existing functions as building blocks for creating new programs. Software
reusability is a major benefit of object-oriented programming as we will see in Chapter 7,
Object-Based Programming, Chapter 8, Customizing Classes, and Chapter 9, Object-Based
Programming: Inheritance. With good function naming and definition, programs can be cre-
ated from standardized functions that accomplish specific tasks, rather than having to write
customized code for every task. A third motivation is to avoid repeating code in a program.
Packaging code as a function allows the code to be executed in several locations just by
calling the function rather than rewriting it in every instance it is used.
Software Engineering Observation 4.2
Each function should be limited to performing a single, well-defined task, and the function
name should effectively express that task. This promotes software reusability.
passed to the function, followed by a right parenthesis. To use a function that is defined in
a module, a program must import the module, using keyword import. After the module
has been imported, the program can invoke functions in that module, using the module’s
name, a dot (.) and the function call (i.e., moduleName.functionName()). The interactive
session in Fig. 4.2 demonstrates how to print the square root of 900 using the math
module.
When the line
print math.sqrt( 900 )
executes, the math module’s function sqrt calculates the square root of the number con-
tained in the parentheses (e.g., 900). The number 900 is the argument of the math.sqrt
function. The function returns (i.e., gives back as a result) the floating-point value 30.0,
which is displayed on the screen.
When the line
print math.sqrt( -900 )
executes, the function call generates an error, also called an exception, because function
sqrt cannot handle a negative argument. The interpreter displays information about this
error to the screen. Exceptions and exception handling are discussed in Chapter 12, Excep-
tion Handling.
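The two calls discussed above can be tried in a script as well as in an interactive session. The sketch below assumes Python 3, where print is a function. The try/except wrapper is an addition that lets the script continue after the error, and the names root and failed are illustrative.

```python
import math

root = math.sqrt(900)      # 900 is the argument; sqrt returns a float
print(root)                # 30.0

failed = False
try:
    math.sqrt(-900)        # a negative argument is outside sqrt's domain
except ValueError as error:
    failed = True
    print("math.sqrt( -900 ) raised ValueError:", error)
```

Without the try/except, the interpreter would print a traceback for the second call and stop.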
Common Programming Error 4.1
Failure to import the math module when using math module functions is a runtime error.
A program must import each module before using its functions and variables.
Consider a program, with a user-defined function square, that calculates the squares
of the integers from 1 to 10 (Fig. 4.4). Functions must be defined before they are used.
Good Programming Practice 4.2
Place a blank line between function definitions to separate the functions and enhance pro-
gram readability.
Line 9 of the main program invokes function square (defined at lines 5–6) with the
statement
print square( x ),
Function square receives a copy of x in the parameter y.1 Then square calculates
y * y (line 6). The result is returned to the statement that invoked square. The function
call (line 9) evaluates to the value returned by the function. This value is displayed by the
print statement. The value of x is not changed by the function call. This process is re-
peated 10 times using the for repetition structure.
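A program along these lines might look like the following sketch. It is not the book's Fig. 4.4 listing, so the line numbers cited above do not apply to it, and it assumes Python 3. The list squares is an illustrative addition used to collect the results.

```python
def square(y):
    """Return the square of y."""
    return y * y


squares = []
for x in range(1, 11):          # the integers 1 through 10
    squares.append(square(x))   # x itself is not changed by the call

# print the results side by side, separated by spaces
print(" ".join(str(value) for value in squares))
```

The function is defined before the loop that calls it, in keeping with the rule that functions must be defined before they are used.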
The format of a function definition is

def function-name( parameter-list ):
    statements

[Output of Fig. 4.4:]
1 4 9 16 25 36 49 64 81 100
1. Actually, y receives a reference to x, but y behaves as if it were a copy of x’s value. This is the
concept of pass-by-object-reference, which we introduce in Chapter 5, Lists, Tuples and Dictio-
naries.
When a function completes its task, the function returns control to the caller. There are
three ways to return control to the point from which a function was invoked. If the function
does not return a result explicitly, control is returned either when the last indented line is
reached or upon execution of the statement
return
In either case, the function returns None, a Python value that represents null—indicating
that no value has been declared—and evaluates to false in conditional expressions.
If the function does return a result, the statement
return expression
returns the value of expression to the caller.
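The three ways of returning control to the caller can be compared side by side. This is a minimal sketch assuming Python 3; the function names are hypothetical.

```python
def falls_off_end():
    total = 1 + 1            # control returns after the last indented line


def bare_return(value):
    if value < 0:
        return               # returns control immediately, with no result
    print("non-negative:", value)


def returns_result(value):
    return value * 2         # return expression hands a value back


print(falls_off_end())       # None
print(bare_return(-1))       # None
print(returns_result(4))     # 8

if not falls_off_end():
    print("None evaluates to false in a condition")
```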
5 3 3 3 2
3 2 3 3 4
2 3 6 5 4
6 2 4 1 2
To show that these numbers occur with approximately equal likelihood, let us simulate
6000 rolls of a die (Fig. 4.7). Each integer from 1 to 6 should appear approximately 1000
times.
Face Frequency
1 946
2 1003
3 1035
4 1012
5 987
6 1017
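A frequency count like the one above can be produced with a sketch such as the following. It is not the book's Fig. 4.7 listing. It assumes Python 3 and uses random.randrange( 1, 7 ) to produce each face. The function name roll_frequencies and its optional seed parameter are illustrative additions, and the counts will differ from run to run.

```python
import random


def roll_frequencies(rolls, seed=None):
    """Roll a six-sided die `rolls` times and count each face."""
    generator = random.Random(seed)          # seed=None gives a different run each time
    frequency = {face: 0 for face in range(1, 7)}
    for _ in range(rolls):
        face = generator.randrange(1, 7)     # 1 through 6; 7 is excluded
        frequency[face] += 1
    return frequency


frequency = roll_frequencies(6000)

print("Face Frequency")
for face in range(1, 7):
    print(face, frequency[face])
```

Passing a fixed seed, for example roll_frequencies( 6000, seed=1 ), makes a run repeatable, which is convenient when testing.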
Player rolled 2 + 5 = 7
Player wins

Player rolled 1 + 2 = 3
Player loses

Player rolled 1 + 5 = 6
Point is 6
Player rolled 1 + 6 = 7
Player loses

Player rolled 5 + 4 = 9
Point is 9
Player rolled 4 + 4 = 8
Player rolled 2 + 3 = 5
Player rolled 5 + 4 = 9
Player wins
Notice that the player must roll two dice on each roll. Function rollDice simulates
rolling the dice (lines 6–12). Function rollDice is defined once, but it is called from two
places in the program (lines 14 and 26). The function takes no arguments, so the parameter
list is empty. Function rollDice prints and returns the sum of the two dice (lines 10–12).
The game is reasonably involved. The player could win or lose on the first roll or on
any subsequent roll. The variable gameStatus keeps track of the win/loss status. Vari-
able gameStatus is one of the strings "WON", "LOST" or "CONTINUE". When the
player wins the game, gameStatus is set to "WON" (lines 17 and 29). When the player
loses the game, gameStatus is set to "LOST" (lines 19 and 31). Otherwise,
gameStatus is set to "CONTINUE", allowing the dice to be rolled again (line 21).
If the game is won or lost after the first roll, the body of the while structure (lines 25–
31) is skipped, because gameStatus is not equal to "CONTINUE" (line 25). Instead, the
program proceeds to the if/else structure (lines 33–36), which prints "Player wins"
if gameStatus equals "WON", but "Player loses" if gameStatus equals
"LOST".
If the game is not won or lost after the first roll, the value of sum is assigned to variable
myPoint (line 22). Execution proceeds with the while structure, because gameStatus
equals "CONTINUE". During each iteration of the while loop, rollDice is invoked to
produce a new sum (line 26). If sum matches myPoint, gameStatus is set to "WON"
(lines 28–29), the while test fails (line 25), the if/else structure prints "Player
wins" (lines 33–34) and execution terminates. If sum is equal to 7, gameStatus is set
to "LOST" (lines 30–31), the while test fails (line 25), the if/else statement prints
"Player loses" (lines 35–36) and execution terminates. Otherwise, the while loop
continues executing.
Note the use of the various program-control mechanisms discussed earlier. The craps
program uses one programmer-defined function—rollDice—and the while, if/else
and if/elif/else structures. The program uses both stacked control structures (the if/
elif/else in lines 16–23 and the while in lines 25–31) and nested control structures
(the if/elif in lines 28–31 is nested inside the while in lines 25–31).
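The control flow described above can be sketched as follows. This is not the book's listing, so the line numbers cited in the text do not apply to it. The first-roll rules (7 or 11 wins; 2, 3 or 12 loses) are the standard rules of craps and are assumed here because the rule statement does not appear in this excerpt. The names roll_dice, play_craps and game_status are Python 3 adaptations of rollDice and gameStatus, and sum is renamed total.

```python
import random


def roll_dice(generator):
    """Roll two dice, print the roll and return the sum."""
    die1 = generator.randrange(1, 7)
    die2 = generator.randrange(1, 7)
    total = die1 + die2
    print("Player rolled %d + %d = %d" % (die1, die2, total))
    return total


def play_craps(generator):
    """Play one game and return "WON" or "LOST"."""
    total = roll_dice(generator)             # first roll

    if total in (7, 11):                     # assumed rule: win on first roll
        game_status = "WON"
    elif total in (2, 3, 12):                # assumed rule: lose on first roll
        game_status = "LOST"
    else:                                    # remember the point
        game_status = "CONTINUE"
        my_point = total
        print("Point is", my_point)

    while game_status == "CONTINUE":         # keep rolling
        total = roll_dice(generator)
        if total == my_point:                # made the point
            game_status = "WON"
        elif total == 7:                     # rolled 7 before the point
            game_status = "LOST"

    if game_status == "WON":
        print("Player wins")
    else:
        print("Player loses")
    return game_status


outcome = play_craps(random.Random())
```

The if/elif/else and the while are stacked one after the other, while the if/elif inside the loop is nested, mirroring the structure the text describes.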
print x
Before a value can be printed to the screen, Python must first find the identifier named x
and determine the value associated with that identifier. Namespaces store information
about an identifier and the value to which it is bound. Python defines three namespaces—
local, global and built-in. When a program attempts to access an identifier’s value, Python
searches the namespaces in a certain order—local, global and built-in namespaces—to see
whether and where the identifier exists.
2. Nested scopes are not discussed in this text. Nested scopes are a complex topic and were optional
in Python 2.1 but are mandatory in Python 2.2. Information about nested scopes can be found in
PEP 227 at www.python.org/peps/pep-0227.html.
The first namespace that Python searches is the local namespace, which stores bind-
ings created in a block. Function bodies are blocks, so all function parameters and any iden-
tifiers the function creates are stored in the function’s local namespace. Each function has
a unique local namespace—one function cannot access the local namespace of another
function. In the example above, Python first searches the function’s local namespace for an
identifier named x. If the function’s local namespace contains such an identifier, the func-
tion prints the value of x to the screen. If the function’s local namespace does not contain
an identifier named x (e.g., the function does not define any parameters or create any iden-
tifiers named x), Python searches the next outer namespace—the global namespace (some-
times called the module namespace).
The global namespace contains the bindings for all identifiers, function names and
class names defined within a module or file. Each module or file’s global namespace con-
tains an identifier called __name__ that states the module’s name (e.g., "math" or
"random"). When a Python interpreter session starts or when the Python interpreter
begins executing a program stored in a file, the value of __name__ is "__main__". In
the example above, Python searches for an identifier named x in the global namespace. If
the global namespace contains the identifier (i.e., the identifier was bound to the global
namespace before the function was called), Python stops searching for the identifier and the
function prints the value of x to the screen. If the global namespace does not contain an
identifier named x, Python searches the next outer namespace—the built-in namespace.
The built-in namespace contains identifiers that correspond to many Python functions
and error messages. For example, functions raw_input, int and range belong to the
built-in namespace. Python creates the built-in namespace when the interpreter starts, and
programs normally do not modify the namespace (e.g., by adding an identifier to the
namespace). In the example above, the built-in namespace does not contain an identifier
named x, so Python stops searching and prints an error message stating that the identifier
could not be found.
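The local, global and built-in search order can be observed with a small sketch. It assumes Python 3, in which a failed lookup raises NameError. The function names and the name undefined_identifier are hypothetical.

```python
x = "global x"                 # bound in the global (module) namespace


def uses_local():
    x = "local x"              # found first, in the function's local namespace
    return x


def uses_global():
    return x                   # no local x, so the global binding is used


def uses_builtin():
    return len("abc")          # len is found only in the built-in namespace


def finds_nothing():
    try:
        return undefined_identifier    # not in any of the three namespaces
    except NameError:
        return "NameError"


print(uses_local())            # local x
print(uses_global())           # global x
print(uses_builtin())          # 3
print(finds_nothing())         # NameError
```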
An identifier’s scope describes the region of a program that can access the identifier’s
value. If an identifier is defined in the local namespace (e.g., in a function), all statements
in the block may access that identifier. Statements that reside outside the block (e.g., in the
main portion of a program or in another function) cannot access the identifier. Once the
code block terminates (e.g., after a return statement), all identifiers in that block’s local
namespace “go out of scope” and are inaccessible.
If an identifier is defined in the global namespace, the identifier has global scope. A
global identifier is known to all code that executes, from the point at which the identifier is
created until the end of the file. Furthermore, if certain criteria are met, functions may
access global identifiers. We discuss this issue momentarily. Identifiers contained in built-
in namespaces may be accessed by code in programs, modules or functions.
One pitfall that can arise in a program that uses functions is called shadowing. When
a function creates a local identifier with the same name as an identifier in the module or
built-in namespaces, the local identifier shadows the global or built-in identifier. A logic
error can occur if the programmer references the local variable when meaning to reference
the global or built-in identifier.
Common Programming Error 4.6
Shadowing an identifier in the module or built-in namespace with an identifier in the local
namespace may result in a logic error.
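Two small cases illustrate how shadowing goes wrong. The sketch assumes Python 3, and the names count, reset_wrong, reset_right and total_shadowed are hypothetical. The first case is a silent logic error; the second shadows the built-in function sum and fails outright.

```python
count = 0                      # global identifier


def reset_wrong():
    count = 100                # creates a local count that shadows the global one
    return count


def reset_right():
    global count               # refer to the global binding instead
    count = 100
    return count


reset_wrong()
after_wrong = count            # still 0: the global was never touched

reset_right()
after_right = count            # now 100


def total_shadowed(values):
    sum = 0                    # local name shadows built-in function sum
    try:
        return sum(values)     # the integer 0 is not callable
    except TypeError:
        return None


shadow_result = total_shadowed([1, 2, 3])
print(after_wrong, after_right, shadow_result)
```

Choosing a distinct local name, such as total instead of sum, avoids the second problem entirely.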
Python provides a way for programmers to determine what identifiers are available
from the current namespace. Built-in function dir returns a list of these identifiers.
Figure 4.9 shows the namespace that Python creates when starting an interactive session.
Calling function dir tells us that the current namespace contains three identifiers:
__builtins__, __doc__ and __name__. The next command in the session prints the
value for identifier __name__, to demonstrate that this value is __main__ for an inter-
active session. The subsequent command prints the value for identifier __builtins__.
Notice that we get back a value indicating that this identifier is bound to a module. This
indicates that the identifier __builtins__ can be used to refer to the module
__builtin__. We explore this further in Section 4.9. The next command in the interactive
session creates a new identifier x and binds it to the value 3. Calling function dir
again reveals that identifier x has been added to the session’s namespace.
The interactive session in Fig. 4.9 only hints at a Python program’s powerful ability
to provide information about the identifiers in a program (or interactive session). This is
called introspection. Python provides many other introspective capabilities, including func-
tions globals and locals that return additional information about the global and local
namespaces, respectively.
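The introspection functions mentioned above can be exercised in a script. This sketch assumes Python 3, where the built-in module is named builtins rather than __builtin__ and a fresh namespace holds more identifiers than the three listed for Python 2.2. The helper show_locals is hypothetical.

```python
import builtins

x = 3                              # bind a new identifier in the current namespace

names = dir()                      # identifiers available in the current namespace
has_x = "x" in names


def show_locals(a, b=2):
    c = a + b
    return sorted(locals())        # names bound in the function's local namespace


local_names = show_locals(1)
global_has_x = "x" in globals()    # globals() maps global names to their values
builtin_has_range = "range" in dir(builtins)

print(has_x)                       # True
print(local_names)                 # ['a', 'b', 'c']
print(builtin_has_range)           # True
print(__name__)                    # the current module's name
```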
Although functions help make a program easier to debug, scoping issues can introduce
subtle errors into a program if the developer is not careful. The program in Fig. 4.10 dem-
onstrates these issues, using global and local variables. Line 4 creates variable x with the
value 1. This variable resides in the global namespace for the program and has global scope.
In other words, variable x can be accessed and changed by any code that appears after line
4. This global variable is shadowed in any function that creates a local variable named x.
In the main program, line 22 prints the value of variable x (i.e., 1). Lines 24–25 assign the
value 7 to variable x and print its new value.
global x is 1
global x is 7
local x in a is 25 after entering a
local x in a is 26 before exiting a
global x is 7 on entering b
global x is 70 on exiting b
local x in a is 25 after entering a
local x in a is 26 before exiting a
global x is 70 on entering b
global x is 700 on exiting b
global x is 700
The program defines two functions that neither receive arguments nor return values.
Function a (lines 7–12) declares a local variable x and initializes it to 25. Then, function a
prints local variable x, increments it and prints it again (lines 10–12). Each time the program
invokes the function, function a recreates local variable x and initializes the variable
to 25, then increments it to 26.
Function b (lines 15–20) does not declare any variables. Instead, line 16 designates x
as having global scope with keyword global. Therefore, when function b refers to vari-
able x, Python searches the global namespace for identifier x. When the program first
invokes function b (line 28), the program prints the value of the global variable (7), multi-
plies the value by 10 and prints the value of the global variable (70) again before exiting
the function. The second time the program invokes function b (line 30), the global variable
contains the modified value, 70. Finally, line 32 prints the global variable x in the main pro-
gram again (700) to show that function b has modified the value of this variable.
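The behavior described above can be approximated by the sketch below. It is not the listing of Fig. 4.10: it assumes a current Python interpreter, collects the messages in a list named log (a hypothetical helper) instead of printing them, and its line numbers do not match those cited in the text.

```python
log = []   # collects the messages that the program would print

x = 1      # global variable
log.append("global x is %d" % x)

x = 7      # change the global variable
log.append("global x is %d" % x)


def a():
    x = 25   # local variable that shadows the global x
    log.append("local x in a is %d after entering a" % x)
    x += 1
    log.append("local x in a is %d before exiting a" % x)


def b():
    global x   # refer to the global x rather than create a local one
    log.append("global x is %d on entering b" % x)
    x *= 10
    log.append("global x is %d on exiting b" % x)


a()
b()
a()
b()
log.append("global x is %d" % x)
```

After the four calls, log holds the same eleven messages as the sample output shown earlier, and the global x has the value 700.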
3. Actually, function dir returns a list of attributes for the object passed as an argument. In the case of a module,
this information amounts to a list of all identifiers (e.g., functions and data) defined in the module.
The interactive session in Fig. 4.14 demonstrates that a program also may import all
identifiers defined in a module. The statement
from math import *
imports all identifiers that do not start with an underscore from the math module into the
interactive session’s namespace. Now the programmer can invoke any of the functions
from the math module, without accessing the function through the dot access operator.
However, importing a module’s identifiers in this way can lead to serious errors and is con-
sidered a dangerous programming practice. Consider a situation in which a program had
defined an identifier named e and assigned it the string value "e". After executing the pre-
ceding import statement, identifier e is bound to the mathematical floating-point con-
stant e, and the previous value for e is no longer accessible. In general, a program should
never import all identifiers from a module in this way.
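The hazard can be sketched as follows. To keep the experiment contained, this example runs the import inside a separate dictionary that stands in for a program's namespace; the names namespace, value_before and value_after are assumptions made for the illustration.

```python
import math

# Stand-in for a program's namespace in which e is bound to the string "e".
namespace = {"e": "e"}
value_before = namespace["e"]

# Import every public identifier of module math into that namespace only.
exec("from math import *", namespace)

# e is now bound to the floating-point constant; the string is no longer
# reachable through this identifier.
value_after = namespace["e"]
```

A program that needs only a few identifiers can name them explicitly (for example, from math import sqrt), which leaves all other identifiers untouched.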
Testing and Debugging Tip 4.3
In general, avoid importing all identifiers from a module into the namespace of another mod-
ule. This method of importing a module should be used only for modules provided by trusted
sources, whose documentation explicitly states that such a statement may be used to import
the module.
The statement
import random
imports the random module and places a reference to the module named random in the
namespace. In the interactive session in Fig. 4.15, the statement
import random as randomModule
also imports the random module, but the as clause of the statement allows the program-
mer to specify the name of the reference to the module. In this case, we create a reference
named randomModule. Now, if we want to access the random module, we use refer-
ence randomModule.
A program can also use an import/as statement to specify a name for an identifier
that the program imports from a module. The line
from math import sqrt as squareRoot
imports the sqrt function from module math and creates a reference to the function
named squareRoot. The programmer may now invoke the function with this reference.
Typically, module authors use import/as statements, because the imported element
may define names that conflict with identifiers already defined by the author’s module. With
the import/as statement, the module author can specify a new name for the imported ele-
ments and thereby avoid the naming conflict. Programmers also use the import/as state-
ment for convenience. A programmer may use the statement to rename a particularly long
identifier that the program uses extensively. The programmer specifies a shorter name for the
identifier, thus increasing readability and decreasing the amount of typing.
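A brief sketch of both forms of the import/as statement follows. The statements are the ones quoted in the text; the variables die and root are hypothetical.

```python
import random as randomModule         # reference to module random
from math import sqrt as squareRoot   # reference to function math.sqrt

die = randomModule.randrange(1, 7)    # random integer from 1 to 6
root = squareRoot(16)                 # 4.0
```

In this form, the names random and sqrt are not added to the namespace; only randomModule and squareRoot are.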
4.10 Recursion
The programs we have discussed thus far generally are structured as functions that call one
another in a disciplined, hierarchical manner. For some problems, however, it is useful to
have functions call themselves. A recursive function is a function that calls itself, either di-
rectly or indirectly (through another function). Recursion is an important topic discussed at
length in upper-level computer-science courses. In this section and the next, we present
simple examples of recursion.
We first consider recursion conceptually and then illustrate several recursive func-
tions. Recursive problem-solving approaches have a number of elements in common. A
recursive function is called to solve a problem. The function actually knows how to solve
only the simplest case(s), or so-called base case(s). If the function is not called with a base
case, the function divides the problem into two conceptual pieces—a piece that the function
knows how to solve (a base case) and a piece that the function does not know how to solve.
To make recursion feasible, the latter piece must resemble the original problem, but be a
slightly simpler or slightly smaller version of the original problem. Because this new
problem looks like the original problem, the function invokes (calls) a fresh copy of itself
to go to work on the smaller problem; this is referred to as a recursive call and is also called
the recursion step. The recursion step normally includes the keyword return, because
this result will be combined with the portion of the problem the function knew how to solve
to form a result that will be passed back to the original caller.
The recursion step executes while the original call to the function is still open (i.e.,
while it has not finished executing). The recursion step can result in many more such recur-
sive calls, as the function divides each new subproblem into two conceptual pieces. For the
recursion eventually to terminate, the sequence of smaller and smaller problems must con-
verge on a base case. At that point, the function recognizes the base case and returns a result
to the previous copy of the function, and a sequence of returns ensues up the line until the
original function call eventually returns the final result to the caller. This process sounds
exotic when compared with the conventional problem-solving techniques we have used to
this point. As an example of these concepts at work, let us write a recursive program to per-
form a popular mathematical calculation.
The factorial of a nonnegative integer n, written n! (and pronounced “n factorial”), is
the product
n · (n - 1) · (n - 2) · … · 1
with 1! equal to 1, and 0! equal to 1. For example, 5! is the product 5 · 4 · 3 · 2 · 1, which is
equal to 120.
The factorial of an integer, number, greater than or equal to 0 can be calculated iter-
atively (nonrecursively) using for, as follows:
factorial = 1
for counter in range( 1, number + 1 ):
    factorial *= counter
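A recursive definition of the factorial follows from the relationship
n! = n · (n - 1)!
For example, 5! equals 5 · 4!, as the following steps show:
5! = 5 · 4 · 3 · 2 · 1
5! = 5 · (4 · 3 · 2 · 1)
5! = 5 · (4!)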
The evaluation of 5! would proceed as shown in Fig. 4.16. Figure 4.16 (a) shows how
the succession of recursive calls proceeds until 1! evaluates to 1, which terminates the
recursion. Figure 4.16 (b) shows the values returned from each recursive call to its caller
until the final value is calculated and returned.
Figure 4.17 uses recursion to calculate and print the factorials of the integers from 0
to 10. The recursive function factorial (lines 5–10) first tests to determine whether a
terminating condition is true (line 7)—if number is less than or equal to 1 (the base case),
factorial returns 1, no further recursion is necessary and the function terminates. Oth-
erwise, if number is greater than 1, line 10 expresses the problem as the product of
number and a recursive call to factorial evaluating the factorial of number - 1. Note
that factorial( number - 1 ) is a simpler version of the original calculation,
factorial( number ).
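The recursive function described above might look like the following sketch. It is a reconstruction from the description, not the listing of Fig. 4.17, so its line numbers differ from those cited in the text. The helper factorial_iterative and the list table are assumptions added for comparison.

```python
def factorial(number):
    """Recursive factorial of a nonnegative integer."""
    if number <= 1:   # base case
        return 1
    # recursion step: a simpler version of the original problem
    return number * factorial(number - 1)


def factorial_iterative(number):
    """Iterative (nonrecursive) version, for comparison."""
    result = 1
    for counter in range(1, number + 1):
        result *= counter
    return result


# One line of output per integer from 0 to 10, e.g. "5! = 120"
table = ["%d! = %d" % (i, factorial(i)) for i in range(11)]
```

Both functions produce the same values; the recursive version simply mirrors the relationship n! = n · (n - 1)! directly.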
Common Programming Error 4.7
Either omitting the base case or writing the recursion step incorrectly so that it does not converge
on the base case will cause infinite recursion, eventually exhausting memory (Python guards against
this by raising a “maximum recursion depth exceeded” error). This is
analogous to the problem of an infinite loop in an iterative (nonrecursive) solution.
(a) Procession of recursive calls    (b) Values returned from each recursive call
5 * 4!                               5! = 5 * 24 = 120 is returned
4 * 3!                               4! = 4 * 6 = 24 is returned
3 * 2!                               3! = 3 * 2 = 6 is returned
2 * 1!                               2! = 2 * 1 = 2 is returned
1                                    1 returned
0! = 1
1! = 1
2! = 2
3! = 6
4! = 24
5! = 120
6! = 720
7! = 5040
8! = 40320
9! = 362880
10! = 3628800
The Fibonacci series
0, 1, 1, 2, 3, 5, 8, 13, 21, …
begins with 0 and 1 and has the property that each subsequent Fibonacci number is the sum
of the previous two Fibonacci numbers.
The series occurs in nature, in particular, describing a spiral. The ratio of successive
Fibonacci numbers converges on a constant value of 1.618…. This number, too, repeatedly
occurs in nature and has been called the golden ratio, or the golden mean. Humans tend to
find the golden mean aesthetically pleasing. Architects often design windows, rooms, and
buildings whose length and width are in the ratio of the golden mean. Postcards often are
designed with a golden-mean length/width ratio.
The Fibonacci series can be defined recursively as follows:
fibonacci( 0 ) = 0
fibonacci( 1 ) = 1
fibonacci( n ) = fibonacci( n – 1 ) + fibonacci( n – 2 )
Note that there are two base cases for the Fibonacci calculation—fibonacci(0) is defined to
be 0 and fibonacci(1) is defined to be 1. The program of Fig. 4.18 calculates the ith Fibonac-
ci number recursively, using function fibonacci (lines 4–14). Notice that Fibonacci
numbers increase rapidly. Each output box shows a separate execution of the program.
Enter an integer: 0
Fibonacci(0) = 0
Enter an integer: 1
Fibonacci(1) = 1
Enter an integer: 2
Fibonacci(2) = 1
Enter an integer: 3
Fibonacci(3) = 2
Enter an integer: 4
Fibonacci(4) = 3
Enter an integer: 6
Fibonacci(6) = 8
Enter an integer: 10
Fibonacci(10) = 55
Enter an integer: 20
Fibonacci(20) = 6765
The initial call to fibonacci (line 17) is not a recursive call, but all subsequent calls
to fibonacci performed from the body of fibonacci are recursive. Each time
fibonacci is invoked, it tests for the base case—n equal to 0 or 1. If this condition is
true, fibonacci returns n (line 10). Interestingly, if n is greater than 1, the recursion step
generates two recursive calls (line 14), each of which is a simpler problem than the original
call to fibonacci. Figure 4.19 illustrates fibonacci evaluating fibonacci( 3 ).
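A minimal sketch of such a function appears below. It is reconstructed from the recursive definition and the description above rather than copied from Fig. 4.18, so it omits the input handling of the original program and its line numbers differ from those cited.

```python
def fibonacci(n):
    """Return the nth Fibonacci number, computed recursively."""
    if n == 0 or n == 1:   # base cases
        return n
    # recursion step: two recursive calls, each on a simpler problem
    return fibonacci(n - 1) + fibonacci(n - 2)


# Values for the inputs used in the sample executions
results = [(n, fibonacci(n)) for n in (0, 1, 2, 3, 4, 6, 10, 20)]
```

The pairs in results agree with the sample executions shown earlier, ending with fibonacci( 20 ) = 6765.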
A word of caution is in order about recursive programs like the one we use here to gen-
erate Fibonacci numbers. Each invocation of the fibonacci function that does not match
one of the base cases (i.e., 0 or 1) results in two more recursive calls to fibonacci. This
set of recursive calls rapidly gets out of hand. Calculating the Fibonacci value of 20 using
the program in Fig. 4.18 requires 21,891 calls to the fibonacci function; calculating the
Fibonacci value of 30 requires 2,692,537 calls to the fibonacci function.
As you try to calculate larger Fibonacci values, you will notice that each consecutive
Fibonacci number results in a substantial increase in calculation time and number of calls
to the fibonacci function. For example, the Fibonacci value of 31 requires 4,356,617
calls, and the Fibonacci value of 32 requires 7,049,155 calls. As you can see, the number
of calls to fibonacci is increasing quickly—2,692,538 additional calls between Fibonacci
values of 31 and 32. This difference is more than 1.5 times the 1,664,080 additional calls
made between Fibonacci values of 30 and 31. Computer scientists refer to this as exponential
complexity. Problems of this nature
humble even the world’s most powerful computers! In the field of complexity theory, com-
puter scientists study how hard algorithms work to complete their tasks. Complexity issues
are discussed in detail in the upper-level computer-science course generally called “Algo-
rithms” or “Complexity.”
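The call counts quoted above can be checked with a short sketch. The function names fibonacci_with_calls and calls_needed are hypothetical. The first function counts calls while it recurses; the second uses the observation that a call for n makes one call itself plus all the calls needed for n - 1 and n - 2, which avoids actually performing millions of calls.

```python
def fibonacci_with_calls(n):
    """Return (nth Fibonacci number, number of calls made to compute it)."""
    if n == 0 or n == 1:   # base case: exactly one call
        return n, 1
    a, calls_a = fibonacci_with_calls(n - 1)
    b, calls_b = fibonacci_with_calls(n - 2)
    return a + b, 1 + calls_a + calls_b


def calls_needed(n):
    """Number of calls the recursive function makes for n, without recursing."""
    previous, current = 1, 1   # calls needed for 0 and for 1
    for _ in range(2, n + 1):
        previous, current = current, 1 + previous + current
    return current


value_20, calls_20 = fibonacci_with_calls(20)   # (6765, 21891)
calls_30 = calls_needed(30)                     # 2692537
calls_31 = calls_needed(31)                     # 4356617
calls_32 = calls_needed(32)                     # 7049155
```

The ratio of successive differences, (calls_32 - calls_31) / (calls_31 - calls_30), is roughly 1.618, the same constant that the ratio of successive Fibonacci numbers approaches.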
[Fig. 4.19: tree of recursive calls for fibonacci( 3 ), ending in the base cases return 1 and return 0.]
Let us reconsider some observations that we make repeatedly throughout the book.
Good software engineering is important. High performance is important. Unfortunately,
these goals are often at odds with one another. Good software engineering is key to
making the task of developing larger and more complex software systems more manageable.
High performance in these systems is key to realizing the systems of the future, which
will place ever-greater computing demands on hardware. Where do functions fit in here?
Software Engineering Observation 4.9
Functionalizing programs in a neat, hierarchical manner promotes good software engineering,
but it has a price.
The first call to boxVolume (line 8) specifies no arguments and thus uses all three
default values. The second call (line 10) passes a length argument and thus uses default
values for the width and height arguments. The third call (line 12) passes arguments for
length and width and thus uses a default value for the height argument. The last call
(line 14) passes arguments for length, width and height, thus using no default values.
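The four calls can be sketched as follows. The function name boxVolume and its parameter names come from the text; the default value of 1 for each parameter and the argument values 10, 5 and 2 are assumptions made for this illustration.

```python
def boxVolume(length=1, width=1, height=1):
    """Return the volume of a box; every parameter has a default value."""
    return length * width * height


volume_defaults = boxVolume()            # uses all three default values
volume_length = boxVolume(10)            # default width and height
volume_two = boxVolume(10, 5)            # default height only
volume_all = boxVolume(10, 5, 2)         # no default values used
```

Because the defaults belong to the trailing parameters, each call may omit arguments only from the right-hand end of the argument list.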
Good Programming Practice 4.6
Using default arguments can simplify writing function calls. However, some programmers
feel that explicitly specifying all arguments makes programs easier to read.
    if CGI == "yes":
        print "CGI scripts are enabled"
    print # prints a new line

generateWebsite( "Deitel" )

generateWebsite( "Deitel", Flash = "yes",
                 url = "www.deitel.com/new" )

generateWebsite( CGI = "no", name = "Prentice Hall" )
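Because the beginning of this listing is not shown, the sketch below fills in a plausible function definition so that the three calls can be tried. Only the parameter names, the CGI test and the CGI message come from the fragment; the default values, the first message and the Flash message are assumptions. The sketch also returns its messages in a list instead of printing them, so that it runs unchanged on a current Python interpreter.

```python
def generateWebsite(name, url="www.deitel.com", Flash="no", CGI="yes"):
    """Return the messages describing the requested site (hypothetical body)."""
    lines = ["Generating site requested by %s using url %s" % (name, url)]
    if Flash == "yes":
        lines.append("Flash is enabled")
    if CGI == "yes":
        lines.append("CGI scripts are enabled")
    return lines


first = generateWebsite("Deitel")                      # all defaults
second = generateWebsite("Deitel", Flash="yes",
                         url="www.deitel.com/new")     # keywords in any order
third = generateWebsite(CGI="no", name="Prentice Hall")
```

The third call shows that, with keyword arguments, even the parameter without a default value (name) may be passed by keyword and in any position.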
SUMMARY
• Constructing a large program from smaller components, each of which is more manageable than
the original program, is a technique called divide and conquer.
• Components in Python are called functions, classes, modules and packages.
• Python programs typically are written by combining new functions and classes the programmer
writes with “pre-packaged” functions or classes available in numerous Python modules.
• The programmer can write programmer-defined functions to define specific tasks that could be
used at many points in a program.
• A module defines related classes, functions and data. A package groups related modules. The
package as a whole provides tools to help the programmer accomplish a general task.
• A function is invoked (i.e., made to perform its designated task) by a function call.
• The function call specifies the function name and provides information (as a comma-separated list
of arguments) that the called function needs to do its job.
• All variables created in function definitions are local variables—they are known only in the func-
tion in which they are created.
• Most functions have a list of parameters that provide the means for communicating information
between functions. A function’s parameters are also local variables.
• The divide-and-conquer approach makes program development more manageable.
• Another motivation for using the divide-and-conquer approach is software reusability—using ex-
isting functions as building blocks to create new programs.
• A third motivation for using the divide-and-conquer approach is to avoid repeating code in a pro-
gram. Packaging code as a function allows the code to be executed from several locations in a pro-
gram simply by calling the function.
• The math module functions allow the programmer to perform certain common mathematical cal-
culations.
• Functions normally are called by writing the name of the function, followed by a left parenthesis,
followed by the argument (or a comma-separated list of arguments) of the function, followed by a
right parenthesis.
• To use a function that is defined in a module, a program has to import the module, using keyword
import. After the module has been imported, the program can access a function or a variable in
the module, using the module name, a dot (.) and the function or variable name.
• A recursive function invokes a fresh copy of itself to go to work on a smaller version of the prob-
lem; this procedure is referred to as a recursive call and is also called the recursion step.
• Both iteration and recursion are based on a control structure: Iteration uses a repetition structure;
recursion uses a selection structure.
• Both iteration and recursion also involve repetition: Iteration explicitly uses a repetition structure;
recursion achieves repetition through repeated function calls.
• Iteration and recursion both involve a termination test: Iteration terminates when the loop-contin-
uation condition fails; recursion terminates when a base case is recognized.
• Iteration with counter-controlled repetition and recursion both gradually approach termination: It-
eration keeps modifying a counter until the counter assumes a value that makes the loop-continu-
ation condition fail; recursion keeps producing simpler versions of the original problem until the
base case is reached.
• Iteration and recursion can both occur infinitely: An infinite loop occurs with iteration if the loop-
continuation test never becomes false; infinite recursion occurs if the recursion step does not re-
duce the problem each time in a manner that converges on the base case.
• Recursion repeatedly invokes the mechanism and, consequently, the overhead of function calls.
This can be expensive in both processor time and memory space. Iteration normally occurs within
a function, so the overhead of repeated function calls and extra memory assignment is avoided.
• Some function calls commonly pass a particular value of an argument. The programmer can spec-
ify that such an argument is a default argument, and the programmer can provide a default value
for that argument. When a default argument is omitted in a function call, the interpreter automat-
ically inserts the default value of that argument and passes the argument in the call.
• Default arguments must be the rightmost (trailing) arguments in a function’s parameter list. When
calling a function with two or more default arguments, if an omitted argument is not the rightmost
argument in the argument list, all arguments to the right of that argument also must be omitted.
• The programmer can specify that a function receives one or more keyword arguments. The function
definition can assign a value to a keyword argument. Either the function definition may provide a
default value for a keyword argument, or a function call may assign a new value to the keyword
argument, using the format keyword = value.
TERMINOLOGY
acos function
asin function
atan function
base case
built-in namespace
__builtins__
calling function
ceil function
comma-separated list of arguments
cos function
def statement
default argument
dir function
divide and conquer
dot (.) operator
exp function
expression
fabs function
factorial
Fibonacci series
floor function
fmod function
function
function argument
function body
function call
function definition
function name
function parameter
global keyword
global namespace
global variable
globals function
hypot function
identifier
import keyword
iterative function
keyword argument
local namespace
local variable
locals function
log function
log10 function
"__main__"
main program
math module
module
module namespace
__name__
package
parameter list
probability
random module
randrange function
recursion
recursive function
return keyword
scope
sin function
sqrt function
tan function
SELF-REVIEW EXERCISES
4.1 Fill in the blanks in each of the following statements.
a) Constructing a large program from smaller components is called ________.
b) Components in Python are called ________, ________, ________ and ________.
c) “Pre-packaged” functions or classes are available in Python ________.
d) The ________ module functions allow programmers to perform common mathematical
calculations.
e) The indented statements that follow a ________ statement form a function body.
f) The ________ in a function call is the operator that causes the function to be called.
g) The ________ module introduces the element of chance into Python programs.
h) A program can obtain the name of its module through identifier ________.
i) During code execution, three namespaces can be accessed: ________, ________ and
________.
j) A recursive function converges on the ________.
4.2 State whether each of the following is true or false. If false, explain why.
a) All variables declared in a function are global to the program containing the function.
b) An import statement must be included for every module function used in a program.
c) Function fmod returns the floating-point remainder of its two arguments.
d) The keyword return displays the result of a function.
e) A function’s parameter list is a comma-separated list containing the names of the param-
eters received by the function when it is called.
f) Function call random.randrange ( 1, 7 ) produces a random integer in the range
1 to 7, inclusive.
g) An identifier’s scope is the portion of the program in which the identifier has meaning.
h) Every call to a recursive function is a recursive call.
i) Omitting the base case in a recursive function can lead to “infinite” recursion.
j) A recursive function may call itself indirectly.
ANSWERS TO SELF-REVIEW EXERCISES
4.2 a) False. All variables declared in a function are local—known only in the function in which
they are defined. b) False. Functions included in the __builtin__ module do not need to be im-
ported. c) True. d) False. Keyword return passes control and optionally, the value of an expression,
back to the point from which the function was called. e) True. f) False. Function call
random.randrange ( 1, 7 ) produces a random integer in the range from 1 to 6, inclusive. g) True. h) False.
The initial call to the recursive function is not recursive. i) True. j) True.
EXERCISES
4.3 Implement the following function fahrenheit to return the Fahrenheit equivalent of a
Celsius temperature.
F = (9 / 5) · C + 32
Use this function to write a program that prints a chart showing the Fahrenheit equivalents of all Cel-
sius temperatures 0–100 degrees. Use one position of precision to the right of the decimal point for
the results. Print the outputs in a neat tabular format that minimizes the number of lines of output
while remaining readable.
4.4 An integer greater than 1 is said to be prime if it is divisible by only 1 and itself. For example,
2, 3, 5 and 7 are prime numbers, but 4, 6, 8 and 9 are not.
a) Write a function that determines whether a number is prime.
b) Use this function in a program that determines and prints all the prime numbers between
2 and 1,000.
c) Initially, you might think that n/2 is the upper limit for which you must test to see whether
a number is prime, but you need go only as high as the square root of n. Rewrite the pro-
gram and run it both ways to show that you get the same result.
4.5 An integer number is said to be a perfect number if the sum of its factors, including 1 (but
not the number itself), is equal to the number. For example, 6 is a perfect number, because 6 = 1 + 2
+ 3. Write a function perfect that determines whether parameter number is a perfect number. Use
this function in a program that determines and prints all the perfect numbers between 1 and 1000.
Print the factors of each perfect number to confirm that the number is indeed perfect. Challenge the
power of your computer by testing numbers much larger than 1000.
4.6 Computers are playing an increasing role in education. The use of computers in education is
referred to as computer-assisted instruction (CAI). Write a program that will help an elementary
school student learn multiplication. Use the random module to produce two positive one-digit inte-
gers. The program should then display a question, such as
The student then types the answer. Next, the program checks the student’s answer. If it is correct, print
the string "Very good!" on the screen and ask another multiplication question. If the answer is
wrong, display "No. Please try again." and let the student try the same question again repeat-
edly until the student finally gets it right. A separate function should be used to generate each new
question. This function should be called once when the program begins execution and each time the
user answers the question correctly. (Hint: To convert the numbers for the problem into strings for
the question, use function str. For example, str( 7 ) returns "7".)
4.7 Write a program that plays the game of “guess the number” as follows: Your program choos-
es the number to be guessed by selecting an integer at random in the range 1 to 1000. The program
then displays
The player then types a first guess. The program responds with one of the following:
If the player's guess is incorrect, your program should loop until the player finally gets the number
right. Your program should keep telling the player Too high or Too low to help the player “zero
in” on the correct answer. After a game ends, the program should prompt the user to enter "y" to play
again or "n" to exit the game.
4.8 (Towers of Hanoi) Every budding computer scientist must grapple with certain classic prob-
lems. The Towers of Hanoi (see Fig. 4.23) is one of the most famous of these. Legend has it that, in
a temple in the Far East, priests are attempting to move a stack of disks from one peg to another. The
initial stack had 64 disks threaded onto one peg and arranged from bottom to top by decreasing size.
The priests are attempting to move the stack from this peg to a second peg, under the constraints that
exactly one disk is moved at a time and that at no time may a larger disk be placed above a smaller
disk. A third peg is available for holding disks temporarily. Supposedly, the world will end when the
priests complete their task, so there is little incentive for us to facilitate their efforts.
Let us assume that the priests are attempting to move the disks from peg 1 to peg 3. We wish to
develop an algorithm that will print the precise sequence of peg-to-peg disk transfers.
If we were to approach this problem with conventional methods, we would rapidly find our-
selves hopelessly knotted up in managing the disks. Instead, if we attack the problem with recursion
in mind, it immediately becomes tractable. Moving n disks can be viewed in terms of moving only n
- 1 disks (hence, the recursion), as follows:
a) Move n - 1 disks from peg 1 to peg 2, using peg 3 as a temporary holding area.
b) Move the last disk (the largest) from peg 1 to peg 3.
c) Move the n - 1 disks from peg 2 to peg 3, using peg 1 as a temporary holding area.
The process ends when the last task involves moving n = 1 disk, i.e., the base case. This is
accomplished trivially by moving the disk without the need for a temporary holding area.
Write a program to solve the Towers of Hanoi problem. Use a recursive function with four
parameters:
a) The number of disks to be moved
b) The peg on which these disks are initially threaded
c) The peg to which this stack of disks is to be moved
d) The peg to be used as a temporary holding area
Your program should print the precise instructions it will take to move the disks from the start-
ing peg to the destination peg. For example, to move a stack of three disks from peg 1 to peg 3, your
program should print the following series of moves:
Fig. 4.23 The Towers of Hanoi for the case with 4 disks.
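One possible sketch of the recursive approach outlined in steps a) through c) of Exercise 4.8 follows. The function name hanoi, its parameter names and the choice to return the moves as a list of (from peg, to peg) pairs are assumptions made for this illustration; a complete solution would also print each move.

```python
def hanoi(disks, source, destination, temporary):
    """Return the list of (from_peg, to_peg) moves for a stack of disks."""
    if disks == 1:
        # base case: move the single disk directly
        return [(source, destination)]
    return (
        # a) move n - 1 disks to the temporary peg
        hanoi(disks - 1, source, temporary, destination)
        # b) move the largest disk to the destination peg
        + [(source, destination)]
        # c) move the n - 1 disks from the temporary peg to the destination
        + hanoi(disks - 1, temporary, destination, source)
    )


moves = hanoi(3, 1, 3, 2)   # three disks from peg 1 to peg 3, using peg 2
```

For three disks, moves contains seven pairs; in general, a stack of n disks requires 2 to the power n, minus 1, moves.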
5
Lists, Tuples and
Dictionaries
Objectives
• To understand Python sequences.
• To introduce the list, tuple and dictionary data types.
• To understand how to create, initialize and refer to
individual elements of lists, tuples and dictionaries.
• To understand the use of lists to sort and search
sequences of values.
• To be able to pass lists to functions.
• To introduce list and dictionary methods.
• To create and manipulate multiple-subscript lists and
tuples.
With sobs and tears he sorted out
Those of the largest size …
Lewis Carroll
Attempt the end, and never stand to doubt;
Nothing’s so hard, but search will find it out.
Robert Herrick
Now go, write it before them in a table,
and note it in a book.
Isaiah 30:8
‘Tis in my memory lock’d,
And you yourself shall keep the key of it.
William Shakespeare
Outline
5.1 Introduction
5.2 Sequences
5.3 Creating Sequences
5.4 Using Lists and Tuples
5.4.1 Using Lists
5.4.2 Using Tuples
5.4.3 Sequence Unpacking
5.4.4 Sequence Slicing
5.5 Dictionaries
5.6 List and Dictionary Methods
5.7 References and Reference Parameters
5.8 Passing Lists to Functions
5.9 Sorting and Searching Lists
5.10 Multiple-Subscripted Sequences
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
5.1 Introduction
This chapter introduces Python’s data-handling capabilities that use data structures. Data
structures hold and organize information (data). Many types of data structures exist, and
each type has features appropriate for certain tasks. Sequences, often called arrays in other
languages, are data structures that store (usually) related data items. Python supports three
basic sequence data types: the string, the list and the tuple. Mappings, often called associa-
tive arrays or hashes in other languages, are data structures that store data in key-value
pairs. Python supports one mapping data type: the dictionary. This chapter discusses Py-
thon’s sequence and mapping types in the context of several examples. Chapter 22, Data
Structures, introduces some high-level data structures (linked lists, queues, stacks and
trees) that extend Python’s basic data types.
5.2 Sequences
A sequence is a series of contiguous values that often are related. We already have encoun-
tered sequences in several programs: Python strings are sequences, as is the value returned
by function range—a Python built-in function that returns a list of integers. In this sec-
tion, we discuss sequences in detail and explain how to refer to a particular element, or lo-
cation, in the sequence.
Figure 5.1 illustrates sequence c, which contains 12 integer elements. Any element
may be referenced by writing the sequence name followed by the element’s position
number in square brackets ([]). The first element in every sequence is the zeroth element.
Thus, in sequence c, the first element is c[ 0 ], the second element is c[ 1 ] and the sixth
element is c[ 5 ]. In general, the ith element of sequence c is c[ i - 1 ].
Position    Value   Position from end
c[ 0 ]        -45   c[ -12 ]
c[ 1 ]          6   c[ -11 ]
c[ 2 ]          0   c[ -10 ]
c[ 3 ]         72   c[ -9 ]
c[ 4 ]       1543   c[ -8 ]
c[ 5 ]        -89   c[ -7 ]
c[ 6 ]          0   c[ -6 ]
c[ 7 ]         62   c[ -5 ]
c[ 8 ]         -3   c[ -4 ]
c[ 9 ]          1   c[ -3 ]
c[ 10 ]      6453   c[ -2 ]
c[ 11 ]        78   c[ -1 ]
Sequences also can be accessed from the end. The last element is c[ -1 ], the second to
last element is c[ -2 ] and the ith-from-the-end is c[ -i ]. Sequences follow the same
naming conventions as variables.
The position number more formally is called a subscript (or an index), which must be
an integer or an integer expression. If a program uses an integer expression as a subscript,
Python evaluates the expression to determine the index. For example, if variable a equals
5 and variable b equals 6, then the statement
print c[ a + b ]
prints the value of c[ 11 ]. Integer expressions used as subscripts can be useful for iterat-
ing over a sequence in a loop.
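As a brief illustration, the sketch below stores the values of Fig. 5.1 in a list and reads elements by positive, negative and computed subscripts. It is written for Python 3, where print is called as a function; the names first, sixth, last and computed are chosen here for illustration, while a and b follow the text.

```python
# Sequence c from Fig. 5.1, stored here as a list
c = [ -45, 6, 0, 72, 1543, -89, 0, 62, -3, 1, 6453, 78 ]

first = c[ 0 ]          # first (zeroth) element
sixth = c[ 5 ]          # the sixth element has subscript 5
last = c[ -1 ]          # last element, counted from the end

a = 5
b = 6
computed = c[ a + b ]   # the subscript expression evaluates to 11

print( first, sixth, last, computed )
```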
Python lists and dictionaries are mutable—they can be altered. For example, if
sequence c in Fig. 5.1 were mutable, the statement
c[ 11 ] = 0
would modify the value of element 11 by assigning it a new value of 0 to replace the original
value of 78.
On the other hand, some types of sequences are immutable—they cannot be altered
(e.g., by changing element values). Python strings and tuples are immutable sequences. For
example, if the sequence c were immutable, the statement
c[ 11 ] = 0
would be illegal. Let us examine sequence c in detail. The sequence name is c. The length
of the sequence is determined by the function call len( c ). It is useful to know a se-
quence’s length, because referring to an element outside the sequence results in an “out-of-
range” error. Most of the errors discussed in this chapter can be caught as exceptions. [Note:
We discuss exceptions in Chapter 12, Exception Handling.]
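The following sketch, written for Python 3, shows both kinds of error using the values of Fig. 5.1 stored in a tuple: assigning to an element of an immutable sequence, and referring to a subscript outside the sequence. It uses try/except (covered in Chapter 12) only so that the script keeps running; the names immutableC and errors are assumptions made for this example.

```python
immutableC = ( -45, 6, 0, 72, 1543, -89, 0, 62, -3, 1, 6453, 78 )
errors = []

try:
    immutableC[ 11 ] = 0                      # tuples do not support item assignment
except TypeError:
    errors.append( "TypeError" )

try:
    value = immutableC[ len( immutableC ) ]   # one past the last valid subscript
except IndexError:
    errors.append( "IndexError" )

print( len( immutableC ), errors )
```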
Sequence c contains 12 elements, namely c[ 0 ], c[ 1 ], …, c[ 11 ]. The range of
elements also can be referenced by c[ -12 ], c[ -11 ], ..., c[ -1 ]. In this example,
c[ 0 ] contains the value -45, c[ 1 ] contains the value 6, c[ -9 ] contains the value
72 and c[ -2 ] contains the value 6453. To calculate the sum of the values contained in
the first three elements of sequence c and assign the result to variable sum, we would write
sum = c[ 0 ] + c[ 1 ] + c[ 2 ]
To divide the value of the seventh element of sequence c by 2 and assign the result to the
variable x, we would write
x = c[ 6 ] / 2
Common Programming Error 5.1
It is important to note the difference between the “seventh element of the sequence” and “se-
quence element seven.” Sequence subscripts begin at 0, thus the “seventh element of the se-
quence” has a subscript of 6. On the other hand, “sequence element seven” references
subscript 7 (i.e., c[ 7 ]), which is the eighth element of the sequence. This confusion often
leads to “off-by-one” errors.
The pair of square brackets enclosing the subscript of a sequence is a Python operator.
Figure 5.2 shows the precedence and associativity of the operators introduced to this point
in the text. They are shown from top to bottom in decreasing order of precedence, with their
associativity and types.
To create a list that contains a sequence of values, separate the values by commas inside
square brackets ([]):
aList = [ 1, 2, 3 ]
To create a tuple that contains a sequence of values, simply separate the values with commas:
aTuple = 1, 2, 3
Creating a tuple is sometimes referred to as packing a tuple. Tuples also can be created by
surrounding the comma-separated list of tuple values with optional parentheses. It is the
commas that create tuples, not the parentheses.
aTuple = ( 1, 2, 3 )
A one-element tuple (a singleton) is created with the statement
aSingleton = 1,
Notice that a comma (,) follows the value. The comma identifies the variable aSingleton
as a tuple. If the comma were omitted, aSingleton would simply contain the integer value 1.
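A minimal Python 3 sketch of the point about the trailing comma follows. The name notATuple is hypothetical and serves only to contrast the two forms.

```python
aSingleton = 1,        # the trailing comma makes a one-element tuple
notATuple = ( 1 )      # parentheses alone do not; this is the integer 1

print( type( aSingleton ).__name__, len( aSingleton ) )
print( type( notATuple ).__name__ )
```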
aList = [ 1, 2, 3, 4, 5 ]
aTuple = ( 1, 2, 3, 4, 5 )
In practice, however, Python programmers distinguish between the two data types to rep-
resent different kinds of sequences, based on the context of the program. In the next sub-
sections, we discuss the situations for which lists and tuples are best suited.
Line 4 creates an empty list, aList. Lines 7–8 use a for loop to insert the values 1, …,
10 into aList, using the += augmented assignment statement. When the value to the left
of the += statement is a sequence, the value to the right of the statement also must be a
sequence. Thus, line 8 places square brackets around the value to be added to the list. Line
10 prints variable aList. Python displays the list as a comma-separated sequence of
values inside square brackets. Variable aList represents a typical Python list—a
sequence containing homogeneous data.
Lines 13–18 demonstrate the most common way of accessing a list’s elements. The
for structure actually iterates over a sequence
The for structure (lines 15–16) starts with the first element in the sequence, assigns the
value of the first element to the control variable (item) and executes the body of the for
loop (i.e., prints the value of the control variable). The loop then proceeds to the next ele-
ment in the sequence and performs the same operations. Thus, lines 15–16 print each ele-
ment of aList.
List elements also can be accessed through their corresponding indices. Lines 21–25
access each element in aList in this manner. The function call range( len( aList ) ) in line 24
returns a sequence that contains the values 0, ..., len( aList ) - 1. This sequence contains
all possible element positions for aList. The for loop iterates through this sequence
and, for each element position, prints the position and the value stored at that
position.
Lines 30–31 modify some of the list’s elements. To modify the value of a particular
element, we assign a new value to that element. Line 30 changes the value of the list’s first
element from 1 to -100; line 31 changes the value of the list’s third-from-the-end element
from 8 to 19.
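The steps described above can be sketched as follows. This is not the book's listing but a condensed Python 3 version under the assumption that the list holds the values 1 through 10; the control-variable names counter, item and i are chosen for illustration.

```python
aList = []                          # start with an empty list

for counter in range( 1, 11 ):
    aList += [ counter ]            # the right-hand side must itself be a sequence

# access the elements by iterating over the list directly
for item in aList:
    print( item )

# access the elements through their subscripts
for i in range( len( aList ) ):
    print( "%d: %d" % ( i, aList[ i ] ) )

aList[ 0 ] = -100                   # modify the first element
aList[ -3 ] = 19                    # modify the third-from-the-end element
print( aList )
```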
If the program attempts to access a nonexistent index (e.g., index 13) in aList, the
program exits and Python displays an out-of-range error message. The interactive session
in Fig. 5.4 demonstrates the results of accessing an out-of-range list element.
Common Programming Error 5.2
Referring to an element outside the sequence is an error.
Generally, a program does not concern itself with the length of a list, but simply iter-
ates over the list and performs an operation for each element in the list. Figure 5.5 demon-
strates one practical application of using lists in such a manner—creating a histogram (a
bar graph of frequencies) from a collection of data.
13 # create histogram
14 print "\nCreating a histogram from values:"
15 print "%s %10s %10s" % ( "Element", "Value", "Histogram" )
16
17 for i in range( len( values ) ):
18    print "%7d %10d %s" % ( i, values[ i ], "*" * values[ i ] )
Enter 10 integers:
Enter integer 1: 19
Enter integer 2: 3
Enter integer 3: 15
Enter integer 4: 7
Enter integer 5: 11
Enter integer 6: 9
Enter integer 7: 13
Enter integer 8: 5
Enter integer 9: 17
Enter integer 10: 1
The program creates an empty list called values (line 4). Lines 7–11 input 10 inte-
gers from the user and insert those integers into the list. Lines 14–18 create the histogram.
For each element in the list, the program prints the element’s index and value and a string
that contains the same number of asterisks (*) as the value. The expression
"*" * values[ i ]
uses the multiplication operator (*) to create a string with the number of asterisks specified
by values[ i ].
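For readers who want to try the idea without typing input, here is a hedged Python 3 sketch of the histogram loop. It replaces the user input with the ten integers from the sample run, and it collects the formatted rows in a list named histogramLines, which is an assumption of this example rather than part of the original program.

```python
values = [ 19, 3, 15, 7, 11, 9, 13, 5, 17, 1 ]   # fixed data instead of user input

histogramLines = []

for i in range( len( values ) ):
    bar = "*" * values[ i ]          # repeat the asterisk values[ i ] times
    histogramLines.append( "%7d %10d %s" % ( i, values[ i ], bar ) )

print( "%s %10s %10s" % ( "Element", "Value", "Histogram" ) )
for line in histogramLines:
    print( line )
```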
these values might be represented as integers, each integer has its own meaning, and the
full representation of the time is obtained only by taking all three values together. The
length of the tuple (i.e., its number of data items) is predetermined and cannot change dur-
ing a program’s execution.
By convention, each data item in the tuple represents a unique portion of the overall
data. Therefore, a program usually does not iterate over a tuple, but accesses the parts of
the tuple the program needs to perform its task. Figure 5.6 demonstrates how to create and
access a tuple using this idiom.
Lines 5–7 ask the user to enter three integers that represent the hour, minutes and sec-
onds, respectively. Line 9 creates a tuple called currentTime to store the user-entered
values. Lines 14–16 print the number of seconds that have passed since midnight. We per-
form a different operation (i.e., multiply each value by a different factor) for each value in
the tuple; therefore, the program accesses each value by its index.
As tuples are immutable, Python provides error handling that notifies users when they
attempt to modify tuples. For example, if the program attempts to change the first element
in currentTime to contain the value 0,
currentTime[ 0 ] = 0
the program exits and Python displays a runtime error
Traceback (most recent call last):
File "fig05_06.py", line 18, in ?
currentTime[ 0 ] = 0
TypeError: object doesn't support item assignment
to indicate that the program illegally attempted to change the value of the immutable tuple.
Enter hour: 9
Enter minute: 16
Enter second: 1
The value of currentTime is: (9, 16, 1)
The number of seconds since midnight is 33361
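The calculation in the sample run above can be checked with a short Python 3 sketch. It hard-codes the three values instead of reading them from the user, and the name secondsSinceMidnight is an assumption made for this example.

```python
hour, minute, second = 9, 16, 1
currentTime = hour, minute, second        # pack the three values into a tuple

# each position has its own meaning, so each is accessed by its own subscript
secondsSinceMidnight = ( currentTime[ 0 ] * 3600 +
                         currentTime[ 1 ] * 60 +
                         currentTime[ 2 ] )

print( "The value of currentTime is:", currentTime )
print( "The number of seconds since midnight is", secondsSinceMidnight )
```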
Note that the use of lists and tuples introduced in Section 5.4.1 and Section 5.4.2 is not
a rule, but rather a convention that Python programmers follow. Python does not limit the
data type stored in lists and tuples (i.e., they can contain homogeneous or heterogeneous
data). The primary difference between lists and tuples is that lists are mutable whereas
tuples are immutable.
or
aTuple = ( 1, 2, 3 )
is called packing a tuple, because the values are “packed into” the tuple. Tuples and other
sequences also can be unpacked—the values stored in the sequence are assigned to various
identifiers. Unpacking is a useful programming shortcut for assigning values to multiple
variables in a single statement. The program in Fig. 5.7 demonstrates the results of unpack-
ing strings, lists and tuples.
Lines 5–7 create a string, a list and a tuple, each containing three elements. Sequences
are unpacked with an assignment statement. The assignment statement in line 11 unpacks the
elements in variable aString and assigns each element to a variable. The first element is
assigned to variable first, the second to variable second and the third to variable third.
Line 12 prints the variables to confirm that the string unpacked properly. Lines 14–20 per-
form similar operations for the elements in variables aList and aTuple. When unpacking
a sequence, the number of variable names to the left of the = symbol should equal the number
of elements in the sequence to the right of the symbol; otherwise, a runtime error occurs.
Notice that when unpacking a sequence, parentheses or brackets are optional to the left of the
= symbol because there usually are no precedence issues.
Unpacking string...
String values: a b c
Unpacking list...
List values: 1 2 3
Unpacking tuple...
Tuple values: a A 1
Before swapping: x = 3, y = 4
After swapping: x = 4, y = 3
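A compact Python 3 sketch of unpacking follows. The sequence contents match the sample output above; the target names (one, two, three, lower, upper, number, onlyOne, onlyTwo, mismatch) are hypothetical. The swap at the end mirrors the last two lines of the sample output and appears to rely on the same idea: the right-hand side is packed into a tuple, which is then unpacked into the names on the left.

```python
aString = "abc"
aList = [ 1, 2, 3 ]
aTuple = "a", "A", 1

first, second, third = aString        # each character goes to one variable
one, two, three = aList
lower, upper, number = aTuple

# swapping two values: pack on the right, unpack on the left
x = 3
y = 4
x, y = y, x

# too few names on the left raises ValueError at run time
try:
    onlyOne, onlyTwo = aList
    mismatch = False
except ValueError:
    mismatch = True
```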
4 # create sequences
5 sliceString = "abcdefghij"
6 sliceTuple = ( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 )
7 sliceList = [ "I", "II", "III", "IV", "V",
8 "VI", "VII", "VIII", "IX", "X" ]
9
10 # print strings
11 print "sliceString: ", sliceString
12 print "sliceTuple: ", sliceTuple
13 print "sliceList: ", sliceList
14 print
15
16 # get slices
17 start = int( raw_input( "Enter start: " ) )
18 end = int( raw_input( "Enter end: " ) )
19
20 # print slices
21 print "\nsliceString[", start, ":", end, "] = ", \
22 sliceString[ start:end ]
23
24 print "sliceTuple[", start, ":", end, "] = ", \
25 sliceTuple[ start:end ]
26
27 print "sliceList[", start, ":", end, "] = ", \
28 sliceList[ start:end ]
sliceString: abcdefghij
sliceTuple: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
sliceList: ['I', 'II', 'III', 'IV', 'V', 'VI', 'VII', 'VIII',
'IX', 'X']
Enter start: 3
Enter end: 3
sliceString[ 3 : 3 ] =
sliceTuple[ 3 : 3 ] = ()
sliceList[ 3 : 3 ] = []
sliceString: abcdefghij
sliceTuple: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
sliceList: ['I', 'II', 'III', 'IV', 'V', 'VI', 'VII', 'VIII',
'IX', 'X']
Enter start: -4
Enter end: -1
sliceString[ -4 : -1 ] = ghi
sliceTuple[ -4 : -1 ] = (7, 8, 9)
sliceList[ -4 : -1 ] = ['VII', 'VIII', 'IX']
sliceString: abcdefghij
sliceTuple: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
sliceList: ['I', 'II', 'III', 'IV', 'V', 'VI', 'VII', 'VIII',
'IX', 'X']
Enter start: 0
Enter end: 10
sliceString[ 0 : 10 ] = abcdefghij
sliceTuple[ 0 : 10 ] = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
sliceList[ 0 : 10 ] = ['I', 'II', 'III', 'IV', 'V', 'VI', 'VII',
'VIII', 'IX', 'X']
Lines 5–18 create the three sequences and request the user to specify a beginning and
ending index for the slice. Lines 21–28 print the specified slice for each sequence. A slice
is simply a new sequence, created from an existing sequence. The expression in line 22
sliceString[ start:end ]
creates (slices) a new sequence from variable sliceString. This new sequence contains
the values stored at indices sliceString[ start ], …, sliceString[ end - 1 ].
In general, to obtain from sequence a slice that runs from index i through index j, inclusive,
use the expression
sequence[ i:j + 1 ]
Figure 5.8 includes three sample outputs from the program. The sample that uses start index 0
and end index 10 creates a slice of the entire sequence. Recall that the first element in every
sequence is the zeroth element. The sequence created from this slice is equivalent to the
sequence created with the expression
sequence[ : ]
This expression creates a new sequence that is a copy of the original sequence. The above
expression is equivalent to the following expressions:
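The expressions referred to here are not reproduced in this text, so the Python 3 sketch below is only an assumption about them: it lists the slice forms that select every element from subscript 0 through the last one. A list is used because copying a list by slicing always produces a separate object; the names copy1 through copy4, allEqual and separate are chosen for illustration.

```python
sequence = [ "I", "II", "III", "IV", "V" ]

copy1 = sequence[ : ]
copy2 = sequence[ 0 : len( sequence ) ]
copy3 = sequence[ : len( sequence ) ]
copy4 = sequence[ 0 : ]

# every form yields an equal list that is a separate object
allEqual = copy1 == copy2 == copy3 == copy4 == sequence
separate = copy1 is not sequence

print( allEqual, separate )
```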
The syntax for sequence slicing provides a useful shortcut for selecting a portion of an
existing sequence. A program can use sequence slicing to create a copy of a list when
passing the list to a function. We discuss this issue in Sections 5.7 and 5.8.
Note that a slice whose end index is -1 does not include the last element of the sequence
(e.g., sliceString[ -4 : -1 ] yields ghi, not ghij), because slice indices refer to points
between elements. The point that -1 denotes lies between the elements with indices -2
and -1. To include the last element, omit the end index, as in sliceString[ -4 : ].
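A short Python 3 sketch of this behavior, using the string from Fig. 5.8, is given below. The names withoutLast, withLast and emptySlice are assumptions made for the example.

```python
sliceString = "abcdefghij"

withoutLast = sliceString[ -4 : -1 ]   # stops before the last element
withLast = sliceString[ -4 : ]         # an omitted end runs through the last element
emptySlice = sliceString[ 3 : 3 ]      # start equals end, so nothing is selected

print( withoutLast, withLast, repr( emptySlice ) )
```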
5.5 Dictionaries
In addition to lists and tuples, Python supports another powerful data type, called the dic-
tionary. Dictionaries (called hashes or associative arrays in other languages) are mapping
constructs consisting of key-value pairs. Dictionaries can be thought of as unordered col-
lections of values where each value is referenced through its corresponding key. For exam-
ple, a dictionary might store phone numbers that can be referenced by a person’s name.
The statement
emptyDictionary = {}
creates an empty dictionary. Notice that curly braces ({}) denote dictionaries. To initialize
key-value pairs for a dictionary, use the statement
dictionary = { 1 : "one", 2 : "two" }
A comma separates each key-value pair. Dictionary keys must be immutable values, such
as strings, numbers or tuples. Dictionary values can be of any Python data type.
Common Programming Error 5.3
Using a list or a dictionary for a dictionary key is an error. Because these types are mutable, Python raises a TypeError at run time.
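The restriction on keys can be observed with a small Python 3 sketch. The dictionary phoneBook and its entries are hypothetical data, and try/except is used only to record the outcome in the flag listKeyAccepted.

```python
# hypothetical data: a tuple, a string and a number all serve as keys
phoneBook = { ( "Ann", "Lee" ) : "555-0100", "Bob" : "555-0101", 42 : "555-0102" }

try:
    phoneBook[ [ "Ann", "Lee" ] ] = "555-0103"   # a list is mutable, so it cannot be a key
    listKeyAccepted = True
except TypeError:
    listKeyAccepted = False

print( listKeyAccepted )
```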
Figure 5.9 demonstrates how to create, initialize, access and manipulate simple dictio-
naries. Lines 5–6 create and print an empty dictionary. Line 9 creates a dictionary grades
and initializes the dictionary to contain four key-value pairs. The keys are strings that con-
tain student names, and the integer values represent the students’ grades. Line 10 prints the
value assigned to variable grades. Observe that the application displays grades in a dif-
ferent order than the declaration; this is because a dictionary is an unordered collection of
key-value pairs. Also, notice in the output that the dictionary keys appear in single quotes,
because Python displays strings in single quotes.
16
17 # add to an existing dictionary
18 grades[ "Michael" ] = 93
19 print "\nDictionary grades after modification:"
20 print grades
21
22 # delete entry from dictionary
23 del grades[ "John" ]
24 print "\nDictionary grades after deletion:"
25 print grades
All grades: {'Edwin': 89, 'John': 87, 'Steve': 76, 'Laura': 92}
dictionaryName[ key ]
In line 13, the dictionaryName is grades and the key is the string "Steve". This expres-
sion evaluates to the value stored in the dictionary at key "Steve", namely, 76. Line 14
assigns a new value, 90, to the key "Steve". Dictionary values are modified using syntax
similar to that of modifying lists. Line 15 prints the result of changing the dictionary value.
Line 18 inserts a new key-value pair into the dictionary. Although this statement
resembles the syntax for modifying an existing dictionary value, it inserts a new key-value
pair because "Michael" is a new key. The statement
dictionaryName[ key ] = value
modifies the value associated with key, if the dictionary already contains that key. Otherwise,
the statement inserts the key-value pair into the dictionary.
Software Engineering Observation 5.1
When adding a key-value pair to a dictionary, mis-typing the key could be a source of inadvertent errors.
Lines 19–20 print the results of adding a new key-value pair to the dictionary. The
order in which the key-value pairs are printed is entirely arbitrary (remember that a dictio-
nary is an unordered collection of key-value pairs).
The expression dictionaryName[ key ] can lead to subtle programming errors. If this
expression appears on the left-hand side of an assignment statement and the dictionary does
not contain the key, the assignment statement inserts the key-value pair into the dictionary.
However, if the expression appears to the right of an assignment statement (or any statement
that simply attempts to access the value stored at the specified key), then the statement
causes the program to exit and to display an error message, because the program is trying
to access a nonexistent key.
Common Programming Error 5.4
Attempting to access a nonexistent dictionary key causes a KeyError, a runtime error.
The statement del dictionaryName[ key ] removes the specified key and its value from the dictionary. If the specified key does not
exist in the dictionary, then the above statement causes the program to exit and to display
an error message. Again, this is because the program is accessing a nonexistent key. This
runtime error can be caught through exception handling, which we discuss in Chapter 12.
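The cases discussed above (modifying, inserting, deleting and reading a missing key) can be sketched in Python 3 as follows. The initial grades match the output shown earlier; the names missing and fallback are assumptions. The final line uses dictionary method get, a standard way to read a key that may be absent without raising an error.

```python
grades = { "John" : 87, "Steve" : 76, "Laura" : 92, "Edwin" : 89 }

grades[ "Steve" ] = 90          # existing key: the value is replaced
grades[ "Michael" ] = 93        # new key: a key-value pair is inserted
del grades[ "John" ]            # removes the key and its value

try:
    missing = grades[ "John" ]  # reading a nonexistent key raises KeyError
except KeyError:
    missing = None

# method get returns a default instead of raising KeyError
fallback = grades.get( "John", 0 )

print( grades, missing, fallback )
```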
Dictionaries are powerful data types that help programmers accomplish sophisticated
tasks. Many Python modules provide data types similar to dictionaries that facilitate access
and manipulation of more complex data. In the next section, we explore the dictionary’s
capabilities further.
particular value occurs in a list. Lines 4–7 create a list (responses) that contains several
values between 1 and 10. Lines 11–12 contain a for loop that calls list method count to return
the number of times an element appears in a list. Method count takes as an argument a value
of any data type. If the list contains no elements with the specified value, method count
returns 0. Lines 11–12 print the frequency of each value in the list.
Rating Frequency
1 2
2 2
3 2
4 2
5 5
6 11
7 5
8 7
9 1
10 3
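The following Python 3 sketch illustrates method count in the same spirit. The list responses here is shorter, hypothetical data (not the data that produced the table above), and the list frequencies is an assumption added so that the results can be inspected.

```python
responses = [ 1, 2, 6, 4, 8, 5, 9, 7, 8, 10, 1, 6, 3, 8, 6 ]   # hypothetical survey data

print( "%6s %10s" % ( "Rating", "Frequency" ) )

frequencies = []
for rating in range( 1, 11 ):
    frequencies.append( responses.count( rating ) )   # count returns 0 if rating is absent
    print( "%6d %10d" % ( rating, frequencies[ -1 ] ) )
```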
Lists provide several other useful methods. Figure 5.12 summarizes these methods.
Throughout the text, we create programs that invoke list methods to accomplish tasks.
Method Purpose
The dictionary data type also provides many methods that enable the programmer to
manipulate the stored data. Figure 5.13 demonstrates three dictionary methods. Lines 4–7
create the dictionary monthsDictionary that represents the months of the year. Line
10 uses dictionary method items to print the dictionary’s key-value pairs to the screen.
The method returns a list of tuples, where each tuple contains a key-value pair.
Dictionary method keys (line 13) returns an unordered list of the dictionary’s keys.
Similarly, dictionary method values (line 16) returns an unordered list of the dictionary’s
values. Lines 20–21 demonstrate a common use of dictionary method keys. The for loop
iterates over the dictionary keys. Each key is assigned to control variable key. Line 21
prints both the key and the value associated with that key. Figure 5.14 summarizes the dic-
tionary methods.
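A hedged Python 3 sketch of the three methods follows, using a shortened monthsDictionary. One difference from the description above should be noted: in Python 3, items, keys and values return view objects rather than lists, so the example passes them to sorted to obtain lists in a predictable order. The names pairs, keys and values are chosen for illustration.

```python
monthsDictionary = { 1 : "January", 2 : "February", 3 : "March" }   # shortened for the example

pairs = sorted( monthsDictionary.items() )      # list of (key, value) tuples
keys = sorted( monthsDictionary.keys() )
values = sorted( monthsDictionary.values() )

# iterate over the keys and look up each value
for key in monthsDictionary.keys():
    print( key, monthsDictionary[ key ] )
```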
Method Description
Dictionary method copy returns a new dictionary that is a shallow copy of the original
dictionary. In a shallow copy, the elements in the new dictionary are references to the ele-
ments in the original dictionary.
The interactive session in Fig. 5.15 demonstrates the difference between shallow and
deep copies. We first create dictionary, which contains one value—a list of numbers.
We then invoke dictionary method copy to create a shallow copy of dictionary, and
we assign the copy to variable shallowCopy. The values stored for key "listKey" in
both dictionaries reference the same object. To underscore this fact, we insert the value 4
at the end of the list stored in dictionary. We then print the value of variables dic-
tionary and shallowCopy. Notice that the list has been changed in both copies of the
dictionary. This is a consequence of doing a shallow copy, which does not create a fully
independent copy of the original dictionary.
Sometimes, a shallow copy is sufficient for a program, especially if the dictionaries
contain no references to other Python objects (i.e., they contain only literal numeric values
or immutable values). However, sometimes it is necessary to create a copy—called a deep
copy—that is independent of the original dictionary. To create a deep copy, Python pro-
vides module copy. The remainder of the interactive session in Fig. 5.15 creates a deep
copy of variable dictionary. We first import function deepcopy from module copy.
We then call deepcopy and pass dictionary as an argument. The function call returns
a deep copy of dictionary, and we assign the copy to variable deepCopy. The value
associated with deepCopy[ "listKey" ] is now independent of the value associated
with that key in variables dictionary and shallowCopy. To demonstrate this fact,
we append a new value to dictionary’s list and print the values for dictionary,
shallowCopy and deepCopy.
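The interactive session described above can be approximated by this Python 3 sketch. The variable names follow the text; the specific values appended (4 and 5) are assumptions made for the example.

```python
from copy import deepcopy

dictionary = { "listKey" : [ 1, 2, 3 ] }
shallowCopy = dictionary.copy()         # new dictionary, same list object inside

dictionary[ "listKey" ].append( 4 )     # visible through both dictionaries

deepCopy = deepcopy( dictionary )       # new dictionary and a new list
dictionary[ "listKey" ].append( 5 )     # not visible through deepCopy

print( dictionary )
print( shallowCopy )
print( deepCopy )
```

The shallow copy shares the list with the original, so it reflects both appended values, whereas the deep copy keeps the contents the list had when deepcopy was called.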
Shallow and deep copies reflect how Python handles references (i.e., names of
objects). The programmer should exercise caution when dealing with references to objects
like lists and dictionaries, because changing an object affects the value of all the names that
refer to that object. In the next two sections, we discuss how passing a reference to a func-
tion affects an object’s value.
Software Engineering Observation 5.2
shallowCopyList = originalList[:] performs a shallow copy, not a deep copy: shallowCopyList
is a new list, but its elements refer to the same objects as the elements of originalList.
To perform tasks, functions require certain input values, which the main program or other
functions supply. The main program (e.g., a program that simulates a calculator) may
ask users for input, and those input values are sent, in turn, to functions (e.g., add, sub-
tract). The values, or arguments, have to be passed to the functions through a certain pro-
tocol. In many programming languages, the two ways to pass arguments to functions are
pass-by-value and pass-by-reference. When an argument is passed by value, a copy of the
argument’s value is made and passed to the called function.
Testing and Debugging Tip 5.3
With pass-by-value, changes to the called function’s copy do not affect the original variable’s
value in the calling code. This prevents accidental side effects that can hinder the
development of correct and reliable software systems.
With pass-by-reference, the caller allows the called function to access the caller’s data
directly and to modify that data. Pass-by-reference can improve performance by elimi-
nating the overhead of copying large amounts of data. However, pass-by-reference can
weaken security, because the called function can access the caller’s data.
Unlike many other languages, Python does not allow programmers to choose between
pass-by-value and pass-by-reference when passing arguments. Python arguments are always
passed by object reference—the function receives references to the values passed as argu-
ments. In practice, pass-by-object-reference can be thought of as a combination of pass-by-
value and pass-by-reference. If a function receives a reference to a mutable object (e.g., a dic-
tionary or a list), the function can modify the original value of the object. It is as if the object
had been passed by reference. If a function receives a reference to an immutable object (e.g.,
a number, a string or a tuple, whose elements are immutable values), the function cannot
modify the original object directly. It is as if the object had been passed by value.
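The behavior described above can be observed with a short Python 3 sketch. The function names appendZero, addOne and rebind are hypothetical and exist only for this illustration.

```python
def appendZero( aList ):
    aList.append( 0 )       # mutates the caller's list object

def addOne( number ):
    number += 1             # rebinds the local name only
    return number

def rebind( aList ):
    aList = [ 99 ]          # rebinding the parameter does not affect the caller

numbers = [ 1, 2, 3 ]
count = 10

appendZero( numbers )
result = addOne( count )
rebind( numbers )

print( numbers, count, result )
```

Note that the third function shows a subtlety: even for a mutable argument, assigning a new object to the parameter name leaves the caller's object untouched; only operations that change the object itself are visible to the caller.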
As always, it is important for the programmer to be aware of when an object may be
modified by the function to which it is passed. Remembering the preceding rules and under-
standing how Python treats references to objects is essential to creating large and sophisti-
cated Python systems.
Fig. 5.16 Passing lists and individual list elements to methods.
the list would remain unchanged, because the function would modify the value of local
variable item and not the value stored at a particular index in the list.
ments of aList in ascending order. The remainder of the program prints the results of
sorting the list.
Much research has been performed in the area of list-sorting algorithms, resulting in
the design of many algorithms. Some of these algorithms are simple to express and pro-
gram, but are inefficient. Other algorithms are complex and sophisticated, but provide
increased performance. The exercises at the end of this chapter investigate a well-known
sorting algorithm.
Performance Tip 5.1
Sometimes, the simplest algorithms perform poorly. Their virtue is that they are easy to
write, test and debug. Sometimes complex algorithms are needed to realize maximum performance.
Often, programmers work with large amounts of data stored in lists. It might be neces-
sary to determine whether a list contains a value that matches a certain key value. The pro-
cess of locating a particular element value in a list is called searching.
The program in Fig. 5.18 searches a list for a value. Line 5 creates list aList, which
contains the even numbers between 0 and 198, inclusive. Line 7 then retrieves the search
key from the user and assigns the value to variable searchKey. Keyword in tests
whether list aList contains the user-entered search key (line 9). If the list contains the
value stored in variable searchKey, the expression (line 9) evaluates to true; otherwise,
the expression evaluates to false.
If the list contains the search key, line 10 invokes list method index to obtain the
index of the search key. List method index takes a search key as a parameter, searches
through the list and returns the index of the first list value that matches the search key. If
the list does not contain any value that matches the search key, the program displays an
error message. [Note: Figure 5.18 searches aList twice (lines 9–10), which, for large
sequences, can result in poor performance. To improve performance, the program can use
list method index and trap the exception that occurs if the argument is not in the list. We
discuss exception-handling techniques in Chapter 12.]
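Both approaches mentioned in the note can be sketched in Python 3 as follows. The function names searchTwice and searchOnce are hypothetical, and returning -1 for a missing key is an assumption of this example (the program described above displays a message instead).

```python
aList = list( range( 0, 199, 2 ) )      # even numbers from 0 through 198

def searchTwice( sequence, searchKey ):
    """Test membership with in, then look up the index (two passes)."""
    if searchKey in sequence:
        return sequence.index( searchKey )
    return -1

def searchOnce( sequence, searchKey ):
    """Call index directly and trap the ValueError raised on a miss."""
    try:
        return sequence.index( searchKey )
    except ValueError:
        return -1

print( searchTwice( aList, 36 ), searchOnce( aList, 37 ) )
```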
As with sorting, a great deal of research has been devoted to the task of searching. In
the exercises at the end of this chapter, we explore some of the more sophisticated ways of
searching a list.
Fig. 5.19 Double-subscripted sequence with three rows and four columns. (Each element is labeled a[ row subscript ][ column subscript ], where a is the sequence name.)
Every element in sequence a is identified in Fig. 5.19 by an element name of the form
a[ i ][ j ]; a is the name of the sequence, and i and j are the subscripts that uniquely
identify the row and column of each element in a. Notice that the names of the elements in
the first row all have 0 as the first subscript; the names of the elements in the fourth column
all have 3 as the second subscript.
Multiple-subscripted sequences can be initialized during creation in much the same
way as a single-subscripted sequence. A double-subscripted list with two rows and two columns
could be created with
b = [ [ 1, 2 ], [ 3, 4 ] ]
The values are grouped by row—the first row is the first element in the list, and the second
row is the second element in the list. So, 1 and 2 initialize b[ 0 ][ 0 ] and b[ 0 ][ 1 ],
and 3 and 4 initialize b[ 1 ][ 0 ] and b[ 1 ][ 1 ]. Multiple-subscripted sequences are
maintained as sequences of sequences. The statement
c = ( ( 1, 2 ), ( 3, 4, 5 ) )
creates a tuple c with row 0 containing two elements (1 and 2) and row 1 containing three
elements (3, 4 and 5). Python allows multiple-subscripted sequences to have rows of dif-
ferent lengths.
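A brief Python 3 sketch of these two sequences follows. The name rowLengths and the replacement value 30 are assumptions chosen to show that rows may differ in length and that a list of lists can be modified in place.

```python
b = [ [ 1, 2 ], [ 3, 4 ] ]
c = ( ( 1, 2 ), ( 3, 4, 5 ) )

b[ 1 ][ 0 ] = 30                 # second row, first column of the list of lists

rowLengths = []
for row in c:
    rowLengths.append( len( row ) )   # rows of c have different lengths

    for item in row:
        print( item )
```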
Figure 5.20 demonstrates creating and initializing double-subscripted sequences and
using nested for structures to traverse the sequences (i.e., manipulate every element of the
sequence).
Fig. 5.20 Tables created using lists of lists and tuples of tuples.
The program declares two sequences. Line 4 creates the multiple-subscript list
table1 and provides six values in two sublists (i.e., two lists-within-lists). The first sub-
list (row) of the sequence contains the values 1, 2 and 3; the second sublist contains the
values 4, 5 and 6.
Line 5 creates multiple-subscript tuple table2 and provides six values in three sub-
tuples (i.e., tuples-within-tuples). The first subtuple (row) contains two elements with
values 1 and 2, respectively. The second subtuple contains one element with value 3. The
third subtuple contains three elements with values 4, 5 and 6. Lines 9–14 use a nested for
structure to output the rows of list table1. The outer for structure iterates over the rows
in the list. The inner for structure iterates over each column in the row. The remainder of
the program prints the values for variable table2 in a similar manner.
The program in Fig. 5.20 demonstrates one case in which a for structure is useful
for manipulating a multiple-subscripted sequence. Many other common sequence manipu-
lations use for repetition structures. For example, the following for structure sets all the
elements in the third row of sequence a in Fig. 5.19 to 0:
for column in range( len( a[ 2 ] ) ):
   a[ 2 ][ column ] = 0
We specified the third row; thus, the first subscript is always 2 (0 is the first row and 1 is
the second row). The for structure varies only the second subscript (i.e., the column sub-
script). The preceding for structure is equivalent to the assignment statements
a[ 2 ][ 0 ] = 0
a[ 2 ][ 1 ] = 0
a[ 2 ][ 2 ] = 0
a[ 2 ][ 3 ] = 0
The following nested for structure determines the total of all the elements in sequence a:
total = 0
for row in a:
   for column in row:
      total += column
The for structure totals the elements of the sequence one row at a time. The outer for struc-
ture iterates over the rows in the table so that the elements of each row may be totaled by the
inner for structure. The total is displayed when the nested for structure terminates.
The program in Fig. 5.21 performs several other common sequence manipulations on
the 3-by-4 list grades. Each row of the list represents a student, and each column repre-
sents a grade on one of the four exams the students took during the semester. The list
manipulations are performed by four functions. Function printGrades (lines 5–25)
prints the data stored in list grades in a tabular format. Function minimum (lines 28–38)
determines the lowest grade of any student for the semester. Function maximum (lines 41–
51) determines the highest grade of any student for the semester. Function average (lines
54–60) determines a particular student’s semester average. Notice that line 55 initializes
total to 0.0, so the function returns a floating-point value.
26
27
28 def minimum( grades ):
29     lowScore = 100
30
31     for studentExams in grades:   # loop over students
32
33         for score in studentExams:   # loop over scores
34
35             if score < lowScore:
36                 lowScore = score
37
38     return lowScore
39
40
41 def maximum( grades ):
42     highScore = 0
43
44     for studentExams in grades:   # loop over students
45
46         for score in studentExams:   # loop over scores
47
48             if score > highScore:
49                 highScore = score
50
51     return highScore
52
53
54 def average( setOfGrades ):
55     total = 0.0
56
57     for grade in setOfGrades:   # loop over student’s scores
58         total += grade
59
60     return total / len( setOfGrades )
61
62
63 # main program
64 grades = [ [ 77, 68, 86, 73 ],
65            [ 96, 87, 89, 81 ],
66            [ 70, 90, 86, 81 ] ]
67
68 printGrades( grades )
69 print "\n\nLowest grade:", minimum( grades )
70 print "Highest grade:", maximum( grades )
71 print "\n"
72
73 # print average for each student
74 for i in range( len( grades ) ):
75     print "Average for student", i, "is", average( grades[ i ] )
Lowest grade: 68
Highest grade: 96
Function printGrades uses the list grades and variables students (number of
rows in the list) and exams (number of columns in the list). The function loops through list
grades, using nested for structures to print out the grades in tabular format. The outer
for structure (lines 19–25) iterates over i (i.e., the row subscript), the inner for structure
(lines 22–23) over j (i.e., the column subscript).
Functions minimum and maximum loop through list grades, using nested for
structures. Function minimum compares each grade to variable lowScore. If a grade is
less than lowScore, lowScore is set to that grade (line 36). When execution of the
nested structure is complete, lowScore contains the smallest grade in the double-sub-
scripted list. Function maximum works similarly to function minimum.
Function average takes one argument—a single-subscripted list of test results for a
particular student. When line 75 invokes average, the argument is grades[ i ], which
specifies that a particular row of the double-subscripted list grades is to be passed to
average. For example, the argument grades[ 1 ] represents the four values (a single-
subscripted list of grades) stored in the second row of the double-subscripted list grades.
Remember that, in Python, a double-subscripted list is a list with elements that are single-
subscripted lists. Function average calculates the sum of the list elements, divides the
total by the number of test results and returns the floating-point result.
In the above example, we demonstrated how to use double-subscripted lists. However,
the basic Python language does not handle purely numerical problems that involve large
multi-dimensional arrays efficiently. For such problems, use the NumPy (Numerical Python)
package, which contains modules that handle arrays and provides multi-dimensional array
objects for efficient computation. For more information on NumPy, visit
sourceforge.net/projects/numpy.
Chapters 2–5 introduced the basic-programming techniques of Python. In Chapter 6,
Introduction to the Common Gateway Interface (CGI), we will use these techniques to
design Web-based applications. In Chapters 7–9, we will introduce object-oriented pro-
gramming techniques that will allow us to build complex applications in the latter half of
the book.
SUMMARY
• Data structures hold and organize information (data).
• Sequences, often called arrays in other languages, are data structures that store related data items.
Python supports three basic sequence data types: a string, a list and a tuple.
• A sequence element may be referenced by writing the sequence name followed by the element’s
position number in square brackets ([]). The first element in a sequence is the zeroth element.
• Sequences can be accessed from the end of the sequence by using negative subscripts.
• The position number more formally is called a subscript (or an index), which must be an integer
or an integer expression. If a program uses an integer expression as a subscript, Python evaluates
the expression to determine the subscript.
• Some types of sequences are immutable—the sequence cannot be altered (e.g., by changing the
value of one of its elements). Python strings and tuples are immutable sequences.
• Some sequences are mutable—the sequence can be altered. Python lists are mutable sequences.
• The length of the sequence is determined by the function call len( sequence ).
• To create an empty string, use empty quotes (i.e., "", '', """""" or '''''').
• To create an empty list, use empty square brackets (i.e., []). To create a list that contains a se-
quence of values, separate the values with commas, and place the values inside square brackets.
• To create an empty tuple, use the empty parentheses (i.e., ()). To create a tuple that contains a
sequence of values, simply separate the values with commas. Tuples also can be created by sur-
rounding the tuple values with parentheses; however, the parentheses are optional.
• Creating a tuple is sometimes referred to as packing a tuple.
• When creating a one-element tuple—called a singleton—write the value, followed by a comma (,).
• In practice, Python programmers distinguish between tuples and lists to represent different kinds
of sequences, based on the context of the program.
• Although lists are not restricted to homogeneous data types, Python programmers typically use
lists to store sequences of homogeneous values—values of the same data type. In general, a pro-
gram uses a list to store homogeneous values for the purpose of looping over these values and per-
forming the same operation on each value. Usually, the length of the list is not predetermined and
may vary over the course of the program.
• The += augmented assignment statement can insert a value in a list. When the value to the left of
the += symbol is a sequence, the value to the right of the symbol must be a sequence also.
• The for/in structure iterates over a sequence. The for structure starts with the first element in
the sequence, assigns the value of the first element to the control variable and executes the body
of the for structure. Then, the for structure proceeds to the next element in the sequence and
performs the same operations.
• If a program attempts to access a nonexistent index, the program exits and displays an “out-of-
range” error message. This error can be caught as an exception.
• Tuples store sequences of heterogeneous data. Each data piece in a tuple represents a part of the
total information represented by the tuple. Usually, the length of the tuple is predetermined and
does not change over the course of a program’s execution. A program usually does not iterate over
a tuple, but accesses the parts of the tuple the program needs to perform its task.
• If a program attempts to modify a tuple, the program exits and displays an error message.
• Sequences can be unpacked—the values stored in the sequence are assigned to various identifiers.
Unpacking is a useful programming shortcut for assigning values to multiple variables in a single
statement.
• When unpacking a sequence, the number of variable names to the left of the = symbol must equal
the number of elements in the sequence to the right of the symbol.
• Python provides the slicing capability to obtain contiguous regions of a sequence.
• To obtain a slice of the ith element through the jth element, inclusive, use the expression se-
quence[ i:j + 1 ].
• The dictionary is a mapping construct that consists of key-value pairs. Dictionaries (called hashes
or associative arrays in other languages), can be thought of as unordered collections of values
where each value is accessed through its corresponding key.
• To create an empty dictionary, use empty curly braces (i.e., {}).
• To create a dictionary with values, use a comma-separated sequence of key-value pairs, inside
curly braces. Each key-value pair is of the form key : value.
• Python dictionary keys must be immutable values, like strings, numbers or tuples, whose elements
are immutable. Dictionary values can be of any Python data type.
• Dictionary values are accessed with the expression dictionaryName[ key ].
• To insert a new key-value pair in a dictionary, use the statement dictionaryName[ key ] = value.
• The statement dictionaryName[ key ] = value modifies the value associated with key, if the dictio-
nary already contains that key. Otherwise, the statement inserts the key-value pair into the dictionary.
• Accessing a non-existent dictionary key causes the program to exit and to display a “key error”
message.
• A method performs the behaviors (tasks) of an object.
• To invoke an object’s method, specify the name of the object, followed by the dot (.) access op-
erator, followed by the method invocation.
• List method append adds an item to the end of a list.
• List method count takes a value as an argument and returns the number of elements in the list
that have that value. If the list contains no elements with the specified value, method count re-
turns 0.
• Dictionary method items returns a list of tuples, where each tuple contains a key-value pair. Dic-
tionary method keys returns an unordered list of the dictionary’s keys. Dictionary method val-
ues returns an unordered list of the dictionary’s values.
• Dictionary method copy returns a new dictionary that is a shallow copy of the original dictionary.
In a shallow copy, the elements in the new dictionary are references to the elements in the original
dictionary.
• If the programmer wants to create a copy—called a deep copy—that is independent of the original
dictionary, Python provides module copy. Function copy.deepcopy returns a deep copy of its
argument.
• In many programming languages, the two ways to pass arguments to functions are pass-by-value
and pass-by-reference (also called call-by-value and call-by-reference).
• When an argument is passed by value, a copy of the argument’s value is made and passed to the
called function.
• With pass-by-reference, the caller allows the called function to access the caller’s data directly and to
modify that data.
• Unlike many other languages, Python does not allow programmers to choose between pass-by-val-
ue and pass-by-reference to pass arguments. Python arguments are always passed by object refer-
ence—the function receives references to the values passed as arguments. In practice, pass-by-
object-reference can be thought of as a combination of pass-by-value and pass-by-reference.
• If a function receives a reference to a mutable object (e.g., a dictionary or a list), the function can
modify the original value of the object. It is as if the object had been passed by reference.
• If a function receives a reference to an immutable object (e.g., a number, a string or a tuple whose
elements are immutable values), the function cannot modify the original object directly. It is as if
the object had been passed by value.
• To pass a list argument to a function, specify the name of the list without square brackets.
• Although entire lists can be changed by a function, individual list elements that are numeric and
immutable sequence data types cannot be changed. To pass a list element to a function, use the
subscripted name of the list element as an argument in the function call.
• Slicing creates a new sequence; therefore, when a program passes a slice to a function, the original
sequence is not affected.
• Sorting data is the process of placing data into a particular order.
• By default, list method sort sorts the elements of a list in ascending order.
• Some sorting algorithms are simple to express and program, but are inefficient. Other algorithms
are complex and sophisticated, but provide increased performance.
• Often, programmers work with large amounts of data stored in lists. It might be necessary to de-
termine whether a list contains a value that matches a certain key value. The process of locating a
particular element value in a list is called searching.
• Keyword in tests whether a sequence contains a particular value.
• List method index takes a search key as a parameter, searches through the list and returns the
index of the first list value that matches the search key. If the list does not contain any value that
matches the search key, the program displays an error message.
• Sequences can contain elements that are also sequences. Such sequences have multiple subscripts.
A common use of multiple-subscripted sequences is to represent tables of values consisting of in-
formation arranged in rows and columns.
• To identify a particular table element, we must specify two subscripts—by convention, the first
identifies the element’s row, the second identifies the element’s column.
• Sequences that require two subscripts to identify a particular element are called double-subscript-
ed sequences or two-dimensional sequences.
• Python does not support multiple-subscripted sequences directly, but allows programmers to spec-
ify single-subscripted tuples and lists whose elements are also single-subscripted tuples and lists,
thus achieving the same effect.
• A sequence with m rows and n columns is called an m-by-n sequence. It is more commonly known
as a two-dimensional sequence.
• The name of every element in a multiple-subscripted sequence is of the form a[ i ][ j ], where
a is the name of the sequence, and i and j are the subscripts that uniquely identify the row and
column of each element in the sequence.
• For purely numerical problems that involve multi-dimensional arrays, use package NumPy
(Numerical Python). This package contains modules that handle arrays and provides
multi-dimensional array objects for efficient computation.
TERMINOLOGY
append method of list
array
associative array
bracket operator ([])
clear method of dictionary
column
SELF-REVIEW EXERCISES
5.1 Fill in the blanks in each of the following statements:
a) ________ are “associative arrays” that consist of ________ pairs.
b) The last element in a sequence can always be accessed with subscript ________.
c) Statement ________ creates a singleton aTuple.
d) Function ________ returns the length of a sequence.
e) Selecting a portion of a sequence with the operator [:] is called ________.
f) Dictionary method ________ returns a list of key-value pairs.
EXERCISES
5.3 Use a list to solve the following problem: Read in 20 numbers. As each number is read, print
it only if it is not a duplicate of a number already read.
5.4 Use a list of lists to solve the following problem. A company has four salespeople (1 to 4)
who sell five different products (1 to 5). Once a day, each salesperson passes in a slip for each differ-
ent type of product sold. Each slip contains:
a) The salesperson number.
b) The product number.
c) The number of that product sold that day.
Thus, each salesperson passes in between 0 and 5 sales slips per day. Assume that the information
from all of the slips for last month is available. Write a program that will read all this information for
last month’s sales and summarize the total sales by salesperson by product. All totals should be
stored in list sales. After processing all the information for last month, display the results in tabu-
lar format, with each of the columns representing a particular salesperson and each of the rows rep-
resenting a particular product. Cross-total each row to get the total sales of each product for last
month; cross-total each column to get the total sales by salesperson for last month. Your tabular
printout should include these cross-totals to the right of the totaled rows and at the bottom of the
totaled columns.
5.5 (The Sieve of Eratosthenes) A prime integer is any integer greater than 1 that is evenly divis-
ible only by itself and 1. The Sieve of Eratosthenes is a method of finding prime numbers. It operates
as follows:
a) Create a list with all elements initialized to 1 (true). List elements with prime subscripts
will remain 1. All other list elements will eventually be set to zero.
b) Starting with list element 2, every time a list element is found whose value is 1, loop
through the remainder of the list and set to zero every element whose subscript is a mul-
tiple of the subscript for the element with value 1. For list subscript 2, all elements be-
yond 2 in the list that are multiples of 2 will be set to zero (subscripts 4, 6, 8, 10, etc.);
for list subscript 3, all elements beyond 3 in the list that are multiples of 3 will be set to
zero (subscripts 6, 9, 12, 15, etc.); and so on.
When this process is complete, the list elements that are still set to 1 indicate that the subscript is a
prime number. These subscripts can then be printed. Write a program that uses a list of 1000 ele-
ments to determine and print the prime numbers between 2 and 999. Ignore element 0 of the list.
5.6 (Bubble Sort) Sorting data (i.e., placing data into some particular order, such as ascending or
descending) is one of the most important computing applications. Python lists provide a sort meth-
od. In this exercise, readers implement their own sorting function, using the bubble-sort method. In
the bubble sort (or sinking sort), the smaller values gradually “bubble” their way upward to the top of
the list like air bubbles rising in water, while the larger values sink to the bottom of the list. The pro-
cess that compares each adjacent pair of elements in a list in turn and swaps the elements if the second
element is less than the first element is called a pass. The technique makes several passes through the
list. On each pass, successive pairs of elements are compared. If a pair is in increasing order, bubble
sort leaves the values as they are. If a pair is in decreasing order, their values are swapped in the list.
After the first pass, the largest value is guaranteed to sink to the highest index of a list. After the sec-
ond pass, the second largest value is guaranteed to sink to the second highest index of a list, and so
on. Write a program that uses function bubbleSort to sort the items in a list.
5.7 (Binary Search) When a list is sorted, a high-speed binary search technique can find items in
the list quickly. The binary search algorithm eliminates from consideration one-half of the elements
in the list being searched after each comparison. The algorithm locates the middle element of the list
and compares it with the search key. If they are equal, the search key is found, and the subscript of
that element is returned. Otherwise, the problem is reduced to searching one half of the list. If the
search key is less than the middle element of the list, the first half of the list is searched; otherwise, the second half of the list is searched. If the search
key is not the middle element in the specified piece of the original list, the algorithm is repeated on
one-quarter of the original list. The search continues until the search key is equal to the middle ele-
ment of the smaller list or until the smaller list consists of one element that is not equal to the search
key (i.e., the search key is not found).
Even in a worst-case scenario, searching a list of 1024 elements will take only 10 comparisons
during a binary search. Repeatedly dividing 1024 by 2 (because after each comparison we are able to
eliminate from the consideration half the list) yields the values 512, 256, 128, 64, 32, 16, 8, 4, 2 and 1.
The number 1024 (2^10) is divided by 2 only ten times to get the value 1. Dividing by 2 is equivalent to
one comparison in the binary-search algorithm. A list of 1,048,576 (2^20) elements takes a maximum of
20 comparisons to find the key. A list of one billion elements takes a maximum of 30 comparisons to
find the key. The maximum number of comparisons needed for the binary search of any sorted list can
be determined by finding the first power of 2 greater than or equal to the number of elements in the list.
Write a program that implements function binarySearch, which takes a sorted list and a
search key as arguments. The function should return the index of the list value that matches the
search key (or -1, if the search key is not found).
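The comparison counts quoted in Exercise 5.7 can be checked with a short sketch. It applies only the rule stated in the exercise (find the first power of 2 greater than or equal to the number of elements) and is not a binary search itself. The function name max_comparisons is hypothetical, and the code is written for Python 3.

```python
def max_comparisons(n):
    """Return the exponent of the first power of 2 that is >= n."""
    exponent = 0
    power = 1
    while power < n:      # keep doubling until the power of 2 reaches n
        power *= 2
        exponent += 1
    return exponent

# The three list sizes mentioned in the exercise.
for size in (1024, 1048576, 1000000000):
    print(size, max_comparisons(size))
```

For the three sizes the loop yields 10, 20 and 30, which matches the figures given in the exercise.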
5.8 Create a dictionary of 20 random values in the range 1–99. Determine whether there are any
duplicate values in the dictionary. (Hint: you may want to sort the list first.)
6
Introduction to the
Common Gateway
Interface (CGI)
Objectives
• To understand the Common Gateway Interface (CGI)
protocol.
• To understand the Hypertext Transfer Protocol
(HTTP).
• To implement CGI scripts.
• To use XHTML forms to send information to CGI
scripts.
• To understand and parse query strings.
• To use module cgi to process information from
XHTML forms.
This is the common air that bathes the globe.
Walt Whitman
The longest part of the journey is said to be the passing of the
gate.
Marcus Terentius Varro
Railway termini...are our gates to the glorious and unknown.
Through them we pass out into adventure and sunshine, to
them, alas! we return.
E. M. Forster
There comes a time in a man’s life when to get where he has
to go—if there are no doors or windows—he walks through
a wall.
Bernard Malamud
Outline
6.1 Introduction
6.2 Client and Web Server Interaction
6.2.1 System Architecture
6.2.2 Accessing Web Servers
6.2.3 HTTP Transactions
6.3 Simple CGI Script
6.4 Sending Input to a CGI Script
6.5 Using XHTML Forms to Send Input and Using Module cgi to Retrieve
Form Data
6.6 Using cgi.FieldStorage to Read Input
6.7 Other HTTP Headers
6.8 Example: Interactive Portal
6.9 Internet and World Wide Web Resources
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
6.1 Introduction
The Common Gateway Interface (CGI) describes a set of protocols through which appli-
cations (commonly called CGI programs or CGI scripts) interact with Web servers and
indirectly with Web browsers (e.g., client applications). A Web server is a specialized
software application that responds to client application requests by providing resources
(e.g., Web pages). CGI programs often generate Web content dynamically. A Web page is
dynamic if a program on the Web server generates that page’s content each time a user
requests the page. For example, a form in a Web page could request that a user enter a zip
code. When the user types and submits the zip code, the Web server can use a CGI pro-
gram to create a page that displays information about the weather in that client’s region.
In contrast, static Web page content never changes unless the Web developers edit the doc-
ument.
CGI is “common” because it is not specific to any operating system (e.g., Linux or
Windows), to any programming language or to any Web server software. CGI can be used
with virtually any programming or scripting language, such as C, Perl and Python. In this
chapter, we explain how Web clients and servers interact. We introduce the basics of CGI
and use Python to write CGI scripts.
The CGI protocol was developed in 1993 by the National Center for Supercomputing
Applications (NCSA—www.ncsa.uiuc.edu), for use with its HTTPd Web server.
NCSA developed CGI to be a simple tool to produce dynamic Web content. The simplicity
of CGI resulted in its widespread use and in its adoption as an unofficial worldwide pro-
tocol. CGI was quickly incorporated into additional Web servers, such as Microsoft
Internet Information Services (IIS) and Apache (www.apache.org).
indicates that the text between the opening <title> tag and the closing </title> tag is
the Web page’s title. The browser renders the text between these tags in a specific manner.
XHTML requires syntactically correct documents—markup must follow specific rules.
For example, XHTML tags must be in all lowercase letters and all opening tags must have
corresponding closing tags. We discuss XHTML in detail in Appendix I and Appendix J.
Each Web page has a unique Uniform Resource Locator (URL) associated with it—an
address of sorts. The URL contains information that directs a browser to the resource (most
often a Web page) the user wishes to access. For example, consider the URL
http://www.deitel.com/books/downloads.html
The first part of the address, http://, indicates that the resource is to be obtained using
the Hypertext Transfer Protocol (HTTP). During this interaction, the Web server and the
client communicate using the platform-independent HTTP, a protocol for transferring re-
quests and files over the Internet (e.g., between Web servers and Web browsers).
Section 6.2.3 discusses HTTP.
The next section of the URL—www.deitel.com—is the hostname of the server,
which is the name of the server computer, the host, on which the resource resides. A domain
name system (DNS) server translates the hostname (www.deitel.com) into an Internet
Protocol (IP) address (e.g., 207.60.134.230) that identifies the server computer (just
as a telephone number uniquely identifies a particular phone line). This translation opera-
tion is a DNS lookup. A DNS server maintains a database of hostnames and their corre-
sponding IP addresses.
The remainder of the URL specifies the requested resource—/books/down-
loads.html. This portion of the URL specifies both the name of the resource (down-
loads.html—an HTML/XHTML document) and its path (/books). The Web server
maps the URL to a file (or other resource, such as a CGI program) on the server, or to another
resource on the server’s network. The Web server then returns the requested document to the
client. The path represents a directory in the Web server’s file system. It also is possible that
the resource is created dynamically and does not reside anywhere on the server computer. In
this case, the URL uses the hostname to locate the correct server, and the server uses the path
and resource information to locate (or create) the resource to respond to the client’s request.
As we will see, URLs also can provide input to a CGI program residing on a server.
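The parts of the URL described above can be separated programmatically. The following is a minimal sketch that uses the standard-library function urllib.parse.urlsplit as found in Python 3 (an assumption; the module is organized differently in older Python versions). The variable names are chosen only for illustration.

```python
from urllib.parse import urlsplit
import posixpath

url = "http://www.deitel.com/books/downloads.html"
parts = urlsplit(url)

protocol = parts.scheme      # the protocol named before "://"
hostname = parts.hostname    # the name of the server computer

# Separate the directory path from the name of the resource.
path, resource = posixpath.split(parts.path)

print(protocol, hostname, path, resource)
```

The sketch only splits the text of the URL; it performs no DNS lookup and does not contact the server.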
We can request documents from local Web servers through the machine name or
through localhost—a hostname that references the local machine. We use local-
host in this book. To determine the machine name in Windows 98, right-click Network
Neighborhood, and select Properties from the context menu to display the Network
dialog. In the Network dialog, click the Identification tab. The computer name displays
in the Computer name: field. Click Cancel to close the Network dialog. In Windows
2000, right-click My Network Places and select Properties from the context menu to
display the Network and Dialup Connections explorer. In the explorer, click Net-
work Identification. The Full Computer Name: field in the System Properties
window displays the computer name. To determine the machine name on most Linux
machines, simply type the command hostname at a shell prompt.
A client also can access a server by specifying the server’s domain name or IP address
(e.g., in a Web browser’s Address field). A domain name represents a group of hosts on
the Internet; it combines with a hostname (such as www—a common hostname for Web
servers) and a top-level domain (TLD) to form a fully qualified hostname, which provides
a user-friendly way to identify a site on the Internet. In a fully qualified hostname, the TLD
often describes the type of organization that owns the domain name. For example, the com
TLD usually refers to a commercial business, whereas the org TLD usually refers to a non-
profit organization. In addition, each country has its own TLD, such as cn for China, et
for Ethiopia, om for Oman and us for the United States.
Fig. 6.2 Client interacting with server and Web server. Step 1: The request, GET
/books/downloads.html HTTP/1.1.
Fig. 6.2 Client interacting with server and Web server. Step 2: The HTTP response,
HTTP/1.1 200 OK. The server responds to the request with an appropriate message,
along with the resource contents.
HTTP method indicating that the client is requesting a resource. The next part of the request
provides the name (downloads.html) and path (/books/) of the resource (an HTML/
XHTML document). The final part of the request provides the protocol’s name and version
number (HTTP/1.1).
Servers that understand HTTP version 1.1 translate this request and respond (step 2,
Fig. 6.2). The server responds with a line indicating the HTTP version, followed by a status
code that consists of a numeric code and phrase describing the status of the transaction. For
example,
HTTP/1.1 200 OK
indicates success. A status line such as
HTTP/1.1 404 Not Found
informs the client that the requested resource was not found on the server in the location
specified by the URL.
Browsers often cache (save on a local disk) Web pages for quick reloading, to reduce
the amount of data that the browser needs to download. However, browsers typically do not
cache server responses to post requests, because subsequent post requests may not contain
the same information. For example, several users who participate in a Web-based survey
may request the same Web page. Each user’s response changes the overall results of the
survey, thus the data on the Web server is changed.
On the other hand, Web browsers cache server responses to get requests. With a Web-
based search engine, a get request normally supplies the search engine with search criteria
specified in an XHTML form. The search engine then performs the search and returns the
results as a Web page. These pages are cached in the event that the user performs the same
search again.
The server normally sends one or more HTTP headers, which provide additional infor-
mation about the data sent in response to the request. In this case, the server is sending an
HTML/XHTML text document, so the HTTP header reads
Content-type: text/html
This information is known as the MIME (Multipurpose Internet Mail Extensions) type of
the content. MIME is an Internet standard that specifies how messages should be formatted,
and clients use the content type to determine how to represent the content to the user. Each
type of data sent has a MIME type associated with it that helps the browser determine how
to process the data it receives. For example, the MIME type text/plain indicates that
the data is text that should be displayed without attempting to interpret any of the content
as HTML or XHTML markup. Similarly, the MIME type image/gif indicates that the
content is a GIF (Graphics Interchange Format) image. When this MIME type is received
by the browser, it attempts to display the image. For more information on MIME, visit
www.nacs.uci.edu/indiv/ehood/MIME/MIME.html
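As an illustration of how a file name can be associated with a MIME type, the sketch below uses the standard-library module mimetypes, which guesses a type from the file extension. The file names notes.txt and logo.gif are hypothetical. A real Web server may use its own configuration to make this decision, so the sketch should be read as an example of the idea only. It is written for Python 3.

```python
import mimetypes

# guess_type returns a (type, encoding) pair based on the file extension.
for filename in ("downloads.html", "notes.txt", "logo.gif"):
    mime_type, encoding = mimetypes.guess_type(filename)
    print(filename, mime_type)
```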
The header (or set of headers) is followed by a blank line (a carriage return, line feed or
combination of both) which indicates to the client that the server is finished sending HTTP
headers. The server then sends the text in the requested HTML/XHTML document (down-
loads.html). The connection terminates when the transfer of the resource completes. The
client-side browser interprets the text it receives and displays (or renders) the results.
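The layout of a response (status line, headers, blank line, then the document) can be made concrete with a small parsing sketch. The function parse_response and the sample response text are hypothetical, and a real browser handles many cases that are ignored here. The code is written for Python 3.

```python
def parse_response(raw):
    """Split a raw HTTP response into status-line parts, headers and body."""
    # Headers end at the first blank line; accept CRLF or bare LF endings.
    normalized = raw.replace("\r\n", "\n")
    head, _, body = normalized.partition("\n\n")
    lines = head.split("\n")

    # Status line: protocol version, numeric code, descriptive phrase.
    version, code, phrase = lines[0].split(" ", 2)

    # Each remaining line before the blank line is a "name: value" header.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return version, int(code), phrase, headers, body


# Hypothetical response; the one-line body stands in for downloads.html.
raw = ("HTTP/1.1 200 OK\r\n"
       "Content-type: text/html\r\n"
       "\r\n"
       "<html><head><title>Downloads</title></head></html>")

version, code, phrase, headers, body = parse_response(raw)
print(version, code, phrase)
print(headers["content-type"])
```

Header names are stored in lowercase in this sketch so that a lookup does not depend on how the server capitalized them.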
This section examined how a simple HTTP transaction is performed between a Web-
browser application on the client side (e.g., Microsoft Internet Explorer or Netscape Com-
municator) and a Web-server application on the server side (e.g., Apache or IIS). Next, we
introduce CGI programming.
As long as a file on the server remains unchanged, its associated URL will display the
same content in clients’ browsers each time the file is accessed. For the content in the file
to change (e.g., to include new links or the latest company news), someone must alter the
file manually (probably with a text editor or Web-page design software) then load the
changed file back onto the server.
Manually changing Web pages is not feasible for those who want to create interesting
and dynamic Web pages. For example, if you want your Web page always to display the
current date or weather, the page would require continuous updating.
The examples in this chapter rely heavily upon XHTML and Cascading Style Sheets
(CSS). CSS allows document authors to specify the presentation of elements on a Web page
(spacing, margins, etc.) separately from the structure of the document (section headers,
body text, links, etc.). Readers not familiar with these technologies will want to read
Appendix I and Appendix J, which describe XHTML in detail and Appendix K, Cascading
Style Sheets, which introduces CSS.
Figure 6.3 illustrates the full program listing for our first CGI script. Line 1
#!c:\Python\python.exe
is a directive (sometimes called the pound-bang or sh-bang) that specifies the location of
the Python interpreter on the server. This directive must be the first line in a CGI script. The
examples in this chapter are for Windows users. For UNIX or Linux-based machines, the
directive typically is one of the following:
#!/usr/bin/python
#!/usr/local/bin/python
#!/usr/bin/env python
depending on the location of the Python interpreter. [Note: If you do not know where the
Python interpreter resides, contact the server administrator.]
Common Programming Error 6.1
Forgetting to put the directive (#!) in the first line of a CGI script is an error if the Web
server running the script does not understand the .py filename extension.
Line 5 imports module time. This module obtains the current time on the Web
server and displays it in the user’s browser. Lines 7–17 define function printHeader.
This function takes argument title, which corresponds to the title of the Web page. Line
1 #!c:\Python\python.exe
2 # Fig. 6.3: fig06_03.py
3 # Displays current date and time in Web browser.
4
5 import time
6
7 def printHeader( title ):
8    print """Content-type: text/html
9
10 <?xml version = "1.0" encoding = "UTF-8"?>
11 <!DOCTYPE html PUBLIC
12 "-//W3C//DTD XHTML 1.0 Strict//EN"
13 "DTD/xhtml1-strict.dtd">
14 <html xmlns = "http://www.w3.org/1999/xhtml">
15 <head><title>%s</title></head>
16
17 <body>""" % title
18
19 printHeader( "Current date and time" )
20 print time.ctime( time.time() )
21 print "</body></html>"
8 prints the HTTP header. Notice that line 9 is blank, which denotes the end of the HTTP
headers. The line that follows the last HTTP header must be a blank line, otherwise Web
browsers cannot render the content properly. Lines 10–14 print the XML declaration, doc-
ument type declaration and opening <html> tag. For more information on XML, see
Chapter 15. Lines 15–17 contain the XHTML document header and title and begin the
XHTML document body.
Common Programming Error 6.2
Failure to place a blank line after the last HTTP header is an error.
Line 19 begins the main portion of the program by calling function printHeader
and passing an argument that represents the title of the Web page. Line 20 calls two func-
tions in module time to print the current time. Function time.time returns a floating-
point value that represents the number of seconds since midnight, January 1, 1970 (called
the epoch). Function time.ctime takes as an argument the number of seconds since the
epoch and returns a human-readable string that represents the current time. We conclude
the program by printing the XHTML body and document closing tags. For a complete list
of functions in module time, visit
www.python.org/doc/current/lib/module-time.html
Note that the program consists almost entirely of print statements. Until now, the
output of print has always displayed on the screen. However, technically speaking, the
default target for print is standard output—an information stream presented to the user
by an application. Typically, standard output is displayed on the screen, but it may be sent
to a printer, written to a file, etc. When a Python program executes as a CGI script, the
server redirects the standard output to the client Web browser. The browser interprets the
headers and tags as if they were part of a normal server response to an XHTML document
request.
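The idea that a CGI script simply writes to standard output, which the server then captures, can be sketched in Python 3 as follows. The function name write_page and the shortened page template are assumptions made for illustration; they are not the listing of Fig. 6.3. An in-memory buffer stands in for the redirected output stream.

```python
import io
import sys
import time

# Shortened page template: one header, a blank line, then the document.
PAGE = """Content-type: text/html

<html><head><title>%s</title></head>
<body>%s</body></html>
"""

def write_page(title, out=sys.stdout):
    """Write a header, a blank line and a small page to the stream out."""
    out.write(PAGE % (title, time.ctime(time.time())))

# Redirect the output into a buffer, much as a Web server captures
# the standard output of a CGI script.
buffer = io.StringIO()
write_page("Current date and time", out=buffer)
response = buffer.getvalue()
print(response)
```

Because the function writes to whatever stream it is given, the same code serves for the screen, a file or a server-captured stream.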
Executing the program requires a properly configured server. [Note: In this book, we
use the Apache Web server. For information on obtaining and configuring Apache, refer to
our Python Web resources at www.deitel.com.] Once a server is available, the Web
server site administrator specifies where CGI scripts can reside and what names are allowed
for them. In our example, we place the Python file in the Web server’s cgi-bin directory.
For UNIX and Linux users, it also is necessary to set the permissions before executing the
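program. For example, the UNIX and Linux command
chmod 755 fig06_03.py
gives the file's owner read, write and execute permission and gives all other users read and
execute permission. To run the script, enter its URL (which begins with http://localhost/ when
the server runs on the local computer)
in the browser’s Address or Location field. If the server resides on a different computer,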
replace localhost with the server’s hostname or IP address. [Note: The IP address of
localhost is always 127.0.0.1.] Requesting the document causes the server to exe-
cute the program and return the results.
Figure 6.4 illustrates the process of calling a CGI script. First, the client requests the
resource named fig06_03.py from the server, just as the client requested
downloads.html in the previous example (Step 1). If the server has not been configured to
handle CGI scripts, it might return the Python code as text to the client.
A properly configured Web server, however, recognizes that certain resources need to
be processed differently. For example, when the resource is a CGI script, the script must be
executed by the Web server. A resource usually is designated as a CGI script in one of two
ways—either it has a special filename extension (such as .cgi or .py), or it is located in
a specific directory (often cgi-bin). In addition, the server administrator must grant
explicit permission for remote access and CGI-script execution.
The server recognizes that the resource is a Python script and invokes Python to exe-
cute the script (Step 2). The program executes, and the text sent to standard output is
returned to the Web server (Step 3). Finally, the Web server prints an additional line to the
output that indicates the status of the HTTP transaction (such as HTTP/1.1 200 OK, for
success) and sends the whole body of text to the client (Step 4).
Fig. 6.4 Step 2: The Web server starts the CGI script. (Part 2 of 4.)
Fig. 6.4 Step 3: The output of the script is sent to the Web server. (Part 3 of 4.)
Fig. 6.4 Step 4: The HTTP response, HTTP/1.1 200 OK. (Part 4 of 4.)
The browser on the client side then processes the XHTML output and displays the
results. It is important to note that the browser does not know about the work the server has
done to execute the CGI script and return XHTML output. As far as the browser is con-
cerned, it is requesting a resource like any other and receiving a response like any other.
The client computer is not required to have a Python interpreter installed, because the script
executes on the server. The client simply receives and processes the script’s output.
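The four steps of Fig. 6.4 can be imitated in a few lines of Python 3. This is only a sketch under stated assumptions: the function name run_cgi and the one-line stand-in script are hypothetical, and a real Web server does considerably more (setting environment variables, checking permissions, handling errors).

```python
import subprocess
import sys

# Stand-in for a CGI script file; a real server would find it on disk.
SCRIPT = ('print("Content-type: text/plain")\n'
          'print()\n'
          'print("Hello from the script")')

def run_cgi(script_source):
    # Step 2: the "server" starts the interpreter on the script.
    result = subprocess.run([sys.executable, "-c", script_source],
                            capture_output=True, text=True, check=True)
    # Step 3: the script's standard output comes back to the server.
    script_output = result.stdout
    # Step 4: the server adds the status line and returns everything.
    return "HTTP/1.1 200 OK\n" + script_output

response = run_cgi(SCRIPT)
print(response)
```

The client sees only the final text, which is why it needs no Python interpreter of its own.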
We now consider a more involved CGI program. Figure 6.5 organizes all CGI environ-
ment variables and their corresponding values in an XHTML table, which is then displayed
in a Web browser. Environment variables contain information about the execution
environment in which a script is being run. Such information includes the current user name and the
name of the operating system. A CGI program uses environment variables to obtain infor-
mation about the client (e.g., the client’s IP address, operating system type, browser type,
etc.) or to obtain information passed from the client to the CGI program.
Line 6 imports module cgi. This module provides several CGI-related capabilities,
including text-formatting, form-processing and URL parsing. In this example, we use
module cgi to format XHTML text; in later examples, we use module cgi to process
XHTML forms.
1 #!c:\Python\python.exe
2 # Fig. 6.5: fig06_05.py
3 # Program displaying CGI environment variables.
4
5 import os
6 import cgi
7
8 def printHeader( title ):
9    print """Content-type: text/html
10
11 <?xml version = "1.0" encoding = "UTF-8"?>
12 <!DOCTYPE html PUBLIC
13 "-//W3C//DTD XHTML 1.0 Strict//EN"
14 "DTD/xhtml1-strict.dtd">
15 <html xmlns = "http://www.w3.org/1999/xhtml">
16 <head><title>%s</title></head>
17
18 <body>""" % title
19
20 rowNumber = 0
21 backgroundColor = "white"
22
23 printHeader( "Environment Variables" )
24 print """<table style = "border: 0">"""
25
26 # print table of cgi variables and values
27 for item in os.environ.keys():
28    rowNumber += 1
29
30    if rowNumber % 2 == 0: # even row numbers are white
31       backgroundColor = "white"
32    else: # odd row numbers are grey
33       backgroundColor = "lightgrey"
34
35    print """<tr style = "background-color: %s">
36       <td>%s</td><td>%s</td></tr>""" % ( backgroundColor,
37       cgi.escape( item ), cgi.escape( os.environ[ item ] ) )
38
39 print """</table></body></html>"""
background color for each row. For each environment variable, lines 35–37 create a new
row in the table containing that key and the corresponding value.
Note that line 37 calls function cgi.escape and passes as values each environment
variable name and value. This function takes a string and returns a properly formatted
XHTML string. Proper formatting means that special XHTML characters, such as the less-
than and greater-than signs (< and >), are “escaped.” For example, function escape
returns a string where “<” is replaced by “&lt;”, “>” is replaced by “&gt;” and “&” is
replaced by “&amp;”. The replacement signifies that the browser should display a
character instead of treating the character as markup. After we have printed all the environment
variables, we close the table, body and html tags (line 39).
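The escaping and the alternating row colors can be tried outside a Web server with the following Python 3 sketch. It assumes function html.escape from the Python 3 standard library in place of the cgi.escape call used in the Python 2 listing; the function name environment_table and the sample dictionary are hypothetical, and the rows are sorted by name for predictability.

```python
import html

def environment_table(environ):
    """Return an XHTML table with one escaped row per variable."""
    rows = []
    for row_number, name in enumerate(sorted(environ), start=1):
        # Even rows are white and odd rows are light grey.
        color = "white" if row_number % 2 == 0 else "lightgrey"
        rows.append(
            '<tr style = "background-color: %s"><td>%s</td><td>%s</td></tr>'
            % (color, html.escape(name), html.escape(environ[name])))
    return '<table style = "border: 0">\n%s\n</table>' % "\n".join(rows)

# Hypothetical values standing in for os.environ.
sample = {"QUERY_STRING": "a=1&b=<2>", "SERVER_NAME": "localhost"}
table = environment_table(sample)
print(table)
```

Without the escaping step, the value a=1&b=<2> would be treated as markup and could corrupt the page.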
1 #!c:\Python\python.exe
2 # Fig. 6.6: fig06_06.py
3 # Example using QUERY_STRING.
4
5 import os
6 import cgi
7
8 def printHeader( title ):
9    print """Content-type: text/html
10
11 <?xml version = "1.0" encoding = "UTF-8"?>
12 <!DOCTYPE html PUBLIC
13 "-//W3C//DTD XHTML 1.0 Strict//EN"
14 "DTD/xhtml1-strict.dtd">
15 <html xmlns = "http://www.w3.org/1999/xhtml">
16 <head><title>%s</title></head>
17
18 <body>""" % title
19
20 printHeader( "QUERY_STRING example" )
21 print "<h1>Name/Value Pairs</h1>"
22
23 query = os.environ[ "QUERY_STRING" ]
24
25 if len( query ) == 0:
26    print """<p><br />
27       Please add some name-value pairs to the URL above.
28       Or try
29       <a href = "fig06_06.py?name=Veronica&amp;age=23">this</a>.
30       </p>"""
31 else:
32    print """<p style = "font-style: italic">
33       The query string is '%s'.</p>""" % cgi.escape( query )
34    pairs = cgi.parse_qs( query )
35
36    for key, value in pairs.items():
37       print "<p>You set '%s' to value %s</p>" % \
38          ( key, value )
39
40 print "</body></html>"
If the query string is not empty, the value of the query string (lines 32–33) prints.
Function cgi.parse_qs parses (i.e., “splits-up”) the query string (line 34). This function
takes as an argument a query string and returns a dictionary that maps each name in the
query string to a list of its values. Lines 36–38 contain a for loop to print the names and values contained
in dictionary pairs.
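The shape of the returned dictionary is easiest to see by example. The sketch below assumes Python 3, where the equivalent function is urllib.parse.parse_qs; the query strings are illustrative.

```python
from urllib.parse import parse_qs

# Each name maps to a list, because a name may appear more than once.
pairs = parse_qs("name=Veronica&age=23")
for key, values in pairs.items():
    print("You set '%s' to value %s" % (key, values))

repeated = parse_qs("word=python&word=cgi")   # two values for one name
empty = parse_qs("")                          # no pairs at all
print(repeated)
print(empty)
```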
6.5 Using XHTML Forms to Send Input and Using Module cgi to
Retrieve Form Data
If Web page users had to type all the information that the page required into the page’s URL
every time the user wanted to access the page, Web surfing would be quite a laborious task.
XHTML provides forms on Web pages that provide a more intuitive way for users to input
information to CGI scripts.
The <form> and </form> tags surround an XHTML form. The <form> tag typi-
cally takes two attributes. The first attribute is action, which specifies the operation to
perform when the user submits the form. For our purposes, the operation usually will be to
call a CGI script to process the form data. The second attribute is method, which is either
get or post. In this section, we show examples using both methods. An XHTML form may
contain any number of elements. Figure 6.7 gives a brief description of several possible ele-
ments to include.
Figure 6.8 demonstrates a basic XHTML form that uses the HTTP get method. Lines
21–26 output the form. Notice that the method attribute is get and the action attribute
is fig06_08.py (i.e., the script calls itself to handle the form data once they are sub-
mitted—this is called a postback).
The form contains two input fields. The first is a single-line text field (type =
"text") with the name word (line 23). The second displays a button, labeled Submit
word, to submit the form data (line 24).
The first time the script executes, QUERY_STRING should contain no value (unless
the user has specifically appended a query string to the URL). However, once the user
enters a word into the word text field and clicks the Submit word button, the script is
called again. This time, the QUERY_STRING environment variable contains the name of
the text-input field (word) and the user-entered value. For example, if the user enters the
word python and clicks the Submit word button, QUERY_STRING would contain the
value "word=python".
[Fig. 6.7: table of XHTML form elements. Columns: Tag name; type attribute (for <input> tags); Description.]
1 #!c:\Python\python.exe
2 # Fig. 6.8: fig06_08.py
3 # Demonstrates get method with an XHTML form.
4
5 import cgi
6
7 def printHeader( title ):
8    print """Content-type: text/html
9
10 <?xml version = "1.0" encoding = "UTF-8"?>
11 <!DOCTYPE html PUBLIC
12 "-//W3C//DTD XHTML 1.0 Strict//EN"
13 "DTD/xhtml1-strict.dtd">
14 <html xmlns = "http://www.w3.org/1999/xhtml">
15 <head><title>%s</title></head>
16
17 <body>""" % title
18
19 printHeader( "Using 'get' with forms" )
20 print """<p>Enter one of your favorite words here:<br /></p>
21 <form method = "get" action = "fig06_08.py">
22 <p>
23 <input type = "text" name = "word" />
24 <input type = "submit" value = "Submit word" />
25 </p>
26 </form>"""
27
28 pairs = cgi.parse()
29
30 if pairs.has_key( "word" ):
31    print """<p>Your word is:
32       <span style = "font-weight: bold">%s</span></p>""" \
33       % cgi.escape( pairs[ "word" ][ 0 ] )
34
35 print "</body></html>"
Line 28 uses function cgi.parse to parse the form data. This function is similar to
function cgi.parse_qs, except that cgi.parse locates the data itself (in the query string for
a get request, or on standard input for a post request) rather than taking a query string as an
argument. It returns the name-value pairs in a dictionary.
Thus, during the second execution of the script, when the query string is parsed, line
28 assigns the returned dictionary to variable pairs. If dictionary pairs contains the key
"word", the user has submitted at least one word and the program prints the first word to the
browser. The word is passed to function cgi.escape in case the input includes some
special characters (such as <, > or &). Lines 31–33 use CSS to display the result. CSS
is discussed in Appendix K, Cascading Style Sheets (CSS). In Fig. 6.8, we see that the
spaces in the address bar are replaced by plus signs because Web browsers URL-encode
XHTML-form data they send, which means that spaces are turned into plus signs and that
certain other symbols (such as the apostrophe) are translated into their ASCII value in
hexadecimal and preceded with a percent sign.
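URL encoding can be reproduced directly with the Python 3 standard library. In this sketch, urllib.parse.urlencode plays the role of the browser and urllib.parse.unquote_plus reverses the encoding; the form value is made up for illustration.

```python
from urllib.parse import urlencode, unquote_plus

# What a browser would send for a text field named "word".
query = urlencode({"word": "it's a test"})
print(query)      # spaces become "+", the apostrophe becomes %27

# Decoding restores the original characters.
decoded = unquote_plus(query)
print(decoded)
```

The value 27 is the hexadecimal ASCII code of the apostrophe, which matches the rule described above.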
Using get with an XHTML form passes data to the CGI script in the same way that we
saw in Fig. 6.6—through environment variables. Another way that CGI scripts interact
with servers is via standard input and the post method. For comparison purposes, let us now
reimplement the application of Fig. 6.8 using post. Notice that the code in the two figures
is virtually identical. The XHTML form indicates that we are now using the post method
to submit the form data (line 21).
1 #!c:\Python\python.exe
2 # Fig. 6.9: fig06_09.py
3 # Demonstrates post method with an XHTML form.
4
5 import cgi
6
7 def printHeader( title ):
8    print """Content-type: text/html
9
10 <?xml version = "1.0" encoding = "UTF-8"?>
11 <!DOCTYPE html PUBLIC
12 "-//W3C//DTD XHTML 1.0 Strict//EN"
13 "DTD/xhtml1-strict.dtd">
The post method sends data to a CGI script via standard input. The data are encoded
just as in QUERY_STRING (that is, with name-value pairs connected by equals signs and
ampersands), but the QUERY_STRING environment variable is not set. Instead, the post
method sets the environment variable CONTENT_LENGTH, to indicate the number of
characters of data that were sent (or posted). A benefit of the post method is that the amount
of data sent is not restricted by limits on the length of a URL.
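How a script might read posted data can be sketched in Python 3 as follows. The function name read_posted_pairs is hypothetical, a plain dictionary stands in for os.environ and an in-memory byte stream stands in for the script's standard input; the sketch also assumes the posted data are plain ASCII.

```python
import io
from urllib.parse import parse_qs

def read_posted_pairs(environ, stdin):
    """Read CONTENT_LENGTH bytes of posted data and parse them."""
    # A missing or empty CONTENT_LENGTH means nothing was posted.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    data = stdin.read(length).decode("ascii")
    return parse_qs(data)

# Hypothetical request: the server would set these values.
body = b"word=python&lang=en"
fake_environ = {"REQUEST_METHOD": "POST",
                "CONTENT_LENGTH": str(len(body))}
pairs = read_posted_pairs(fake_environ, io.BytesIO(body))
print(pairs)
```

Reading exactly CONTENT_LENGTH bytes matters because the server does not necessarily close standard input after the data.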
Although methods get and post are similar, some important differences exist. A get
request sends form content as part of the URL. A post request posts form content to the end
of an HTTP request. Another difference is the manner in which browsers process
responses. Browsers often cache (save on disk) Web pages, so that when the Web page is
requested a second time, the browser need not download the page again, but can load the
page from the cache. This process speeds up the user’s browsing experience by reducing
the amount of data downloaded to view a Web page. Browsers do not cache the server
responses to post requests, however, because subsequent post requests might not contain
the same information.
This method of handling responses is different from that of handling get requests.
When a Web-based search engine is used, a get request normally supplies the search engine
with the information specified in the XHTML form. The search engine then performs the
search and returns the results as a Web page.
Software Engineering Observation 6.2
Most Web servers limit get request query strings to 1024 characters. If a query string exceeds
this limit, use the post request.
1 #!c:\Python\python.exe
2 # Fig. 6.10: fig06_10.py
3 # Demonstrates use of cgi.FieldStorage with an XHTML form.
4
5 import cgi
6
7 def printHeader( title ):
8    print """Content-type: text/html
9
10 <?xml version = "1.0" encoding = "UTF-8"?>
11 <!DOCTYPE html PUBLIC
12 "-//W3C//DTD XHTML 1.0 Strict//EN"
13 "DTD/xhtml1-strict.dtd">
14 <html xmlns = "http://www.w3.org/1999/xhtml">
15 <head><title>%s</title></head>
16
17 <body>""" % title
18
19 printHeader( "Using cgi.FieldStorage with forms" )
20 print """<p>Enter one of your favorite words here:<br /></p>
21 <form method = "post" action = "fig06_10.py">
22 <p>
23 <input type = "text" name = "word" />
24 <input type = "submit" value = "Submit word" />
25 </p>
26 </form>"""
27
28 form = cgi.FieldStorage()
29
30 if form.has_key( "word" ):
31    print """<p>Your word is:
32       <span style = "font-weight: bold">%s</span></p>""" \
33       % cgi.escape( form[ "word" ].value )
34
35 print "</body></html>"
Content-type: text/html
For example, the statement
print "Content-type: text/plain\n"
prints the Content-type header with the text/plain content type. If the con-
tent-type of a page is specified as text/plain, the page is processed as plain text
instead of as an HTML or XHTML document.
In addition to HTTP header Content-type, a CGI script can supply other HTTP
headers. In most cases, the server passes these extra headers to the client untouched. For
example, the following Refresh header redirects the client to a new location after a spec-
ified amount of time:
Five seconds after the Web browser receives this header, the browser requests the resource
at the specified URL. Alternatively, the Refresh header can omit the URL, in which case
it refreshes the current page at the given time interval.
The CGI protocol indicates that certain types of headers output by a CGI script are to
be handled by the server, rather than be passed directly to the client. The first of these is the
Location header. Like the Refresh header, Location redirects the client to a new
location:
Location: http://www.deitel.com/newpage.html
If used with a relative URL (e.g., Location: /newpage.html), the Location head-
er indicates to the server that the redirection is to be performed on the server side, without
sending the Location header back to the client. In this case, it appears to the user as if
the browser originally requested that resource. When a Python script uses the Location
header, the Content-type header is not necessary because the new resource has its own
content type.
The CGI specification also includes a Status header, which tells the server to output
a status-header line (e.g., HTTP/1.1 200 OK). Normally, the server sends the appropriate
status line to the client (adding, for example, the 200 OK status code in most cases).
However, CGI allows you to change the response status if you so desire. For example, sending a
Status: 204 No Content
header indicates that, although the request was successful, the browser should continue to
display the same page. This header might be useful if you want to allow users to submit
forms without moving to a new page.
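The headers discussed in this section can be assembled with a few small helper functions. The following Python 3 sketch is illustrative only: the function names are hypothetical, and each function returns the header text (ending with the required blank line) that a script would write to standard output.

```python
def location_header(url):
    """Redirect; no Content-type is needed with Location."""
    return "Location: %s\n\n" % url

def refresh_header(seconds, url=None):
    """Reload the current page, or load url, after the given delay."""
    value = str(seconds) if url is None else "%d; URL=%s" % (seconds, url)
    return "Content-type: text/html\nRefresh: %s\n\n" % value

def status_header(code, phrase):
    """Ask the server to use a particular response status."""
    return "Status: %d %s\n\n" % (code, phrase)

print(location_header("/newpage.html"))
print(refresh_header(5, "http://www.deitel.com/newpage.html"))
print(status_header(204, "No Content"))
```

Status code 204 is the one that tells the browser the request succeeded but that it should keep displaying the current page.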
We now have covered the fundamentals of the CGI protocol. To review, the CGI pro-
tocol allows scripts to interact with servers in three basic ways:
1. through the output of headers and content to the client via standard output;
2. by the server’s setting of environment variables (including the URL-encoded
QUERY_STRING) whose values are available within the script (via os.envi-
ron); and
3. through posted, URL-encoded data that the server sends to the script’s standard
input.
Figure 6.12 is the CGI script that processes the data received from the client. Line 19
retrieves the form data in a cgi.FieldStorage instance and assigns the result to
variable form. The if structure that begins in line 21 tests whether form contains the key
"name". If form does not contain that key, the user has not entered a name, and we
print a Location HTTP header (line 22) to redirect the user to the XHTML file where
the user can enter a name (fig06_11.html). The document fig06_11.html is
contained in the Web server’s main document root (as indicated by the / that precedes the page
name). The effect of line 22 is that clients who try to access fig06_12.py directly,
without going through the login procedure, must enter through the portal.
1 #!c:\Python\python.exe
2 # Fig. 6.12: fig06_12.py
3 # Handles entry to Bug2Bug Travel.
4
5 import cgi
6
7 def printHeader( title ):
8    print """Content-type: text/html
9
10 <?xml version = "1.0" encoding = "UTF-8"?>
11 <!DOCTYPE html PUBLIC
12 "-//W3C//DTD XHTML 1.0 Strict//EN"
13 "DTD/xhtml1-strict.dtd">
14 <html xmlns = "http://www.w3.org/1999/xhtml">
15 <head><title>%s</title></head>
16
17 <body>""" % title
18
19 form = cgi.FieldStorage()
20
21 if not form.has_key( "name" ):
22    print "Location: /fig06_11.html\n"
23 else:
24    printHeader( "Bug2Bug Travel" )
25    print "<h1>Welcome, %s!</h1>" % form[ "name" ].value
26    print """<p>Here are our weekly specials:<br /></p>
27       <ul><li>Boston to Taiwan for $300</li></ul>"""
28
29    if not form.has_key( "password" ):
30       print """<p style = "font-style: italic">
31          Become a member today for more great deals!</p>"""
32    elif form[ "password" ].value == "Coast2Coast":
33       print """<hr />
34          <p>Current specials just for members:<br /></p>
35          <ul><li>San Diego to Hong Kong for $250</li></ul>"""
36    else:
37       print """<p style = "font-style: italic">
38          Sorry, you have entered the wrong password.
39          If you have the correct password, enter
40          it to see more specials.</p>"""
41
42    print "<hr /></body></html>"
If a user has entered a name, we print a greeting that includes the user’s name and the
weekly specials (lines 25–27). Line 29 tests whether the user entered a password. If the user
has not entered a password, we invite the user to become a member (lines 30–31). If the user has
entered a password, line 32 determines whether the password is equal to the string
"Coast2Coast". If true, we print the member specials to the browser. Note that the
password, weekly specials and member specials are hard-coded (i.e., their values are supplied
in the code). If the user-entered password does not equal "Coast2Coast", the
application requests the user to enter a valid password (lines 36–40).
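The decision logic of the portal can be exercised without a Web server by isolating it in a function. The Python 3 sketch below is a simplified stand-in, not the listing itself: the function name portal_response is hypothetical, a plain dictionary replaces the cgi.FieldStorage instance, the page text is abbreviated and the user's name is escaped with html.escape as an added precaution.

```python
import html

MEMBER_PASSWORD = "Coast2Coast"   # hard-coded, as in the chapter's example

def portal_response(form):
    """Return the text a portal script would write for the given fields."""
    if "name" not in form:
        # No name: send the visitor back to the entry page.
        return "Location: /fig06_11.html\n\n"
    parts = ["Content-type: text/html\n",
             "<h1>Welcome, %s!</h1>" % html.escape(form["name"]),
             "<p>Here are our weekly specials</p>"]
    if "password" not in form:
        parts.append("<p>Become a member today for more great deals!</p>")
    elif form["password"] == MEMBER_PASSWORD:
        parts.append("<p>Current specials just for members</p>")
    else:
        parts.append("<p>Sorry, you have entered the wrong password.</p>")
    return "\n".join(parts) + "\n"

print(portal_response({"name": "Veronica", "password": "Coast2Coast"}))
```

Keeping the logic in a function that returns text makes each of the three password cases easy to check.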
Performance Tip 6.1
In response to each CGI request, a Web server executes a CGI program to create the re-
sponse to the client. This process often takes more time than returning a static document.
When implementing a Web site, define content that does not change frequently as static con-
tent. This practice allows the Web server to respond to clients more quickly than if only CGI
scripting were used.
SUMMARY
• The Common Gateway Interface (CGI) describes a set of protocols through which applications
(commonly called CGI programs or CGI scripts) can interact with Web servers and (indirectly)
with clients.
• The content of a static Web page changes only when someone edits the file manually, whereas
the content of a dynamic Web page is generated by a program and can change without manual editing.
• The Common Gateway Interface is “common” in the sense that it is not specific to any particular
operating system (such as Linux or Windows) or to any one programming language.
• HTTP describes a set of methods and headers that allow clients and servers to interact and ex-
change information in a uniform and predictable way.
• A Web page in its simplest form is nothing more than an XHTML (Extensible Hypertext Markup
Language) document. An XHTML document is just a plain-text file containing markings (markup,
or tags) that describe to a Web browser how to display and format the information in the document.
• Hypertext information creates links to different pages or to other portions of the same page.
• Any XHTML file available for viewing over the Internet has a URL (Uniform Resource Locator)
associated with it. The URL contains information that directs a browser to the resource that the
user wishes to access.
• The hostname is the name of the computer where a resource (such as an XHTML document) re-
sides. The hostname is translated into an IP address, which identifies the server on the Internet.
• To request a resource, the browser first sends an HTTP request message to the server. The server
responds with a line indicating the HTTP version, followed by a numeric code and a phrase de-
scribing the status of the transaction.
• The server normally sends one or more HTTP headers, which provide additional information
about the data being sent. The header or set of headers is followed by a blank line, which indicates
that the server has finished sending HTTP headers.
• Once the server sends the contents of the requested resource, the connection is terminated. The cli-
ent-side browser processes the XHTML it receives and displays the results.
• get is an HTTP method that indicates that the client wishes to obtain a resource.
• The function time.ctime, when called with time.time(), returns a string value such as
Wed Jul 18 10:54:57 2001.
• Redirecting output means sending output to somewhere other than the standard output, which is
normally the screen.
• Just as standard input refers to the standard method of input into a program (usually the keyboard),
standard output refers to the standard method of output from a program (usually the screen).
• If a server is not configured to handle CGI scripts, the server may return the Python program as
text to display in a Web browser.
• A properly configured Web server will recognize a CGI script and execute it. A resource is usually
designated as a CGI script in one of two ways: Either it has a specific filename extension or it is
located in a specific directory. The server administrator must explicitly give permission for remote
clients to access and execute CGI scripts.
• When the server recognizes that the resource requested is a Python script, the server invokes Py-
thon to execute the script. The Python program executes and the Web server sends the output to
the client as the response to the request.
• With a CGI script, we must explicitly include the Content-type header, whereas, with an
XHTML document, the header would be added by the Web server.
• The CGI protocol for output to be sent to a Web browser consists of printing to standard output
the Content-type header, a blank line and the data (XHTML, plain text, etc.) to be output.
• Module cgi provides functions that simplify the creation of CGI scripts. Among other things,
cgi includes a set of functions to aid in dynamic XHTML generation.
• The os.environ dictionary contains the names and values of all the environment variables.
• CGI-enabled Web servers set environment variables that provide information about both the serv-
er’s and the client’s script-execution environment.
• The environment variable QUERY_STRING provides a mechanism that enables programmers to
supply any type of data to CGI scripts. The QUERY_STRING variable contains extra information
that is appended to a URL, following a question mark (?). The question mark is not part of the
resource requested or of the query string. It simply serves as a delimiter.
• Data put into a query string can be structured in a variety of ways, provided that the CGI script that
reads the string knows how to interpret the formatted data.
• Forms provide another way for users to input information that is sent to a CGI script.
• The <form> and </form> tags surround an XHTML form.
• The <form> tag generally takes two attributes. The first attribute is action, which specifies the
action to take when the user submits the form. The second attribute is method, which is either get
or post.
• Using get with an XHTML form causes data to be passed to the CGI script through environment
variable QUERY_STRING, which is set by the server.
• Web browsers URL-encode XHTML-form data that they send. This means that spaces are turned
into plus signs and that certain other symbols (such as the apostrophe) are translated into their
ASCII value in hexadecimal and preceded with a percent sign.
• A CGI script can supply HTTP headers in addition to Content-type. In most cases, the server
passes these extra headers to the client untouched.
• The Location header redirects the client to a new location. If used with a relative URL, the Lo-
cation header indicates to the server that the redirection is to be performed without sending the
Location header back to the client.
• The CGI specification also includes a Status header, which tells the server to output a
corresponding status header line. Normally, the server adds the appropriate status line to the output
sent to the client. However, CGI allows users to change the response status.
TERMINOLOGY
#! directive get method
? in query string HTTP header
127.0.0.1 IP address hidden attribute value (type)
action attribute HTML (Hypertext Markup Language)
button attribute HTTP (Hypertext Transfer Protocol)
protocol HTTP method
CSS (Cascading Style Sheet) HTTP transaction
CGI (Common Gateway Interface) image attribute value (type)
CGI environment variable image/gif MIME type
.cgi file extension input HTML element
cgi module IP (Internet Protocol) address
CGI Script localhost
cgi module Location HTTP header
cgi.escape function method of XHTML form
cgi.FieldStorage object MIME (Multipurpose Internet Mail Extensions)
cgi.parse function os.environ data member
cgi.parse_qs function password attribute value (type)
cgi-bin directory .py file extension
checkbox attribute value (type) post method
CONTENT_LENGTH portal
Content-type HTTP header pound-bang directive
domain name system (DNS) QUERY_STRING environment variable
dynamic Web content radio attribute value (type)
environment variable redirect
file attribute value (type) Refresh HTTP header
form XHTML element (<form>…</form>) relative URL
SELF-REVIEW EXERCISES
6.1 Fill in the blanks in each of the following statements:
a) CGI is an acronym for ________.
b) HTTP describes a set of ________ and ________ that allow clients and servers to
interact.
c) The translation of a hostname into an IP address normally is performed by a ________.
d) The ________, which is part of the HTTP header sent with every type of data, helps
the browser determine how to process the data it receives.
e) ________ are reserved memory locations that an operating system maintains to keep
track of system information.
f) Function ________ takes a string and returns a properly formatted XHTML string.
g) Variable ________ contains extra information that is appended to a URL in a get
request, following a question mark.
h) The default target for print is ________.
i) The ________ data member contains all the environment variables.
j) XHTML ________ allow users to input information to a CGI script.
6.2 State whether each of the following is true or false. If false, explain why.
a) The CGI protocol is not specific to any particular operating system or programming lan-
guage.
b) Function time.ctime returns a floating-point value that represents the number of sec-
onds since the epoch.
c) The first directive of a CGI script provides the location of the Python interpreter.
d) The forward slash character acts as a delimiter between the resource and the query string
in a URL.
e) CGI scripts are executed on the client’s machine.
f) The Status: 204 No Response header indicates that a request to the server failed.
g) Redirection sends output to somewhere other than the screen.
h) The action attribute of the form element specifies the action to take when the user
submits the form.
i) A post request posts form contents to the end of an HTTP request.
j) Form data can be stored in an object of class cgi.FormStorage.
ANSWERS TO SELF-REVIEW EXERCISES
6.2 a) True. b) False. Function time.ctime takes a floating-point value that represents the
number of seconds since the epoch as an argument and returns a human-readable string representing
that time. c) True. d) False. A question mark acts as a delimiter between the resource and the
query string in a URL. e) False. The server executes CGI scripts. f) False. The Status: 204 No
Response header indicates that, although the request was successful, the browser should continue
to display the same page. g) True. h) True. i) True. j) False. Form data can be stored in an object of
class cgi.FieldStorage.
EXERCISES
6.3 Write a CGI script that prints the squares of the integers from 1 to 10 on separate lines.
6.4 Modify your solution to Exercise 6.3 to display its output in an XHTML table. The left col-
umn should be the number, and the right column should be the square of that number.
6.5 Write a CGI script that receives as input three numbers from the client and returns a statement
indicating whether the three numbers could represent an equilateral triangle (all three sides are the
same length), an isosceles triangle (two sides are the same length) or a right triangle (the square of
one side is equal to the sum of the squares of the other two sides).
6.6 Write a soothsayer CGI program that allows the user to submit a question. When the question
is submitted, the server should display a random response from a list of vague answers.
6.7 You are provided with a portal page (see the code and output below) where people can buy
products. Write the CGI script to enable this interactive portal. The user should specify how many of
each item to buy. The total cost of the items purchased should be displayed to the user.
            <tr>
               <td>CD</td>
               <td>Buy this really cool CD</td>
               <td>$12.00</td>
               <td><input type = "text" name = "CD" /></td>
            </tr>

            <tr>
               <td>Book</td>
               <td>Buy this really cool book</td>
               <td>$19.99</td>
               <td><input type = "text" name = "book" /></td>
            </tr>

            <tr>
               <td>Airplane</td>
               <td>Buy this really cool airplane</td>
               <td>$1,000,000</td>
               <td><input type = "text" name = "airplane" /></td>
            </tr>
         </table>

         <input type = "submit" value = "submit" />
      </form>
   </body>
</html>
6.8 Write a CGI script for a TV show survey. List five TV shows and let the survey participant rank
them with numbers from 1 (least favorite) to 5 (most favorite). Display the participant's
favorite TV show.
7
Object-Based
Programming
Objectives
• To understand the software-engineering concepts of
“encapsulation” and “data hiding.”
• To understand the notions of data abstraction and
abstract data types (ADTs).
• To create Python ADTs, namely classes.
• To understand how to create, use and destroy objects
of a class.
• To control access to object attributes and methods.
• To begin to appreciate the value of object orientation.
My object all sublime
I shall achieve in time.
W. S. Gilbert
Is it a world to hide virtues in?
William Shakespeare, Twelfth Night
Your public servants serve you right.
Adlai Stevenson
Classes struggle, some classes triumph, others are
eliminated.
Mao Zedong
This above all: to thine own self be true.
William Shakespeare, Hamlet
Outline
7.1 Introduction
7.2 Implementing a Time Abstract Data Type with a Class
7.3 Special Attributes
7.4 Controlling Access to Attributes
7.4.1 Get and Set Methods
7.4.2 Private Attributes
7.5 Using Default Arguments With Constructors
7.6 Destructors
7.7 Class Attributes
7.8 Composition: Object References as Members of Classes
7.9 Data Abstraction and Information Hiding
7.10 Software Reusability
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
7.1 Introduction
Now we begin our deeper study of object orientation. Through our discussion of Python
programs in Chapters 2–6, we have already encountered many basic concepts (i.e., “object
think”) and terminology (i.e., “object speak”). Let us briefly overview some key concepts
and terminology of object orientation. Object-oriented programming (OOP) encapsulates
(i.e., wraps) data (attributes) and functions (behaviors) into components called classes. The
data and functions of a class are intimately tied together. A class is like a blueprint. Using
a blueprint, a builder can build a house. Using a class, a programmer can create an object
(also called an instance). One blueprint can be reused many times to make many houses.
One class can be reused many times to make many objects of the same class. Classes have
a property called information hiding. This means that, although objects may know how to
communicate with one another across well-defined interfaces, one object normally should
not be allowed to know how another object is implemented—implementation details are
hidden within the objects themselves. Surely it is possible to drive a car effectively without
knowing the details of how engines, transmissions and exhaust systems work internally.
We will see why information hiding is crucial to good software engineering.
In C and other procedural programming languages, programming tends to be action-
oriented; in Python, programming can be object-oriented. In procedural programming, the
unit of programming is the function. In object-oriented programming, the unit of program-
ming is the class from which objects eventually are instantiated (i.e., created).
Procedural programmers concentrate on writing functions. Groups of actions that per-
form some task are formed into functions, and functions are grouped to form programs.
Data certainly is important in procedural programming, but the view is that data exists pri-
marily in support of the actions that functions perform. The verbs in a system specifica-
tion—a document that describes the services an application should provide—help the
procedural programmer determine the set of functions that will work together to implement
the system.
Object-oriented programmers concentrate on creating their own user-defined types,
called classes. Classes are also referred to as programmer-defined types. Each class con-
tains data and the set of functions that manipulate the data. The data components of a class
are called attributes (or data members). The functional components of a class are called
methods (or member functions, in other object-oriented languages). The focus of attention
in object-oriented programming is on classes rather than on functions. The nouns in a
system specification help the object-oriented programmer determine the set of classes that
will be used to create the objects that will work together to implement the system.
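The relationship among a class, its attributes, its methods and the objects created from it can be sketched in a few lines. The class Account and its members below are hypothetical names chosen only for illustration, and the code uses Python 3 syntax (the listings in this book use the Python 2 print statement).

```python
class Account:
    """Hypothetical bank account: data and behavior wrapped in one class."""

    def __init__(self, owner, balance):
        self.owner = owner          # attribute (data member)
        self.balance = balance      # attribute (data member)

    def deposit(self, amount):      # method (behavior)
        self.balance += amount


# One class (the blueprint) can be reused to create many independent objects.
first = Account("Ann", 100)
second = Account("Bob", 50)
first.deposit(25)

print(first.balance, second.balance)   # 125 50
```

The nouns of this tiny "specification" (account, owner, balance) became the class and its attributes, and the verb (deposit) became a method.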
Software Engineering Observation 7.1
A central theme of this book is “reuse, reuse, reuse.” We will carefully discuss a number of
techniques for “polishing” classes to encourage reuse. We focus on “crafting valuable
classes” and creating valuable “software assets.”
Fig. 7.1 Time class—contains attributes and methods for storing and displaying
time of day.
Line 5 contains the class’s optional documentation string—a string that describes the
class. If a class contains a documentation string, the string must appear in the line or lines
following the class header. A user can view a class’s documentation string by executing the
following statement
print ClassName.__doc__
Modules, methods and functions also may specify a documentation string.
Good Programming Practice 7.1
Include documentation strings, where appropriate, to enhance program clarity.
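As a brief illustration of documentation strings, the following sketch attaches one to a class and one to a function and then reads both back through __doc__. The class body and the function square are stand-ins invented for this example, and the code is written in Python 3 syntax rather than the Python 2 syntax of the book's listings.

```python
class Time:
    """Abstract data type Time (illustrative stand-in)."""


def square(x):
    """Return x multiplied by itself."""
    return x * x


# The docstring is the first statement after the class or function header.
print(Time.__doc__)
print(square.__doc__)
```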
Line 7 begins the definition for special method __init__, the constructor method of
the class. A constructor is a special method that executes each time an object of a class is
created. The constructor (method __init__) initializes the attributes of the object and
returns None. Python classes may define several other special methods, identified by
leading and trailing double-underscores (__) in the name. We discuss many of these special
methods in Chapter 8, Customizing Classes.
Common Programming Error 7.3
Returning a value other than None from a constructor is a fatal, runtime error.
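The following sketch shows the error in practice. Classes Good and Bad are hypothetical; the code uses Python 3 syntax, in which the runtime error surfaces as a TypeError when the object is created.

```python
class Good:
    def __init__(self):
        self.value = 0            # no return statement: returns None


class Bad:
    def __init__(self):
        self.value = 0
        return self.value         # a constructor must not return a value


good = Good()                     # works as expected

try:
    Bad()
    error = None
except TypeError as exc:          # raised when the constructor returns non-None
    error = exc
    print("TypeError:", exc)
```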
All methods, including constructors, must specify at least one parameter. This param-
eter represents the object of the class for which the method is called. This parameter often
is referred to as the class instance object. This term can be confusing, so we refer to the first
argument of any method as the object reference argument, or simply the object reference.
Methods must use the object reference to access attributes and other methods that belong
to the class. By convention, the object reference argument is called self.
Common Programming Error 7.4
Failure to specify an object reference (usually called self) as the first parameter in a method
definition typically causes a fatal runtime error (a TypeError about the number of arguments)
when the method is invoked.
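A minimal sketch of this error, using a hypothetical class Greeter and Python 3 syntax: because Python always passes the object as the first argument, a method defined without any parameter cannot accept it.

```python
class Greeter:
    def hello():                  # object reference parameter is missing
        return "hello"

    def helloFixed(self):         # correct: self receives the object
        return "hello"


greeter = Greeter()
print(greeter.helloFixed())

try:
    greeter.hello()               # Python still passes greeter as an argument
    error = None
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```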
Each object has its own namespace that contains the object’s methods and attributes. The
class’s constructor starts with an empty object (self) and adds attributes to the object’s
namespace. For example, the constructor for class Time (lines 7–12) adds three attributes
(hour, minute and second) to the new object’s namespace. Line 10 binds attribute hour
to the object’s namespace and initializes the attribute’s value to 0. Once an attribute has been
added to an object’s namespace, a client that uses the object may access the attribute’s value.
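The way a constructor fills an object's namespace can be observed with the built-in function vars, which returns the dictionary behind that namespace. The class below is a simplified stand-in for the Time class of Fig. 7.1, written in Python 3 syntax.

```python
class Time:
    def __init__(self):
        # Each assignment binds a new attribute in the object's namespace.
        self.hour = 0
        self.minute = 0
        self.second = 0


time1 = Time()
print(vars(time1))        # {'hour': 0, 'minute': 0, 'second': 0}

# A client can read an attribute once it has been added.
print(time1.hour)         # 0
```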
One of the fundamental principles of good software engineering is that a client should
not need to know how a class is implemented to use that class. Python’s use of modules
facilitates this data abstraction—the program in Fig. 7.2 simply imports the Time defini-
tion and uses class Time without knowing how the class is implemented.
Software Engineering Observation 7.3
Clients of a class do not need access to the class’s source code to use the class.
To create an object of class Time, simply “call” the class name as if it were a function
(line 6). This call invokes the constructor for class Time. Even though the class definition
stipulates that the constructor (__init__) takes one argument, line 6 does not pass any
arguments to the constructor. Python inserts the first (object reference) argument into every
method call, including a class’s constructor call. The constructor initializes the object’s
attributes. Once the constructor exits, Python assigns the newly created object to time1.
Client code must access an object’s attributes through a reference to that object. Lines
10–12 demonstrate how a program can access an object’s attributes through the dot (.)
access operator. The name of the object appears to the left of the dot, and the attribute
appears to the right of the dot. The output demonstrates the initial values that the con-
structor assigned to attributes hour, minute and second.
Client code can access an object’s methods in a similar manner. Line 16 calls time1’s
printMilitary method. Notice again that the method call passes no arguments, even
though the method definition specifies one parameter called self. Python passes a refer-
ence to time1 in the printMilitary call, so the method may access the object’s
attributes.
Line 23 modifies the value assigned to attribute time1.hour. The output from lines
24–25 shows a problem that often arises when a client indiscriminately accesses an object’s
data. The meaning of attribute hour is unclear, because that data member now has a value
of 25. We say that the data member is in an inconsistent state (it contains an invalid value).
Some other programming languages provide ways to prevent a client from accessing an
object’s data. Python, on the other hand, does not provide such strict programming con-
structs. Later in this chapter, we discuss the various ways Python programmers ensure that
an object’s data remains in a consistent state.
Common Programming Error 7.5
Directly accessing an object’s attributes may cause the data to enter an inconsistent state.
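The client behavior described above can be reproduced with a short sketch. The class is a simplified stand-in for the Time class of Fig. 7.1; method militaryString is a hypothetical helper that returns the formatted time instead of printing it, and the code uses Python 3 syntax.

```python
class Time:
    """Simplified stand-in for the Time class of Fig. 7.1."""

    def __init__(self):
        self.hour = 0
        self.minute = 0
        self.second = 0

    def militaryString(self):
        """Return the time as hh:mm:ss (hypothetical helper)."""
        return "%02d:%02d:%02d" % (self.hour, self.minute, self.second)


time1 = Time()                      # "calling" the class invokes __init__
print(time1.hour, time1.minute, time1.second)    # 0 0 0

# Python supplies the object reference, so these two calls are equivalent.
print(time1.militaryString())       # 00:00:00
print(Time.militaryString(time1))   # 00:00:00

time1.hour = 25                     # nothing prevents this assignment
print(time1.militaryString())       # 25:00:00, not a valid time of day
```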
Attribute     Description
__bases__     A tuple that contains base classes from which the class directly inherits.
              If the class does not inherit from other classes, the tuple is empty.
              [Note: We discuss base classes and inheritance in Chapter 9,
              Object-Oriented Programming: Inheritance.]
__dict__      A dictionary that corresponds to the class’s namespace. Each key-value
              pair represents an identifier and its value in the namespace.
__doc__       A class’s docstring. If the class does not specify a docstring, the value
              is None.
__module__    A string that contains the module (file) name in which the class is
              defined.
__name__      A string that contains the class’s name.

Attribute     Description
__class__     A reference to the class from which the object was instantiated.
__dict__      A dictionary that corresponds to the object’s namespace. Each key-value
              pair represents an identifier and its value in the namespace.
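The special attributes listed in the two tables can be inspected directly. The sketch below uses a stand-in Time class and Python 3 syntax. One difference from the tables is worth noting: in Python 3 every class inherits from object, so __bases__ is (object,) rather than an empty tuple for a class that names no base class.

```python
class Time:
    """Stand-in Time class."""

    def __init__(self):
        self.hour = 0


time1 = Time()

# Special attributes of the class
print(Time.__name__)             # Time
print(Time.__doc__)              # Stand-in Time class.
print(Time.__module__)           # module in which the class is defined
print(Time.__bases__)            # (<class 'object'>,) in Python 3
print("__init__" in Time.__dict__)   # True

# Special attributes of an object
print(time1.__class__ is Time)   # True
print(time1.__dict__)            # {'hour': 0}
```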
tion. More specifically, a method that sets data member interestRate typically would be
named setInterestRate, and a method that gets the interestRate typically would
be named getInterestRate. Get methods also are called “query” methods.
It may seem that providing both set and get capabilities provides no benefit over
accessing the attributes directly, but there is a subtle difference. A get method seems to allow
clients to read the data at will, but the get method can control the formatting of the data. A set
method can—and most likely should—scrutinize attempts to modify the value of the
attribute. This ensures that the new value is appropriate for that data item. For example, a set
method can reject the following values: the value 37 as the date, a negative value as a person’s
body weight and the value 185 on an exam (when the grade range is 0–100).
Software Engineering Observation 7.4
Controlling access, especially write access, to attributes through access methods helps ensure
data integrity.
A class’s set methods sometimes return values that indicate attempts were made to
assign invalid data to an object of the class. Clients of the class then test the return values
of set methods to determine whether the object they are manipulating is a valid object and to
take appropriate actions if the object is not valid. Alternatively, a set method may specify
that an error message—called an exception—be sent (“raised”) to the client when the client
attempts to assign an invalid value to an attribute. Raising exceptions is a topic we explore
in detail in Chapter 12, Exception Handling. Exceptions are the preferred technique for
handling invalid attribute values in Python.
Good Programming Practice 7.5
Methods that set the values of data should verify that the intended new values are proper. If
they are not, the set methods should indicate that an error has occurred.
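A small sketch of the two roles described above: the set method scrutinizes the new value and the get method controls formatting. Class SavingsAccount, the valid range 0.0–1.0 and the percentage format are assumptions made for this example; only the method names setInterestRate and getInterestRate come from the text. The code uses Python 3 syntax.

```python
class SavingsAccount:
    """Hypothetical class with access methods for interestRate."""

    def __init__(self):
        self.interestRate = 0.0

    def setInterestRate(self, rate):
        """Set method: reject values outside the assumed range 0.0-1.0."""
        if not 0.0 <= rate <= 1.0:
            raise ValueError("Invalid interest rate: %r" % rate)
        self.interestRate = rate

    def getInterestRate(self):
        """Get method: control the formatting of the returned data."""
        return "%.1f%%" % (self.interestRate * 100)


account = SavingsAccount()
account.setInterestRate(0.035)
print(account.getInterestRate())      # 3.5%

try:
    account.setInterestRate(-1)       # rejected; the old value is kept
except ValueError as error:
    print(error)
```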
Fig. 7.7 Access methods defined for class Time.
Notice that the constructor creates attributes with single leading underscores (_) in
lines 10–12. Attribute names that begin with a single underscore have no special meaning
in the syntax of the Python language itself. However, the single leading underscore is a con-
vention among Python programmers who use the class. When a class author creates an
attribute with a single leading underscore, the author does not want users of the class to
access the attribute directly. If a program requires access to the attributes, the class author
provides some other means for doing so. In this case, we provide access methods through
which clients should manipulate the data.
Good Programming Practice 7.6
An attribute with a single leading underscore conveys information about a class’s interface.
Clients of a class that defines such attributes should access and modify the attributes’ values
only through the access methods that the class provides. Failing to do so often causes
unexpected errors to occur during program execution.
Method setTime (lines 14–19) is the set method that clients should use to set all
values in an object’s time. This method receives as arguments values for attributes _hour,
_minute and _second. Methods setHour (lines 21–27), setMinute (lines 29–35)
and setSecond (lines 37–43) are set methods for the individual attributes. These
methods provide more flexibility to clients that modify the time.
Method setHour (lines 21–27) changes an object’s _hour attribute. The method
checks whether the value passed as a parameter is in the range 0–23, inclusive. If the hour
is valid, the method updates attribute _hour with the new value. Otherwise, the method
raises an exception, to indicate that the client has attempted to place the object’s data in an
inconsistent state. An exception is a Python object that indicates a special event (most often
an error) has occurred. For example, when a program attempts to access a nonexistent dic-
tionary key, Python raises an exception.
When an exception is raised, a program can either catch the exception and handle it or
let the exception go uncaught, in which case the program prints an error message and
terminates immediately. Catching and handling an exception enables a program to recognize
and potentially fix errors that might otherwise cause a program to terminate. For example,
a client that uses class Time can catch an exception and detect that the program has
attempted to place data in an inconsistent state (i.e., set an invalid time). Catching and han-
dling exceptions is a broad topic that we discuss in detail in Chapter 12, Exception Han-
dling. For now, we discuss only how to raise an exception to indicate invalid data
assignments and prevent data corruption.
The statement in line 27 uses keyword raise to raise an exception. The keyword
raise is followed by the name of the exception, a comma and a value that the exception
object stores as an attribute. When Python executes a raise statement, an exception is
raised; if the exception is not caught, Python prints an error message that contains the name
of the exception and the exception’s attribute value, as shown in Fig. 7.8.
The remaining set methods, setMinute (lines 29–35) and setSecond (lines 37–43),
change attributes _minute and _second, respectively. Each method ensures that the
value remains in the range 0–59, inclusive. If the value is invalid, the method raises
an exception and specifies an appropriate error-message argument.
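The set methods discussed above can be sketched as follows. This is a shortened approximation of the class in Fig. 7.7 (setSecond and the print methods are omitted), not the figure itself. It is written in Python 3, where an exception is raised with the call form raise ValueError( message ) rather than the Python 2 form raise ValueError, message used in the book's listing. The try/except at the end previews the catching technique that Chapter 12 covers.

```python
class Time:
    """Shortened approximation of the access methods in Fig. 7.7."""

    def __init__(self):
        # Single leading underscore: clients should use the access methods.
        self._hour = 0
        self._minute = 0

    def setHour(self, hour):
        if 0 <= hour < 24:
            self._hour = hour
        else:
            raise ValueError("Invalid hour value: %d" % hour)

    def setMinute(self, minute):
        if 0 <= minute < 60:
            self._minute = minute
        else:
            raise ValueError("Invalid minute value: %d" % minute)

    def getHour(self):
        return self._hour

    def getMinute(self):
        return self._minute


time1 = Time()
time1.setHour(13)

try:
    time1.setMinute(99)               # invalid: the object is left unchanged
except ValueError as error:
    print(error)                      # Invalid minute value: 99

print(time1.getHour(), time1.getMinute())    # 13 0
```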
Lines 45–58 contain the get methods for class Time. Clients use these methods
(getHour, getMinute and getSecond) to retrieve the values of attributes _hour,
_minute and _second, respectively. The remainder of the class definition does not
differ from the previous definition we presented.
Software Engineering Observation 7.8
If a class provides access methods for its data, clients should use only access methods to
retrieve/modify data. This “agreement” between class and client helps maintain data in a
consistent state.
Figure 7.9 contains a driver for modified class Time. A driver is a program that tests
a class’s interface. Lines 4–6 import class Time from module Time2 and create an
object of the class. Lines 9–12 call methods printMilitary and printStandard to
display the initial time values of the object.
13
14 # change time
15 time1.setTime( 13, 27, 6 )
16 print "\n\nMilitary time after setTime is",
17 time1.printMilitary()
18 print "\nStandard time after setTime is",
19 time1.printStandard()
20
21 time1.setHour( 4 )
22 time1.setMinute( 3 )
23 time1.setSecond( 34 )
24 print "\n\nMilitary time after setHour, setMinute, setSecond is",
25 time1.printMilitary()
26 print "\nStandard time after setHour, setMinute, setSecond is",
27 time1.printStandard()
Line 15 calls time1’s method setTime, passing values that correspond to 1:27:06
PM, to change the object’s time values. Lines 16–19 call the appropriate methods to display
the formatted times. The interactive session in Fig. 7.10 creates an object of class Time and
calls method setTime to attempt to place the object’s data in an inconsistent state. Each call
to method setTime contains an invalid value, and each call results in an error message.
>>>
>>> time1.setMinute( 99 )
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "Time2.py", line 35, in setMinute
    raise ValueError, "Invalid minute value: %d" % minute
ValueError: Invalid minute value: 99
>>>
>>> time1.setSecond( -99 )
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "Time2.py", line 43, in setSecond
    raise ValueError, "Invalid second value: %d" % second
ValueError: Invalid second value: -99
Fig. 7.10 set method called with invalid values. (Part 2 of 2.)
time1.hour = 25
To prevent such access, we prefix the name of the attribute with two underscore char-
acters (__). When Python encounters an attribute name that begins with two underscores,
the interpreter performs name mangling on the attribute, to prevent indiscriminate access
to the data. Name mangling changes the name of an attribute by including information
about the class to which the attribute belongs. For example, if the Time constructor con-
tained the line
self.__hour = 0
print private.publicData
behaves as expected—Python prints the value of the public attribute. When we write the
statement
print private.__privateData
Python prints an error message which explains that class PrivateClass does not con-
tain an attribute called __privateData. We prefixed our attribute name with double un-
derscores, so Python changed the name of the attribute in the class definition.
However, we can still access the data, because we know that Python renames attribute
__privateData to attribute _PrivateClass__privateData. Therefore, the line
print private._PrivateClass__privateData
successfully prints the value assigned to the private attribute. The final two statements in
the session demonstrate that private data may be modified in the same way as public data.
However, accessing and modifying private attributes in this manner violates the data en-
capsulation the class author intended. A client should never perform such a manipulation,
but instead should use any access methods the class provides.
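The name-mangling behavior described in this section can be reproduced with a sketch such as the following. The names PrivateClass, publicData and __privateData follow the text; the assigned values are made up for illustration, and the code uses Python 3 syntax.

```python
class PrivateClass:
    def __init__(self):
        self.publicData = "public"        # accessible under its own name
        self.__privateData = "private"    # stored under a mangled name


private = PrivateClass()
print(private.publicData)

try:
    print(private.__privateData)          # no attribute with this name exists
except AttributeError as error:
    print("AttributeError:", error)

# The mangled name still works, although clients should not rely on it.
print(private._PrivateClass__privateData)
print(sorted(vars(private)))   # ['_PrivateClass__privateData', 'publicData']
```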
Software Engineering Observation 7.11
Make private any data that the client should not access.
Python programmers use private attributes for different reasons. Some programmers
use private attributes to avoid common scoping problems that may arise in inheritance hier-
archies. [Note: We discuss inheritance in Chapter 9, Object-Oriented Programming: Inher-
itance.] Other programmers use private attributes for data or methods the client should
never access. These attributes or methods are essential to the inner workings of the class,
but are not part of the class’s interface. For example, a class author might designate a utility
method by prepending the method name with two underscores. In this chapter, we use pri-
vate attributes to demonstrate access methods and to introduce a basic data integrity tech-
nique. In the next chapter, we discuss other ways to ensure data integrity. The techniques
we discuss in the next chapter allow programmers to use public access syntax but also to
take advantage of the data integrity provided by access methods. This practice enables pro-
grammers to add data integrity to a project as the project grows and matures, without having
to change the interface on which the project’s clients have come to rely.
Fig. 7.13 Default constructor defined for class Time.
In this example, the constructor invokes method setTime with the values passed to
the constructor (or the default values). The class uses private attributes to store data. As
with the previous definition of Time, setTime uses the class’s other methods, which
ensure that the value supplied for __hour is in the range 0–23 and that the values for
__minute and __second are each in the range 0–59. If a value is out of range, the
appropriate method raises an exception (this is an example of ensuring that a data member
remains in a consistent state).
The Time constructor could have included the same statements as method setTime.
This may be slightly more efficient, because the extra call to setTime is eliminated.
Coding the Time constructor and method setTime identically, however, makes maintaining
this class more difficult: if the implementation of method setTime changes, the
implementation of the Time constructor must change accordingly. Because the Time
constructor instead calls setTime directly, any change to the implementation of setTime
needs to be made only once. This reduces the likelihood of a programming error when
altering the implementation.
Software Engineering Observation 7.12
If a method of a class already provides all or part of the functionality required by a construc-
tor (or other method) of the class, call that method from the constructor (or other method).
This simplifies the maintenance of the code and reduces the likelihood of an error if the
implementation of the code is modified. As a general rule: Avoid repeating code.
Figure 7.14 initializes four objects of class Time (defined in Fig. 7.13)—one with all
three arguments defaulted in the constructor call, one with one argument specified, one with
two arguments specified and one with three arguments specified. The values of each object’s
attributes after initialization are displayed by calling printTimeValues (lines 6–10).
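A condensed sketch of a constructor with default arguments follows. Unlike the class in Fig. 7.13, it validates all three values inside setTime instead of delegating to separate set methods, and militaryString is a hypothetical helper. The argument values are chosen only for illustration, and the code uses Python 3 syntax.

```python
class Time:
    """Condensed sketch of a Time class with a default constructor."""

    def __init__(self, hour=0, minute=0, second=0):
        # Reuse setTime so that validation is written only once.
        self.setTime(hour, minute, second)

    def setTime(self, hour, minute, second):
        if not (0 <= hour < 24 and 0 <= minute < 60 and 0 <= second < 60):
            raise ValueError(
                "Invalid time: %d:%d:%d" % (hour, minute, second))
        self.__hour = hour
        self.__minute = minute
        self.__second = second

    def militaryString(self):
        """Return the time as hh:mm:ss (hypothetical helper)."""
        return "%02d:%02d:%02d" % (
            self.__hour, self.__minute, self.__second)


# Zero, one, two and three arguments supplied to the constructor.
times = [Time(), Time(2), Time(21, 34), Time(12, 25, 42)]

for time in times:
    print(time.militaryString())
# 00:00:00
# 02:00:00
# 21:34:00
# 12:25:42
```

Because the constructor delegates to setTime, an out-of-range argument such as Time( 24 ) raises the same exception as a later call to setTime would.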
If no constructor is defined for a class, the interpreter creates a default constructor (i.e.,
one that can be called with no arguments). However, the constructor that Python provides
does not perform any initialization, so, when an object is created, the object is not guaran-
teed to be in a consistent state.
7.6 Destructors
A constructor is a method that initializes a newly created object. Conversely, a destructor
executes when an object is destroyed (e.g., after no more references to the object exist). A
class can define a special method called __del__ that executes when the last reference to
an object is deleted or goes out of scope.¹ The method itself does not actually destroy the
object—it performs termination housekeeping before the interpreter reclaims the object’s
memory, so that memory may be reused. A destructor normally specifies no parameters
other than self and returns None.
We have not defined method __del__ for the classes presented to this point. In programming
languages such as C++, destructors often release and recycle memory. Python
handles most of these issues for the programmer, so __del__ normally is not included in
the class definition. Occasionally, a class defines __del__ to close a network or a data-
base connection before destroying an object. We discuss these issues throughout the text,
as appropriate. In the next section, we define method __del__ for a class, to maintain a
count of all objects of the class that have been created.
1. Actually, there are some cases in which __del__ does not execute immediately after the last ref-
erence to an object is deleted. However, in most cases, it is safe to assume that the method executes
when expected. See www.python.org/doc/current/ref/customization.html for
more information.
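The timing of __del__ can be observed with a small sketch. Class Connection and the list closed are hypothetical names introduced only to make the destructor's effect visible. The code uses Python 3 syntax, and the behavior shown assumes the standard CPython interpreter, which runs __del__ as soon as the last reference disappears; as the footnote indicates, this timing is not guaranteed in every case.

```python
closed = []     # records names so that the effect of __del__ is observable


class Connection:
    """Hypothetical resource that performs housekeeping in __del__."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        closed.append(self.name)      # termination housekeeping


connection = Connection("database")
alias = connection                    # second reference to the same object

del connection                        # one reference remains
print(closed)                         # [] (destructor has not run)

del alias                             # last reference is gone
print(closed)                         # ['database'] on CPython
```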
Although class attributes may seem like global variables, each class attribute resides
in the namespace of the class in which it is created. Class attributes should be initialized
once (and only once) in the class definition. A class’s class attributes can be accessed
through any object of that class. A class’s class attributes also exist even when no object of
that class exists. To access a class attribute when no object of the class exists, prefix the
class name, followed by a period, to the name of the attribute.
Software Engineering Observation 7.13
A class’s class attributes can be used even if no objects of that class have been instantiated.
Class Employee (Fig. 7.15) demonstrates how to define a class attribute that main-
tains a count of the number of objects of the class that have been instantiated. The class
attribute count is initialized to 0 in the class definition (line 7). Notice that the creation of
class attribute count appears in the body of the class definition, not inside a method. The
statement has the effect of defining a new variable named count, with value 0, and adding
that variable to class Employee’s namespace.
Figure 7.16 accesses Employee’s class attribute. Class attribute count maintains a
count of the number of existing objects of class Employee and can be accessed whether
or not objects of class Employee exist. If no objects of the class exist, a program can ref-
erence count through the class name (line 7). Lines 10–11 create two Employee objects.
When each Employee object is created, its constructor is called. In the output, notice that
creating identifier employee3 (line 12) does not create a new object of class Employee
and therefore does not call Employee’s constructor. The statement simply binds a new
name to the object created in line 10, so that employee3 and employee1 refer to the
same object. Lines 18–20 use keyword del to delete all references to the two Employee
objects. Method __del__ for the object created in line 10 does not execute until the last
reference to that object is deleted in line 20.
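The counting behavior discussed above might look like the following sketch. It is written for Python 3 and is not the code of Fig. 7.15 or Fig. 7.16; the attribute names and the sample employee names are assumptions chosen for illustration. The comments about when __del__ runs describe CPython's reference counting, which other interpreters need not share.

```python
class Employee:
    """Sketch of a class that counts its existing objects (illustrative only)."""

    count = 0  # class attribute shared by every Employee object

    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name
        Employee.count += 1  # one more object exists

    def __del__(self):
        Employee.count -= 1  # one fewer object exists


print(Employee.count)            # no objects yet; access through the class name

employee1 = Employee("Susan", "Baker")
employee2 = Employee("Robert", "Jones")
employee3 = employee1            # binds a new name; the constructor is not called

print(Employee.count)            # two objects, although three names refer to them

del employee1                    # employee3 still refers to the first object
del employee3                    # on CPython, __del__ runs once this last reference goes
del employee2
print(Employee.count)
```

Note that del removes a name, not an object; the object is destroyed only after its last reference disappears.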
The next example uses class Date (Fig. 7.17) and a modified class Employee (Fig. 7.18) to
demonstrate references to objects as members of other objects. Class Employee contains
attributes firstName, lastName, birthDate and hireDate. Attributes
birthDate and hireDate are objects of class Date, which contains attributes month,
day and year. The program (Fig. 7.19) instantiates an object of class Employee and ini-
tializes and displays its attributes.
In Fig. 7.18, the Employee constructor (lines 9–20) takes nine arguments—self,
firstName, lastName, birthMonth, birthDay, birthYear, hireMonth,
hireDay and hireYear—and creates objects of class Date from the last six argu-
ments. Arguments birthMonth, birthDay and birthYear are passed to object
birthDate’s constructor, and arguments hireMonth, hireDay and hireYear are
passed to object hireDate’s constructor. Class Date and class Employee each define
method __del__ to print a message when an object of class Date or an object of class
Employee is destroyed, respectively.
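The composition relationship described above can be sketched in a few lines. This is a simplified Python 3 illustration, not the code of Figs. 7.17 to 7.19: it omits the __del__ methods and any date validation, and the snake_case attribute names and the describe method are assumptions made for the example.

```python
class Date:
    """Simplified date holder (no validation; illustrative only)."""

    def __init__(self, month, day, year):
        self.month = month
        self.day = day
        self.year = year

    def __str__(self):
        return "%d/%d/%d" % (self.month, self.day, self.year)


class Employee:
    """Employee whose attributes include references to Date objects."""

    def __init__(self, first_name, last_name,
                 birth_month, birth_day, birth_year,
                 hire_month, hire_day, hire_year):
        self.first_name = first_name
        self.last_name = last_name
        # Composition: the last six arguments are passed on to Date's constructor.
        self.birth_date = Date(birth_month, birth_day, birth_year)
        self.hire_date = Date(hire_month, hire_day, hire_year)

    def describe(self):
        """Return the employee information as a multi-line string."""
        return "%s, %s\nHired: %s\nBirth date: %s" % (
            self.last_name, self.first_name, self.hire_date, self.birth_date)


employee = Employee("Bob", "Jones", 7, 24, 1949, 3, 12, 1988)
print(employee.describe())
```

The Employee object does not copy the dates; it holds references to two separate Date objects, each of which manages its own month, day and year.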
21
22     def __del__( self ):
23         """Called before Employee destruction"""
24
25         print "Employee object about to be destroyed: %s, %s" \
26             % ( self.lastName, self.firstName )
27
28     def display( self ):
29         """Prints employee information"""
30
31         print "%s, %s" % ( self.lastName, self.firstName )
32         print "Hired:",
33         self.hireDate.display()
34         print "Birth date:",
35         self.birthDate.display()
Jones, Bob
Hired: 3/12/1988
Birth date: 7/24/1949
ping the dish off the stack). Stacks are known as last-in, first-out (LIFO) data structures—the
last item pushed (inserted) on the stack is the first item popped (removed) from the stack.
Stacks can easily be implemented with lists, and in fact, Python lists contain methods
that programmers can use to make lists “act” like stacks. (We also implement our own class
Stack in Chapter 22, Data Structures.) A client of a stack class need not be concerned with
the stack’s implementation. The client knows only that when data items are placed in the
stack, these items will be recalled in last-in, first-out order. The client cares about what
functionality a stack offers, but not about how that functionality is implemented. This con-
cept is referred to as data abstraction. Although programmers might know the details of a
class’s implementation, they should not write code that depends on these details. This
enables a particular class (such as one that implements a stack and its operations, push and
pop) to be replaced with another version without affecting the rest of the system. As long
as the services of the class do not change (i.e., every method still has the same name, returns
the same type of value and defines the same parameter list in the new class definition), the
rest of the system is not affected.
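To make the idea of data abstraction concrete, here is a minimal sketch of a stack built on a Python list. It is written for Python 3 and is not the Stack class of Chapter 22; the class name, the method names and the _items attribute are assumptions made for this illustration.

```python
class Stack:
    """List-backed stack; clients use only push, pop and is_empty."""

    def __init__(self):
        self._items = []  # implementation detail that clients should not touch

    def push(self, item):
        """Place an item on top of the stack."""
        self._items.append(item)

    def pop(self):
        """Remove and return the most recently pushed item."""
        return self._items.pop()

    def is_empty(self):
        """Return True if the stack holds no items."""
        return not self._items


dishes = Stack()
for dish in ("plate", "bowl", "saucer"):
    dishes.push(dish)

while not dishes.is_empty():
    print(dishes.pop())  # items come back in last-in, first-out order
```

A client written against push, pop and is_empty would keep working if the list were later replaced by, say, a linked structure, because none of those method names, parameters or return values would change.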
The job of a high-level language is to create a view convenient for programmers to use.
There is no single accepted standard view—that is one reason why there are so many pro-
gramming languages. Object-oriented programming in Python presents yet another view.
Most programming languages emphasize actions. In these languages, data exists to
support the actions that programs must take. Data is “less interesting” than actions. Data is
“crude.” Only a few built-in data types exist, and it is difficult for programmers to create
their own data types. The object-oriented style of programming in Python elevates the
importance of data. The primary activity of object-oriented programming in Python is the
creation of data types (i.e., classes) and the expression of the interactions among objects of
those data types. To create languages that emphasize data, the programming-languages
community needed to formalize some notions about data. The formalization we consider
here is the notion of abstract data types (ADTs). ADTs receive as much attention today as
structured programming did decades earlier. ADTs, however, do not replace structured pro-
gramming. Rather, they provide an additional formalization to improve the program-devel-
opment process.
Consider the built-in integer type, which most people would associate with an integer in
mathematics. Rather, the integer type is an abstract representation of an integer. Unlike math-
ematical integers, computer integers are fixed in size. For example, the integer type on some
computers is limited approximately to the range –2 billion to +2 billion. If the result of a cal-
culation falls outside this range, an error occurs, and the computer responds in some machine-
dependent manner. It might, for example, “quietly” produce an incorrect result. Mathematical
integers do not have this problem. Therefore, the notion of a computer integer is only an
approximation of the notion of a real-world integer. The same is true of the floating-point type
and other built-in types.
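The "quiet" incorrect result mentioned above can be imitated in a short sketch. Integers in Python 3 grow as needed and do not overflow, so the example below simulates a 32-bit signed integer by hand; the helper name wrap_int32 is hypothetical and exists only for this illustration.

```python
def wrap_int32(value):
    """Reduce value to what a 32-bit two's-complement integer would hold."""
    value &= 0xFFFFFFFF              # keep only the low 32 bits
    if value >= 0x80000000:          # the top bit set means a negative number
        value -= 0x100000000
    return value


largest = 2147483647                 # 2**31 - 1, roughly +2 billion

print(largest + 1)                   # the mathematical result
print(wrap_int32(largest + 1))       # what a fixed-size 32-bit integer would store
```

The second value is far from the mathematical answer, which shows why a fixed-size computer integer is only an approximation of a mathematical integer.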
We have taken the notion of the integer type for granted until this point, but we now
consider it from a new perspective. Types such as integers, floating-point numbers, strings and others
are all examples of abstract data types. These types are representations of real-world
notions to some satisfactory level of precision within a computer system.
An ADT actually captures two notions: A data representation and the operations that
can be performed on that data. For example, in Python, an integer contains an integer value
(data) and provides addition, subtraction, multiplication, division and modulus operations;
Another abstract data type we discuss is a queue, which is similar to a “waiting line.”
Computer systems use many queues internally. A queue offers well-understood behavior
to its clients: Clients place items in a queue one at a time via an enqueue operation, then get
those items back one at a time via a dequeue operation. A queue returns items in first-in,
first-out (FIFO) order, which means that the first item inserted in a queue is the first item
removed. Conceptually, a queue can become infinitely long, but real queues are finite.
The queue hides an internal data representation that keeps track of the items currently
waiting in line, and it offers a set of operations to its clients (enqueue and dequeue). The cli-
ents are not concerned about the implementation of the queue—clients simply depend upon
the queue to operate “as advertised.” When a client enqueues an item, the queue should accept
that item and place it in some kind of internal FIFO data structure. Similarly, when the client
wants the next item from the front of the queue, the queue should remove the item from its
internal representation and deliver the item in FIFO order (i.e., the item that has been in the
queue the longest should be the next one returned by the next dequeue operation).
The queue ADT guarantees the integrity of its internal data structure. Clients cannot
manipulate this data structure directly—only the queue ADT has access to its internal data.
Clients are able to perform only allowable operations on the data representation; the ADT
rejects operations that its interface does not provide. This could mean issuing an error mes-
sage, terminating execution, raising an exception (as discussed in Chapter 12, Exception
Handling) or simply ignoring the operation request.
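The queue behavior described above can be sketched as a small class. This Python 3 illustration is one possible design, not an implementation from the text: the class and method names are assumptions, the internal representation is a collections.deque from the standard library, and raising IndexError on an empty queue is just one of the responses the text lists.

```python
from collections import deque


class Queue:
    """FIFO queue; clients use only enqueue, dequeue and len()."""

    def __init__(self):
        self._items = deque()  # internal representation, hidden by convention

    def enqueue(self, item):
        """Add an item at the back of the queue."""
        self._items.append(item)

    def dequeue(self):
        """Remove and return the item that has waited the longest."""
        if not self._items:
            raise IndexError("dequeue from an empty queue")
        return self._items.popleft()

    def __len__(self):
        return len(self._items)


line = Queue()
for customer in ("first", "second", "third"):
    line.enqueue(customer)

while len(line) > 0:
    print(line.dequeue())  # items come back in first-in, first-out order
```

Python does not enforce the hiding of _items; the single leading underscore only signals that clients should go through enqueue and dequeue.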
rity, how all object attributes may be accessed directly by the client, how the single leading
underscore (_) indicates that clients should not access attributes and how the double
leading underscore (__) mangles an attribute’s name to prevent casual attribute access.
Python’s direct attribute access encourages rapid application development and facilitates
dynamic introspection; however, direct access is often insufficient for large-scale software
projects. In the next chapter, we discuss how class authors can ensure data integrity, while
still taking advantage of direct access syntax. This data integrity functionality can be added
to the class without changing the interface the client uses to access an object’s data. This
promotes both the safe, modular programming techniques and rapid development practices
that Python programmers desire.
SUMMARY
• Object-oriented programming (OOP) encapsulates (i.e., wraps) data (attributes) and functions (be-
haviors) into components called classes. The data and functions of a class are intimately tied together.
• A class is like a blueprint. Using a blueprint, a builder can build a house. Using a class, a program-
mer can create an object (also called an instance).
• Classes have a property called information hiding. Although objects may know how to communi-
cate with one another across well-defined interfaces, one object normally should not be allowed
to know how another object is implemented—implementation details are hidden within the objects
themselves.
• In procedural programming, the unit of programming is the function. In object-oriented program-
ming, the unit of programming is the class from which objects eventually are instantiated.
• Procedural programmers concentrate on writing functions. The verbs in a system specification
help the procedural programmer determine the set of functions that will work together to imple-
ment the system.
• Object-oriented programmers concentrate on creating their own user-defined types, called classes.
The nouns in a system specification help the object-oriented programmer determine the set of
classes that will be used to create the objects that will work together to implement the system.
• Classes simplify programming because the clients need to be concerned only with the operations
encapsulated or embedded in the object—the object interface.
• Keyword class begins a class definition. The keyword is followed by the name of the class,
which is followed by a colon (:). The line that contains keyword class and the class name is
called the class’s header.
• The body of the class is an indented code block that contains methods and attributes that belong
to the class.
• A class’s optional documentation string describes the class. If a class contains a documentation
string, the string must appear in the line or lines following the class header.
• Method __init__ is the constructor method of a class. A constructor is a special method that
executes each time an object of a class is created. The constructor initializes the attributes of the
object and returns None.
• All methods, including constructors, must specify at least one parameter—the object reference.
This parameter represents the object of the class for which the method is called. Methods must use
the object reference to access attributes and other methods that belong to the class.
• By convention, the object reference argument is called self.
• Each object has its own namespace that contains the object’s methods and attributes. The class’s
constructor starts with an empty object (self) and adds attributes to the object’s namespace.
• Once a class has been defined, programs can create objects of that class. Programmers can create
objects as necessary. This is one reason why Python is said to be an extensible language.
• One of the fundamental principles of good software engineering is that a client should not need to
know how a class is implemented to use that class. Python’s use of modules facilitates this data
abstraction—a program can import a class definition and use the class without knowing how the
class is implemented.
• To create an object of a class, simply “call” the class name as if it were a function. This call invokes
the constructor for the class.
• Classes and objects of classes both have special attributes that can be manipulated. These at-
tributes, which Python creates when a class is defined or when an object of a class is created, pro-
vide information about the class or object of a class to which they belong.
• Directly accessing an object’s data can leave the data in an inconsistent state.
• Most object-oriented programming languages allow an object to prevent its clients from accessing
the object’s data directly. However, in Python, the programmer uses attribute naming conventions
to hide data from clients.
• Although a client can access an object’s data directly (and perhaps cause the data to enter an in-
consistent state), a programmer can design classes to encourage correct use. One technique is for
the class to provide access methods through which the data of the class can be read and written in
a carefully controlled manner.
• Predicate methods are read-only access methods that test the validity of a condition.
• When a class defines access methods, a client should access an object’s attributes only through
those access methods.
• Classes often provide methods that allow clients to set or get the values of attributes. Although
these methods need not be called set and get, they often are. Get methods also are called “query”
methods.
• A get method can control the formatting of the data. A set method can—and most likely should—
scrutinize attempts to modify the value of the attribute. This ensures that the new value is appro-
priate for that data item.
• A set method may specify that an error message—called an exception—be raised to the client
when the client attempts to assign an invalid value to an attribute.
• When a class author creates an attribute with a single leading underscore, the author does not want
users of the class to access the attribute directly. If a program requires access to the attributes, the
class author provides some other means for doing so.
• Python comparisons may be chained. The chaining syntax enables programmers to write comparison
expressions in familiar, arithmetic terms.
• When an exception is raised, a program can either catch the exception and handle it, or let the exception
go uncaught, in which case the program prints an error message and terminates immediately.
• The keyword raise is followed by the name of the exception, a comma and a value that the ex-
ception object stores as an attribute. When Python executes a raise statement, an exception is
raised. If the exception is not caught, Python prints an error message that contains the name of the
exception and the exception’s attribute value.
• In programming languages such as C++ and Java, a class may state explicitly which attributes or
methods may be accessed by clients of the class. These attributes or methods are said to be public.
Attributes and methods that may not be accessed by clients of the class are said to be private.
• To prevent indiscriminate attribute access, prefix the name of the attribute with two underscore
characters (__).
• When Python encounters an attribute name that begins with two underscores, the interpreter per-
forms name mangling on the attribute, to prevent indiscriminate access to the data. Name man-
gling changes the name of an attribute by including information about the class to which the
attribute belongs.
• Constructors can define default arguments that specify initial values for an object’s attributes, if
the client does not specify an argument at construction time.
• Constructors can define keyword arguments that enable the client to specify values for only cer-
tain, named arguments.
• Programmer-supplied constructors that default all their arguments (or explicitly require no arguments)
are also called default constructors.
• If no constructor is defined for a class, the interpreter creates a default constructor. However, the
constructor that Python provides does not perform any initialization, so, when an object is created,
the object is not guaranteed to be in a consistent state.
• A destructor executes when an object is destroyed (e.g., after no more references to the object
exist).
• A class can define a special method called __del__ that executes when the last reference to an
object is deleted or goes out of scope. A destructor normally specifies no parameters other than
self and returns None.
• A class attribute represents “class-wide” information (i.e., a property of the class, not of a specific
object of the class).
• Although class attributes may seem like global variables, each class attribute resides in the
namespace of the class in which it is created. Class attributes should be initialized once (and only
once) in the class definition.
• A class’s class attributes can be accessed through any object of that class. A class’s class attributes
also exist even when no object of that class exists. To access a class attribute when no object of
the class exists, prefix the class name, followed by a period, to the name of the attribute.
• Sometimes, a programmer needs objects whose attributes are themselves references to objects of
other classes. Such a capability is called composition.
• Stacks are known as last-in, first-out (LIFO) data structures—the last item pushed (inserted) on
the stack is the first item popped (removed) from the stack.
• Types such as integers, floating-point numbers, strings and others are all examples of abstract data types. These
types are representations of real-world notions to some satisfactory level of precision within a
computer system.
• An ADT actually captures two notions: A data representation and the operations that can be per-
formed on that data. Python programmers use classes to implement abstract data types.
• A queue is similar to a “waiting line.” A queue offers well-understood behavior to its clients: Clients place
items in a queue one at a time via an enqueue operation, then get those items back one at a time
via a dequeue operation.
• A queue returns items in first-in, first-out (FIFO) order, which means that the first item inserted in
a queue is the first item removed.
• Python programmers concentrate both on crafting new classes and on reusing classes from the
standard library. This kind of software reusability speeds the development of powerful, high-qual-
ity software.
• The standard library enables Python developers to build applications faster by reusing preexisting,
extensively tested classes. In addition to reducing development time, standard library classes also
improve programmers’ abilities to deb
TERMINOLOGY
abstract data type (ADT)
access method
attribute
__bases__ attribute of a class
behavior
built-in data type
class keyword
__class__ attribute of an object
class body
class instance object
class library
class scope
composition
consistent state
constructor
container class
data abstraction
data member
data type
data validation
__del__ method
del keyword
dequeue
destructor
__dict__ attribute of a class
__dict__ attribute of an object
__doc__ attribute of a class
double underscore (__)
encapsulation
enqueue
extensible language
first in, first out (FIFO)
get access method
inconsistent state
information hiding
__init__ method
instantiate
interface
last in, first out (LIFO)
member function
method
module
__module__ attribute of a class
__name__ attribute of a class
name mangling
object
object-oriented programming (OOP)
popping off a stack
predicate method
private
public
pushing onto a stack
queue
rapid application development (RAD)
reference
self
set access method
single underscore (_)
software reuse
stack
structured programming
termination housekeeping
user-defined type
utility method
SELF-REVIEW EXERCISES
7.1 Fill in the blanks in each of the following statements:
a) Object-oriented programming ________ data and functions into ________.
b) Method ________ is called the constructor.
c) Classes enable programmers to model objects that have ________ (represented as data
members) and behaviors (represented as ________).
d) A class’s methods are often referred to as ________ in other object-oriented
programming languages.
e) A ________ method tests the truth or falsity of a condition.
f) A ________ is a variable shared by all objects of a class.
g) ________ are known as last-in, first-out data structures.
h) A user of an object is referred to as a ________.
i) Python performs name mangling on attributes that begin with ________ underscore(s).
j) Describing the functionality of a class independent of its implementation is called
________.
7.2 State whether each of the following is true or false. If false, explain why.
a) Object-oriented programming languages do not use functions to perform actions.
b) The parameter self must be the first item in a method’s argument list.
c) The class constructor returns an object of the class.
d) Programmer-defined and built-in modules are imported in the same way.
e) Constructors may specify keyword arguments and default arguments.
f) An attribute that begins with a single underscore is a private attribute.
g) The destructor is called when the keyword del is used on an object.
h) A shared class attribute should be initialized in the constructor.
i) When invoking an object’s method, a program does not need to pass a value that corre-
sponds to the object reference parameter.
j) Every class should have a __del__ method to reclaim an object’s memory.
EXERCISES
7.3 Create a class called Complex for performing arithmetic with complex numbers. Write a
driver program to test your class.
Complex numbers have the form
realPart + imaginaryPart * i
where i is $\sqrt{-1}$.
Use floating-point numbers to represent the data of the class. Provide a constructor that enables an
object of this class to be initialized when it is created. The constructor should contain default values
in case no initializers are provided. Provide methods for each of the following:
a) Adding two ComplexNumbers: The real parts are added to form the real part of the re-
sult, and the imaginary parts are added to form the imaginary part of the result.
b) Subtracting two ComplexNumbers: The real part of the right operand is subtracted
from the real part of the left operand to form the real part of the result, and the imaginary
part of the right operand is subtracted from the imaginary part of the left operand to form
the imaginary part of the result.
c) Printing ComplexNumbers in the form (a, b), where a is the real part and b is the
imaginary part.
7.4 Create a class called RationalNumber for performing arithmetic with fractions. Write a
driver program to test your class.
Use integer variables to represent the data of the class—the numerator and the denominator.
Provide a constructor that enables an object of this class to be initialized when it is declared. The
constructor should contain default values, in case no initializers are provided, and should store the
fraction in reduced form (i.e., the fraction 2/4 would be stored in the object as 1 in the
numerator and 2 in the denominator). Provide methods for
each of the following:
a) Adding two RationalNumbers. The result should be stored in reduced form.
b) Subtracting two RationalNumbers. The result should be stored in reduced form.
c) Multiplying two RationalNumbers. The result should be stored in reduced form.
d) Dividing two RationalNumbers. The result should be stored in reduced form.
e) Printing RationalNumbers in the form a/b, where a is the numerator and b is the
denominator.
f) Printing RationalNumbers in floating-point format.
7.5 Modify the Time class of Fig. 7.13 to include a tick method that increments the time stored
in a Time object by one second. The Time object should always remain in a consistent state. Write
a driver program that tests the tick method. Be sure to test the following cases:
a) Incrementing into the next minute.
b) Incrementing into the next hour.
c) Incrementing into the next day (i.e., 23:59:59 to 0:00:00).
7.6 Create a class Rectangle. The class has attributes __length and __width, each of
which defaults to 1. It has methods that calculate the perimeter and the area of the rectangle. It
has set and get methods for both __length and __width. The set methods should verify that
__length and __width are each floating-point numbers larger than 0.0 and less than 20.0. Write
a driver program to test the class.
7.7 Create a more sophisticated Rectangle class than the one you created in Exercise 7.6. This
class stores only the x-y coordinates of the upper left-hand and lower right-hand corners of the rect-
angle. The constructor calls a set function that accepts two tuples of coordinates and verifies that each
of these is in the first quadrant, with no single x or y coordinate larger than 20.0. Methods calculate
the length, width, perimeter and area. The length is the larger of the two dimensions. In-
clude a predicate method isSquare that determines whether the rectangle is a square. Write a driver
program to test the class.
7.8 Create a class TicTacToe that will enable you to write a complete program to play the
game of tic-tac-toe. The class contains a 3-by-3 double-subscripted list of letters. The constructor
should initialize the empty board to all zeros. Allow two human players. Wherever the first player
moves, place an "X" in the specified square; place an "O" wherever the second player moves. Each
move must be to an empty square. After each move, determine whether the game has been won and
whether the game is a draw. [Note: If you feel ambitious, modify your program so that the computer
makes the moves for one of the players automatically. Also, allow the player to choose whether to go
first or second.]
pythonhtp1_08.fm Page 260 Monday, December 10, 2001 6:49 PM
8
Customizing Classes
Objectives
• To understand how to write special methods that
customize a class.
• To be able to represent an object as a string.
• To use special methods to customize attribute access.
• To understand how to redefine (overload) operators to
work with new classes.
• To learn when to, and when not to, overload operators.
• To learn how to overload sequence operations.
• To learn how to overload mapping operations.
• To study interesting, customized classes.
The whole difference between construction and creation is
exactly this: that a thing constructed can only be loved after
it is constructed; but a thing created is loved before it exists.
Gilbert Keith Chesterton, Preface to Dickens, Pickwick
Papers
The die is cast.
Julius Caesar
Our doctor would never really operate unless it was
necessary. He was just that way. If he didn’t need the money,
he wouldn’t lay a hand on you.
Herb Shriner
Outline
8.1 Introduction
8.2 Customizing String Representation: Method __str__
8.3 Customizing Attribute Access
8.4 Operator Overloading
8.5 Restrictions on Operator Overloading
8.6 Overloading Unary Operators
8.7 Overloading Binary Operators
8.8 Overloading Built-in Functions
8.9 Converting Between Types
8.10 Case Study: A Rational Class
8.11 Overloading Sequence Operations
8.12 Case Study: A SingleList Class
8.13 Overloading Mapping Operations
8.14 Case Study: A SimpleDictionary Class
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
8.1 Introduction
In Chapter 7, we introduced the basics of Python classes and the notion of abstract data
types (ADTs). We discussed how methods __init__ and __del__ execute when an ob-
ject is created and destroyed, respectively. These methods are two examples of the many
special methods that a class may define. A special method is a method that has a special
meaning in Python; the Python interpreter calls one of an object’s special methods when
the client performs a certain operation on the object. For example, when a client creates an
object of a class, Python invokes the __init__ special method of that class.
A class author implements special methods to customize the behavior of the class. The
purpose of customization is to provide the clients of a class with a simple notation for
manipulating objects of the class. For example, in Chapter 7, manipulations on objects were
accomplished by sending messages (in the form of method calls) to the objects. This
method-call notation is cumbersome for certain kinds of classes, especially mathematical
classes. For such classes, it would be nice to use Python’s rich set of built-in operators and
statements to manipulate objects. In this chapter, we show how to define special methods
that enable Python’s operators to work with objects—a process called operator over-
loading. It is straightforward and natural to extend Python with these new capabilities.
Operator overloading also requires great care, because, when overloading is misused, it can
make a program difficult to understand.
Operator + has multiple purposes in Python, for example, integer addition and string con-
catenation. This is an example of operator overloading. The Python language itself overloads
operators + and *, among others. These operators perform differently to suit the context in
integer arithmetic, floating-point arithmetic, string manipulation and other operations.
Python enables the programmer to overload most operators to be sensitive to the con-
text in which they are used. The interpreter takes the appropriate action based on the
manner in which the operator is used. Some operators are overloaded frequently, especially
operators like + and -. The job performed by overloaded operators also can be performed
by explicit method calls, but operator notation is often clearer.
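As a preview of the comparison between explicit method calls and operator notation, consider the following sketch. It is written for Python 3, and the Vector class with its add method is a hypothetical example invented for this illustration; the only assumption about the language itself is that Python calls special method __add__ when it evaluates the + operator on an object.

```python
class Vector:
    """Two-dimensional vector used to contrast a method call with an operator."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def add(self, other):
        """Explicit method-call notation."""
        return Vector(self.x + other.x, self.y + other.y)

    def __add__(self, other):
        """Special method that Python calls for the expression self + other."""
        return self.add(other)

    def __str__(self):
        return "(%s, %s)" % (self.x, self.y)


# Python itself overloads +: the same operator adds integers and joins strings.
print(1 + 2)
print("key" + "board")

first = Vector(1, 2)
second = Vector(3, 4)
print(first.add(second))   # explicit method call
print(first + second)      # the same job, written with operator notation
```

Both of the last two statements do the same work; the operator form simply reads more like the arithmetic it represents.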
In this chapter, we discuss when to use operator overloading and when not to use it.
We show how to overload operators, and we present complete programs using overloaded
operators.
Customization provides other benefits, as well. A class may define special methods
that cause an object of the class to behave like a list or like a dictionary. A class also may
define special methods to control how a client accesses object attributes through the dot
access operator. In this chapter, we introduce the appropriate special methods and create
classes that implement them.
print objectOfClass
Python calls the object’s __str__ method and outputs the string returned by that method.
Figure 8.1 demonstrates how to define special method __str__ to handle data of a user-
defined telephone number class called PhoneNumber. This program assumes telephone
numbers are input correctly.
20 def test():
21
22     # obtain phone number from user
23     newNumber = raw_input(
24         "Enter phone number in the form (123) 456-7890:\n" )
25
26     phone = PhoneNumber( newNumber ) # create PhoneNumber object
27     print "The phone number is:",
28     print phone # invokes phone.__str__()
29
30 if __name__ == "__main__":
31     test()
Method __init__ (lines 7–12) accepts a string in the form "(xxx) xxx-xxxx",
where each x in the string is a digit in the phone number. The method slices the string and
stores the pieces of the phone number as attributes.
Method __str__ (lines 14–18) is a special method that constructs and returns a string
representation of an object of class PhoneNumber. When the interpreter encounters the
statement
print phone
the interpreter executes the statement
print phone.__str__()
When a program passes a PhoneNumber object to built-in function str or when a program
uses a PhoneNumber object with the % string-formatting operator (e.g., "%s" %
phone), Python also calls method __str__.
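The following minimal sketch shows the three situations just described. It is written in Python 3 syntax (print is a function), which is our assumption and differs from the Python 2.2 listings in this chapter; the attribute names areaCode, exchange and line are illustrative rather than taken from Fig. 8.1.

```python
class PhoneNumber:
    """Simple telephone number that can describe itself as a string."""

    def __init__(self, number):
        # expects the form "(123) 456-7890" and slices out the pieces
        self.areaCode = number[1:4]
        self.exchange = number[6:9]
        self.line = number[10:14]

    def __str__(self):
        """Return the informal string representation."""
        return "(%s) %s-%s" % (self.areaCode, self.exchange, self.line)


phone = PhoneNumber("(123) 456-7890")
print(phone)               # print calls phone.__str__()
text = str(phone)          # built-in function str calls __str__
formatted = "%s" % phone   # the % formatting operator calls __str__
```

In each case the interpreter obtains the string from __str__, so the class author controls the displayed form in one place.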
Common Programming Error 8.1
Returning a non-string value from method __str__ is a fatal runtime error.
Function test (lines 20–28) requests a phone number from the user, creates a new
PhoneNumber object and prints the string representation of the object. Recall that when a
module runs as a stand-alone program (i.e., the user invokes the Python interpreter on the
module), Python assigns the value "__main__" to the namespace’s name (stored in built-in
variable __name__). Line 31 calls function test if PhoneNumber.py is executed
as a stand-alone program. This practice of defining a driver function and testing a module’s
namespace to execute the function is employed by many Python modules. The benefit of this
practice is that a module author can define different behaviors for the module, based on the
context in which the module is used. If another program imports the module, the value of
__name__ will be the module name (e.g., "PhoneNumber"), and the test function does
not execute. If the module is executed as a stand-alone program, the value of __name__ is
"__main__", and the test function executes. In Chapters 10 and 11, we create graphical
programs that use test functions to display the graphical components we define.
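The driver-function practice can be sketched as follows. The module contents (function greet and its message) are hypothetical and serve only to illustrate the pattern; the code uses Python 3 syntax, which is our assumption.

```python
# Contents of a hypothetical module file, e.g., Greeter.py

def greet(name):
    """Operation that the module offers to its clients."""
    return "Hello, %s!" % name


def test():
    """Driver function that exercises the module."""
    message = greet("world")
    print(message)
    return message


# the condition is true only when the file runs as a stand-alone program;
# when another program imports the module, __name__ is the module name
if __name__ == "__main__":
    test()
```

A client that imports the module can call greet without triggering the driver, whereas running the file directly executes test.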
Good Programming Practice 8.1
Provide test functions for modules you create, when necessary. These functions help ensure
that the module works correctly, and they provide additional information to clients of the
class by demonstrating the ways in which a module’s operations may be performed.
Python 2.2b2 (#26, Nov 16 2001, 11:44:11) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> from TimeAccess import Time
>>> time1 = Time( 4, 27, 19 )
>>> print time1
04:27:19
>>> print time1.hour, time1.minute, time1.second
4 27 19
>>> time1.hour = 16
>>> print time1
16:27:19
>>> time1.second = 90
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "TimeAccess.py", line 30, in __setattr__
raise ValueError, "Invalid %s value: %d" % \
ValueError: Invalid second value: 90
self._hour = value
method __setattr__ would execute again, with the arguments "_hour" and value,
resulting in infinite recursion. Assigning a value through the object’s __dict__ attribute,
however, does not invoke method __setattr__, but simply inserts the appropriate key–
value pair in the object’s __dict__.
Common Programming Error 8.2
In method __setattr__, assigning a value to an object’s attribute through the dot access
operator results in infinite recursion. Use the object’s __dict__ instead.
Lines 25–31 of method __setattr__ perform similar tests for when the client
attempts to assign a value to attributes minute or second. If the specified value falls
within the appropriate range, the method assigns the value to the object’s attribute (either
_minute or _second). If the client attempts to assign a value to an attribute other than
hour, minute or second, line 33 assigns the value to the specified attribute name, to
preserve Python’s default behavior for adding attributes to an object.
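A minimal sketch of this validation technique appears below. It is not the listing from Fig. 8.3: the class attribute _limits is a hypothetical helper that we introduce to keep the example short, and the code uses Python 3 syntax, which is our assumption.

```python
class Time:
    """Sketch of a time class that validates attribute assignments."""

    # exclusive upper bound for each validated attribute (hypothetical helper)
    _limits = {"hour": 24, "minute": 60, "second": 60}

    def __init__(self, hour=0, minute=0, second=0):
        # each assignment below passes through __setattr__
        self.hour = hour
        self.minute = minute
        self.second = second

    def __setattr__(self, name, value):
        if name in Time._limits:
            if not 0 <= value < Time._limits[name]:
                raise ValueError("Invalid %s value: %d" % (name, value))
            # store through __dict__; "self._hour = value" would call
            # __setattr__ again and recurse without end
            self.__dict__["_" + name] = value
        else:
            # preserve the default behavior for any other attribute
            self.__dict__[name] = value
```

Because every branch writes through __dict__, method __setattr__ never triggers itself.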
Assigning a value to an object’s attribute, but mistakenly typing the wrong name for that
attribute is a logic error. Python adds a new attribute to the object’s namespace with the
incorrect name.
Lines 36–46 contain the definition for method __getattr__. When a client program
contains the expression
time1.attribute
as an rvalue (i.e., the right-hand value in an operator expression), Python first looks in
time1’s __dict__ attribute for the attribute name. If the attribute name is in
__dict__, Python simply returns the attribute’s value. If the attribute name is not in the
object’s __dict__, Python generates the call
time1.__getattr__( attribute )
where attribute is the name of the attribute that the client is attempting to access. The
method tests for whether the client is attempting to access hour, minute or second and,
if so, returns the value of the appropriate attribute. Otherwise, the method raises an
exception (line 46).
Software Engineering Observation 8.1
The __getattr__ definition for every class should raise the AttributeError exception
if the attribute name cannot be found, to preserve Python’s default behavior for locating
nonexistent attributes.
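The lookup rule and the observation above can be illustrated with a short sketch. We assume, for illustration only, that the data is stored in attributes _hour, _minute and _second; the code uses Python 3 syntax.

```python
class Time:
    """Sketch: data is stored in underscore-prefixed attributes."""

    def __init__(self, hour=0, minute=0, second=0):
        self._hour = hour
        self._minute = minute
        self._second = second

    def __getattr__(self, name):
        # Python calls this method only when normal lookup fails
        if name in ("hour", "minute", "second"):
            return self.__dict__["_" + name]

        # preserve the default behavior for nonexistent attributes
        raise AttributeError(name)


time1 = Time(4, 27, 19)
hour = time1.hour    # "hour" is not in __dict__, so __getattr__ runs
```

Raising AttributeError also keeps built-in function hasattr working as clients expect.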
The interactive session that follows the class definition in Fig. 8.3 demonstrates the
benefit of defining special methods __getattr__ and __setattr__. The client program
can access the attributes of an object of class Time in a transparent manner, through
the dot access operator. The interface to class Time appears identical to the interface we
presented in the first definition of the class in Chapter 7, but it has the advantage of
maintaining data in a consistent state. In Chapter 9, Inheritance, we discuss a similar
technique, called properties, that enables class authors to specify a method that executes
when a client attempts to access or modify a particular attribute.
Software Engineering Observation 8.2
Designers of large systems that require strict access to data should use __getattr__ and
__setattr__ to ensure data integrity. Developers of large systems that use Python 2.2
can use properties, a more efficient technique that offers the same attribute-access syntax
as __getattr__ and __setattr__.
Although operator overloading may sound like an exotic capability, most programmers
implicitly use overloaded operators regularly. For example, the addition operator (+)
operates quite differently on integers, floating-point numbers and strings. But addition
nevertheless works fine with variables of these types and other built-in types, because the
addition operator (+) has been overloaded in the Python language itself.
Operators are overloaded by writing a method definition as you normally would, except
that the method name corresponds to the Python special method for that operator. For
example, the method name __add__ overloads the addition operator (+). To use an operator
on an object of a class, the class must overload (i.e., define a special method for) that operator.
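As a small illustration, the sketch below overloads + for a hypothetical class Point (the class is ours, not one of the chapter's case studies). The code uses Python 3 syntax, which is an assumption.

```python
class Point:
    """Hypothetical two-dimensional point."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        """Overloaded + operator: add the coordinates pairwise."""
        return Point(self.x + other.x, self.y + other.y)

    def __str__(self):
        return "(%d, %d)" % (self.x, self.y)


# the expression below is equivalent to Point(1, 2).__add__(Point(3, 4))
total = Point(1, 2) + Point(3, 4)
print(total)    # displays (4, 6)
```

Without method __add__, the same expression would raise a TypeError, because the class would not define a meaning for the operator.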
Overloading is most appropriate for mathematical classes. These often require that a
substantial set of operators be overloaded to ensure consistency with the way these
mathematical classes are handled in the real world. For example, it would be unusual to
overload, for rational numbers, only addition, because other arithmetic operators also are
used commonly with rational numbers.
Python is an operator-rich language. Python programmers who understand the
meaning and context of each operator are likely to make reasonable choices when it comes
to overloading operators for new classes.
Operator overloading provides the same concise expressions for user-defined classes
that Python provides with its rich collection of operators for built-in types. However,
operator overloading is not automatic; the programmer must write operator overloading
methods to perform the desired operations.
Extreme misuses of overloading are possible, such as overloading operator + to perform
subtraction-like operations or overloading operator - to perform multiplication-like
operations. Such non-intuitive uses of overloading make a program extremely difficult to
comprehend and should be avoided.
Good Programming Practice 8.4
Overload operators to perform the same function or similar functions on objects as the
operators perform on objects of built-in types. Avoid nonintuitive uses of operators.
1. Two operators cannot be overloaded: {} and lambda. [Note: lambda is a keyword that supports
functional programming—a technique that is beyond the scope of this book.]
+ - * ** / // % <<
>> & | ^ ~ < > <=
>= == != += -= *= **= /=
//= %= <<= >>= &= ^= |= []
() . `` in
The meaning of how an operator works on objects of built-in types cannot be changed
by operator overloading. The programmer cannot, for example, change the meaning of how
+ adds two integers. Operator overloading works only with objects of user-defined classes
or with a mixture of an object of a user-defined class and an object of a built-in type.
Overloading a binary mathematical operator (e.g., +, -, *) automatically overloads the
operator’s corresponding augmented assignment statement. For example, overloading an
addition operator to allow statements like
object2 = object2 + object1
implies that the += augmented assignment statement also is overloaded to allow statements
such as
object2 += object1
Although (in this case) the programmer does not have to define a method to overload the
+= assignment statement, such behavior also can be achieved by defining the method
explicitly for that class.
Performance Tip 8.1
Sometimes it is preferable to overload an augmented assignment version of an operator to
perform the operation "in place" (i.e., without using extra memory by creating a new object).
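The difference between the implicit and the explicit forms of += can be sketched with two hypothetical classes, Total and Tally (both names are ours). The code uses Python 3 syntax, which is an assumption.

```python
class Total:
    """Defines only __add__; += falls back to it and rebinds the name."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return Total(self.value + other.value)   # new object


class Tally:
    """Defines __iadd__, so += updates the existing object in place."""

    def __init__(self, value):
        self.value = value

    def __iadd__(self, other):
        self.value += other.value
        return self     # the name is rebound to whatever is returned


a = Total(1)
originalA = a
a += Total(2)       # executes a = a.__add__( Total( 2 ) )

b = Tally(1)
originalB = b
b += Tally(2)       # executes b = b.__iadd__( Tally( 2 ) )
```

After these statements, a refers to a newly created object, whereas b still refers to the original object, whose value has changed.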
~object1
object1.__invert__()
The operand object1 is the object for which the class’s method __invert__ is
invoked. Figure 8.5 lists the unary operators and their corresponding special methods.
Unary operator    Special method
-                 __neg__
+                 __pos__
~                 __invert__
Binary operator/statement    Special method
+      __add__, __radd__
-      __sub__, __rsub__
*      __mul__, __rmul__
/      __div__, __rdiv__, __truediv__ (for Python 2.2), __rtruediv__ (for Python 2.2)
//     __floordiv__, __rfloordiv__ (for Python version 2.2)
%      __mod__, __rmod__
**     __pow__, __rpow__
<<     __lshift__, __rlshift__
>>     __rshift__, __rrshift__
&      __and__, __rand__
^      __xor__, __rxor__
|      __or__, __ror__
+=     __iadd__
-=     __isub__
*=     __imul__
/=     __idiv__, __itruediv__ (for Python version 2.2)
//=    __ifloordiv__ (for Python version 2.2)
%=     __imod__
**=    __ipow__
<<=    __ilshift__
>>=    __irshift__
&=     __iand__
^=     __ixor__
|=     __ior__
==     __eq__
!=, <> __ne__
>      __gt__
<      __lt__
>=     __ge__
<=     __le__
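To show how a few of these special methods fit together, the sketch below defines a hypothetical class Meters (our own example) with one unary operator, the + operator in its normal and reflected forms, and the == operator. The code uses Python 3 syntax, which is an assumption.

```python
class Meters:
    """Hypothetical length measured in meters."""

    def __init__(self, amount):
        self.amount = amount

    def __neg__(self):                  # -length
        return Meters(-self.amount)

    def __add__(self, other):           # length + other
        if isinstance(other, Meters):
            return Meters(self.amount + other.amount)
        if isinstance(other, (int, float)):
            return Meters(self.amount + other)
        return NotImplemented           # operand type is not supported

    def __radd__(self, other):          # other + length
        # runs when the left operand does not know how to add a Meters
        return self.__add__(other)

    def __eq__(self, other):            # length == other
        return isinstance(other, Meters) and self.amount == other.amount


result = 5 + Meters(2)    # int cannot add a Meters, so __radd__ executes
```

The methods whose names begin with r (such as __radd__) handle expressions in which the object of the class is the right operand.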
Fig. 8.7 Common built-in functions and their corresponding special methods.
11
12    return x
13
14 class Rational:
15    """Representation of rational number"""
16
17    def __init__( self, top = 1, bottom = 1 ):
18       """Initializes Rational instance"""
19
20       # do not allow 0 denominator
21       if bottom == 0:
22          raise ZeroDivisionError, "Cannot have 0 denominator"
23
24       # assign attribute values
25       self.numerator = abs( top )
26       self.denominator = abs( bottom )
27       self.sign = ( top * bottom ) / ( self.numerator *
28          self.denominator )
29
30       self.simplify() # Rational represented in reduced form
31
32    # class interface method
33    def simplify( self ):
34       """Simplifies a Rational number"""
35
36       common = gcd( self.numerator, self.denominator )
37       self.numerator /= common
38       self.denominator /= common
39
40    # overloaded unary operator
41    def __neg__( self ):
42       """Overloaded negation operator"""
43
44       return Rational( -self.sign * self.numerator,
45          self.denominator )
46
47    # overloaded binary arithmetic operators
48    def __add__( self, other ):
49       """Overloaded addition operator"""
50
51       return Rational(
52          self.sign * self.numerator * other.denominator +
53          other.sign * other.numerator * self.denominator,
54          self.denominator * other.denominator )
55
56    def __sub__( self, other ):
57       """Overloaded subtraction operator"""
58
59       return self + ( -other )
60
61    def __mul__( self, other ):
62       """Overloaded multiplication operator"""
63
117
118    def __str__( self ):
119       """String representation"""
120
121       # determine sign display
122       if self.sign == -1:
123          signString = "-"
124       else:
125          signString = ""
126
127       if self.numerator == 0:
128          return "0"
129       elif self.denominator == 1:
130          return "%s%d" % ( signString, self.numerator )
131       else:
132          return "%s%d/%d" % \
133             ( signString, self.numerator, self.denominator )
134
135    # overloaded coercion capability
136    def __int__( self ):
137       """Overloaded integer representation"""
138
139       return self.sign * divmod( self.numerator,
140          self.denominator )[ 0 ]
141
142    def __float__( self ):
143       """Overloaded floating-point representation"""
144
145       return self.sign * float( self.numerator ) / self.denominator
146
147    def __coerce__( self, other ):
148       """Overloaded coercion. Can only coerce int to Rational"""
149
150       if type( other ) == type( 1 ):
151          return ( self, Rational( other ) )
152       else:
153          return None
rational1: 1
rational2: 1/3
rational3: -1/2
1 / 1/3 = 3
-1/2 - 1/3 = -5/6
1/3 * -1/2 - 1 = -7/6
rational1 after adding rational2 * rational3: 5/6
Method simplify (lines 33–38) reduces an object of class Rational. The method
first calls function gcd to determine the greatest common divisor of the object’s numerator
and denominator (line 36). The method then uses the greatest common divisor to simplify
the rational object (lines 37–38).
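The reduction step can be sketched in isolation. Function gcd below uses Euclid's algorithm, which is our assumption about how such a function might be written, and the free-standing function simplify is a hypothetical stand-in for the method. The code uses Python 3 syntax, so it divides with // to keep the results integers.

```python
def gcd(x, y):
    """Greatest common divisor, computed with Euclid's algorithm."""
    while y:
        x, y = y, x % y

    return x


def simplify(numerator, denominator):
    """Return the fraction in reduced form as a tuple."""
    common = gcd(numerator, denominator)
    return numerator // common, denominator // common


reduced = simplify(10, 30)    # gcd is 10, so the result is (1, 3)
```

Dividing both parts by the same greatest common divisor leaves the value of the fraction unchanged while producing its smallest representation.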
Method __neg__ (lines 41–45) overloads the unary negation operator. If rational
is an object of class Rational, when the interpreter encounters the expression
-rational
the interpreter generates the call
rational.__neg__()
which simply creates a new object of class Rational with the negated sign of the original
object.
Method __add__ (lines 48–54) overloads the addition operator. This method takes
two arguments—the object reference (self), and a reference to another object of class
Rational. If rational1 and rational2 are two objects of class Rational, when
the interpreter encounters the expression
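rational1 + rational2
the interpreter generates the call
rational1.__add__( rational2 )
This method creates and returns a new object of class Rational that represents the results
of adding self to other. The numerator of this new value is computed with the expression
self.sign * self.numerator * other.denominator +
other.sign * other.numerator * self.denominator
and the denominator is computed with the expression
self.denominator * other.denominator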
Method __sub__ (lines 56–59) overloads the binary subtraction operator. This
method uses the overloaded + and - operators to create and return the results of subtracting
the method’s second argument from the method’s first argument.
Method __mul__ (lines 61–66) overloads the binary multiplication operator. This
method creates and returns a new object of class Rational that represents the product of
the method’s two arguments.
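The arithmetic methods can be tried out with the condensed sketch below. It is a simplification rather than the class in the figure: we assume Python 3, use function gcd from the standard math module and keep the sign in the numerator instead of in a separate sign attribute.

```python
from math import gcd


class Rational:
    """Condensed rational number; the numerator carries the sign."""

    def __init__(self, top=1, bottom=1):
        if bottom == 0:
            raise ZeroDivisionError("Cannot have 0 denominator")
        if bottom < 0:                      # move the sign to the numerator
            top, bottom = -top, -bottom
        common = gcd(top, bottom)           # always positive here
        self.numerator = top // common
        self.denominator = bottom // common

    def __neg__(self):
        return Rational(-self.numerator, self.denominator)

    def __add__(self, other):
        return Rational(
            self.numerator * other.denominator +
            other.numerator * self.denominator,
            self.denominator * other.denominator)

    def __sub__(self, other):
        return self + (-other)              # reuses + and unary -

    def __mul__(self, other):
        return Rational(self.numerator * other.numerator,
                        self.denominator * other.denominator)

    def __str__(self):
        if self.denominator == 1:
            return "%d" % self.numerator
        return "%d/%d" % (self.numerator, self.denominator)


rational2 = Rational(10, 30)
rational3 = Rational(-7, 14)
print(rational3 - rational2)    # displays -5/6
```

Defining __sub__ in terms of the other two operators keeps the subtraction logic in one place, which is the same design choice the chapter's class makes.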
Method __div__ (lines 68–73) overloads the binary division operator / and creates
and returns a new object of class Rational that represents the results of dividing the
method’s two arguments. Method __truediv__ (lines 75–79) overloads the binary division
operator / for Python versions 2.2 and greater that use floating-point division. This
method simply calls method __div__, because the / operator should perform the same
operation, regardless of the Python version. [Note: See Chapter 2, Introduction to Python
Programming, for more information on the difference in the / operator between Python
versions.]
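The relationship between the two division methods can be sketched as follows. In Python 3, which we assume here, the / operator always calls __truediv__ and never consults __div__, so the sketch defines __truediv__ and keeps __div__ only as an alias for older interpreters. Class Ratio and its attributes are hypothetical.

```python
class Ratio:
    """Hypothetical fraction used to demonstrate division."""

    def __init__(self, top, bottom):
        self.top = top
        self.bottom = bottom

    def __truediv__(self, other):
        """Overloaded / operator: multiply by the reciprocal."""
        if other.top == 0:
            raise ZeroDivisionError("Cannot divide by 0")
        return Ratio(self.top * other.bottom, self.bottom * other.top)

    # same operation under the name that classic division looks for
    __div__ = __truediv__


quotient = Ratio(1, 1) / Ratio(1, 3)    # 1 divided by 1/3
```

Binding both names to one function guarantees that / behaves identically whichever method the interpreter selects.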
Method __eq__ (lines 82–85) overloads the binary equality operator (==). If
rational1 and rational2 are two objects of class Rational, when the interpreter
encounters the expression
rational1 == rational2
which attempts to add an integer to an object of class Rational. This statement results in
the method call
rational.__add__( rational.__coerce__( 1 ) )
Special method __coerce__ should contain code that converts the object and the other
type to the same type and should return a tuple that contains the two converted values.
Method __coerce__ for class Rational converts only integer values. Line 150 determines
whether the method’s second argument is an integer. If so, the method
returns a tuple that contains the object reference argument and a new object of class
Rational, created by passing the integer argument to Rational’s constructor. Python expects
special method __coerce__ to return None if a coercion of the two types is not
possible; therefore, line 153 returns None if the method’s argument is not an integer.
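Readers who use Python 3 should note that method __coerce__ and built-in function coerce were removed from the language. The sketch below shows one way to obtain the same mixed-mode behavior there: the arithmetic method converts an integer operand itself. The helper method _convert is hypothetical, and the class is reduced to the parts needed for the illustration.

```python
class Rational:
    """Reduced sketch that supports Rational + int and int + Rational."""

    def __init__(self, top=1, bottom=1):
        self.top = top
        self.bottom = bottom

    def _convert(self, other):
        """Return other as a Rational, or None if no conversion exists."""
        if isinstance(other, Rational):
            return other
        if isinstance(other, int):
            return Rational(other)      # an integer n becomes n/1
        return None

    def __add__(self, other):
        other = self._convert(other)
        if other is None:
            return NotImplemented       # Python then raises TypeError
        return Rational(self.top * other.bottom + other.top * self.bottom,
                        self.bottom * other.bottom)

    __radd__ = __add__                  # addition is commutative


mixed = Rational(1, 3) + 1              # 1/3 + 1/1
```

As with __coerce__ returning None, returning NotImplemented signals that the two types cannot be combined.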
The driver program (Fig. 8.10) creates objects of class Rational: rational1 is
initialized by default to 1/1, rational2 is initialized to 10/30 and rational3 is
initialized to -7/14. The Rational constructor calls method simplify to reduce the
specified numerator and denominator. Thus, rational2 represents the value 1/3, and
rational3 represents the value -1/2.
The driver program outputs each of the constructed objects of class Rational, using
the print statement. Lines 17–21 demonstrate the results of using overloaded arithmetic
operators /, - and *. Lines 24–26 demonstrate that overloading the + addition operator
implicitly overloads the += assignment statement. The program uses the += augmented
assignment statement to add to rational1 the product of rational2 * rational3,
then prints the results. The driver then prints the results of comparing the objects of class
Rational through the overloaded comparison operators (lines 29–31). Line 34 prints the
absolute value of object rational3. Lines 38–40 test Rational’s coercion capability
by printing the integer representation (invoking method __int__) and the floating-point
representation (invoking method __float__) and by adding an object of class
Rational and an integer (invoking method __coerce__).
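The conversion methods that the driver exercises can be sketched on their own. The class below is a reduced, hypothetical version that stores a signed numerator in attribute top; the code uses Python 3 syntax, which is an assumption.

```python
class Rational:
    """Reduced sketch with conversion special methods only."""

    def __init__(self, top, bottom):
        self.top = top
        self.bottom = bottom

    def __abs__(self):                  # abs( rational )
        return Rational(abs(self.top), abs(self.bottom))

    def __int__(self):                  # int( rational )
        # truncate toward zero, as int does for floating-point values
        if self.top * self.bottom < 0:
            sign = -1
        else:
            sign = 1
        return sign * (abs(self.top) // abs(self.bottom))

    def __float__(self):                # float( rational )
        return self.top / float(self.bottom)


value = Rational(-7, 2)
whole = int(value)          # calls value.__int__()
decimal = float(value)      # calls value.__float__()
```

Each built-in function simply delegates to the special method of the same name, so the class decides what the conversion means.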
52       if value in self.__list:
53          raise ValueError, \
54             "List already contains value %s" % str( value )
55
56       self.__list[ index ] = value
57
58    # overloaded equality operators
59    def __eq__( self, other ):
60       """Overloaded == operator"""
61
62       if len( self ) != len( other ):
63          return 0 # lists of different sizes
64
65       for i in range( 0, len( self ) ):
66
67          if self.__list[ i ] != other.__list[ i ]:
68             return 0 # lists are not equal
69
70       return 1 # lists are equal
71
72    def __ne__( self, other ):
73       """Overloaded != and <> operators"""
74
75       return not ( self == other )
Creating integers1...
List size: 8
Integer 1: 1
Integer 2: 2
Integer 3: 3
Integer 4: 4
Integer 5: 5
Integer 6: 6
Integer 7: 7
Integer 8: 8
Creating integers2...
List size: 10
Integer 1: 9
Integer 2: 10
Integer 3: 11
Integer 4: 12
Integer 5: 13
Integer 6: 14
Integer 7: 15
Integer 8: 16
Integer 9: 17
Integer 10: 18
The program (Fig. 8.13) begins by creating two objects of class SingleList (lines
18–22). This class’s constructor takes a list as an argument. To create this list, we call
function getIntegers (lines 6–15). This function prompts the user to enter integers and
returns a list of these integers. Lines 25–26 use overloaded Python function len to determine
the size of integers1 and use the print statement (which implicitly calls method
__str__) to confirm that the list elements were initialized correctly by the constructor.
Next, lines 29–30 output the size and contents of integers2.
Lines 35–41 test the overloaded equality operator (==) and inequality operator (!=) by
first evaluating the condition
integers1 != integers2
The program prints a message if the two objects are not equal (line 36). Similarly, line 41
prints a message if the two objects are equal.
Line 43 uses the overloaded subscript operator to refer to integers1[ 0 ]. This
subscripted name is used as an rvalue to print the value in integers1[ 0 ]. Line 45 uses
integers1[ 0 ] as an lvalue on the left side of an assignment statement to assign a new
value, 0, to element 0 of integers1.
Now that we have seen how this program operates, let us walk through the class’s
method definitions (Fig. 8.12). Lines 6–17 define the constructor for the class. The
constructor initializes attribute __list to be the empty list. If the user specified a value for
parameter initialList, the constructor inserts all unique elements from initialList
into __list.
Lines 20–36 define method __str__ for representing objects of class SingleList
as a string. This method builds a string (tempString) by iterating over the elements
in the list and formatting the elements in tabular format, with four elements in each
row. Line 36 returns the formatted string.
Lines 39–42 define method __len__, which overrides the Python len function.
When the interpreter encounters the expression
len( integers1 )
in the driver program, the interpreter generates the call
integers1.__len__()
This method simply returns the length of attribute __list.
Lines 44–56 define two overloaded subscript operators for the class. When the interpreter
encounters the expression
integers1[ 0 ]
in the driver program, the interpreter invokes the appropriate method by generating the call
integers1.__getitem__( 0 )
to return the value of element 0 (e.g., line 43 in the driver program), or the call
integers1.__setitem__( 0, value )
to set the value of a list element (e.g., line 45 in the driver program). When the [] operator
is used in an rvalue expression, method __getitem__ is called; when the [] operator is
used in an lvalue expression, method __setitem__ is called.
Method __getitem__ (lines 44–47) simply returns the value of the appropriate element.
Method __setitem__ (lines 49–56) first ascertains whether the list already contains
the new element. If the list contains the new element, the method raises an exception;
otherwise, the method sets the new value. Because SingleList methods manipulate a
basic list, any out-of-range errors that apply to regular list data types apply to our
SingleList type.
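The sequence methods discussed so far can be gathered into a compact sketch. It follows the behavior described in the text but is not the complete listing of Fig. 8.12; the default argument and the Python 3 syntax are our assumptions.

```python
class SingleList:
    """Sketch of a list that stores each value only once."""

    def __init__(self, initialList=None):
        self.__list = []

        for value in initialList or []:
            if value not in self.__list:
                self.__list.append(value)    # keep unique values only

    def __len__(self):                       # len( integers1 )
        return len(self.__list)

    def __getitem__(self, index):            # integers1[ 0 ] as an rvalue
        return self.__list[index]

    def __setitem__(self, index, value):     # integers1[ 0 ] = value
        if value in self.__list:
            raise ValueError(
                "List already contains value %s" % str(value))

        self.__list[index] = value


integers1 = SingleList([1, 2, 2, 3])   # the duplicate 2 is discarded
integers1[0] = 0                       # calls integers1.__setitem__( 0, 0 )
```

An index outside the valid range raises IndexError from the underlying list, exactly as it would for a regular list.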
Lines 59–70 define the overloaded equality operator (==) for the class. When the interpreter
encounters the expression
integers1 == integers2
the interpreter invokes the __eq__ method by generating the call
integers1.__eq__( integers2 )
The __eq__ method immediately returns 0 if the lengths of the lists differ (lines 62–63).
Otherwise, the method compares each pair of elements (lines 65–68). If they are all the
same, the method returns 1 (line 70). The first pair of elements to differ causes the method
to return 0 immediately (line 68). Lines 72–75 define method __ne__ for testing whether
two SingleLists are unequal. The method simply uses the overloaded == operator to determine
whether the two objects are unequal.
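The comparison logic can be sketched as shown below. For brevity, the sketch stores its elements in a public attribute values (a hypothetical name) and returns the Python 3 values True and False where the figure returns 1 and 0.

```python
class SingleList:
    """Sketch that demonstrates only the equality operators."""

    def __init__(self, values):
        self.values = list(values)

    def __len__(self):
        return len(self.values)

    def __eq__(self, other):
        if len(self) != len(other):
            return False                  # lists of different sizes

        for i in range(len(self)):
            if self.values[i] != other.values[i]:
                return False              # first differing pair

        return True                       # every pair matched

    def __ne__(self, other):
        return not (self == other)        # reuses the overloaded ==


same = SingleList([1, 2, 3]) == SingleList([1, 2, 3])
different = SingleList([1, 2, 3]) != SingleList([1, 2, 4])
```

Checking the lengths first avoids comparing elements when the answer is already known.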
Class SingleList defines only some of the methods suggested for sequences in
Fig. 8.11. The exercises contain instructions for implementing some of the remaining
methods.
number of key–value pairs) and can support the methods that dictionaries support. The table
in Fig. 8.14 contains some methods that a mapping class should provide. In the next section,
we show an example of a class that defines many of these methods, to provide a
dictionary interface to a basic object.
Each method in the class (Fig. 8.15) simply calls the appropriate method for the object’s
__dict__ attribute. Method __getitem__ (lines 8–11) accepts an argument, key, that specifies
the key whose value the method retrieves from the dictionary. Line 11 simply uses the []
operator to retrieve the value for the specified key from the object’s __dict__. Method
__setitem__ (lines 13–16) accepts as arguments a key and a value. The method simply inserts
or updates the key-value pair in the object’s __dict__. Method __delitem__ (lines 18–21)
executes when the client uses keyword del to remove a key-value pair from the dictionary.
The method simply removes the key-value pair from the object’s __dict__. Method __str__
(lines 23–26) returns a string representation of an object of class SimpleDictionary by
passing the object’s __dict__ to built-in function str. Methods keys (lines 29–32), values
(lines 34–37) and items (lines 39–42) each return their appropriate value by calling the
corresponding method on the object’s __dict__.
The driver program (Fig. 8.16) creates one object of class SimpleDictionary and
uses the print statement to output the object’s value (lines 7–8). Lines 11–13 add new
values to the object with the [] operator, invoking method simple.__setitem__.
Line 16 uses keyword del to delete an element from the object, invoking method
simple.__delitem__. Lines 20–22 call methods keys, values and items, to
print the key-value pairs that the object stores.
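The delegation technique can be sketched as follows. The method names match those discussed in the text, but the body is our reconstruction in Python 3 syntax rather than the listing of Fig. 8.15, and the keys used in the usage lines are illustrative.

```python
class SimpleDictionary:
    """Sketch of a class that behaves like a dictionary."""

    def __getitem__(self, key):            # simple[ key ]
        return self.__dict__[key]

    def __setitem__(self, key, value):     # simple[ key ] = value
        self.__dict__[key] = value

    def __delitem__(self, key):            # del simple[ key ]
        del self.__dict__[key]

    def __str__(self):
        return str(self.__dict__)

    # common mapping methods, each delegating to __dict__
    def keys(self):
        return self.__dict__.keys()

    def values(self):
        return self.__dict__.values()

    def items(self):
        return self.__dict__.items()


simple = SimpleDictionary()
simple["a"] = 1         # calls simple.__setitem__( "a", 1 )
simple["b"] = 2
del simple["a"]         # calls simple.__delitem__( "a" )
print(simple)           # displays {'b': 2}
```

Because the pairs live in the object's __dict__, each key also is available as an ordinary attribute (e.g., simple.b).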
In this chapter, we introduced the concept of class customization, wherein a class
defines certain special methods to provide a syntax-based interface. These special methods
perform a wide variety of tasks in Python, including string representation, attribute access,
operator overloading and subscript access. We discussed the methods that provide each of
these behaviors, and implemented three case studies that demonstrated how these methods
can be used. In the next chapter, we discuss inheritance, a feature that allows programmers
to define new classes that take advantage of the attributes and behaviors of existing classes.
This ability is a key advantage of object-oriented programming, because it lets programmers
focus only on the new behaviors a class should exhibit. For example, the technique
we employed in this chapter of implementing a dictionary interface by calling the methods
of an object’s underlying __dict__ attribute leads to some amount of redundant code.
With inheritance, we can define a class that “re-uses” the behaviors of the standard dictionary
type, without having to define every mapping method explicitly.
SUMMARY
• A special method is a method that has a special meaning in Python; the Python interpreter calls
one of an object’s special methods when the client performs a certain operation on the object.
• A class author implements special methods to customize the behavior of the class. The purpose of
customization is to provide the clients of a class with a simple notation for manipulating objects
of the class.
• Operator overloading consists of defining special methods to describe how operators behave with
objects of programmer-defined types.
• Python enables programmers to overload most operators to be sensitive to the context in which they
are used. The interpreter takes the action appropriate for the manner in which the operator is used.
• A Python class can define special method __str__, to provide an informal (i.e., human-readable)
string representation of an object of the class. This method executes when a client uses an
object with the print statement, the % string formatting operator or built-in function str.
• Python provides three special methods—__getattr__, __setattr__ and
__delattr__—that a class can define to control how the dot access operator behaves on objects
of the class.
• If a class defines special method __setattr__, Python calls this method every time a program
makes an assignment to an object’s attribute through the dot operator.
• Assigning a value through the object’s __dict__ attribute does not invoke method
__setattr__, but simply inserts the appropriate key–value pair in the object’s __dict__.
• When a client program accesses an object attribute as an rvalue, Python first looks in the object’s
__dict__ attribute for the attribute name. If the attribute name is not in __dict__, Python
invokes the object’s __getattr__ method.
• The __getattr__ definition for every class should raise the AttributeError exception if
the attribute name cannot be found, to preserve Python’s default behavior for looking up
nonexistent attributes.
• Although Python does not allow new operators to be created, it does allow most existing operators
to be overloaded so that, when these operators are used with objects of a programmer-defined type,
the operators have meaning appropriate to the new types.
• Operators are overloaded by writing a method definition as you normally would, except that the
method name corresponds to the Python special method for that operator. To use an operator on
an object of a class, the class must overload (i.e., define a special method for) that operator.
• Operator overloading is not automatic; the programmer must write operator-overloading methods
to perform the desired operations.
• The precedence of an operator cannot be changed by overloading.
• It is not possible to change the “arity” of an operator (i.e., the number of operands an operator
takes): Overloaded unary operators remain unary operators; overloaded binary operators remain
binary operators.
• The meaning of how an operator works on objects of built-in types cannot be changed by operator
overloading. Operator overloading works only with objects of user-defined classes or with a
mixture of an object of a user-defined class and an object of a built-in type.
• Overloading a binary mathematical operator automatically overloads the operator’s corresponding
augmented assignment statement, although the programmer can overload the augmented
assignment statement explicitly.
• A unary operator for a class is overloaded as a method that takes only the object reference
argument (self).
• A binary operator or statement for a class is overloaded as a method with two arguments: self
and other.
• A class also may define special methods that execute when certain built-in functions are called on
an object of the class.
• The interpreter knows how to perform certain conversions among built-in types. Programmers can
force conversions among built-in types by calling the appropriate function, such as int or float.
• The programmer must specify how conversions among user-defined classes and built-in types are
to occur. Such conversions can be performed with special methods that override the appropriate
Python functions.
• Method __truediv__ overloads the binary division operator / for Python versions 2.2 and
greater that use floating-point division.
• Method __coerce__ executes when a client calls built-in function coerce on an object of class
Rational and another object or when the client performs so-called “mixed-mode” arithmetic.
• Special method __coerce__ should contain code that converts the reference object and the other
type to the same type and should return a tuple that contains the two converted values. Python expects
special method __coerce__ to return None if a coercion of the two types is not possible.
• A class also can define several special methods to implement sequence operations, providing a list-
based interface to its clients.
• When a program accesses an element of a sequence- or dictionary-like object as an rvalue, the
object’s __getitem__ method executes. When a program assigns a value to an element of a
sequence- or dictionary-like object, the object’s __setitem__ method executes.
• Python defines several special methods that a class can implement to provide a mapping-based
interface to its clients.
TERMINOLOGY
__abs__ method (overloads built-in function abs)
__add__ method (overloads operator +)
__and__ method (overloads operator &)
“arity”
append
binary operator
clear
__coerce__ method (overloads coercion behavior)
__complex__ method (overloads built-in function complex)
__contains__ method (overloads operator in)
copy
count
__delattr__ method (overloads attribute deletion)
__delitem__ method (overloads sequence/mapping element deletion)
__div__ method (overloads operator /)
__divmod__ method (overloads built-in function divmod)
__float__ method (overloads built-in function float)
__floordiv__ method (overloads operator //)
get
__getattr__ method (overloads attribute retrieval)
SELF-REVIEW EXERCISES
8.1 Fill in the blanks in each of the following statements:
a) Special methods ________, ________ and ________ customize attribute access
through the dot access operator.
b) Suppose a and b are integer variables and a program calculates the sum a + b. Now suppose
c and d are string variables and a program performs the concatenation c + d. The
two + operators here are clearly being used for different purposes. This is an example of
________.
c) The method name ________ overloads the + operator.
d) The ________, ________ and ________ of an operator cannot be changed by overloading.
e) The print statement implicitly invokes special method ________.
f) Special method __coerce__ should return ________ if no coercion can be made.
g) Special method __ne__ overloads the ________.
h) Special method ________ customizes the behavior of built-in function abs.
i) Special method ________ overloads the exponentiation operator ________.
j) Special methods ________, ________ and ________ control attribute access
through the [] subscript operators for list- and dictionary-like types.
8.2 State whether each of the following is true or false. If false, explain why.
a) Customization is accomplished by implementing special methods.
b) Python allows the programmer to create new operators to overload.
c) Overloading a mathematical operator implicitly overloads its augmented assignment
counterpart.
d) User-defined objects can use Python’s implicit operator overloading to get the expected
results.
e) A class may overload the operation of the = assignment symbol.
f) Unary operators can be overloaded to accept two operands.
g) Operator overloading cannot change how an operator works with built-in types.
h) Comparison operators can be overloaded.
i) Subtraction can be overloaded with either special method __neg__ or __sub__.
j) A class must define special methods to provide a dictionary-like interface.
EXERCISES
8.3 The definition for class SimpleDictionary in Fig. 8.15 does not include all the methods
suggested for providing a dictionary interface. Review the list of mapping methods in Fig. 8.14, and
modify the definition for class SimpleDictionary to include definitions for methods clear,
copy, get, has_key and update. Each method of class SimpleDictionary should call at-
tribute __dict__’s corresponding method, passing any necessary arguments. Review the descrip-
tion of dictionary methods in Section 5.6—the corresponding methods of class
SimpleDictionary should specify the same arguments and should return the same value.
8.4 Implement methods append, count, index, insert, pop, remove, reverse and
sort for class SingleList. Review the description of list methods in Section 5.6—the corre-
sponding SingleList methods should specify the same arguments and should return the same val-
ue. Any new method that modifies the list should ensure that only unique values are inserted. The
method should raise an exception if the client attempts to insert an existing value. Also, implement
methods __delitem__ and __contains__ to enable clients to delete list elements with key-
word del or perform membership tests with keyword in.
8.5 Review the Rational class definition (Fig. 8.9) and driver (Fig. 8.10). What happens when
Python executes the following statement?
x = 1 + Rational( 3, 4 )
Special methods __radd__, __rsub__ and so on overload the mathematical operators for a
user-defined class when an object of that class is used as the right-hand value of an operator. For
each operator overloaded in Fig. 8.9 (i.e., operators +, -, *, / and //), add a corresponding method
for overloading the operator when a Rational appears to the right of that operator.
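The right-hand-operand mechanism can be illustrated on a class other than Rational. The sketch below uses a hypothetical class Meters and is written for a current Python 3 interpreter; it is meant only to show when a reflected method such as __radd__ runs, not to serve as the solution for class Rational.

```python
class Meters:
    """Hypothetical length class used only for illustration."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        """Handles Meters + number and Meters + Meters."""
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        return NotImplemented  # let Python try the other operand

    def __radd__(self, other):
        """Handles number + Meters, after the number's own __add__ declines."""
        return self.__add__(other)


left = Meters(3) + 1    # calls Meters.__add__
right = 1 + Meters(3)   # int cannot add a Meters, so Python calls __radd__
print(left.value, right.value)
```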
8.6 As class Rational is currently implemented, the client may modify the attributes (i.e.,
sign, numerator and denominator) and place the data in an inconsistent state. Modify the
definition for class Rational from Exercise 8.5 to include method __setattr__. If a client
attempts to change the numerator or denominator of an object of class Rational, __setattr__
determines whether the change affects the sign of the object. If so, the method changes the object’s
sign and sets the numerator or denominator as the absolute value of the client-specified value. The
method also should call method simplify to reduce the object. Beware: If __setattr__ assigns
a value to an attribute through the dot access operator, Python invokes __setattr__ again,
resulting in infinite recursion. Make sure the method makes assignments through the object’s
__dict__ attribute instead. [Note: Methods __init__ and simplify also must be updated to use
the object’s __dict__, to avoid infinite recursion.]
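Assignment through the dot access operator invokes special method __setattr__, which is why storing the value through __dict__ avoids the recursion. The following is a minimal sketch of that technique on a hypothetical class Temperature (written for Python 3, and deliberately not the Rational solution).

```python
class Temperature:
    """Hypothetical class used only for illustration."""

    def __init__(self, degrees):
        self.degrees = degrees  # this assignment also goes through __setattr__

    def __setattr__(self, name, value):
        """Executes for every assignment made through the dot operator."""
        if name == "degrees" and value < -273.15:
            raise ValueError("temperature below absolute zero")
        # Writing self.degrees = value here would invoke __setattr__ again
        # and recurse without end, so store the value in __dict__ directly.
        self.__dict__[name] = value


reading = Temperature(20.0)
reading.degrees = 25.5   # validated by __setattr__, then stored
print(reading.degrees)
```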
8.7 Consider a class Complex that simulates the built-in complex data type. The class enables
operations on so-called complex numbers. These are numbers of the form realPart +
imaginaryPart * i, where i has the value √-1 (the square root of -1).
a) Modify the class to enable output of complex numbers in the form (realPart, imaginary-
Parti), through the overloaded __str__ method.
b) Overload the multiplication operator to enable multiplication of two complex numbers as
in algebra, using the equation
(a, bi) * (c, di) = (a*c - b*d, (a*d + b*c)i)
c) Overload the == operator to allow comparisons of complex numbers. [Note: (a, bi) is
equal to (c, di) if a is equal to c and b is equal to d.]
    # methods of class Complex (the earlier part of the listing is not shown)

    def __add__( self, other ):
        """Returns the sum of two Complex instances"""

        real = self.realPart + other.realPart
        imaginary = self.imaginaryPart + other.imaginaryPart

        # create and return new complexNumber
        return Complex( real, imaginary )

    def __sub__( self, other ):
        """Returns the difference of two Complex instances"""

        real = self.realPart - other.realPart
        imaginary = self.imaginaryPart - other.imaginaryPart

        # create and return new complexNumber
        return Complex( real, imaginary )
9
Object-Oriented
Programming:
Inheritance
Objectives
• To create new classes by inheriting from existing
classes.
• To understand how inheritance promotes software
reusability.
• To understand the notions of base class and derived
class.
• To understand the concept of polymorphism.
• To learn about classes that inherit from base-class
object.
Say not you know another entirely, till you have divided an
inheritance with him.
Johann Kasper Lavater
This method is to define as the number of a class the class of
all classes similar to the given class.
Bertrand Russell
A deck of cards was built like the purest of hierarchies, with
every card a master to those below it, a lackey to those above
it.
Ely Culbertson
Good as it is to inherit a library, it is better to collect one.
Augustine Birrell
Save base authority from others’ books.
William Shakespeare, Love’s Labours Lost
Outline
9.1 Introduction
9.2 Inheritance: Base Classes and Derived Classes
9.3 Creating Base Classes and Derived Classes
9.4 Overriding Base-Class Methods in a Derived Class
9.5 Software Engineering with Inheritance
9.6 Composition vs. Inheritance
9.7 “Uses A” and “Knows A” Relationships
9.8 Case Study: Point, Circle, Cylinder
9.9 Abstract Base Classes and Concrete Classes
9.10 Case Study: Inheriting Interface and Implementation
9.11 Polymorphism
9.12 Classes and Python 2.2
9.12.1 Static Methods
9.12.2 Inheriting from Built-in Types
9.12.3 __getattribute__ Method
9.12.4 __slots__ Class Attribute
9.12.5 Properties
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
9.1 Introduction
In this chapter we discuss inheritance—one of the most important capabilities of object-
oriented programming. Inheritance is a form of software reusability in which new classes
are created from existing classes by absorbing their attributes and behaviors, and overriding
or embellishing these with capabilities the new classes require. Software reusability saves
time in program development. It encourages programmers to reuse proven and debugged
high-quality software, thus reducing problems after a system becomes functional. These are
exciting possibilities.
When creating a new class, instead of writing completely new attributes and methods,
the programmer can designate that the new class is to inherit the attributes and methods of
a previously defined base class. The new class is referred to as a derived class. Each
derived class itself can be a base class for some future derived class. With single inherit-
ance, a class is derived from one base class. With multiple inheritance, a derived class
inherits from several base classes. Single inheritance is straightforward—we show several
examples that should enable the reader to become proficient quickly. Multiple inheritance
is beyond the scope of this edition; we do not show a live-code example, and we issue a strong
caution urging the reader to pursue further study before using this powerful capability.
Appendix O, Additional Python 2.2 Features, describes new Python 2.2 features that enable
the programmer to exercise more control over program execution when using multiple
inheritance.
A derived class can add attributes and methods of its own, so an object of a derived
class can be larger than an object of that derived class’s base class. A derived class is more
specific than its base class and represents a smaller set of objects. With single inheritance,
the derived class starts out essentially the same as the base class. The real strength of inher-
itance comes from the ability to define in the derived class additions, replacements or
refinements for the features inherited from the base class.
With inheritance, every object of a derived class also may be treated as an object of
that derived class’s base class. We can take advantage of this “derived-class-object-is-a-
base-class-object” relationship to perform some interesting manipulations. For example,
we can thread a wide variety of different objects related through inheritance into a list
where each element of the list is treated as a base-class object. This allows a variety of
objects to be processed in a general way. As we will see, this capability—called polymor-
phism—is a key thrust of object-oriented programming (OOP).
With polymorphism, it is possible to design and implement systems that are more
easily extensible. Programs can be written to process generically—as base-class objects—
objects of all existing classes in a hierarchy. Classes that do not exist during program devel-
opment can be added with little or no modification to the generic part of the program—as
long as those classes are part of the hierarchy that is being processed generically. The only
parts of a program that need modification are those parts that require direct knowledge of
the particular class that is added to the hierarchy. Polymorphism enables us to write pro-
grams in a general fashion to handle many existing and yet-to-be-specified related classes.
Inheritance and polymorphism are effective techniques for managing software complexity.
Experience in building software systems indicates that significant portions of the code
deal with closely related special cases. It becomes difficult in such systems to see the “big
picture” because the designer and the programmer become preoccupied with the special
cases. Object-oriented programming provides several ways of “seeing the forest through
the trees”—a process called abstraction.
We distinguish between “is-a” relationships and “has-a” relationships. “Is a” is
inheritance. In an “is a” relationship, an object of a derived-class type may also be treated
as an object of the base-class type. “Has a” is composition (see Fig. 7.18). In a “has a” rela-
tionship, an object has references to one or more objects of other classes as members.
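The distinction between “is a” and “has a” can be sketched as follows. The classes Engine, Vehicle and Car are hypothetical names chosen for this illustration, and the code is written for a current Python 3 interpreter.

```python
class Engine:
    """Hypothetical class used only for illustration."""

    def start(self):
        return "engine started"


class Vehicle:
    """Hypothetical base class."""

    def describe(self):
        return "a vehicle"


class Car(Vehicle):               # "is a": Car inherits from Vehicle
    def __init__(self):
        self.engine = Engine()    # "has a": Car holds a reference to an Engine

    def start(self):
        return self.engine.start()  # delegate to the contained object


car = Car()
print(isinstance(car, Vehicle))   # a Car may be treated as a Vehicle
print(isinstance(car, Engine))    # a Car is not an Engine; it contains one
print(car.describe(), "/", car.start())
```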
A derived class can access the attributes and methods of its base class. One problem
with inheritance is that a derived class can inherit method implementations that it does not
need to have or should expressly not have. When a base-class method implementation is
inappropriate for a derived class, that method can be overridden (i.e., redefined) in the
derived class with an appropriate implementation.
Perhaps most exciting is the notion that new classes can inherit from classes in existing
class libraries. Organizations develop their own class libraries and use other libraries avail-
able worldwide. Eventually, software will be constructed predominantly from standardized
reusable components just as hardware is often constructed today. This will help to meet the
challenges of developing the ever more powerful software we will need in the future.
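Base class      Derived classes
Student         GraduateStudent, UndergraduateStudent
Shape           Circle, Triangle, Rectangle
Loan            CarLoan, HomeImprovementLoan, MortgageLoan
Employee        FacultyMember, StaffMember
Account         CheckingAccount, SavingsAccount
[Figure: inheritance hierarchy rooted at CommunityMember.]
[Figure: Shape hierarchy; Shape is the base class of TwoDimensionalShape and ThreeDimensionalShape.]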
Fig. 9.4 Derived class inheriting from a base class. (Part 1 of 2.)
35
36 # demonstrate class relationships with built-in function issubclass
37 print "\nCircle is a subclass of Point:", \
38 issubclass( Circle, Point )
39 print "Point is a subclass of Circle:", issubclass( Point, Circle )
40
41 point = Point( 30, 50 ) # create Point object
42 circle = Circle( 120, 89, 2.7 ) # create Circle object
43
44 # demonstrate object relationship with built-in function isinstance
45 print "\ncircle is a Point object:", isinstance( circle, Point )
46 print "point is a Circle object:", isinstance( point, Circle )
47
48 # print Point and Circle objects
49 print "\npoint members:\n\t", point.__dict__
50 print "circle members:\n\t", circle.__dict__
51
52 print "\nArea of circle:", circle.area()
Point bases: ()
Circle bases: (<class __main__.Point at 0x00767250>,)
point members:
{'y': 50, 'x': 30}
circle members:
{'y': 89, 'x': 120, 'radius': 2.7000000000000002}
Fig. 9.4 Derived class inheriting from a base class. (Part 2 of 2.)
The constructor for class Point (lines 9–13) takes two arguments that correspond to
the point’s x- and y-coordinates. Class Circle (lines 15–28) inherits from class Point.
The parentheses (()) in the first line of the class definition indicate inheritance. The name
of the base class (Point) is placed inside the parentheses. Class Circle inherits all
attributes of class Point. This means that class Circle contains the Point members
(i.e., x and y) as well as the Circle members.
A derived class inherits the methods defined in its base class, including the base-class
constructor. Often, the derived class overrides the base-class constructor by defining a
derived-class __init__ method. A derived class overrides a base-class method when the
derived class defines a method with the same name as a base-class method. The overridden
derived-class constructor usually calls the base-class constructor, to initialize base-class
attributes before initializing derived-class attributes. Line 22 in the Circle constructor
calls the base-class constructor through an unbound method call. Until now, we have
invoked only bound method calls. A bound method call is invoked by accessing the method
name through an object (e.g., anObject.method()). We have seen that Python inserts
the object-reference argument for bound method calls. An unbound method call is invoked
by accessing the method through its class name and specifically passing an object refer-
ence. For example, line 22 calls method Point.__init__ and passes self (an object
of class Circle) as the object reference. The unbound method call also passes the values
for x and y so the Point constructor can initialize the Point attributes for the object of
class Circle. We explore method overriding and bound and unbound method calls fur-
ther in the next section. After the base-class constructor terminates, control returns to the
Circle constructor so it can perform any Circle-specific initialization. Line 23 adds a
new attribute—radius—to Circle’s namespace.
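The difference between the two kinds of call can be sketched as follows. The classes are modeled on the description of Fig. 9.4 rather than copied from it, so the exact signatures are assumptions. The code targets a current Python 3 interpreter, which no longer has a separate unbound-method type; calling a method through its class name and passing the object explicitly still works the same way.

```python
import math


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Circle(Point):
    def __init__(self, x, y, radius):
        # call through the class name and pass self explicitly
        Point.__init__(self, x, y)
        self.radius = radius      # Circle-specific initialization

    def area(self):
        return math.pi * self.radius ** 2


circle = Circle(120, 89, 2.7)

bound_area = circle.area()          # bound call: Python supplies self
unbound_area = Circle.area(circle)  # call through the class: self passed explicitly
print(bound_area == unbound_area)
print(sorted(circle.__dict__))
```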
Software Engineering Observation 9.1
A derived class (like any class) is not required to define a constructor. If a derived class does
not define a constructor, the class’s base-class constructor executes when the client creates
a new object of the class.
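A minimal sketch of this observation, using the hypothetical classes Base and Derived and written for a current Python 3 interpreter:

```python
class Base:
    """Hypothetical base class."""

    def __init__(self, name):
        self.name = name


class Derived(Base):
    """Defines no __init__, so Base.__init__ runs for new objects."""

    def greet(self):
        return "hello, " + self.name


item = Derived("test")     # arguments go to the inherited Base.__init__
print(item.greet())
```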
Lines 25–28 define method area for class Circle. This method demonstrates how
the derived class can define new methods to extend the functionality of the base class. In
this example, derived class Circle provides extra functionality that computes the area of
an object of class Circle.
The driver program in Fig. 9.4 first prints the value of each class’s __bases__
attribute (lines 33–34). Recall from Chapter 7 that each class contains special attributes,
including __bases__, which is a tuple that contains references to each of the class’s base
classes. Notice from the output that Point.__bases__ is an empty tuple, because
Point does not inherit from any other class. However, Circle.__bases__ is a tuple
that contains one value—a reference to base-class Point. Lines 37–39 call built-in func-
tion issubclass to demonstrate that Circle is a subclass of Point, but that Point
is not a subclass of Circle. Function issubclass takes two arguments that are classes
and returns true if the first argument is a class that inherits from the second argument (or if
the first argument is the same class as the second argument).
Lines 41–42 create point as a reference to an object of class Point and circle as
a reference to an object of class Circle. Lines 45–46 demonstrate built-in function
isinstance. This function takes two arguments—an object and a class. If the object
argument is an object of the type specified by the class argument, or if the object argument
is an object of a derived class of the type specified by the class argument, function isinstance
returns 1. Otherwise, the function returns 0. The two calls to function isinstance
demonstrate that an object of a derived class is also an object of its base class (e.g., circle is a
Point), but the reverse is not true (e.g., point is not a Circle).
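The class-relationship checks discussed above can be tried on two stub classes that stand in for the figure's Point and Circle. This sketch assumes a current Python 3 interpreter, where two details differ from the book's Python 2.2 output: every class implicitly inherits from object, so Point.__bases__ is (object,) rather than an empty tuple, and the functions return True and False rather than 1 and 0.

```python
class Point:
    pass


class Circle(Point):
    pass


print(Circle.__bases__ == (Point,))      # Circle has one direct base class
print(issubclass(Circle, Point))         # True
print(issubclass(Point, Circle))         # False
print(issubclass(Point, Point))          # True: a class counts as its own subclass

circle = Circle()
point = Point()
print(isinstance(circle, Point))         # True
print(isinstance(point, Circle))         # False
```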
Lines 49–50 print the __dict__ attribute of point and circle, respectively. Notice
from the output that circle’s __dict__ contains attributes x and y, initialized in the
base-class constructor. Line 52 calls circle method area, to demonstrate class
Circle’s extended functionality.
In this section, we demonstrated the mechanics of defining base and derived classes
and discussed bound and unbound methods. This material establishes the foundation we
need for our deeper treatment of object-oriented programming with inheritance in the
remainder of this chapter.
The HourlyWorker constructor uses an unbound method call to pass the strings
first and last to the Employee constructor so the base-class attributes can be initial-
ized, then initializes attributes hours and wage. Method getPay uses attributes hours
and wage to calculate the salary of the HourlyWorker.
HourlyWorker method __str__ overrides the Employee __str__ method.
Often, base-class methods are overridden in a derived class to provide more functionality.
The overridden method sometimes calls the base-class version of the method to perform
part of the new task. In this example, the derived-class __str__ method calls the base-
class __str__ method (with an unbound method call on line 40) to output the employee’s
name. The derived-class __str__ method also outputs the employee’s pay.
The driver program invokes an hourly object’s __str__ method in three different
ways. Line 47 simply uses the object in a print statement, which implicitly invokes the
object’s __str__ method. Line 48 makes an explicit, bound call to the object’s __str__
method. Line 49 makes an unbound call to class HourlyWorker’s __str__ method
and passes hourly as the object reference argument.
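The following sketch reconstructs the idea from the description above; it is not the book's listing. The attribute names firstName and lastName, the argument order and the wording of the output string are assumptions, and the code targets a current Python 3 interpreter.

```python
class Employee:
    def __init__(self, first, last):
        self.firstName = first
        self.lastName = last

    def __str__(self):
        return "%s %s" % (self.firstName, self.lastName)


class HourlyWorker(Employee):
    def __init__(self, first, last, hours, wage):
        Employee.__init__(self, first, last)   # initialize base-class attributes
        self.hours = float(hours)
        self.wage = float(wage)

    def getPay(self):
        return self.hours * self.wage

    def __str__(self):
        # reuse the base-class version for the name, then add the pay
        return "%s is an hourly worker with pay of $%.2f" % (
            Employee.__str__(self), self.getPay())


hourly = HourlyWorker("Bob", "Smith", 40.0, 10.00)

implicit = str(hourly)                        # what print uses internally
bound = hourly.__str__()                      # explicit bound call
through_class = HourlyWorker.__str__(hourly)  # call through the class name
print(implicit)
```

All three calls produce the same string, because each one ends up executing HourlyWorker's __str__ method with hourly as the object reference.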
A base class specifies commonality—all classes derived from a base class inherit the
capabilities of that base class. In the object-oriented design process, the designer looks for
commonality and “factors it out” to form desirable base classes. Derived classes are then
customized beyond the capabilities inherited from the base class.
Software Engineering Observation 9.3
In an object-oriented system, classes often are closely related. “Factor out” common attributes
and behaviors and place these in a base class. Then use inheritance to form derived classes.
Note that reading a set of derived-class definitions can be confusing because inherited
members are not shown, but they are nevertheless present in the derived classes. A similar
problem can exist in the documentation of derived classes.
Software Engineering Observation 9.4
A derived class contains the attributes and behaviors of its base class. A derived class can
also contain additional attributes and behaviors.
classes. Although a person object is not a car and a person object does not contain a car, a
person object certainly uses a car. A program uses an object simply by calling a method of
that object through a reference.
An object can be aware of another object. Knowledge networks frequently have such
relationships. One object can contain a reference to another object to be aware of that
object. In this case, one object is said to have a “knows a” relationship with the other object;
this is sometimes called an association.
31
32 if __name__ == "__main__":
33 main()
X coordinate is: 72
Y coordinate is: 115
The new location of point is: ( 10, 10 )
Figure 9.7 demonstrates class Circle, which inherits from class Point (Fig. 9.6).
Lines 7–26 show the Circle class definition, and lines 29–45 contain the driver program
for class Circle. Note that, because class Circle inherits from class Point, the inter-
face to Circle includes the Point methods as well as the Circle method area.
36
37 # change circle attributes and print new values
38 circle.radius = 4.25
39 circle.x = 2
40 circle.y = 2
41
42 print "\nThe new location and radius of circle are:", circle
43 print "The area of circle is: %.2f" % circle.area()
44
45 print "\ncircle printed as a Point is:", Point.__str__( circle )
46
47 if __name__ == "__main__":
48 main()
X coordinate is: 37
Y coordinate is: 43
Radius is: 2.5
The driver program creates an object of class Circle, then prints the attributes of the
object. The driver program then changes the values of the object’s attributes and prints the
changed object. Line 43 calls circle method area to display the object’s area. Finally,
line 45 calls Point method __str__ as an unbound method and passes circle as the
object reference. This call prints the object of class Circle as an object of class Point,
demonstrating how a derived-class object can be used as a base-class object.
Our last example reuses the Point and Circle class definitions from Fig. 9.6 and
Fig. 9.7. Lines 8–32 show the Cylinder class definition, and lines 35–61 are the driver
program for class Cylinder. Note that class Cylinder inherits from class Circle, so
the interface to Cylinder includes the Circle methods and Point methods as well as
the Cylinder methods area (overridden from Circle) and volume. Note that the
Cylinder constructor invokes the constructor for its direct base class Circle, but not
its indirect base class Point. Each derived-class constructor is responsible only for calling
the constructors of that class’s immediate base class.
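The constructor chain can be sketched as follows. The default argument values and the float conversions are assumptions made for this illustration, and the code targets a current Python 3 interpreter.

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y


class Circle(Point):
    def __init__(self, x=0, y=0, radius=0.0):
        Point.__init__(self, x, y)           # direct base class of Circle
        self.radius = float(radius)


class Cylinder(Circle):
    def __init__(self, x=0, y=0, radius=0.0, height=0.0):
        Circle.__init__(self, x, y, radius)  # direct base class only
        self.height = float(height)


cylinder = Cylinder(12, 23, 2.5, 5.7)
print(cylinder.x, cylinder.y, cylinder.radius, cylinder.height)
```

Cylinder never mentions Point, yet x and y are still initialized, because the Circle constructor forwards them to the Point constructor.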
The driver program creates an object of class Cylinder (line 38), then prints the
values of the object’s attributes (lines 41–44). The driver program then changes the values
of the height, radius and coordinates of the cylinder (lines 47–49) and outputs the results of
the changes (lines 50–51). Finally, the program makes unbound method calls to the Point
and Circle __str__ methods (lines 57 and 61) to print the object of class Cylinder
as an object of classes Point and Circle, respectively.
This example nicely demonstrates inheritance. The reader should now be confident
with the basics of inheritance. In the remainder of the chapter, we show how to program
with inheritance hierarchies in a general manner.
54
55 # display the Cylinder as a Point
56 print "\ncylinder printed as a Point is:", \
57 Point.__str__( cylinder )
58
59 # display the Cylinder as a Circle
60 print "\ncylinder printed as a Circle is:", \
61 Circle.__str__( cylinder )
62
63 if __name__ == "__main__":
64 main()
X coordinate is: 12
Y coordinate is: 23
Radius is: 2.5
Height is: 5.7
classes for two-dimensional shapes such as circles and squares and concrete classes for three-
dimensional shapes such as spheres and cubes.
50
51 def __str__( self ):
52 """String representation of Boss"""
53
54 return "%17s: %s" % ( "Boss", Employee.__str__( self ) )
55
56 class CommissionWorker( Employee ):
57 """CommissionWorker class, inherits from Employee"""
58
59 def __init__( self, first, last, salary, commission, quantity ):
60 """CommissionWorker constructor, takes first and last names,
61 salary, commission and quantity"""
62
63 Employee.__init__( self, first, last )
64 self.salary = self._checkPositive( float( salary ) )
65 self.commission = self._checkPositive( float( commission ) )
66 self.quantity = self._checkPositive( quantity )
67
68 def earnings( self ):
69 """Compute the CommissionWorker's pay"""
70
71 return self.salary + self.commission * self.quantity
72
73 def __str__( self ):
74 """String representation of CommissionWorker"""
75
76 return "%17s: %s" % ( "Commission Worker",
77 Employee.__str__( self ) )
78
79 class PieceWorker( Employee ):
80 """PieceWorker class, inherits from Employee"""
81
82 def __init__( self, first, last, wage, quantity ):
83 """PieceWorker constructor, takes first and last names, wage
84 per piece and quantity"""
85
86 Employee.__init__( self, first, last )
87 self.wagePerPiece = self._checkPositive( float( wage ) )
88 self.quantity = self._checkPositive( quantity )
89
90 def earnings( self ):
91 """Compute PieceWorker's pay"""
92
93 return self.quantity * self.wagePerPiece
94
95 def __str__( self ):
96 """String representation of PieceWorker"""
97
98 return "%17s: %s" % ( "Piece Worker",
99 Employee.__str__( self ) )
100
101 class HourlyWorker( Employee ):
102 """HourlyWorker class, inherits from Employee"""
103
104 def __init__( self, first, last, wage, hours ):
105 """HourlyWorker constructor, takes first and last names,
106 wage per hour and hours worked"""
107
108 Employee.__init__( self, first, last )
109 self.wage = self._checkPositive( float( wage ) )
110 self.hours = self._checkPositive( float( hours ) )
111
112 def earnings( self ):
113 """Compute HourlyWorker's pay"""
114
115 if self.hours <= 40:
116 return self.wage * self.hours
117 else:
118 return 40 * self.wage + ( self.hours - 40 ) *\
119 self.wage * 1.5
120
121 def __str__( self ):
122 """String representation of HourlyWorker"""
123
124 return "%17s: %s" % ( "Hourly Worker",
125 Employee.__str__( self ) )
126
127 # main program
128
129 # create list of Employees
130 employees = [ Boss( "John", "Smith", 800.00 ),
131 CommissionWorker( "Sue", "Jones", 200.0, 3.0, 150 ),
132 PieceWorker( "Bob", "Lewis", 2.5, 200 ),
133 HourlyWorker( "Karen", "Price", 13.75, 40 ) ]
134
135 # print Employee and compute earnings
136 for employee in employees:
137 print "%s earned $%.2f" % ( employee, employee.earnings() )
Class PieceWorker (lines 79–99) derives from class Employee. The methods
include a constructor (lines 82–88), the overridden earnings method (lines 90–93), and
an __str__ method (lines 95–99). The constructor takes a first name, a last name, a wage
per piece and a quantity of items produced as arguments and passes the first and last names
to the Employee constructor. Method earnings performs the PieceWorker-specific
earnings calculations. Method __str__ creates a string with the type and name
of the employee.
Class HourlyWorker (lines 101–125) derives from class Employee. The methods
include a constructor (lines 104–110), the overridden earnings method (lines 112–119),
and an __str__ method (lines 121–125). The constructor takes a first name, a last name,
a wage and the number of hours worked as arguments and passes the first and last names
to the Employee constructor. Method earnings performs the HourlyWorker-spe-
cific earnings calculations.
The driver program is shown in lines 127–137. We create a list of four concrete objects
of class Employee—an object of class Boss, an object of class CommissionWorker,
an object of class PieceWorker and an object of class HourlyWorker. Lines 136–137
iterate over the list of objects of class Employee and call method earnings for each
object in the list. This technique—generically processing a list of objects of various
classes—is possible because of Python’s inherent polymorphic behavior, a topic we discuss
in the next section.
9.11 Polymorphism
Python enables polymorphism—the ability for objects of different classes related by inher-
itance to respond differently to the same message (i.e., method call). The same message
sent to many different types of objects takes on “many forms”—hence the term polymor-
phism. If, for example, class Rectangle is derived from class Quadrilateral, then
a Rectangle is a more specific version of a Quadrilateral. An operation (such as
calculating the perimeter or the area) that can be performed on an object of class Quadri-
lateral also can be performed on an object of class Rectangle. Python is inherently
polymorphic because the language is “dynamically typed.” This means that Python deter-
mines at runtime whether an object defines a method or contains an attribute. If so, Python
calls the appropriate method or accesses the appropriate attribute. Also, Python’s dynamic
typing enables programs to perform generic processing on objects of classes that are not
related by inheritance. If the objects in a list all provide the same operations (e.g., all the
objects define a certain method), then a program can process a list of those objects generi-
cally. The term polymorphism normally refers to the behavior of objects of classes related
by inheritance, so we discuss polymorphic behavior in the context of class hierarchies in
which all the classes in the hierarchy provide a common interface.
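The runtime lookup described above can be seen with classes that share no base class of their own. Duck, Robot and Rock are hypothetical names used only for this sketch, which targets a current Python 3 interpreter.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:              # not related to Duck by inheritance
    def speak(self):
        return "Beep"


class Rock:               # defines no speak method at all
    pass


sounds = []
for thing in [Duck(), Robot(), Rock()]:
    try:
        sounds.append(thing.speak())   # method is looked up when the call runs
    except AttributeError:
        sounds.append("(no speak method)")

print(sounds)
```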
Consider the following example using the Employee base class and Hourly-
Worker derived class of Fig. 9.9. Our Employee base class and HourlyWorker
derived class each define their own __str__ methods. Calling the __str__ method
through an Employee reference invokes Employee.__str__, and calling the
__str__ method through an HourlyWorker reference invokes Hourly-
Worker.__str__. The base-class __str__ method also is available to the derived
class. To call the base-class __str__ method for a derived-class object, the method must
be called explicitly as follows:
Employee.__str__( hourlyReference )
This specifies that the base-class __str__ should be called explicitly, using hourly-
Reference as the object reference argument.
Through polymorphism, one method call can cause different actions to occur
depending on the class of the object receiving the call. This gives the programmer tremen-
dous expressive capability.
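A minimal sketch of this behavior, using the Quadrilateral and Rectangle classes mentioned earlier in this section. The method names perimeter and describe and the constructor arguments are assumptions made for this illustration, and the code targets a current Python 3 interpreter.

```python
class Quadrilateral:
    def __init__(self, sides):
        self.sides = sides            # four side lengths

    def perimeter(self):
        return sum(self.sides)

    def describe(self):
        return "quadrilateral with perimeter %g" % self.perimeter()


class Rectangle(Quadrilateral):
    def __init__(self, width, height):
        Quadrilateral.__init__(self, [width, height, width, height])

    def describe(self):               # overrides the base-class method
        return "rectangle, a " + Quadrilateral.describe(self)


shapes = [Quadrilateral([1, 2, 3, 4]), Rectangle(2, 5)]
descriptions = [shape.describe() for shape in shapes]  # same message, two actions
for line in descriptions:
    print(line)

# explicit call to the base-class version for a derived-class object
as_base = Quadrilateral.describe(shapes[1])
print(as_base)
```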
54
55 print "\nRemoving one employee..."
56 del employeeList[ 0 ]
57
58 print "Employees are crowded?", answers[ Employee.isCrowded() ]
59
60 if __name__ == "__main__":
61 main()
Method __init__ (lines 18–23) takes two arguments that correspond to the
employee’s first and last name. The method also increments the value of Employee class
attribute numberOfEmployees. Method __del__ (lines 25–28) decrements the value
of Employee class attribute numberOfEmployees. Method __str__ (lines 30–33)
simply returns a string that contains the employee’s first and last name.
Static methods can be called either by using the class name in which the method is
defined or by using the name of an object of that class. Function main (lines 36–58) dem-
onstrates the ways in which a client program can call a static method. Variable answers
(line 37) is a list that contains the possible answers ("Yes" or "No") to the question, “Are
the employees crowded?” Line 43 calls static method isCrowded using the class name
(Employee). The method returns 0, because no objects of the class have been created.
Lines 48–53 contain a for loop that creates 11 objects of class Employee and adds each
object to list employeeList. For each object, the program calls static method
isCrowded using the newest object of that class. The program prints "Yes" in response
to the eleventh call to isCrowded, because the number of existing Employees (class
attribute numberOfEmployees) is greater than the maximum number that can work in
the office comfortably (class attribute maxEmployees). Line 56 deletes one of the
objects from employeeList, which invokes that object’s destructor. Line 58 calls static
method isCrowded once more to demonstrate that the number of employees has dropped
to an acceptable level.
Static methods are crucial in languages such as Java, which require the programmer to
place all program code in a class definition. In these languages, programmers often define
classes that contain only static utility methods. Clients of the class can then call the static
utility methods, much in the same way that Python programs invoke functions defined in a
module. In Python, static methods enable programmers to define a class interface more pre-
cisely. When a method of a class does not require an object of the class to perform its task,
the programmer designates that method as static.
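A minimal sketch of the static-method pattern discussed above, assuming an office capacity of 10 and hypothetical employee names. It omits the destructor bookkeeping and is not the book's listing. The method takes no object reference argument and is designated static by rebinding its name to the value returned by staticmethod.

```python
class Employee(object):
    """Sketch of a class that tracks how many of its objects exist."""

    numberOfEmployees = 0   # class attribute shared by all objects
    maxEmployees = 10       # assumed comfortable office capacity

    def isCrowded():
        """Static method: receives no object reference argument."""
        return Employee.numberOfEmployees > Employee.maxEmployees

    # bind the name to the value returned by staticmethod
    isCrowded = staticmethod(isCrowded)

    def __init__(self, firstName, lastName):
        self.first = firstName
        self.last = lastName
        Employee.numberOfEmployees += 1


# called through the class name, before any objects exist
crowdedBefore = Employee.isCrowded()

employeeList = []
for i in range(11):
    employeeList.append(Employee("John", "Doe" + str(i)))

# called through an object of the class
crowdedAfter = employeeList[-1].isCrowded()
```

Both call forms reach the same method; the object used in the second call is not passed to it.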
display the behaviors of “new” classes. This definition for class SingleList differs from
our previous definition, because this definition does not maintain as an attribute an internal
list of values. SingleList is a list, so all methods of the class can treat the object ref-
erence as a list object—an extra attribute is not necessary. Class SingleList’s con-
structor (lines 7–14) first calls the base-class constructor, to initialize the list. If the client
passes an initial list value to the class’s constructor, line 14 calls SingleList method
merge (discussed shortly) to add unique values from the list argument to the empty list ini-
tialized by the base-class constructor.
Fig. 9.12 Inheriting from built-in type list—class SingleList. (Part 1 of 3.)
28
29 # remove values from list
30 popValue = single.pop()
31 print "\nRemoved", popValue, "from list:", single
32 single.append( popValue )
33 print "Added", popValue, "back to end of list:", single
34
35 # slice list
36 print "\nThe value of single[ 1:4 ] is:", single[ 1:4 ]
The list, after adding elements is: [1, 2, 3, 'hello', 4, 6, 9, 10, 20,
-1, -2, -3, 100]
Removed 100 from list: [1, 2, 3, 'hello', 4, 6, 9, 10, 20, -1, -2, -3]
Added 100 back to end of list: [1, 2, 3, 'hello', 4, 6, 9, 10, 20, -1,
-2, -3, 100]
Lines 36–44 overload the + operator for addition when a SingleList appears to the
left or right of the operator. Methods __add__ and __radd__ each return a new object
of class SingleList that is initialized with the elements of the two arguments passed to
either method. This operation has the same effect as merging two lists into one list of
unique values. Lines 46–53 overload the augmented assignment += symbol. The method
performs its operation in-place (i.e., on the object reference itself). For each value in the
right-hand operand, method __iadd__ calls SingleList method append, which
either inserts a new value at the end of the list or, if the list already contains that value, raises
an exception. Python expects an overloaded, augmented-assignment method to return an
object of the class for which the method is defined, so line 53 returns the augmented object
reference. Lines 55–62 overload the multiplication operation (i.e., list repetition) for
objects of class SingleList. By definition, a SingleList cannot contain more than
one occurrence of any value, so method __mul__ raises an exception if the client attempts
such an operation. Line 62 binds the names for methods __rmul__ (right multiplication)
and __imul__ (augmented assignment multiplication) to the method defined for
__mul__; when clients invoke these operations, the corresponding methods also raise
exceptions.
Lines 65–88 define methods insert, append and extend for adding values to a
list. Methods insert and append first invoke utility method
_raiseIfNotUnique—to prevent the client from adding duplicate values to the list—
before invoking the base-class version of the corresponding method. Method extend uses
method append to add elements from another list to the reference object.
Method merge (lines 91–98) provides clients the ability to merge a SingleList
with another list that possibly contains duplicate values. Method merge provides the same
behavior that base-class list provides with method extend. However, method extend
in the derived class raises an exception if the client attempts to extend the SingleList
with a list that would insert duplicate values in the SingleList. By providing method
merge, we give clients a way to extend a SingleList without raising an exception. The
method adds only unique values to the SingleList, by calling list.append for
every unique value in the client-supplied list.
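The methods described above can be sketched in condensed form. This is an approximation written from the prose, not the Fig. 9.12 listing: the error-message wording for repetition is an assumption, __radd__ is omitted for brevity, and raise uses the call form so that the sketch also runs on current Python.

```python
class SingleList(list):
    """Sketch of a list subclass that holds only unique values."""

    def __init__(self, initialList=None):
        list.__init__(self)              # initialize the empty base-class list
        if initialList:
            self.merge(initialList)      # keep one of each value

    def _raiseIfNotUnique(self, value):
        """Utility method: reject a value the list already contains."""
        if value in self:
            raise ValueError("List already contains value %s" % str(value))

    def __add__(self, other):
        """Return a new SingleList of the unique values in both operands."""
        return SingleList(list.__add__(self, other))

    def __iadd__(self, other):
        """Augmented assignment works in place and returns the object."""
        for value in other:
            self.append(value)
        return self

    def __mul__(self, value):
        """Repetition would create duplicates, so it always fails."""
        raise ValueError("Cannot repeat values in SingleList")

    __rmul__ = __imul__ = __mul__

    def insert(self, index, value):
        self._raiseIfNotUnique(value)
        list.insert(self, index, value)

    def append(self, value):
        self._raiseIfNotUnique(value)
        list.append(self, value)

    def extend(self, other):
        for value in other:
            self.append(value)

    def merge(self, other):
        """Add only the values this list does not yet contain."""
        for value in other:
            if value not in self:
                list.append(self, value)


single = SingleList([1, 2, 2, 3])    # duplicates are dropped by merge
single += [4]                        # __iadd__ appends in place
single.merge([3, 4, 100])            # only 100 is new
combined = single + [100, 200]       # __add__ returns a new SingleList
```

Note that merge calls list.append directly, bypassing the overridden append, so merging never raises an exception for a duplicate.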
The driver program of Fig. 9.13 uses both SingleList-specific functionality and
functionality inherited from base-class list. Lines 6–7 create and print list
duplicates, which contains duplicate values. Line 9 creates an object of class SingleList,
which passes duplicates to the constructor. The new object—single—of class
SingleList contains one of each of the values from list duplicates. The remainder
of the driver program demonstrates SingleList’s capabilities. Line 10 prints single,
which implicitly invokes the object’s base-class __str__ method. Line 11 passes
single to function len, which calls the object’s base-class __len__ method to deter-
mine the number of elements in the list.
Lines 14–16 call single’s methods count and index to determine whether cer-
tain elements exist in the list and to locate an element in the list, respectively. Line 18 uses
keyword in, which implicitly invokes the base-class __contains__ method, to determine
whether the list contains the integer element 4. Lines 22–25 call overridden
SingleList methods to add elements to the list. Line 22 calls method append to add an
element to the list. Line 23 appends an element with symbol +=, which implicitly invokes
the object’s __iadd__ method. Line 24 calls method insert to insert the element
"hello" at index 3. Line 25 calls method extend to add elements from another list to
single. All these methods add unique elements to the list; if one of the method calls
attempted to add a duplicate value to the list, the method would raise an exception (as
shown in Fig. 9.14). The call to method merge in line 26 merges the values in single
with values from another list. Notice, from the output, that the effect of the call in line 26 is
to add only the integer element 100, because this element is the only value that single
did not yet contain.
Lines 30–33 of Fig. 9.13 remove an element from the list, add the element back into
the list and print the results. These statements demonstrate that the client can remove a
value from the list, using base-class method pop, and that reinserting the removed value
does not raise an exception. Line 36 demonstrates that class SingleList inherits slicing
capabilities from base-class list. This underscores the benefit of inheritance-based soft-
ware reuse. In the previous definition of class SingleList, we would have had to pro-
gram this capability explicitly. In this version, we simply inherit the capability from the
base class.
Python 2.2b2 (#26, Nov 16 2001, 11:44:11) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> from NewList import SingleList
>>> single = SingleList( [ 1, 2, 3 ] )
>>>
>>> single.append( 1 )
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "NewList.py", line 79, in append
self._raiseIfNotUnique( value )
File "NewList.py", line 22, in _raiseIfNotUnique
raise ValueError, \
ValueError: List already contains value 1
>>>
>>> single += [ 2 ]
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "NewList.py", line 51, in __iadd__
self.append( value )
File "NewList.py", line 79, in append
self._raiseIfNotUnique( value )
File "NewList.py", line 22, in _raiseIfNotUnique
raise ValueError, \
ValueError: List already contains value 2
>>>
>>> single.insert( 0, 1 )
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "NewList.py", line 70, in insert
self._raiseIfNotUnique( value )
File "NewList.py", line 22, in _raiseIfNotUnique
raise ValueError, \
ValueError: List already contains value 1
>>>
>>> single.extend( [ 3, 4 ] )
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "NewList.py", line 88, in extend
self.append( value )
File "NewList.py", line 79, in append
self._raiseIfNotUnique( value )
File "NewList.py", line 22, in _raiseIfNotUnique
raise ValueError, \
ValueError: List already contains value 3
rect and indirect base classes. Classes that inherit from base-class object also can define
method __getattribute__, which executes for every attribute access. Figure 9.15
contains a simple example. We define class DemonstrateAccess (lines 4–29), which inherits
from base-class object and provides both __getattr__ and __getattribute__
methods. The constructor creates one attribute—value—and initializes it to 1.
Method __getattribute__ (lines 13–19) executes every time the client attempts
to access an object’s attribute through the dot (.) access operator. The method prints a line
indicating that the method is executing and a line that displays the name of the attribute that
the client is attempting to access. Line 19 returns the result of calling base-class method
__getattribute__, passing the specified attribute name. Method
__getattribute__ in a derived class must call the base-class version of the method to
retrieve an attribute’s value, because attempting to access the attribute’s value through the
object’s __dict__ would result in another call to __getattribute__.
Common Programming Error 9.5
To ensure proper attribute access, a derived-class version of method
__getattribute__ should call the base-class version of the method. Attempting to re-
turn the attribute’s value by accessing the object’s __dict__ causes infinite recursion.
Lines 21–29 define method __getattr__, which performs the same behavior as in
“classic” classes; namely, the method executes when the client attempts to access an
attribute that the object’s __dict__ does not contain. The method displays output that
indicates the method is executing and provides the name of the attribute that the client
attempted to access (lines 24–26). Lines 28–29 raise an exception to preserve Python’s
default behavior of raising an exception when a client accesses a nonexistent attribute.
Python 2.2b2 (#26, Nov 16 2001, 11:44:11) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> from fig09_15 import DemonstrateAccess
>>> access = DemonstrateAccess()
>>>
>>> access.value
__getattribute__ executing...
Client attempt to access attribute: value
1
>>>
>>> access.novalue
__getattribute__ executing...
Client attempt to access attribute: novalue
__getattr__ executing...
Client attempt to access non-existent attribute: novalue
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "fig09_15.py", line 28, in __getattr__
raise AttributeError, "Object has no attribute %s" \
AttributeError: Object has no attribute novalue
The interactive session in the output box for Fig. 9.15 demonstrates when methods
__getattribute__ and __getattr__ execute. We first create an object of class
DemonstrateAccess, then access attribute value, using the dot access operator. The
output indicates that method __getattribute__ executes in response to the attribute
access; Python displays the return value (1) in the interactive session. Next, the program
accesses attribute novalue, a nonexistent attribute. Method __getattribute__ exe-
cutes first, because the method executes every time the client attempts to access an
attribute. When the base-class version of the method determines that the object does not
contain a novalue attribute, method __getattr__ executes. The method raises an
exception to indicate that the client has accessed a nonexistent attribute.
45
46 if __name__ == "__main__":
47 main()
The driver program (lines 28–44) demonstrates the difference between an object of a
class that defines __slots__ and an object of a class that does not define __slots__.
Lines 29–30 create objects of classes PointWithoutSlots and PointWithSlots,
respectively. The for loop in lines 32–44 iterates over each object and attempts
to replace the value of the object’s x attribute with a user-supplied value, obtained in line
36. Line 41 contains a logic error—the program intends to modify the value of the object’s
x attribute, but mistakenly creates an attribute called X and assigns the user-entered value
to the new attribute. For objects of class PointWithoutSlots (e.g., object noSlots),
line 41 executes without raising an exception, and line 44 prints the unchanged value of
attribute x. For objects of class PointWithSlots (e.g., slots), line 41 raises an excep-
tion, because the object’s __slots__ attribute does not contain the name "X".
The example in Fig. 9.16 demonstrates one benefit of defining the __slots__ attribute
for new classes, namely preventing accidental attribute creation. Classes that define
__slots__ also can gain performance benefits, because Python knows in advance that programs
cannot add new attributes to an object; therefore, Python can store and manipulate the objects
in a more efficient manner. A disadvantage of __slots__ is that experienced Python pro-
grammers sometimes expect the ability to add object attributes dynamically. Defining
__slots__ can inhibit programmers’ abilities to create dynamic applications quickly.
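The mistyped-attribute scenario can be reduced to a short sketch. The class names follow the text, but the constructor signatures, default values and the variables unchanged and slotsRaised are assumptions made for this example, which replaces the user input of Fig. 9.16 with a fixed value.

```python
class PointWithoutSlots(object):
    def __init__(self, x=0.0, y=0.0):
        self.x = x
        self.y = y


class PointWithSlots(object):
    __slots__ = ["x", "y"]   # the only attribute names objects may have

    def __init__(self, x=0.0, y=0.0):
        self.x = x
        self.y = y


noSlots = PointWithoutSlots()
noSlots.X = 5.0          # typo silently creates a new attribute X
unchanged = noSlots.x    # x still holds its original value

slots = PointWithSlots()
try:
    slots.X = 5.0        # "X" is not listed in __slots__
    slotsRaised = False
except AttributeError:
    slotsRaised = True
```

With __slots__ defined, the logic error surfaces immediately as an AttributeError instead of passing unnoticed.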
9.12.5 Properties
Python’s new classes can contain properties that describe object attributes. A program ac-
cesses an object’s properties using object-attribute syntax. However, a class definition cre-
ates a property by specifying up to four components—a get method that executes when a
program accesses the property’s value, a set method that executes when a program sets the
property’s value, a delete method that executes when a program deletes the value (e.g., with
keyword del) and a docstring that describes the property. The get, set and delete methods
can perform the tasks that maintain an object’s data in a consistent state. Thus, properties
provide an additional way for programmers to control access to an object’s data.
Figure 9.17 redefines class Time—the class previously used to demonstrate attribute
access—to contain attributes hour, minute and second as properties. The constructor
(lines 7–12) creates private attributes __hour, __minute and __second. Typically,
classes that use properties define their attributes to be private, to hide the data from clients
of the class. The clients of the class then access the public properties of that class, which
get and set the values of the private attributes.
Method deleteValue (lines 20–23) raises an exception to prevent a client from
deleting an attribute. We use this method to create properties that the client cannot delete.
Each property (hour, minute and second) defines corresponding get and set methods.
Each get method takes only the object reference as an argument and returns the property’s
value. Each set method takes two arguments—the object-reference argument and the new
value for the property. Lines 25–32 define the set method (setHour) for the hour prop-
erty. If the new value is within the appropriate range, the method assigns the new value to
the property; otherwise, the method raises an exception. Method getHour (lines 34–37)
is the hour property’s get method, which simply returns the value of the corresponding
private attribute (__hour).
Python 2.2b2 (#26, Nov 16 2001, 11:44:11) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> from TimeProperty import Time
>>>
>>> time1 = Time( 5, 27, 19 )
>>> print time1
05:27:19
>>> print time1.hour, time1.minute, time1.second
5 27 19
>>>
>>> time1.hour, time1.minute, time1.second = 16, 1, 59
>>> print time1
16:01:59
>>>
>>> time1.hour = 25
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "TimeProperty.py", line 31, in setHour
raise ValueError, \
ValueError: hour (25) must be in range 0-23, inclusive
>>>
>>> time1.minute = -3
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "TimeProperty.py", line 48, in setMinute
raise ValueError, \
ValueError: minute (-3) must be in range 0-59, inclusive
>>>
>>> time1.second = 99
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "TimeProperty.py", line 65, in setSecond
raise ValueError, \
ValueError: second (99) must be in range 0-59, inclusive
Built-in function property (line 40) takes as arguments a get method, a set method,
a delete method and a docstring and returns a property for the class. Line 40 creates the
hour property by passing to function property methods getHour, setHour and
deleteValue and the string "hour". Clients access properties, using the dot (.) access
operator. When the client uses a property as an rvalue, the property’s get method executes.
When the client uses the property as an lvalue, the property’s set method executes. When
the client deletes the property with keyword del, the property’s delete method executes.
The remainder of the class definition (lines 42–74) defines get and set methods for proper-
ties minute (created in line 57) and second (created in line 74).
Software Engineering Observation 9.12
Function property does not require that the caller pass all four arguments. Instead, the
caller can pass values for keyword arguments fget, fset, fdel and doc to specify the
property’s get, set and delete methods and the docstring, respectively.
The interactive session in Fig. 9.17 highlights the benefits of properties. A client of the
class can access an object’s attributes, using the dot access operator, but the class author
also can ensure data integrity. Properties have added advantages over implementing
methods __setattr__, __getattr__ and __delattr__. For example, class
authors can state explicitly the attributes for which the client may use the dot access nota-
tion. Additionally, the class author can write separate get, set and delete methods for each
attribute, rather than using if/else logic to determine which attribute to access.
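A condensed sketch of a class built on properties, covering only hour and minute. It follows the prose rather than the Fig. 9.17 listing: the constructor's use of the set methods, the TypeError raised by deleteValue and the format used by __str__ are assumptions. The hour property uses the positional form of property, and minute uses the keyword form.

```python
class Time(object):
    """Sketch of a time class whose hour and minute are properties."""

    def __init__(self, hour=0, minute=0):
        self.setHour(hour)       # validate the initial values
        self.setMinute(minute)

    def deleteValue(self):
        # assumed exception type; the text says only that one is raised
        raise TypeError("Cannot delete attribute")

    def setHour(self, value):
        if 0 <= value < 24:
            self.__hour = value
        else:
            raise ValueError(
                "hour (%d) must be in range 0-23, inclusive" % value)

    def getHour(self):
        return self.__hour

    # positional form: get method, set method, delete method, docstring
    hour = property(getHour, setHour, deleteValue, "hour")

    def setMinute(self, value):
        if 0 <= value < 60:
            self.__minute = value
        else:
            raise ValueError(
                "minute (%d) must be in range 0-59, inclusive" % value)

    def getMinute(self):
        return self.__minute

    # keyword form: the same components, named explicitly
    minute = property(fget=getMinute, fset=setMinute,
                      fdel=deleteValue, doc="minute")

    def __str__(self):
        return "%.2d:%.2d" % (self.__hour, self.__minute)


time1 = Time(5, 27)
time1.hour = 16              # lvalue use runs setHour
currentHour = time1.hour     # rvalue use runs getHour
```

Because every assignment passes through a set method, an out-of-range value such as time1.hour = 25 raises ValueError and leaves the stored hour untouched.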
In this chapter, we discussed the mechanics of inheritance and how inheritance pro-
motes software reuse and data abstraction. We discussed two examples of inheritance—one
example of structural inheritance and one example of a class hierarchy headed by an
abstract base class. We also introduced new object-oriented-programming features avail-
able in Python 2.2. We continued our discussion of data integrity by presenting proper-
ties—a feature that allows clients of the class to access data with the dot access operator
and allows classes to maintain private data in a consistent state. Data hiding and data integ-
rity are fundamental object-oriented software design principles. The topics discussed in this
and the previous two chapters provide a solid foundation for programmers who want to
build large, industrial-strength software systems in Python.
SUMMARY
• Inheritance is a form of software reusability in which new classes are created from existing classes
by absorbing their attributes and behaviors and then overriding or embellishing these with capa-
bilities the new classes require.
• When creating a new class, instead of writing completely new attributes and methods, the pro-
grammer can designate that the new class is to inherit the attributes and methods of a previously
defined base class.
• The class that inherits from a base class is referred to as a derived class. Each derived class itself
becomes a candidate to be a base class for some future derived class.
• With single inheritance, a class is derived from one base class.
• With multiple inheritance, a derived class inherits from multiple (possibly unrelated) base classes.
Multiple inheritance can be complex and error prone.
• The real strength of inheritance comes from the ability to define in the derived class additions, re-
placements or refinements for the features inherited from the base class.
• With inheritance, every object of a derived class also may be treated as an object of that derived
class’s base class. However, the converse is not true—base-class objects are not objects of that
base class’s derived classes.
• With polymorphism, it is possible to design and implement systems that are more easily extensi-
ble. Programs can be written to process generically—as base-class objects—objects of all existing
classes in a hierarchy.
• Polymorphism enables us to write programs in a general fashion to handle a wide variety of exist-
ing and yet-to-be-specified related classes.
• Object-oriented programming provides several ways of “seeing the forest through the trees”—a
process called abstraction.
• “Is a” is inheritance. In an “is-a” relationship, an object of a derived-class type may also be treated
as an object of the base-class type.
• “Has a” is composition. In a “has-a” relationship, an object has references to one or more objects
of other classes as members.
• A derived class can access the attributes and methods of its base class. When a base-class member
implementation is inappropriate for a derived class, that member can be overridden (i.e., replaced)
in the derived class with an appropriate implementation.
• Inheritance forms tree-like hierarchical structures. A base class exists in a hierarchical relationship
with its derived classes.
• Function issubclass takes two arguments that are classes and returns true if the first argument
is a class that inherits from the second argument (or if the first argument is the same class as the
second argument).
• Python provides a built-in function—isinstance—that determines whether an object is an ob-
ject of a given class or of a subclass of that class.
• Parentheses, (), in the first line of the class definition indicate inheritance. The name of the base
class (or base classes) is placed inside the parentheses.
• A direct base class of a derived class is explicitly listed inside parentheses when the derived class
is defined.
• An indirect base class is not explicitly listed when the derived class is defined; rather the indirect
base class is inherited from two or more levels up the class hierarchy.
• To initialize an object of a derived class, the derived-class constructor must call the base-class con-
structor.
• A bound method call is invoked by accessing the method name through an object. Python auto-
matically inserts the object reference argument for bound method calls.
• An unbound method call is invoked by accessing the method through its class name then specifi-
cally passing an object.
• A class’s __bases__ attribute is a tuple that contains references to each of the class’s base classes.
• A derived class can override a base-class method by supplying a new version of that method with
the same name. When that method is mentioned by name in the derived class, the derived-class
version is automatically selected.
• A base class specifies commonality. In the object-oriented design process, the designer looks for
commonality and “factors it out” to form base classes. Derived classes are then customized beyond
the capabilities inherited from the base class.
• A program uses an object if the program simply calls a method of that object through a reference.
• An object is said to have a “knows-a” relationship with a second object if the first object is aware of
(i.e., has a reference to) the second object. This is sometimes called an association.
• There are cases in which it is useful to define classes for which the programmer never intends to
create any objects. Such classes are called abstract classes.
• The sole purpose of an abstract class is to provide an appropriate base class from which classes
may inherit interface and possibly implementation. Classes from which objects can be created are
called concrete classes.
• Python does not provide a way to designate an abstract class. However, the programmer can im-
plement an abstract class by raising an exception in the class’s __init__ method.
• Python is inherently polymorphic because the language is dynamically typed. This means that Py-
thon determines at runtime whether an object defines a method or contains an attribute and, if so,
calls the appropriate method or accesses the appropriate attribute.
• Using polymorphism, one method call can cause different actions to occur depending on the class
of the object receiving the call. This gives the programmer tremendous expressive capability.
• Beginning with Python 2.2, the nature and behavior of classes will change. In all future 2.x releas-
es, a programmer can distinguish between two kinds of classes: “classic” classes and “new” class-
es. In Python 3.0, all classes will behave like “new” classes.
• Python 2.2 provides type object for defining “new” classes. Any class that inherits from
object exhibits the new-class behaviors.
• “New” classes can define static methods. A static method can be called by a client of the class,
even if no objects of the class exist.
• A class designates a method as static by passing the method’s name to built-in function
staticmethod and binding a name to the value returned from the function call.
• Static methods differ from regular methods in that when a program calls a static method, Python
does not pass the object reference argument to the method. Therefore, a static method does not
specify self as the first argument.
• The goal of the new class behavior is to remove the dichotomy that existed between Python types
and classes before version 2.2. The most practical use of this type-class unification is that program-
mers now can inherit from Python’s built-in types.
• Classes that inherit from base-class object also can define method __getattribute__,
which executes for every attribute access.
• Method __getattribute__ in a derived class must call the base-class version of the method
to retrieve an object’s attribute; otherwise, infinite recursion occurs.
• Python 2.2 allows “new” classes to define a __slots__ attribute listing the attributes that ob-
jects of the class are allowed to have.
• When a “new” class defines the __slots__ attribute, objects of the class can assign values only
to attributes whose names appear in the __slots__ list. If a client attempts to assign a value to
an attribute whose name does not appear in __slots__, Python raises an exception.
• “New” classes can contain properties that describe object attributes. A program accesses an
object’s properties in the same manner as accessing the object’s attributes.
• A class definition creates a property by specifying up to four components: a get method, a set method,
a delete method and a docstring that describes the property. The get, set and delete methods can
perform any tasks necessary for maintaining data in a consistent state.
• Classes that use properties most often define their attributes to be private, to hide the data from
clients of the class. The clients of the class then access the public properties of that class, which
get and set the values of the private attributes.
• Built-in function property takes as arguments a get method, a set method, a delete method and
a docstring and returns a property for the class.
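Several of the summary points on inheritance can be checked with one short sketch: issubclass, isinstance, __bases__, bound and unbound method calls, and an abstract class implemented by raising an exception in __init__. The Shape and Circle classes, the approximate value of pi and the variable names are hypothetical and chosen only for illustration.

```python
class Shape(object):
    """Abstract base class: creating a Shape directly is an error."""

    def __init__(self):
        if self.__class__ == Shape:
            raise NotImplementedError("Cannot create object of class Shape")

    def area(self):
        return 0.0


class Circle(Shape):
    def __init__(self, radius):
        Shape.__init__(self)          # unbound call to base-class constructor
        self.radius = radius

    def area(self):                   # overrides Shape.area
        return 3.14159 * self.radius ** 2


circle = Circle(1.0)

checks = [
    issubclass(Circle, Shape),        # derived class inherits from base class
    issubclass(Shape, Shape),         # a class counts as its own subclass
    isinstance(circle, Shape),        # "is-a" relationship
    Circle.__bases__ == (Shape,),     # tuple of direct base classes
]

boundArea = circle.area()             # bound call: Python supplies the object
unboundArea = Shape.area(circle)      # unbound call: object passed explicitly
```

Every entry in checks is true, and the two area calls return different values because the unbound call names the base-class version explicitly.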
TERMINOLOGY
__bases__ attribute of a class
__getattribute__ method
__slots__ attribute of a class
“has-a” relationship
“is-a” relationship
“knows-a” relationship
“uses-a” relationship
abstract class
abstract method
abstraction
association
base class
bound method call
class library
complex type
composition
concrete class
derived class
dict type
direct base class
extensible
file type
float type
indirect base class
inherit
inheritance
int type
isinstance function
issubclass function
list type
long type
multiple inheritance
NotImplementedError exception
object base class
object type
overriding a method
polymorphism
property
property function
reusability
single inheritance
standardized reusable components
static method
staticmethod function
str type
structural inheritance
subclass
superclass
tuple type
unbound method call
unicode type
SELF-REVIEW EXERCISES
9.1 Fill in the blanks in each of the following:
a) With ________, a class is derived from several base classes.
b) In other object-oriented programming languages, like Java, the base class is called the
________ and the derived class is the ________.
c) A “has-a” relationship creates new classes by ________ of existing classes.
d) When an object has a “knows-a” relationship with another object, this is an ________.
e) A base class exists in a ________ relationship with its derived classes.
f) ________ in the first line of a class definition are used to indicate inheritance.
g) An ________ is inherited from two or more levels up the class hierarchy.
h) A base class specifies ________; all classes derived from a base class inherit the
capabilities of that base class.
i) ________ are classes for which the programmer never intends to create objects.
j) A ________ method does not require an object of the class to perform its operation.
9.2 State whether each of the following is true or false. If false, explain why.
a) The derived class inherits all the attributes and methods of its base class.
b) A derived class must define a constructor that calls the base class’s constructor.
c) All base classes of a derived class are explicitly listed inside parentheses when the de-
rived class is defined.
d) To use an object of another class, a class must inherit from that class.
EXERCISES
9.3 Study the inheritance hierarchy of Fig. 9.2. For each class, indicate some common attributes
and behaviors consistent with the hierarchy. Add some other classes (e.g.,
UndergraduateStudent, GraduateStudent, Freshman, Sophomore, Junior, Senior) to enrich
the hierarchy.
9.4 Consider the class Bicycle. Given your knowledge of some common components of bicy-
cles, show a class hierarchy in which the class Bicycle inherits from other classes, which, in turn,
inherit from yet other classes. Discuss the creation of various objects of class Bicycle. Discuss in-
heritance from class Bicycle for other closely related derived classes.
9.5 Many programs written with inheritance could be solved with composition instead, and vice
versa. Discuss the relative merits of these approaches in the context of the Point, Circle,
Cylinder class hierarchy in this chapter. Rewrite the classes in Figs. 8.6–8.8 (and the supporting
programs) to use composition rather than inheritance. After you do this, reassess the relative merits of
the two approaches both for the Point, Circle, Cylinder problem and for object-oriented pro-
grams in general.
9.6 Write an inheritance hierarchy for classes Quadrilateral, Trapezoid,
Parallelogram, Rectangle and Square. Use Quadrilateral as the base class of the hierarchy. Make
the hierarchy as deep (i.e., as many levels) as possible. The data of Quadrilateral should be the
(x, y) coordinate pairs for the four endpoints of the Quadrilateral. Write a driver program that
creates and displays objects of each of these classes.
9.7 Write a function that prints a class hierarchy. The function should take one argument that is
an object of a class. The function should determine the class of that object and all direct and indirect
base classes of the object. [Note: For simplicity, assume each class in the hierarchy uses only single
inheritance.] The function prints each class name on a separate line. The first line contains the top-
most class in the hierarchy, and each level in the hierarchy is indented by three spaces. For example,
the output for the function, when passed an object of class Cylinder from Fig. 9.8, should be:
Point
   Circle
      Cylinder
9.8 Create a class Date that has data members for the day, the month and the year. Modify the
payroll system of Fig. 9.9 to add data members birthDate (an object of class Date) and
departmentCode (a number) to class Employee. Assume this payroll is processed once per
month. Then, as your program calculates the payroll for each Employee, add a $100.00 bonus to the
person’s payroll amount if this is the month in which the Employee’s birthday occurs.
10
Graphical User Interface
Components: Part 1
Objectives
• To understand the design principles of graphical user
interfaces.
• To use the Tkinter module to build graphical user
interfaces.
• To create and manipulate labels, text fields, buttons,
check boxes and radio buttons.
• To learn to use mouse events and keyboard events.
• To understand and use layout managers.
… the wisest prophets make sure of the event first.
Horace Walpole
Do you think I can listen all day to such stuff?
Lewis Carroll
Speak the affirmative; emphasize your choice by utter
ignoring of all that you reject.
Ralph Waldo Emerson
You pays your money and you takes your choice.
Punch
Guess if you can, choose if you dare.
Pierre Corneille
All hope abandon, ye who enter here!
Dante Alighieri
Exit, pursued by a bear.
William Shakespeare
Outline
10.1 Introduction
10.2 Tkinter Overview
10.3 Simple Tkinter Example: Label Component
10.4 Event Handling Model
10.5 Entry Component
10.6 Button Component
10.7 Checkbutton and Radiobutton Components
10.8 Mouse Event Handling
10.9 Keyboard Event Handling
10.10 Layout Managers
10.10.1 Pack
10.10.2 Grid
10.10.3 Place
10.11 Card Shuffling and Dealing Simulation
10.12 Internet and World Wide Web Resources
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
10.1 Introduction
A graphical user interface (GUI) allows a user to interact with a program. A GUI (pronounced
“GOO-EE”) gives a program a distinctive “look” and “feel.” Providing different
programs with a consistent set of intuitive interface components provides users with a basic
level of familiarity with GUI programs before they ever use them. In turn, this reduces the
time users require to learn programs and increases their ability to use the programs in a pro-
ductive manner.
Look-and-Feel Observation 10.1
Consistent user interfaces enable users to learn new applications faster.
GUIs are built from GUI components (called widgets—shorthand for window gad-
gets). A GUI component is an object with which a user interacts via a mouse or a keyboard.
Figure 10.1 contains an example of a GUI, an Internet Explorer window with some of its
GUI components labeled. There is a menu bar containing such menus as File, Edit and
View. Below the menu bar is a set of buttons (e.g., Back, Search, and History), each of
which has a defined task in Internet Explorer. Below the buttons is a text field in which a
user can type a Web site address. To the left of the text field is a label (i.e., Address) that
indicates the purpose of the text field. The menus, buttons, text fields and labels are part of
the Internet Explorer GUI. These components enable a user to interact with the Internet
Explorer program by just pointing with a mouse and clicking an element.
Python programmers can construct GUIs by using the Tool Command Language
(TCL) program and its graphic interface development tool, Tool Kit (Tk). (Information
about this scripting language and its components can be found at www.scriptics.com.)
Figure 10.2 lists several common GUI components found in Tk. This chapter
and the next discuss these and other GUI components in detail.
[Fig. 10.2: table of GUI components and their descriptions.]
[Widget class-hierarchy diagram: Widget with subclasses Frame, Label, Entry, Text, Button,
Checkbutton, Radiobutton, Menu, Canvas, Scale, Listbox and Scrollbar.]
1. The Tkinter module is portable across many platforms. Some platforms, however,
need to have Tcl/Tk and Tkinter installed. The Deitel & Associates, Inc. Web site,
www.deitel.com, contains installation instructions for various platforms.
should not be confused with the relationship between a base class and a derived class. A
program builds a GUI from the top-level component by creating new components and
placing each new component in the parent component.
Each program in this chapter implements a GUI by inheriting from Widget’s subclass
Frame. In our programs, Frame will serve as the top-level component to which children
are added to extend the GUI’s functionality. This inheritance enables the reuse of compo-
nents in other GUI programs and promotes object-orientation.
Portability Tip 10.1
The Tkinter module can design graphical user interfaces for Unix, Macintosh and Windows
platforms.
33 def main():
34    LabelDemo().mainloop() # starts event loop
35
36 if __name__ == "__main__":
37    main()
Every child component has an attribute called master that references the child’s
parent component. Line 16 accesses the LabelDemo’s parent (top-level) component and
calls method title to change the title of the GUI to Labels, which then appears in the
GUI title bar.
Line 18 creates a Label object. Each GUI component’s class constructor takes a first
argument that corresponds to the new object’s parent. In this case, self is the first argu-
ment, indicating that the Label is a child of the LabelDemo component. The value of
keyword argument text indicates the contents of the Label component. Method pack
(line 21) inserts Label1 into the GUI, using the default settings. By default, Label1
occupies the top of the window.
Lines 23–24 create a second Label component. Line 27 calls method pack and
passes a value for keyword argument side, which describes where the new component is
placed. Value LEFT indicates that Label2 appears against the left side of the window.
Other possible values for the side option are BOTTOM, RIGHT and TOP (the default set-
ting). These options also determine the placing and sizing of child components when the
parent container resizes. Figure 10.4 displays the resulting arrangement after the window
size increases. As specified by the side option, label1 remains at the top of the container,
while label2 and label3 stay at the left side of the container. Section 10.10.1 discusses
different settings for method pack and the effects of resizing parent containers.
Labels can display an image when a programmer specifies a value for the keyword
argument bitmap. For example, a value of "warning" (line 30) displays a warning
bitmap image on label3. Figure 10.5 lists other values for bitmap that are available.
error hourglass
gray75 info
gray50 questhead
gray25 question
gray12 warning
In addition to using existing bitmap images, programmers can create images to insert
in a GUI by using keyword argument image. Note that the image, bitmap and text
keyword arguments follow a precedence hierarchy (in that order): only the value of the
specified option with the highest precedence appears on the Label. For example, if an
image option is specified, any bitmap or text options are ignored. Similarly, if bitmap
and text options both are specified, the text option is ignored.
The third label component, Label3, has the side option set to LEFT (line 31). This
setting left-justifies the label against Label2, not against the edge of the GUI.
Section 10.10.1 offers more information about how the pack method arranges components
in a GUI.
Lines 33–37 introduce a convention common to many GUI programs. Lines 36–37 test
whether the name of the current namespace (__name__) is "__main__" and call function
main if the condition is true. The condition is true if the interpreter has been invoked
on the file and false if the file has been imported as a module. Thus, function main
executes if the program is run by itself, but not if the file is imported as a module for
use in another program.
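The effect of this test can be seen without any GUI code. The following is a minimal sketch and not part of Fig. 10.4; the body of main and the message it returns are made up for illustration.

```python
def main():
    """Entry point; the guard below calls it only for direct execution."""
    message = "running as a program"
    print(message)
    return message


# __name__ is "__main__" when the interpreter is invoked on this file;
# when the file is imported, __name__ holds the module's own name instead,
# so main is not called automatically.
if __name__ == "__main__":
    main()
```

Saving the sketch in a file and running that file prints the message; importing the same file from another program prints nothing until that program calls main itself.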
Function main creates a LabelDemo object and calls its mainloop method (line
34). Method mainloop starts the LabelDemo GUI. The method redraws the GUI when
necessary (e.g., when the user changes the size of the GUI) and sends events to the appro-
priate components. [Note: We discuss events in Section 10.4.] Method mainloop termi-
nates when the user destroys (closes) the GUI.
When an event occurs, the GUI component with which the user interacted determines
whether an event handler has been specified for the event. If an event handler has been
specified, the event handler associated with the event executes. For example, a “rollover”
event occurs when the user moves the mouse over a component. A program might require
that the appearance of a label changes (e.g., by changing the background color of the label)
when a rollover event occurs. In this case, the programmer defines a method that changes
the label’s appearance and binds the rollover event to the method. When the user moves the
mouse over the label, the method executes.
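The rollover binding described above might be sketched as follows. This is an illustration written for Python 3, where the module is named tkinter rather than Tkinter. The names make_rollover_handlers and build_rollover_label, the label text and the colors are assumptions, not code from this chapter. The tkinter import sits inside the builder function so that the handlers themselves can be exercised without a display.

```python
def make_rollover_handlers(highlight, normal):
    """Return (on_enter, on_leave) handlers that recolor the event's widget."""

    def on_enter(event):
        # event.widget is the component the pointer has just entered
        event.widget.config(background=highlight)

    def on_leave(event):
        # restore the original color when the pointer leaves
        event.widget.config(background=normal)

    return on_enter, on_leave


def build_rollover_label(parent):
    """Create a Label in parent and bind the rollover handlers to it."""
    import tkinter as tk  # Python 3 name of the Tkinter module

    label = tk.Label(parent, text="Move the mouse over me")
    label.pack(padx=20, pady=20)

    on_enter, on_leave = make_rollover_handlers(
        "yellow", label.cget("background"))
    label.bind("<Enter>", on_enter)   # pointer moves onto the label
    label.bind("<Leave>", on_leave)   # pointer moves off the label
    return label
```

To try it, create a root window with tkinter.Tk(), pass it to build_rollover_label and call the root's mainloop method.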
Fig. 10.6 Entry components and event binding demonstration. (Part 1 of 3.)
Line 5 imports the class definitions and constants from module tkMessageBox.
Module tkMessageBox contains functions that display dialogs, which present messages
to users.
Class EntryDemo’s __init__ method calls the base class constructor, packs the
EntryDemo and titles the program (lines 13–15). Method geometry configures the
length and width of the top-level component in pixels (line 16). Line 18 creates the first
Frame component, frame1. The pack method call (line 19) introduces another option,
pady, which specifies the amount of empty vertical space between frame1 and other GUI
components in the parent container. Similarly, option padx, used later in the program,
specifies the amount of empty horizontal space between components.
Lines 21 create Entry component text1. Option name assigns a name to Entry.
We assign a name so the event handler can use that name to identify the component in
which an event has occurred.
Look-and-Feel Observation 10.3
If a name is not specified by the programmer, Tkinter assigns each component a unique
name. To obtain the full name of a component, pass the component object to function str.
Method bind (line 24) associates a <Return> event with component text1. A
<Return> event occurs when the user presses the Enter key. Method bind takes two
arguments. The first argument is the type of the event (the event format), and the second
argument is the name of the method to bind to that event. In this example, method show-
Contents executes when a <Return> event occurs in text1.
Lines 30–32 create and pack Entry component text2. Method insert writes text
in the Entry component (line 30). Method insert takes two arguments—a position at
which text is to be inserted and a string that contains the text to insert. Passing a value of
INSERT as the first argument causes the text to be inserted at the cursor’s current position.
Text also can be inserted at the end of an Entry component by passing END as the first
argument. Method delete takes two arguments, start and finish, and removes all text in
an Entry component in the range start to finish. If END is the second argument, the
method removes text up to the end of the text area. The first position in an Entry
component is position 0; therefore, delete( 0, END ) removes all text in an Entry
component.
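A small sketch of these two methods follows. The helper names append_text and replace_text are hypothetical; they accept any object that offers insert and delete methods with the Entry signatures. The string "end" is the value of the Tkinter constant END, spelled out here so the sketch does not need to import the GUI module.

```python
END = "end"  # same string value as the Tkinter constant END


def append_text(entry, text):
    """Add text after whatever the Entry already contains."""
    entry.insert(END, text)


def replace_text(entry, text):
    """Remove everything from position 0 to the end, then write text."""
    entry.delete(0, END)
    entry.insert(0, text)
```

With a real Entry component named text2, replace_text( text2, "new contents" ) would clear the component and display the new string.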
Lines 34–35 create and pack the second Frame component, frame2. The program
packs the Frames one below the other to create two rows into which the Entrys are
inserted. The program inserts Entry components text1 and text2 in frame1, while
text3 and text4 are packed into frame2.
Lines 41–43 create and pack text3 in the same way as the first two Entrys. In this
case, the component is bound to the <Return> event (line 42). In this example, we dem-
onstrate disabling text3 with method config. Method config allows the user to con-
figure a component by specifying keyword-value pairs (line 41). Specifying the value
DISABLED for option state disables the Entry component, preventing the user from
editing its text. As a result, text3 cannot generate a <Return> event. Disabling an
Entry can be useful to a program that wants to display text but does not want the user to
edit that text.
Lines 46–50 create and pack Entry component text4 in the same way as the first
three Entrys. This component enables the user to enter confidential information. Option
show specifies a character that will be displayed in the text box instead of the user-entered
text (line 47). In this example, asterisks (*) appear in place of the default text, "Hidden
text". Asterisks also appear in place of any text that the user types into the Entry com-
ponent.
Method showContents (lines 52–60) is the event handler for each <Return>
event generated in the Entry components. In Python, most event handlers take a reference
to an Event object as an argument; an Event object can have various attributes. The
component that generated the event is obtained from the object’s widget attribute (i.e.,
event.widget). In our program, event.widget refers to one of the four Entry
components whose <Return> event is bound to method showContents.
Common Programming Error 10.1
Failure to bind an event handler to an event type for a particular GUI component results in
no events being handled for that component for that event type.
Widget method winfo_name (line 56) returns the name of the component. Entry
method get (line 59) returns the contents of the Entry. The event handler uses both
return values to construct a message to display to the user. The tkMessageBox function
showinfo (line 60) displays a dialog box labeled "Message" that contains the name
and contents of the Entry that generated the event. The screenshots that appear at the end
of Fig. 10.6 demonstrate what happens when each Entry component receives the
<Return> event.
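A handler of this kind might look like the sketch below. It is written for Python 3 (module tkinter) and is not the listing of Fig. 10.6. The names describe_entry and build_entry and the show_message parameter are assumptions made for illustration.

```python
def describe_entry(event):
    """Build a message from the Entry that generated the event."""
    widget = event.widget                 # component in which the event occurred
    return "%s: %s" % (widget.winfo_name(), widget.get())


def build_entry(parent, name, show_message):
    """Create an Entry whose <Return> event reports its name and contents."""
    import tkinter as tk  # Python 3 name of the Tkinter module

    entry = tk.Entry(parent, name=name)
    # the bound handler receives the Event object as its only argument
    entry.bind("<Return>",
               lambda event: show_message(describe_entry(event)))
    entry.pack(padx=5, pady=5)
    return entry
```

Passing print as show_message writes the message to the console. In Python 3 the dialog function is tkinter.messagebox.showinfo, so a dialog could be shown with lambda text: tkinter.messagebox.showinfo( "Message", text ).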
Figure 10.7 creates two Buttons and demonstrates that Buttons, like Labels, can
display both images and text.
Lines 18–19 create a Button called plainButton. Option text sets the button’s
label. Keyword argument command specifies the event handler that executes when a user
selects the button. In our example, plainButton’s label is "PlainButton", and its
event handler is method pressedPlain.
Lines 20–21 bind methods rolloverEnter and rolloverLeave to plainButton’s
<Enter> and <Leave> events, respectively. The <Enter> event
occurs when the user places the mouse cursor over the button; the <Leave> event occurs
when the user removes the mouse cursor from the button. Section 10.8 discusses mouse
events in detail.
Many Tkinter components, including Buttons, can display images by specifying
image arguments to their constructors or their config methods. The image to display
must be an object of a Tkinter class that loads an image file. One such class is PhotoImage,
which supports the Graphics Interchange Format (GIF) and the Portable Graymap and
Portable Pixmap formats (PGM and PPM); Tk 8.6 and later also support Portable Network
Graphics (PNG). File names for these types typically end with .gif, .pgm, .ppm and .png,
respectively. PhotoImage does not load Joint Photographic Experts Group (JPEG) files
directly. An additional image class is class BitmapImage, which supports the X11 bitmap
(XBM) image format (.xbm). Line 25 creates a PhotoImage object. File
logotiny.gif contains the image to load and store in the PhotoImage object. (This
file resides in the same directory as the program.) The program assigns the newly created
PhotoImage object to reference myImage.
Lines 26–27 create fancyButton with image attribute myImage. As with
Labels, the image attribute takes precedence over text and bitmap attributes, and if
text or bitmap are specified, they are ignored.
The event handler for fancyButton is pressedFancy. Note that methods
pressedPlain (lines 32–33) and pressedFancy (lines 35–36) do not take an Event
object as an argument. This is because Button callbacks do not take Event objects as
arguments. Without an Event object, a callback cannot determine for which component
the event occurred; therefore, it is important to specify a separate callback method for each
Button, to ensure that the calling component can be identified. Methods pressed-
Plain and pressedFancy create the "Message" dialog boxes, which notify users of
the buttons that generated the events.
Good Programming Practice 10.2
Defining a separate callback method for each Button avoids confusion, ensures desired
behavior and makes debugging a GUI easier.
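One way to give every Button its own callback without writing each method by hand is a small factory function. This is an alternative to the separate methods used in Fig. 10.7, sketched for Python 3 (module tkinter). The names make_pressed_callback and build_buttons, the button labels and the show_message parameter are illustrative assumptions.

```python
def make_pressed_callback(label, show_message):
    """Return a separate no-argument callback for the button named label."""

    def pressed():
        # Button callbacks receive no Event object, so the button's
        # identity is captured here when the callback is created
        show_message("You pressed: %s" % label)

    return pressed


def build_buttons(parent, show_message):
    """Create two Buttons, each with its own callback."""
    import tkinter as tk  # Python 3 name of the Tkinter module

    buttons = []
    for label in ("Plain Button", "Fancy Button"):
        button = tk.Button(
            parent, text=label,
            command=make_pressed_callback(label, show_message))
        button.pack(side=tk.LEFT, padx=5)
        buttons.append(button)
    return buttons
```

Each call to make_pressed_callback produces a distinct function, so each Button still has a callback of its own.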
54
55       self.text.config( font = desiredFont )
56
57 def main():
58    CheckFont().mainloop()
59
60 if __name__ == "__main__":
61    main()
Figure 10.9 is similar to the program in Fig. 10.8 in that the user can alter the font style
of an Entry’s text. However, this example permits only a single font style in the group to
be selected at a time, using radio buttons.
Sequence fontSelections (lines 27–28) lists several font styles. Lines 29–32
define a StringVar object, chosenFont, and set the initial value to the default style,
"Plain". Like BooleanVar, StringVar is a subclass of Tkinter class Vari-
able, and it acts as a container for a string variable. Unlike our CheckButtons example,
Event format          Description
<ButtonPress-n>       Mouse button n has been selected while the mouse pointer is
                      over the component. n may be 1 (left button), 2 (middle button)
                      or 3 (right button) (e.g., <ButtonPress-1>).
<Button-n>, <n>       Shorthand notations for <ButtonPress-n>.
<ButtonRelease-n>     Mouse button n has been released.
<Bn-Motion>           Mouse is moved with button n held down.
<Prefix-Button-n>     Mouse button n has been Prefix clicked over the component.
                      Prefix may be Double or Triple.
<Enter>               Mouse pointer has entered the component.
<Leave>               Mouse pointer has exited the component.
Lines 17–18 create a StringVar object mousePosition and initialize its value
to "Mouse outside window". Lines 19–21 create and pack Label positionLabel
with textvariable option mousePosition. Option textvariable associates
the text displayed by a Label component with a StringVar object. Option
textvariable must be associated with a Tkinter Variable object. (Note that in
Fig. 10.4 we demonstrated the Label component’s text option, which is associated with a
Python variable.) When the string value of the object (in this case, mousePosition)
changes, the text of the label, positionLabel, is updated.
Lines 24–28 bind a few common mouse events to the window. An event is generated
when the left mouse button is selected or released while the mouse pointer is in the window,
when the mouse pointer enters or leaves the window or when the mouse is moved with the
left button pressed.
When a <Button-1> event or a <ButtonRelease-1> event is generated,
method buttonPressed (lines 30–34) or method buttonReleased (lines 36–40),
respectively, calls method set to change the value of variable mousePosition to
inform the user of the event. A mouse event’s Event object stores the x- and y-coordinates
that describe where the event occurred in its x and y attributes.
When a mouse pointer enters the application area, method enteredWindow (lines
42–45) executes. When a mouse pointer exits the application area, method exited-
Window (lines 47–50) executes. As the screen captures demonstrate, each method prints
an appropriate message indicating whether the mouse is over or not over the MouseLo-
cation object. The methods modify the value in StringVar object mousePosition
to update the Label’s text.
Event handler mouseDragged (lines 52–56) is triggered under different circumstances
than event handlers buttonPressed and buttonReleased. Two conditions must be met
before a <B1-Motion> event is triggered: mouse button 1 must be pressed and the mouse
must be moving. Once these requirements are met, the <B1-Motion> event is fired at a
rate that is defined by the operating system. In other words, on one operating system,
dragging a mouse to the right might trigger 50 <B1-Motion> events, while on a different
operating system the number might be much lower. For each <B1-Motion> event, method
mouseDragged displays the event and the coordinates at which the event occurred.
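The handlers discussed above share one pattern: read the x and y attributes of the Event object and store a message in a variable object. The sketch below shows that pattern under some assumptions. The names position_text and bind_mouse_events and the message wording are hypothetical, and position stands for any object with a set method, such as a StringVar.

```python
def position_text(action, event):
    """Describe a mouse event using the x and y attributes of its Event."""
    return "%s at [%d, %d]" % (action, event.x, event.y)


def bind_mouse_events(widget, position):
    """Bind left-button events on widget; position must offer a set method."""
    widget.bind("<Button-1>",
                lambda e: position.set(position_text("Pressed", e)))
    widget.bind("<ButtonRelease-1>",
                lambda e: position.set(position_text("Released", e)))
    # fired repeatedly while the mouse moves with button 1 held down
    widget.bind("<B1-Motion>",
                lambda e: position.set(position_text("Dragged", e)))
```

In a GUI, widget could be a Frame and position a StringVar that a Label displays through its textvariable option, so each event updates the Label's text.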
A mouse may have one, two, or three buttons. A program may need to take different
actions, depending on which button the user has pressed. Figure 10.12 contains a program
that demonstrates how to distinguish between different mouse buttons.
Figure 10.12 is similar to Fig. 10.11 except that lines 25–27 bind methods to events for
different mouse buttons by changing the number in the event format (<Button-n>). When
the user presses a button while the mouse pointer is inside the window, the window’s title
changes to indicate which button was pressed. Each event handler calls method showPo-
sition (lines 47–51), which displays the coordinates of the mouse event.
Lines 17–25 create and pack two Labels—line1 and line2—that display infor-
mation about the key events. Lines 28–29 bind methods keyPressed and keyRe-
leased to <KeyPress> and <KeyRelease> events, respectively. Method bind
(lines 32–34) associates the <KeyPress-n> and <KeyRelease-n> events for the left
Shift key (Shift_L) to methods shiftPressed and shiftReleased, respectively.
Methods shiftPressed (lines 50–54) and shiftReleased (lines 56–60) dis-
play messages in the Label components when the user presses and releases the left Shift
key, respectively. If the user selects a key other than the Shift key, methods keyPressed
and keyReleased display messages in line1 and line2 indicating which key gener-
ated the event. Methods keyPressed and keyReleased obtain the name of the key
with the char attribute of the Event object.
Portability Tip 10.2
Not all systems can distinguish between the left and right Shift keys.
10.10.1 Pack
All the previous GUI examples used the most basic layout manager, Pack. Unless a programmer
specifies a different order, Pack places GUI components in a container from top
to bottom in the order in which they are listed in the program. A container is a GUI component
into which other components may be placed. Containers are useful for managing the layout
of GUI components. When the edge of the container is reached, the container expands, if
possible. If the container cannot expand, the remaining components are not visible.
A programmer has several options when packing components in a container. Option
side indicates the side of the container against which the component is placed. Setting
side to TOP (the default value) packs components vertically. Other possible values are
BOTTOM, LEFT (for horizontal placement) and RIGHT. The fill option, which can be
set to NONE (default), X, Y or BOTH, allots the amount of space the component should
occupy in the container. Setting fill to X, Y or BOTH ensures that a component occupies
all the space the container has allocated to it in the specified direction. The expand option
can be set to YES or NO (1 or 0). The default value is NO. If expand is set to YES, the
component expands to fill any extra space in the container. The padx and pady options
insert padding, or empty space, around a component. The method pack_forget removes
a packed component from a container.
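The options above can be combined as in the following sketch. It is written for Python 3 (module tkinter) and is not the program of Fig. 10.16. The function name build_pack_demo and the button texts are assumptions chosen to label each packing choice.

```python
def build_pack_demo(parent):
    """Pack four Buttons into parent, each with different options."""
    import tkinter as tk  # Python 3 name of the Tkinter module

    top = tk.Button(parent, text="TOP, default size")
    top.pack(side=tk.TOP)                      # fill and expand keep defaults

    bottom = tk.Button(parent, text="BOTTOM, fill BOTH")
    bottom.pack(side=tk.BOTTOM, fill=tk.BOTH)  # occupy all allotted space

    left = tk.Button(parent, text="LEFT, expand, fill X")
    left.pack(side=tk.LEFT, expand=tk.YES, fill=tk.X, padx=5)

    right = tk.Button(parent, text="RIGHT, fill Y")
    right.pack(side=tk.RIGHT, fill=tk.Y, pady=5)

    return top, bottom, left, right
```

To see the layout, create a root window with tkinter.Tk(), pass it to build_pack_demo, call the root's mainloop method and resize the window. Calling pack_forget on any of the returned buttons removes it from the layout.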
Good Programming Practice 10.4
Review the list of options and methods for layout managers found in the Python on-line
documentation before using layout managers.
Figure 10.16 creates four Buttons and adds them to the application using the Pack
layout manager. The example manipulates the button locations and sizes.
The Frame constructor (line 12) allows the base class to perform any initialization
that it requires before we add components. Method title (line 13) displays the title in the
GUI. Method geometry (line 14) sets the width and height to 300 and 150 pixels, respec-
tively. The expand and fill options (line 15) are set to YES and BOTH, respectively,
ensuring that the packDemo GUI fills the entire window. The second screen capture illus-
trates the GUI’s appearance after it has been resized by dragging the borders with the
mouse.
Lines 17–42 create and pack four Buttons, specifying different packing options for
each Button. The Pack layout manager places each item on the top-level component in
the order that they appear in the program. The specified values for options side, expand
and fill ensure that the buttons appear as they do in the screenshots. Method pack (line
21) places button1 at the top of the container as specified by option side. Since fill
and expand are false by default, the Button component maintains its default size. The
next component, button2, (line 28) is placed at the bottom of the container. The fill
option’s value, BOTH, indicates that the Button component should occupy all space allo-
cated to it by the container. The expand option is set for button3 (line 35). Method
pack places this component on the left side of the container. The expand option specifies
that the button should take any available space in the container. The X fill option sets the
button to fill all horizontal space given to it by the container. The last component,
button4, is placed on the right side of the container. Fill option Y causes the button to fill
all its allocated vertical space.
Only one Button—button1—specifies a callback method. When the user presses
button1, method addButton (lines 44–47) creates and packs a new Button. The
newly created Buttons are packed vertically below button1 and are each padded in the
vertical direction by five pixels.
10.10.2 Grid
The Grid layout manager divides the container into a grid, so that components can be
placed in rows and columns. Components are added to a grid at their specified row and
column indices; every cell in the grid can contain a component. Row and column numbers
begin at 0. If the row option is not specified, the component is placed in the first empty row.
If the column option is omitted, the column value defaults to 0. The programmer may set
the initial number of rows and columns in the grid
by specifying both options in a grid constructor call. In addition, the rows and columns
can be set with calls to methods rowconfigure and columnconfigure, respectively.
Figure 10.17 demonstrates the Grid layout manager by placing several types of compo-
nents in the GUI.
Line 20 introduces the Text component, which creates a multiple-line text area
text1. Method grid (line 23) inserts component text1 into the grid and introduces
keyword argument rowspan. The rowspan option sets the number of rows that a com-
ponent occupies in the GUI.
Component button1 (line 27) spans two columns, as indicated by keyword argu-
ment columnspan. Option columnspan causes a component to stretch across a speci-
fied number of columns. Lines 27–43 create four buttons and explicitly insert each button
at a certain row and column.
The Entry component inserted at row 3 (lines 46–49) spans two columns and fills all
available space in the cell. In line 47, columnspan is assigned 2 and sticky is set to
W+E+N+S—creating an Entry component that fills the first two columns of row 3.
Methods rowconfigure and columnconfigure ensure that the second row and
column expand when a user resizes the window (lines 57–58).
As in the Pack layout manager, Grid options padx and pady set the size of horizontal
and vertical padding, respectively, around a component in a cell. To place padding inside
the component, use options ipadx and ipady. When a component is smaller than its cell,
it is centered in the cell by default.
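The Grid options discussed in this section might be used together as in the sketch below. It is written for Python 3 (module tkinter) and is not the program of Fig. 10.17. The function name build_grid_demo, the component sizes and the button text are illustrative assumptions.

```python
def build_grid_demo(parent):
    """Place a Text, a Button and an Entry in parent with Grid."""
    import tkinter as tk  # Python 3 name of the Tkinter module

    text = tk.Text(parent, width=15, height=5)
    # rowspan makes the text area occupy rows 0 through 2 of column 0
    text.grid(row=0, column=0, rowspan=3,
              sticky=tk.N + tk.S + tk.E + tk.W)

    wide = tk.Button(parent, text="Spans two columns")
    wide.grid(row=0, column=1, columnspan=2, sticky=tk.E + tk.W)

    entry = tk.Entry(parent)
    # column defaults to 0; sticky on all four sides fills the cell
    entry.grid(row=3, columnspan=2, sticky=tk.W + tk.E + tk.N + tk.S,
               padx=5, pady=5)

    # let the second row and column absorb extra space on resize
    parent.rowconfigure(1, weight=1)
    parent.columnconfigure(1, weight=1)
    return text, wide, entry
```

To see the layout, create a root window with tkinter.Tk(), pass it to build_grid_demo and call the root's mainloop method.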
Common Programming Error 10.4
It is possible to specify overlapping components. The components that are added earliest in
the code are obscured by the most recently added component.
10.10.3 Place
The Place layout manager allows the user to set the position and size of a GUI component
absolutely or relative to the position and size of another component. The component being
referenced is specified with the in_ option and may be only the parent of the component
being placed (default) or a descendant of its parent.
Layout manager Place is more complicated than the other managers. For this reason,
we do not discuss the Place layout manager in detail, although Fig. 10.19 lists the most
common Place methods. Figure 10.20 lists the common place and place_configure
method options. For more information on layout manager Place, visit www.python.org.
Class Deck (lines 32–95) consists of a list deck of 52 Card objects, an integer cur-
rentCard representing the most recently dealt card in the deck list and the GUI compo-
nents used to manipulate the deck of cards. The constructor uses the for structure (lines
41–43) to fill the deck list with Card objects. Each Card is instantiated and initialized
with two strings—one from the faces list (Strings "Ace" through "King") and one
from the suits list ("Hearts", "Diamonds", "Clubs" and "Spades"). Note that
the lists are referenced as Card.faces and Card.suits, respectively, because they
are class attributes of class Card. The calculation i % 13 always results in a value from 0
to 12 (the thirteen subscripts of the faces list), and the calculation i / 13 always results
in a value from 0 to 3 (the four subscripts in the suits list).
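The two index calculations can be checked in isolation. The sketch below is written for Python 3, where the / operator yields a floating-point result, so it uses the integer-division operator // for the suit index. The list names FACES and SUITS and the face names between "Ace" and "King" are assumptions, and plain tuples stand in for Card objects.

```python
FACES = ["Ace", "Two", "Three", "Four", "Five", "Six", "Seven",
         "Eight", "Nine", "Ten", "Jack", "Queen", "King"]
SUITS = ["Hearts", "Diamonds", "Clubs", "Spades"]

# i % 13 cycles through the 13 faces; i // 13 moves to the next suit
# after every 13 cards
deck = [(FACES[i % 13], SUITS[i // 13]) for i in range(52)]

print(deck[0])    # ('Ace', 'Hearts')
print(deck[13])   # ('Ace', 'Diamonds')
print(deck[51])   # ('King', 'Spades')
```

Every combination of face and suit appears exactly once, because each value of i from 0 to 51 maps to a different pair of indices.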
40 # create deck
41 for i in range( 52 ):
42 self.deck.append( Card( Card.faces[ i % 13 ],
43 Card.suits[ i / 13 ] ) )
44
45 # create buttons
46 self.dealButton = Button( self, text = "Deal Card",
47 width = 10, command = self.dealCard )
48 self.dealButton.grid( row = 0, column = 0 )
49
50 self.shuffleButton = Button( self, text = "Shuffle cards",
51 width = 10, command = self.shuffle )
52 self.shuffleButton.grid( row = 0, column = 1 )
53
54 # create labels
55 self.message1 = Label( self, height = 2,
56 text = "Welcome to Card Dealer!" )
57 self.message1.grid( row = 1, columnspan = 2 )
58
59 self.message2 = Label( self, height = 2,
60 text = "Deal card or shuffle deck" )
61 self.message2.grid( row = 2, columnspan = 2 )
62
63 self.shuffle()
64 self.grid()
65
66 def shuffle( self ):
67 """Shuffle the deck"""
68
69 self.currentCard = 0
70
71 for i in range( len( self.deck ) ):
72 j = random.randint( 0, 51 )
73
74 # swap the cards
75 self.deck[ i ], self.deck[ j ] = \
76 self.deck[ j ], self.deck[ i ]
77
78 self.message1.config( text = "DECK IS SHUFFLED" )
79 self.message2.config( text = "" )
80 self.dealButton.config( state = NORMAL )
81
82 def dealCard( self ):
83 """Deal one card from the deck"""
84
85 # display the card, if it exists
86 if self.currentCard < len( self.deck ):
87 self.message1.config(
88 text = self.deck[ self.currentCard ] )
89 self.message2.config(
90 text = "Card #: %d" % self.currentCard )
91 else:
92 self.message1.config( text = "NO MORE CARDS TO DEAL" )
93 self.message2.config( text =
94 "Shuffle cards to continue" )
95 self.dealButton.config( state = DISABLED )
96
97 self.currentCard += 1 # increment card for next turn
98
99 def main():
100 Deck().mainloop()
101
102 if __name__ == "__main__":
103 main()
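The subscript arithmetic in lines 41–43 can be checked without a GUI. The following is a minimal sketch, assuming Python 3, where floor division is spelled // and / on integers yields a float. The names FACES, SUITS and card_name are hypothetical stand-ins for Card.faces, Card.suits and the Card constructor, and the face names other than "Ace" and "King" are assumed.

```python
# Hypothetical stand-ins for Card.faces and Card.suits from the listing.
FACES = ["Ace", "Two", "Three", "Four", "Five", "Six", "Seven",
         "Eight", "Nine", "Ten", "Jack", "Queen", "King"]
SUITS = ["Hearts", "Diamonds", "Clubs", "Spades"]


def card_name(i):
    """Map a deck position 0..51 to a (face, suit) pair."""
    # divmod(i, 13) returns (i // 13, i % 13) in one step.
    suit_index, face_index = divmod(i, 13)
    return FACES[face_index], SUITS[suit_index]


# Build all 52 cards; every (face, suit) combination appears exactly once.
deck = [card_name(i) for i in range(52)]
print(deck[0], deck[13], deck[51])
```

Because i % 13 cycles through the faces while i // 13 advances once every thirteen cards, the deck is built suit by suit.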
When the user clicks the Deal Card button, method dealCard (lines 82–97) gets the
next card in the list. If currentCard is less than 52 (the length of deck), lines 87–88
display the face and suit of the card in Label message1. Label message2 (lines 89–90)
displays the number of the card (currentCard). If there are no more cards to deal (i.e.,
currentCard is greater than or equal to 52), the string "NO MORE CARDS TO
DEAL" is displayed in message1, the string "Shuffle cards to continue" is displayed
in message2 and line 95 disables the Deal Card button.
When the user clicks the Shuffle cards button, method shuffle (lines 66–80)
shuffles the cards. The method loops through all 52 cards (list subscripts 0 to 51). For each
card, a number between 0 and 51 is picked randomly. Next, the current Card object and
the randomly selected Card object are swapped in the list. A total of only 52 swaps are
made in a single pass of the entire list, and the list of Card objects is shuffled! When the
shuffling is complete, "DECK IS SHUFFLED" is displayed in a Label.
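The swap pass described above can be sketched outside the GUI. This is an illustration under assumptions rather than the book's listing: the function name swap_pass_shuffle and the optional rng parameter are hypothetical, and integers stand in for Card objects.

```python
import random


def swap_pass_shuffle(deck, rng=random):
    """Shuffle deck in place with one pass of random swaps."""
    for i in range(len(deck)):
        # randint includes both endpoints, so j ranges over every subscript.
        j = rng.randint(0, len(deck) - 1)
        deck[i], deck[j] = deck[j], deck[i]


cards = list(range(52))                        # integers stand in for Cards
swap_pass_shuffle(cards, random.Random(2001))  # seeded only for repeatability
print(cards[:5])
```

A pass that swaps each position with any position in the list is simple but is known to be slightly non-uniform. The standard library function random.shuffle is usually the simpler choice when an unbiased shuffle matters.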
faqts.com/knowledge_base/index.phtml/fid/265
This page lists questions and answers concerning handling events.
www.pythonware.com/library/tkinter/introduction
Fredrik Lundh’s An Introduction to Tkinter offers information about Widget classes and event han-
dling.
www.python.org/topics/tkinter
This Web page provides links to documentation about Tkinter, additional Widget classes and
troubleshooting tips.
www.csis.hku.hk/~kkto/doc-tkinter/tkinter/tkinter.html
Isaac K. K. To’s Building GUI Programs Using Tkinter: A Tkinter Manual provides information
about layout managers, events, the Widget class and subclasses.
SUMMARY
• A graphical user interface (GUI) presents a pictorial interface to a program.
• A GUI (pronounced “GOO-ee”) gives a program a distinctive “look” and “feel.”
• By providing different applications with a consistent set of intuitive user-interface components,
GUIs allow the user to spend less time trying to remember which keystroke sequences do what and
spend more time using the program in a productive manner.
• GUIs are built from GUI components (sometimes called controls or widgets—shorthand for win-
dow gadgets).
• A GUI component is an object with which a user interacts via a mouse or a keyboard.
• The Tkinter module is the most frequently used module for programming graphical user inter-
faces in Python.
• The Tkinter library provides an interface to the Tk (Tool Kit) GUI toolkit, the graphical
interface development tool for the Tool Command Language (Tcl).
• Tkinter implements each Tk GUI component as a class that inherits from class Widget.
• All Widgets have common attributes and behaviors.
• A GUI consists of a top-level component that may contain more GUI components. The top-level
component is the parent component. The remaining components are children of the top-level com-
ponent and each child of the top-level component may itself contain children (descendants of the
parent component). A program builds a GUI from the top-level component by creating new com-
ponents and placing each new component in its parent.
• Inheriting from class Frame extends the GUI’s functionality. This inheritance enables the reuse
of components in other GUI programs and promotes object-orientation in GUI programs.
• The Tkinter module, like the rest of Python, is portable across many platforms.
• Labels display text or images and usually provide instructions or other information on a graphical
user interface.
• The Frame constructor initializes the Frame and creates a top-level component into which the
Frame is placed.
• The creation of a GUI object initially does not display it on the screen. The program must specify
where and how to draw the object.
• Method pack places components in the GUI.
• Keyword argument fill specifies how much available space the component occupies, beyond
the component’s default size. Possible values for fill are X (all available horizontal space), Y (all
available vertical space), BOTH (both vertical and horizontal available space) and NONE (the de-
fault value—do not take up available space).
• Keyword argument expand specifies whether a child component should take up any extra space in
its parent component (i.e., any space not yet occupied once all other components have been placed).
• Each GUI component’s class constructor takes a first argument that corresponds to the new ob-
ject’s parent.
• The value of keyword argument text specifies the contents of the Label component.
• The keyword argument side describes where the new component is drawn. Value LEFT speci-
fies that a component is placed against the left side of the window. Other possible values for the
side option are BOTTOM, RIGHT and TOP, the default setting.
• Many components display images by specifying a value for the keyword argument bitmap.
• Keyword argument image inserts a programmer-defined image. Label options have the following
precedence, from highest to lowest: image, bitmap and text. Each Label component dis-
plays only one bitmap, image or text message. The value of the option with the highest precedence
appears on the GUI. Any other values are ignored.
• If the interpreter is running the program, method mainloop starts the GUI, redraws the
GUI as needed and sends events to the appropriate components. It terminates when the user
destroys (closes) the GUI.
• GUIs are event driven (i.e., they generate events when the user of the program interacts with the
GUI). Some common interactions are moving the mouse, clicking a mouse button, typing in a text
field, selecting an item from a menu and closing a window. When a user interaction occurs, an
event is sent to the program.
• GUI event information is stored in an object of class Event.
• An event-driven program is asynchronous—the program does not know when events will happen.
• To process a GUI event, the programmer must perform two key tasks—bind an event to a graph-
ical component and implement an event handler. A program explicitly binds, or associates, an
event with a graphical component and specifies an action to perform when that event occurs. Typ-
ically, the action is performed by an event handler—a method that is called in response to its as-
sociated event.
• When an event occurs, the GUI component with which the user interacted determines whether an
event handler has been specified for the event. If an event handler has been specified, that event
handler executes. The program can specify an event handler that executes when this event occurs.
• Entry components are areas in which users can enter or programmers can display a single line
of text.
• When the user types data into an Entry component and presses the Enter key, a <Return>
event occurs. If an event handler is bound to that event for the Entry component, the event is
processed and the data in the Entry can be used by the program.
• Module tkMessageBox contains functions that display dialogs. Dialogs present messages to the
user.
• Method geometry specifies the width and height of the top-level component in pixels.
• Option pady of method pack specifies the amount of empty vertical space between GUI compo-
nents. Similarly, option padx specifies the amount of empty horizontal space between components.
• The Entry constructor’s width argument specifies that 20 columns of text can appear in the text
area on the GUI, although the Entry component accommodates larger inputs. The width of the
text field will be the width, in pixels, of the average character in the text field’s current font mul-
tiplied by 20.
• Option name assigns a name to the Entry. A program can use the name to identify the component
in which an event has occurred.
• Method bind associates an event with a component. Method bind takes two arguments. The first
argument is the type of the event, and the second argument is the name of the method to bind to
that event.
• Method insert writes text in the Entry component. Method insert takes two arguments—
a position at which text is to be inserted and a string that contains the text to insert.
• Passing a value of INSERT as the first argument to method insert causes the text to be inserted
at the cursor’s current position.
• Method call insert( END, text ) appends text to the end of any text already displayed in
the component.
• Method call delete( start, finish ) removes all text in an Entry component in the range
start to finish. Using END as the second argument removes text up to the end of the text area.
The first position in an Entry component is position 0; delete( 0, END ) removes all text in
an Entry component.
• Method config configures a component’s options.
• Specifying the value DISABLED for option state disables the Entry component, preventing
the user from editing its text.
• Option show sets the character that appears in place of the actual text.
• Most event handlers take as an argument an Event object, which has various attributes. The com-
ponent that generated the event is obtained from the widget attribute of the Event object (i.e.,
event.widget).
• Widget method winfo_name and Entry method get acquire the name and contents of an
Entry, respectively.
• The tkMessageBox function showinfo displays a dialog box.
• A button is a component the user clicks to trigger a specific action. A button generates an event
when the user clicks the button with the mouse.
• Buttons are created with class Button, which inherits from class Widget.
• The text or image on the face of a Button component is called a button label.
• Buttons (like Labels) can display both images and text.
• Option text sets the button’s label.
• Keyword argument command specifies the event handler (or callback) that is invoked when the
button is selected.
• Many Tkinter components, including Buttons, can display images by specifying an image
argument to their constructor or their config method.
• A specified image must be an object of a Tkinter class that loads an image file. One such class
is PhotoImage, which supports three image formats—Graphics Interchange Format (GIF), Joint
Photographic Experts Group (JPEG) and Portable Greymap Format (PGM). File names for each
of these types typically end with .gif, .jpg (or .jpeg) or .pgm (or .ppm), respectively.
• Class BitmapImage supports the Bitmap (BMP) image format (.bmp).
• As with Labels, the image attribute of a Button component takes precedence over text and
bitmap attributes.
• The relief option of a Button can be changed to GROOVE or RAISED to create rollover effects.
• Tkinter contains two GUI components—Checkbutton and Radiobutton—that have on/
off or true/false values.
• The sticky option specifies the component’s alignment or stretches the component to fill the
cell. Possible values for sticky are any combination of W, E, N, S, NW, NE, SW and SE.
• A sticky value of W+E is similar to setting fill to X when packing a component—the compo-
nent stretches from the left (W) to the right (E). The component stretches horizontally to fill the cell.
• Setting sticky to W+E+N+S produces results similar to those produced by a fill value of BOTH.
• Specifying only one value for sticky is analogous to the side option of Pack. The component
aligns to the specified cell border without being stretched.
• The Grid manager, the component into which other components have been placed, supports sev-
eral methods that control the grid.
• Methods rowconfigure and columnconfigure change options of rows and columns, re-
spectively.
• The weight option specifies the relative weight of growth for a row or column. The default is 0;
therefore, cells do not change size when the window is resized unless the weight option has been
changed.
• The Text component creates a multiple-line text area.
• The rowspan option sets the number of rows that a component occupies in the GUI.
• Option columnspan causes a component to span the specified number of columns.
• As in the Pack layout manager, Grid options padx and pady specify the size of horizontal and
vertical padding, respectively, around a component in a cell.
• To place padding inside the component, use options ipadx and ipady. If a component is smaller
than its cell, it is centered in the cell by default.
• The Place layout manager allows the user to set the position and size of a GUI component,
either absolutely or relative to the position and size of another component. The component
being referenced is specified with the in_ option and may be only the parent of the component being
placed (default) or a descendant of its parent.
• The Place layout manager is more complicated than the other managers, so most programmers
prefer to use the other, simpler managers.
TERMINOLOGY
anchor option of layout manager Place
bitmap image
bitmap option of Button component
bitmap option of Entry component
BitmapImage class
<Bn-Motion> event
BooleanVar class
BOTTOM value of option side of method pack
Button component
button label
<Button-n> event
<ButtonPress-n> event
<ButtonRelease-n> event
callback
CENTER value of option anchor of layout manager Place
char attribute of the Event object
check box
Checkbutton component
children
columnconfigure method of layout manager Grid
column option of method grid
columnspan option of method grid
config method of class Widget
E value of option anchor of layout manager Place
E value of option sticky of method grid
<Enter> event
Entry component
Event class
event handler
expand option of method pack
fill option of method pack
font attribute of component Entry
Frame component
SELF-REVIEW EXERCISES
10.1 Fill in the blanks in each of the following:
a) A ________ presents a pictorial user interface to a program.
b) Labels are defined with class ________, a subclass of ________.
c) ________ are single-line areas in which text can be displayed.
d) Method ________ displays text in an Entry.
e) Method ________ displays a message dialog.
f) A ________ is a container for other components.
g) Use method ________ of class ________ to acquire the name of an Entry.
h) ________ arrange GUI components on a container for presentation purposes.
i) A ________ is a component that the user clicks to trigger an action.
j) The ________ places components in the specified row and column.
10.2 State whether each of the following is true or false. If false, explain why.
a) All Tkinter classes inherit from Frame.
b) A Label displays only text.
c) The Entry component creates multiple-line text areas.
d) When the user types data into an Entry and presses the Enter key, an <Enter> event
occurs.
e) Tkinter Button components display images using method img.
f) Class PhotoImage supports GIF, JPEG and PGM images.
g) Only one Radiobutton can be selected at a time.
h) Boolean objects are Tkinter integer variables that can have a value of 0 or 1.
i) Event format <Left> handles the event in which a mouse pointer has exited the com-
ponent.
j) Layout managers arrange the placement of GUI components.
EXERCISES
10.3 Create the following GUI using the Grid layout manager. You do not have to provide any
functionality. (The figure, which shows a calculator GUI, is not reproduced here.)
10.4 Write a temperature conversion program that converts Fahrenheit to Celsius. Use the Pack
layout manager. The Fahrenheit temperature should be entered from the keyboard via an Entry
component. A tkMessageBox should display the converted temperature. Use the following
formula for the conversion:
Celsius = 5 / 9 × ( Fahrenheit – 32 )
10.5 Enhance the temperature conversion program of Exercise 10.4 by adding the Kelvin
temperature scale. The program should also allow the user to make conversions between any two scales. Use
the following formula for the conversion between Kelvin and Celsius (in addition to the formula in
Exercise 10.4):
Kelvin = Celsius + 273.15
10.6 Add functionality (addition, subtraction, multiplication and division) to the calculator
created in Exercise 10.3. Use the built-in Python function eval to evaluate strings. For instance,
eval( "34+24" ) returns the integer 58.
10.7 Write a program that allows the user to practice typing. When the user clicks a button, the
program generates and displays a random sequence of letters in an Entry component. The user re-
peats the sequence in another Entry component. When the user enters an incorrect letter, the pro-
gram displays an error message until the user types the correct letter. Use keyboard events.
10.8 Create a GUI for a matching game. Initially, buttons should cover pairs of images. When the
user clicks a button, the image displays. If the user finds a matching pair, disable the buttons and dis-
play their images. If the user’s choices do not match, hide the images.
11
Graphical User Interface
Components: Part 2
Objectives
• To create a scrolled list of items from which a user can
make a selection.
• To create scrolled text areas.
• To create menus and popup menus.
• To create and manipulate canvases and scales.
I claim not to have controlled events, but confess plainly that
events have controlled me.
Abraham Lincoln
A good symbol is the best argument, and is a missionary to
persuade thousands.
Ralph Waldo Emerson
Capture its reality in paint!
Paul Cézanne
Outline
11.1 Introduction
11.2 Overview of Pmw
11.3 ScrolledListbox Component
11.4 ScrolledText Component
11.5 MenuBar Component
11.6 Popup Menus
11.7 Canvas Component
11.8 Scale Component
11.9 Other GUI Toolkits
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
11.1 Introduction
In this chapter, we continue our study of GUIs. We discuss more advanced components and
lay the groundwork for building complex GUIs.
We discuss Python megawidgets (Pmw)—a toolkit that provides high-level GUI com-
ponents developed from smaller components provided by the Tkinter module. For
example, a Pmw ScrolledListBox component allows the user to select an item from a
drop-down list. We continue our discussion with a look at the ScrolledText compo-
nent that allows a user to manipulate multiple lines of text. We also discuss menus; Pmw
class MenuBar creates a component that helps a user organize a menu.
We also introduce more Tkinter classes. We use Tkinter class Menu to create
popup menus—context-sensitive menus that typically appear when the user right clicks on
components that have popup menus. Finally, we discuss the Tkinter Canvas compo-
nent for displaying and manipulating text, images, lines and shapes. There are many GUI
components and toolkits available to Python programmers, so we end this chapter with a
description of several other toolkits.
The following line configures the height of the text component in an existing Pmw
TextDialog component
textdialog.configure( text_height = 10 )
Although Pmw extends the functionality of the Tkinter module by providing addi-
tional components, Pmw is not packaged with Python. To download the product, visit
pmw.sourceforge.net. For installation instructions, visit the Deitel & Associates
Web site at www.deitel.com.
23
24 # create scrolled list box with vertical scrollbar
25 self.listBox = Pmw.ScrolledListBox( self, items = images,
26 listbox_height = 3, vscrollmode = "static",
27 selectioncommand = self.switchImage )
28 self.listBox.pack( side = LEFT, expand = YES, fill = BOTH,
29 padx = 5, pady = 5 )
30
31 self.display = Label( self, image = self.photos[ 0 ] )
32 self.display.pack( padx = 5, pady = 5 )
33
34 def switchImage( self ):
35 """Change image in Label to current selection"""
36
37 # get tuple containing index of selected list item
38 chosenPicture = self.listBox.curselection()
39
40 # configure label to display selected image
41 if chosenPicture:
42 choice = int( chosenPicture[ 0 ] )
43 self.display.config( image = self.photos[ choice ] )
44
45 def main():
46 images = [ "bug1.gif", "bug2.gif",
47 "travelbug.gif", "buganim.gif" ]
48 ImageSelection( images ).mainloop()
49
50 if __name__ == "__main__":
51 main()
Line 5 imports module Pmw. In line 14, function Pmw.initialise initializes Pmw.
The call to function initialise enables the program to access the full functionality of
the Pmw module.
Testing and Debugging Tip 11.1
A program that uses module Pmw but does not invoke Pmw.initialise is not able to
access the full functionality of module Pmw.
Method main (lines 45–48) creates a list of image filenames, images, that the pro-
gram passes to the constructor method of class ImageSelection (lines 7–43). Lines
21–22 create a list of PhotoImage instances from the filenames in images. Lines 25–
nent. Sometimes, no event types are bound for a ScrolledText. Instead, an external
event, (i.e., an event generated by a different GUI component) indicates when to process the
text in a ScrolledText component. For example, many graphical e-mail programs pro-
vide a Send button to send the text of the message to the recipient. In this program, a button
generates the external event that determines when the program copies the selected text in the
left ScrolledText component into the right ScrolledText component.
Fig. 11.2 Text copied from one component to another. (Part 1 of 2.)
Lines 25–27 create and pack copyButton and bind callback method copyText.
Lines 30–34 create and pack the second ScrolledText component, text2. Line 30
sets text2’s text_state option to DISABLED, rendering the text area uneditable by
disabling calls to insert and delete for the component.
When the user clicks copyButton, method copyText (lines 36–39) executes. This
method retrieves the user-entered text from text1 by invoking the component’s method
get. Method get takes two arguments that specify the range of text to retrieve from the
component. Line 39 retrieves text1’s selected text by specifying a range that starts at
the beginning of the selection (SEL_FIRST) and stops at the end of the selection
(SEL_LAST). Method settext deletes the current text in the component and inserts the
text the method receives as an argument. In this case, method settext inserts text
returned by method get into text2. If the user has not selected any text, the program
raises a TclError exception and displays the error in an error dialog. We discussed
exceptions briefly in Chapter 7, Object-Based Programming. In Chapter 12, Exception
Handling, we discuss in detail how to handle exceptions (e.g., to prevent the program from
displaying the error dialog).
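One way to avoid the error dialog without exception handling is to check for a selection before reading it. The sketch below is an assumption-laden illustration, not the book's listing: it targets plain tkinter Text widgets under Python 3 rather than Pmw ScrolledText components, the function names copy_selection and demo are hypothetical, and a delete followed by an insert stands in for Pmw's settext. The strings "sel.first" and "sel.last" are the values of the constants SEL_FIRST and SEL_LAST.

```python
def copy_selection(source, target):
    """Copy the selected text of one Text widget into another.

    Returns False, and changes nothing, when nothing is selected.
    """
    if not source.tag_ranges("sel"):       # empty tuple means no selection
        return False
    selected = source.get("sel.first", "sel.last")
    target.config(state="normal")          # allow edits temporarily
    target.delete("1.0", "end")            # discard the old contents
    target.insert("end", selected)
    target.config(state="disabled")        # make the target read-only again
    return True


def demo():
    """Build a small window; needs a display, so it is not called here."""
    import tkinter as tk                   # imported lazily on purpose

    root = tk.Tk()
    left = tk.Text(root, width=30, height=8)
    right = tk.Text(root, width=30, height=8, state="disabled")
    left.pack(side="left")
    tk.Button(root, text="Copy >>",
              command=lambda: copy_selection(left, right)).pack(side="left")
    right.pack(side="left")
    root.mainloop()
```

Calling demo() on a machine with a display opens the two text areas. Clicking the button with nothing selected simply leaves the right-hand area unchanged.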
A menu item is a GUI component inside a menu that performs an action when selected
by a user. Menu items can be of different forms. The command menu item initiates an
action. When a user selects a command menu item, the application invokes the item’s call-
back method. The checkbutton menu item can be toggled on or off. When a user selects
a checkbutton menu item, a checkmark appears to the left of the menu item. A user can
select multiple checkbuttons (i.e., they are not mutually exclusive). Selecting a checked
checkbutton removes the checkmark.
The radiobutton menu item is another menu item that can be toggled on or off.
When multiple radiobutton menu items are grouped together, a user can select only one
item from each radiobutton-menu-item group. After selecting a radiobutton menu
item, a checkmark appears to the left of the menu item. When a user selects another
radiobutton menu item from the same group, the application removes the checkmark
from the previously selected menu item. Like Radiobuttons (discussed in Chapter 10,
Graphical User Interface Components: Part 1), radiobutton menu items are grouped
logically by a shared variable.
The separator menu item is a horizontal line in a menu. The cascade menu item
is a submenu (or cascade menu) that provides more menu items from which the user can
select.
Look-and-Feel Observation 11.4
The separator menu item can be used to group related menu items.
A menu bar contains menu items and submenus. When a menu is clicked, the menu
expands to show its list of menu items and submenus. Clicking a menu item generates an
event. Figure 11.3 provides menus and menu items that enable a user to change the
properties of a line of text. The program also introduces balloons (also called tool tips) that
display descriptions of menus and menu items. When the user moves the mouse cursor over
a menu or menu item with a balloon, the program displays a specified help message.
Line 20 creates myBalloon—a Pmw Balloon component. Lines 21–23 create and
pack a MenuBar component choices. Option balloon specifies a Balloon compo-
nent that is attached to the menubar. Lines 26–34 build the program’s menu bar. Method
addmenu (line 26) adds a new menu to choices. The method’s first argument ("File")
is the menu name. The second argument ("Exit") contains the text that appears in the
menu’s balloon. When the user places the mouse cursor over the File menu, the program
displays this text in a floating label next to the cursor.
99
100 def closeDemo( self ):
101 """Exit the program"""
102
103 sys.exit()
104
105 def main():
106 MenuBarDemo().mainloop()
107
108 if __name__ == "__main__":
109 main()
Line 37 defines the list of color choices for the sample text. Lines 38–39 create
StringVar selectedColor and initialize it to the first element of the list of color
choices. Lines 41–44 add a radiobutton menu item to the Color submenu for each
item in a list of colors. Note that each radiobutton menu item shares the same callback
method (changeColor) and the same variable (selectedColor). When the user
selects an item, selectedColor’s value changes to the item’s text value and method
changeColor is invoked. Variable selectedColor is shared by the radiobutton
menu items in the group.
Lines 51–54 add a radiobutton menu item for each item in a list of fonts to the
Format menu’s Font submenu. Each radiobutton menu item shares the same call-
back method (changeFont) and the same variable (selectedFont).
Line 57 adds a separator menu item to the "Font" submenu. Lines 60–69 then add
"Bold" and "Italic" checkbutton menu items to the Font submenu. Lines 60 and
66 create two BooleanVar variables to represent whether these menu items are checked
or unchecked. These values are passed to method addmenuitem through its variable
keyword parameter. Although both checkbutton menu items share the same callback
method (changeFont), they each have a different BooleanVar variable. The menu
items’ BooleanVar variables serve the same purpose as in Tkinter Checkbutton
components. When the user selects the menu item, the BooleanVar’s value changes to 1.
When the user deselects the menu item, the BooleanVar’s value changes to 0.
Lines 72–73 create and pack display—a Tkinter Canvas with a white back-
ground on which a program can display text, lines and shapes. A Canvas displays a canvas
item—an object, like a string or a shape, that is drawn on the Canvas component. Each
Canvas has a method that corresponds to a canvas item. Each of these methods creates a
canvas item and adds it to the Canvas. For example, method create_text (lines 75–
76) creates a canvas text item. This method draws the text "Sample Text" onto dis-
play in the font ("Times 48") specified by keyword parameter font. We discuss
Canvas components in more detail in Section 11.7.
When the user selects a Color menu item, method changeColor (lines 78–82) con-
figures sampleText to be filled (colored) with the value of selectedColor. Method
itemconfig configures items on Canvas. Lines 77–78 set the color of sampleText
to the selected color by specifying option fill.
When the user selects a radiobutton menu item in the Font submenu of the
Format menu, method changeFont (lines 84–98) changes the font of sampleText.
Line 98 retrieves the desired font name from selectedFont. Lines 91–95 determine
whether any checkbutton menu items of the Font submenu are selected. If so, the pro-
gram appends the specified style to the font name. Line 92 then updates the text with the
specified font.
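The style-appending step in changeFont can be sketched as a plain function. This is a guess at the shape of the logic, not the book's listing: the name font_description is hypothetical, the default size of 48 is taken from the "Times 48" font mentioned above, and the result uses the tuple form of a Tk font description (family, size, styles), which also works for family names that contain spaces.

```python
def font_description(family, bold=False, italic=False, size=48):
    """Return a Tk font description tuple for the chosen family and styles."""
    styles = []
    if bold:
        styles.append("bold")
    if italic:
        styles.append("italic")
    if styles:
        # Tk accepts the styles as one space-separated string.
        return (family, size, " ".join(styles))
    return (family, size)


print(font_description("Times", bold=True, italic=True))
```

In a callback, the two flags would come from the get method of the BooleanVar variables, and the returned tuple would be passed as the font option to the Canvas method itemconfig.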
are specific to the component for which the popup trigger event was generated. On most
systems, the popup trigger event occurs when the user presses and releases the right mouse
button. However, with Tkinter, a popup trigger event must be specified by binding a
callback to the desired trigger for a component.
Figure 11.4 creates a Menu that allows the user to select one of three colors as the
background color of the Frame. When the user clicks the right mouse button on the
Frame, the program displays a popup menu containing a list of colors. If the user selects
one of the radiobutton menu items that represents a color, the program changes the
background color of the Frame.
The Frame constructor’s bg option is a string that specifies the Frame’s background
color. Lines 18–19 create and pack frame1 with a white background. Line 22 creates a
Tkinter Menu component called menu. Note that Menu’s tearoff option is set to 0.
This setting removes the dashed separator line that is, by default, the first entry in a Menu.
Lines 28–31 add a radiobutton menu item to menu for each item in a list of colors.
Each radiobutton menu item has the same callback method (changeBackground-
Color) and the same variable (selectedColor).
Line 34 binds method popUpMenu to a right-mouse click (<Button-3>) for
frame1. When the user right-clicks in frame1, the popUpMenu callback (lines 36–39)
executes. Line 39 calls Menu method post, which displays a Menu at a given position.
This method accepts two arguments that correspond to the position on the top-level com-
ponent at which the menu is displayed. Event attributes x_root and y_root contain the
coordinates of the mouse cursor when the event was triggered.
When a user selects one of the radiobutton menu items, method changeBack-
groundColor executes. This method (lines 41–44) calls the config method of
frame1, specifying the new bg to be the value of selectedColor (line 44). This
method call changes frame1’s background color.
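The popup-menu steps described above can be sketched with the Python 3 tkinter module. The following is an illustration under stated assumptions: the color list, the function name build_popup_demo and the frame size are hypothetical, and on some platforms the right mouse button is reported as a different button number than 3.

```python
COLORS = ["white", "blue", "yellow"]   # assumed; the text only says three colors


def build_popup_demo(root):
    """Attach a right-click color menu to a new Frame inside root."""
    import tkinter as tk               # imported lazily; needs a display to run

    frame = tk.Frame(root, bg="white", width=300, height=200)
    frame.pack(expand=True, fill="both")

    selected = tk.StringVar(master=root, value=COLORS[0])
    menu = tk.Menu(frame, tearoff=0)   # tearoff=0 removes the dashed first entry

    def change_background():
        frame.config(bg=selected.get())

    # Every radiobutton item shares one variable and one callback.
    for color in COLORS:
        menu.add_radiobutton(label=color, value=color, variable=selected,
                             command=change_background)

    def pop_up(event):
        # Show the menu at the pointer position reported by the event.
        menu.post(event.x_root, event.y_root)

    frame.bind("<Button-3>", pop_up)
    return frame, menu, selected
```

To try it, create a root window with tkinter.Tk(), pass it to build_popup_demo and then call the root's mainloop method.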
Figure 11.5 uses the <B1-Motion> event and a Canvas to create a simple drawing
program. The user draws pictures by dragging the mouse cursor over a Canvas.
Lines 17–19 create and pack a Label with user instructions. Lines 22–23 create and
pack Canvas instance myCanvas. Line 26 binds the mouse-drag event (<B1-
Motion>) for the canvas to method paint (lines 28–33). When the user moves the
mouse while holding down the left button, method paint executes. This method draws an
oval on the Canvas myCanvas. Canvas method create_oval creates an oval
Canvas item with a radius of 4 and a fill color of "black" centered at the current mouse
cursor position (line 33).
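Method create_oval takes the corners of a bounding rectangle, so a circle centered on the mouse position needs a small conversion. The sketch below shows one way to do it, assuming Python 3 tkinter; the names oval_bbox and build_painter and the canvas size are hypothetical.

```python
def oval_bbox(x, y, radius=4):
    """Bounding box (x1, y1, x2, y2) of a circle centered at (x, y)."""
    return x - radius, y - radius, x + radius, y + radius


def build_painter(root):
    """Create a Canvas that paints black dots while button 1 is dragged."""
    import tkinter as tk               # imported lazily; needs a display to run

    canvas = tk.Canvas(root, width=300, height=200, bg="white")
    canvas.pack()

    def paint(event):
        # event.x and event.y are relative to the canvas.
        canvas.create_oval(*oval_bbox(event.x, event.y), fill="black")

    canvas.bind("<B1-Motion>", paint)
    return canvas


print(oval_bbox(50, 60))
```

Keeping the coordinate arithmetic in its own function makes it easy to check without opening a window.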
Fig. 11.7 Scale used to control the size of a circle on a Canvas. (Callouts in the figure: numeric value, slider.)
Lines 18–20 create and pack control, the Scale used to change the size of the
circle. The constructor’s orient option (HORIZONTAL or VERTICAL) determines
whether the new Scale instance has a horizontal or vertical orientation. Options from_
and to specify the Scale component’s minimum and maximum values, respectively. The
option values in lines 18–19 create a horizontal Scale with a minimum value of 0 and a
maximum value of 200. The Scale’s callback is method updateCircle, which exe-
cutes when the user moves the slider to change the numerical value. Note that although
nothing is drawn on display in __init__, the circle appears on display when the
program starts. This is because when the Scale is created, its callback method (updateCircle)
is invoked. Line 21 sets control’s value to 10, so that when the program starts,
a circle of diameter 10 appears on the screen. Lines 24–25 create and pack display, a
Canvas with a white background.
When the user drags the slider, method updateCircle (lines 27–33) executes. The
callback accepts as an argument the current value of the scale, represented as a string. Line 30
converts this value to an integer, adds 10 to it and stores the value in variable end.
Canvas method delete (line 31) deletes the old circle before drawing a new one.
Method delete accepts one argument—either an item handle or a tag. Item handles are
integer values that identify a newly drawn item. A tag is a name that can be attached to a
canvas item at creation. To attach a tag to a canvas item, pass a string value to the tags
option of the item’s create method.
Method create_oval (lines 32–33) draws an oval with coordinates (10, 10, end,
end), specifying option fill to be "red" and option tags to be "circle". The coor-
dinates specify points on the oval’s bounding rectangle. Canvas method create_item
allows the user to create the following items by substituting their names for item—arc,
line, oval, rectangle, polygon, image, bitmap, text and window.
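The callback steps described above (convert the string, add 10, delete the old item by tag, draw the new one) can be sketched as a small function. This is an illustrative sketch in current Python 3 syntax rather than the listing of Fig. 11.7; the function name update_circle and the choice to pass the canvas as a parameter are assumptions. The calls delete and create_oval, and the options fill and tags, are the Canvas features named in the text.

```python
def update_circle(canvas, scale_value):
    """Hypothetical Scale callback body: redraw the circle for the
    current slider position, which arrives as a string."""
    end = int(scale_value) + 10      # convert the string, then offset by 10
    canvas.delete("circle")          # remove the previous circle by its tag
    # (10, 10) and (end, end) are corners of the oval's bounding rectangle
    return canvas.create_oval(10, 10, end, end, fill="red", tags="circle")
```

Because the old item is deleted by tag, the callback does not need to remember the item handle returned by the previous create_oval call.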
SUMMARY
• The Pmw (Python Mega Widgets) toolkit provides high-level GUI components composed of
Tkinter components.
• Megawidgets can also be configured for a particular use. The appearance and functionality of the
components and their subcomponents can be modified.
• The components can be configured either during or after creation with method configure.
• In general, subcomponent options are named subcomponent_option.
• A list box provides a list of items from which the user can make a selection. List boxes are imple-
mented with Tkinter class Listbox, which inherits from class Widget.
• Often, it is desirable to allow the user to scroll up and down a list. Scrolling can be achieved by
creating a Tkinter Scrollbar and a Listbox separately and configuring them properly.
Conveniently, Pmw provides a megawidget called ScrolledListBox that serves this purpose.
• Function Pmw.initialise initializes Pmw. This function call allows a list of top-level compo-
nents to be maintained. This call also ensures that Pmw is notified after the destruction of a com-
ponent.
• The items option contains the list of items that will be displayed in a ScrolledListBox.
• The method specified as a value for option selectioncommand executes each time an entry in
a ScrolledListBox is selected.
• Setting the vscrollmode option for a ScrolledListBox to "static" ensures that the
vertscrollbar subcomponent of the ScrolledListBox (a Tkinter Scrollbar) will
always be present. Other possible values are "dynamic" (display the vertscrollbar only if
necessary) and "none" (the vertscrollbar will never be displayed). The default value is
"dynamic".
• Method curselection returns a tuple of the indices of the currently selected items in a
ScrolledListBox.
• The ScrolledListBox component also supports a getcurselection method that returns
a tuple of the currently selected values, rather than the values’ indices.
• By default, the user can select only one option in a ScrolledListBox component.
• A multiple-selection list enables the user to select many items from a ScrolledListBox.
• A ScrolledListBox’s listbox_selectmode option controls how many items a user
may select. Possible values are SINGLE, BROWSE (default), MULTIPLE and EXTENDED. Value
SINGLE allows the user to select only one item in the ScrolledListBox at a time. Value
BROWSE is the same as SINGLE, except that the user also may move the selection by dragging
the mouse, rather than simply clicking an item. Value MULTIPLE allows the user to select multiple
options by clicking on multiple values. Value EXTENDED acts like BROWSE, except that when
the user drags the mouse, the user selects multiple values.
• A multiple-selection list does not have a specific event associated with making multiple selections.
Normally, an external event generated by another GUI component specifies when the multiple
selections in a ScrolledListBox should be processed.
• Tkinter Text components provide an area for manipulating multiple lines of text. Pmw defines
a ScrolledText component, which is a scrolled Tkinter Text.
• Sometimes, no event types are bound for a ScrolledText. Instead, an external event indicates
when the text in a ScrolledText should be processed.
• The ScrolledText component’s wrap option controls the appearance of text lines that are too
long to display in the component. Value NONE (default) for wrap means that the component trun-
cates the line and displays only the text that fits in the component. Value CHAR for wrap means
that the text is broken up when it becomes too long; the remainder of the text is displayed on the
next line. Value WORD for wrap is similar to value CHAR, except that the component breaks the
text on word boundaries. This last value enables word-wrapping, a common feature in many pop-
ular text editors.
• Setting a text subcomponent’s state as DISABLED renders the text area uneditable by disabling
calls to insert and delete for the component.
• The ScrolledText component’s method get retrieves the user-entered text. Method get
takes two arguments that specify the range of text to retrieve from the component. Constant
SEL_FIRST specifies the beginning of the selection. Constant SEL_LAST specifies the end of
the selection.
• Method settext deletes the current text in the component and inserts the specified text.
• Menus are an integral part of GUIs. Menus allow the user to perform actions without unnecessarily
“cluttering” a graphical user interface with extra GUI components.
• Simple Tkinter GUIs create menus with Menu components. However, Pmw supplies class
MenuBar, which contains the methods necessary to manage a menu bar, a container for menus.
• A menu item is a GUI component inside a menu that causes an action to be performed when se-
lected. Menu items can be of different forms.
• A command menu item initiates an action. When the user selects a command menu item, the
item’s callback method is invoked.
• A checkbutton menu item can be toggled on or off. When a checkbutton menu item is se-
lected, a check appears to the left of the menu item. When the checkbutton menu item is se-
lected again, the check to the left of the menu item is removed.
• When multiple radiobutton menu items are assigned to the same variable, only one item in
the group can be selected at a given time. When a radiobutton menu item is selected, a check
appears to the left of the menu item. When another radiobutton menu item is selected, the
check to the left of the previously selected menu item is removed.
• A separator menu item is a horizontal line in a menu that groups menu items logically.
• A cascade menu item is a submenu. A submenu (or cascade menu) provides more menu items
from which the user can select.
• When a menu is clicked, the menu expands to show its list of menu items and submenus.
• Clicking a menu item generates an event.
• A balloon (also called a tool-tip) displays helpful text for menus and menu items. When the user
moves the mouse cursor over a menu or menu item with a balloon, the program displays a specified
help message.
• Option balloon specifies a Balloon component that is attached to the menu.
• Method addmenu of Pmw class MenuBar adds a new menu. The method’s first argument con-
tains the name of the menu; the second argument contains the text that appears in the menu’s bal-
loon. When the user places the mouse cursor over the menu, the program displays this text.
• Method addmenuitem of Pmw class MenuBar adds a menu item to a menu. This method re-
quires two arguments: the name of the menu to which the item belongs and the menu item’s type.
• MenuBar method addmenuitem’s keyword argument label specifies the menu item’s text.
Keyword argument command specifies the item’s callback.
• Method addcascademenu of Pmw class MenuBar adds a submenu to an existing menu. The
method requires two arguments: the name of the menu to which the submenu belongs and the sub-
menu’s text.
• Many of today’s computer applications provide context-sensitive popup menus. These menus pro-
vide options that are specific to the component for which the popup trigger event was generated.
• Context menus can be created easily with Tkinter class Menu (a subclass of Widget). A popup
trigger event must be specified by binding a callback to the desired trigger for a component.
• The Frame constructor’s bg option takes a string specifying the Frame’s background color.
• Setting a Menu’s tearoff option to 0 removes the dashed separator line that is, by default, the
first entry in a Menu.
• Menu method post displays a Menu at a given position. This method accepts two arguments that
correspond to the position on the top-level component at which the menu is displayed.
• The current mouse position is specified by the x_root and y_root attributes of the Event in-
stance passed to an event handler.
• Canvas is a Tkinter component that displays text, images, lines and shapes. Canvas inherits
from Widget.
• By default, a Canvas is blank. To display items on a Canvas, a program creates canvas items.
New items are drawn on top of existing items unless otherwise specified.
• Adding canvas items to a Canvas displays something on the Canvas. Each canvas item has a
corresponding Canvas method that creates the item and adds it to the canvas.
• Method create_text of class Canvas creates a canvas text item. Canvas method
create_oval creates an oval Canvas item.
• Method itemconfig of class Canvas configures items on Canvas.
• Specifying a value for option fill sets the color of a canvas item.
• The Scale component enables the user to select from a range of integer values. Class Scale in-
herits from Widget. Scales have either a horizontal orientation or a vertical orientation. For a
horizontal Scale, the minimum value is at the extreme left and the maximum value is at the ex-
treme right of the Scale. For a vertical Scale, the minimum value is at the extreme top and the
maximum value is at the extreme bottom of the Scale.
• The Scale constructor’s orient option (HORIZONTAL or VERTICAL) determines whether
the new Scale instance has a horizontal or vertical orientation. Options from_ and to specify
the Scale component’s minimum and maximum values. When the Scale is created, its callback
method is invoked.
• Item handles are integer values that identify a newly drawn item.
• A tag is a name that can be attached to a canvas item when the item is created.
• Canvas method delete deletes a canvas item. Method delete accepts one argument—either
an item handle or a tag.
• To attach a tag to a canvas item, pass a string value to the tags option of the item’s create method.
• Canvas methods create_item allow the user to create the following items by substituting
their names for item: arc, line, oval, rectangle, polygon, image, bitmap, text and window.
• PyGTK provides an object-oriented interface for the GTK component set (www.gtk.org). GTK
is an advanced component set used primarily under the X Window System (a graphics system
providing a common interface for displaying windowed graphics).
• wxPython is a Python extension module that enables access of wxWindows. wxWindows is a GUI
library written in C++. It currently supports Microsoft Windows and most of the Unix-like systems.
• PyOpenGL provides a Python interface to the OpenGL (www.opengl.org) library—one of the
most widely used libraries designed for developing interactive two-dimensional and three-dimen-
sional graphical applications. It is available under Microsoft Windows, MacOS and most Unix-
like systems. PyOpenGL can be used with Tkinter, wxPython and other windowing libraries.
TERMINOLOGY
addcascademenu method of MenuBar
addmenu method of MenuBar
addmenuitem method of MenuBar
balloon
balloon option of MenuBar
bg option of Frame component
BROWSE value of listbox_selectmode option of ScrolledListBox
Canvas component
cascade menu
cascade menu item
CHAR option of wrap option of ScrolledText
checkbutton menu item
command menu item
command option of MenuBar
configure method of Pmw component
create_oval method of Canvas
curselection method of ScrolledListBox
"none" option of vscrollmode option of ScrolledListBox
"static" option of vscrollmode option of ScrolledListBox
"dynamic" value of vscrollmode option of ScrolledListBox
EXTENDED value of listbox_selectmode option of ScrolledListBox
external event
fill option of Canvas
font option of Canvas
from_ option of Scale
SELF-REVIEW EXERCISES
11.1 Fill in the blanks in each of the following:
a) Tkinter class ________, which inherits from class ________, implements list boxes.
b) If the vscrollmode of a vertical ScrolledListBox is set to ________, the scroll
bar component will never be displayed.
c) A ________ enables the user to select many items from a list box.
d) Set text_wrap to ________ in a ScrolledText widget to enable word wrap.
e) When the user selects a ________ menu item, its callback function is invoked.
f) A ________ displays help text for menu items.
g) Tkinter component ________ displays text, images, lines and shapes.
h) The ________ component enables a user to select from a range of integer values.
i) ________ are integer values identifying an item drawn on a Canvas.
j) An ________ allows selection of a contiguous range of items in the list.
11.2 State whether each of the following is true or false. If false, explain why.
a) Tkinter cannot provide a scrollbar with a list.
b) By default, the scrollbar component of a ScrolledListBox is always displayed.
c) The Pmw component ScrolledText is a scrolled Tkinter Text.
d) Tkinter Menu components contain the methods necessary to manage a menu bar.
e) A cascade menu is a submenu that provides more items from which the user can select.
f) Method addmenuitem adds menus to a menu bar, which can contain menu items.
g) Tkinter class Menu can create context-sensitive popup menus.
h) The minimum and maximum value positions on a Scale can be specified by setting the
from_ and to options.
i) A Scale must be horizontal with the maximum value at the extreme right and the min-
imum value at the extreme left.
j) A radiobutton menu item can be toggled on and off.
EXERCISES
11.3 Modify Exercise 10.4. Allow the user to select a Fahrenheit temperature to be converted with
a horizontal Scale. When the user interacts with the Scale, update the temperature conversion.
11.4 Rewrite the program of Fig. 11.2. Create a multiple-selection list of colors. Allow the user to
select one or more colors and copy them to a ScrolledText component.
11.5 Write a program that allows the user to draw a rectangle by dragging the mouse on a Canvas.
The drawing should begin when the user holds the left-mouse button down. With this button
held down, the user should be able to resize the rectangle. The drawing ends when the user releases
the left button. When the user next clicks on the Canvas, the rectangle should be deleted.
11.6 Modify Exercise 11.5. Allow the user to fill the rectangle with a color. Create a popup menu
of possible colors. The popup menu should appear when the user presses the right-mouse button.
11.7 Write a menu designer program. The program allows the user to enter menu information and
generates code to create that menu based on the user input. The GUI allows the user to enter the nec-
essary input and displays the menu names in a Pmw ScrolledListBox as they are added. The pro-
gram displays the generated code when the user has finished adding information. The program can
be written with two distinct parts.
a) Class Menus creates a GUI that allows the user to enter a menu name or a menu name,
menu item and callback function. The program should issue a warning with a dialog box if
the user does not enter the specified information. The GUI provides two Entry compo-
nents for the menu name and the menu item and a Pmw ScrolledText component for
the callback function. The program generates code based on the user input that the user
could execute to create the menu. The GUI should have an Add button whose callback
adds the user input to the generated menu code, a Clear button whose callback resets the
GUI and a Finish button that ends the program, displaying the generated menu code.
b) Class MenusList creates a single-selection list of the added menus. As the user adds
menus, the Pmw ScrolledListBox should be updated with the new information.
When the user selects a menu in the list, the menu items in that list are displayed in a Pmw
ScrolledText component.
You may add extra error checking and special features. For instance, when the user selects a
menu name in the Pmw ScrolledListBox, display the menu name in the Entry component for
menu names.
11.8 Modify Exercise 11.7 so that, as the user adds menus and menu items, a sample menu displays
and is updated with any new information.
12
Exception Handling
Objectives
• To understand exceptions and error handling.
• To use the try statement to delimit code in which
exceptions may occur.
• To be able to raise exceptions.
• To use except clauses to specify exception handlers.
• To use the finally clause to release resources.
• To understand the Python exception class hierarchy.
• To understand Python’s traceback mechanism.
• To create programmer-defined exceptions.
It is common sense to take a method and try it. If it fails,
admit it frankly and try another. But above all, try something.
Franklin Delano Roosevelt
O! throw away the worser part of it,
And live the purer with the other half.
William Shakespeare
If they’re running and they don’t look where they’re going
I have to come out from somewhere and catch them.
Jerome David Salinger
And oftentimes excusing of a fault
Doth make the fault the worse by the excuse.
William Shakespeare
Outline
12.1 Introduction
12.2 Raising an Exception
12.3 Exception-Handling Overview
12.4 Example: DivideByZeroError
12.5 Python Exception Hierarchy
12.6 finally Clause
12.7 Exception Objects and Tracebacks
12.8 Programmer-Defined Exception Classes
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
12.1 Introduction
In this chapter, we introduce exception handling. An exception is an indication of a “special
event” that occurs during a program’s execution. The name “exception” indicates that, al-
though the event can occur, the event occurs infrequently. Often, the special event is an er-
ror (e.g., dividing by zero or adding two incompatible types); sometimes, the special event
is something else (e.g., the termination of a for loop). Exception handling enables pro-
grammers to create applications that can handle (or resolve) exceptions. In many cases,
handling an exception allows a program to continue executing as if no problems were en-
countered. More severe problems may prevent a program from continuing normal execu-
tion. In such cases, the program can notify the user of the problem, then terminate in a
controlled manner. The features presented in this chapter enable programmers to write pro-
grams that are clear, robust and more fault tolerant.
The style and details of exception handling in Python are based on the work of the cre-
ators of the Modula-3 programming language. The exception-handling mechanism is sim-
ilar to that used in C# and Java.
We begin with an overview of exception-handling concepts, then demonstrate basic
exception-handling techniques. The chapter then overviews the exception-handling class
hierarchy.
Programs typically request and release resources (such as files on disk) during program
execution. Often, these resources are in limited supply or can be used only by one program
at a time. We demonstrate a part of the exception-handling mechanism that enables a pro-
gram to use a resource, then guarantee that the program releases the resource for use by
other programs.
The chapter continues with an explanation and example of traceback objects—the
objects that Python creates when it encounters an exception. The chapter concludes with an
example that shows programmers how to create and use their own exception classes.
ment indicates that an exception occurred (e.g., a function could not complete successful-
ly). This is called raising (or sometimes throwing) an exception.
The simplest form of raising an exception consists of the keyword raise, followed by
the name of the exception to raise. Exception names identify classes; Python exceptions are
objects of those classes. When a raise statement executes, Python creates an object of the
specified exception class. The raise statement also may specify arguments that initialize
the exception object. To do so, follow the exception class name with a comma (,) and the
argument (or a tuple of arguments). Programs can use an exception object’s attributes to dis-
cover more information about the exception that occurred. The raise statement has many
forms. Section 12.7 discusses another form of raise that specifies no exception name.
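As an illustration of raising an exception with arguments and reading them back, here is a minimal sketch. It uses current Python 3 syntax, in which the arguments are written as a call on the exception class (raise ExceptionClass(arguments)) rather than in the comma form described above, and in which a handler names the caught object with the keyword as. The function withdraw and its message text are hypothetical, chosen only for the example.

```python
def withdraw(balance, amount):
    """Return the new balance, or raise ValueError with two arguments
    that describe the problem."""
    if amount > balance:
        # the arguments are stored on the exception object as a tuple
        raise ValueError("insufficient funds", amount - balance)
    return balance - amount


try:
    withdraw(100, 130)
except ValueError as error:
    message, shortfall = error.args   # recover the initializing arguments
    print(message, shortfall)         # prints: insufficient funds 30
```

The handler uses the exception object's args attribute to find out how large the shortfall was, which is the kind of follow-up information the arguments are meant to carry.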
Testing and Debugging Tip 12.1
The arguments used to initialize an exception object can be referenced in an exception handler
to perform an appropriate task.
Until now, we have seen only how a raise statement causes a program to terminate
and print an error message. This chapter demonstrates how a program detects that an excep-
tion occurred (called catching an exception), then, based on that exception, takes appro-
priate action (called handling the exception). Catching and handling exceptions enables a
program to know when an error has occurred, then to take actions to minimize the conse-
quences of that error.
Exception handling enables programmers to remove error-handling code from the “main
line” of the program’s execution. This improves program clarity and enhances modifiability.
Programmers can decide to handle whatever exceptions they choose—all types of exceptions,
all exceptions of a certain type or all exceptions of a related type. Such flexibility reduces the
likelihood that errors will be overlooked, thereby increasing a program’s robustness.
Testing and Debugging Tip 12.3
Exception handling helps improve a program’s fault tolerance. When it is easy to write
error-processing code, programmers are more likely to use it.
The exception-handling mechanism also is useful for processing problems that occur
when a program interacts with reusable software elements, such as functions, classes and
modules. Rather than internally handling problems that occur, such software elements use
exceptions to notify client code when problems occur. This enables programmers to imple-
ment error handling that is appropriate to each application.
Common Programming Error 12.1
Aborting a program component could leave a resource (such as a file or a network connection)
in a state in which other programs are not able to acquire the resource. This is known
as a “resource leak.”
Exception handling is geared to situations in which the code that detects an error is
unable to handle it. Such code raises or throws an exception. There is no guarantee that
there will be an exception handler—code that executes when the program detects an excep-
tion—to process that kind of exception. If there is, the exception will be caught (detected)
and handled. The result of an uncaught exception depends on whether the program is a GUI
program or a console (non-GUI) program and on whether the program is running in inter-
active mode. In a non-GUI program, an uncaught exception simply causes the program to
print an error message and terminate. When a GUI program detects an uncaught exception,
the program displays the error message (either in the console or in a dialog box, depending
on the GUI package) and the program continues execution. Although a GUI program con-
tinues execution after an uncaught exception, the program may fail to behave as expected,
because of the error that caused the exception. When a program running in interactive mode
detects an uncaught exception, the program displays an error message, terminates execu-
tion and displays the interactive Python prompt.
Python uses try statements to enable exception handling. The try statement
encloses other statements that potentially cause exceptions. A try statement begins with
keyword try, followed by a colon (:), followed by a suite of code in which exceptions
may occur. The try statement may specify one or more except clauses that immediately
follow the try suite. Each except clause specifies zero or more exception class names
that represent the type(s) of exceptions that the except clause can handle. An except
clause (also called an except handler) also may specify an identifier that the program can
use to reference the exception object that was caught. The handler can use the identifier to
obtain information about the exception from the exception object. An except clause that
specifies no exception type is called an empty except clause. Such a clause catches all
exception types. After the last except clause, an optional else clause contains code that
executes if the code in the try suite raised no exceptions. If a try statement specifies no
except clauses, the statement must contain a finally clause, which always executes,
regardless of whether an exception occurs. We discuss each possible combination of
clauses over the next several sections.
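The clause combinations just described can be sketched briefly. The example below is written in current Python 3 syntax (the handler's identifier is introduced with as), and the function names parse_age and cleanup_demo are hypothetical. The first function shows a try statement with an except clause and an else clause; the second shows a try statement that has no except clause and therefore must have a finally clause.

```python
def parse_age(text):
    """Return (age, note) using a try/except/else statement."""
    try:
        age = int(text)                  # may raise ValueError
    except ValueError as error:          # identifier refers to the caught object
        return None, "invalid input: %s" % error
    else:                                # runs only if the try suite raised nothing
        return age, "ok"


def cleanup_demo(log):
    """Show that a finally clause runs even when an exception occurs."""
    try:
        log.append("working")
        raise RuntimeError("failed")
    finally:
        log.append("released")           # always executes
```

In cleanup_demo the RuntimeError still propagates to the caller after the finally suite runs, because the statement contains no handler for it.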
Common Programming Error 12.2
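In the Python versions covered by this book, it is a syntax error to write a try statement that
contains both except and finally clauses. The only acceptable forms are try/except,
try/except/else and try/finally. (Python 2.5 and later also accept except and
finally clauses in the same try statement.)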
When code in a program causes an exception, or when the Python interpreter detects a
problem, the code or the interpreter raises (or throws) an exception. Some programmers
refer to the point in the program at which an exception occurs as the throw point, an
important location for debugging purposes (as we demonstrate in Section 12.7). Exceptions
are objects of classes that inherit from class Exception.¹ If an exception occurs in a try
suite, the try suite expires (i.e., terminates immediately), and program control transfers to
the first except handler (if there is one) following the try suite. Next, the interpreter
searches for the first except handler that can process the type of exception that occurred.
The interpreter locates the matching except by comparing the raised exception’s type to
each except’s exception type(s) until the interpreter finds a match. A match occurs if the
types are identical or if the raised exception’s type is a derived class of the handler’s exception
type. If no exceptions occur in a try suite, the interpreter ignores the exception handlers
for the try statement and executes the try statement’s else clause (if the statement
specifies an else clause). If no exceptions occur, or if one of the except clauses successfully
handles the exception, program execution resumes with the next statement after the
try statement. If an exception occurs in a statement that is not in a try suite and that statement
is in a function, the function containing that statement terminates immediately and the
interpreter attempts to locate an enclosing try statement in the calling code, a process
called stack unwinding (discussed in Section 12.7).
1. Python exceptions also may be strings, to support programs that require earlier versions of the
Python interpreter. For newer Python versions (greater than 1.5.2), the class-based
exception-handling technique is preferred.
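The matching rule and stack unwinding can be seen in a short sketch. It relies on two facts about Python's built-in hierarchy: ZeroDivisionError and OverflowError are both derived from ArithmeticError. The function names handler_for, inner, outer and overflow are hypothetical.

```python
def handler_for(operation):
    """Report which except clause handles the exception raised by operation."""
    try:
        operation()
    except ZeroDivisionError:            # identical type: matches first
        return "ZeroDivisionError handler"
    except ArithmeticError:              # base class: matches derived types
        return "ArithmeticError handler"
    else:
        return "no exception"


def inner():
    return 1 / 0                         # raised outside any try suite


def outer():
    return inner()                       # no try here either, so unwinding continues


def overflow():
    raise OverflowError("too large")     # derived from ArithmeticError
```

Calling handler_for(outer) reaches the first handler even though the division happens two calls deeper, because inner and outer terminate in turn until an enclosing try statement is found. Calling handler_for(overflow) skips the first handler and matches the second through the derived-class rule.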
Python is said to use the termination model of exception handling, because the try
suite that raises an exception expires immediately when that exception occurs.²
2. Some languages use the resumption model of exception handling in which, after handling the
exception, control returns to the point at which the exception was raised and execution resumes
from that point.
denominator. The program then attempts to convert the user-entered values to floating-point
values and to divide the numerator by the denominator. Lines 8–11 begin a try statement
enclosing the code that may raise exceptions. Notice that the code in the try suite does not
itself contain any raise statements and therefore may not appear to raise exceptions. In gen-
eral, the statements in a try suite may call other code that possibly raises exceptions; or the
statements in a try suite may raise exceptions if, for example, the code accesses an invalid
sequence subscript, dictionary key or object attribute. In Fig. 12.1, the try suite makes two
calls to function float (that may raise a ValueError exception) and performs one divi-
sion operation (that may raise a ZeroDivisionError exception).
Function float converts the user-entered values to floating-point values (lines 9–10).
This function raises a ValueError exception if it cannot convert its string argument to a
floating-point value. If lines 9–10 properly convert the values (i.e., no exceptions occur),
then line 11 divides the numerator by the denominator and assigns the result to variable
result. If the denominator is zero, line 11 causes the Python interpreter to raise a
ZeroDivisionError exception. If line 11 does not cause an exception, then the try
suite completes its execution. If no exceptions occur in the try suite, the program ignores
the except handlers in lines 14–15 and 18–19 and continues program execution with the
first statement of the else suite (lines 22–23). The else suite contains a single line that
prints the result of division. After the else suite terminates, program execution continues
with the first statement after the entire try statement (i.e., after line 23). In this example,
the program contains no more statements, so program execution terminates.
Common Programming Error 12.3
It is a syntax error to place statements between a try suite and its first except handler,
between except handlers, between the last except handler and the else clause, or between
the try suite and the finally clause.
Immediately following the try suite are two except clauses (also called except
handlers or exception handlers). Lines 14–15 define the exception handler for a ValueError
exception and lines 18–19 define the exception handler for the ZeroDivisionError
exception. Each except clause begins with keyword except followed by
an exception name that specifies the type of exception handled by the except clause, fol-
lowed by a colon (:). The exception-handling code appears in the body of the except
clause (i.e., in the indented code suite). In general, when an exception occurs in a try suite,
an except clause catches the exception and handles it. In Fig. 12.1, the first except
clause specifies that it catches ValueError exceptions (raised by function float). The
second except clause specifies that it catches ZeroDivisionError exceptions
(raised by the interpreter). Only the matching except handler executes if an exception
occurs. Both the exception handlers in this example display an error message. When pro-
gram control reaches the last statement of an except handler’s suite, the interpreter con-
siders the exception handled, and program control continues with the first statement after
the entire try statement (the end of the program in this example).
In the second input/output dialog, the user entered the string "hello" as the
denominator. When line 10 executes, float cannot convert this string value to a floating-point
value, so float raises a ValueError exception to indicate that the function was unable
to perform the conversion. When an exception occurs, the try suite expires (terminates)
immediately. Next, the interpreter attempts to locate a matching except handler starting
with the except at line 14. The interpreter compares the type of the raised exception
(ValueError) with the type following keyword except (also ValueError). A match
occurs, so that exception handler executes, and the interpreter ignores all other exception
handlers following the corresponding try suite. If a match does not occur, the interpreter
compares the type of the raised exception with the next except handler in sequence and
repeats the process until a match is found.
Software Engineering Observation 12.6
An except clause can specify more than one exception with a comma-separated sequence
of exception names in parentheses, following keyword except. If an except clause
specifies more than one exception, the exceptions should be related in some way (e.g., the
exceptions all are caused by mathematical errors). Use a separate except clause for each group
of related exceptions.
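As an illustration of this observation, the following minimal sketch groups related exceptions in parenthesized sequences. It assumes Python 3 syntax; the function name safe_ratio and the returned strings are hypothetical.

```python
def safe_ratio(values, key, divisor):
    """Divide values[key] by divisor, grouping related exceptions."""
    try:
        return values[key] / divisor
    except (ZeroDivisionError, OverflowError):
        # Both are arithmetic errors, so one handler covers them.
        return "arithmetic problem"
    except (IndexError, KeyError):
        # Both are lookup errors, so they share a second handler.
        return "lookup problem"


print(safe_ratio([4.0], 0, 2))        # normal division
print(safe_ratio([4.0], 0, 0))        # ZeroDivisionError
print(safe_ratio([4.0], 5, 2))        # IndexError
print(safe_ratio({"a": 4.0}, "b", 2)) # KeyError
```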
In the third input/output dialog of Fig. 12.1, the user entered 0 as the denominator. When
line 11 executes, the interpreter raises a ZeroDivisionError exception to indicate an
attempt to divide by zero. Once again, the try suite terminates immediately upon
encountering the exception and the interpreter attempts to locate a matching except handler,
starting from the except handler at line 14. The interpreter compares the type of the raised
exception (ZeroDivisionError) with the type following keyword except
(ValueError). In this case, there is no match, because ZeroDivisionError and
ValueError are not the same exception type and ValueError is not a base class of
ZeroDivisionError. So, the interpreter proceeds to line 18 and compares the type of
the raised exception (ZeroDivisionError) with the type following keyword except
(ZeroDivisionError). A match occurs, so that exception handler executes. If there were
additional except handlers, the interpreter would ignore them.
For example, if a Python 2.2 program uses a variable named yield, Python raises a
Warning exception, because future versions of Python will reserve yield for use as a
keyword. StandardError is the base class for all Python error exceptions (e.g.,
ValueError and ZeroDivisionError).
Figure 12.2 contains the exception hierarchy for Python 2.2. For any version of
Python, the programmer can obtain the exception hierarchy with the statements
import exceptions
print exceptions.__doc__
Many StandardError exceptions can be caught at runtime and handled, so the pro-
gram can continue running. Such exceptions often can be avoided by coding properly. For
example, if a program attempts to access an out-of-range sequence subscript, the interpreter
raises an exception of type IndexError. Similarly, an AttributeError exception
occurs when a program attempts to access a non-existent object attribute.
One of the benefits of the exception class hierarchy is that an except handler can
catch exceptions of a particular type or can use a base-class type to catch exceptions in a
hierarchy of related exception types. For example, Section 12.3 discussed the empty
except handler, which catches exceptions of all types. An except handler that specifies
an exception of type Exception also can catch all exceptions (assuming the raised
exceptions inherit from class Exception), because Exception is the base class of all
exception classes.
Using inheritance with exceptions enables an exception handler to catch related
exceptions with a concise notation. An exception handler certainly could catch each derived-class
exception individually, but it is more concise to catch the base-class exception if the
handling behavior is the same for all derived classes. Otherwise, catch each derived-class
exception individually.
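The following minimal sketch shows a base-class handler catching derived-class exceptions. It assumes Python 3 syntax; LookupError and ArithmeticError are base classes in the Python 2.2 hierarchy and remain so in current Python, while the function name describe is hypothetical.

```python
def describe(action):
    """Call action() and report which base-class handler caught its error."""
    try:
        return action()
    except LookupError as error:
        # Catches IndexError and KeyError, which derive from LookupError.
        return "lookup failed: " + type(error).__name__
    except ArithmeticError as error:
        # Catches ZeroDivisionError, which derives from ArithmeticError.
        return "arithmetic failed: " + type(error).__name__


print(describe(lambda: [1, 2, 3][10]))
print(describe(lambda: {}["missing"]))
print(describe(lambda: 1 / 0))
```

Each handler names only the base class, yet the reported type is the derived class of the exception that actually was raised.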
Common Programming Error 12.4
It is a syntax error to place an empty except clause before the last except clause
following a particular try suite.
Determining when Python and standard and third-party components raise exceptions
can be difficult; there is no way for a program to determine whether, for example, a
function may raise a particular exception. The language reference and standard library
documentation often specify cases in which exceptions are raised. For example, in Fig. 12.1,
Python exceptions
Exception
    SystemExit
    StopIteration
    StandardError
        KeyboardInterrupt
        ImportError
        EnvironmentError
            IOError
            OSError
                WindowsError (Note: Defined on Windows platforms only)
        EOFError
        RuntimeError
            NotImplementedError
        NameError
            UnboundLocalError
        AttributeError
        SyntaxError
            IndentationError
                TabError
        TypeError
        AssertionError
        LookupError
            IndexError
            KeyError
        ArithmeticError
            OverflowError
            ZeroDivisionError
            FloatingPointError
        ValueError
            UnicodeError
        ReferenceError
        SystemError
        MemoryError
    Warning
        UserWarning
        DeprecationWarning
        SyntaxWarning
        OverflowWarning
        RuntimeWarning
Most resources that require explicit release have potential exceptions associated with
processing those resources. For example, a program that processes a file might receive
IOError exceptions during the processing. For this reason, file processing code normally
appears in a try suite. Regardless of whether a program successfully processes a file, the
program should close the file when the file is no longer needed.
Suppose a program places all resource-request and resource-release code in a try
suite. If no exceptions occur, the try suite executes normally and releases the resources.
However, if an exception occurs, the try suite expires before the resource-release code can
execute. We could duplicate all resource-release code in the except handlers, but this
makes the code more difficult to modify and maintain.
Python’s exception handling mechanism provides the finally clause, which is
guaranteed to execute if program control enters the corresponding try suite, regardless of
whether that try suite executes successfully or an exception occurs. This guarantee makes
the finally suite an ideal location to place resource-deallocation code for resources
acquired and manipulated in the corresponding try suite. If the try suite executes
successfully, the finally suite executes immediately after the try suite terminates. If an
exception occurs in the try suite, the finally suite executes immediately after the line
that caused the exception. The exception is then processed by the next enclosing try
statement (if there is one).
Testing and Debugging Tip 12.7
A finally suite typically contains code to release resources acquired in the corresponding
try suite, making the finally suite an effective way to eliminate resource leaks.
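A minimal sketch of releasing a resource in a finally suite follows. It assumes Python 3 syntax; class Resource is a hypothetical stand-in for a file or similar resource, and use_resource is a hypothetical function that simulates processing.

```python
class Resource:
    """Hypothetical stand-in for a file or other resource."""

    def __init__(self):
        self.released = False

    def release(self):
        self.released = True


def use_resource(resource, fail):
    """Process the resource; always release it, even if processing fails."""
    try:
        if fail:
            raise IOError("simulated processing failure")
        return "processed"
    finally:
        # Executes whether the try suite completed or raised an exception.
        resource.release()


succeeded = Resource()
outcome = use_resource(succeeded, fail=False)

failed = Resource()
try:
    use_resource(failed, fail=True)
except IOError as error:
    failure_message = str(error)

print(outcome, succeeded.released)
print(failure_message, failed.released)
```

The release code appears once, in the finally suite, rather than being duplicated in the try suite and in every except handler.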
Figure 12.3 demonstrates that the finally clause always executes, regardless of
whether an exception occurs in the corresponding try suite. The program consists of two
functions to demonstrate finally: doNotRaiseException (lines 4–14) and
raiseExceptionDoNotCatch (lines 16–27). The main program calls these functions to
demonstrate when finally clauses execute.
Calling doNotRaiseException
In doNotRaiseException
Finally executed in doNotRaiseException
End of doNotRaiseException
Calling raiseExceptionDoNotCatch
In raiseExceptionDoNotCatch
Finally executed in raiseExceptionDoNotCatch
Caught exception from raiseExceptionDoNotCatch in main program.
Lines 40–41 of the main program begin a try statement that invokes function
raiseExceptionDoNotCatch (lines 16–27). The try statement enables the main program to
Note that the point at which program control continues after the finally clause
executes depends on the exception-handling state. If the try suite successfully completes, the
finally suite executes and control continues with the next statement after the finally
suite. If the try suite raises an exception, the finally suite executes, then program
control continues in the next enclosing try statement. The enclosing try may be in the
calling function or one of its callers. It also is possible to nest a try/except form in a
try suite, in which case the outer try statement’s exception handlers would process any
exceptions that were not caught in the inner try statement.
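The order of events when a try suite raises an exception can be sketched as follows. This is a minimal sketch in the spirit of Fig. 12.3, not a copy of its listing; it assumes Python 3 syntax, and the names events and raise_and_do_not_catch are hypothetical.

```python
events = []  # records the order in which suites execute


def raise_and_do_not_catch():
    try:
        events.append("try suite entered")
        raise Exception("raised in called function")
    finally:
        events.append("finally suite executed")
    # Not reached: the uncaught exception propagates to the caller.
    events.append("end of function")


try:
    raise_and_do_not_catch()
except Exception as error:
    # The enclosing try statement in the caller handles the exception.
    events.append("caught in caller: %s" % error)

for event in events:
    print(event)
```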
each function, Python inserts the function name at the beginning of the function call stack.
When an exception is raised, Python begins searching for an exception handler. If no
exception handler exists in the current function, the current function terminates execution, and
Python searches the current function’s calling function, and so on, until either an exception
handler is found or Python reaches the main program. This process of searching for an
appropriate exception handler is called stack unwinding. Just as the interpreter maintains
information about functions that are placed on the stack, the interpreter maintains
information about functions that have been unwound from the stack.
Testing and Debugging Tip 12.10
A traceback shows the complete function call stack from the time at which an exception
occurred. This lets the programmer view the series of function calls that led to the exception.
Information in the traceback includes names of unwound functions, names of the files in
which the functions are defined and line numbers that indicate where the program
encountered an error. The last line number in the traceback indicates the throw point (i.e., the
location where the original exception was raised). Previous line numbers indicate the locations
from which each function in the traceback was called.
Our next example (Fig. 12.4) demonstrates an exception object’s args attribute and
the exception object’s string representation. The example also demonstrates how to access
traceback objects to print information about stack unwinding. As we discuss this example,
we keep track of the functions on the call stack so we can discuss the traceback object
and the stack-unwinding mechanism.
Traceback:
Traceback (most recent call last):
File "fig12_04.py", line 24, in ?
function1()
File "fig12_04.py", line 7, in function1
function2()
File "fig12_04.py", line 10, in function2
function3()
File "fig12_04.py", line 16, in function3
raise Exception, "An exception has occurred"
Exception: An exception has occurred
The interpreter begins executing the program with line 1. This is technically the first
line in the main program. The main program is the first entry in the function call stack,
because it is the entity that invokes all other functions. Line 24 of the try suite in the main
program invokes function1 (defined in lines 6–7), which becomes the second entry on
the stack. If function1 raises an exception, the except handler in lines 28–33 catches
the exception and outputs information about the exception that occurred. Line 7 of
function1 invokes function2 (defined in lines 9–10), which becomes the third entry
on the stack. Then, line 10 of function2 invokes function3 (defined in lines 12–19)
which becomes the fourth entry on the stack.
At this point, the call stack for the program is
function3 (top)
function2
function1
Main Program
with the last function called (function3) at the top and the main program at the bottom.
Line 16 in function3 raises an Exception and passes the string "An exception
has occurred" as an argument. In response to the raise statement, Python creates an
Exception object, with the specified argument. The except clause in lines 17–19
catches the exception and first prints a message. Line 19 uses an empty raise statement
to reraise the exception. Usually, reraising an exception indicates that the except handler
performed partial processing of the exception and is now passing the exception back to the
caller (in this case function2) for further processing. In this example, function3
demonstrates that keyword raise, with no specified exception name, reraises the most
recently raised exception.
Software Engineering Observation 12.10
If a function is capable of handling a given type of exception, then let that function handle it,
rather than passing the exception to another region of the program.
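Reraising after partial processing can be sketched as follows. This is a minimal sketch that assumes Python 3 syntax, in which an empty raise statement reraises the exception currently being handled; the names log and parse_number are hypothetical.

```python
log = []  # records which handlers processed the exception


def parse_number(text):
    try:
        return int(text)
    except ValueError:
        # Partial processing: record the problem, then pass it to the caller.
        log.append("parse_number: could not parse %r" % text)
        raise  # empty raise statement reraises the ValueError


try:
    parse_number("abc")
except ValueError as error:
    # The caller performs the remaining processing.
    log.append("caller: " + type(error).__name__)

for entry in log:
    print(entry)
```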
Next, function3 terminates because the reraised exception is not caught in the
function body. Thus, control will return to the statement that invoked function3 in the
prior function in the call stack (function2). This removes or unwinds function3 from
the function call stack (thus, terminating the function) and Python maintains information
about the function call in a traceback object.
When control returns to line 10 in function2, the interpreter ascertains that line 10
is not in a try suite. Therefore, the exception cannot be caught in function2, and
function2 terminates. This unwinds function2 from the function call stack, creates
another traceback object (to represent the current level of unwinding) and returns
control to line 7 in function1. Here again, line 7 is not in a try suite, so the exception
cannot be caught in function1. The function terminates and unwinds from the call stack,
creating another traceback object and returning control to line 24 in the main program,
which is in a try suite. The try suite in the main program expires and the except
handler in lines 28–33 catches the exception.
Notice that the except clause in line 28 differs from the except clauses presented
thus far. When Python encounters an except clause in which except is followed by an
exception type (or tuple of exception types), a comma, and an identifier, Python binds the
identifier to the matching exception object. Now, the except handler can use the
identifier to obtain information about the specific exception that occurred. The except suite in
lines 29–33 prints the exception object’s args attribute (line 30). Then, the handler prints
the string representation of the exception. Python’s string representation of an exception
object depends on the value of its args attribute. If the args attribute is an empty tuple,
Python represents the exception as the empty string. If an exception object’s args tuple
contains only one value, Python represents the exception as the string representation of
that value. If an exception object’s args tuple contains multiple items, Python represents
the exception as the string representation of the args tuple. In this example, the exception
object’s args attribute contains only one value, so Python represents the exception as that
value (i.e., the string "An exception has occurred").
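The three cases can be checked directly. The following minimal sketch assumes Python 3 syntax and uses plain Exception objects, because some built-in derived classes define their own string representations; the variable names and the second message are hypothetical.

```python
no_args = Exception()
one_arg = Exception("An exception has occurred")
many_args = Exception("disk full", 28)

# Print each args tuple next to the resulting string representation.
for error in (no_args, one_arg, many_args):
    print(repr(error.args), "->", repr(str(error)))
```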
Line 33 of the except handler calls function traceback.print_exc to print the
traceback. Module traceback contains many functions for manipulating the
traceback objects that Python creates during stack unwinding. Recall that stack unwinding
continues until either an except handler catches the exception or the program terminates.
Function print_exc, when called with no arguments, prints all the traceback objects
accumulated thus far in the stack-unwinding process. This output is identical to the output
Python produces when the interpreter encounters an uncaught exception. Let us examine
the output from function print_exc. The first line
Traceback (most recent call last):
is the standard traceback line that Python prints when an error occurs. This line indicates
that the most recent call (i.e., the call at the top of the call stack when the exception
occurred) appears last in the traceback output. The next two lines in the traceback output
contain information about the first call on the function call stack (i.e., the call to function1
from the main program). The information includes the file in which the call occurred
(fig12_04.py), the line number of the file that called the function (24) and the calling
entity from which the function was invoked (?, which corresponds to the main program).
The subsequent pairs of lines in the traceback output each correspond to a call on the
function call stack. The second-to-last line contains the code that caused the exception (i.e., the
code from line 16 in function3 that contains the raise statement). This demonstrates
the fact that the empty raise statement in line 19 simply reraises the exception from line
16. The final line of the output contains a string representation of the exception type and its
argument. Note that traceback output contains information about the call stack from the
point at which the exception occurred to the point at which the exception is caught (or the
point at which the program terminates, if the exception is not caught).
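The structure of this output can be reproduced with a short call chain. The following minimal sketch is not the Fig. 12.4 listing: it assumes Python 3 syntax, uses traceback.format_exc (which returns the text that print_exc would print) and traceback.extract_tb, and wraps the try statement in a hypothetical function named main.

```python
import traceback


def function1():
    function2()


def function2():
    function3()


def function3():
    raise Exception("An exception has occurred")


def main():
    try:
        function1()
    except Exception as error:
        # Text of the traceback, as print_exc would display it.
        text = traceback.format_exc()
        # Names of the functions recorded during stack unwinding.
        frames = traceback.extract_tb(error.__traceback__)
        return text, [frame.name for frame in frames]


report, call_names = main()
print(report)
print(call_names)
```

The list of names runs from the function that caught the exception to the function that raised it, matching the "most recent call last" ordering of the printed traceback.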
Testing and Debugging Tip 12.11
When reading a traceback, start from the end of the traceback and read the error message
first. Then, read up the remainder of the traceback, looking for the first line that indicates
code that you wrote in your program. Normally, this is the location that caused the exception.
exceptions most likely occur during arithmetic, so it seems logical to derive class
NegativeNumberError from class ArithmeticError. Creating simple, programmer-defined
exceptions in Python is easy, because the new exception class inherits all its functionality
from the base-class exception. Therefore, the body of the class contains only the keyword
pass, the keyword that indicates a suite or block performs no work.
The remainder of the program (lines 10–37) demonstrates our programmer-defined
exception class. The program enables the user to input a numeric value, then invokes function
squareRoot (lines 10–18) to calculate the square root of that value. For this purpose,
squareRoot invokes function math.sqrt, which expects a nonnegative value as its
argument. If math.sqrt receives a negative value, the function raises a ValueError
exception with the argument "math domain error". In this program, we essentially write our
own square root function that uses a programmer-defined exception to prevent the user from
calculating the square root of a negative number. If the numeric value received from the user
is negative, function squareRoot raises a NegativeNumberError (lines 14–16).
Otherwise, squareRoot invokes function math.sqrt to compute the square root.
In the main program, a while loop (lines 20–37) continues executing until the user
enters a nonnegative value. The try suite (lines 24–25) attempts to obtain a numerical
value from the user and to pass that value to function squareRoot. When the user inputs
a value and presses Enter, the program passes the user-entered value to function float. If
the value is not a number, function float raises a ValueError exception, and the
except handler in lines 28–29 prints an error message. Control then returns to the beginning
of the while loop. If the user inputs a negative number, function squareRoot raises a
NegativeNumberError. The except handler in lines 32–33 simply prints the exception
object before control returns to the beginning of the while loop. If the user enters a valid,
nonnegative number, line 25 prints the square root of the number before program control
proceeds to the else clause in lines 36–37. The else suite contains only the keyword
break, which terminates the while loop.
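The programmer-defined exception and its use can be sketched without the interactive loop. This is a minimal sketch, not the book's listing: it assumes Python 3 syntax, and the function names square_root and try_square_root and the message strings are hypothetical.

```python
import math


class NegativeNumberError(ArithmeticError):
    """Attempted improper operation on a negative number."""
    pass  # inherits all functionality from ArithmeticError


def square_root(value):
    """Return the square root of a nonnegative value."""
    if value < 0:
        raise NegativeNumberError(
            "Square root of negative number not permitted")
    return math.sqrt(value)


def try_square_root(text):
    """Convert text to a number and return its square root or a message."""
    try:
        return square_root(float(text))
    except ValueError:
        return "The entered value is not a number"
    except NegativeNumberError as error:
        return str(error)


print(try_square_root("9"))
print(try_square_root("-4"))
print(try_square_root("abc"))
```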
In this chapter, we demonstrated how the exception-handling mechanism works and
discussed how to make applications more robust by writing exception handlers to process
potential problems. When developing new applications, it is important to investigate
potential exceptions raised by the functions your program invokes or by the interpreter, then
implement appropriate exception-handling code to make those applications more robust. In
Chapter 13, String Manipulation and Regular Expressions, we begin discussing a series
of techniques for developing substantial software. These techniques, when combined with
disciplined exception handling, enable Python programmers to create viable, valuable
software components.
SUMMARY
• An exception is an indication of a “special event” that occurs during a program’s execution. Often
the special event is an error (e.g., dividing by zero or adding two incompatible types). Sometimes
the special event is something else (e.g., the termination of a for loop).
• Exception handling enables programmers to write clear, robust, more fault-tolerant programs that
can resolve (or handle) exceptions.
• The style and details of exception handling in Python are based on the Modula-3 language. This
exception-handling mechanism is similar to that used in C# and Java.
• The raise statement executes to indicate that an exception has occurred. This is called raising
(or sometimes throwing) an exception.
• The simplest raise statement consists of the keyword raise, followed by the name of the
exception to be raised.
• Exception names specify classes and Python exceptions are objects of those classes. When the
raise statement executes, Python creates an object of the specified exception class.
• The raise statement may specify an argument or arguments that initialize the exception object.
In this case, a comma follows the exception name, and the argument or a tuple of arguments
follows the comma.
• Exception handling enables the programmer to remove error-handling code from the “main line”
of the program’s execution. This improves program clarity and enhances modifiability.
• Programmers can decide to handle whatever exceptions they choose—all types of exceptions, all
exceptions of a certain type or all exceptions of related types.
• The exception-handling mechanism is useful for processing problems that occur when a program
interacts with reusable software components. Rather than internally handling problems that occur,
such components use exceptions to notify client code of problems. This enables programmers to
implement error handling that is appropriate to each application.
• Exception handling is geared to situations in which the code that detects an error is unable to
handle it. Such code raises or throws an exception.
• Python uses try statements to enable exception handling. The try statement encloses statements
that potentially cause exceptions. A try statement consists of keyword try, followed by a colon
(:), followed by a suite of code in which exceptions may occur, followed by one or more clauses.
• Immediately following the try suite may be one or more except clauses (also called except
handlers). Each except clause specifies zero or more exception names that represent the type(s)
of exceptions the except clause can handle.
• The except clause also may specify an identifier for the exception that was raised, and the han-
dler can use the exception object to obtain information about that exception.
• An except clause that specifies no exception type is an empty except clause, which catches
all exception types. It is a syntax error to place an empty except clause before any other except
clauses in a particular try statement.
• After the last except clause, an optional else clause contains code that executes if the code in
the try suite raised no exceptions.
• A try suite can be followed by zero except clauses; in that case, it must be followed by a
finally clause. The code in the finally suite always executes, regardless of whether an
exception occurs.
• Programmers sometimes refer to the point in the program at which an exception occurs as the
throw point.
• Exceptions are objects of classes that inherit from class Exception.
• If an exception occurs in a try suite, the try suite expires and program control transfers to the
first matching except handler (if there is one) following the try suite. A match occurs if the types
are identical or if the raised exception’s type is a derived class of the handler’s exception type.
• If no exceptions occur in a try suite, the interpreter ignores the exception handlers for that try
statement.
• If an exception occurs in a statement that is not in a try suite and that statement is in a function,
the function containing that statement terminates immediately and the interpreter attempts to lo-
cate an enclosing try statement in a calling function—a process called stack unwinding.
• Python is said to use the termination model of exception handling, because the try statement
enclosing a raised exception expires immediately when that exception occurs.
• Function float raises a ValueError exception if the function cannot convert its argument
value to a floating-point value.
• The Python interpreter automatically tests for division by zero and raises a
ZeroDivisionError exception if the denominator is zero.
• As good programming practice, an except handler always should specify the name of the excep-
tion to catch. An empty except handler should be used only for a default catch-all case.
• The preferred exception-handling mechanism is to allow objects of class Exception and its
derived classes to be raised and caught.
• An except handler can catch exceptions of a particular type or can use a base-class type to catch
exceptions in a hierarchy of related exception types.
• A third-party component intended for distribution and use in software development also should
include documentation that indicates which exceptions are raised by the component.
• Programs frequently request and release resources dynamically. Programs that obtain certain types
of resources (such as files) sometimes must return those resources explicitly to the system to avoid
resource leaks. Most resources that require explicit release have potential exceptions associated
with processing those resources.
• The finally clause is guaranteed to execute if program control enters the corresponding try
suite. The finally clause is an ideal location to place resource deallocation code for resources
acquired and manipulated in the corresponding try suite.
• Objects of exception data types can be created with zero or more arguments. These arguments
frequently are used to formulate error messages for a raised exception.
• When Python creates an exception object in a raise statement, Python places any arguments
from the raise statement in the exception object’s args attribute.
• When an exception occurs, Python remembers the exception that was raised and the current state
of the program. Python also maintains traceback objects that contain information about the
function call stack from the time the exception occurred.
• Python maintains information about functions that have been unwound from the stack with
traceback objects.
• An empty raise statement reraises the most recently raised exception.
• When Python encounters an except clause in which except is followed by an exception type
(or tuple of exception types), a comma and an identifier, Python binds the identifier to the
exception object that the except handler catches.
• If an exception object’s args attribute is an empty tuple, the exception’s string representation is
the empty string.
• If an exception object’s args tuple contains only one value, the exception’s string representation
is the string representation of that value.
• If an exception object’s args tuple contains multiple items, the exception’s string representation
is the string representation of the args tuple.
• Module traceback contains many functions for manipulating the traceback objects that
Python creates during stack unwinding.
• Function traceback.print_exc, when called with no arguments, prints all the traceback
objects accumulated thus far in the stack-unwinding process.
• A Python traceback object stores information about a function call, including the file name,
line numbers and the code that caused an error.
• Programmer-defined exception classes should derive directly or indirectly from class Exception.
• If a programmer-defined exception requires no extra functionality, the programmer can create the
exception merely by inheriting from an existing exception class and placing keyword pass in the
body of the class.
TERMINOLOGY
args attribute of exception object
automatic garbage collection
call stack
catch related errors
divide by zero
eliminate resource leaks
empty except clause
empty raise clause
error-processing code
except handler
except clause
except suite expires
exception
Exception class
exception handler
fault-tolerant program
finally clause
FormatException class
function call stack
garbage collection
IndexError exception
inheritance with exceptions
memory exhaustion
memory leak
Modula-3
out-of-range sequence subscript
print_exc function of module traceback
raise an exception
raise statement
release a resource
reraise an exception
resource leak
resumption model of exception handling
sqrt function of module math
stack unwinding
StandardError exception
StopIteration exception
SystemExit exception
termination model of exception handling
throw an exception
throw point
traceback module
traceback object
try statement
try/except form
try/except/else form of a try statement
try/finally form of a try statement
programmer-defined exception class
Warning exception
ZeroDivisionError exception
SELF-REVIEW EXERCISES
12.1 Fill in the blanks in each of the following statements:
a) Python uses exception handling to determine when a loop terminates.
b) A function is said to ________ an exception when it detects that a problem occurred.
c) The ________ clause, if it appears after a try suite, always executes.
d) Most basic Python exceptions derive from class ________.
e) The statement that raises an exception is sometimes called the ________ of the exception.
f) A ________ statement encloses code that may raise an exception.
g) If the catch-all exception handler is specified before another exception handler, a
________ may occur.
h) An uncaught exception in a function causes that function to be ________ from the
function call stack.
i) Function float can raise a(n) ________ exception if its argument cannot be converted
to a floating-point value.
j) Python maintains information about the functions unwound from the stack in
________ objects.
12.2 State whether each of the following is true or false. If false, explain why.
a) Exceptions always are handled in the function that initially detects the exception.
b) Accessing a nonexistent object attribute causes an AttributeError exception.
c) Accessing an out-of-bounds sequence subscript causes the interpreter to raise an exception.
d) A try statement must contain one or more clauses.
e) If a finally clause appears in a function, that finally clause is guaranteed to execute.
f) In Python, it is possible to return to the throw point of an exception via keyword return.
g) Exceptions can be reraised.
h) Function math.sqrt raises a NegativeNumberError exception if called with a
negative-integer argument.
i) Exception object attribute args contains a string that corresponds to the exception’s
error message.
j) Exceptions can be raised only by functions explicitly called in try statements.
EXERCISES
12.3 Use inheritance to create an exception base class and various exception-derived classes.
Write a program to demonstrate that the except clause specifying the base class catches derived-
class exceptions.
12.4 Write a Python program that demonstrates how various exceptions are caught with
12.5 Write a Python program that shows the importance of the order of exception handlers. Write
two programs, one with the correct order of except handlers and another with an order that causes
a logic error. If you attempt to catch a base-class exception type before a derived-class type, the pro-
gram may produce a logic error.
12.6 Exceptions can be used to indicate problems that occur when an object is being constructed.
Write a Python program that shows a constructor passing information about constructor failure to an
exception handler that occurs after a try statement. The exception raised also should contain the ar-
guments sent to the constructor.
12.7 Write a Python program that illustrates reraising an exception.
12.8 Write a Python program that shows that a function with its own try statement does not have
to catch every possible exception that occurs within the try suite. Some exceptions can slip through
to, and be handled in, other scopes.
pythonhtp1_13.fm Page 436 Friday, December 14, 2001 2:07 PM
13
String Manipulation and
Regular Expressions
Objectives
• To understand text processing in Python.
• To use Python’s string data-type methods.
• To manipulate and search string contents.
• To understand and create regular expressions.
• To use regular expressions to match patterns in
strings.
• To use metacharacters, special sequences and
grouping to create complex regular expressions.
The chief defect of Henry King
Was chewing little bits of string.
Hilaire Belloc
Vigorous writing is concise. A sentence should contain no
unnecessary words, a paragraph no unnecessary sentences.
William Strunk, Jr.
I have made this letter longer than usual, because I lack the
time to make it short.
Blaise Pascal
The difference between the almost-right word & the right
word is really a large matter—it’s the difference between the
lightning bug and the lightning.
Mark Twain
Mum’s the word.
Miguel de Cervantes, Don Quixote de la Mancha
Outline
13.1 Introduction
13.2 Fundamentals of Characters and Strings
13.3 String Presentation
13.4 Searching Strings
13.5 Joining and Splitting Strings
13.6 Regular Expressions
13.7 Compiling Regular Expressions and Manipulating Regular
Expression Objects
13.8 Regular Expression Repetition and Placement Characters
13.9 Classes and Special Sequences
13.10 Regular Expression String-Manipulation Functions
13.11 Grouping
13.12 Internet and World Wide Web Resources
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
13.1 Introduction
This chapter introduces Python’s string and character processing capabilities and demon-
strates using regular expressions to search for patterns in text. The techniques presented in
this chapter can be employed to develop text editors, word processors, page-layout soft-
ware, computerized typesetting systems and other text-processing software. Previous chap-
ters presented several string-processing capabilities. In this chapter, we expand on this
information by detailing the capabilities of various methods of the basic string data type and
the powerful text-processing capabilities provided in the Python module re.
how to format strings with format operator %. Strings also support methods that perform
various other formatting and processing capabilities. The table in Fig. 13.2 lists the string
methods. When a program invokes a string method that appears to modify the string, the
method actually returns its results as a new string. In the table, the “original string” refers
to the string on which a method is invoked. We discuss many of these methods in the fol-
lowing sections.
find( substring[, start[, end]] ) Returns the lowest index at which substring
occurs in the string; returns –1 if the string does
not contain substring. If argument start is speci-
fied, searching begins at that index. If argument
end is specified, the method searches through the
slice start:end.
index( substring[, start[, end]] ) Performs the same operation as find, but raises a
ValueError exception if the string does not
contain substring.
isalnum() Returns 1 if the string contains only alphanumeric
characters (i.e., numbers and letters); otherwise,
returns 0.
isalpha() Returns 1 if the string contains only alphabetic
characters (i.e., letters); returns 0 otherwise.
isdigit() Returns 1 if the string contains only numerical
characters (e.g., "0", "1", "2"); otherwise,
returns 0.
islower() Returns 1 if all alphabetic characters in the string
are lower-case characters (e.g., "a", "b", "c");
otherwise, returns 0.
isspace() Returns 1 if the string contains only whitespace
characters; otherwise, returns 0.
istitle() Returns 1 if the first character of each word in the
string is the only uppercase character in the word;
otherwise, returns 0.
isupper() Returns 1 if all alphabetic characters in the string
are uppercase characters (e.g., "A", "B", "C");
otherwise, returns 0.
join( sequence ) Returns a string that concatenates the strings in
sequence using the original string as the separator
between concatenated strings.
ljust( width ) Returns a new string left-aligned in a whitespace
string of width characters.
lower() Returns a new string in which all characters in the
original string are lowercase.
lstrip() Returns a new string in which all leading
whitespace is removed.
replace( old, new[, maximum ] ) Returns a new string in which all occurrences of
old in the original string are replaced with new.
Optional argument maximum indicates the maxi-
mum number of replacements to perform.
rfind( substring[, start[, end]] ) Returns the highest index at which substring
occurs in the string or –1 if the string does not con-
tain substring. If argument start is specified,
searching begins at that index. If argument end is
specified, the method searches the slice start:end.
rindex( substring[, start[, end]] ) Performs the same operation as rfind, but raises
a ValueError exception if the string does not
contain substring.
rjust( width ) Returns a new string right-aligned in a string of
width characters.
rstrip() Returns a new string in which all trailing
whitespace is removed.
split( [separator] ) Returns a list of substrings created by splitting the
original string at each separator. If optional argu-
ment separator is omitted or None, the string is
separated by any sequence of whitespace, effec-
tively returning a list of words.
splitlines( [keepbreaks] ) Returns a list of substrings created by splitting the
original string at each newline character. If
optional argument keepbreaks is 1, the substrings
in the returned list retain the newline character.
startswith( substring[, start[, end]] ) Returns 1 if the string starts with substring; other-
wise, returns 0. If argument start is specified,
searching begins at that index. If argument end is
specified, the method searches through the slice
start:end.
strip() Returns a new string in which all leading and trail-
ing whitespace is removed.
swapcase() Returns a new string in which uppercase charac-
ters are converted to lowercase characters and
lower-case characters are converted to uppercase
characters.
title() Returns a new string in which the first character of
each word in the string is the only uppercase char-
acter in the word.
translate( table[, delete ] ) Translates the original string to a new string. The
translation is performed by first deleting any char-
acters in optional argument delete, then by replac-
ing each character c in the original string with the
value table[ ord( c ) ].
upper() Returns a new string where all characters in the
original string are uppercase.
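As a brief illustration of the point that string methods return their results as new strings, the following sketch calls a few of the methods from the table. It is written in Python 3 syntax, which is an assumption on our part (the programs in this chapter use the older print statement); in Python 3 the is... methods return True or False rather than 1 or 0. The variable names are hypothetical.

```python
# A minimal sketch (Python 3 syntax assumed); names are illustrative only.
original = "  Now I am here.  "

stripped = original.strip()      # new string without leading/trailing whitespace
upper = stripped.upper()         # new string with every letter in uppercase
swapped = stripped.swapcase()    # new string with the case of each letter inverted

print(repr(original))            # the original string is unchanged
print(repr(stripped))
print(upper)
print(swapped)

# Predicate methods report on the contents without changing anything.
print("abc123".isalnum(), "abc123".isalpha(), "123".isdigit())
```

Because strings are immutable, a program that wants to keep a result must bind it to a name, as the sketch does with stripped, upper and swapped.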
Now I am here.
Now I am here.
Now I am here.
28 if string2.endswith( "even" ):
29    print '"%s" ends with "even"\n' % string2
30
31 # searching from end of string
32 print 'Index from end of "test" in "%s" is %d' \
33    % ( string1, string1.rfind( "test" ) )
34 print
35
36 # find rindex of "Test"
37 try:
38    print 'First occurrence of "Test" from end at index', \
39       string1.rindex( "Test" )
40 except ValueError:
41    print '"Test" does not occur in "%s"' % string1
42
43 print
44
45 # replacing a substring
46 string3 = "One, one, one, one, one, one"
47
48 print "Original:", string3
49 print 'Replaced "one" with "two":', \
50    string3.replace( "one", "two" )
51 print "Replaced 3 maximum:", string3.replace( "one", "two", 3 )
Lines 5–11 use string method count to return the number of occurrences of a sub-
string in a string or a string slice. If the method does not find the specified substring, the
method returns 0. Line 8 prints the number of times the substring "test" occurs in
string1. Method count takes two optional arguments that specify a slice of the string
to search. Line 10 passes arguments to count that cause the method to search string1
starting at index 18 (i.e., character "3") and terminating at the end of the string. This call
produces the same result as the statement
string1[ 18:len( string1 ) ].count( "test" )
but the method call with optional arguments has the added benefit of better readability and
better performance, because the program does not create a new slice.
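The equivalence between the optional arguments of count and an explicit slice can be checked with a short sketch. The sample string is our own assumption (it is chosen so that index 18 holds the character "3"), and the code uses Python 3 syntax rather than the print statement of the listing.

```python
# A minimal sketch; the sample string is hypothetical.
text = "Test1, test2, test3, test4, Test5, test6"

total = text.count("test")                      # search the whole string
from_18 = text.count("test", 18)                # search from index 18 to the end
via_slice = text[18:len(text)].count("test")    # same answer, but builds a new string first

print(text[18])
print(total, from_18, via_slice)
```

The search is case sensitive, so "Test1" and "Test5" do not contribute to any of the counts.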
Lines 14–29 demonstrate string methods that search for substrings. Line 17 uses method find
to return the lowest index at which the substring occurs. If a string does not contain the sub-
string, the method returns –1. Method index (line 21) resembles method find, except
that if a string does not contain the substring, the method raises a ValueError exception.
A program can catch this exception and handle it appropriately, in the case that the string
does not contain the specified substring.
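The difference between find and index when the substring is missing can be sketched as follows. The sample string and the name position are assumptions made for illustration, and the code uses Python 3 syntax.

```python
# A minimal sketch; the sample string is hypothetical.
text = "Test1, test2, test3, test4, Test5, test6"

found_at = text.find("test")     # lowest index at which "test" begins
missing = text.find("toast")     # find signals "not found" with -1

try:
    position = text.index("toast")   # index signals "not found" with an exception
except ValueError:
    position = None                  # handle the missing substring explicitly

print(found_at, missing, position)
```

Method find suits code that tests the return value directly, whereas index suits code in which a missing substring is a genuine error condition.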
Lines 25–29 use methods that determine whether a string begins or ends with a specific
substring. If the string begins with the substring, method startswith returns 1 (line 25).
This call produces the same result as the expression
string2[ 0:len( "Odd" ) ] == "Odd"
If a string ends with the substring, method endswith returns 1 (line 28). Using this meth-
od produces the same result as the expression
string2[ -len( "even" ): ] == "even"
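The two slice expressions above can be compared with the method calls directly. In this sketch the value of string2 is a hypothetical string chosen to fit the expressions in the text; in Python 3 (assumed here) the methods return True or False rather than 1 or 0.

```python
# A minimal sketch; the value of string2 is an assumption.
string2 = "Odd or even"

begins = string2.startswith("Odd")
begins_by_slice = string2[0:len("Odd")] == "Odd"

ends = string2.endswith("even")
ends_by_slice = string2[-len("even"):] == "even"

print(begins, begins_by_slice, ends, ends_by_slice)
```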
The program can search for a substring starting from the end of a string. Lines 32–43
use methods rfind and rindex to determine whether string1 contains certain sub-
strings. Method rfind returns the index of the first occurrence of the substring searching
from the end of the string. If the method does not find the substring, it returns –1. Method
rindex returns the highest index at which the substring begins and raises a ValueError
if the method does not find the substring. Our program catches the exception to handle the
case where the string does not contain the specified substring.
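Searching from the end of a string can be sketched in the same way. The sample string is an assumption, and the code uses Python 3 syntax.

```python
# A minimal sketch; the sample string is hypothetical.
text = "Test1, test2, test3, test4, Test5, test6"

last_lower = text.rfind("test")     # highest index at which "test" begins
last_upper = text.rindex("Test")    # same idea, but raises ValueError on failure
not_there = text.rfind("toast")     # -1 when the substring is absent

try:
    text.rindex("toast")
    raised = False
except ValueError:
    raised = True

print(last_lower, last_upper, not_there, raised)
```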
At times, a user may want to find a substring in order to perform an action on that substring. For
example, a user may search a document for a phrase and replace that
phrase with another phrase. Method replace takes two substrings, searches the string
for the first substring and replaces each occurrence with the substring in the second
argument. Line 50 replaces all occurrences of the substring "one" in string3 with the
substring "two". Method replace takes an optional third argument that sets the max-
imum number of replacements. Line 51 replaces up to three occurrences of substring
"one" with substring "two".
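The two calls to replace can be reproduced outside the listing. This sketch reuses the value of string3 from line 46 and is written in Python 3 syntax, which is our assumption.

```python
# A minimal sketch based on the string assigned in line 46 of the listing.
string3 = "One, one, one, one, one, one"

all_replaced = string3.replace("one", "two")        # every occurrence
three_replaced = string3.replace("one", "two", 3)   # at most three occurrences

print("Original:", string3)
print(all_replaced)
print(three_replaced)
```

The leading "One" is untouched in both results because the search is case sensitive.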
String is: A, B, C, D, E, F
Split string by spaces: ['A,', 'B,', 'C,', 'D,', 'E,', 'F']
Split string by commas: ['A', ' B', ' C', ' D', ' E', ' F']
Split string by commas, max 2: ['A', ' B', ' C, D, E, F']
the program displays the tokens. In line 9, method split receives the argument ",",
which represents the delimiter (the string is split at each occurrence of a comma). In line
10, method split receives two arguments—the delimiter and an integer value that spec-
ifies the maximum number of splits to perform.
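The three forms of split can be sketched with the string shown in the sample output. The code uses Python 3 syntax (an assumption), and the variable names are illustrative.

```python
# A minimal sketch using the string from the sample output above.
string1 = "A, B, C, D, E, F"

by_whitespace = string1.split()         # no argument: split at any whitespace
by_comma = string1.split(",")           # split at each comma
by_comma_max2 = string1.split(",", 2)   # perform at most two splits

print(by_whitespace)
print(by_comma)
print(by_comma_max2)
```

Note that splitting at commas leaves the space that followed each comma at the start of the next token.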
Given a list of tokens, method join combines the list with a pre-defined delimiter.
Line 14 creates a list of letter tokens and line 15 creates a delimiter string2 that contains
three underscore ("_") characters. Lines 18–19 show the results of calling string2’s
join method. The method receives the list of tokens as an argument and returns a string
where the tokens are joined by the underscore delimiter in string2. Line 20 demonstrates
combining the print statement with a call to a string’s join method.
Performance Tip 13.1
When building a complex string, it is more efficient to include the pieces in a list and then
use method join to assemble the string, rather than using the concatenation (+) operator.
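The following sketch shows both uses of join discussed above: joining a list of tokens with a three-underscore delimiter, and assembling a longer string from pieces collected in a list. The names tokens, delimiter and pieces are hypothetical, and the code uses Python 3 syntax.

```python
# A minimal sketch; all names are illustrative.
tokens = ["A", "B", "C", "D", "E", "F"]
delimiter = "___"
joined = delimiter.join(tokens)     # the calling string is the separator

# Collect the pieces in a list, then assemble them once at the end,
# instead of concatenating with + inside the loop.
pieces = []
for number in range(5):
    pieces.append(str(number))
built = ", ".join(pieces)

print(joined)
print(built)
```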
"Tuesday", "Wednesday", etc.), the program can invoke string method find for each
substring (i.e., the program needs to invoke method find seven times to search for every
day of the week). Depending on the search, a program may need to invoke method find
numerous times, which is an inefficient way to solve the problem. Regular expressions provide a more
efficient and powerful alternative. A regular expression is a text pattern that a program uses
to find substrings that match patterns. In the remainder of this chapter, we discuss Python’s
various regular-expression capabilities.
Good Programming Practice 13.1
Use string methods where only simple processing is required. This prevents errors caused by
the more complex regular expressions and increases program readability.
We begin our discussion with a simple example (Fig. 13.7) in which we search various
welcoming phrases for "hello", "Hello" and "world!".
Line 4 imports the regular-expression module re, which provides regular-expression
processing capabilities in Python. List testStrings (line 7) contains the strings that are
searched with the regular expressions created in line 8. Note that the regular expressions
closely resemble the strings.
The remainder of the program consists of a nested for loop that tests each regular
expression in list expressions against each string in list testStrings. Function
re.search looks for the first occurrence of a regular expression in a string and returns
an object that contains the substring matching the regular expression. If the string does not
contain the pattern, re.search returns None. The program determines whether the func-
tion call returns a value, then prints an appropriate message. We discuss how to use the
object returned by re.search in the next section.
Each regular expression in this example is a substring of one of the test strings. In fact,
line 15 could be replaced with the expression
if string.find( expression ) >= 0:
and the program would produce the same result. In the remaining sections, we explore how
to create more powerful regular-expression pattern strings.
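The nested loop described above might look roughly like the following sketch. The contents of the two lists are our own assumption, as are the names results and agrees_with_find; the code uses Python 3 syntax.

```python
import re

# A minimal sketch; the test strings are hypothetical.
test_strings = ["Hello World", "Hello world!", "hello world"]
expressions = ["hello", "Hello", "world!"]

results = {}
agrees_with_find = True
for text in test_strings:
    for expression in expressions:
        found = re.search(expression, text) is not None   # None means no match
        results[(text, expression)] = found
        # For these literal patterns, find gives the same answer.
        if found != (text.find(expression) >= 0):
            agrees_with_find = False
        status = "found" if found else "not found"
        print('%-7s %s in "%s"' % (expression, status, text))
```

The agreement with find holds only because these patterns contain no metacharacters; it would not hold for the patterns in the sections that follow.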
Fig. 13.9 Searching and matching strings with repetition metacharacters.
Fig. 13.11 Regular expressions with classes and special sequences.
Notice that \, the escape metacharacter, precedes the character + in the regular expression
at line 8. The escaped + matches the literal character +, rather than acting as the repetition
metacharacter +. If the + were not escaped, the regular expression would match one or more
x characters, followed by the numeric character 5 (as shown in Fig. 13.12).
Note also that the regular expression in line 8 is a raw string—i.e., a string created by
preceding the string with the character r. Usually, when a \ appears in a string, Python
interprets this character as an escape character and attempts to replace the \ and the char-
acter that follows with the correct escape sequence. When a \ appears within a raw string,
Python does not interpret the character as the escape character, but instead interprets the
character as the literal backslash character. For example, Python interprets the string "\n"
as one newline character, but it interprets the string r"\n" as two characters—a backslash
and the character n.
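Both points, the raw string and the escaped +, can be checked with a short sketch. The two patterns and the test strings are hypothetical stand-ins, since the figure's own expressions are not reproduced here; the code uses Python 3 syntax.

```python
import re

# "\n" is one character (a newline); r"\n" is two (a backslash and the letter n).
plain_length = len("\n")
raw_length = len(r"\n")

# Hypothetical patterns: an escaped + is a literal plus sign,
# an unescaped + repeats the preceding x.
literal_plus = r"x\+5"
repeated_x = r"x+5"

escaped_hit = re.search(literal_plus, "2x+5y") is not None
unescaped_miss = re.search(repeated_x, "2x+5y") is not None
unescaped_hit = re.search(repeated_x, "xxx5") is not None

print(plain_length, raw_length)
print(escaped_hit, unescaped_miss, unescaped_hit)
```

Writing patterns as raw strings keeps each backslash intact, so the regular expression module sees exactly what was typed.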
Common Programming Error 13.1
A raw string cannot end with an odd number of backslashes; placing a single backslash at the end of a raw string results in a syntax error.
Lines 9–10 create two additional regular expressions. The metacharacter . matches
any character in a string except for a newline. The regular expression in line 9 matches a
digit, followed by an alphanumeric character, followed by any character except a newline,
followed by a digit, followed by the letter y or the letter z. The regular expression in line
10 uses special sequences to create a similar regular expression. This expression matches a
digit, followed by an alphanumeric character, followed by the character -, followed by a
digit, followed by an alphanumeric character. Lines 13–18 contain a nested for loop that
attempts to match each expression from lines 8–10 to each string in line 7.
The remainder of the program creates more complex regular expressions. The meta-
characters { and } provide another way to repeat characters. The expression in line 25
matches three digits (as specified between curly brackets), followed by the character -,
three digits, another - and four digits. By placing the regular expression between metachar-
acters ^ and $, we specify that we want the regular expression to match the entire string.
We can also use the brace metacharacters to specify a range of repetitions. For example,
the expression "\d{1,3}" matches one, two or three digits.
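The pattern described for line 25 can be written out as a sketch. The pattern below follows the prose description (three digits, a hyphen, three digits, a hyphen, four digits, anchored with ^ and $); the test strings and variable names are assumptions, and the code uses Python 3 syntax.

```python
import re

# A minimal sketch of the pattern described in the text.
phone = r"^\d{3}-\d{3}-\d{4}$"

valid = re.match(phone, "555-555-1234") is not None
too_long = re.match(phone, "555-555-12345") is not None   # $ rejects the extra digit

# A range of repetitions: one, two or three digits.
up_to_three = re.match(r"^\d{1,3}$", "42") is not None
four_digits = re.match(r"^\d{1,3}$", "4242") is not None

print(valid, too_long, up_to_three, four_digits)
```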
Line 26 creates a regular expression that matches one or more alphanumeric charac-
ters, followed by a colon (:), followed by one or more whitespace characters, followed by
one or more alphanumeric characters, followed by the @ character, followed by one or more
alphanumeric characters, followed by the . character (notice the backslash to escape this
regular expression character), followed by the sequence of characters com, org or net.
The remainder of the program attempts to match the regular expressions to the test strings.
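One way to write the expression described for line 26 is sketched below. The pattern is our reading of the prose description, not a copy of the figure, and the test strings use a placeholder address; the code uses Python 3 syntax.

```python
import re

# A minimal sketch; the pattern is reconstructed from the description above.
contact = r"\w+:\s+\w+@\w+\.(com|org|net)"

good = re.match(contact, "email: user@example.com")
bad = re.match(contact, "email: user@example.edu")

print(good.group(1))     # the alternative that matched
print(bad)
```

Note that \w also matches the underscore character, so this pattern is slightly more permissive than "alphanumeric" suggests.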
23
24 print formatString % ( 'Replace first 3 digits by "digit"',
25    re.sub( r"\d", "digit", testString2, 3 ) )
26
27 # regular expression splitting
28 print formatString % ( "Splitting " + testString2,
29    re.split( r",", testString2 ) )
30
31 print formatString % ( "Splitting " + testString3,
32    re.split( r"[+\-*/%]", testString3 ) )
Function re.sub (line 14) takes three arguments. The second argument is a substring
that is substituted for every substring in the third argument that matches the pattern
described by the first argument. Line 14 substitutes the caret character (^) for the asterisk
character (*) in string testString1. To replace the asterisk character, the call must
use the regular expression "\*", because * is a metacharacter. Lines 21–22 replace every
word ("\w+") by the substring "word". Lines 24–25 use the function’s optional fourth
argument to specify a maximum number of replacements to perform.
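The substitutions described above can be sketched as follows. The two test strings are hypothetical, and the code uses Python 3 syntax, where we pass the maximum number of replacements by keyword (count) rather than by position; that choice is ours.

```python
import re

# A minimal sketch; both test strings are assumptions.
test_string1 = "This sentence ends in 5 stars *****"
test_string2 = "1, 2, 3, 4, 5, 6, 7"

carets = re.sub(r"\*", "^", test_string1)                 # * must be escaped in the pattern
words = re.sub(r"\w+", "word", test_string2)              # replace every word
first_three = re.sub(r"\d", "digit", test_string2, count=3)   # at most three replacements

print(carets)
print(words)
print(first_three)
```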
Function re.split takes two arguments. The first argument is a regular expression
that describes a pattern delimiter. The function returns a list of tokens created by splitting
the second argument at the delimiter. Lines 28–29 print the results of splitting variable
testString2 on commas (,). Line 32 calls re.split, passing a delimiter pattern that
matches one of five mathematical operators. Notice that this regular expression defines a
class and escapes the - character, but not the * character. This demonstrates a subtle regular
expression feature. Inside a class, most metacharacters lose their special meaning and are
interpreted literally; the exceptions are ^ (negation, when it is the first character), - (a range),
] and \. Therefore, metacharacters such as $, + or * do not need to be escaped when they
appear inside a class.
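A short sketch of both calls to re.split follows. The delimiter patterns are the ones used in lines 29 and 32; the test strings are hypothetical, and the code uses Python 3 syntax.

```python
import re

# A minimal sketch; the test strings are assumptions.
test_string2 = "1, 2, 3, 4"
test_string3 = "1+2x*3-y"

by_comma = re.split(r",", test_string2)

# Inside the class only - is escaped; +, *, / and % are literal there.
by_operator = re.split(r"[+\-*/%]", test_string3)

print(by_comma)
print(by_operator)
```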
13.11 Grouping
In Fig. 13.8, we saw how a program can use method group to extract matching substrings
from an SRE_Match object. This method arises from a more sophisticated regular-expres-
sion technique—grouping. A regular expression may specify groups of substrings to match
in a string. A program then searches or matches a string with the regular expression and
extracts information from the matching groups. Figure 13.14 creates regular expressions
with groups and prints the information extracted from these groups.
The regular expression in line 12 describes three groups. The metacharacters ( and )
denote a group. The first group matches a word (\w+), followed by a space, followed by
another word. The second group matches three digits, followed by the character -, followed
by four digits. The third group matches one or more alphanumeric characters, followed by
the character @, followed by one or more alphanumeric characters, followed by the char-
acter ., followed by three alphanumeric characters. This regular expression matches the
string in variable testString1. The three groups match the name, phone number and e-
mail address of the person, respectively.
Lines 14–17 demonstrate the benefits of grouping. Line 15 calls function re.match,
which returns an SRE_Match object. This object’s groups method returns a tuple of
substrings. Each substring in the tuple corresponds to the substring that matches a group in
the regular expression. The first substring in the tuple matches the first group in the regular
expression, and so on. The result of line 15 is that the program obtains the person’s name, phone number
and e-mail address. Line 17 calls the SRE_Match’s method group, passing integer value 3
as an argument. This call returns the substring that matches the third group in the regular
expression, which retrieves the e-mail address substring from testString1.
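A sketch of a three-group expression along the lines described above is shown here. The test string, its wording and the exact pattern are assumptions made for illustration (the figure's own string is not reproduced); the code uses Python 3 syntax.

```python
import re

# A minimal sketch; the test string and pattern are hypothetical.
test_string = "Jane Doe, phone number: 555-1234, e-mail: jdoe@example.com"
expression = r"(\w+ \w+), phone number: (\d{3}-\d{4}), e-mail: (\w+@\w+\.\w{3})"

match = re.match(expression, test_string)

all_groups = match.groups()    # one entry per group, in order
email = match.group(3)         # only the third group

print(all_groups)
print(email)
```

Groups are numbered from 1 in the order in which their opening parentheses appear; group 0 refers to the entire match.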
Regular-expression grouping introduces another subtle regular-expression issue. The
metacharacters + and * are called greedy operators. A greedy operator attempts to match
as many characters as possible. Sometimes this is not the desired behavior. Lines 20–32
demonstrate the problem of greedy operators. Line 24 is a string that contains a sample path
that might be part of a URL. Suppose we wish to write a regular expression that obtains the
root directory name from the path (i.e., /books in this example). Lines 26–28 attempt this
operation, but fail because of the greedy behavior of the + operator in expression2.
When an operator is greedy, the regular expression module tries to match as much of the
string as possible with the expression that precedes the operator. Initially, this causes expression2 to
match the entire string. However, the regular expression module must allow for the rest of
the pattern to be matched. In this case, the group that contains the greedy operator must be
immediately followed by a slash (/) as specified in expression2. Therefore, the regular
expression module searches backwards in the string until the regular expression module
can guarantee that there is a slash (/) that will follow the initial group in expression2.
Thus, the group that contains the greedy operator matches /books/2001.
The regular expression in line 30 modifies the behavior of the greedy + operator to obtain
the root directory name in the sample path correctly. Placing the ? metacharacter after the
greedy + operator changes the behavior of the + operator. Now, when the regular expression
module searches the string using expression3, the module searches one character at a
time until it finds the smallest string that matches the pattern in the group (i.e., /books).
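The contrast between the greedy and non-greedy forms can be sketched as follows. The sample path and both patterns are assumptions chosen to reproduce the behavior described above (/books/2001 versus /books); they are not copied from the figure. The code uses Python 3 syntax.

```python
import re

# A minimal sketch; the path and the patterns are hypothetical.
path = "/books/2001/python"

greedy = r"(/.+)/"        # + takes as much as it can, then backs up to the last slash
non_greedy = r"(/.+?)/"   # +? takes as little as it can, stopping at the first slash

greedy_root = re.match(greedy, path).group(1)
lazy_root = re.match(non_greedy, path).group(1)

print(greedy_root)
print(lazy_root)
```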
This chapter presented basic string manipulation capabilities, as well as how to use the
powerful regular-expression-processing capabilities of module re. Chapter 14 introduces
file processing, which enables programs to read information from files on disk and write
information to files on disk. Many programs that process files use regular expressions and
other string-processing capabilities to search and manipulate the contents of those files.
SUMMARY
• Python represents strings as sequences of characters. Characters are the fundamental building
blocks of Python source programs. Every program is composed of a sequence of characters that—
when they are grouped together meaningfully—is interpreted by the computer as a series of in-
structions used to accomplish a task.
• Each character has a corresponding character code (sometimes called its integer ordinal value) in the
ASCII or Unicode character set.
• Python supports strings as a basic data type.
• Strings are immutable sequences—once a string is created, it cannot be changed.
• Methods center, ljust and rjust control how a string is output by “padding” the string with
space characters. Method center takes one argument, which corresponds to the length of the out-
put string. The method then creates a new string of this length and centers the calling string in the
specified number of spaces. Method rjust right-justifies the calling string by preceding the
string with the difference of the specified number of spaces and the length of the calling string.
Method ljust creates a new string where the calling string is followed by the difference of the
specified number of spaces and the length of the calling string.
• String method strip removes leading and trailing whitespace from the calling string. String
method lstrip removes only leading whitespace. Method rstrip removes only trailing
whitespace.
• String method count returns the number of times the substring occurs in the string. If the method
does not find the specified substring, the method returns 0.
• Method find returns the lowest index in the string that begins the specified substring. If the string
does not contain the specified substring, the method returns –1.
• Method index is similar to method find. However, if the method does not find the specified
substring, the method raises a ValueError exception.
• Method startswith returns 1 if the string begins with the specified substring.
• Method endswith returns 1 if the string ends with the specified substring.
• Method rfind is similar to method find, except the former returns the highest index at which
the specified substring begins. If the method does not find the specified substring, it returns –1.
• Method rindex is similar to method index, except that rindex returns the highest index at
which the specified substring begins (and raises a ValueError if the method does not find the
substring).
• Method replace receives two substrings as arguments. The method searches the calling string
for the first substring and replaces that substring with the second argument. Method replace
takes an optional third argument that specifies the maximum number of replacements.
• When you read a sentence, your mind breaks the sentence into words, or tokens, each of which con-
veys meaning to you. Interpreters also perform tokenization. They break up statements into individ-
ual pieces like keywords, identifiers, operators and other elements of a programming language.
• Tokens are separated from one another by delimiters, typically whitespace characters such as blank,
tab, newline and carriage return. Other characters also may be used as delimiters to separate tokens.
• Method split returns a list of tokens. When a call to method split passes no arguments, the
method splits the string by any whitespace. The method takes an optional second argument that
specifies the maximum number of splits to perform.
• Given a list of tokens, method join joins that list with a delimiter. The method receives the list
of tokens as an argument and returns a string where the tokens are joined by the delimiter specified
in the calling string.
• A regular expression is a text pattern that a program uses to find substrings that match patterns.
• The re regular-expression module provides regular expression capability in Python.
• Function re.search looks for the first occurrence of a regular expression in a string and returns
an object that contains the substring matching the regular expression. If the string does not contain
the pattern, re.search returns None.
• Compiling regular expressions can make programs more efficient. To use a regular expression, the
re module first compiles the expression into a form that the module uses to process a string.
• Function re.compile takes as an argument a regular expression and returns an SRE_Pattern
object that represents a compiled regular expression. Compiled regular expression objects provide
all the functionality available in module re.
• SRE_Match methods enable a program to retrieve the results of regular-expression processing.
• Most patterns are built using a combination of characters, metacharacters and escape sequences.
A metacharacter is a regular-expression syntax element. A metacharacter’s job is to repeat, group,
place or classify.
• Metacharacter ? matches exactly zero or one occurrences of the expression it follows. Metachar-
acter + matches one or more occurrences of the expression it follows. Metacharacter * matches
zero or more occurrences of the expression it follows.
• Function re.match matches an expression to a string. Unlike function re.search (which re-
turns an SRE_Match object if any part of the string matches the expression), function re.match
returns an SRE_Match object only if the beginning of the string matches the regular expression.
• Metacharacter ^ indicates placement at the beginning of the string; metacharacter $ indicates
placement at the end of the string.
• A character class specifies a group of characters to match in a string.
• A special sequence is a shortcut for a common class of characters.
• The metacharacters [ and ] denote a regular expression class. A regular expression that contains
a class matches one character in the class.
• Classes can use the - character to specify a range of consecutive characters.
• When placed within a class as the first character after the square bracket, metacharacter ^ negates
the class—the regular expression matches all characters except those specified in the class.
• The metacharacter | matches either the regular expression to the left of the | or the regular ex-
pression to the right.
• A raw string is created by preceding the string with the character r.
• Usually, when a \ appears in a string, Python interprets this character as an escape character and
attempts to replace the \ and the character that follows with the correct escape sequence.
• When a \ appears within a raw string, Python does not interpret the character as the escape char-
acter, but instead as the literal backslash character.
• The metacharacter . matches any character in a string except for a newline.
• The metacharacters { and } provide another way to repeat characters.
• By placing a regular expression between metacharacters ^ and $, we specify that we want the reg-
ular expression to match the entire string.
• Module re provides pattern-based, string-manipulation capabilities, such as substituting a sub-
string in a string and splitting a string with a delimiter.
• Function re.sub takes three arguments. The second argument is a substring that is substituted
for every substring in the third argument that matches the pattern described by the first argument.
The function’s optional fourth argument specifies a maximum number of replacements to perform.
• Function re.split takes two arguments. The first argument is a regular expression that de-
scribes a pattern delimiter. The function returns a list of tokens created by splitting the second ar-
gument at the delimiter.
• If metacharacters such as $, + or * appear inside a class, they do not need to be escaped.
• Method group extracts matching substrings from an SRE_Match object. A regular expression
may specify groups of substrings to match in a string. The metacharacters ( and ) denote a group.
• Function re.match returns an SRE_Match object. This object’s groups method returns a tuple of
substrings. Each substring in the tuple corresponds to the substring that matches a group in the regular
expression. The first substring in the tuple matches the first group in the regular expression, and so on.
• The metacharacters + and * are called greedy operators. A greedy operator attempts to match as
many characters as possible.
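The module functions summarized above can be tried out with a short sketch such as the following. It assumes Python 3, where print is a function and match objects are reported as re.Match rather than SRE_Match; the sample strings and variable names are invented for illustration.

```python
import re

# re.sub: replace every match of the pattern (first argument) found in the
# string (third argument) with the substitute (second argument).
starred = re.sub(r"\d", "*", "Call 555-1234")

# The optional count limits the number of replacements performed.
first_only = re.sub(r"\d", "*", "Call 555-1234", count=1)

# re.split: split the second argument wherever the delimiter pattern matches.
tokens = re.split(r",\s*", "apples, oranges,pears")

# Groups: parentheses mark the substrings to extract from a match.
match = re.match(r"(\w+) (\w+)", "Jane Doe, manager")
names = match.groups()         # every group, in order
first_name = match.group(1)    # a single group, selected by number

# Greedy matching: .* consumes as many characters as it can.
greedy = re.match(r"<.*>", "<p>hi</p>").group()

print(starred, first_only, tokens, names, first_name, greedy)
```

Note that the greedy pattern matches the whole string, not just the first tag, because .* extends to the last > it can find.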
TERMINOLOGY
$ metacharacter istitle method
% metacharacter isupper method
( metacharacter join method
) metacharacter ljust method
* metacharacter lower method
+ metacharacter lstrip method
. metacharacter metacharacter
? metacharacter ord function
\ metacharacter raw string
[ metacharacter re module
] metacharacter re.compile function
^ metacharacter re.match function
{ metacharacter re.search function
} metacharacter re.split function
| metacharacter re.sub function
capitalize method regular expression
character replace method
character class rfind method
center method rindex method
count method rjust method
delimiter rstrip method
encode method search method
endswith method split method
escape sequence splitlines method
expandtabs method SRE_Match object
find method SRE_Pattern object
greedy operator startswith method
group method string
groups method strip method
index method swapcase method
integer ordinal value title method
isalnum method token
isalpha method tokenization
isdigit method translate method
islower method upper method
isspace method white space character
EXERCISES
13.3 Use a regular expression to count the number of digits, non-digit characters, whitespace char-
acters and words in a string.
13.4 Use a regular expression to search through an XHTML string and to locate all valid URLs. For
the purpose of this exercise, assume that a valid URL is enclosed in quotes and begins with http://.
13.5 Write a regular expression that searches a string and matches a valid number. A number can
have any number of digits, but it can have only digits and a decimal point. The decimal point is op-
tional, but if it appears in the number, there must be only one, and it must have digits on its left and
its right. There should be whitespace or a beginning or end-of-line character on either side of a valid
number. Negative numbers are preceded by a minus sign.
13.6 Write a program that receives XHTML as input and outputs the number of XHTML tags in
the string. The program should count the number of tags nested at each level. For example, the XHT-
ML:
<p><strong>hi</strong></p>
has a p tag (nesting level 0—i.e., not nested in another tag) and a strong tag (nesting level 1).
13.7 Write a function that takes a list of dollar values separated by commas, converts each number
from dollars to pounds (at an exchange rate of 0.667 dollars per pound) and prints the results in a
comma-separated list. Each converted value should have the £ symbol in front of it. This symbol can be
obtained by passing the ASCII value of the symbol (156) to the chr function, which returns a string
composed of that character. Ambitious programmers can attempt to do the conversion all in one
statement.
13.8 Write a program that asks the user to enter a sentence and checks whether the sentence
contains more than one space between words. If so, the program should remove the extra spaces. For
example, "Hello    World" should be "Hello World". (Hint: Use split and join.)
14
File Processing and
Serialization
Objectives
• To create, read, write and update files.
• To become familiar with sequential-access file
processing.
• To understand random-access file processing via
module shelve.
• To specify high-performance, unformatted I/O
operations.
• To understand the differences between formatted and
raw data-file processing.
• To build a transaction-processing program with
random-access file processing.
• To serialize complex objects for storage.
I read part of it all the way through.
Samuel Goldwyn
I can only assume that a “Do Not File” document is filed in
a “Do Not File” file.
Senator Frank Church
Senate Intelligence Subcommittee Hearing, 1975
Outline
14.1 Introduction
14.2 Data Hierarchy
14.3 Files and Streams
14.4 Creating a Sequential-Access File
14.5 Reading Data from a Sequential-Access File
14.6 Updating Sequential-Access Files
14.7 Random-Access Files
14.8 Simulating a Random-Access File: The shelve Module
14.9 Writing Data to a shelve File
14.10 Retrieving Data from a shelve File
14.11 Example: A Transaction-Processing Program
14.12 Object Serialization
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
14.1 Introduction
Variables and sequences offer only temporary storage of data—the data is lost when a local
variable “goes out of scope” or when the program terminates. By contrast, files are used for
long-term retention of large amounts of data, even after the program that created the data
terminates. Data maintained in files often is called persistent data. Computers store files on
secondary storage devices, such as magnetic disks, optical disks and tapes. In this chapter,
we explain how Python programs create, update and process data files. We consider both
sequential-access files and random-access files, indicating the types of applications for
which each is best suited. We compare formatted data-file processing and raw data-file pro-
cessing, and we also examine various file-based data storage mechanisms, such as the
shelve and cPickle modules.
1. Generally, a file can contain arbitrary data in arbitrary formats. In some operating systems, a file
is viewed as nothing more than a collection of bytes. In such an operating system, any organization
of bytes in a file (such as organizing the data into records) is a view created by the application’s
programmer.
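[Figure: the data hierarchy, from a single bit, through the characters of a field (J u d y), to records (Sally Black, Tom Blue, Iris Orange, Randy Red) grouped in a file.]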
Most businesses use many different files to store data. For example, companies might
have payroll files, accounts-receivable files (listing money due from clients), accounts-pay-
able files (listing money due to suppliers), inventory files (listing facts about all the items
handled by the business) and many other types of files. Sometimes, a group of related files
is called a database. A collection of programs designed to create and manage databases is
called a database management system (DBMS). We discuss databases in detail in
Chapter 17, Database Application Programming Interface (DB-API).
[Figure: a file viewed as a stream of n bytes, numbered 0 through n-1, followed by the end-of-file marker.]
When a Python program begins execution, Python creates three file streams—
sys.stdin (standard input stream), sys.stdout (standard output stream) and
sys.stderr (standard error stream). These streams provide communication channels
between a program and a particular file or device. Python file streams are created regardless
of whether a Python program imports the sys module, although a program must import the
sys module to access the streams directly. Program input corresponds to sys.stdin. In
fact, raw_input uses sys.stdin to retrieve user input. Program output corresponds to
sys.stdout. The print statement sends information to the standard output stream, by
default. Program errors are printed to sys.stderr.
The sys.stdin stream enables a program to receive input from the keyboard or
other devices, the sys.stdout stream enables a program to output data to the screen or
other devices and the sys.stderr stream enables a program to output error messages to
the screen or other devices.
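As a rough illustration of the three streams, the following sketch writes a normal message and an error message to separate streams. It assumes Python 3; function report and its parameters are hypothetical names introduced here, and io.StringIO objects stand in for the real streams so that the result can be inspected.

```python
import io
import sys

def report(message, error, out=sys.stdout, err=sys.stderr):
    """Write a normal message to one stream and an error to another."""
    out.write(message + "\n")    # normal program output
    err.write(error + "\n")      # diagnostics go to the error stream

# By default both lines appear on the screen (or wherever the shell has
# redirected standard output and standard error).
report("3 records processed", "warning: 1 record skipped")

# Any file-like object can stand in for a stream, for example in a test.
fake_out, fake_err = io.StringIO(), io.StringIO()
report("done", "oops", out=fake_out, err=fake_err)
```

Keeping error messages on sys.stderr lets a user redirect the normal output to a file while still seeing the errors on the screen.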
Fig. 14.3 File-stream objects for opening and writing data to a file. (Part 1 of 2.)
15
16 while 1:
17
18    try:
19       accountLine = raw_input( "? " ) # get account entry
20    except EOFError:
21       break # user entered EOF
22    else:
23       print >> file, accountLine # write entry to file
24
25 file.close()
Fig. 14.3 File-stream objects for opening and writing data to a file. (Part 2 of 2.)
The file-open mode indicates whether a user can open the file for reading, writing or
both. File-open mode "w" opens a file to output data to the file. Existing files opened with
mode "w" are truncated—all data in the file is deleted—and re-created with the new data.
If the specified file does not yet exist, then a file is created. The newly created file is
assigned the name provided in the file name argument (i.e., clients.dat). If the loca-
tion of the file is not specified in the file name argument, Python attempts to create the file
in the current directory. If the file open-mode argument is not specified, the default value
is "r", which opens a file for reading. Figure 14.4 lists various file-open modes. The third
argument to function open—the buffering-mode argument—is for advanced control of
file input and output and usually is not specified. We do not assign a value to the buffering-
mode argument in this example.
Mode          Description
"a"           Writes all output to the end of the file. If the indicated file does not exist, it is created.
"r"           Opens a file for input. If the file does not exist, an IOError exception is raised.
"r+"          Opens a file for input and output. If the file does not exist, an IOError exception is raised.
"w"           Opens a file for output. If the file exists, it is truncated. If the file does not exist, one is created.
"w+"          Opens a file for input and output. If the file exists, it is truncated. If the file does not exist, one is created.
"ab", "rb", "r+b", "wb", "w+b"
              Opens a file for binary (i.e., non-text) input or output. [Note: The b suffix changes behavior only on platforms, such as Windows and Macintosh, that distinguish binary files from text files.]
When open encounters an error, the function raises an IOError exception. Some
possible errors include attempting to open a file for reading that does not exist, opening a
read-only file for writing and opening a file for writing when no disk space is available.
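The effect of the most common file-open modes can be observed with a sketch like the one below. It assumes Python 3, in which IOError is an alias of OSError, and it works in a temporary directory; the file name missing.dat and the variable names are made up for the demonstration.

```python
import os
import tempfile

# Work in a throwaway directory so that no real file is touched.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "clients.dat")

f = open(path, "w")          # "w": create the file (or truncate it)
f.write("100 Jones 24.98\n")
f.close()

f = open(path, "a")          # "a": keep existing data, write at the end
f.write("200 Doe 345.67\n")
f.close()

f = open(path)               # no mode given: defaults to "r"
after_append = f.read()
f.close()

f = open(path, "w")          # "w" again: the earlier contents are discarded
f.close()
size_after_truncate = os.path.getsize(path)

# Opening a nonexistent file for reading raises an exception.
try:
    open(os.path.join(workdir, "missing.dat"), "r")
    missing_error = None
except OSError as error:     # IOError names the same class in Python 3
    missing_error = error
```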
In Fig. 14.3, if open raises an IOError exception, line 10 prints the error message
"File could not be opened" to sys.stderr. By default, the print statement
sends output to the sys.stdout file object. Programs can redirect output from the
print statement to print to a different file object. In our example, the statement
print >> sys.stderr, "File could not be opened"
redirects output to the sys.stderr (standard error) file object. When the >> symbol follows
the print keyword, the print statement redirects the output to the file object that appears
to the right of >>. A comma follows the output file object, and the value to print follows
the comma.
Common Programming Error 14.2
When redirecting file output with >>, forgetting to put a comma (,) after the file object is a
syntax error.
Redirecting output with >> was added to Python in version 2.0. For earlier versions,
or to support multiple versions, the effect of redirecting output with the >> symbol can be
accomplished with file object method write as follows:
sys.stderr.write( output )
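For readers using Python 3, in which print is a function and the >> form no longer exists, the same two approaches might look like the sketch below. An io.StringIO object stands in for sys.stderr here so that the output can be examined; that substitution and the variable names are assumptions of this example.

```python
import io

stream = io.StringIO()    # stands in for sys.stderr

# Python 3 spelling of: print >> stream, "File could not be opened"
print("File could not be opened", file=stream)

# Equivalent call to method write. Unlike print, write does not append a
# newline, so the program must supply one.
stream.write("File could not be opened" + "\n")

lines = stream.getvalue().splitlines()
```

Both calls produce the same line of text, so either can be used where a program must run under several versions of the language.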
If an error occurs when the program in Fig. 14.3 opens a file, function sys.exit
(line 11) terminates the program. Function sys.exit returns its optional argument to the
environment from which the program was invoked. Argument 0 (the default) indicates
normal program termination; any other value indicates that the program terminated due to
an error. The calling environment (most likely the operating system) uses the value returned
by sys.exit to respond to the error appropriately.
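The exit status can be observed without ending the interpreter, because sys.exit works by raising a SystemExit exception. The sketch below assumes Python 3; function open_or_exit and the path it is given are hypothetical.

```python
import sys

def open_or_exit(path):
    """Return an open file, or end the program with status 1."""
    try:
        return open(path, "r")
    except OSError:
        print("File could not be opened", file=sys.stderr)
        sys.exit(1)    # a nonzero status signals an error

# Catching SystemExit lets this sketch inspect the status that would
# otherwise be returned to the calling environment.
try:
    open_or_exit("/no/such/directory/clients.dat")
    status = 0
except SystemExit as exit_request:
    status = exit_request.code
```

In an ordinary program the SystemExit exception is left uncaught, and the interpreter terminates with the given status.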
If the file, clients.dat, opens successfully, the program processes data. Lines 13–
14 prompt the user to enter the various fields for each record, or the end-of-file marker
when data entry is completed.
Lines 16–23 extract each set of data from the standard input using a try/except/
else block in a repetition structure. Function raw_input retrieves a line of input from
the user. If the user enters the end-of-file character, raw_input raises an EOFError
exception. Lines 20–21 catch this error and use a break statement to exit the infinite
while loop. If the user does not enter the end-of-file character, the else block (lines 22–
23) executes and prints the user-entered line to the output file using the >> symbol.
The close method (line 25) closes the file object after the while loop terminates.
Although Python closes open files when a program terminates, it is good practice to close
a file with the close method as soon as the program no longer needs the file.
Performance Tip 14.1
Close each file explicitly as soon as it is known that the program will not reference the file
again. This can reduce resource use in a program that continues executing after it no longer
needs a particular file. This practice also improves program clarity.
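Two common ways to make sure a file is closed promptly are sketched below. The example assumes Python 3; the with statement it uses was added to the language after the version this text targets, and the file location is a temporary one chosen for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "clients.dat")

# Explicit close in a finally clause: runs even if a write fails.
f = open(path, "w")
try:
    f.write("100 Jones 24.98\n")
finally:
    f.close()
closed_explicitly = f.closed

# The with statement closes the file when its block ends.
with open(path, "a") as g:
    g.write("200 Doe 345.67\n")
closed_by_with = g.closed
```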
In the sample execution for the program of Fig. 14.3, the user enters information for
five accounts and signals that data entry is complete by entering end-of-file. This dialog
does not show how the data records actually appear in the file. To verify that the file has
been created successfully, the next section demonstrates a program that reads the file and
displays its contents.
Method readlines (line 13) reads the entire contents of the file created by the program of Fig. 14.3
into the program. This method returns a list of the lines in the file, which the program stores in variable
records. For each line (record) in the file, method split returns the words
(fields) in the line as a list. Lines 19–23 output the fields. Methods ljust and rjust
left- and right-justify the fields, respectively, to format the output. Method close (line 25)
closes the file associated with the file object.
Python version 2.2 contains additional features that enable the programmer to use a file
object in a for statement. For example, line 19 in the above example could be replaced by:
for record in file:
in a program that uses Python 2.2. This technique reads one line of the file at a time and
assigns the line to record. The program can process that line immediately. Iterating over
the lines in a file in this manner can be more efficient than reading the contents of a large
file with method readlines, which requires the program to wait for the entire file to be
read into memory before any of the file’s contents can be processed.
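The difference between the two reading styles can be seen in a small sketch. It assumes Python 3 and writes its own three-record sample file to a temporary directory; the records and the variable names are invented for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "clients.dat")
with open(path, "w") as f:
    f.write("100 Jones 24.98\n200 Doe 345.67\n300 White 0.00\n")

# readlines: the whole file becomes a list before any processing starts.
with open(path) as f:
    records = f.readlines()
count_from_list = len(records)

# Iteration: one line at a time, processed as soon as it is read.
total = 0.0
with open(path) as f:
    for record in f:
        account, name, balance = record.split()
        total += float(balance)
```

For a file of three records the difference is negligible; it matters when the file is too large to hold comfortably in memory.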
To retrieve data sequentially from a file, programs normally start from the beginning
of the file and read all the data consecutively until the desired data is found. It sometimes
is necessary to process a file sequentially several times (from the beginning of the file)
during the execution of a program. File objects provide method seek for repositioning the
file-position pointer (which contains the byte number of the next byte to be read from or
written to the file). The statement
file.seek( 0, 0 )
repositions the file-position pointer at the beginning of the file. The first argument seek
takes is the offset, an integer that specifies a location in the file as a number of bytes
measured from the seek direction. The second (optional) argument is the seek direction,
the location from which the offset is measured. The seek direction can be 0 (the default)
for positioning relative to the beginning of a file, 1 for positioning relative to the current
position in a file or 2 for positioning relative to the end of a file. Some examples of posi-
tioning the file position pointer are
File-object method tell returns the current location of the file-position pointer. The
following statement assigns the current file-position pointer value to variable location
location = file.tell()
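A few calls to seek and tell are sketched below. The example assumes Python 3 and opens the file in binary mode, because Python 3 text files accept only limited offsets for seek directions 1 and 2; the ten-byte sample file and the variable names are invented for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "clients.dat")
with open(path, "wb") as f:
    f.write(b"0123456789")    # ten bytes, numbered 0 through 9

f = open(path, "rb")          # binary mode: offsets are plain byte counts
f.seek(4)                     # byte 4 from the beginning (direction 0 is the default)
at_four = f.read(1)           # reading advances the pointer to byte 5

f.seek(2, 1)                  # move 2 bytes forward from the current position
after_skip = f.tell()

f.seek(-3, 2)                 # 3 bytes before the end of the file
near_end = f.read()

f.seek(0, 0)                  # back to the beginning
start = f.tell()
f.close()
```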
Figure 14.7 uses seek in a program that enables a credit manager to display the
account information for customers with zero balances (i.e., customers who do not owe any
money), credit balances (i.e., customers to whom the company owes money) and debit bal-
ances (i.e., customers who owe the company money for goods and services received in the
past). The program displays a menu and allows the credit manager to enter one of three
options to obtain credit information. Option 1 produces a list of accounts with zero balances
(lines 56–57). Option 2 produces a list of accounts with credit balances (lines 58–59).
Option 3 produces a list of accounts with debit balances (lines 60–61). Option 4 terminates
the program execution (lines 62–63). Entering an invalid option causes the program to
prompt the user to enter another choice (lines 64–65).
Lines 52–77 process each request that is not option 4. Method readline
(line 67) reads one line from the file and moves the file-position pointer to the next
line in the file. When method readline has read all lines from the file (i.e.,
the program has reached the end of the file), readline returns the empty string ("").
Method split (line 71) unpacks each record to three variables—account, name
and balance. The program calls function shouldDisplay (lines 18–29), which
returns 1 (true), if a record should be displayed. If applicable, function outputLine
(lines 32–36) displays the record fields.
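The reading loop described above can be sketched in isolation as follows. This is not the figure's code: it assumes Python 3, uses an io.StringIO object in place of the opened clients.dat file, and function credit_accounts is a hypothetical helper that looks only for credit balances.

```python
import io

# io.StringIO stands in for the opened clients.dat file.
clients = io.StringIO("100 Jones 24.98\n300 White 0.00\n400 Stone -42.16\n")

def credit_accounts(source):
    """Collect (account, name) pairs whose balance is negative."""
    source.seek(0)                      # start from the first record
    found = []
    line = source.readline()
    while line != "":                   # readline returns "" at end of file
        account, name, balance = line.split()
        if float(balance) < 0:          # credit balance
            found.append((account, name))
        line = source.readline()
    return found

first_pass = credit_accounts(clients)
second_pass = credit_accounts(clients)  # seek(0) makes a second pass possible
```

Without the call to seek, the second pass would begin at the end of the file and find no records.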
4 import sys
5
6 # retrieve one user command
7 def getRequest():
8
9     while 1:
10       request = int( raw_input( "\n? " ) )
11
12       if 1 <= request <= 4:
13          break
14
15    return request
16
17 # determine if balance should be displayed, based on type
18 def shouldDisplay( accountType, balance ):
19
20    if accountType == 2 and balance < 0: # credit balance
21       return 1
22
23    elif accountType == 3 and balance > 0: # debit balance
24       return 1
25
26    elif accountType == 1 and balance == 0: # zero balance
27       return 1
28
29    else: return 0
30
31 # print formatted balance data
32 def outputLine( account, name, balance ):
33
34    print account.ljust( 10 ),
35    print name.ljust( 10 ),
36    print balance.rjust( 10 )
37
38 # open file
39 try:
40    file = open( "clients.dat", "r" )
41 except IOError:
42    print >> sys.stderr, "File could not be opened"
43    sys.exit( 1 )
44
45 print "Enter request"
46 print "1 - List accounts with zero balances"
47 print "2 - List accounts with credit balances"
48 print "3 - List accounts with debit balances"
49 print "4 - End of run"
50
51 # process user request(s)
52 while 1:
53
54    request = getRequest() # get user request
55
Enter request
1 - List accounts with zero balances
2 - List accounts with credit balances
3 - List accounts with debit balances
4 - End of run
? 1
Accounts with zero balances:
300 White 0.0
? 2
Accounts with credit balances:
400 Stone -42.16
? 3
Accounts with debit balances:
100 Jones 24.98
200 Doe 345.67
500 Rich 224.62
? 4
End of run.
cars—some empty, some with contents. Data can be inserted into a random-access file
without destroying other data in the file. In addition, previously stored data can be updated
or deleted without rewriting the entire file. In the following sections, we explain how to
create a random-access file, enter data to that file, read the data, update the data and delete
data that is no longer needed.
Lines 20–21 prompt the user for the account numbers in the range 1–100, inclusive.
Line 28 writes data to the shelve file. The program manipulates the data in a shelve
file through a dictionary interface. Each key in a shelve file must be a string; therefore,
function str converts the integer value accountNumber to a string (line 28). Method
split converts the user-entered data into a list, which is stored as the record key’s value
(line 28). When the user enters 0 to indicate the end of data, the file object’s close method
closes the shelve file (line 33).
Method keys returns a list of the record keys in the shelve file (line 28). A for
loop iterates over this list and passes each record key and its value to function output-
Line. Function outputLine (lines 8–13) prints the record key and its associated values.
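The dictionary interface of a shelve file can be sketched as follows. The example assumes Python 3, stores the file in a temporary directory, and uses invented account data; it is a minimal illustration rather than the program of the figures. Under that assumption, assigning to an existing key replaces the stored value, so the update step below uses plain assignment.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "credit")

# Write: keys must be strings; values can be ordinary Python objects.
out_credit = shelve.open(path)
for number, data in [(37, "Doe John 0.00"), (29, "Brown Nancy -24.54")]:
    out_credit[str(number)] = data.split()
out_credit.close()

# Read: the same dictionary interface retrieves the records.
in_credit = shelve.open(path)
accounts = sorted(in_credit.keys())

# Update: fetch a copy, change it, then store it back under its key.
record = in_credit["37"]
record[2] = "%.2f" % (float(record[2]) + 10.0)
in_credit["37"] = record
updated = in_credit["37"]
in_credit.close()
```

Changing the fetched list alone would not alter the file; the record is written only when it is assigned back to its key.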
20
21    while 1:
22       menuChoice = int( raw_input( "? " ) )
23
24       if not 1 <= menuChoice <= 5:
25          print >> sys.stderr, "Incorrect choice"
26
27       else:
28          break
29
30    return menuChoice
31
32 # create formatted text file for printing
33 def textFile( readFromFile ):
34
35    # open text file
36    try:
37       outputFile = open( "print.txt", "w" )
38    except IOError:
39       print >> sys.stderr, "File could not be opened."
40       sys.exit( 1 )
41
42    print >> outputFile, "Account".ljust( 10 ),
43    print >> outputFile, "Last Name".ljust( 10 ),
44    print >> outputFile, "First Name".ljust( 10 ),
45    print >> outputFile, "Balance".rjust( 10 )
46
47    # print shelve values to text file
48    for key in readFromFile.keys():
49       print >> outputFile, key.ljust( 10 ),
50       print >> outputFile, readFromFile[ key ][ 0 ].ljust( 10 ),
51       print >> outputFile, readFromFile[ key ][ 1 ].ljust( 10 ),
52       print >> outputFile, readFromFile[ key ][ 2 ].rjust( 10 )
53
54    outputFile.close()
55
56 # update account balance
57 def updateRecord( updateFile ):
58
59    account = getAccount( "Enter account to update" )
60
61    if updateFile.has_key( account ):
62       outputLine( account, updateFile[ account ] ) # get record
63
64       transaction = raw_input(
65          "\nEnter charge (+) or payment (-): " )
66
67       # create temporary record to alter data
68       tempRecord = updateFile[ account ]
69       tempBalance = float( tempRecord[ 2 ] )
70       tempBalance += float( transaction )
71       tempBalance = "%.2f" % tempBalance
72       tempRecord[ 2 ] = tempBalance
73
74       # update record in shelve
75       del updateFile[ account ] # remove old record first
76       updateFile[ account ] = tempRecord
77       outputLine( account, updateFile[ account ] )
78    else:
79       print >> sys.stderr, "Account #", account, \
80          "does not exist."
81
82 # create and insert new record
83 def newRecord( insertInFile ):
84
85    account = getAccount( "Enter new account number" )
86
87    if not insertInFile.has_key( account ):
88       print "Enter lastname, firstname, balance"
89       currentData = raw_input( "? " )
90       insertInFile[ account ] = currentData.split()
91    else:
92       print >> sys.stderr, "Account #", account, "exists."
93
94 # delete existing record
95 def deleteRecord( deleteFromFile ):
96
97     account = getAccount( "Enter account to delete" )
98
99     if deleteFromFile.has_key( account ):
100       del deleteFromFile[ account ]
101       print "Account #", account, "deleted."
102    else:
103       print >> sys.stderr, "Account #", account, \
104          "does not exist."
105
106
107 # output line of client information
108 def outputLine( account, record ):
109
110    print account.ljust( 10 ),
111    print record[ 0 ].ljust( 10 ),
112    print record[ 1 ].ljust( 10 ),
113    print record[ 2 ].rjust( 10 )
114
115 # get account number from keyboard
116 def getAccount( prompt ):
117
118    while 1:
119       account = raw_input( prompt + " (1 - 100): " )
120
121       if 1 <= int( account ) <= 100:
122          break
123
124    return account
125
Execute the program in Fig. 14.9 to insert data in the file that is used in this transaction-
processing program (Fig. 14.11). The transaction-processing program offers a user five
options (1–5) with which to work in the program. Option 1 calls function textFile
(lines 33–54), which stores a formatted list of all the account information in a text file called
print.txt. From this file, a user can print a list of account information. Function text-
File takes a shelve file as an argument and uses the data in that shelve file to create
the text file. Function outputLine (lines 108–113) outputs the data to file stdout.
After a user chooses option 1, the file print.txt contains the following text:
When a user selects option 2, the program calls function updateRecord (lines 57–80)
to update an account. First, the function determines whether the record that the user specifies
exists, because the function can update only existing records. If the record exists, it is read
into variable tempRecord. Lines 69–70 convert the string representation of the account bal-
ance to a floating-point value before manipulating its numerical value. Before updating a
record in the shelve file, the program must delete the existing record for the specified
account; keyword del (line 75) deletes the current record. Line 76 updates the record by
assigning the new record values to the corresponding account number (record key). The pro-
gram then outputs the updated values. The following is a typical output for this option:
Option 3 calls function newRecord to enable a user to add a new account. This func-
tion adds an account in the same manner as that of the program of Fig. 14.9. If the user
enters an account number for an existing account, newRecord displays a message that the
account exists and the program allows the user to select the next operation to perform. A
typical output for option 3 is as follows:
Option 5 terminates the program. The main portion of the program (lines 127–146)
creates a list of functions that correspond to the user-menu options (line 127). The program
then opens the shelve file for the bank accounts and gets the user’s menu choice.
Line 144 calls a function that corresponds with a user option. Recall that parentheses
(()) are Python operators. When used in conjunction with the function name (e.g., text-
File), the operator calls the function and passes any indicated arguments. Variable
options holds a list of function names, so a statement such as
options[ 0 ]( creditFile )
invokes function textFile (the first function in the list) and passes creditFile as an
argument. Statements like this avoid the need for long if/else statements that determine
the user menu option and call the appropriate function.
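The idea of selecting a function from a list can be sketched on its own, as below. The example assumes Python 3; the three short functions and the dictionary credit_file are hypothetical stand-ins for the program's functions and its shelve file.

```python
def text_file(records):
    return "wrote %d records to a text file" % len(records)

def update_record(records):
    return "update requested"

def delete_record(records):
    return "delete requested"

# One entry per menu option; option n is stored at index n - 1.
options = [text_file, update_record, delete_record]

credit_file = {"37": ["Doe", "John", "0.00"]}    # stand-in for the shelve file

menu_choice = 1
result = options[menu_choice - 1](credit_file)   # same as text_file(credit_file)
```

Adding a menu option then requires only a new function and a new list entry, rather than another branch in an if/else statement.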
Line 8 opens users.dat, the file in which the pickled object will reside. Variable
inputList, initialized in line 16, is a list that contains the user-entered information to
pickle. Lines 18–25 prompt the user to enter information and append the user’s entries to
inputList. Function cPickle.dump (line 27) pickles inputList to the file. The
first argument to dump is the object to pickle and the second argument is the file object that
represents the file in which function dump will store the pickled object. The function
converts inputList to a series of bytes and writes the stream to the file. Line 29 calls file
object method close to close the file.
A program can convert pickled data back to the original format by unpickling the data.
Figure 14.13 demonstrates unpickling. This example uses the pickled file created by the
program in Fig. 14.12.
The program first opens file users.dat (lines 7–11). Function cPickle.load
(line 13) unpickles the data in the file. The function takes as an argument a file object that
contains a pickled object, converts the pickled object into a Python object and returns a ref-
erence to the unpickled object. We assign this reference to variable records. The pro-
gram then closes the file, because the file is no longer needed (line 14). The remainder of
the program (lines 16–23) displays the unpickled data by iterating over the list of lists.
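A round trip through pickling and unpickling might look like the sketch below. It assumes Python 3, in which module cPickle no longer exists under that name and module pickle is used instead, and in which the file must be opened in binary mode. The list contents and the temporary file location are invented for the example.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "users.dat")

input_list = [["Doe", "John", "jdoe"], ["Brown", "Nancy", "nbrown"]]

# Pickle: the object comes first, then the file object (binary mode).
with open(path, "wb") as f:
    pickle.dump(input_list, f)

# Unpickle: load returns a new object equal to the one that was dumped.
with open(path, "rb") as f:
    records = pickle.load(f)
```

The unpickled list is a separate object with the same contents, so changing it does not affect the original list.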
SUMMARY
• Files are used for long-term retention of large amounts of data.
• Computers store files on secondary storage devices, such as magnetic disks, optical disks and tapes.
• Ultimately, all data items processed by digital computers are reduced to combinations of zeros and
ones. This occurs because it is simple and economical to build electronic devices that can assume
two stable states—0 represents one state, and 1 represents the other.
• The smallest data item that computers support is called a bit (short for “binary digit,” a digit that
can assume one of two values). Each data item, or bit, can assume either the value 0 or the value 1.
• It is preferable to program with data forms such as decimal digits (e.g., 0, 1, 2, 3, 4, 5, 6, 7, 8 and
9), letters (e.g., A through Z and a through z) and special symbols (e.g., $, @, %, &, *, (, ), -, +, “,
:, ?, /, etc.).
• Digits, letters and special symbols are referred to as characters.
• The set of all characters used to write programs and represent data items on a particular computer
is called that computer’s character set.
• Because computers can process only 1s and 0s, every character in a computer’s character set is
represented as a sequence of 1s and 0s (called a byte). Bytes are composed of eight bits.
• Just as characters are composed of bits, fields are composed of characters (or bytes). A field is a
group of characters that convey a meaning.
• Data items processed by computers form a data hierarchy in which data items become larger and
more complex in structure in the progression from bits, to characters (bytes), to fields and up to
larger data structures.
• A record, which we can implement as a tuple, a dictionary or an instance in Python, is a group of
related fields.
• To facilitate the retrieval of specific records from a file, at least one field in each record is chosen
as a record key. A record key identifies a record as belonging to a particular person or entity and
distinguishes that record from all other records in the file.
• There are many ways to organize records in a file. The most common organization is a sequential
file, in which records typically are stored in order by the record-key field.
• Sometimes, a group of related files is called a database.
• A collection of programs designed to create and manage databases is called a database manage-
ment system (DBMS).
• Python views each file as a sequential stream of bytes.
• Python imposes no structure on a file—notions like “records” do not exist in Python files.
• Each file ends either with an end-of-file marker or at a specific byte number recorded in a system-
maintained administrative data structure.
• When a file is opened, Python creates an object and associates a stream with that object.
• Python creates three file streams—sys.stdin (standard input stream), sys.stdout (standard
output stream) and sys.stderr (standard error stream). These streams provide communication
channels between a program and a particular file or device.
• A program must import the sys module to access the three file streams directly.
• Program input corresponds to sys.stdin. Function raw_input uses sys.stdin to get in-
put from the user.
• Program output corresponds to sys.stdout. By default, the print statement sends
information to the standard output stream.
TERMINOLOGY
>> symbol raw data processing
"a" file-open mode readline method
"ab" file-open mode readlines method
bit record
buffering mode record key
byte redirection of output
character set "r" file-open mode
close method "r+" file-open mode
cPickle module "r+b" file-open mode
data hierarchy "rb" file-open mode
database seek method
database management system (DBMS) sequential-access file
end-of-file marker serialization
EOFError exception shelve file
field shelve module
file split method
file name standard-error stream (sys.stderr)
file-open mode standard-input stream (sys.stdin)
file-position pointer standard-output stream (sys.stdout)
file-seek location stream
instant-access processing sys.exit function
IOError exception tell method
keys method transaction-processing systems
magnetic disk truncate
offset unpickling an object
open method "w" file-open mode
persistent data "w+" file-open mode
pickling an object "w+b" file-open mode
random-access file "wb" file-open mode
SELF-REVIEW EXERCISES
14.1 Fill in the blanks in each of the following statements:
a) Computers store files on ________, such as magnetic disks.
b) A record can be implemented as a ________, a ________ or a ________ in Python.
c) The set of all characters used to write programs on a computer is called its ________.
d) In a ________, records typically are stored in order by the record key.
e) Python creates three file streams: ________, ________ and ________.
f) A ________ is composed of several fields.
g) To facilitate the retrieval of specific records from a file, one field in each record is chosen
as a ________.
h) At the lowest level, the functions performed by computers essentially involve the
manipulation of ________ and ________.
i) Data items represented in computers form a ________, in which data items become
larger and more complex as they progress from bits to fields.
j) A group of related files is called a ________.
14.2 State which of the following are true and which are false. If false, explain why.
a) The programmer must create the sys.stderr stream explicitly.
b) The smallest data item in a computer is a byte.
c) Python views each file as a dictionary.
d) File streams serve as communication channels.
e) It is not necessary to search through all the records in a random-access file to find a spe-
cific record.
f) Records in random-access files must be of uniform length.
g) Module cPickle performs more efficiently than does module pickle because
cPickle is written in Python.
h) Serialization converts complex objects to a set of bytes.
i) Method sys.exit returns 1 by default to signify that no errors occurred.
j) Sequential-access files are inappropriate for instant-access applications in which records
must be located quickly.
EXERCISES
14.3 Fill in the blanks in each of the following statements:
a) A group of related characters that conveys meaning is called a ________.
b) Method ________ repositions the file-position pointer in a file.
c) Programs can ________ output from the print statement to print to a different file object.
d) If the user enters the end-of-file character, function raw_input raises an ________.
e) Method ________ returns a list of the lines in a file.
14.4 State which of the following are true and which are false. If false, explain why.
a) People prefer to manipulate bits instead of characters and fields because bits are more
compact.
b) People specify programs and data items as characters; computers then manipulate and
process these characters as groups of zeros and ones.
c) Most organizations store all information in a single file to facilitate computer processing.
d) Each statement that processes a file in a Python program explicitly refers to that file by
name.
e) Python imposes no structure on a file.
14.5 You are the owner of a hardware store and need to keep an inventory that can tell you what
different tools you have, how many of each you have on hand and the cost of each one. Write a
program that initializes the shelve file "hardware.dat", lets you input the data concerning each
tool and enables you to list all your tools. The tool identification number should be the record number.
Use the following information to start your file:

   Record #   Tool name     Quantity   Cost
   17         Hammer        76         11.99
   37         Saw           88         12.00
   68         Screwdriver   106        6.99
   83         Wrench        34         7.50

14.6 Modify the inventory program of Exercise 14.5. The modified program allows you to delete
a record for a tool that you no longer have and allows you to update any information in the file.
14.7 Create a simple text editor GUI that allows the user to open a file. The GUI should display
the text of the file and then close the file. The user can modify the file’s contents. When the user
chooses to save the text, the modified contents should be written to the file, replacing any other
contents. The user also should be able to clear the display.
14.8 Create four band members from the class BandMember. Pickle these objects and store them
in a file. Unpickle, then output the objects.

   class BandMember:
      """Represent a band member"""

      def __init__( self, name, instrument ):
         """Initialize name and instrument"""

         self.name = name
         self.instrument = instrument

      def __str__( self ):
         """Overloaded string representation"""

         return "%s plays the %s" % ( self.name, self.instrument )
15
Extensible Markup
Language (XML)
Objectives
• To understand XML.
• To mark up data using XML.
• To become familiar with the types of markup
languages created with XML.
• To understand the relationships among DTDs,
Schemas and XML.
• To understand the fundamentals of DOM-based and
SAX-based parsing.
• To understand the concept of XML namespaces.
• To create simple XSLT documents.
Every country has its own language, yet the subjects of which
the untutored soul speaks are the same everywhere.
Tertullian
The chief merit of language is clearness, and we know that
nothing detracts so much from this as do unfamiliar terms.
Galen
Like everything metaphysical, the harmony between thought
and reality is to be found in the grammar of the language.
Ludwig Wittgenstein
Outline
15.1 Introduction
15.2 XML Documents
15.3 XML Namespaces
15.4 Document Object Model (DOM)
15.5 Simple API for XML (SAX)
15.6 Document Type Definitions (DTDs), Schemas and Validation
15.6.1 Document Type Definition Documents
15.6.2 W3C XML Schema Documents
15.7 XML Vocabularies
15.7.1 MathML™
15.7.2 Chemical Markup Language (CML)
15.7.3 Other XML Vocabularies
15.8 Extensible Stylesheet Language (XSL)
15.9 Internet and World Wide Web Resources
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
15.1 Introduction
The Extensible Markup Language (XML) was developed in 1996 by the World Wide Web
Consortium’s (W3C’s) XML Working Group. XML is a portable, widely supported, open
technology (i.e., non-proprietary technology) for describing data. XML quickly is becom-
ing the standard for data that is exchanged between applications. Using XML, document
authors can describe any type of data, including mathematical formulas, software configu-
ration instructions, music, recipes and financial reports. An additional benefit of using
XML is that documents are readable by both humans and machines.
This chapter explores XML and various XML-related technologies. The first three sec-
tions introduce XML and how it is used to mark up data. The next two sections describe
two different programmatic libraries that can be used to manipulate XML documents. Later
sections introduce several XML vocabularies (i.e., markup languages created with XML).
This chapter also examines a technology called Extensible Stylesheet Language Transfor-
mations (XSLT), which transforms XML data into other text-based formats. Chapter 16,
Python XML Processing, builds upon the concepts presented in this chapter by writing
Python applications that use XML.
of XML that is used in the document. XML comments (lines 3–4) begin with <!-- and end
with -->, and can be placed almost anywhere in an XML document. As in a Python program,
comments are used in XML for documentation purposes.
Common Programming Error 15.1
Placing any characters, including whitespace, before the XML declaration is an error.
XML marks up data using tags, which are names enclosed in angle brackets (<>). Tags
are used in pairs to delimit character data (e.g., Simple XML). A tag that begins markup
(i.e., XML data) is called a start tag, whereas a tag that terminates markup is called an end
tag. Examples of start tags are <article> and <title> (lines 6 and 8, respectively).
End tags differ from start tags in that they contain a forward slash (/) character immediately
after the < character. Examples of end tags are </title> and </article> (lines 8 and
23, respectively). XML documents can contain any number of tags.
Common Programming Error 15.2
Failure to provide a corresponding end tag for a start tag is an error.
Individual units of markup (i.e., everything included between a start tag and its corre-
sponding end tag) are called elements. An XML document includes one element (called a
root element) that contains all other elements in the document. The root element must be
the first element after the XML declaration. In Fig. 15.1, article (line 6) is the root ele-
ment. Elements are nested to form hierarchies—with the root element at the top of the hier-
archy. This allows document authors to create explicit relationships between data. For
example, elements title, date, author, summary and content then are nested
within article. Elements firstName and lastName are nested within author.
Common Programming Error 15.3
Attempting to create more than one root element in an XML document is an error.
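The element hierarchy described above can be inspected from Python. The following is a minimal sketch that uses the standard library's xml.dom.minidom module (Python 3) rather than the tools discussed in this chapter. The document text is a shortened stand-in modeled on the article example, the element values are made up, and the function name child_element_names is hypothetical.

```python
from xml.dom import minidom

# Stand-in document modeled on the article example; the values are made up.
ARTICLE = """<?xml version="1.0"?>
<article>
   <title>Simple XML</title>
   <date>2001-01-01</date>
   <author>
      <firstName>Jane</firstName>
      <lastName>Doe</lastName>
   </author>
   <summary>A short summary.</summary>
</article>"""

def child_element_names(node):
    """Return the tag names of node's direct child elements, in order."""
    return [child.tagName for child in node.childNodes
            if child.nodeType == child.ELEMENT_NODE]

document = minidom.parseString(ARTICLE)
root = document.documentElement          # the single root element
author = root.getElementsByTagName("author")[0]

print(root.tagName)
print(child_element_names(root))         # children nested directly in article
print(child_element_names(author))       # children nested in author
```

The whitespace between tags is stored as text nodes, which is why the sketch filters on ELEMENT_NODE before collecting names.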
Element title (line 8) contains the title of the article, Simple XML, as character
data. Similarly, date (line 10), summary (line 17) and content (lines 19–21) contain
character data that represent the article’s publication date, summary and content, respec-
tively. XML tag names can be of any length and may contain letters, digits, underscores,
hyphens and periods—they must begin with a letter or an underscore.
Common Programming Error 15.4
XML is case sensitive. Using the wrong case for an XML tag name is an error.
By itself, this document is simply a text file named article.xml. Although it is not
required, most XML-document file names end with the file extension .xml.2 Processing
an XML document requires a program called an XML parser. Parsers are responsible for
checking an XML document’s syntax and making the XML document’s data available to
applications. Often, XML parsers are built into applications or available for download over
the Internet. Popular parsers include Microsoft’s msxml, 4DOM (a Python package that we
use extensively in Chapter 16), the Apache Software Foundation’s Xerces and IBM’s
XML4J. In this chapter, we use msxml.
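To see a parser's syntax checking in action without installing msxml, the sketch below uses the expat-based parser that ships with Python 3 (module xml.dom.minidom). This is a substitute for the parsers named above, not the tool used in this chapter, and the function name is_well_formed is hypothetical. Each failing call corresponds to one of the Common Programming Errors listed earlier.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def is_well_formed(text):
    """Return True if text parses as XML, False if the parser reports an error."""
    try:
        minidom.parseString(text)
    except ExpatError:
        return False
    return True

print(is_well_formed("<article><title>Simple XML</title></article>"))  # True
print(is_well_formed("<article><title>Simple XML</article>"))  # False: no </title>
print(is_well_formed("<Article></article>"))                   # False: case differs
print(is_well_formed("<a/><b/>"))                              # False: two roots
```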
When the user loads article.xml into Internet Explorer (IE),3 msxml parses the
document and passes the parsed data to IE. IE then uses a built-in style sheet to format the
data. Notice that the resulting format of the data (Fig. 15.2) is similar to the format of the
XML document shown in Fig. 15.1. As we soon demonstrate, style sheets play an important
and powerful role in the transformation of XML data into formats suitable for display.
Notice the minus (–) and plus (+) signs in Fig. 15.2. Although these are not part of the
XML document, IE places them next to all container elements (i.e., elements that contain
other elements). Container elements also are called parent elements. A minus sign indicates
that the parent element’s child elements (i.e., nested elements) currently are displayed. When
clicked, a minus sign becomes a plus sign (which collapses the container element and hides
all of its children). Conversely, clicking a plus sign expands the container element and
changes the plus sign to a minus sign. This behavior is similar to the viewing of the directory
structure on a Windows system using Windows Explorer. In fact, a directory structure often
is modeled as a series of tree structures, in which each drive letter (e.g., C:, etc.) represents
2. Some applications that process XML documents may require this file extension.
3. IE 5 and higher.
We now present a second XML document (Fig. 15.3), which marks up a business letter.
This document contains significantly more data than did the previous XML document.
6 <letter>
7 <contact type = "from">
8 <name>Jane Doe</name>
9 <address1>Box 12345</address1>
10 <address2>15 Any Ave.</address2>
11 <city>Othertown</city>
12 <state>Otherstate</state>
13 <zip>67890</zip>
14 <phone>555-4321</phone>
15 <flag gender = "F" />
16 </contact>
17
18 <contact type = "to">
19 <name>John Doe</name>
20 <address1>123 Main St.</address1>
21 <address2></address2>
22 <city>Anytown</city>
23 <state>Anystate</state>
24 <zip>12345</zip>
25 <phone>555-1234</phone>
26 <flag gender = "M" />
27 </contact>
28
29 <salutation>Dear Sir:</salutation>
30
31 <paragraph>It is our privilege to inform you about our new
32 database managed with <technology>XML</technology>. This
33 new system allows you to reduce the load on
34 your inventory list server by having the client machine
35 perform the work of sorting and filtering the data.
36 </paragraph>
37
38 <paragraph>Please visit our Web site for availability
39 and pricing.
40 </paragraph>
41
42 <closing>Sincerely</closing>
43
44 <signature>Ms. Doe</signature>
45 </letter>
Root element letter (lines 6–45) contains the child elements contact (lines 7–16
and 18–27), salutation, paragraph, closing and signature. In addition to
being placed between tags, data also can be placed in attributes, which are name-value pairs
in start tags. Elements can have any number of attributes in their start tags. The first con-
tact element (lines 7–16) has attribute type with attribute value "from", which indi-
cates that this contact element marks up information about the letter’s sender. The
second contact element (lines 18–27) has attribute type with value "to", which indi-
cates that this contact element marks up information about the letter’s recipient. Like tag
names, attribute names are case sensitive; can be any length; may contain letters, digits,
underscores, hyphens and periods; and must begin with either a letter or underscore character.
A contact element stores a contact’s name, address and phone number. Element
salutation (line 29) marks up the letter’s salutation. Lines 31–40 mark up the letter’s
body with paragraph elements. Elements closing (line 42) and signature (line
44) mark up the closing sentence and the signature of the letter’s author, respectively.
Common Programming Error 15.6
Failure to enclose attribute values in double ("") or single (’’) quotes is an error.
In line 15, we introduce empty element flag, which indicates the gender of the con-
tact. Empty elements do not contain character data (i.e., they do not contain text between
the start and end tags). Such elements are closed either by placing a slash at the end of the
element (as shown in line 15) or by explicitly writing a closing tag, as in
<flag gender = "F"></flag>
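The following sketch reads attribute values from a shortened version of the letter markup and confirms that both ways of writing an empty element produce an element with no content. It assumes Python 3 and the standard library's xml.dom.minidom module; the abbreviated document is ours, not the full listing.

```python
from xml.dom import minidom

# Abbreviated letter; the second flag uses an explicit closing tag.
LETTER = """<letter>
   <contact type = "from">
      <name>Jane Doe</name>
      <flag gender = "F" />
   </contact>
   <contact type = "to">
      <name>John Doe</name>
      <flag gender = "M"></flag>
   </contact>
</letter>"""

document = minidom.parseString(LETTER)

for contact in document.getElementsByTagName("contact"):
    name = contact.getElementsByTagName("name")[0].firstChild.data
    flag = contact.getElementsByTagName("flag")[0]
    # An empty element has no child nodes, whichever closing form was used.
    print(contact.getAttribute("type"), name,
          flag.getAttribute("gender"), flag.hasChildNodes())
```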
The markup in Fig. 15.4 demonstrates the use of namespaces. This XML document
contains two file elements that are differentiated using namespaces.
5
6 <text:directory xmlns:text = "http://www.deitel.com/ns/python1e"
7 xmlns:image = "http://www.deitel.com/images/ns/120101">
8
9 <text:file filename = "book.xml">
10 <text:description>A book list</text:description>
11 </text:file>
12
13 <image:file filename = "funny.jpg">
14 <image:description>A funny picture</image:description>
15 <image:size width = "200" height = "100" />
16 </image:file>
17
18 </text:directory>
Lines 6–7 use attribute xmlns to create two namespace prefixes: text and image.
Each namespace prefix is bound to a series of characters called a uniform resource identi-
fier (URI) that uniquely identifies the namespace. Document authors create their own
namespace prefixes and URIs.
To ensure that namespaces are unique, document authors must provide unique URIs.
Here, we use the text http://www.deitel.com/ns/python1e and
http://www.deitel.com/images/ns/120101 as URIs. A common practice is to use Uniform
Resource Locators (URLs) for URIs, because the domain names (such as www.deitel.com)
used in URLs are guaranteed to be unique. In this example, we use
URLs related to the Deitel & Associates, Inc., domain name to identify namespaces. The
parser never visits these URLs; they simply represent a series of characters used to
differentiate names. The URLs need not refer to actual Web pages or be formed properly.
Lines 9–11 use the namespace prefix text to describe elements file and descrip-
tion. Notice that the namespace prefix text is applied to the end tag name as well. Lines
13–16 apply namespace prefix image to elements file, description and size.
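A namespace-aware parser reports the URI bound to each prefix, so a program can tell the two file elements apart. The sketch below assumes Python 3 and the standard library's xml.dom.minidom module; the constant names TEXT_NS and IMAGE_NS are ours.

```python
from xml.dom import minidom

DIRECTORY = """<text:directory xmlns:text = "http://www.deitel.com/ns/python1e"
   xmlns:image = "http://www.deitel.com/images/ns/120101">
   <text:file filename = "book.xml">
      <text:description>A book list</text:description>
   </text:file>
   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100" />
   </image:file>
</text:directory>"""

TEXT_NS = "http://www.deitel.com/ns/python1e"
IMAGE_NS = "http://www.deitel.com/images/ns/120101"

document = minidom.parseString(DIRECTORY)

# Select file elements by namespace URI and local name, not by prefix.
for uri in (TEXT_NS, IMAGE_NS):
    for element in document.getElementsByTagNameNS(uri, "file"):
        print(element.prefix, element.localName,
              element.getAttribute("filename"))
```

Selecting by URI rather than by prefix matters because the prefix is only a local abbreviation; another document could bind a different prefix to the same URI.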
To eliminate the need to precede each tag name with a namespace prefix, document
authors can specify a default namespace. Figure 15.5 demonstrates the creation and use of
default namespaces.
Line 6 defines a default namespace by binding a URI to attribute xmlns. Once this
default namespace is defined, tag names in child elements belonging to the namespace need
not be qualified by a namespace prefix. Element file (lines 9–11) is in the namespace
corresponding to the URI http://www.deitel.com/ns/python1e. Compare this to
lines 9–11 of Fig. 15.4, where we prefixed elements file and description with text.
The default namespace applies to element directory and all elements that are not
qualified with a namespace prefix. However, we can use a namespace prefix to specify a
different namespace for particular elements. For example, line 13 prefixes tag name file
with image to indicate that it is in the namespace corresponding to the URI
http://www.deitel.com/images/ns/120101, rather than in the default namespace.
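The effect of a default namespace can be checked by printing the namespace URI that the parser assigns to each element. The document below is our own reconstruction from the description above, not the book's listing, and the sketch assumes Python 3 with xml.dom.minidom.

```python
from xml.dom import minidom

# Reconstructed from the prose description; not the original figure.
DIRECTORY = """<directory xmlns = "http://www.deitel.com/ns/python1e"
   xmlns:image = "http://www.deitel.com/images/ns/120101">
   <file filename = "book.xml">
      <description>A book list</description>
   </file>
   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
   </image:file>
</directory>"""

document = minidom.parseString(DIRECTORY)

# Unprefixed elements inherit the default namespace; prefixed ones do not.
for element in document.getElementsByTagName("*"):
    print(element.tagName, "->", element.namespaceURI)
```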
article
   title
   date
   author
      firstName
      lastName
   summary
   contents
(called validating parsers4) read the DTD or Schema and check the XML document’s struc-
ture against it. If the XML document conforms to the DTD or Schema, then the XML docu-
ment is valid. Parsers that cannot validate documents against DTDs or Schemas are called
non-validating parsers. If an XML parser (validating or non-validating) is able to process an
XML document (that does not reference a DTD or Schema), the XML document is consid-
ered to be well formed (i.e., it is syntactically correct). By definition, a valid XML document
is a well-formed XML document. If a document is not well formed, the parser issues an error.
Software Engineering Observation 15.3
DTD and Schema documents are essential components for XML documents used in business-
to-business (B2B) transactions and mission-critical systems.
Fig. 15.7 Document Type Definition (DTD) for a business letter. (Part 1 of 2.)
4. Many DOM parsers and SAX parsers are validating parsers. Check your parser’s documentation to
determine whether it is a validating parser.
Line 4 uses the ELEMENT element type declaration to define rules for element
letter. In this case, letter contains one or more contact elements, one saluta-
tion element, one or more paragraph elements, one closing element and one sig-
nature element, in that sequence. The plus sign (+) occurrence indicator specifies that
an element must occur one or more times. Other indicators include the asterisk (*), which
indicates an optional element that can occur any number of times, and the question mark
(?), which indicates an optional element that can occur at most once. If an occurrence indi-
cator is omitted, exactly one occurrence is expected.
The contact element definition (line 7) specifies that it contains the name,
address1, address2, city, state, zip, phone and flag elements—in that order.
Exactly one occurrence of each is expected.
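The occurrence indicators behave like the operators of a regular expression, which suggests a way to experiment with them in Python. The sketch below is an analogy only: it is not a validating parser, it checks a single level of nesting, and it ignores the rules for the children of contact. It assumes Python 3; the names LETTER_MODEL and children_match are hypothetical.

```python
import re
from xml.dom import minidom

# Content model described in the text for element letter:
# contact+, salutation, paragraph+, closing, signature
LETTER_MODEL = re.compile(
    r"(contact )+(salutation )(paragraph )+(closing )(signature )")

def children_match(element, model):
    """Check the sequence of child element names against a regular expression."""
    names = "".join(child.tagName + " " for child in element.childNodes
                    if child.nodeType == child.ELEMENT_NODE)
    return model.fullmatch(names) is not None

good = minidom.parseString(
    "<letter><contact/><contact/><salutation/><paragraph/>"
    "<closing/><signature/></letter>").documentElement
bad = minidom.parseString(
    "<letter><contact/><salutation/><closing/><signature/></letter>"
).documentElement

print(children_match(good, LETTER_MODEL))   # True
print(children_match(bad, LETTER_MODEL))    # False: no paragraph element
```

Replacing + with * or ? in the pattern mirrors the other two indicators: zero or more occurrences, and at most one occurrence.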
Line 9 uses the ATTLIST attribute-list declaration to define an attribute (i.e., type)
for the contact element. Keyword #IMPLIED specifies that, if the parser finds a contact
element without a type attribute, the application can provide a value or ignore the
missing attribute. The absence of a type attribute cannot invalidate the document. Other
types of default values include #REQUIRED and #FIXED. Keyword #REQUIRED specifies
that the attribute must be present in the document and the keyword #FIXED specifies
that the attribute (if present) must always be assigned a specific value. For example,
<!ATTLIST address zip CDATA #FIXED "01757">
indicates that the value 01757 must be used for attribute zip; otherwise, the document is
invalid. If the attribute is not present, then the parser, by default, uses the fixed value that is
specified in the ATTLIST declaration. Flag CDATA specifies that attribute type contains
text that is not processed by the parser, but instead is passed to the application as is.
Software Engineering Observation 15.5
DTD syntax cannot describe an element’s (or attribute’s) data type.
Flag #PCDATA (line 11) specifies that the element can store parsed character data (i.e.,
text). Parsed character data cannot contain markup. Because they are used in markup, the
characters less than (<) and ampersand (&) must be replaced by their entity references (i.e.,
&lt; and &amp;). The ampersand character still may appear as the first character of an
entity reference.
See Appendix M, HTML/XHTML Special Characters, for a list of pre-defined entities.
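A program that generates XML can perform this replacement automatically. The following sketch assumes Python 3 and uses function escape from the standard library's xml.sax.saxutils module; the element name note and the sample text are made up for illustration.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape

raw = "AT&T reports that 3 < 5"

# escape replaces &, < and > with their entity references.
marked_up = "<note>" + escape(raw) + "</note>"
print(marked_up)

# The parser converts the entity references back to the original characters.
parsed = minidom.parseString(marked_up)
print(parsed.documentElement.firstChild.data)
```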
Line 18 defines an empty element named flag. Keyword EMPTY specifies that the ele-
ment cannot contain character data. Empty elements commonly are used for their attributes.
XML documents must reference a DTD explicitly. Figure 15.8 is an XML document
that conforms to letter.dtd (Fig. 15.7).
This XML document is similar to that in Fig. 15.3. Line 6 references a DTD file. This
markup contains three pieces: The name of the root element (letter in line 8) to which
the DTD is applied, the keyword SYSTEM (which in this case denotes an external DTD—
a DTD defined in a separate file) and the DTD’s name and location (i.e., letter.dtd in
the current directory). Though almost any file extension can be used, DTD documents typ-
ically end with the .dtd extension.
Various tools (many of which are free) check document conformity against DTDs and
Schemas (discussed momentarily). The output in Fig. 15.9 shows the results of validating
letter2.xml against letter.dtd using Microsoft’s XML Validator. Microsoft XML
Validator is available free for download from msdn.microsoft.com/downloads/
samples/Internet/xml/xml_validator/sample.asp. For additional valida-
tion tools, visit www.w3.org/XML/Schema.html.
The Microsoft XML Validator can validate XML documents against DTDs locally or
by uploading the documents to the XML Validator Web site. Here, letter2.xml and
letter.dtd are placed in folder /pythonhtp1/pythonhtp1_examples/Ch15.
This XML document (letter2.xml) is valid because it conforms to letter.dtd.
XML documents that fail validation still may be well-formed documents. When a document
fails to conform to a DTD or Schema, Microsoft XML Validator displays an error message.
For example, the DTD in Fig. 15.7 indicates that the contact element must contain
child element name. If this element is omitted, the document is well formed, but not valid. In
such a scenario, Microsoft XML Validator displays the error message shown in Fig. 15.10.
to be valid. The application that uses the XML document containing this markup would
need to test whether the data in element quantity is numeric and take appropriate action
if the data is not numeric.
XML Schema enables Schema authors to specify that element quantity’s data must
be numeric. When a parser validates the XML document against this Schema, the parser
can determine that 5 conforms and that hello does not. An XML document that conforms
to a schema document is schema valid and a document that does not conform is invalid.
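Without a Schema, the numeric test falls to the application. A minimal sketch of such a test follows. It assumes Python 3, assumes the markup has the form <quantity>5</quantity>, and treats "numeric" as "can be read as an integer"; the function name quantity_is_numeric is hypothetical.

```python
from xml.dom import minidom

def quantity_is_numeric(xml_text):
    """Return True if the text in element quantity can be read as an integer."""
    element = minidom.parseString(xml_text).documentElement
    text = "".join(node.data for node in element.childNodes
                   if node.nodeType == node.TEXT_NODE)
    try:
        int(text)
    except ValueError:
        return False
    return True

print(quantity_is_numeric("<quantity>5</quantity>"))       # True
print(quantity_is_numeric("<quantity>hello</quantity>"))   # False
```

Both documents are well formed, so the parser accepts them; only the extra test distinguishes 5 from hello. A Schema-validating parser would perform that check before the application sees the data.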
In this section, we use XSV (XML Schema Validator) to validate XML documents against
W3C XML Schema. To use XSV online, visit www.w3.org/2000/09/webdata/xsv,
enter the name of the XML file to validate, then press the Upload and Get Results button.
Visit www.ltg.ed.ac.uk/~ht/xsv-status.html to download XSV.
Software Engineering Observation 15.6
Many organizations and individuals are creating DTDs and schemas for a broad range of
applications (e.g., financial transactions, medical prescriptions, etc.). These collections—
called repositories—often are available free for download from the Web (e.g.,
www.dtd.com).
Figure 15.11 shows a Schema-valid XML document (book.xml) and Fig. 15.12 shows
the W3C XML Schema document (book.xsd) that defines the structure for book.xml.
W3C XML Schemas typically use the .xsd extension, although this is not required.
Figure 15.11 shows the result of validating book.xml against Schema book.xsd. Note
that the output is XML, and the outcome='success' and schemaErrors='0'
attributes indicate that book.xml is valid.
W3C XML Schemas use the namespace URI http://www.w3.org/2001/XMLSchema
and often use namespace prefix xsd (line 6 in Fig. 15.12). Root element
schema contains elements that define an XML document’s structure. Line 7 binds the URI
http://www.deitel.com/booklist to namespace prefix deitel. Line 8 specifies
the targetNamespace, which is the namespace for elements and attributes that this
Schema defines.
Good Programming Practice 15.1
By convention, W3C XML Schema authors use namespace prefix xsd when referring to the
URI http://www.w3.org/2001/XMLSchema.
Fig. 15.11 XML document that conforms to a W3C XML Schema. (Part 1 of 2.)
<?xml version='1.0'?>
<xsv docElt='{http://www.deitel.com/booklist}books'
   instanceAssessed='true' instanceErrors='0'
   rootType='{http://www.deitel.com/booklist}:BooksType'
   schemaDocs='/pythonhtp1_examples/ch15/Schema/book.xsd'
   schemaErrors='0'
   target='file:/pythonhtp1_examples/ch15/Schema/book.xml'
   validation='strict'
   version='XSV 1.203.2.37/1.106.2.19 of 2001/11/29 11:00:00'
   xmlns='http://www.w3.org/2000/05/xsv'>
<schemaDocAttempt URI='file:/pythonhtp1_examples/ch15/Schema/book.xsd'
   outcome='success' source='command line'/>
</xsv>
In W3C XML Schema, element element (line 10) defines an element. Attributes
name and type specify the element’s name and data type, respectively. In this case, the
name of the element is books and the data type is deitel:BooksType. Any element
(e.g., books) that contains attributes or child elements must define a complex type, which
defines each attribute and child element. Type deitel:BooksType (lines 12–17) is an
example of a complex type. We prefix BooksType with deitel, because this is a com-
plex type that we have created, not an existing W3C XML Schema data type.
Lines 12–17 use element complexType to define an element type that has a child
element named book. Because book contains a child element, its type must be a complex
type (e.g., BookType). Attribute minOccurs specifies that books must contain a min-
imum of one book element. Attribute maxOccurs, with value unbounded (line 14)
specifies that books may have any number of book child elements. Element sequence
specifies the order of elements in the complex type.
Lines 19–23 define the complexType BookType. Line 21 defines element title
with type xsd:string. When an element has a simple type such as xsd:string, it
is prohibited from containing attributes and child elements. W3C XML Schema provides a
large number of data types such as xsd:date for dates, xsd:int for integers,
xsd:double for floating-point numbers and xsd:time for time.
The Schema in Fig. 15.12 indicates that every book element must contain child ele-
ment title. If this element is omitted, the document is well formed, but not valid. If we
remove line 8 from Fig. 15.11, XSV displays the error message shown in Fig. 15.13.
C:\PROGRA~1\XSV>xsv /pythonhtp1/pythonhtp1_examples/Ch15/Schema/book.xml
   /pythonhtp1/pythonhtp1_examples/Ch15/Schema/book.xsd
<?xml version='1.0'?>
<xsv docElt='{http://www.deitel.com/booklist}books'
   instanceAssessed='true' instanceErrors='1'
   rootType='{http://www.deitel.com/booklist}:BooksType'
   schemaDocs='/pythonhtp1/pythonhtp1_examples/Ch15/Schema/book.xsd'
   schemaErrors='0'
   target='file:/pythonhtp1/pythonhtp1_examples/Ch15/Schema/book.xml'
   validation='strict'
   version='XSV 1.203.2.37/1.106.2.19 of 2001/11/29 11:00:00'
   xmlns='http://www.w3.org/2000/05/xsv'>
<schemaDocAttempt URI='file:/pythonhtp1/pythonhtp1_examples/Ch15/Schema/book.xsd'
   outcome='success' source='command line'/>
<invalid char='4' code='cvc-complex-type.1.2.4' line='8'
   resource='file:/pythonhtp1/pythonhtp1_examples/Ch15/Schema/book.xml'>content
of book is not allowed to end here (1), expecting ['{None}:title']:
<fsm>
<node id='1'>
<edge dest='2' label='{None}:title'/>
</node>
<node final='true' id='2'/>
</fsm></invalid>
</xsv>
Fig. 15.13 XML document that does not conform to a W3C XML Schema.
Extensible Stylesheet Language (XSL), which is introduced in Section 15.8. The following
subsections describe MathML, Chemical Markup Language (CML) and other XML vocab-
ularies.
15.7.1 MathML™
Until recently, computers typically required specialized software packages such as TeX
and LaTeX to display complex mathematical expressions. This section introduces Math-
ML, which the W3C developed for describing mathematical notations and expressions.
One application that can parse and render MathML is the W3C’s Amaya™ browser/editor,
which can be downloaded at no charge from
www.w3.org/Amaya/User/BinDist.html
This Web page contains download links for the Windows 95/98/NT/2000, Linux ® and So-
laris™ platforms. Amaya documentation and installation notes also are available at the
W3C Web site.
MathML markup describes mathematical expressions for display. Figure 15.14 uses
MathML to mark up a simple expression. [Note: In this section, we provide sample outputs
that illustrate how a MathML-enabled application might render the markup.]
1 <?xml version="1.0"?>
2
3 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
4 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
5
6 <!-- Fig. 15.14: mathml1.html -->
7 <!-- Simple MathML. -->
8
9 <html xmlns = "http://www.w3.org/1999/xhtml">
10
11 <head><title>Simple MathML Example</title></head>
12
13 <body>
14
15 <math xmlns = "http://www.w3.org/1998/Math/MathML">
16
17 <mrow>
18 <mn>2</mn>
19 <mo>+</mo>
20 <mn>3</mn>
21 <mo>=</mo>
22 <mn>5</mn>
23 </mrow>
24
25 </math>
26
27 </body>
28 </html>
(2 + 3 = 5)
Fig. 15.14 Expression marked up with MathML. (Part 2 of 2.)
We embed the MathML content into an XHTML document by using a math element
with the default namespace http://www.w3.org/1998/Math/MathML (line 15).
The mrow element (line 17) is a container element for expressions that contain more than
one element. In this case, the mrow element contains five children. The mn element (line
18) marks up a number. The mo element (line 19) marks up an operator (e.g., +). Using this
markup, we define the expression 2 + 3 = 5, which a software program that supports
MathML could display.
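Because MathML is XML, any XML parser can read this markup. The sketch below walks the children of the mrow element and rebuilds the expression as text. It assumes Python 3 and the standard library's xml.dom.minidom module, and it uses only the math element rather than the surrounding XHTML document.

```python
from xml.dom import minidom

MATHML = """<math xmlns = "http://www.w3.org/1998/Math/MathML">
   <mrow>
      <mn>2</mn>
      <mo>+</mo>
      <mn>3</mn>
      <mo>=</mo>
      <mn>5</mn>
   </mrow>
</math>"""

document = minidom.parseString(MATHML)
mrow = document.getElementsByTagName("mrow")[0]

# Collect (kind, text) pairs for the number and operator children of mrow.
tokens = []
for child in mrow.childNodes:
    if child.nodeType == child.ELEMENT_NODE:
        kind = "number" if child.tagName == "mn" else "operator"
        tokens.append((kind, child.firstChild.data))

print(len(tokens))                                # 5 children
print(" ".join(text for kind, text in tokens))    # 2 + 3 = 5
```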
Let us now consider using MathML to mark up an algebraic equation that uses expo-
nents and arithmetic operators (Fig. 15.15).
1 <?xml version="1.0"?>
2
3 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
4 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
5
6 <!-- Fig. 15.15: mathml2.html -->
7 <!-- Simple MathML. -->
8
9 <html xmlns = "http://www.w3.org/1999/xhtml">
10
11 <head><title>Algebraic MathML Example</title></head>
12
13 <body>
14
15 <math xmlns = "http://www.w3.org/1998/Math/MathML">
16 <mrow>
17
18 <mrow>
19 <mn>3</mn>
20 <mo>&InvisibleTimes;</mo>
21
22 <msup>
23 <mi>x</mi>
24 <mn>2</mn>
25 </msup>
26
27 </mrow>
28
29 <mo>+</mo>
30 <mi>x</mi>
31 <mo>-</mo>
32
33 <mfrac>
34 <mn>2</mn>
35 <mi>x</mi>
36 </mfrac>
37
38 <mo>=</mo>
39 <mn>0</mn>
40
41 </mrow>
42 </math>
43
44 </body>
45 </html>
$3x^{2} + x - \frac{2}{x} = 0$
Fig. 15.15 Algebraic equation marked up with MathML. (Part 2 of 2.)
Element mrow behaves like parentheses, which allow the document author to group
related elements properly. Line 20 uses entity reference &InvisibleTimes; to indicate a
multiplication operation without a symbolic representation (i.e., the multiplication symbol
does not appear between the 3 and x). For exponentiation, line 22 uses the msup element,
which represents a superscript. This msup element has two children: the expression to be
superscripted (i.e., the base) and the superscript (i.e., the exponent). Similarly, the msub
element represents a subscript. To display variables such as x, line 23 uses identifier element mi.
To display a fraction, line 33 uses element mfrac. Lines 34–35 specify the numerator
and the denominator for the fraction. If either the numerator or the denominator contains
more than one element, it must be nested in an mrow element.
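To make the roles of the child elements concrete, here is a small sketch that walks the markup of Fig. 15.15 and renders it as a plain string. It assumes Python 3 and the standard library's xml.etree.ElementTree; the function name to_text is our own, and the sketch handles only the elements used in this figure. Because a plain XML parser does not define MathML's named entities, the sketch writes the invisible-times character as the numeric reference &#x2062; instead.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/1998/Math/MathML}"

MATHML = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mrow>
      <mn>3</mn>
      <mo>&#x2062;</mo>
      <msup><mi>x</mi><mn>2</mn></msup>
    </mrow>
    <mo>+</mo>
    <mi>x</mi>
    <mo>-</mo>
    <mfrac><mn>2</mn><mi>x</mi></mfrac>
    <mo>=</mo>
    <mn>0</mn>
  </mrow>
</math>"""

def to_text(element):
    """Render a small subset of presentation MathML as a plain string."""
    tag = element.tag.replace(NS, "")
    children = list(element)
    if tag in ("mn", "mi"):
        return element.text.strip()
    if tag == "mo":
        symbol = element.text.strip()
        # U+2062 (invisible times) multiplies without showing a symbol
        return "" if symbol == "\u2062" else " " + symbol + " "
    if tag == "msup":      # two children: base, then exponent
        return to_text(children[0]) + "^" + to_text(children[1])
    if tag == "mfrac":     # two children: numerator, then denominator
        return to_text(children[0]) + "/" + to_text(children[1])
    # math and mrow only group their children
    return "".join(to_text(child) for child in children)

rendered = to_text(ET.fromstring(MATHML))
print(rendered)
```

The order of the children carries the meaning: swapping the two children of msup or mfrac would exchange base and exponent, or numerator and denominator.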
Figure 15.16 marks up a calculus expression that contains an integral symbol and a
square-root symbol.
1 <?xml version="1.0"?>
2
3 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
4 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
5
6 <!-- Fig. 15.16: mathml3.html -->
7 <!-- Calculus example using MathML. -->
8
9 <html xmlns = "http://www.w3.org/1999/xhtml">
10
11 <head><title>Calculus MathML Example</title></head>
12
13 <body>
14
15 <math xmlns = "http://www.w3.org/1998/Math/MathML">
16 <mrow>
17 <msubsup>
18
19 <mo>&Integral;</mo>
20 <mn>0</mn>
21
22 <mrow>
23 <mn>1</mn>
24 <mo>-</mo>
25 <mi>y</mi>
26 </mrow>
27
28 </msubsup>
29
30 <msqrt>
31 <mrow>
32
33 <mn>4</mn>
34 <mo>&InvisibleTimes;</mo>
35
36 <msup>
37 <mi>x</mi>
38 <mn>2</mn>
39 </msup>
40
41 <mo>+</mo>
42 <mi>y</mi>
43
44 </mrow>
45 </msqrt>
46
47 <mo>&delta;</mo>
48 <mi>x</mi>
49 </mrow>
50 </math>
51 </body>
52 </html>
$\int_{0}^{1-y} \sqrt{4x^{2} + y}\; \delta x$ (callouts in the rendered output label the integral symbol and the delta symbol)
The entity reference &Integral; (line 19) represents the integral symbol, while the
msubsup element (line 17) specifies the superscript and subscript. Element mo marks up
the integral operator. Element msubsup requires three child elements: an operator (e.g.,
the integral entity reference), the subscript expression (line 20) and the superscript
expression (lines 22–26). Element mn (line 20) marks up the number (i.e., 0) that represents the
subscript. Element mrow marks up the expression (i.e., 1-y) that serves as the superscript.
Element msqrt (lines 30–45) represents a square root expression. Line 31 uses element
mrow to group the expression contained in the square root. Line 47 introduces entity
reference &delta; for representing a delta symbol. Delta is an operator, so line 47 places
this entity reference in element mo. To see other operations and symbols in MathML, visit
www.w3.org/Math.
pythonhtp1_15.fm Page 514 Saturday, December 15, 2001 2:12 PM
1 <?jumbo:namespace ns = "http://www.xml-cml.org"
2 prefix = "C" java = "jumbo.cmlxml.*Node" ?>
3
4 <!-- Fig. 15.17: ammonia.xml -->
5 <!-- Structure of ammonia. -->
6
7 <C:molecule id = "Ammonia">
8
9 <C:atomArray builtin = "elsym">
10 N H H H
11 </C:atomArray>
12
13 <C:atomArray builtin = "x2" type = "float">
14 1.5 0.0 1.5 3.0
15 </C:atomArray>
16
17 <C:atomArray builtin = "y2" type = "float">
18 1.5 1.5 0.0 1.5
19 </C:atomArray>
20
21 <C:bondArray builtin = "atid1">
22 1 1 1
23 </C:bondArray>
24
25 <C:bondArray builtin = "atid2">
26 2 3 4
27 </C:bondArray>
28
29 <C:bondArray builtin = "order" type = "integer">
30 1 1 1
31 </C:bondArray>
32
33 </C:molecule>
6. At the time of this writing, Jumbo did not allow users to load documents for rendering. For illus-
tration purposes, we created the image shown in Fig. 15.17.
[Rendered image: ammonia molecule, one nitrogen atom bonded to three hydrogen atoms.]
Fig. 15.17 CML markup for ammonia molecule. (Part 2 of 2.)
Vocabulary | Description
VoiceXML™ | The VoiceXML forum founded by AT&T, IBM, Lucent and Motorola developed VoiceXML. It provides interactive voice communication between humans and computers through a telephone, PDA (personal digital assistant) or desktop computer. IBM’s VoiceXML SDK can process VoiceXML documents. Visit www.voicexml.org for more information on VoiceXML.
Geography Markup Language (GML) | The OpenGIS developed the GML to describe geographic information. Visit www.opengis.org for more information on GML.
Extensible User Interface Language (XUL) | The Mozilla project created XUL for describing graphical user interfaces in a platform-independent way. For more information visit: www.mozilla.org/xpfe/languageSpec.html.
7. The example in this section requires msxml 3.0 or higher to run. For more information on down-
loading and installing msxml 3.0, visit www.deitel.com.
8. XHTML is the W3C Recommendation that replaces HTML for marking up content for the Web.
For more information on XHTML, see the XHTML Appendices I and J.
Line 1 of Fig. 15.20 contains the XML declaration. This line is present because an
XSLT document is an XML document. Line 6 is the xsl:stylesheet root element.
Attribute version specifies the version of XSLT to which this document conforms.
Namespace prefix xsl is defined and bound to the XSLT URI defined by the W3C. When
processed, lines 11–13 write the document type declaration to the result tree. Attribute
method is assigned "xml", which indicates that XML is being output to the result tree.
Attribute omit-xml-declaration is assigned "no", which indicates that an XML
declaration will be output to the result tree. Attribute doctype-system and doctype-
public contain the Doctype DTD information that is output to the result tree.
Fig. 15.20 XSLT document that transforms sorting.xml into XHTML. (Part 1 of 3.)
39
40 <xsl:for-each select = "chapters/frontMatter/*">
41 <tr>
42 <td style = "text-align: right">
43 <xsl:value-of select = "name()" />
44 </td>
45
46 <td>
47 ( <xsl:value-of select = "@pages" /> pages )
48 </td>
49 </tr>
50 </xsl:for-each>
51
52 <xsl:for-each select = "chapters/chapter">
53 <xsl:sort select = "@number" data-type = "number"
54 order = "ascending" />
55 <tr>
56 <td style = "text-align: right">
57 Chapter <xsl:value-of select = "@number" />
58 </td>
59
60 <td>
61 ( <xsl:value-of select = "@pages" /> pages )
62 </td>
63 </tr>
64 </xsl:for-each>
65
66 <xsl:for-each select = "chapters/appendix">
67 <xsl:sort select = "@number" data-type = "text"
68 order = "ascending" />
69 <tr>
70 <td style = "text-align: right">
71 Appendix <xsl:value-of select = "@number" />
72 </td>
73
74 <td>
75 ( <xsl:value-of select = "@pages" /> pages )
76 </td>
77 </tr>
78 </xsl:for-each>
79 </table>
80
81 <p style = "color: blue">Pages:
82 <xsl:variable name = "pagecount"
83 select = "sum(chapters//*/@pages)" />
84 <xsl:value-of select = "$pagecount" />
85 <br />Media Type:
86 <xsl:value-of select = "media/@type" /></p>
87 </body>
88 </xsl:template>
89
90 </xsl:stylesheet>
Fig. 15.20 XSLT document that transforms sorting.xml into XHTML. (Part 2 of 3.)
Fig. 15.20 XSLT document that transforms sorting.xml into XHTML. (Part 3 of 3.)
XSLT documents contain one or more xsl:template elements that specify which
information the XSLT processor outputs to the result tree. The template on line 16
matches the source tree’s document root. When the document root is encountered during
the transformation, this template is applied, and any text marked up by this element that
is not in the namespace referenced by xsl is output to the result tree. Line 18 calls for
all the templates that match children of the document root to be applied. Line 23
specifies a template that matches element book.
Lines 25–26 create the title for the XHTML document. We use the ISBN of the book
from attribute isbn and the contents of element title to create the title string ISBN
999-99999-9-X - Mary’s XML Primer. Element xsl:value-of selects the book ele-
ment’s isbn attribute.
Lines 33–35 create a header element that contains the book’s author. Because the con-
text node (i.e., the current node being processed) is book, the expression author/last-
Name selects the author’s last name, and the expression author/firstName selects the
author’s first name.
Line 40 selects each element (indicated by an asterisk) that is a child of element
frontMatter. Line 43 calls node-set function name to retrieve the current node’s ele-
ment name (e.g., preface). The current node is the context node specified in the
xsl:for-each (line 40).
Lines 53–54 sort chapters by number in ascending order. Attribute select selects
the value of context node chapter’s attribute number. Attribute data-type, with
value "number", specifies a numeric sort, and attribute order specifies "ascending"
order. Attribute data-type also can be assigned the value "text" (line 67), and
attribute order also may be assigned the value "descending".
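The difference between the two data-type values is easy to see outside XSLT. The sketch below is an analogy in Python 3, not XSLT itself: sorted with no key compares strings character by character, roughly like a "text" sort (an XSLT processor's text ordering may also depend on language settings), while key=float compares numeric values, like a "number" sort. The chapter numbers are made up for illustration.

```python
# hypothetical chapter numbers, stored as attribute strings
chapter_numbers = ["10", "2", "1", "3"]

as_text = sorted(chapter_numbers)                # character-by-character comparison
as_number = sorted(chapter_numbers, key=float)   # numeric comparison
descending = sorted(chapter_numbers, key=float, reverse=True)

print(as_text)      # "10" sorts before "2" because "1" < "2"
print(as_number)
print(descending)
```

This is why the chapter numbers in Fig. 15.20 are sorted with data-type "number", whereas the appendix identifiers are sorted as "text".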
Lines 82–83 use an XSLT variable to store the value of the book’s page count, and line 84
outputs it to the result tree. Attribute name specifies the variable’s name, and attribute
select assigns it a value. Function sum totals the values of all pages attributes.
The two slashes between chapters and * indicate that all descendant nodes of chapters
are searched for elements that contain an attribute named pages.
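The effect of sum(chapters//*/@pages) can be imitated in a few lines of Python. This is a sketch under stated assumptions: it uses Python 3 with the standard library's xml.etree.ElementTree rather than an XSLT processor, and the BOOK string is a hypothetical stand-in for sorting.xml with invented page counts.

```python
import xml.etree.ElementTree as ET

# hypothetical stand-in for sorting.xml; the page counts are invented
BOOK = """<book>
  <chapters>
    <frontMatter>
      <preface pages="2"/>
      <contents pages="5"/>
    </frontMatter>
    <chapter number="2" pages="35"/>
    <chapter number="1" pages="28"/>
    <appendix number="A" pages="7"/>
  </chapters>
</book>"""

chapters = ET.fromstring(BOOK).find("chapters")

# iter() visits chapters and every descendant element; skipping chapters
# itself leaves exactly the descendants, as chapters//* does
page_count = sum(int(node.get("pages"))
                 for node in chapters.iter()
                 if node is not chapters and node.get("pages") is not None)
print(page_count)
```

Elements at any depth contribute to the total, so the preface nested inside frontMatter is counted along with the chapters and the appendix.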
Figure 15.21 shows the XHTML that is generated when msxml applies
sorting.xsl to sorting.xml. In Chapter 16, we use several of Python’s XML-
related packages to apply XSLT style sheets to XML documents.
Notice that the XHTML document contains an XML declaration that is different from
what was shown previously. Value encoding indicates the type of character encoding
(i.e., a set of numeric values associated with characters) the document uses. This document
uses UTF-8, which is well suited for ASCII-based systems. UTF-8 is the default encoding
for XML documents. More information on character encoding and UTF-8 may be found in
Appendix F, Unicode.
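One way to see why UTF-8 suits ASCII-based systems is to look at the bytes it produces. The following sketch assumes Python 3, in which strings are Unicode; the helper name utf8_bytes is our own.

```python
def utf8_bytes(character):
    """Return the UTF-8 encoding of one character as a list of byte values."""
    return list(character.encode("utf-8"))

for character in ["A", "é", "€"]:
    print(repr(character), ord(character), utf8_bytes(character))

# text that contains only ASCII characters has the same bytes in both encodings
ascii_text = "XML"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
print(same_bytes)
```

ASCII characters occupy a single byte with the same value as in ASCII, whereas characters outside ASCII take two or more bytes.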
www.xmlbooks.com
This site contains a list of XML books recommended by Charles Goldfarb—one of the original de-
signers of GML (General Markup Language), from which XML’s parent language SGML (Standard
Generalized Markup Language) was derived.
wdvl.internet.com/Authoring/Languages/XML
The Web Developer's Virtual Library XML site includes tutorials, FAQ, the latest news and extensive
links to XML sites and software downloads.
www.xml.com
Visit xml.com for the latest news and information about XML, conference listings, links to XML
Web resources organized by topic, tools and more.
msdn.microsoft.com/xml/default.asp
The MSDN Online XML Development Center features articles on XML, Ask the Experts chat ses-
sions, samples and demos, newsgroups and other helpful information.
www.oasis-open.org/cover/xml.html
The SGML/XML Web Page is an extensive resource that includes links to FAQs, online resources, in-
dustry initiatives, demos, conferences and tutorials.
www.gca.org/whats_xml/default.htm
The GCA site has an XML glossary, list of books, brief descriptions of the draft standards for XML
and links to online drafts.
www.xmlinfo.com
XMLINFO is a resource site with tutorials, a list of recommended books, documentation, discussion
forums and more.
developer.netscape.com/tech/xml/index.html
The XML and Metadata Developer Central site has demos, technical notes and news articles related
to XML.
www.ucc.ie/xml
This site is a detailed XML FAQ. Submit your own questions through the site.
www.xml-cml.org
This site is a resource for the Chemical Markup Language (CML). It includes a FAQ list, documen-
tation, software and XML links.
SUMMARY
• XML is a widely supported open technology (i.e., nonproprietary technology) for data exchange.
• XML permits document authors to create their own markup for virtually any type of information.
This extensibility enables document authors to create entirely new markup languages (called vo-
cabularies) to describe specific types of data, including mathematical formulas, chemical molecu-
lar structures, music and recipes.
• XML allows document authors to create their own tags, so naming collisions (i.e., different ele-
ments that have the same name) can occur. Namespaces enable document authors to prevent col-
lisions among elements in an XML document.
• Namespace prefixes prepended to tag names specify the namespace in which the element can be
found. Each namespace prefix has a corresponding uniform resource identifier (URI) that uniquely
identifies the namespace. By definition, a URI is a series of characters that differentiates names.
Document authors can create their own namespace prefixes. Document authors can use virtually
any namespace prefix except the reserved namespace prefix xml.
• To eliminate the need to place a namespace prefix in each element, authors may specify a default
namespace for an element and all of its child elements.
• XML documents are highly portable. Opening an XML document does not require special soft-
ware—any text editor that supports ASCII/Unicode characters will suffice. One important charac-
teristic of XML is that it is both human readable and machine readable.
• Processing an XML document—which typically ends in the .xml extension—requires a software
program called an XML parser (or an XML processor). Parsers check an XML document’s syntax
and can support the Document Object Model (DOM) and/or the Simple API for XML (SAX) API.
• DOM-based parsers build a tree structure containing the XML document’s data in memory. This
allows programs to manipulate the document’s data. SAX-based parsers process the document and
generate events as the parser encounters tags, text, comments and so on. These events contain data
from the XML document.
• An XML document can reference an optional document that defines the XML document’s struc-
ture. This optional document can be either a Document Type Definition (DTD) or a Schema.
• A DOM tree has a single root node that contains all other nodes in the document. Each node is an
object that has properties and methods. The XML parser exposes these methods and properties as a
programmatic library, called an Application Programming Interface (API).
• A node that contains other nodes (called child nodes) is a parent node. Nodes that are peers are
sibling nodes. A node’s descendant nodes include that node’s children, its children’s children and
so on. A node’s ancestor nodes include that node’s parent, its parent’s parent and so on.
• If the XML document conforms to its DTD or Schema, then the XML document is valid. Parsers
that cannot check for document conformity against DTDs or Schemas are called nonvalidating
parsers. If an XML parser (validating or nonvalidating) can process an XML document that does
not have a DTD or Schema successfully, the XML document is well formed (i.e., it is syntactically
correct). By definition, a valid XML document also is a well-formed document.
• The ATTLIST element type declaration in a DTD defines an attribute. Keyword #IMPLIED
specifies that, if the parser finds an element without the attribute, the application can provide a
value or ignore the missing attribute. Keyword #REQUIRED specifies that the attribute must be in the
document, and keyword #FIXED specifies that the attribute must have the given value. Flag CDATA
specifies that an attribute contains data that the parser should not process as markup. Keyword
EMPTY specifies that the element does not contain any text.
• Flag #PCDATA specifies that the element can store parsed character data (i.e., text). Document
authors must replace the characters less than (<) and ampersand (&) with their corresponding entity
references (i.e., &lt; and &amp;).
• Schemas use XML syntax.
• In XML Schema, element element defines an element. Attributes name and type specify the
element’s name and data type, respectively. Any element that contains attributes or child ele-
ments must define a type—called a complex type—that defines each attribute and child element.
• Attribute minOccurs specifies the minimum number of occurrences for an element. Attribute
maxOccurs specifies the maximum number of occurrences for an element.
• When an element is a simple type, such as xsd:string, that element cannot contain attributes
and child elements.
• MathML markup describes mathematical expressions.
• Chemical Markup Language (CML) marks up molecular and chemical information.
• The characters <? and ?> delimit processing instructions (PIs), which are application-specific in-
formation embedded in an XML document. A processing instruction consists of a PI target and a
PI value.
• Extensible Stylesheet Language (XSL) documents specify how programs should render an XML
document’s data. A subset of XSL—XSL Transformations (XSLT)—provides elements that de-
fine rules for transforming data from one XML document into another text-based format such as
XHTML.
• Transforming an XML document using XSLT involves two tree structures: The source tree (i.e.,
the XML document being transformed) and the result tree (i.e., the XML document to create).
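As a small illustration of the summary point on well-formed documents, the sketch below asks a nonvalidating parser whether some markup is syntactically correct. It assumes Python 3 and the standard library's xml.dom.minidom, which is built on the Expat parser and does not validate against a DTD or Schema; the function name is_well_formed and the sample strings are our own.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def is_well_formed(markup):
    """Return True if a nonvalidating parser accepts the markup."""
    try:
        minidom.parseString(markup).unlink()
        return True
    except ExpatError:
        return False

print(is_well_formed("<note><to>Sue</to></note>"))    # properly nested tags
print(is_well_formed("<note><to>Sue</note>"))         # missing </to> end tag
print(is_well_formed('<note text="R & D"/>'))         # unescaped ampersand
print(is_well_formed('<note text="R &amp; D"/>'))     # ampersand as entity reference
```

A result of True says only that the markup is well formed; deciding whether a document is valid additionally requires a validating parser and a DTD or Schema.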
TERMINOLOGY
ancestor node
asterisk (*) occurrence indicator
atomArray element
ATTLIST element type declaration
CDATA flag
child node
complexType element
container element
context node
data-type attribute
default namespace
descendent node
doctype-public attribute
doctype-system attribute
document reuse
document root
Document Type Definition (DTD)
DOM (Document Object Model)
DOM API (Application Programming Interface)
DOM-based XML parser
EBNF (Extended Backus-Naur Form) grammar
ELEMENT element type declaration
empty element
EMPTY keyword
event
Extensible Stylesheet Language (XSL)
external DTD
forward slash
#IMPLIED flag
invalid document
match attribute
maxOccurs attribute
minOccurs attribute
mn element
molecule element
mrow element
msqrt element
msub element
msubsup element
msxml parser
name attribute
name node-set function
namespace prefix
node
nonvalidating XML parser
occurrence indicator
order attribute
parent node
parsed character data
parser
#PCDATA flag
PI (processing instruction)
PI target
PI value
plus sign (+) occurrence indicator
processing instruction
question mark (?) occurrence indicator
result tree
root element
root node
SAX (Simple API for XML)
SAX-based parser
schema element
Schema valid
select attribute
simple type
single-quote character (')
source tree
stylesheet element
sum function
SYSTEM flag
targetNamespace attribute
tree-based model
type attribute
unbounded value
validating XML parser
well-formed document
XML (Extensible Markup Language)
XML declaration
.xml file extension
xml namespace prefix
XML parser
XML processor
XML Schema
SELF-REVIEW EXERCISES
15.1 Which of the following tag names might be found in a well-formed XML document?
a) yearBorn.
b) year.Born.
c) year Born.
d) year-Born1.
e) 2_year_born.
f) --year/born.
g) year*born.
h) .year_born.
i) _year_born_.
j) y_e-a_r-b_o-r_n.
15.2 State whether each of the following is true or false. If false, explain why.
a) XML is a technology for creating markup languages.
b) Forward and backward slashes (/ and \) delimit XML markup text.
c) All XML start tags must have corresponding end tags.
d) Parsers check an XML document’s syntax.
e) XML, in any mixture of case, is a reserved namespace prefix.
f) When creating XML documents, document authors must use the set of XML tags that the
W3C provides.
g) In an XML document, the pound character (#), the dollar sign ($), ampersand (&), great-
er-than (>) and less-than (<) must be replaced with their corresponding entity references.
15.3 Fill in the blanks for each of the following statements:
a) MathML element ________ defines a mathematical operator.
b) ________ help avoid naming collisions.
c) ________ embed application-specific information into an XML document.
d) ________ is Microsoft’s XML parser.
e) XSL element ________ inserts a DOCTYPE in the result tree.
f) XML Schema documents have root element ________.
g) Element ________ marks up the &Integral; MathML entity reference.
h) ________ defines attributes in a DTD.
i) XSL element ________ is the root element in an XSL document.
j) XSL element ________ selects specific XML elements using repetition.
15.4 State whether each of the following is true or false. If false, explain why.
a) XML is not case sensitive.
b) An XML document may contain only one root element.
c) XML is a formatting language.
d) A DTD/Schema defines the style of an XML document.
e) MathML is an XML vocabulary.
f) XSL is an acronym for XML Stylesheet Language.
g) The <!ELEMENT list (item*)> defines element list as containing one or more
item elements.
h) XML documents must have the .xml extension.
15.5 Find the error(s) in each of the following and explain how to correct it (them).
a) <job>
<title>Manager</title>
<task number = "42">
</job>
b) <mfrac>
<mi>x</mi>
<mo>+</mo>
<mn>4</mn>
<mi>y</mi>
</mfrac>
c) <company name = "Deitel & Associates, Inc." />
15.6 What is the #PCDATA flag used for?
15.7 Write a processing instruction for Internet Explorer that includes the style sheet wap.xsl.
EXERCISES
15.8 Create an XML document that marks up the nutrition facts for a package of Grandma Deitel’s
Cookies. A package of Grandma Deitel’s Cookies has a serving size of 1 package and the following
nutritional value per serving: 260 calories, 100 fat calories, 11 grams of fat, 2 grams of saturated fat,
16
Python XML Processing
Objectives
• To create XML markup programmatically.
• To use the Document Object Model (DOM™) to
manipulate XML documents.
• To use the Simple API for XML (SAX) to retrieve
data from XML documents.
• To create an XML-based message forum.
Knowing trees, I understand the meaning of patience.
Knowing grass, I can appreciate persistence.
Hal Borland
I think that I shall never see
A poem lovely as a tree.
Joyce Kilmer
I played with an idea, and grew willful; tossed it into the air;
transformed it; let it escape and recaptured it; made it
iridescent with fancy, and winged it with paradox.
Oscar Wilde
pythonhtp1_16.fm Page 530 Wednesday, December 19, 2001 2:46 PM
Outline
16.1 Introduction
16.2 Generating XML Content Dynamically
16.3 XML Processing Packages
16.4 Document Object Model (DOM)
16.5 Parsing XML with xml.sax
16.6 Case Study: Message Forums with Python and XML
16.6.1 Displaying the Forums
16.6.2 Adding Forums and Messages
16.6.3 Alterations for Browsers without XML and XSLT Support
16.7 Internet and World Wide Web Resources
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
16.1 Introduction
In Chapter 15, we introduced XML and various XML-related technologies. In this chapter,
we demonstrate how Python applications and scripts can process XML documents. Support
for XML is provided through a large collection of freely available Python packages and
modules. This chapter focuses on two of these Python packages: 4DOM and xml.sax.
In this chapter, we discuss how to generate XML content programmatically. We introduce
DOM- and SAX-based parsing for programmatically manipulating an XML document’s
data. The chapter concludes with a case study that uses XML to mark up an online
message forum’s data.
O'Black, John
Green, Sue
Red, Bob
Blue, Mary
White, Mike
Brown, Jane
Gray, Bill
Fig. 16.1 Text file names.txt used in Fig. 16.2.
1 #!c:\Python\python.exe
2 # Fig. 16.2: fig16_02.py
3 # Marking up a text file's data as XML.
4
5 import sys
6
7 print "Content-type: text/xml\n"
8
9 # write XML declaration and processing instruction
10 print """<?xml version = "1.0"?>
11 <?xml:stylesheet type = "text/xsl"
12    href = "../XML/contact_list.xsl"?>"""
13
14 # open data file
15 try:
16    file = open( "names.txt", "r" )
17 except IOError:
18    sys.exit( "Error opening file" )
19
20 print "<contacts>" # write root element
21
22 # list of tuples: ( special character, entity reference )
23 replaceList = [ ( "&", "&amp;" ),
24                 ( "<", "&lt;" ),
25                 ( ">", "&gt;" ),
26                 ( '"', "&quot;" ),
27                 ( "'", "&apos;" ) ]
28
29 # replace special characters with entity references
30 for currentLine in file.readlines():
31
32    for oldValue, newValue in replaceList:
33       currentLine = currentLine.replace( oldValue, newValue )
34
35    # extract lastname and firstname
36    last, first = currentLine.split( ", " )
37    first = first.strip() # remove carriage return
38
39    # write contact element
40    print """   <contact>
41       <LastName>%s</LastName>
42       <FirstName>%s</FirstName>
43    </contact>""" % ( last, first )
44
45 file.close()
46
47 print "</contacts>"
Line 7 prints the HTTP header, which sets the MIME type to text/xml. Lines 10–
12 print the XML declaration and a processing instruction for Internet Explorer. The pro-
cessing instruction references the XSLT style sheet contact_list.xsl (Fig. 16.3).
After the script prints the headers, lines 15–18 open the file (or exit, if the file could not
be opened). Line 20 prints the <contacts> start tag of the root element. A list of five tuples
is created in lines 23–27. Each tuple contains two values: a character and an entity reference
that corresponds to that character. The for loop in lines 30–43 generates XML elements for
each name in the file. Lines 32–33 call method replace to substitute characters (e.g., <, &,
etc.) with their corresponding entity references. The split method (line 36) extracts the last
name and first name from the line read from the file. Line 37 removes any whitespace (e.g.,
a carriage return) from the first name. The XML element containing the person’s name is
printed in lines 40–43. Finally, line 47 prints the root element’s end tag.
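The same markup step can be sketched with a library routine instead of a hand-written replacement list. The version below is an illustration under assumptions, not the figure's code: it targets Python 3, builds the document as a string rather than printing a CGI response, and uses xml.sax.saxutils.escape from the standard library. The names QUOTES, escape_text and contacts_to_xml are our own. Note that escape replaces the ampersand before anything else, just as the ampersand tuple comes first in replaceList; otherwise the ampersands inside newly inserted entity references would be escaped a second time.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > itself; the quote characters are passed as extras
QUOTES = {'"': "&quot;", "'": "&apos;"}

def escape_text(text):
    """Replace the five special characters with entity references."""
    return escape(text, QUOTES)

def contacts_to_xml(lines):
    """Mark up lines of the form 'Last, First' as a contacts document."""
    parts = ['<?xml version = "1.0"?>', "<contacts>"]
    for line in lines:
        last, first = line.strip().split(", ")
        parts.append("   <contact>")
        parts.append("      <LastName>%s</LastName>" % escape_text(last))
        parts.append("      <FirstName>%s</FirstName>" % escape_text(first))
        parts.append("   </contact>")
    parts.append("</contacts>")
    return "\n".join(parts)

print(contacts_to_xml(["O'Black, John", "Green, Sue"]))
```

Escaping each field after splitting, rather than the whole line before splitting, is a design choice of this sketch; it gives the same result here because none of the entity references contains the ", " separator.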
13 <head>
14 <title>Contact List</title>
15 </head>
16
17 <body>
18 <table border = "1">
19
20 <thead>
21 <tr>
22 <th>First Name</th>
23 <th>Last Name</th>
24 </tr>
25 </thead>
26
27 <!-- process each contact element -->
28 <xsl:for-each select = "contacts/contact">
29 <tr>
30 <td>
31 <xsl:value-of select = "FirstName" />
32 </td>
33 <td>
34 <xsl:value-of select = "LastName" />
35 </td>
36 </tr>
37 </xsl:for-each>
38
39 </table>
40
41 </body>
42
43 </html>
44
45 </xsl:template>
46
47 </xsl:stylesheet>
Another package, 4XSLT, contains an XSLT processor for transforming XML documents
into other text-based formats. 4XSLT is located in a package called 4Suite
(4suite.org; see footnote 3), from Fourthought, Inc. The classes and functions provided by
4XSLT are located in xml.xslt.
3. PyXML must be installed prior to installing 4Suite. Visit www.deitel.com for installation
instructions.
Lines 10–11 attempt to open article2.xml for reading. If the file cannot be
opened, the program exits with the message "Error opening file" (lines 12–13).
Line 17 instantiates a PyExpat Reader object, which is an instance of a DOM-based
parser. Module PyExpat is located in 4DOM’s reader package. Line 18 passes the XML
document referenced by file to Reader method fromStream, which parses the document
and loads the XML document’s data into memory. Variable document references
the DOM tree (called a Document) returned by fromStream.
A Document object’s documentElement attribute refers to the Document’s root
element node. Line 24 passes the root element node to 4DOM’s StripXml function, which
removes insignificant whitespace (e.g., the carriage returns, line feeds and spaces used for
indentation) from an XML DOM tree. If StripXml were not called, insignificant whitespace
would be stored in the DOM tree. Recall from Chapter 15 that a DOM tree contains a set
of nodes. Each node in a DOM tree is of a type derived from class Node. We say more
about these derived classes momentarily.
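To show what storing insignificant whitespace means in practice, the sketch below counts a root element's children before and after whitespace-only Text nodes are removed. It is an approximation under assumptions: it uses Python 3 and the standard library's xml.dom.minidom instead of 4DOM, the function strip_whitespace is our own stand-in for StripXml (it treats every whitespace-only Text node as insignificant), and the article markup is hypothetical.

```python
from xml.dom import minidom

def strip_whitespace(node):
    """Recursively remove Text children that contain only whitespace."""
    for child in list(node.childNodes):   # copy, because removal changes the list
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
            child.unlink()
        else:
            strip_whitespace(child)

# hypothetical document; the indentation produces whitespace-only Text nodes
ARTICLE = """<article>
   <title>Simple XML</title>
   <date>December 19, 2001</date>
</article>"""

document = minidom.parseString(ARTICLE)
root = document.documentElement

before = len(root.childNodes)    # Text, title, Text, date, Text
strip_whitespace(root)
after = len(root.childNodes)     # title, date
print(before, after)
print(root.firstChild.nodeName)
```

Without the stripping step, firstChild of the root element would be a Text node holding a line break and spaces rather than the title element.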
Lines 25–26 print the name of rootElement via its nodeName attribute. A Node
object’s childNodes attribute is a list of that Node’s children. Lines 31–32 print the
nodeName of each child node of rootElement. Lines 35–49 then print the names of
specific nodes. A Node object’s firstChild attribute corresponds to the first child node
in that Node’s list of children. Lines 35–36 assign the first child of rootElement to vari-
able child and print the child’s name.
Line 40 assigns the next sibling of child to variable sibling. Attribute
nextSibling contains a node’s next sibling (i.e., the next node that has the same parent
node). For example, title, date, author, summary and content are sibling nodes.
Line 41 prints sibling’s name.
Line 44 assigns the first child node of sibling to variable value. In this case,
value is a Text node that represents the contents of sibling. Text nodes contain
character data. Line 47 prints the text contained in value by accessing its nodeValue
attribute. Lines 48–49 print sibling’s parent node. Parent nodes are obtained through the
parentNode attribute. Finally, line 51 calls Reader method releaseNode, which
removes the specified Document (i.e., DOM tree) from memory.
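The navigation attributes discussed above also exist in the standard library's xml.dom.minidom, so the walk can be sketched without 4DOM. The following assumes Python 3; the article markup is hypothetical and written without whitespace between tags, so no stripping step is needed. In minidom, method unlink plays a role comparable to releaseNode.

```python
from xml.dom import minidom

# hypothetical document with no whitespace between elements
ARTICLE = ("<article><title>Simple XML</title>"
           "<date>December 19, 2001</date></article>")

document = minidom.parseString(ARTICLE)
root = document.documentElement

child = root.firstChild         # first child of the root element
sibling = child.nextSibling     # next node that has the same parent
value = sibling.firstChild      # Text node holding sibling's character data

child_names = [node.nodeName for node in root.childNodes]
sibling_text = value.nodeValue
parent_name = sibling.parentNode.nodeName

print(root.nodeName)
print(child_names)
print(child.nodeName, sibling.nodeName)
print(sibling_text)
print(parent_name)

document.unlink()               # break the tree's internal references
```

The values are copied into ordinary variables before unlink is called, because the tree should not be used after it has been released.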
Good Programming Practice 16.1
Although not required in Python version 2.0 and higher, calling method releaseNode
ensures that a DOM tree is freed from memory.
The classes that inherit from Node represent the various XML node types. The Doc-
ument node represents the entire XML document (in memory) and provides methods for
manipulating its data. Element nodes represent XML elements. Text nodes represent
character data. Attr nodes represent XML attributes, and Comment nodes represent com-
ments. Document nodes can contain Element, Text and Comment nodes. Element
nodes can contain Attr, Element, Text and Comment nodes.
The tables in Fig. 16.6–Fig. 16.12 summarize important DOM attributes and methods
for navigating and updating DOM trees. Figure 16.6 describes some Node attributes and
methods, Fig. 16.7 describes some NodeList (i.e., an ordered list of Nodes) attributes
and methods, Fig. 16.8 describes some NamedNodeMap (i.e., an unordered dictionary of
Nodes) attributes and methods, Fig. 16.9 describes some Document attributes and
methods, Fig. 16.10 describes some Element attributes and methods, Fig. 16.11
describes some Attr attributes and Fig. 16.12 describes a Text and Comment attribute.
The program in Fig. 16.13 uses the DOM to add names to the contact list XML docu-
ment, contacts.xml (Fig. 16.14). The XML document is loaded into memory, pro-
grammatically manipulated and saved to disk (overwriting the previous version).
Attribute/Method | Description
item( i ) | Returns the attribute node at index i. Indices range from 0 to length – 1.
length | Number of attribute nodes for the given element node.

Attribute/Method | Description
createAttribute( name ) | Creates and returns an Attr node with the specified name.
createComment( data ) | Creates and returns a Comment node that contains the specified data.
createElement( tagName ) | Creates and returns an Element node with the specified tagName.
createTextNode( data ) | Creates and returns a Text node that contains the specified data.
documentElement | Root element node of the document tree (DOM tree).
54
55 # open contacts file
56 try:
57    file = open( "contacts.xml", "r+" )
58 except IOError:
59    sys.exit( "Error opening file" )
60
61 # create DOM parser and parse XML document
62 reader = PyExpat.Reader()
63 document = reader.fromStream( file )
64
65 printList( document )
66 printInstructions()
67 character = "l"
68
69 while character != "q":
70    character = raw_input( "\n? " )
71
72    if character == "a":
73       addContact( document )
74    elif character == "l":
75       printList( document )
76    elif character == "i":
77       printInstructions()
78    elif character != "q":
79       print "Invalid command!"
80
81 file.seek( 0, 0 ) # position to beginning of file
82 file.truncate() # remove data from file
83 PrettyPrint( document, file ) # print DOM contents to file
84 file.close() # close XML file
85 reader.releaseNode( document ) # free memory
? a
Enter the name of the person you wish to add: Michael Red
? l
Your contact list is:
John Black
Sue Green
Michael Red
? q
Line 57 opens contacts.xml for reading and writing. A parser object is instanti-
ated on line 62. Line 63 calls method fromStream to parse the XML document and build
the DOM tree.
Line 65 calls function printList (lines 14–28) to print the contact list to the screen.
Method getElementsByTagName (line 18) returns a NodeList that contains all Ele-
ment nodes that have contact for a tag name. Line 19 calls getElementsByTagName
to obtain a NodeList for all Element nodes that have FirstName for a tag name. Each
node referenced by contact contains only one such node. This one node is accessed as the
first element in the list (i.e., [ 0 ]). Line 22 assigns the value of first’s first child element
(a Text node) to variable firstText. Lines 25–26 repeat the processes to obtain the last
name. Line 28 prints the current contact’s first name and last name to the screen.
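The traversal that printList performs can be sketched with the standard library. This is a minimal sketch, not the book's listing: it assumes Python 3 and xml.dom.minidom in place of 4DOM's PyExpat reader, and the sample document and the helper name contact_names are hypothetical. Only the tag names contact, FirstName and LastName come from the walkthrough above.

```python
from xml.dom import minidom

# Hypothetical sample data; only the tag names come from the text.
SAMPLE = """<contacts>
   <contact><FirstName>John</FirstName><LastName>Black</LastName></contact>
   <contact><FirstName>Sue</FirstName><LastName>Green</LastName></contact>
</contacts>"""

def contact_names(document):
    """Return a 'First Last' string for every contact element."""
    names = []
    for contact in document.getElementsByTagName("contact"):
        # each contact is assumed to hold exactly one FirstName and one LastName
        first = contact.getElementsByTagName("FirstName")[0]
        last = contact.getElementsByTagName("LastName")[0]
        # the character data lives in the element's first child, a Text node
        names.append(first.firstChild.nodeValue + " " + last.firstChild.nodeValue)
    return names

document = minidom.parseString(SAMPLE)
for name in contact_names(document):
    print(name)
```

Returning a list rather than printing directly is a choice made here so that the traversal can be checked separately from the output.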
Line 66 calls function printInstructions to print the program’s instructions.
Lines 69–79 get the user’s choice and call the appropriate function.
The addContact function (lines 30–53) adds a contact to the list. The Document’s
root element is obtained via its documentElement attribute (line 31). Lines 33–36 prompt
the user for input and call string method split to separate the first name from the last name.
Line 39 calls the Document’s createElement method to create an Element node
with the tag name FirstName. Lines 40–41 create and append a Text node to this Ele-
ment node by calling the createTextNode and appendChild methods, respectively.
Lines 44–46 create an Element node with the tag name LastName in a similar manner.
Line 49 creates an Element node with the tag name contact. Lines 50–51 call
method appendChild to add the Element nodes referenced by firstNode and
lastNode to the node referenced by contactNode. Line 53 calls method append-
Child to add the node referenced by contactNode to the node referenced by root.
When the user has finished adding names to the contact list, the file is saved. The seek
method (line 81) positions the file pointer to the beginning of the file and method trun-
cate (line 82) deletes the contents of the file. Then, 4DOM’s PrettyPrint function
writes the updated XML to the file (line 83). Function PrettyPrint writes an XML
DOM tree’s data to a specified output stream (with indentation and carriage returns for
readability). Lines 84–85 close the file and release the DOM tree from memory.
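The add-then-overwrite sequence described above can be sketched as follows. This is an illustration under stated assumptions: it uses Python 3 and xml.dom.minidom rather than 4DOM, writes with minidom's writexml instead of PrettyPrint, and uses io.StringIO as a stand-in for a file opened in "r+" mode. The helper names add_contact and save are hypothetical.

```python
import io
from xml.dom import minidom

def add_contact(document, full_name):
    """Append a contact element with FirstName and LastName children to the root."""
    # assumes the name has the form "First Last"
    first, last = full_name.split(" ", 1)
    root = document.documentElement
    contact = document.createElement("contact")
    for tag, value in (("FirstName", first), ("LastName", last)):
        node = document.createElement(tag)
        node.appendChild(document.createTextNode(value))
        contact.appendChild(node)
    root.appendChild(contact)

def save(document, stream):
    """Overwrite an already-open read/write stream with the updated XML."""
    stream.seek(0)       # position to beginning of the stream
    stream.truncate()    # discard the old contents
    document.writexml(stream)

# io.StringIO stands in for a file opened with mode "r+"
stream = io.StringIO("<contacts><contact><FirstName>John</FirstName>"
                     "<LastName>Black</LastName></contact></contacts>")
document = minidom.parseString(stream.read())
add_contact(document, "Michael Red")
save(document, stream)
print(stream.getvalue())
```

Seeking and truncating before writing matters most when the new document is shorter than the old one; without the truncate call, stale bytes from the previous version would remain at the end of the file.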
Figure 16.16 demonstrates SAX-based parsing. This program allows the user to
specify a tag name to search for in an XML document. When the tag name is encountered,
the program outputs the element’s attribute-value pairs. Methods startElement and
endElement are overridden to handle the events generated when start tags and end tags
are encountered. Figure 16.17 contains the XML document used by this program.
Lines 42–43 obtain the name of the XML document to parse and the tag name to locate.
Line 46 invokes xml.sax function parse, which creates a SAX parser object. Function
parse’s first argument is either a Python file object or a filename. The second argument
passed to parse must be an instance of class xml.sax.ContentHandler (or a
derived class of ContentHandler, such as TagInfoHandler), which is the main
callback handler in xml.sax. Class ContentHandler contains the methods
(Fig. 16.15) for handling SAX events.
If an error occurs during the opening of the specified file, an IOError exception
is raised, and line 50 displays an error message. If an error occurs while parsing the file
(e.g., if the specified XML document is not well-formed), parse raises a SAX-
ParseException exception, and line 54 displays an error message.
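The two failure modes described above can be sketched with the Python 3 standard library. This is a sketch, not the book's listing: the helper name try_parse and the throwaway file names are hypothetical, and the file is opened explicitly with open so that a missing file surfaces as an IOError regardless of how the parser resolves filenames.

```python
import os
import tempfile
import xml.sax

def try_parse(filename, handler):
    """Parse filename with handler; return an error string, or None on success."""
    try:
        with open(filename, "rb") as stream:
            xml.sax.parse(stream, handler)
    except IOError as error:
        return "Error opening file: %s" % error
    except xml.sax.SAXParseException as error:
        return "Error parsing file: %s" % error.getMessage()
    return None

# throwaway files for the demonstration (hypothetical names and content)
folder = tempfile.mkdtemp()
good = os.path.join(folder, "good.xml")
bad = os.path.join(folder, "bad.xml")
missing = os.path.join(folder, "missing.xml")
with open(good, "w") as f:
    f.write("<boxes><box size='big'/></boxes>")
with open(bad, "w") as f:
    f.write("<boxes><box></boxes>")   # not well-formed: box is never closed

handler = xml.sax.ContentHandler()    # the base class ignores every event
for name in (good, bad, missing):
    print(os.path.basename(name), "->", try_parse(name, handler))
```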
characters( content )         Called when the parser encounters character data. The character data is passed as content to the event handler.
endDocument()                 Called when the parser encounters the end of the document.
endElement( name )            Called when the parser encounters an end tag. The tag name is passed as an argument to the event handler.
startDocument()               Called when the parser encounters the beginning of the document.
startElement( name, attrs )   Called when the parser encounters a start tag. The tag name and its attributes (attrs) are passed as arguments to the event handler.
<box> started
Attributes:
size = big
<box> started
Attributes:
size = medium
</box> ended
<box> started
Attributes:
type = small
<box> started
Attributes:
type = tiny
</box> ended
</box> ended
</box> ended
Our example overrides only two event handlers. Methods startElement and
endElement are called when start tags and end tags are encountered. Method start-
Element (lines 16–31) takes two arguments—the element’s tag name as a string and the
element’s attributes. The attributes are passed as an instance of class AttributesImpl,
defined in xml.sax.xmlreader. This class provides a dictionary-like interface to the
element’s attributes.
Line 21 determines whether the element received from the event contains the tag name
that the user specified. If so, line 22 prints the start tag, indented by depth spaces, and line
24 increments depth by 3 to ensure that the next tag printed is indented further.
Lines 29–31 print the element’s attributes. The for loop first obtains the attribute
names by invoking the getNames method of attributes. The loop then prints each
attribute name and its corresponding value—obtained by passing the current attribute name
to the getValue method of attributes.
Method endElement (lines 34–39) executes when an end tag is encountered and
receives the end tag’s name as an argument. If name contains the tag name specified by the
user, line 38 decreases the indent by decrementing depth. Line 39 prints that the specified
end tag was found.
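A handler along the lines described above might look like the following. This is a minimal sketch assuming Python 3 and the standard xml.sax module; the class body is a reconstruction from the prose, not the book's listing. Collecting output in a lines list (instead of printing inside the handler), the sample document and the exact placement of the "Attributes:" line are choices made for this illustration.

```python
import xml.sax

class TagInfoHandler(xml.sax.ContentHandler):
    """Record start tags, attributes and end tags for one tag name."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name
        self.depth = 0       # current indentation, in spaces
        self.lines = []      # collected output lines

    def startElement(self, name, attributes):
        if name == self.tag_name:
            self.lines.append(" " * self.depth + "<%s> started" % name)
            self.depth += 3  # indent whatever is reported next
            self.lines.append(" " * self.depth + "Attributes:")
            for attribute in attributes.getNames():
                self.lines.append(" " * self.depth + "%s = %s"
                                  % (attribute, attributes.getValue(attribute)))

    def endElement(self, name):
        if name == self.tag_name:
            self.depth -= 3  # undo the indent added by the matching start tag
            self.lines.append(" " * self.depth + "</%s> ended" % name)

# hypothetical sample document
SAMPLE = b'<boxlist><box size="big"><box size="medium"/></box></boxlist>'
handler = TagInfoHandler("box")
xml.sax.parseString(SAMPLE, handler)
print("\n".join(handler.lines))
```

Because SAX delivers start and end events in document order, a single depth counter is enough to track nesting; no tree is ever built in memory.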
4. The implementation of this message forum requires Internet Explorer 5 or higher, and msxml 3.0
or higher. In Section 16.6.3, we discuss how other client browsers, such as Netscape, may be used.
forums.xml       XML document containing available forum titles and their filenames.
default.py       Main page that provides navigational links to the forums.
template.xml     Template for a message forum document.
addForum.py      Adds a new forum.
feedback.xml     Sample message forum.
formatting.xsl   XSLT document for transforming message forums into XHTML.
addPost.py       Adds a message to a forum.
error.html       Displays an error message.
site.css         Style sheet for formatting XHTML content.
forum.py         Transforms XML documents to HTML on the server for non-Internet Explorer clients.
[Figure: diagram of the message forum documents default.py, forums.xml, addForum.py, feedback.xml, formatting.xsl and addPost.py.]
Fig. 16.21 XML document representing a forum containing one message.
Fig. 16.22 XML document containing data for all available forums.
Visitors to the message forum are greeted initially by the Web page that default.py
(Fig. 16.23) generates, which displays links to all forums and provides forum management
options. Initially, only two links are active—one to view the Feedback forum (i.e., the
sample forum) and one to create a forum. In the chapter exercises, we ask the reader to
enhance the message forum by adding functionality for modifying and deleting forums.
1    #!c:\Python\python.exe
2    # Fig. 16.23: default.py
3    # Default page for message forums.
4
5    import os
6    import sys
7    from xml.dom.ext.reader import PyExpat
8
9    def printHeader( title, style ):
10      print """Content-type: text/html
11
12   <?xml version = "1.0" encoding = "UTF-8"?>
13   <!DOCTYPE html PUBLIC
14   "-//W3C//DTD XHTML 1.0 Strict//EN"
15   "DTD/xhtml1-strict.dtd">
16   <html xmlns = "http://www.w3.org/1999/xhtml">
17
18   <head>
19   <title>%s</title>
20   <link rel = "stylesheet" href = "%s" type = "text/css" />
21   </head>
22
23   <body>""" % ( title, style )
24
25   # open XML document that contains the forum names and locations
26   try:
27      XMLFile = open( "../htdocs/XML/forums.xml" )
28   except IOError:
29      print "Location: /error.html\n"
30      sys.exit()
31
32   # parse XML document containing forum information
33   reader = PyExpat.Reader()
34   document = reader.fromStream( XMLFile )
35   XMLFile.close()
36
37   # write XHTML to browser
38   printHeader( "Deitel Message Forums", "/XML/site.css" )
39   print """<h1>Deitel Message Forums</h1>
40   <p style="font-weight:bold">Available Forums</p>
41   <ul>"""
42
43   # determine client-browser type
44   if os.environ[ "HTTP_USER_AGENT" ].find( "MSIE" ) != -1:
45      prefix = "../XML/" # Internet Explorer
46   else:
47      prefix = "forum.py?file="
48
49   # add links for each forum
50   for forum in document.getElementsByTagName( "forum" ):
51
52      # create link to forum
53      link = prefix + forum.attributes.item( 0 ).value
54
55      # get element nodes containing tag name "name"
56      name = forum.getElementsByTagName( "name" )[ 0 ]
57
58      # get Text node's value
59      nameText = name.childNodes[ 0 ].nodeValue
60      print '<li><a href = "%s">%s</a></li>' % ( link, nameText )
61
62   print """</ul>
63   <p style="font-weight:bold">Forum Management</p>
64   <ul>
65   <li><a href = "addForum.py">Add a Forum</a></li>
66   <li>Delete a Forum</li>
67   <li>Modify a Forum</li>
68   </ul>
69   </body>
70
71   </html>"""
72
73   reader.releaseNode( document )
Fig. 16.23 Default page for the message forum.
This Python script uses modules in package 4DOM to parse forums.xml. Lines 33–
34 instantiate a parser object, then load and parse forums.xml. Lines 38–71 output
XHTML to the browser. First, line 38 prints the XHTML header for the main page by
calling function printHeader (lines 9–23). This function prints the XHTML header
with a specified title and a link to a Cascading Style Sheet (CSS) that formats the page. In
this case study, we would like to take advantage of msxml’s XML parsing and XSLT
processing capabilities to reduce the amount of processing the server must perform. Lines
44–47 determine whether the client is using Internet Explorer. If so, prefix is set to
"../XML/". Otherwise, prefix is set to "forum.py?file=". Note that line 53 uses
prefix to construct the hyperlinks to each forum. Clients who use Internet Explorer
request the XML documents directly, while other clients request forum.py. We discuss
this in greater detail in Section 16.6.3. The for loop (lines 50–60) retrieves all Element
nodes that contain the tag name forum. Hyperlinks are created to each forum found in
forums.xml. Lines 62–71 print the remaining XHTML, including a hyperlink to
addForum.py. Finally, line 73 releases the Document object from memory.
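The link-building logic can be isolated from the CGI output and sketched as follows. This is an illustration, assuming Python 3 and xml.dom.minidom instead of 4DOM; the helper names link_prefix and forum_links and the sample forums document are hypothetical. The attribute name filename follows the description of addForum.py later in this section, and the sketch reads it by name with getAttribute rather than by position.

```python
from xml.dom import minidom

# hypothetical contents for forums.xml
FORUMS = """<forums>
   <forum filename="feedback.xml"><name>Feedback</name></forum>
</forums>"""

def link_prefix(user_agent):
    """Choose the URL prefix from the browser's User-Agent string."""
    if user_agent.find("MSIE") != -1:
        return "../XML/"            # Internet Explorer requests the XML directly
    return "forum.py?file="         # other browsers go through forum.py

def forum_links(document, user_agent):
    """Return one XHTML list item per forum element."""
    prefix = link_prefix(user_agent)
    items = []
    for forum in document.getElementsByTagName("forum"):
        link = prefix + forum.getAttribute("filename")
        name = forum.getElementsByTagName("name")[0]
        name_text = name.childNodes[0].nodeValue
        items.append('<li><a href = "%s">%s</a></li>' % (link, name_text))
    return items

document = minidom.parseString(FORUMS)
for item in forum_links(document, "Mozilla/4.0 (compatible; MSIE 5.5)"):
    print(item)
```

In a CGI script the User-Agent string would come from the environment; reading it with os.environ.get( "HTTP_USER_AGENT", "" ) avoids a KeyError when the header is absent.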
When the form is submitted, the script is re-requested and passed the user-entered form values. When this occurs,
the condition (line 32) is true, and lines 33–92 execute.
1    #!c:\Python\python.exe
2    # Fig. 16.24: addForum.py
3    # Adds a forum to the list
4
5    import re
6    import sys
7    import cgi
8
9    # 4DOM packages
10   from xml.dom.ext.reader import PyExpat
11   from xml.dom.ext import PrettyPrint
12
13   def printHeader( title, style ):
14      print """Content-type: text/html
15
16   <?xml version = "1.0" encoding = "UTF-8"?>
17   <!DOCTYPE html PUBLIC
18   "-//W3C//DTD XHTML 1.0 Strict//EN"
19   "DTD/xhtml1-strict.dtd">
20   <html xmlns = "http://www.w3.org/1999/xhtml">
21
22   <head>
23   <title>%s</title>
24   <link rel = "stylesheet" href = "%s" type = "text/css" />
25   </head>
26
27   <body>""" % ( title, style )
28
29   form = cgi.FieldStorage()
30
31   # if user enters data in form fields
32   if form.has_key( "name" ) and form.has_key( "filename" ):
33      newFile = form[ "filename" ].value
34
35      # determine whether file has xml extension
36      if not re.match( "\w+\.xml$", newFile ):
37         print "Location: /error.html\n"
38         sys.exit()
39      else:
40
41         # create forum files from xml files
42         try:
43            newForumFile = open( "../htdocs/XML/" + newFile, "w" )
44            forumsFile = open( "../htdocs/XML/forums.xml", "r+" )
45            templateFile = open( "../htdocs/XML/template.xml" )
46         except IOError:
47            print "Location: /error.html\n"
48            sys.exit()
49
103
104  <a href = "/cgi-bin/default.py">Return to Main Page</a>
105  </body>
106
107  </html>"""
Fig. 16.24 Script that adds a new forum to forums.xml.
Line 36 examines the filename posted to the script to make sure it contains only alpha-
numeric characters and ends with .xml; if not, the script redirects the client to
error.html. This prevents a malicious user from writing to a system file or otherwise
gaining unrestricted access to the server. However, it is important to note that other solu-
tions exist, such as generating filenames on the server. If the filename is permitted, line 43
attempts to create the file by calling function open.
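The filename check can be tried in isolation. The sketch below assumes Python 3 and reuses the pattern from line 36 as a raw string; the helper names is_safe_filename and is_strict_filename and the candidate names are hypothetical.

```python
import re

# same pattern as in the script, written as a raw string
FILENAME_PATTERN = re.compile(r"\w+\.xml$")

def is_safe_filename(name):
    """True if name is word characters followed by .xml (re.match anchors at the start)."""
    return FILENAME_PATTERN.match(name) is not None

def is_strict_filename(name):
    """Stricter variant: the whole string must match, with no trailing newline allowed."""
    return re.fullmatch(r"\w+\.xml", name) is not None

for candidate in ("forum1.xml", "../forums.xml", "notes.txt", "a b.xml"):
    print(candidate, is_safe_filename(candidate))
```

Because \w excludes dots, slashes and spaces, paths such as ../forums.xml are rejected. One caveat: $ also matches just before a trailing newline, so "forum1.xml\n" passes the original pattern; re.fullmatch, shown in the second helper, closes that gap.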
Line 44 opens file forums.xml for reading and writing ("r+"). Line 45 opens the
template XML document named template.xml (Fig. 16.25), which provides a forum’s
markup. The template contains an empty forum element, to which the forum name and
filename are added programmatically. If an error occurs during an attempt to open any file,
the client is redirected to error.html.
Line 51 instantiates a DOM parser and assigns it to variable reader. Line 52 loads
and parses forums.xml; the Document object created is assigned to variable docu-
ment. Because we wish to create a forum element within forums, line 55 calls the Doc-
ument object’s createElement method with the name of the new element
("forum"). The filename attribute of the new Element node is set by calling
setAttribute and passing the attribute’s name and value.
The forum element contains only one piece of information—the forum name—added
by lines 58–61. Line 58 creates another Element node named name. To add character
data to the new Element node, a child Text node must be created. We call method cre-
ateTextNode (line 59) with the forum name from the form (i.e., form[ "name"
].value). Line 60 appends the Text node to the Element node referenced by name
by calling method appendChild. Line 61 adds the Element node referenced by name
to the Element node referenced by forum.
Line 64 accesses the documentElement attribute of document to obtain the root
element node (i.e., forums). Lines 65–66 obtain a NodeList of all forum elements by
calling method getElementsByTagName, the first of which is assigned to variable
firstForum. Line 67 inserts the new Element node referenced by forum before the
first child node of forums by calling method insertBefore. With this technique, the
most recently added forums appear first in the forum list.
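The node-creation and insertion steps above can be sketched as follows. This is a minimal sketch assuming Python 3 and xml.dom.minidom in place of 4DOM; the function name add_forum and the sample data are hypothetical, while the element and attribute names (forums, forum, name, filename) come from the text.

```python
from xml.dom import minidom

def add_forum(document, name, filename):
    """Create a forum element and insert it before the first existing forum."""
    forum = document.createElement("forum")
    forum.setAttribute("filename", filename)

    # character data must be wrapped in a Text node
    name_node = document.createElement("name")
    name_node.appendChild(document.createTextNode(name))
    forum.appendChild(name_node)

    root = document.documentElement
    existing = root.getElementsByTagName("forum")
    first_forum = existing[0] if existing else None
    # with a reference node of None, insertBefore appends at the end
    root.insertBefore(forum, first_forum)
    return forum

document = minidom.parseString(
    '<forums><forum filename="feedback.xml"><name>Feedback</name></forum></forums>')
add_forum(document, "Python", "python.xml")
print(document.documentElement.toxml())
```

The guard for an empty forum list is an addition in this sketch; the walkthrough above assumes at least one forum already exists.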
To update forums.xml, line 70 seeks to the beginning and deletes any existing
data (by truncating the file to size 0). Line 72 then calls function PrettyPrint to write
the updated XML to forumsFile.
Line 76 loads and parses file template.xml (Fig. 16.25) by calling method from-
Stream and assigns the Document object created to variable document. Line 77 uses
documentElement to get the root element, and line 78 sets its file attribute’s value
to the specified filename. Lines 81–84 add the name node, and lines 87–88 output the
updated XML to newForumFile and close the file. Lines 89–90 close template.xml
and release the Document object from memory. The user is redirected to default.py
in line 92.
Figure 16.26 contains the Python script that allows users to add messages to a forum.
When formatting.xsl (Fig. 16.27) is applied to a forum document, a link to
addPost.py is added to the page, which includes the current forum’s filename. This file-
name is passed to addPost.py (e.g., addPost.py?file=forum1.xml).
1    #!c:\Python\python.exe
2    # Fig. 16.26: addPost.py
3    # Adds a message to a forum.
4
5    import re
6    import os
7    import sys
8    import cgi
9    import time
10
11   # 4DOM packages
12   from xml.dom.ext.reader import PyExpat
13   from xml.dom.ext import PrettyPrint
14
15   def printHeader( title, style ):
16      print """Content-type: text/html
17
18   <?xml version = "1.0" encoding = "UTF-8"?>
19   <!DOCTYPE html PUBLIC
20   "-//W3C//DTD XHTML 1.0 Strict//EN"
21   "DTD/xhtml1-strict.dtd">
22   <html xmlns = "http://www.w3.org/1999/xhtml">
23
24   <head>
25   <title>%s</title>
26   <link rel = "stylesheet" href = "%s" type = "text/css" />
27   </head>
28
29   <body>""" % ( title, style )
30
31   # identify client browser
32   if os.environ[ "HTTP_USER_AGENT" ].find( "MSIE" ) != -1:
33      prefix = "../XML/" # Internet Explorer
34   else:
35      prefix = "forum.py?file="
36
37   form = cgi.FieldStorage()
38
39   # user has submitted message to post
40   if form.has_key( "submit" ):
41      filename = form[ "file" ].value
42
43      # add message to forum
44      if not re.match( "\w+\.xml$", filename ):
45         print "Location: /error.html\n"
46         sys.exit()
47
48      try:
49         forumFile = open( "../htdocs/XML/" + filename, "r+" )
50      except IOError:
51         print "Location: /error.html\n"
52         sys.exit()
53
Line 37 obtains the form values posted to the script. The user has not yet submitted a
new message; therefore, the form does not contain the value "submit" (line 40), and exe-
cution proceeds to line 90. If the form contains a single value (i.e., the filename), lines 91–
108 output a form, which includes fields for the user name, message title, message text and
the forum filename as a hidden value (line 99). Note that, if no parameters are passed to the
script, the script has been accessed in an inappropriate way, and the program redirects the
browser to error.html (line 110).
When the form data are submitted, the posted information is processed, starting at line
41. As in the previous figure, the filename is checked for an .xml extension, and the file
is opened (lines 44–52). Lines 55–61 parse the forum file, create an Element node with
tag name message and set the node’s timestamp attribute by calling method setAt-
tribute.
Lines 64–77 create Element nodes that represent the user, title and text and
add text that corresponds to the values entered in the form. Note that, if a field has been left
blank, "( Field left blank )" is entered for that field. Each new Element node is
appended to the node referenced by message (line 77).
Line 80 appends the node referenced by message to the node referenced by forum.
Lines 81–82 then seek and truncate the XML file to eliminate the file’s content and
write the updated XML markup. Lines 84–85 close the file and free the Document object
from memory. The user is redirected to the updated XML document in line 87.
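The message-building steps can be sketched in the same way. This is an illustration under stated assumptions: Python 3 and xml.dom.minidom replace 4DOM, a plain dictionary stands in for the CGI form, the function name add_message is hypothetical, and time.ctime() is only a guess at a suitable timestamp format, since the part of the listing that formats the timestamp is not shown here.

```python
import time
from xml.dom import minidom

def add_message(document, fields, timestamp=None):
    """Append a message element built from fields to the forum root element."""
    forum = document.documentElement
    message = document.createElement("message")
    # time.ctime() is an assumed format, not necessarily the book's
    message.setAttribute("timestamp", timestamp or time.ctime())

    for tag in ("user", "title", "text"):
        # substitute a placeholder when a field is missing or empty
        value = fields.get(tag) or "( Field left blank )"
        element = document.createElement(tag)
        element.appendChild(document.createTextNode(value))
        message.appendChild(element)

    forum.appendChild(message)
    return message

document = minidom.parseString(
    '<forum file="feedback.xml"><name>Feedback</name></forum>')
message = add_message(document, {"user": "Sue", "title": "Hello"},
                      timestamp="Mon Dec 10 12:13:00 2001")
print(document.documentElement.toxml())
```

Passing the timestamp as an optional argument is a design choice of this sketch; it keeps the function deterministic when a fixed value is supplied.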
26 <body>
27 <table width = "100%" cellspacing = "0"
28 cellpadding = "2">
29 <tr>
30 <td class = "forumTitle">
31 <xsl:value-of select = "name" />
32 </td>
33 </tr>
34 </table>
35
36 <!-- apply templates for message elements -->
37 <br />
38 <xsl:apply-templates select = "message" />
39 <br />
40
41 <div style = "text-align: center">
42 <a>
43
44 <!-- add href attribute to "a" element -->
45 <xsl:attribute name = "href">../cgi-bin/addPost.py?file=<xsl:value-of select = "@file" />
46 </xsl:attribute>
47 Post a Message
48 </a>
49 <br /><br />
50 <a href = "../cgi-bin/default.py">Return to Main Page</a>
51 </div>
52
53 </body>
54 </xsl:template>
55
56 <!-- match message elements -->
57 <xsl:template match = "message">
58 <table width = "100%" cellspacing = "0"
59 cellpadding = "2">
60 <tr>
61 <td class = "msgTitle">
62 <xsl:value-of select = "title" />
63 </td>
64 </tr>
65
66 <tr>
67 <td class = "msgInfo">
68 by
69 <xsl:value-of select = "user" />
70 at
71 <span class = "date">
72 <xsl:value-of select = "@timestamp" />
73 </span>
74 </td>
75 </tr>
76
77 <tr>
78 <td class = "msgText">
79 <xsl:value-of select = "text" />
80 </td>
81 </tr>
82
83 </table>
84 </xsl:template>
85
86 </xsl:stylesheet>
Fig. 16.27 XSLT style sheet that transforms XML into XHTML.
Variable prefix is set according to whether MSIE (Microsoft Internet Explorer) appears
in the HTTP_USER_AGENT environment variable. For simplicity, we assume Internet Ex-
plorer 5 or higher (with msxml 3.0 or higher) is the only version of MSIE being used and
do not test for older versions.
Once prefix has been set, we may use its value to customize the URLs generated by
the scripts. One example occurs in line 87 of addPost.py:
This line directs Internet Explorer users to the specified XML forum file located in ../
XML/, but sends users of other browsers to forum.py, a Python script that receives a sin-
gle parameter (i.e., the filename).
Figure 16.28 shows forum.py, which transforms XML documents to HTML on
the server. The figure also includes the rendered HTML output displayed in Netscape
Communicator.
If a filename is not passed to the script, the user is redirected to error.html (line
40). Otherwise, execution begins at line 16. Lines 16–18 determine whether the specified
filename ends in .xml. If so, lines 21–22 open the XSLT style sheet (format-
ting.xsl) and the specified XML document, respectively. If an error occurs during an
attempt to open one of these files, the user is redirected to error.html (line 24).
The XML then is transformed into HTML for display. Line 28 instantiates a 4XSLT
Processor object, which transforms XML into HTML, by applying an XSLT style
sheet. Line 31 specifies the appropriate XSLT style sheet by invoking processor’s
appendStylesheetStream method. This method appends a style sheet to the list of
style sheets a Processor can use. Note that more than one style sheet can be appended
(i.e., appendStylesheetStream can be called multiple times) so that the same Processor
object can be used to transform an XML document to many different formats. The
argument passed to appendStylesheetStream must be a Python file object. Other
methods for appending style sheets to a 4XSLT Processor are appendStylesheetString,
appendStylesheetNode and appendStylesheetUri, which accept as
arguments a string containing an XSLT style sheet, a DOM tree containing a style sheet and
a URI that references a style sheet, respectively. The specified URI may be a URL (in the
form of a string) that represents the location of the style sheet on the Web.
1    #!c:\Python\python.exe
2    # Fig. 16.28: forum.py
3    # Display forum postings for non-Internet Explorer browsers.
4
5    import re
6    import cgi
7    import sys
8    from xml.xslt import Processor
9
10   form = cgi.FieldStorage()
11
12   # form to display has been specified
13   if form.has_key( "file" ):
14
15      # determine whether file is xml
16      if not re.match( "\w+\.xml$", form[ "file" ].value ):
17         print "Location: /error.html\n"
18         sys.exit()
19
20      try:
21         style = open( "../htdocs/XML/formatting.xsl" )
22         XMLFile = open( "../htdocs/XML/" + form[ "file" ].value )
23      except IOError:
24         print "Location: /error.html\n"
25         sys.exit()
26
27      # create XSLT processor instance
28      processor = Processor.Processor()
29
30      # specify style sheet
31      processor.appendStylesheetStream( style )
32
Fig. 16.28 Script that transforms XML into HTML for browsers without XSLT support.
Line 34 invokes the Processor’s runStream method to apply the style sheet to
the XML document. As with appendStylesheetStream, the object passed to
runStream must be a Python file object. Other methods used for applying style sheets
are runString, runNode and runUri, which accept as arguments a string con-
taining XML, a DOM tree containing XML and a URI that references an XML document,
respectively.
Lines 35–36 close the XSLT and XML files used by the script. Line 37 prints the con-
tent type header for the Web browser. The transformed XML is then sent to the client as
HTML (line 38).
In this chapter, we used the concepts presented in Chapter 15 to create XML-based
applications. We used Python packages containing DOM implementations and SAX
implementations to parse our XML documents, then used XSLT style sheets to display
the XML document content in a browser. In Chapter 17, Database Application Program-
ming Interface (DB-API), we discuss databases, the widely employed relational database
model and the Structured Query Language (SQL), a language used to obtain database
contents easily.
SUMMARY
• Support for XML is provided through a large collection of freely available packages.
• The process by which Python applications can generate XML dynamically is similar to that by
which they generate XHTML. For example, to output XML from a Python script, we can use
print statements or we can use XSLT.
• The modules included with Python for DOM manipulation are xml.dom.minidom and
xml.dom.pulldom. However, neither of these DOM implementations is fully compliant with the
W3C’s DOM Recommendation.
• A third-party package called 4DOM is a fully compliant DOM implementation. 4DOM is included
with XML package PyXML (pyxml.sourceforge.net). Once PyXML is installed, the ex-
tended DOM components of 4DOM are accessed via xml.dom.ext.
• 4XSLT, used for applying a style sheet to an XML document, is located in another XML package
called 4Suite (4suite.org), from Fourthought, Inc.
• 4DOM’s reader package includes module PyExpat.
• PyExpat contains class Reader, an XML parser. A Reader object takes an XML document
and parses it, storing it in memory as a tree structure (called a DOM tree).
• The Node class, or a class derived from Node, represents an XML element, node, comment, etc.
in an XML document. Other classes include NodeList, an ordered list of nodes, and NamedN-
odeMap, a dictionary of attribute nodes.
• A Document object represents the entire XML document (in memory) and provides methods for
manipulating its data.
• Element nodes represent XML elements.
• Text nodes represent character data.
• Attr nodes represent attributes in start tags.
• Comment nodes represent comments.
• Document nodes can contain Element, Text and Comment nodes.
• Element nodes can contain Attr, Element, Text and Comment nodes.
• Method fromStream accepts as input a Python file object and returns a Document object.
• A Document object’s documentElement attribute returns the Document’s root element
node.
• Function StripXml removes insignificant whitespace from an XML DOM tree.
• A Node object’s childNodes attribute contains a list of that Node’s children.
• A Node object’s firstChild attribute corresponds to the first child in that Node’s list of children.
TERMINOLOGY
4DOM package
4Suite package
4XSLT package
appendChild method of class Node
appendStylesheetNode method of class Processor
appendStylesheetStream method of class Processor
appendStylesheetString method of class Processor
appendStylesheetUri method of class Processor
Attr class
attributes attribute of class Node
characters method of class ContentHandler
childNodes attribute of class Node
Comment class
ContentHandler class
createAttribute method of class Document
createComment method of class Document
createElement method of class Document
createTextNode method of class Document
data attribute of class Comment
data attribute of class Text
Document class
Document Object Model (DOM)
DOM parser
DOM tree
documentElement attribute
SELF-REVIEW EXERCISES
16.1 Fill in the blanks for each of the following statements:
a) A PyExpat ________ object takes an XML document and parses it, storing it in memory
as a tree structure.
b) A Document object’s ________ attribute refers to the Document’s root element.
c) 4DOM’s ________ function prints an XML DOM tree to a specified output stream.
d) Node method ________ appends a new child to the list of child nodes.
e) Method ________ removes a specified DOM tree from memory, freeing resources.
f) xml.sax class ________ contains methods for handling SAX events which can be
overridden to perform desired parsing.
g) A 4XSLT ________ object transforms XML into HTML, by applying a specified
XSLT style sheet.
h) Method fromStream returns a ________ object.
16.2 State which of the following statements are true and which are false. If false, explain why.
a) To create a Python script which outputs XML, programmers use module xmlgen.
b) Method insertBefore( a, b ) inserts node a before node b.
c) The different XML node types are represented in a DOM tree by class XMLNode.
d) Node attribute childNodes returns a NodeList object containing the node’s chil-
dren.
e) 4DOM’s StripXml function parses an XML document.
f) With SAX-based parsing, the parser reads the input, storing it in memory as a tree struc-
ture.
g) The second argument passed to parse must be an instance of class xml.sax.Con-
tentHandler (or a subclass of ContentHandler).
h) If an error occurs while parsing a file, parse raises a SAXParseException excep-
tion.
EXERCISES
16.3 Modify the program in Fig. 16.13. Allow the user to add a new element to each contact
element. For instance, if the user adds a phoneNumber element, the user should be prompted to pro-
vide a phone number for each contact. Each time a user adds a contact, the user should be prompted
to provide information for any new elements in addition to the first and last names. Function print-
List should print any new information as well as the contact’s first and last names.
16.4 Create a Python script that, given an XML document, creates an XHTML list of the docu-
ment’s elements in hierarchical order. Display the elements in Internet Explorer. For example, given
the XML document in Fig. 16.29, create a Python script that lists the elements as shown in Fig. 16.29.
16.5 These lines of code are from lines 45–46 of formatting.xsl (Fig. 16.27). Explain why
the @ in front of "@file" is necessary in the xsl:value-of element.
<sports>
   <game id = "783">
      <name>Cricket</name>
      <summary>
         <paragraph>
            More popular among commonwealth nations.
         </paragraph>
      </summary>
   </game>

   <game id = "239">
      <name>Baseball</name>
      <summary>
         <paragraph>
            More popular in America.
         </paragraph>
      </summary>
   </game>
</sports>
17
Database Application
Programming Interface
(DB-API)
Objectives
• To understand the relational database model.
• To understand basic database queries using Structured
Query Language (SQL).
• To use the methods of the MySQLdb module to query
a database, insert data into a database and update data
in a database.
It is a capital mistake to theorize before one has data.
Arthur Conan Doyle
Now go, write it before them in a table, and note it in a book,
that it may be for the time to come for ever and ever.
The Holy Bible: The Old Testament
Let's look at the record.
Alfred Emanuel Smith
True art selects and paraphrases, but seldom gives a
verbatim translation.
Thomas Bailey Aldrich
Get your facts first, and then you can distort them as much
as you please.
Mark Twain
I like two kinds of men: domestic and foreign.
Mae West
pythonhtp1_17.fm Page 570 Wednesday, December 19, 2001 2:46 PM
Outline
17.1 Introduction
17.2 Relational Database Model
17.3 Relational Database Overview: Books Database
17.4 Structured Query Language (SQL)
17.4.1 Basic SELECT Query
17.4.2 WHERE Clause
17.4.3 ORDER BY Clause
17.4.4 Merging Data from Multiple Tables: INNER JOIN
17.4.5 Joining Data from Tables Authors, AuthorISBN,
Titles and Publishers
17.4.6 INSERT Statement
17.4.7 UPDATE Statement
17.4.8 DELETE Statement
17.5 Python DB-API Specification
17.6 Database Query Example
17.7 Querying the Books Database
17.8 Reading, Inserting and Updating a Database
17.9 Internet and World Wide Web Resources
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
17.1 Introduction
In Chapter 14, File Processing and Serialization, we discussed sequential-access and ran-
dom-access file processing. Sequential-file processing is appropriate for applications in
which most or all of the file’s information is to be processed. On the other hand, random-
access file processing is appropriate for applications in which only a small portion of a
file’s data is to be processed. For instance, in transaction processing it is crucial to locate
and, possibly, update an individual piece of data quickly. Python provides capabilities for
both types of file processing.
A database is an integrated collection of data. Many companies maintain databases to
organize employee information, such as names, addresses and phone numbers. There are
many different strategies for organizing data to facilitate easy access and manipulation of
the data. A database management system (DBMS) provides mechanisms for storing and
organizing data in a manner consistent with the database’s format. Database management
systems allow for the access and storage of data without concern for the internal represen-
tation of databases.
Today’s most popular database systems are relational databases, which store data in
tables and define relationships between the tables. A language called Structured Query
Each column of the table represents a different field. Records normally are unique (by
primary key) within a table, but particular field values might be duplicated in multiple
records. For example, three different records in the Employee table’s Department field
contain the number 413.
Often, different users of a database are interested in different data and different relation-
ships among those data. Some users require only subsets of the table columns. To obtain table
subsets, we use SQL statements to specify certain data we wish to select from a table. SQL
provides a complete set of commands (including SELECT) that enable programmers to define
complex queries to select data from a table. The results of queries commonly are called result
sets (or record sets). For example, we might select data from the table in Fig. 17.1 to create a
new result set that contains only the location of each department. This result set appears in
Fig. 17.2. SQL queries are discussed in detail in Section 17.4.
Department Location
413 New Jersey
611 Orlando
642 Los Angeles
Fig. 17.2 Result set formed by selecting Department and Location data
from the Employee table.
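The idea of selecting a subset of a table's columns can be sketched in Python. This is only a sketch under stated assumptions: it uses the standard-library sqlite3 module with an in-memory database instead of MySQL, the employee rows are invented placeholders, and the DISTINCT keyword is our own choice for collapsing repeated department rows.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Employee (Number INTEGER PRIMARY KEY, "
    "Name TEXT, Department INTEGER, Location TEXT)")

# placeholder rows (not the book's data); three records share Department 413
connection.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(1, "Employee A", 413, "New Jersey"),
     (2, "Employee B", 413, "New Jersey"),
     (3, "Employee C", 413, "New Jersey"),
     (4, "Employee D", 611, "Orlando"),
     (5, "Employee E", 642, "Los Angeles")])

# the result set contains only the two requested columns
result_set = connection.execute(
    "SELECT DISTINCT Department, Location "
    "FROM Employee ORDER BY Department").fetchall()
connection.close()
```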
Field Description
AuthorID Author’s ID number in the database. In the Books database, this int field is
defined as an auto-incremented field. For each new record inserted in this table,
the database increments the AuthorID value, ensuring that each record has a
unique AuthorID. This field is the table’s primary key.
FirstName Author’s first name (a string).
LastName Author’s last name (a string).
1 Harvey Deitel
2 Paul Deitel
3 Tem Nieto
4 Kate Steinbuhler
5 Sean Santry
6 Ted Lin
7 Praveen Sadhu
8 David McPhie
9 Cheryl Yaeger
10 Marina Zlatkina
11 Ben Wiedermann
12 Jonathan Liperi
13 Jeffrey Listfield
The AuthorISBN table (described in Fig. 17.7) consists of two fields that maintain
the authors’ ID numbers and the corresponding ISBN numbers of their books. This table
helps associate the names of the authors with the titles of their books. Figure 17.8 con-
tains a portion of the sample data from the AuthorISBN table of the Books database.
PublisherID PublisherName
1 Prentice Hall
2 Prentice Hall PTG
Field Description
AuthorID The author’s ID number, which allows the database to associate each
book with a specific author. The integer ID number in this field must
also appear in the Authors table.
ISBN The ISBN number for a book (a string).
1 0130895725 1 0130284181
1 0132261197 1 0130895601
1 0130895717 2 0130895725
1 0135289106 2 0132261197
1 0139163050 2 0130895717
1 013028419x 2 0135289106
1 0130161438 2 0139163050
1 0130856118 2 013028419x
1 0130125075 2 0130161438
1 0138993947 2 0130856118
1 0130852473 2 0130125075
1 0130829277 2 0138993947
1 0134569555 2 0130852473
1 0130829293 2 0130829277
1 0130284173 2 0134569555
2 0130829293 3 0130856118
2 0130284173 3 0134569555
2 0130284181 3 0130829293
2 0130895601 3 0130284173
3 013028419x 3 0130284181
3 0130161438 4 0130895601
Fig. 17.8 Data from AuthorISBN table in Books. [Note: This table shows only a
portion of the sample data.]
The Titles table (described in Fig. 17.9) consists of seven fields that maintain gen-
eral information about the books in the database. This information includes each book’s
ISBN number, title, edition number, copyright year and publisher’s ID number, as well as
the name of a file that contains an image of the book cover and, finally, each book’s price.
Figure 17.10 contains the sample data from the Titles table.
Field Description
Fig. 17.10 Data from the Titles table of Books.
Figure 17.11 illustrates the relationships among the tables in the Books database. The
first line in each table is the table’s name. The field whose name appears in italics contains
that table’s primary key. A table’s primary key uniquely identifies each record in the table.
Every record must have a value in the primary-key field, and the value must be unique. This
is known as the Rule of Entity Integrity. Note that the AuthorISBN table contains two
fields whose names are italicized. This indicates that these two fields form a compound pri-
mary key—each record in the table must have a unique AuthorID–ISBN combination.
For example, several records might have an AuthorID of 2, and several records might
have an ISBN of 0130895601, but only one record can have both an AuthorID of 2
and an ISBN of 0130895601.
Common Programming Error 17.1
Failure to provide a value for a primary-key field in every record breaks the Rule of Entity
Integrity and causes the DBMS to report an error.
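The compound-primary-key rule can be tried out with a short sketch. We assume the standard-library sqlite3 module and an in-memory database here (not MySQL), and Python 3 syntax; the ISBN values come from the AuthorISBN sample data, while the pairing with AuthorID 4 is used only for illustration.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    """CREATE TABLE AuthorISBN (
           AuthorID INTEGER NOT NULL,
           ISBN TEXT NOT NULL,
           PRIMARY KEY (AuthorID, ISBN))""")

connection.execute("INSERT INTO AuthorISBN VALUES (2, '0130895601')")
# same AuthorID with a different ISBN is allowed
connection.execute("INSERT INTO AuthorISBN VALUES (2, '0130895725')")
# same ISBN with a different AuthorID is allowed
connection.execute("INSERT INTO AuthorISBN VALUES (4, '0130895601')")

# repeating an existing AuthorID-ISBN combination violates the key
try:
    connection.execute("INSERT INTO AuthorISBN VALUES (2, '0130895601')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = connection.execute(
    "SELECT COUNT(*) FROM AuthorISBN").fetchone()[0]
connection.close()
```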
The lines connecting the tables in Fig. 17.11 represent the relationships among the
tables. Consider the line between the Publishers and Titles tables. On the Pub-
lishers end of the line, there is a 1, and, on the Titles end, there is an infinity (∞)
symbol. This line indicates a one-to-many relationship, in which every publisher in the
Publishers table can have an arbitrarily large number of books in the Titles table.
Note that the relationship line links the PublisherID field in the Publishers table to
the PublisherID field in Titles table. In the Titles table, the PublisherID field
is a foreign key—a field for which every entry has a unique value in another table and where
the field in the other table is the primary key for that table (e.g., PublisherID in the
Publishers table). Programmers specify foreign keys when creating a table. The for-
eign key helps maintain the Rule of Referential Integrity: Every foreign-key field value
must appear in another table’s primary-key field. Foreign keys enable information from
multiple tables to be joined together for analysis purposes. There is a one-to-many relation-
ship between a primary key and its corresponding foreign key. This means that a foreign-
key field value can appear many times in its own table, but must appear exactly once as the
primary key of another table. The line between the tables represents the link between the
foreign key in one table and the primary key in another table.
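The Rule of Referential Integrity can be demonstrated with a minimal sketch. It assumes the standard-library sqlite3 module rather than MySQL; SQLite enforces foreign keys only after the PRAGMA shown below is issued. The book titles are placeholders, because the Titles sample data is not reproduced here.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch

connection.execute(
    "CREATE TABLE Publishers ("
    "PublisherID INTEGER PRIMARY KEY, PublisherName TEXT)")
connection.execute(
    """CREATE TABLE Titles (
           ISBN TEXT PRIMARY KEY,
           Title TEXT,
           PublisherID INTEGER NOT NULL
               REFERENCES Publishers (PublisherID))""")

connection.execute("INSERT INTO Publishers VALUES (1, 'Prentice Hall')")

# one publisher, many titles: the foreign-key value 1 appears twice
connection.execute(
    "INSERT INTO Titles VALUES ('0130895725', 'Placeholder Title A', 1)")
connection.execute(
    "INSERT INTO Titles VALUES ('0130284181', 'Placeholder Title B', 1)")

# PublisherID 99 does not exist in Publishers, so the insert is rejected
try:
    connection.execute(
        "INSERT INTO Titles VALUES ('0000000000', 'Orphan Title', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

title_count = connection.execute(
    "SELECT COUNT(*) FROM Titles").fetchone()[0]
connection.close()
```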
The line between the AuthorISBN and Authors tables indicates that, for each
author in the Authors table, the AuthorISBN table can contain an arbitrary number of
ISBNs for books written by that author. The AuthorID field in the AuthorISBN table
is a foreign key of the AuthorID field (the primary key) of the Authors table. Note,
again, that the line between the tables links the foreign key in table AuthorISBN to the
corresponding primary key in table Authors. The AuthorISBN table links information
in the Titles and Authors tables.
The line between the Titles and AuthorISBN tables illustrates another one-to-
many relationship; a title can be written by any number of authors. In fact, the sole purpose
of the AuthorISBN table is to represent a many-to-many relationship between the
Authors and Titles tables; an author can write any number of books, and a book can
have any number of authors.
1 Deitel 8 McPhie
2 Deitel 9 Yaeger
3 Nieto 10 Zlatkina
4 Steinbuhler 11 Wiedermann
5 Santry 12 Liperi
6 Lin 13 Listfield
7 Sadhu
For example, to select the Title, EditionNumber and Copyright fields from those
rows of table Titles in which the Copyright date is greater than 1999, use the query:
SELECT Title, EditionNumber, Copyright
FROM Titles
WHERE Copyright > 1999
Figure 17.14 shows the result set of the preceding query. [Note: When we construct a query
for use in Python, we create a string containing the entire query. However, when we display
queries in the text, we often use multiple lines and indentation to enhance readability.]
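As the note above says, a Python program passes the whole query as a single string. The following sketch shows one way to keep such a string readable. It assumes the standard-library sqlite3 module instead of MySQLdb, Python 3 syntax, and placeholder rows in place of the real Titles data.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Titles (Title TEXT, EditionNumber INTEGER, "
    "Copyright INTEGER)")
connection.executemany(
    "INSERT INTO Titles VALUES (?, ?, ?)",
    [("Placeholder Book A", 1, 1998),
     ("Placeholder Book B", 2, 2000),
     ("Placeholder Book C", 3, 2001),
     ("Placeholder Book D", 1, 1999)])

# adjacent string literals are joined into one query string
query = ("SELECT Title, EditionNumber, Copyright "
         "FROM Titles "
         "WHERE Copyright > 1999")

recent_titles = connection.execute(query).fetchall()
connection.close()
```

Only the rows whose Copyright value exceeds 1999 are returned; the 1998 and 1999 rows are filtered out by the WHERE clause.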
Performance Tip 17.2
Using selection criteria improves performance, because queries that involve such criteria
normally select a portion of the database that is smaller than the entire database. Working
with a smaller portion of the data is more efficient than working with the entire set of data
stored in the database.
Fig. 17.14 Titles with copyrights after 1999 from table Titles.
The WHERE clause condition can contain operators <, >, <=, >=, =, <> and LIKE. Oper-
ator LIKE is used for pattern matching with wildcard characters percent (%) and underscore
mark (_). Pattern matching allows SQL to search for strings that “match a pattern.”
A pattern that contains a percent (%) searches for strings in which zero or more char-
acters take the percent character’s place in the pattern. For example, the following query
locates the records of all authors whose last names start with the letter D:
SELECT AuthorID, FirstName, LastName
FROM Authors
WHERE LastName LIKE 'D%'
The preceding query selects the two records shown in Fig. 17.15, because two of the au-
thors in our database have last names that begin with the letter D (followed by zero or more
characters). The % in the WHERE clause’s LIKE pattern indicates that any number of char-
acters can appear after the letter D in the LastName field. Notice that the pattern string is
surrounded by single-quote characters.
Portability Tip 17.1
Not all database systems support the LIKE operator, so be sure to read the database
system’s documentation carefully before employing this operator.
1 Harvey Deitel
2 Paul Deitel
Fig. 17.15 Authors from the Authors table whose last names start with D.
A pattern string including an underscore (_) character searches for strings in which
exactly one character takes the underscore’s place in the pattern. For example, the fol-
lowing query locates the records of all authors whose last names start with any character
(specified with _), followed by the letter i, followed by any number of additional
characters (specified with %):
SELECT AuthorID, FirstName, LastName
FROM Authors
WHERE LastName LIKE '_i%'
3 Tem Nieto
6 Ted Lin
11 Ben Wiedermann
12 Jonathan Liperi
13 Jeffrey Listfield
Fig. 17.16 Authors from table Authors whose last names contain i as the second
letter.
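Both wildcard characters can be tried against the Authors sample data with a short sketch. We assume the standard-library sqlite3 module here rather than MySQL, and Python 3 syntax. Note that SQLite's LIKE ignores case for ASCII letters by default, which may differ from other database systems.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Authors (AuthorID INTEGER PRIMARY KEY, "
    "FirstName TEXT, LastName TEXT)")
connection.executemany(
    "INSERT INTO Authors VALUES (?, ?, ?)",
    [(1, "Harvey", "Deitel"), (2, "Paul", "Deitel"),
     (3, "Tem", "Nieto"), (4, "Kate", "Steinbuhler"),
     (5, "Sean", "Santry"), (6, "Ted", "Lin"),
     (7, "Praveen", "Sadhu"), (8, "David", "McPhie"),
     (9, "Cheryl", "Yaeger"), (10, "Marina", "Zlatkina"),
     (11, "Ben", "Wiedermann"), (12, "Jonathan", "Liperi"),
     (13, "Jeffrey", "Listfield")])

# % stands for zero or more characters
starts_with_d = connection.execute(
    "SELECT AuthorID, FirstName, LastName FROM Authors "
    "WHERE LastName LIKE 'D%'").fetchall()

# _ stands for exactly one character
second_letter_i = connection.execute(
    "SELECT AuthorID, FirstName, LastName FROM Authors "
    "WHERE LastName LIKE '_i%'").fetchall()
connection.close()
```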
For example, to obtain a list of authors arranged in ascending order by last name
(Fig. 17.17), use the query:
SELECT AuthorID, FirstName, LastName
FROM Authors
ORDER BY LastName ASC
Note that the default sorting order is ascending; therefore, ASC is optional.
To obtain the same list of authors arranged in descending order by last name
(Fig. 17.18), use the query:
SELECT AuthorID, FirstName, LastName
FROM Authors
ORDER BY LastName DESC
2 Paul Deitel
1 Harvey Deitel
6 Ted Lin
12 Jonathan Liperi
13 Jeffrey Listfield
8 David McPhie
3 Tem Nieto
7 Praveen Sadhu
5 Sean Santry
4 Kate Steinbuhler
11 Ben Wiedermann
9 Cheryl Yaeger
10 Marina Zlatkina
AuthorID FirstName LastName
10 Marina Zlatkina
9 Cheryl Yaeger
11 Ben Wiedermann
4 Kate Steinbuhler
5 Sean Santry
7 Praveen Sadhu
3 Tem Nieto
8 David McPhie
13 Jeffrey Listfield
12 Jonathan Liperi
6 Ted Lin
2 Paul Deitel
1 Harvey Deitel
Fig. 17.18 Authors from table Authors in descending order by LastName.
The ORDER BY clause also can be used to order records by multiple fields. Such que-
ries are written in the form:
ORDER BY field1 sortingOrder, field2 sortingOrder, …
where sortingOrder is either ASC or DESC. Note that the sortingOrder does not have to be
identical for each field. For example, the query:
SELECT AuthorID, FirstName, LastName
FROM Authors
ORDER BY LastName, FirstName
sorts all authors in ascending order by last name, then by first name. Thus, if any authors
have the same last name, their records are returned sorted by first name (Fig. 17.19).
AuthorID FirstName LastName
1 Harvey Deitel
2 Paul Deitel
6 Ted Lin
12 Jonathan Liperi
13 Jeffrey Listfield
8 David McPhie
3 Tem Nieto
7 Praveen Sadhu
5 Sean Santry
4 Kate Steinbuhler
11 Ben Wiedermann
9 Cheryl Yaeger
10 Marina Zlatkina
Fig. 17.19 Authors from table Authors in ascending order by LastName and by
FirstName.
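Sorting by multiple fields, with a different sortingOrder per field, can be sketched as follows. The sketch assumes the standard-library sqlite3 module in place of MySQL and uses only five of the sample authors, inserted out of order so that the effect of ORDER BY is visible.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Authors (AuthorID INTEGER PRIMARY KEY, "
    "FirstName TEXT, LastName TEXT)")
connection.executemany(
    "INSERT INTO Authors VALUES (?, ?, ?)",
    [(10, "Marina", "Zlatkina"), (2, "Paul", "Deitel"),
     (6, "Ted", "Lin"), (1, "Harvey", "Deitel"),
     (3, "Tem", "Nieto")])

# ascending by last name; ties (the two Deitels) are broken by first name
ascending_ids = [row[0] for row in connection.execute(
    "SELECT AuthorID, FirstName, LastName FROM Authors "
    "ORDER BY LastName, FirstName")]

# the sorting order may differ from field to field
mixed_ids = [row[0] for row in connection.execute(
    "SELECT AuthorID, FirstName, LastName FROM Authors "
    "ORDER BY LastName DESC, FirstName ASC")]
connection.close()
```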
The WHERE and ORDER BY clauses can be combined in one query. For example, the
query:
SELECT ISBN, Title, EditionNumber, Copyright, Price
FROM Titles
WHERE Title LIKE '%How to Program'
ORDER BY Title ASC
returns the ISBN, title, edition number, copyright and price of each book in the Titles
table that has a Title ending with “How to Program”; it lists these records in ascending
order by Title. The results of the query are depicted in Fig. 17.20.
Fig. 17.20 Books from table Titles whose titles end with How to Program in
ascending order by Title.
ble that should be compared to join the tables. The “tableName.” syntax is required if the
fields have the same name in both tables. The same syntax can be used in any query to dis-
tinguish among fields in different tables that have the same name. Fully qualified names
that start with the database name can be used to perform cross-database queries.
Software Engineering Observation 17.1
If an SQL statement includes fields from multiple tables that have the same name, the state-
ment must precede those field names with their table names and the dot operator (e.g.,
Authors.AuthorID).
As always, the query can contain an ORDER BY clause. Figure 17.21 depicts the results
of the preceding query, ordered by LastName and FirstName. [Note: To save space,
we split the results of the query into two columns, each containing the FirstName,
LastName and ISBN fields.]
Fig. 17.21 Authors from table Authors and ISBN numbers of the authors’ books,
sorted in ascending order by LastName and FirstName.
Fig. 17.23 Portion of the result set produced by the query in Fig. 17.22.
We added indentation to the query in Fig. 17.22 to make the query more readable. Let
us now break down the query into its various parts. Lines 1–3 contain a comma-separated
list of the fields that the query returns; the order of the fields from left to right specifies the
fields’ order in the returned table. This query selects fields Title and ISBN from table
Titles, fields FirstName and LastName from table Authors, field Copyright
from table Titles and field PublisherName from table Publishers. For purposes
of clarity, we fully qualified each field name with its table name (e.g., Titles.ISBN).
Lines 5–10 specify the INNER JOIN operations used to combine information from the
various tables. There are three INNER JOIN operations. It is important to note that,
although an INNER JOIN is performed on two tables, either of those two tables can be the
result of another query or another INNER JOIN. We use parentheses to nest the INNER
JOIN operations; SQL evaluates the innermost set of parentheses first and then moves out-
ward. We begin with the INNER JOIN:
( Publishers INNER JOIN Titles
ON Publishers.PublisherID = Titles.PublisherID )
which joins the Publishers table and the Titles table ON the condition that the Pub-
lisherID numbers in each table match. The resulting temporary table contains informa-
tion about each book and its publisher.
which joins the Authors table and the AuthorISBN table ON the condition that the Au-
thorID fields in each table match. Remember that the AuthorISBN table has multiple en-
tries for ISBN numbers of books that have more than one author. The third INNER JOIN:
joins the two temporary tables produced by the two prior inner joins ON the condition that
the Titles.ISBN field for each record in the first temporary table matches the corre-
sponding AuthorISBN.ISBN field for each record in the second temporary table. The
result of all these INNER JOIN operations is a temporary table from which the appropriate
fields are selected to produce the results of the query.
Finally, line 11 of the query:
ORDER BY Titles.Title
indicates that all the records should be sorted in ascending order (the default) by title.
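The four-table join can be approximated with the sketch below. Several assumptions apply: it uses the standard-library sqlite3 module instead of MySQL, it writes the three INNER JOIN operations as a flat chain rather than with nested parentheses, and the book titles are placeholders. The AuthorID-ISBN pairs are taken from the AuthorISBN sample data.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript("""
    CREATE TABLE Publishers (PublisherID INTEGER PRIMARY KEY,
                             PublisherName TEXT);
    CREATE TABLE Titles (ISBN TEXT PRIMARY KEY, Title TEXT,
                         PublisherID INTEGER);
    CREATE TABLE Authors (AuthorID INTEGER PRIMARY KEY,
                          FirstName TEXT, LastName TEXT);
    CREATE TABLE AuthorISBN (AuthorID INTEGER, ISBN TEXT,
                             PRIMARY KEY (AuthorID, ISBN));

    INSERT INTO Publishers VALUES (1, 'Prentice Hall');
    INSERT INTO Titles VALUES ('0130895725', 'Placeholder Title A', 1);
    INSERT INTO Titles VALUES ('0130284181', 'Placeholder Title B', 1);
    INSERT INTO Authors VALUES (1, 'Harvey', 'Deitel');
    INSERT INTO Authors VALUES (2, 'Paul', 'Deitel');
    INSERT INTO Authors VALUES (3, 'Tem', 'Nieto');
    INSERT INTO AuthorISBN VALUES (1, '0130895725');
    INSERT INTO AuthorISBN VALUES (2, '0130895725');
    INSERT INTO AuthorISBN VALUES (1, '0130284181');
    INSERT INTO AuthorISBN VALUES (2, '0130284181');
    INSERT INTO AuthorISBN VALUES (3, '0130284181');
""")

# each ON condition matches a foreign key with its primary key
rows = connection.execute("""
    SELECT Titles.Title, Titles.ISBN, Authors.FirstName,
           Authors.LastName, Publishers.PublisherName
    FROM Publishers
    INNER JOIN Titles
        ON Publishers.PublisherID = Titles.PublisherID
    INNER JOIN AuthorISBN
        ON Titles.ISBN = AuthorISBN.ISBN
    INNER JOIN Authors
        ON AuthorISBN.AuthorID = Authors.AuthorID
    ORDER BY Titles.Title, Authors.LastName, Authors.FirstName
""").fetchall()
connection.close()
```

The result contains one row per author-book pair, so a book with three authors appears three times.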
where tableName is the table in which to insert the record. The tableName is followed by
a comma-separated list of field names in parentheses. The list of field names is followed by
the SQL keyword VALUES and a comma-separated list of values in parentheses. The spec-
ified values in this list must match the field names listed after the table name in both order
and type (for example, if fieldName1 is specified as the FirstName field, then value1
should be a string in single quotes representing the first name). The INSERT statement:
inserts a record into the Authors table. The statement indicates that values will be inserted
for the FirstName and LastName fields. The corresponding values to insert are 'Sue'
and 'Smith'. [Note: The SQL statement does not specify an AuthorID in this example,
because AuthorID is an autoincrement field in table Authors. For every new record added
to this table, MySQL assigns a unique AuthorID value that is the next value in the
autoincrement sequence (i.e., 1, 2, 3, etc.). In this case, MySQL assigns AuthorID number 14 to
Sue Smith.] Figure 17.24 shows the Authors table after the INSERT INTO operation.
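The autoincrement behavior can be observed with a minimal sketch. We assume the standard-library sqlite3 module here, in which the corresponding column declaration is INTEGER PRIMARY KEY AUTOINCREMENT; MySQL's own syntax differs. The Cursor attribute lastrowid reports the key that the database assigned.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Authors ("
    "AuthorID INTEGER PRIMARY KEY AUTOINCREMENT, "
    "FirstName TEXT, LastName TEXT)")

# the thirteen existing authors receive AuthorID values 1 through 13
connection.executemany(
    "INSERT INTO Authors (FirstName, LastName) VALUES (?, ?)",
    [("Harvey", "Deitel"), ("Paul", "Deitel"), ("Tem", "Nieto"),
     ("Kate", "Steinbuhler"), ("Sean", "Santry"), ("Ted", "Lin"),
     ("Praveen", "Sadhu"), ("David", "McPhie"), ("Cheryl", "Yaeger"),
     ("Marina", "Zlatkina"), ("Ben", "Wiedermann"),
     ("Jonathan", "Liperi"), ("Jeffrey", "Listfield")])

# no AuthorID is specified; the database supplies the next value
cursor = connection.execute(
    "INSERT INTO Authors (FirstName, LastName) "
    "VALUES ('Sue', 'Smith')")
new_author_id = cursor.lastrowid
connection.close()
```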
1 Harvey Deitel
2 Paul Deitel
3 Tem Nieto
4 Kate Steinbuhler
5 Sean Santry
6 Ted Lin
7 Praveen Sadhu
8 David McPhie
9 Cheryl Yaeger
10 Marina Zlatkina
11 Ben Wiedermann
12 Jonathan Liperi
13 Jeffrey Listfield
14 Sue Smith
Smith and FirstName is equal to Sue. If we know the AuthorID in advance of the
UPDATE operation (possibly because we searched for the record previously), the WHERE
clause could be simplified as follows:
WHERE AuthorID = 14
Figure 17.25 depicts the Authors table after we perform the UPDATE operation.
Common Programming Error 17.7
Failure to use a WHERE clause with an UPDATE statement could lead to logic errors.
where tableName is the table from which to delete a record (or records). The WHERE clause
specifies the criteria used to determine which record(s) to delete. For example, the DELETE
statement:
DELETE FROM Authors
WHERE LastName = 'Jones' AND FirstName = 'Sue'
deletes the record for Sue Jones from the Authors table. Figure 17.26 depicts the Au-
thors table after we perform the DELETE operation.
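The effect of the WHERE clause on UPDATE and DELETE can be checked with the sketch below. It assumes the standard-library sqlite3 module rather than MySQL, and the UPDATE statement is our reconstruction of the operation described above (changing Sue Smith's last name to Jones). The Cursor attribute rowcount reports how many records a statement modified.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Authors (AuthorID INTEGER PRIMARY KEY, "
    "FirstName TEXT, LastName TEXT)")
connection.executemany(
    "INSERT INTO Authors VALUES (?, ?, ?)",
    [(13, "Jeffrey", "Listfield"), (14, "Sue", "Smith")])

# the WHERE clause restricts the update to one record
updated = connection.execute(
    "UPDATE Authors SET LastName = 'Jones' "
    "WHERE LastName = 'Smith' AND FirstName = 'Sue'").rowcount
after_update = connection.execute(
    "SELECT FirstName, LastName FROM Authors "
    "WHERE AuthorID = 14").fetchone()

# the same kind of criteria select the record to delete
deleted = connection.execute(
    "DELETE FROM Authors "
    "WHERE LastName = 'Jones' AND FirstName = 'Sue'").rowcount
remaining = connection.execute(
    "SELECT COUNT(*) FROM Authors").fetchone()[0]
connection.close()
```

Omitting the WHERE clause from either statement would apply it to every record in the table.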
1 Harvey Deitel
2 Paul Deitel
3 Tem Nieto
4 Kate Steinbuhler
5 Sean Santry
6 Ted Lin
7 Praveen Sadhu
8 David McPhie
9 Cheryl Yaeger
10 Marina Zlatkina
11 Ben Wiedermann
12 Jonathan Liperi
13 Jeffrey Listfield
14 Sue Jones
1 Harvey Deitel
2 Paul Deitel
3 Tem Nieto
4 Kate Steinbuhler
5 Sean Santry
6 Ted Lin
7 Praveen Sadhu
8 David McPhie
9 Cheryl Yaeger
10 Marina Zlatkina
11 Ben Wiedermann
12 Jonathan Liperi
13 Jeffrey Listfield
database to another database, a programmer needs to change three or four lines of code.
However, the switch between databases may require modifications to the SQL code (to
compensate for case sensitivity, etc.).
1 #!c:\python\python.exe
2 # Fig. 17.27: fig17_27.py
3 # Displays contents of the Authors table,
4 # ordered by a specified field.
5
6 import MySQLdb
7 import cgi
8 import sys
9
10 def printHeader( title ):
11 print """Content-type: text/html
12
13 <?xml version = "1.0" encoding = "UTF-8"?>
14 <!DOCTYPE html PUBLIC
15 "-//W3C//DTD XHTML 1.0 Transitional//EN"
16 "DTD/xhtml1-transitional.dtd">
17 <html xmlns = "http://www.w3.org/1999/xhtml"
18 xml:lang = "en" lang = "en">
19 <head><title>%s</title></head>
20
21 <body>""" % title
22
23 # obtain user query specifications
24 form = cgi.FieldStorage()
25
26 # get "sortBy" value
27 if form.has_key( "sortBy" ):
28 sortBy = form[ "sortBy" ].value
29 else:
30 sortBy = "firstName"
31
32 # get "sortOrder" value
33 if form.has_key( "sortOrder" ):
34 sortOrder = form[ "sortOrder" ].value
35 else:
36 sortOrder = "ASC"
37
Fig. 17.27 Connecting to and querying a database and displaying the results.
Line 6 imports module MySQLdb, which contains classes and functions for manipu-
lating MySQL databases in Python (available from sourceforge.net/projects/
mysql-python). For installation instructions, please visit www.deitel.com.
Lines 86–105 create an XHTML form that enables the user to specify how to sort the
records of the Authors table. Lines 24–36 retrieve and process this form. The records are
sorted by the field assigned to variable sortBy. By default (when the form supplies no
sortBy value), the records are sorted by firstName (line 30). The user can select a radio
button to sort the records by another field. Similarly, variable sortOrder has either the
user-specified value or "ASC".
Line 42 creates a Connection object called connection to manage the connec-
tion between the program and the database. Function MySQLdb.connect receives the
name of the database as the value of keyword argument db and creates the connection.
[Note: For operating systems other than Windows, MySQL may require a username and
password to connect to the database. If so, pass the appropriate values as strings to keyword
arguments user and passwd for function MySQLdb.connect.] If MySQLdb.con-
nect fails, the function raises a MySQLdb OperationalError exception.
Line 51 calls Connection method cursor to create a Cursor object for the data-
base. The Cursor method execute takes as an argument an SQL command to execute
against the database. Lines 54–55 query and retrieve all records from the Authors table
sorted by the field specified in sortBy and ordered by the value of sortOrder.
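Because sortBy and sortOrder arrive from a Web form, a program may wish to check them before placing them in a query string. The following is a sketch only: the function name build_authors_query, the set of allowed field names and the exact query text are our assumptions, not the code of lines 54–55.

```python
# hypothetical lists of acceptable form values
ALLOWED_FIELDS = ("AuthorID", "FirstName", "LastName")
ALLOWED_ORDERS = ("ASC", "DESC")


def build_authors_query(sort_by="FirstName", sort_order="ASC"):
    """Return a query string, falling back to defaults for bad input."""

    if sort_by not in ALLOWED_FIELDS:
        sort_by = "FirstName"

    sort_order = sort_order.upper()
    if sort_order not in ALLOWED_ORDERS:
        sort_order = "ASC"

    # only values from the lists above ever reach the query string
    return "SELECT * FROM Authors ORDER BY %s %s" % (sort_by, sort_order)


good_query = build_authors_query("LastName", "desc")
safe_query = build_authors_query("LastName; DROP TABLE Authors", "ASC")
```

Field names and sorting keywords cannot be supplied through the parameter placeholders that DB-API modules offer for data values, which is why this sketch compares them against fixed lists instead.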
A Cursor object internally stores the results of a database query. The Cursor attribute
description contains a tuple of tuples in which each tuple provides information about a
field in the result set obtained by method execute. The first value of each field’s tuple is the
field name. Line 57 assigns the tuple of field name records to variable allFields. Cursor
method fetchall returns a tuple of tuples that contains all the internally stored results
obtained by invoking method execute. Each subtuple in the returned tuple represents one
record from the database, and each element in the record represents a field’s value for that
record. Line 58 assigns the tuple of matching records to variable allRecords.
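Attribute description and method fetchall can be explored interactively with a sketch like the one below. We assume the standard-library sqlite3 module, which follows the same DB-API conventions as MySQLdb but returns a list of tuples from fetchall rather than a tuple of tuples.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Authors (AuthorID INTEGER PRIMARY KEY, "
    "FirstName TEXT, LastName TEXT)")
connection.executemany(
    "INSERT INTO Authors VALUES (?, ?, ?)",
    [(3, "Tem", "Nieto"), (6, "Ted", "Lin")])

cursor = connection.cursor()
cursor.execute("SELECT * FROM Authors ORDER BY LastName ASC")

# the first value of each description entry is the field name
all_fields = [field[0] for field in cursor.description]

# each element of the result represents one record
all_records = cursor.fetchall()

cursor.close()
connection.close()
```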
Cursor method close (line 61) closes the Cursor object; line 62 closes the Con-
nection object with Connection method close. These methods explicitly close the
Cursor and the Connection objects. Although the objects’ close methods execute
when the objects are destroyed at program termination, programmers should explicitly
close the objects once they are no longer needed.
Good Programming Practice 17.2
Explicitly close Cursor and Connection objects with their respective close methods
as soon as the program no longer needs those objects.
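In current versions of Python, one way to guarantee that close is called is the closing helper from module contextlib. This is a sketch under stated assumptions: it uses Python 3 and the standard-library sqlite3 module instead of MySQLdb.

```python
import sqlite3
from contextlib import closing

# closing() calls each object's close method when its block ends,
# even if an exception is raised inside the block
with closing(sqlite3.connect(":memory:")) as connection:
    with closing(connection.cursor()) as cursor:
        cursor.execute("SELECT 1 + 1")
        result = cursor.fetchone()[0]

# a closed sqlite3 connection refuses further work
try:
    connection.cursor()
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False
```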
The remainder of the program displays the results of the database query in an XHTML
table. Lines 65–83 display the Authors table’s fields using a for loop. For each field, the
program displays the first entry in that field’s tuple (lines 69–70). Lines 75–83 display a table
row for each record in the Authors table using nested for loops. The outer for loop (line
75) iterates through each record in the table to create a new row. The inner for loop (line 78)
iterates over each field in the current record and displays each field in a new cell.
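The nested-loop structure described above can be sketched independently of the CGI program. The function name render_table is our own, the sketch uses Python 3, and it escapes each value with html.escape, a step that the description above does not mention.

```python
from html import escape


def render_table(field_names, records):
    """Return table markup for the given field names and records."""

    lines = ["<table>", "<tr>"]

    # header row: one cell per field name
    for name in field_names:
        lines.append("<th>%s</th>" % escape(str(name)))

    lines.append("</tr>")

    # outer loop creates a row for each record
    for record in records:
        lines.append("<tr>")

        # inner loop creates a cell for each field in the record
        for value in record:
            lines.append("<td>%s</td>" % escape(str(value)))

        lines.append("</tr>")

    lines.append("</table>")
    return "\n".join(lines)


markup = render_table(
    ["AuthorID", "FirstName", "LastName"],
    [(1, "Harvey", "Deitel"), (2, "Paul", "Deitel")])
```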
Fig. 17.28 GUI application for submitting queries to a database. (Part 1 of 3.)
31
32 # frame to display query results
33 self.frame = Pmw.ScrolledFrame( self,
34 hscrollmode = "static", vscrollmode = "static" )
35 self.frame.pack( expand = YES, fill = BOTH )
36
37 self.panes = Pmw.PanedWidget( self.frame.interior(),
38 orient = "horizontal" )
39 self.panes.pack( expand = YES, fill = BOTH )
40
41 def submitQuery( self ):
42 """Execute user-entered query against database"""
43
44 # open connection, retrieve cursor and execute query
45 try:
46 connection = MySQLdb.connect( db = "Books" )
47 cursor = connection.cursor()
48 cursor.execute( self.query.get() )
49 except MySQLdb.OperationalError, message:
50 errorMessage = "Error %d:\n%s" % \
51 ( message[ 0 ], message[ 1 ] )
52 showerror( "Error", errorMessage )
53 return
54 else: # obtain user-requested information
55 data = cursor.fetchall()
56 fields = cursor.description # metadata from query
57 cursor.close()
58 connection.close()
59
60 # clear results of last query
61 self.panes.destroy()
62 self.panes = Pmw.PanedWidget( self.frame.interior(),
63 orient = "horizontal" )
64 self.panes.pack( expand = YES, fill = BOTH )
65
66 # create pane and label for each field
67 for item in fields:
68 self.panes.add( item[ 0 ] )
69 label = Label( self.panes.pane( item[ 0 ] ),
70 text = item[ 0 ], relief = RAISED )
71 label.pack( fill = X )
72
73 # enter results into panes, using labels
74 for entry in data:
75
76 for i in range( len( entry ) ):
77 label = Label( self.panes.pane( fields[ i ][ 0 ] ),
78 text = str( entry[ i ] ), anchor = W,
79 relief = GROOVE, bg = "white" )
80 label.pack( fill = X )
81
82 self.panes.setnaturalsize()
83
84 def main():
85 QueryWindow().mainloop()
86
87 if __name__ == "__main__":
88 main()
Fig. 17.28 GUI application for submitting queries to a database. (Part 3 of 3.)
When the user presses the Submit query button, method submitQuery (lines 41–
82) performs the query and displays the results. Lines 45–58 contain a try/except/else
statement that connects to and queries the database. The try statement creates a Connec-
tion and a Cursor object and uses Cursor method execute to perform the user-
entered query. Function MySQLdb.connect (line 46) fails if the specified database does
not exist. Cursor method execute (line 48) fails if the query string contains an SQL
syntax error. Each method raises an OperationalError exception. Lines 49–53
handle this exception and call tkMessageBox function showerror with an appropriate
error message.
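The try/except/else pattern can be exercised without a GUI or a MySQL server. The sketch below assumes Python 3 and the standard-library sqlite3 module, whose OperationalError plays the role described above; the function name run_query is hypothetical.

```python
import sqlite3


def run_query(connection, sql):
    """Return (fields, rows) on success or (None, message) on failure."""

    cursor = connection.cursor()

    try:
        cursor.execute(sql)
    except sqlite3.OperationalError as error:
        return None, "Error: %s" % error
    else:  # runs only if execute raised no exception
        rows = cursor.fetchall()
        fields = [item[0] for item in cursor.description or ()]
        return fields, rows
    finally:
        cursor.close()


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Authors (FirstName TEXT)")
connection.execute("INSERT INTO Authors VALUES ('Harvey')")

good_fields, good_rows = run_query(
    connection, "SELECT FirstName FROM Authors")
bad_fields, bad_message = run_query(connection, "SELEC oops")
connection.close()
```

Keeping the fetch in the else clause makes it clear that results are read only when the query itself succeeded.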
If the user-entered query string successfully executes, the program retrieves the result
of the query. The else clause (lines 54–58) assigns the queried records to variable data
and assigns metadata to variable fields. Metadata is data that describes data. For
example, the metadata for a result set may include the field names and field types. The
metadata
fields = cursor.description
contains descriptive information about the result set of the user-entered query (line 56).
Cursor attribute description contains a tuple of tuples that provides information
about the fields obtained by method execute.
PanedWidget method destroy (line 61) removes the existing panes to display the
query data in new panes (lines 62–64). Lines 67–71 iterate over the field information to dis-
play the names of the columns. For each field, method add adds a pane to the Paned-
Widget. This method takes a string that identifies the pane. The Label constructor adds
a label to the pane that contains the name of the field with the relief attribute set to
RAISED. PanedWidget method pane (line 69) identifies the parent of this new label.
This method takes the name of a pane and returns a reference to that pane.
Lines 74–80 iterate over each record to create a label that contains the value of each
field in the record. Method pane specifies the appropriate parent frame for each label. The
expression
self.panes.pane( fields[ i ][ 0 ] )
evaluates to the pane whose name is the field name for the ith value in the record. Once the
results have been added to the panes, the PanedWidget method setnaturalsize
sets the size of each pane to be large enough to view the largest label in the pane.
11
12 def __init__( self ):
13 """Address Book constructor"""
14
15 Frame.__init__( self )
16 Pmw.initialise()
17 self.pack( expand = YES, fill = BOTH )
18 self.master.title( "Address Book Database Application" )
19
20 # buttons to execute commands
21 self.buttons = Pmw.ButtonBox( self, padx = 0 )
22 self.buttons.grid( columnspan = 2 )
23 self.buttons.add( "Find", command = self.findAddress )
24 self.buttons.add( "Add", command = self.addAddress )
25 self.buttons.add( "Update", command = self.updateAddress )
26 self.buttons.add( "Clear", command = self.clearContents )
27 self.buttons.add( "Help", command = self.help, width = 14 )
28 self.buttons.alignbuttons()
29
30
31 # list of fields in an address record
32 fields = [ "ID", "First name", "Last name",
33 "Address", "City", "State Province", "Postal Code",
34 "Country", "Email Address", "Home phone", "Fax Number" ]
35
36 # dictionary with Entry components for values, keyed by
37 # corresponding addresses table field names
38 self.entries = {}
39
40 self.IDEntry = StringVar() # current address id text
41 self.IDEntry.set( "" )
42
43 # create entries for each field
44 for i in range( len( fields ) ):
45 label = Label( self, text = fields[ i ] + ":" )
46 label.grid( row = i + 1, column = 0 )
47 entry = Entry( self, name = fields[ i ].lower(),
48 font = "Courier 12" )
49 entry.grid( row = i + 1 , column = 1,
50 sticky = W+E+N+S, padx = 5 )
51
52 # user cannot type in ID field
53 if fields[ i ] == "ID":
54 entry.config( state = DISABLED,
55 textvariable = self.IDEntry, bg = "gray" )
56
57 # add entry field to dictionary
58 key = fields[ i ].replace( " ", "_" )
59 key = key.upper()
60 self.entries[ key ] = entry
61
62 def addAddress( self ):
63 """Add address record to database"""
64
65 if self.entries[ "LAST_NAME" ].get() != "" and \
66 self.entries[ "FIRST_NAME"].get() != "":
67
68 # create INSERT query command
69 query = """INSERT INTO addresses (
70 FIRST_NAME, LAST_NAME, ADDRESS, CITY,
71 STATE_PROVINCE, POSTAL_CODE, COUNTRY,
72 EMAIL_ADDRESS, HOME_PHONE, FAX_NUMBER
73 ) VALUES (""" + \
74 "'%s', " * 10 % \
75 ( self.entries[ "FIRST_NAME" ].get(),
76 self.entries[ "LAST_NAME" ].get(),
77 self.entries[ "ADDRESS" ].get(),
78 self.entries[ "CITY" ].get(),
79 self.entries[ "STATE_PROVINCE" ].get(),
80 self.entries[ "POSTAL_CODE" ].get(),
81 self.entries[ "COUNTRY" ].get(),
82 self.entries[ "EMAIL_ADDRESS" ].get(),
83 self.entries[ "HOME_PHONE" ].get(),
84 self.entries[ "FAX_NUMBER" ].get() )
85 query = query[ :-2 ] + ")"
86
87 # open connection, retrieve cursor and execute query
88 try:
89 connection = MySQLdb.connect( db = "AddressBook" )
90 cursor = connection.cursor()
91 cursor.execute( query )
92 except MySQLdb.OperationalError, message:
93 errorMessage = "Error %d:\n%s" % \
94 ( message[ 0 ], message[ 1 ] )
95 showerror( "Error", errorMessage )
96 else:
97 cursor.close()
98 connection.close()
99 self.clearContents()
100
101 else: # user has not filled out first/last name fields
102 showwarning( "Missing fields", "Please enter name" )
103
104 def findAddress( self ):
105 """Query database for address record and display results"""
106
107 if self.entries[ "LAST_NAME" ].get() != "":
108
109 # create SELECT query
110 query = "SELECT * FROM addresses " + \
111 "WHERE LAST_NAME = '" + \
112 self.entries[ "LAST_NAME" ].get() + "'"
113
114 # open connection, retrieve cursor and execute query
115 try:
116 connection = MySQLdb.connect( db = "AddressBook" )
Method addAddress (lines 62–102) adds a new record to the AddressBook database
in response to the Add button in the GUI. The method first ensures that the user has entered
values for the first and last name fields (lines 65–66). If the user enters values for these fields,
the query string inserts a record into the database (lines 69–85). Otherwise, tkMessageBox
function showwarning reminds the user to enter the information (lines 101–102). Line 74
includes ten %s format specifiers, which are replaced by the values listed in
lines 75–84. Line 85 removes the trailing comma and space, then closes the parentheses of the VALUES list in the SQL statement.
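Building the query with string formatting breaks when a value contains a single quote (for example, the last name O'Brien), and it lets user input alter the SQL statement. DB-API modules also accept the values as a second argument to execute, which avoids both problems. The sketch below is an alternative to the listing, not the book's code. It assumes Python 3 and uses sqlite3, whose placeholder is ?, in place of MySQLdb, whose placeholder is %s. The three-field table is a shortened, hypothetical version of addresses.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute(
    "CREATE TABLE addresses (FIRST_NAME TEXT, LAST_NAME TEXT, CITY TEXT)")

# Values as they might come from the Entry components (hypothetical input).
values = ("Pat", "O'Brien", "Boston")

# One placeholder per value; the module quotes the values itself.
placeholders = ", ".join(["?"] * len(values))
query = ("INSERT INTO addresses (FIRST_NAME, LAST_NAME, CITY) "
         "VALUES (" + placeholders + ")")
cursor.execute(query, values)
connection.commit()

cursor.execute("SELECT LAST_NAME FROM addresses")
lastNames = cursor.fetchall()
print(lastNames)

cursor.close()
connection.close()
```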
Lines 88–99 contain a try/except/else statement that connects to and updates the
database (i.e., inserts the new record in the database). Line 99 invokes method clear-
Contents (lines 185–191) to clear the contents of the GUI. If an error occurs, tkMes-
sageBox function showerror displays the error.
Method findAddress (lines 104–146) queries the AddressBook database for a
specific record when the user clicks the Find button in the GUI. Line 107 tests whether the
last name text field contains data. If the entry is empty, the program displays an error. If the
user has entered data in the last name text field, a SELECT SQL statement searches the data-
base for the user-specified last name. We used asterisk (*) in the SELECT statement because
line 126 uses metadata to get field names. Lines 115–143 contain a try/except/else
statement that connects to and queries the database. If these operations succeed, the program
retrieves the results from the database (lines 125–126). A message informs the user if the
query does not yield results (lines 128–129). If the query does yield results, lines 134–140 dis-
play the results in the GUI. Each field value is inserted in the appropriate Entry component.
The record’s ID must be converted to a string before it can be displayed.
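The find logic described above can be sketched without the GUI. This is an illustration under stated assumptions, not the book's listing: it uses Python 3 and sqlite3 instead of MySQLdb, and the helper name findByLastName and the two-field table are hypothetical. Pairing each field name from the metadata with its value mirrors how the program chooses the Entry component for each value.

```python
import sqlite3

def findByLastName(cursor, lastName):
    """Return matching records as dictionaries keyed by field name."""
    cursor.execute(
        "SELECT * FROM addresses WHERE LAST_NAME = ?", (lastName,))
    results = cursor.fetchall()
    fieldNames = [field[0] for field in cursor.description]
    return [dict(zip(fieldNames, record)) for record in results]

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE addresses (ID INTEGER, LAST_NAME TEXT)")
cursor.execute("INSERT INTO addresses VALUES (7, 'Doe')")

found = findByLastName(cursor, "Doe")
missing = findByLastName(cursor, "Nobody")   # query yields no results
print(found)
print(missing)
if found:
    # The ID is an integer, so convert it before displaying it as text.
    print("ID as text: " + str(found[0]["ID"]))

connection.close()
```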
Method updateAddress (lines 148–183) updates an existing database record. The
program displays a message if the user attempts to perform an update operation on a non-
existent record. Line 151 tests whether the id for the current record is valid. Lines 155–
162 create the SQL UPDATE statement. Lines 165–177 connect to and update the database.
Method clearContents (lines 185–191) clears the text fields when the user clicks
the Clear button in the GUI. Method help (lines 193–199) calls a tkMessageBox func-
tion to display instructions about how to use the program.
SUMMARY
• A database is an integrated collection of data.
• A database management system (DBMS) provides mechanisms for storing and organizing data in
a manner consistent with the database’s format. Database management systems allow for the ac-
cess and storage of data without worrying about the internal representation of databases.
• Today’s most popular database systems are relational databases.
• A language called Structured Query Language (SQL—pronounced as its individual letters or as
“sequel”) is used almost universally with relational database systems to perform queries (i.e., to
request information that satisfies given criteria) and to manipulate data.
• Python programmers communicate with databases using modules that conform to the Python Da-
tabase Application Programming Interface (DB-API).
• The relational database model is a logical representation of data that allows the relationships be-
tween the data to be considered independent of the actual physical structure of the data.
• A relational database is composed of tables. Any particular row of the table is called a record (or
row).
• A primary key is a field (or fields) in a table that contain(s) unique data, which cannot be duplicat-
ed in other records. This guarantees each record can be identified by a unique value.
• A foreign key is a field in a table for which every entry has a unique value in another table and
where the field in the other table is the primary key for that table. The foreign key helps maintain
the Rule of Referential Integrity—every value in a foreign-key field must appear in another table’s
primary-key field. Foreign keys enable information from multiple tables to be joined together and
presented to the user.
• Each column of the table represents a different field (or column or attribute). Records normally are
unique (by primary key) within a table, but particular field values may be duplicated between
records.
• SQL enables programmers to define complex queries that select data from a table by providing a
complete set of commands.
• The results of a query commonly are called result sets (or record sets).
• A typical SQL query selects information from one or more tables in a database. Such selections
are performed by SELECT queries. The simplest format of a SELECT query is
SELECT * FROM tableName
• An asterisk (*) indicates that all rows and columns from table tableName of the database should
be selected.
• To select specific fields from a table, replace the asterisk (*) with a comma-separated list of field
names.
• In most cases, it is necessary to locate records in a database that satisfy certain selection criteria.
Only records that match the selection criteria are selected. SQL uses the optional WHERE clause in
a SELECT query to specify the selection criteria for the query. The simplest format of a SELECT
query with selection criteria is
SELECT fieldName1, fieldName2, … FROM tableName WHERE criteria
• The WHERE clause condition can contain operators <, >, <=, >=, =, <> and LIKE.
• Operator LIKE is used for pattern matching with wildcard characters percent ( % ) and underscore
( _ ). Pattern matching allows SQL to search for similar strings that “match a pattern.”
• A pattern that contains a percent character (%) searches for strings that have zero or more charac-
ters at the percent character’s position in the pattern.
• An underscore ( _ ) in the pattern string indicates a single character at that position in the pattern.
• The results of a query can be arranged in ascending or descending order using the optional ORDER
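BY clause. The simplest form of an ORDER BY clause is
SELECT fieldName1, fieldName2, … FROM tableName ORDER BY field ASC
SELECT fieldName1, fieldName2, … FROM tableName ORDER BY field DESC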
where ASC specifies ascending order (lowest to highest), DESC specifies descending order (high-
est to lowest) and field specifies the field on which the sort is based.
• Multiple fields can be used for ordering purposes with an ORDER BY clause of the form
ORDER BY field1 sortingOrder, field2 sortingOrder, …
where sortingOrder is either ASC or DESC. Note that the sortingOrder does not have to be iden-
tical for each field.
• The WHERE and ORDER BY clauses can be combined in one query.
• A join merges records from two or more tables by testing for matching values in a field that is common
to both tables. The simplest format of a join is
SELECT fieldName1, fieldName2, …
FROM table1, table2
WHERE table1.fieldName = table2.fieldName
• A fully qualified name specifies the fields from each table that should be compared to join the ta-
bles. The “tableName.” syntax is required if the fields have the same name in both tables. The
same syntax can be used in a query to distinguish fields in different tables that happen to have the
same name. Fully qualified names that start with the database name can be used to perform cross-
database queries.
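• The INSERT statement inserts a new record in a table. The simplest form of this statement is
INSERT INTO tableName ( fieldName1, fieldName2, …, fieldNameN )
VALUES ( value1, value2, …, valueN )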
where tableName is the table in which to insert the record. The tableName is followed by a com-
ma-separated list of field names in parentheses. (This list is not required if the INSERT INTO op-
eration specifies a value for every column of the table in the correct order.) The list of field names
is followed by the SQL keyword VALUES and a comma-separated list of values in parentheses.
The values specified here should match the field names specified after the table name in order and
type (i.e., if fieldName1 is supposed to be the FirstName field, then value1 should be a string in
single quotes representing the first name).
• An UPDATE statement modifies data in a table. The simplest form for an UPDATE statement is
UPDATE tableName
SET fieldName1 = value1, …, fieldNameN = valueN
WHERE criteria
where tableName is the table in which to update a record (or records). The tableName is followed by
keyword SET and a comma-separated list of field name/value pairs in the format fieldName = value.
The WHERE clause specifies the criteria used to determine which record(s) to update.
• An SQL DELETE statement removes data from a table. The simplest form for a DELETE statement is
DELETE FROM tableName WHERE criteria
where tableName is the table from which to delete a record (or records). The WHERE clause spec-
ifies the criteria used to determine which record(s) to delete.
• Modules have been written that can interface with most popular databases, hiding database details
from the programmer. These modules follow the Python Database Application Programming In-
terface (DB-API), a document that specifies common object and method names for manipulating
any database.
• The DB-API describes a Connection object that programs create to connect to a database.
• A program can use a Connection object to create a Cursor object, which the program uses to
execute queries against the database.
• The major benefit of the DB-API is that a program does not need to know much about the database
to which the program connects. Therefore, the programmer can change the database a program
uses without changing vast amounts of Python code. However, changing the DB often requires
changes in the SQL code.
• Module MySQLdb contains classes and functions for manipulating MySQL databases in Python.
• Function MySQLdb.connect creates the connection. The function receives the name of the da-
tabase as the value of keyword argument db. If MySQLdb.connect fails, the function raises an
OperationalError exception.
• The Cursor method execute takes as an argument a query string to execute against the data-
base.
• A Cursor object internally stores the results of a database query.
• The Cursor method fetchall returns a tuple of records that matched the query. Each record
is represented as a tuple that contains the values of that record’s fields.
• The Cursor method close closes the Cursor object.
• The Connection method close closes the Connection object.
• A PanedWidget is a subdivided frame that allows the user to change the size of the subdivi-
sions. The PanedWidget constructor’s orient argument takes the value "horizontal" or
"vertical". If the value is "horizontal", the panes are placed left to right in the frame; if
the value is "vertical", the panes are placed top to bottom in the frame.
• Metadata are data that describe other data. The Cursor attribute description contains a tuple
of tuples that provides information about the fields of the data obtained by method execute. The
cursor and connection are closed.
• The PanedWidget method pane takes the name of a pane and returns a reference to that pane.
• The PanedWidget method setnaturalsize sets the size of each pane to be large enough to
view the largest label in the pane.
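The SELECT features summarized above (WHERE with LIKE, ORDER BY and a join) can be tried in a few lines. The sketch assumes Python 3 and the standard-library sqlite3 module rather than MySQLdb. The two tables and their contents are hypothetical stand-ins, not the book's Books database.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

# Two small hypothetical tables linked by AuthorID.
cursor.execute(
    "CREATE TABLE Authors (AuthorID INTEGER PRIMARY KEY, LastName TEXT)")
cursor.execute("CREATE TABLE Titles (AuthorID INTEGER, Title TEXT)")
cursor.executemany("INSERT INTO Authors VALUES (?, ?)",
                   [(1, "Deitel"), (2, "Nieto"), (3, "Dunn")])
cursor.executemany("INSERT INTO Titles VALUES (?, ?)",
                   [(1, "Python"), (2, "Internet"), (1, "XML")])

# WHERE with LIKE: last names that start with D, in descending order.
cursor.execute("SELECT LastName FROM Authors "
               "WHERE LastName LIKE 'D%' ORDER BY LastName DESC")
startsWithD = cursor.fetchall()

# Join: match records on the field that is common to both tables.
cursor.execute("SELECT LastName, Title FROM Authors, Titles "
               "WHERE Authors.AuthorID = Titles.AuthorID "
               "ORDER BY Title ASC")
joined = cursor.fetchall()

print(startsWithD)
print(joined)
connection.close()
```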
TERMINOLOGY
AND keyword
ASC keyword
asterisk (*)
close method
column
Connection object
Cursor object
data attribute
database
database management system (DBMS)
SELF-REVIEW EXERCISES
17.1 Fill in the blanks in each of the following statements:
a) The most popular database query language is ________.
b) A relational database is composed of ________.
c) A table in a database consists of ________ and ________.
d) The ________ uniquely identifies each record in a table.
e) SQL provides a complete set of commands (including SELECT) that enable programmers
to define complex ________.
f) SQL keyword ________ is followed by the selection criteria that specify the records to
select in a query.
g) SQL keyword ________ specifies the order in which records are sorted in a query.
h) A ________ specifies the fields from multiple tables that should be compared to
join the tables.
i) A ________ is an integrated collection of data which is centrally controlled.
j) A ________ is a field in a table for which every entry has a unique value in another
table and where the field in the other table is the primary key for that table.
17.2 State whether the following are true or false. If false, explain why.
a) DELETE is not a valid SQL keyword.
b) Tables in a database must have a primary key.
c) Python programmers communicate with databases using modules that conform to the
DB-API.
d) UPDATE is a valid SQL keyword.
e) The WHERE clause condition can not contain operator <>.
f) Not all database systems support the LIKE operator.
g) The INSERT INTO statement inserts a new record in a table.
h) MySQLdb.connect is used to create a connection to a database.
EXERCISES
17.3 Write SQL queries for the Books database (discussed in Section 17.3) that perform each of
the following tasks:
a) Select all authors from the Authors table.
b) Select all publishers from the Publishers table.
c) Select a specific author and list all books for that author. Include the title, copyright year
and ISBN number. Order the information alphabetically by title.
d) Select a specific publisher and list all books published by that publisher. Include the title,
copyright year and ISBN number. Order the information alphabetically by title.
17.4 Write SQL queries for the Books database (discussed in Section 17.3) that perform each of
the following tasks:
a) Add a new author to the Authors table.
b) Add a new title for an author (remember that the book must have an entry in the
AuthorISBN table). Be sure to specify the publisher of the title.
c) Add a new publisher.
17.5 Modify Fig. 17.27 so that the user can read different tables in the books database.
17.6 Create a MySQL database that contains information about students in a university. Possible
fields might include date of birth, major, current grade point average, credits earned, etc. Write a Py-
thon program to manage the database. Include the following functionality: sort all students according
to GPA (descending), create a display of all students in one particular major and remove all records
from the database where the student has the required amount of credits to graduate.
17.7 Modify the FIND capability in Fig. 17.29 to allow the user to scroll through the results of the
query in case there is more than one person with the specified last name in the Address Book. Provide
an appropriate GUI.
17.8 Modify the solution from Exercise 17.7 so that the program checks whether a record already
exists in the database before adding it.
19
Multithreading
Objectives
• To understand the notion of multithreading.
• To appreciate how multithreading can improve
performance.
• To understand how to create, manage and destroy
threads.
• To understand the life cycle of a thread.
• To study several examples of thread synchronization.
• To understand daemon threads.
The spider’s touch, how exquisitely fine!
Feels at each thread, and lives along the line.
Alexander Pope
A person with one watch knows what time it is; a person with
two watches is never sure.
Proverb
Conversation is but carving!
Give no more to every guest,
Than he’s able to digest.
Jonathan Swift
Learn to labor and to wait.
Henry Wadsworth Longfellow
The most general definition of beauty…Multeity in Unity.
Samuel Taylor Coleridge
© Copyright 1992–2002 by Deitel & Associates, Inc. All Rights Reserved. 8/21/01
Outline
19.1 Introduction
19.2 threading Module
19.3 Thread Scheduling
19.4 Thread States: Life Cycle of a Thread
19.5 Thread Synchronization
19.6 Producer/Consumer Relationship Without Thread Synchronization
19.7 Producer/Consumer Relationship With Thread Synchronization
19.8 Producer/Consumer Relationship: The Circular Buffer
19.9 Semaphores
19.10 Events
19.11 Daemon Threads
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
19.1 Introduction
In Chapter 18, we discussed how to use processes to perform concurrent tasks in our pro-
grams. In this chapter, we discuss multithreading techniques for performing similar tasks.
A thread is often called a “light-weight” process, because the operating system generally
requires fewer resources to create and manage threads than to create and manage processes.
Python differs from many popular general-purpose programming languages in that
it makes multithreading primitives available to the applications programmer. The pro-
grammer specifies that applications contain threads of execution, each thread designating
a portion of a program that may execute concurrently with other threads. This capability
gives the Python programmer powerful capabilities not available in C, C++ or other single-
threaded languages.
Many tasks require a multithreaded programming approach. When a browser down-
loads large files such as audio clips or video clips from the World Wide Web, we do not
want to wait until an entire clip is downloaded before starting the playback. So we can put
multiple threads to work: one that downloads a clip, and another that plays the clip so that
these activities, or tasks, may proceed concurrently. To avoid choppy playback, we coordi-
nate the threads so that the player thread does not begin until there is a sufficient amount of
the clip in memory to keep the player thread busy.
Performance Tip 19.1
A problem with single-threaded applications is that possibly lengthy activities must complete
before other activities can begin. Users feel they already spend too much time waiting with
Internet and World Wide Web applications, so multithreading is immediately appealing.
Writing multithreaded programs can be tricky. Although the human mind can perform
many functions concurrently, humans find it difficult to jump between parallel “trains of
thought.” To see why multithreading can be difficult to program and understand, try the fol-
lowing experiment: Open three books to page 1. Now try reading the books concurrently.
Read a few words from the first book, then read a few words from the second book, then
read a few words from the third book, then loop back and read the next few words from the
first book, and so on. After a brief time you rapidly appreciate the challenges of multi-
threading: switching between books, reading briefly, remembering your place in each
book, moving the book you are reading closer so you can see it, pushing books you are not
reading aside, and amidst all this chaos, trying to comprehend the content of the books!
[Figure: state diagram of the life cycle of a thread. Recoverable labels: states born, ready and running; transitions start, sleep interval expires, I/O completion, expiration, time.sleep, issue I/O request and complete.]
One common way for a running thread to enter the blocked state is when the thread
issues an input/output request. In this case, a blocked thread becomes ready when the I/O
it is waiting for completes. The interpreter does not execute a blocked thread even if the
interpreter is free.
When a running thread calls function time.sleep, that thread enters the sleeping
state. A sleeping thread becomes ready after the designated sleep time expires. A sleeping
thread cannot use the interpreter. A thread enters the dead state when its run method either
completes or raises an uncaught exception.
The program in Fig. 19.2 demonstrates basic threading techniques, including creation
of a class derived from threading.Thread, construction of a thread and using function
time.sleep in a thread. Each thread of execution created in the program displays its
name after sleeping for a random amount of time between 1 and 5 seconds.
Starting threads
thread1 going to sleep
thread2 going to sleep
thread3 going to sleep
thread4 going to sleep
Threads started
run method. Attribute sleepTime stores a random integer value determined when a
PrintThread object is constructed. When started, each PrintThread object sleeps
for the amount of time specified by sleepTime, and then outputs its name.
The PrintThread constructor (lines 11–17) first calls the base class constructor,
passing the class instance and the thread’s name. A thread’s name is specified with
Thread keyword argument name. If no name is specified, the thread will be assigned a
unique name in the form "Thread-n" where n is an integer. The constructor then initial-
izes sleepTime to a random integer between 1 and 5, inclusive. Then, the program out-
puts the name of the thread and the value of sleepTime, to show the values for the
particular PrintThread being constructed.
When a PrintThread’s start method (inherited from Thread) is invoked, the
PrintThread object enters the ready state. When the interpreter switches in the
PrintThread object, it enters the running state and its run method begins execution.
Method run (lines 20–25) prints a message indicating that the thread is going to sleep and
then invokes function time.sleep (line 24) to immediately put the thread into a sleeping
state. When the thread awakens after sleepTime seconds, it is placed into a ready state
again until it is switched into the processor. When the PrintThread object enters the
running state again, it outputs its name (indicating that the thread is done sleeping), its run
method terminates and the thread object enters the dead state.
The main portion of the program instantiates four PrintThread objects and invokes
the Thread class start method on each one to place all four PrintThread objects in
a ready state. After this, the main program’s thread terminates. However, the example con-
tinues running until the last PrintThread dies (i.e., has completed its run method).
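The behavior described above can be approximated with the following sketch. It is a reconstruction in Python 3 syntax, not the listing of Fig. 19.2. The sleep times are scaled down to hundredths of a second so that it finishes quickly, and the finished list and the final join calls are additions that let the result be inspected.

```python
import random
import threading
import time

class PrintThread(threading.Thread):
    """Thread that sleeps for a random time, then reports that it is done."""

    def __init__(self, threadName, finished):
        threading.Thread.__init__(self, name=threadName)
        # The text uses 1 to 5 seconds; this sketch uses 0.01 to 0.05.
        self.sleepTime = random.randint(1, 5) / 100.0
        self.finished = finished    # shared list (addition for this sketch)

    def run(self):
        print("%s going to sleep for %.2f seconds" %
              (self.name, self.sleepTime))
        time.sleep(self.sleepTime)  # thread enters the sleeping state
        print(self.name, "done sleeping")
        self.finished.append(self.name)

finished = []
threads = [PrintThread("thread%d" % i, finished) for i in range(1, 5)]

print("Starting threads")
for thread in threads:
    thread.start()                  # thread enters the ready state
print("Threads started")

# Added only so the sketch can inspect the result after every thread dies.
for thread in threads:
    thread.join()
print(sorted(finished))
```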
other thread that attempts to enter the critical section will block until the original thread has
exited the critical section.
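A minimal sketch of a critical section guarded by a lock follows. It assumes Python 3 and class threading.Lock; the shared counter and function increment are hypothetical names chosen for illustration.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    """Add 1 to the shared counter, one thread at a time."""
    global counter
    for _ in range(times):
        lock.acquire()      # other threads block here until release
        counter += 1        # critical section
        lock.release()

threads = [threading.Thread(target=increment, args=(10000,))
           for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)
```

Because every update happens while the lock is held, no increment is lost and the four threads together always reach 40000.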
Such a procedure provides only the most basic level of synchronization. Sometimes,
however, we would like to create more sophisticated threads that access a critical section
only when some event occurs (e.g., a data value has changed). This can be done by using
condition variables. A thread uses a condition variable when the thread wants to monitor
the state of some object or wants to be notified when some event occurs. When the object’s
state changes or the event occurs, blocked threads are notified. We discuss condition vari-
ables throughout this chapter in the context of the classic producer/consumer problem. The
solution involves a consumer thread that accesses a critical section only when notified by
a producer thread, and vice versa.
Condition variables are created with class threading.Condition. Because con-
dition variables contain an underlying lock, condition variables provide acquire and
release methods. Additional condition variable methods are wait and notify. When
a thread has acquired the underlying lock, calling method wait releases the lock and
causes the thread to block until it is awakened by a call to notify on the same condition
variable. Calling method notify wakes up one thread waiting on the condition variable.
All waiting threads can be woken up by invoking the condition variable’s notifyAll
method.
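The wait and notify protocol can be sketched as follows, assuming Python 3 (where notifyAll is also spelled notify_all). The shared list slot and the function names are hypothetical. The consumer tests the shared state in a loop, so the sketch works whichever thread runs first.

```python
import threading

condition = threading.Condition()
slot = []        # shared data guarded by the condition's underlying lock
received = []

def consumer():
    condition.acquire()
    while not slot:          # re-check the state after every wake-up
        condition.wait()     # releases the lock while blocked
    received.append(slot.pop())
    condition.release()

def producer():
    condition.acquire()
    slot.append(42)
    condition.notify()       # wake up one waiting thread
    condition.release()

consumerThread = threading.Thread(target=consumer)
producerThread = threading.Thread(target=producer)
consumerThread.start()
producerThread.start()
consumerThread.join()
producerThread.join()

print(received)
```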
Semaphores (created with class threading.Semaphore) are synchronization
primitives that allow a set number of threads to access a critical section. The Semaphore
object uses a counter to keep track of the number of threads that acquire and release the
semaphore. When a thread calls method acquire, the thread blocks if the counter is 0.
Otherwise, the thread acquires the semaphore and method acquire decrements the
counter. Calling method release releases the semaphore, increments the counter and
notifies a waiting thread. The initial value of the internal counter can be passed as an argu-
ment to the Semaphore constructor (default is 1). Because the internal counter can never
have a negative value, specifying a negative counter value raises an AssertionError
exception.
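The following sketch, written for Python 3, lets at most two of five threads into a critical section at a time. The names worker, inside and maxInside are hypothetical, and the extra lock protects only the bookkeeping counters.

```python
import threading
import time

semaphore = threading.Semaphore(2)   # internal counter starts at 2
stateLock = threading.Lock()         # guards the bookkeeping below
inside = 0
maxInside = 0

def worker():
    global inside, maxInside
    semaphore.acquire()              # blocks while the counter is 0
    with stateLock:
        inside += 1
        maxInside = max(maxInside, inside)
    time.sleep(0.05)                 # pretend to use the shared resource
    with stateLock:
        inside -= 1
    semaphore.release()              # increments the counter

threads = [threading.Thread(target=worker) for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(maxInside)
```

Note that in Python 3 a negative initial value raises ValueError rather than AssertionError.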
Sometimes, one or more threads want to wait for a particular event to occur before pro-
ceeding with their execution. An Event object (created with class threading.Event)
has an internal flag that is initially set to false (i.e., the event has not occurred). A thread
that calls Event method wait blocks until the event occurs. When the event occurs,
method set is called to set the flag to true and awaken all waiting threads. A thread that
calls wait after the flag is true does not block at all. Method isSet returns true if the flag
is true. Method clear sets the flag to false.
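A short sketch of these Event methods follows. It assumes Python 3, where isSet is also available as is_set; the function waiter and the list log are hypothetical.

```python
import threading

event = threading.Event()
log = []

def waiter(name):
    event.wait()             # blocks until the flag is true
    log.append(name)

threads = [threading.Thread(target=waiter, args=("waiter%d" % i,))
           for i in range(3)]
for thread in threads:
    thread.start()

flagBefore = event.is_set()  # the event has not occurred yet
event.set()                  # flag becomes true; all waiters are awakened
for thread in threads:
    thread.join()
flagAfter = event.is_set()

event.clear()                # flag is false again
print(flagBefore, flagAfter, event.is_set(), sorted(log))
```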
Writing a program that uses locks, condition variables or any other synchronization
primitive takes careful scrutiny to ensure that the program does not deadlock. A program
or thread deadlocks when the program or thread blocks forever on a needed resource. For
example, consider the scenario where a thread enters a critical section that tries to open a
file. If the file does not exist and the thread does not catch the exception, the thread terminates
before releasing the lock. Now all other threads will deadlock, because they block
indefinitely after they call the lock’s acquire method.
Common Programming Error 19.1
Threads in the waiting state for a lock object must eventually be awakened explicitly (i.e., by
releasing the lock) or they will wait forever. This may cause deadlock.
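One common way to guard against the file scenario above is to release the lock in a finally clause, so that the release happens even when an exception is raised. The sketch assumes Python 3; the function readFile and the file name are hypothetical.

```python
import threading

lock = threading.Lock()

def readFile(fileName):
    """Read a file inside a critical section, always releasing the lock."""
    lock.acquire()
    try:
        with open(fileName) as file:
            return file.read()
    finally:
        lock.release()       # runs even if open raises an exception

try:
    readFile("no-such-file-for-this-sketch.txt")   # hypothetical missing file
except IOError as error:
    print("Could not read file:", error)

# The lock was released, so a non-blocking acquire succeeds immediately.
acquiredAgain = lock.acquire(False)
if acquiredAgain:
    lock.release()
print(acquiredAgain)
```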
18
19 # wait for threads to terminate
20 producer.join()
21 consumer.join()
22
23 print "\nAll threads have terminated."
Starting threads...
Because the threads are not synchronized, data can be lost if the producer places new
data into the slot before the consumer consumes the previous data, and data can be “dou-
bled” if the consumer consumes data again before the producer produces the next item. To
show these possibilities, the consumer thread in this example sums all the values it reads.
The producer thread produces values from 1 to 10. If the consumer read each
value produced exactly once, the sum would be 55. However, if you execute this program several
times, you will see that the total is rarely, if ever, 55.
Figure 19.3 instantiates the shared UnsynchronizedInteger object number
and uses it as the argument to the constructors for the ProduceInteger object pro-
ducer and the ConsumeInteger object consumer. Next, the program invokes the
Thread class start method on objects producer and consumer to place them in the
ready state (lines 16–17). These statements launch the two threads. Lines 20–21 call
Thread method join to ensure that the main program waits indefinitely for both threads
to terminate before continuing. Notice that line 23 is executed after both threads have ter-
minated.
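The effect of join can be seen in isolation in the following sketch. It assumes Python 3 and uses a hypothetical function work in place of the producer and consumer classes.

```python
import threading
import time

events = []

def work(name, delay):
    time.sleep(delay)
    events.append(name + " terminated")

producer = threading.Thread(target=work, args=("producer", 0.05))
consumer = threading.Thread(target=work, args=("consumer", 0.10))
producer.start()
consumer.start()

producer.join()     # the main thread blocks here until producer dies
consumer.join()     # ... and here until consumer dies
events.append("All threads have terminated.")

print(events)
```

Whatever order the two worker threads finish in, the final message is always appended last, because both join calls must return first.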
Class ProduceInteger—a subclass of threading.Thread—consists of
attribute sharedObject, a constructor (lines 11–15) and a run method (lines 17–25).
The constructor initializes attribute sharedObject to refer to the Unsynchronized-
Integer object passed as an argument.
Class ProduceInteger’s run method consists of a for structure that loops ten
times. Each iteration of the loop first invokes function time.sleep to put the Produ-
ceInteger object into the sleeping state for a random time interval between 0 and 3 sec-
onds. When the thread awakens, it invokes the shared object’s setSharedNumber
method (line 22) with the value of control variable i to set the shared object’s data member.
When the loop completes, the ProduceInteger thread displays a line in the command
window indicating that it has finished producing data and terminates (i.e., the thread dies).
Class ConsumeInteger—a subclass of threading.Thread—consists of
attribute sharedObject, a constructor (lines 11–15) and a run method (lines 17–29).
The constructor initializes attribute sharedObject to refer to the Unsynchronized-
Integer object passed as an argument.
3
4 import threading
5 import random
6 import time
7
8 class ConsumeInteger( threading.Thread ):
9 """Thread to consume integers"""
10
11 def __init__( self, threadName, sharedObject ):
12 """Initialize thread, set shared object"""
13
14 threading.Thread.__init__( self, name = threadName )
15 self.sharedObject = sharedObject
16
17 def run( self ):
18 """Consume 10 values at random time intervals"""
19
20 sum = 0 # total sum of consumed values
21
22 # consume 10 values
23 for i in range( 10 ):
24 time.sleep( random.randrange( 4 ) )
25 sum += self.sharedObject.getSharedNumber()
26
27 print "%s retrieved values totaling: %d" % \
28 ( self.getName(), sum )
29 print "Terminating", self.getName()
Class ConsumeInteger’s run method consists of a for structure that loops ten
times to read values from the UnsynchronizedInteger object to which sharedOb-
ject refers. Each iteration of the loop invokes function time.sleep to put the Con-
sumeInteger object into the sleeping state for a random time interval between 0 and 3
seconds. Next, the thread calls the getSharedNumber method to get the value of the
shared object’s data member. Then, the thread adds to variable sum the value returned by
getSharedNumber (line 25). When the loop completes, the ConsumeInteger thread displays
a line in the command window indicating that it has finished consuming data and
terminates (i.e., the thread dies).
Class UnsynchronizedInteger’s setSharedNumber method (lines 14–19)
and getSharedNumber method (lines 21–28) do not synchronize access to instance
variable sharedNumber (created in line 12). Ideally, we would like every value pro-
duced by the ProduceInteger object to be consumed exactly once by the Con-
sumeInteger object. However, the output of Fig. 19.3 reveals that the values 1, 6, 7, 8
and 10 are lost (i.e., never seen by the consumer) and the values 2, 4 and 9 are retrieved
more than once by the consumer.
4   import threading
5
6   class UnsynchronizedInteger:
7      """Class that provides unsynchronized access to an integer"""
8
9      def __init__( self ):
10        """Initialize shared number to -1"""
11
12        self.sharedNumber = -1
13
14     def setSharedNumber( self, newNumber ):
15        """Set value of integer"""
16
17        print "%s setting sharedNumber to %d" % \
18           ( threading.currentThread().getName(), newNumber )
19        self.sharedNumber = newNumber
20
21     def getSharedNumber( self ):
22        """Get value of integer"""
23
24        tempNumber = self.sharedNumber
25        print "%s retrieving sharedNumber value %d" % \
26           ( threading.currentThread().getName(), tempNumber )
27
28        return tempNumber
In fact, method getSharedNumber must perform some “tricks” to make the output
accurately reflect the value of the data member. Line 24 assigns the value of data member
sharedNumber to variable tempNumber. Lines 25–28 then use the value of temp-
Number to print the message and return the value. If we did not use a temporary variable
in this way, the following scenario could occur. The consumer could call method get-
SharedNumber and print a message that displays the value of the data member. The
interpreter might then switch out the consumer thread for the producer thread. The producer
thread might then change the value of sharedNumber any number of times by calling
method setSharedNumber. Eventually, the interpreter switches the consumer back in
and method getSharedNumber returns a value different from the value printed before
the consumer was switched out.
This example clearly demonstrates that access to shared data by concurrent threads
must be controlled carefully or a program may produce incorrect results. To solve the prob-
lems of lost data and doubled data in the previous example, we must synchronize access to
the shared data for the concurrent producer and consumer threads.
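The lost and doubled values described above can be reproduced without relying on thread timing. The following is a minimal sketch, written in Python 3 syntax (the chapter's listings use Python 2), that replays a hand-written schedule of producer and consumer steps against an unsynchronized cell. Function replay and its schedule strings are hypothetical names introduced only for this illustration; they are not part of the chapter's figures.

```python
class UnsynchronizedInteger:
    """Shared cell with no synchronization (modeled on the chapter's class)."""

    def __init__(self):
        self.sharedNumber = -1

    def setSharedNumber(self, newNumber):
        self.sharedNumber = newNumber

    def getSharedNumber(self):
        return self.sharedNumber


def replay(schedule):
    """Replay a schedule of 'P' (produce) and 'C' (consume) steps.

    Hypothetical helper: each 'P' stores the next integer (1, 2, ...),
    each 'C' reads whatever the cell currently holds.
    """
    shared = UnsynchronizedInteger()
    nextValue = 1
    consumed = []
    for step in schedule:
        if step == "P":
            shared.setSharedNumber(nextValue)
            nextValue += 1
        else:
            consumed.append(shared.getSharedNumber())
    return consumed


# the producer runs twice before the consumer runs twice:
# value 1 is overwritten (lost) and value 2 is read twice (doubled)
print(replay("PPCC"))

# strict alternation is the behavior that synchronization must enforce
print(replay("PCPC"))
```

Only the alternating schedule delivers every value exactly once, which is the ordering the synchronized version of the class is meant to guarantee.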
Starting threads...
11
12        self.sharedNumber = -1
13        self.writeable = 1 # the value can be changed
14        self.threadCondition = threading.Condition()
15
16     def setSharedNumber( self, newNumber ):
17        """Set value of integer--blocks until lock acquired"""
18
19        # block until lock released then acquire lock
20        self.threadCondition.acquire()
21
22        # while not producer’s turn, release lock and block
23        while not self.writeable:
24           self.threadCondition.wait()
25
26        # (lock has now been re-acquired)
27
28        print "%s setting sharedNumber to %d" % \
29           ( threading.currentThread().getName(), newNumber )
30        self.sharedNumber = newNumber
31
32        self.writeable = 0 # allow consumer to consume
33        self.threadCondition.notify() # wake up a waiting thread
34        self.threadCondition.release() # allow lock to be acquired
35
36     def getSharedNumber( self ):
37        """Get value of integer--blocks until lock acquired"""
38
39        # block until lock released then acquire lock
40        self.threadCondition.acquire()
41
42        # while producer’s turn, release lock and block
43        while self.writeable:
44           self.threadCondition.wait()
45
46        # (lock has now been re-acquired)
47
48        tempNumber = self.sharedNumber
49        print "%s retrieving sharedNumber value %d" % \
50           ( threading.currentThread().getName(), tempNumber )
51
52        self.writeable = 1 # allow producer to produce
53        self.threadCondition.notify() # wake up a waiting thread
54        self.threadCondition.release() # allow lock to be acquired
55
56        return tempNumber
The constructor (lines 9–14) creates attribute writeable and initializes its value to
1. The class’ condition variable—threadCondition—protects access to attribute
writeable. If writeable is 1, a producer can place a value into variable shared-
Number. However, this means that a consumer currently cannot read the value of
sharedNumber. If writeable is 0, a consumer can read a value from variable
sharedNumber. However, this means that a producer currently cannot place a value into
sharedNumber.
When the ProduceInteger thread object invokes method setSharedNumber
(lines 16–34), it acquires the condition variable’s underlying lock (line 20). The while
structure in lines 23–24 tests the writeable data member. If writeable is 0, line 24
invokes the condition variable’s wait method. This call places the ProduceInteger
thread that called method setSharedNumber into the waiting state and releases the
condition variable’s lock so that other threads may acquire it.
The ProduceInteger object remains in the waiting state until it is notified that it
may proceed, at which point it enters the ready state and waits for the interpreter to
execute it. When the ProduceInteger object reenters the running state, it implicitly
reacquires the lock on the condition variable, and the setSharedNumber method
continues executing in the while structure with the next statement after wait. There are
no more statements, so the while condition is tested again. If the condition is false (i.e.,
writeable is 1), the loop terminates and the program displays a message indicating that
the producer is setting sharedNumber to a new value, newNumber (the argument passed
to setSharedNumber). Line 32 sets writeable to 0 to indicate that the shared memory
is now full (i.e., a consumer can read the value and a producer cannot put another value
there yet), and line 33 invokes condition variable method notify. If there are any waiting
threads, one thread in the waiting state is placed into the ready state, indicating that the
thread can now attempt its task again (as soon as the interpreter switches it in). Line 34
then calls condition variable method release, and method setSharedNumber returns
to its caller.
Common Programming Error 19.2
Condition variable method notify does not release the underlying lock. Forgetting to call
release can result in deadlock.
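One way to make a forgotten release less likely is to let the language pair acquire and release automatically. The sketch below assumes Python 3, where a threading.Condition object can be used in a with statement; it is not the listing from the chapter's figure, and function runDemo is a hypothetical driver added for illustration. The print statements of the original are omitted to keep the sketch short.

```python
import threading


class SynchronizedInteger:
    """Single-slot shared integer guarded by a condition variable."""

    def __init__(self):
        self.sharedNumber = -1
        self.writeable = True      # True means it is the producer's turn
        self.threadCondition = threading.Condition()

    def setSharedNumber(self, newNumber):
        # "with" acquires the lock and releases it on exit, even on error
        with self.threadCondition:
            while not self.writeable:
                self.threadCondition.wait()
            self.sharedNumber = newNumber
            self.writeable = False             # consumer's turn
            self.threadCondition.notify()      # does not release the lock

    def getSharedNumber(self):
        with self.threadCondition:
            while self.writeable:
                self.threadCondition.wait()
            tempNumber = self.sharedNumber
            self.writeable = True              # producer's turn
            self.threadCondition.notify()
            return tempNumber


def runDemo(count=10):
    """Hypothetical driver: produce 1..count in one thread, consume in another."""
    shared = SynchronizedInteger()
    consumed = []

    def produce():
        for i in range(1, count + 1):
            shared.setSharedNumber(i)

    def consume():
        for _ in range(count):
            consumed.append(shared.getSharedNumber())

    threads = [threading.Thread(target=produce),
               threading.Thread(target=consume)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return consumed
```

With one producer and one consumer, a single notify per call is enough because at most one thread can be waiting at any time.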
Next, writeable is set to 1 to indicate that the shared memory is now empty, and
condition variable method notify is invoked. If there are any waiting threads, one thread
in the waiting state is placed into the ready state, indicating that the thread can now attempt
its task again (as soon as it is assigned a processor). Line 54 releases the lock on the condi-
tion variable, and line 56 returns the value of tempNumber to getSharedNumber’s
caller.
The output in Fig. 19.7 shows that every integer produced is consumed once—no
values are lost and no values are doubled. Also, the consumer cannot read a value until the
producer produces a value. The next section addresses a way for consumers and producers
to read and write multiple values simultaneously.
Starting threads...
WAITING TO CONSUME
Produced 1 into cell 0 write 1 read 0 [1, -1, -1, -1, -1]
Consumed 1 from cell 0 write 1 read 1 [-1, -1, -1, -1, -1]
BUFFER EMPTY
Produced 2 into cell 1 write 2 read 1 [-1, 2, -1, -1, -1]
Produced 3 into cell 2 write 3 read 1 [-1, 2, 3, -1, -1]
Produced 4 into cell 3 write 4 read 1 [-1, 2, 3, 4, -1]
Consumed 2 from cell 1 write 4 read 2 [-1, -1, 3, 4, -1]
Produced 5 into cell 4 write 0 read 2 [-1, -1, 3, 4, 5]
Produced 6 into cell 0 write 1 read 2 [6, -1, 3, 4, 5]
Produced 7 into cell 1 write 2 read 2 [6, 7, 3, 4, 5]
BUFFER FULL
WAITING TO PRODUCE 8
Consumed 3 from cell 2 write 2 read 3 [6, 7, -1, 4, 5]
Produced 8 into cell 2 write 3 read 3 [6, 7, 8, 4, 5]
BUFFER FULL
Consumed 4 from cell 3 write 3 read 4 [6, 7, 8, -1, 5]
Produced 9 into cell 3 write 4 read 4 [6, 7, 8, 9, 5]
BUFFER FULL
WAITING TO PRODUCE 10
Consumed 5 from cell 4 write 4 read 0 [6, 7, 8, 9, -1]
Produced 10 into cell 4 write 0 read 0 [6, 7, 8, 9, 10]
BUFFER FULL
Producer finished producing values
Terminating Producer
Consumed 6 from cell 0 write 0 read 1 [-1, 7, 8, 9, 10]
Consumed 7 from cell 1 write 0 read 2 [-1, -1, 8, 9, 10]
Consumed 8 from cell 2 write 0 read 3 [-1, -1, -1, 9, 10]
Consumed 9 from cell 3 write 0 read 4 [-1, -1, -1, -1, 10]
Consumed 10 from cell 4 write 0 read 0 [-1, -1, -1, -1, -1]
BUFFER EMPTY
Consumer retrieved values totaling: 55
Terminating Consumer
3
4   import threading
5
6   class SynchronizedCells:
7
8      def __init__( self ):
9         """Set cells, flags, locations and condition variable"""
10
11        self.sharedCells = [ -1, -1, -1, -1, -1 ] # buffer
12        self.writeable = 1 # buffer may be changed
13        self.readable = 0 # buffer may not be read
14        self.writeLocation = 0 # current writing index
15        self.readLocation = 0 # current reading index
16
17        self.threadCondition = threading.Condition()
18
19     def setSharedNumber( self, newNumber ):
20        """Set next buffer index value--blocks until lock acquired"""
21
22        # block until lock released then acquire lock
23        self.threadCondition.acquire()
24
25        # while buffer is full, release lock and block
26        while not self.writeable:
27           print "WAITING TO PRODUCE", newNumber
28           self.threadCondition.wait()
29
30        # buffer is not full, lock has been re-acquired
31
32        # produce a number in shared cells, consumer may consume
33        self.sharedCells[ self.writeLocation ] = newNumber
34        self.readable = 1
35        print "Produced %2d into cell %d" % \
36           ( newNumber, self.writeLocation ),
37
38        # set writing index to next place in buffer
39        self.writeLocation = ( self.writeLocation + 1 ) % 5
40
41        print " write %d read %d " % \
42           ( self.writeLocation, self.readLocation ),
43        print self.sharedCells
44
45        # if producer has caught up to consumer, buffer is full
46        if self.writeLocation == self.readLocation:
47           self.writeable = 0
48           print "BUFFER FULL"
49
50        self.threadCondition.notify() # wake up a waiting thread
51        self.threadCondition.release() # allow lock to be acquired
52
53     def getSharedNumber( self ):
54        """Get next buffer index value--blocks until lock acquired"""
55
56        # block until lock released then acquire lock
Fig. 19.10 Synchronized circular buffer of integers (part 2 of 3).
57        self.threadCondition.acquire()
58
59        # while buffer is empty, release lock and block
60        while not self.readable:
61           print "WAITING TO CONSUME"
62           self.threadCondition.wait()
63
64        # buffer is not empty, lock has been re-acquired
65
66        # consume a number from shared cells, producer may produce
67        self.writeable = 1
68        tempNumber = self.sharedCells[ self.readLocation ]
69        self.sharedCells[ self.readLocation ] = -1
70
71        print "Consumed %2d from cell %d" % \
72           ( tempNumber, self.readLocation ),
73
74        # move to next produced number
75        self.readLocation = ( self.readLocation + 1 ) % 5
76
77        print " write %d read %d " % \
78           ( self.writeLocation, self.readLocation ),
79        print self.sharedCells
80
81        # if consumer has caught up to producer, buffer is empty
82        if self.readLocation == self.writeLocation:
83           self.readable = 0
84           print "BUFFER EMPTY"
85
86        self.threadCondition.notify() # wake up a waiting thread
87        self.threadCondition.release() # allow lock to be acquired
88
89        return tempNumber
in the buffer in which a value can be placed. Next, the method assigns to tempNumber
the value at location readLocation in the circular buffer. Line 69 sets the value at loca-
tion readLocation in the buffer to –1, indicating it is an empty spot. The value con-
sumed and the cell from which the value was read are printed. Then, the method updates
attribute readLocation for the next call to method getSharedNumber. The output
continues with the current writeLocation and readLocation values and the cur-
rent values in the circular buffer. If the readLocation is equal to the writeLoca-
tion, the circular buffer is currently empty, so readable is set to 0 and the string
"BUFFER EMPTY" is displayed. Next, line 86 invokes condition variable method notify
to place the next waiting thread into the ready state. Line 87 invokes condition variable
method release to release the condition variable’s underlying lock. Finally, line 89
returns the retrieved value to the calling thread.
We have modified the program of Fig. 19.9 to include the current writeLocation
and readLocation values. We also display the current contents of the buffer shared-
Cells. The elements of the sharedCells list were initialized to –1 for output purposes
so you can see each value inserted into the buffer. Notice that after the fifth value is placed
in the fifth element of the buffer, the sixth value is inserted at the beginning of the list—
thus providing the circular buffer effect.
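The wrap-around behavior can be seen in isolation in the following Python 3 sketch. It is a hypothetical variant of the chapter's class, not a copy of it: class CircularBuffer, its put and get methods and its count attribute are names invented for this illustration. It tracks the number of occupied cells instead of the writeable and readable flags, which is one common alternative for telling a full buffer from an empty one when the two indices are equal.

```python
import threading


class CircularBuffer:
    """Bounded buffer; hypothetical count-based variant of SynchronizedCells."""

    def __init__(self, size=5):
        self.cells = [-1] * size       # -1 marks an empty cell
        self.count = 0                 # number of occupied cells
        self.writeLocation = 0
        self.readLocation = 0
        self.condition = threading.Condition()

    def put(self, value):
        with self.condition:
            while self.count == len(self.cells):     # buffer full
                self.condition.wait()
            self.cells[self.writeLocation] = value
            # the modulus wraps the index back to 0 after the last cell
            self.writeLocation = (self.writeLocation + 1) % len(self.cells)
            self.count += 1
            self.condition.notify_all()

    def get(self):
        with self.condition:
            while self.count == 0:                   # buffer empty
                self.condition.wait()
            value = self.cells[self.readLocation]
            self.cells[self.readLocation] = -1       # mark cell empty
            self.readLocation = (self.readLocation + 1) % len(self.cells)
            self.count -= 1
            self.condition.notify_all()
            return value


buffer = CircularBuffer(5)
for number in range(1, 6):
    buffer.put(number)          # fills cells 0 through 4
first = buffer.get()            # frees cell 0
buffer.put(6)                   # the sixth value wraps around into cell 0
print(first, buffer.cells)
```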
19.9 Semaphores
A semaphore is a variable that controls access to a common resource or a critical section.
A semaphore maintains a counter that specifies the number of threads that can use the re-
source or enter the critical section. The counter is decremented each time a thread acquires
the semaphore. When the counter is zero, the semaphore blocks any other threads until the
semaphore has been released by another thread. Figure 19.11 uses a restaurant scenario to
demonstrate using semaphores to control access to a critical section.
21
22     def run( self ):
23        """Print message and release semaphore"""
24
25        # acquire the semaphore
26        self.threadSemaphore.acquire()
27
28        # remove a table from the list
29        table = SemaphoreThread.availableTables.pop()
30        print "%s entered; seated at table %s." % \
31           ( self.getName(), table ),
32        print SemaphoreThread.availableTables
33
34        time.sleep( self.sleepTime ) # enjoy a meal
35
36        # free a table
37        print " %s exiting; freeing table %s." % \
38           ( self.getName(), table ),
39        SemaphoreThread.availableTables.append( table )
40        print SemaphoreThread.availableTables
41
42        # release the semaphore after execution finishes
43        self.threadSemaphore.release()
44
45  threads = [] # list of threads
46
47  # semaphore allows five threads to enter critical section
48  threadSemaphore = threading.Semaphore(
49     len( SemaphoreThread.availableTables ) )
50
51  # create ten threads
52  for i in range( 1, 11 ):
53     threads.append( SemaphoreThread( "thread" + str( i ),
54        threadSemaphore ) )
55
56  # start each thread
57  for thread in threads:
58     thread.start()
Fig. 19.11 Using a semaphore to control access to a critical section (part 2 of 3).
Fig. 19.11 Using a semaphore to control access to a critical section (part 3 of 3).
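The counter behavior of a semaphore can be checked directly by recording how many threads are inside the critical section at once. The following is a minimal Python 3 sketch of the restaurant scenario under that assumption; function serveCustomers, its parameters and the stateLock helper are hypothetical names introduced here, and the with statement stands in for the explicit acquire and release calls.

```python
import threading
import time


def serveCustomers(customers=10, tables=5, mealTime=0.01):
    """Return the largest number of customers seated at the same time."""
    semaphore = threading.Semaphore(tables)   # counter starts at `tables`
    stateLock = threading.Lock()              # protects the two counters
    seated = 0
    maxSeated = 0

    def customer():
        nonlocal seated, maxSeated
        with semaphore:                       # acquire; released on exit
            with stateLock:
                seated += 1
                maxSeated = max(maxSeated, seated)
            time.sleep(mealTime)              # enjoy a meal
            with stateLock:
                seated -= 1

    threads = [threading.Thread(target=customer) for _ in range(customers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return maxSeated


print(serveCustomers() <= 5)    # never more customers seated than tables
```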
19.10 Events
Module threading defines class Event, which is useful for thread communication. An
Event object has an internal flag, which is either true or false. One or more threads may
call the Event object’s wait method to block until the event occurs. When the event oc-
curs, the blocked thread or threads are notified and resume execution. Figure 19.12 illus-
trates a situation where a traffic light turns green every 3 seconds.
50 greenLight.clear()
51 print "RED LIGHT!"
52
53 time.sleep( 3 )
54
55 # sets the Event object’s flag to true
56 print "GREEN LIGHT!"
57 greenLight.set()
RED LIGHT!
Vehicle4 arrived at Mon Aug 20 16:58:33 2001
Vehicle8 arrived at Mon Aug 20 16:58:33 2001
Vehicle9 arrived at Mon Aug 20 16:58:35 2001
Vehicle10 arrived at Mon Aug 20 16:58:35 2001
GREEN LIGHT!
Vehicle4 passes through intersection at Mon Aug 20 16:58:35 2001
Vehicle8 passes through intersection at Mon Aug 20 16:58:35 2001
Vehicle9 passes through intersection at Mon Aug 20 16:58:35 2001
Vehicle10 passes through intersection at Mon Aug 20 16:58:35 2001
RED LIGHT!
Vehicle2 arrived at Mon Aug 20 16:58:36 2001
Vehicle5 arrived at Mon Aug 20 16:58:37 2001
Vehicle7 arrived at Mon Aug 20 16:58:37 2001
GREEN LIGHT!
Vehicle2 passes through intersection at Mon Aug 20 16:58:38 2001
Vehicle5 passes through intersection at Mon Aug 20 16:58:38 2001
Vehicle7 passes through intersection at Mon Aug 20 16:58:38 2001
RED LIGHT!
Vehicle1 arrived at Mon Aug 20 16:58:39 2001
Vehicle6 arrived at Mon Aug 20 16:58:40 2001
Vehicle3 arrived at Mon Aug 20 16:58:41 2001
GREEN LIGHT!
Vehicle1 passes through intersection at Mon Aug 20 16:58:41 2001
Vehicle6 passes through intersection at Mon Aug 20 16:58:41 2001
Vehicle3 passes through intersection at Mon Aug 20 16:58:41 2001
Fig. 19.12 Traffic light example demonstrating an Event object (part 2 of 2).
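The essential Event calls (wait, set and clear) can be exercised without timers. The sketch below assumes Python 3 and is only an illustration of the traffic light idea, not the figure's program; function runIntersection and the arrived semaphore are hypothetical helpers used to make the outcome repeatable.

```python
import threading


def runIntersection(vehicleCount=3):
    """Hold vehicles at a red light, then release them all at once."""
    greenLight = threading.Event()        # internal flag starts out false
    arrived = threading.Semaphore(0)      # hypothetical helper: counts arrivals
    passed = []
    passedLock = threading.Lock()

    def vehicle(name):
        arrived.release()                 # announce arrival at the red light
        greenLight.wait()                 # block until the flag is true
        with passedLock:
            passed.append(name)

    threads = [threading.Thread(target=vehicle, args=("Vehicle%d" % i,))
               for i in range(1, vehicleCount + 1)]
    for thread in threads:
        thread.start()
    for _ in threads:
        arrived.acquire()                 # wait until every vehicle has arrived

    passedOnRed = len(passed)             # nobody passes while the flag is false
    greenLight.set()                      # flag becomes true: wake all waiters
    for thread in threads:
        thread.join()
    greenLight.clear()                    # back to red for later arrivals
    return passedOnRed, sorted(passed)


print(runIntersection())
```

Unlike a condition variable's notify, a single call to set releases every thread that is blocked in wait.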
setDaemon( 1 )
An argument of 0 means that the thread is not a daemon thread. A program can include a
mixture of daemon threads and non-daemon threads. When only daemon threads remain in
a program, the program exits. If a thread is to be a daemon, it must be set as such before its
start method is called; otherwise, setDaemon raises an AssertionError excep-
tion. Method isDaemon returns 1 if a thread is a daemon thread and returns 0 otherwise.
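For comparison, the following sketch shows the same rule in current Python 3 releases, which is an assumption that goes beyond the Python version this chapter targets. There the daemon status is usually set through the thread's daemon attribute, and changing it after start raises RuntimeError rather than the AssertionError described above. Function background and variable lateChange are hypothetical names.

```python
import threading
import time


def background():
    """Loop forever; as a daemon, this thread does not keep the program alive."""
    while True:
        time.sleep(0.1)


worker = threading.Thread(target=background)
worker.daemon = True            # must be set before start()
worker.start()

try:
    worker.daemon = False       # too late: the thread is already active
except RuntimeError as error:
    lateChange = str(error)
else:
    lateChange = None

print(worker.daemon, lateChange is not None)
```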
Objectives
• To understand the elements of Python networking
with URLs, sockets and datagrams.
• To implement Python networking applications using
sockets and datagrams.
• To understand how to implement Python clients and
servers that communicate with one another.
• To understand how to implement network-based
collaborative applications.
• To construct a multithreaded server.
If the presence of electricity can be made visible in any part
of a circuit, I see no reason why intelligence may not be
transmitted instantaneously by electricity.
Samuel F. B. Morse
Mr. Watson, come here, I want you.
Alexander Graham Bell
What networks of railroads, highways and canals were in
another age, the networks of telecommunications,
information and computerization … are today.
Bruno Kreisky, Austrian Chancellor
Science may never come up with a better office-
communication system than the coffee break.
Earl Wilson
It’s currently a problem of access to gigabits through
punybaud.
J. C. R. Licklider
Outline
20.1 Introduction
20.2 Accessing URLs over HTTP
20.3 Establishing a Simple Server (Using Stream Sockets)
20.4 Establishing a Simple Client (Using Stream Sockets)
20.5 Client/Server Interaction with Stream Socket Connections
20.6 Connectionless Client/Server Interaction with Datagrams
20.7 Client/Server Tic-Tac-Toe Using a Multithreaded Server
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
20.1 Introduction
In this chapter, our discussion focuses on several fundamental networking technologies that
can be used to build distributed applications. We revisit the client/server relationship be-
tween World Wide Web browsers and World Wide Web servers to demonstrate a script that
causes the Web browser to load a new Web page.
Because Python is such a high-level language, networking tasks that take a great deal
of code and effort in other languages can be accomplished easily and simply in Python. This
chapter highlights the most frequently used Python networking capabilities. We demon-
strate module urllib and its ability to obtain a document downloaded from the World
Wide Web.
We also introduce Python’s socket-based communications, which enable applications
to view networking as if it were file I/O—a program can receive from a socket or send to a
socket as simply as reading from a file or writing to a file. We show how to create and
manipulate sockets.
Python provides stream sockets and datagram sockets. With stream sockets a process
establishes a connection to another process. While the connection is in place, data flows
between the processes in continuous streams. Stream sockets are said to provide a connec-
tion-oriented service. The protocol used for transmission is the popular TCP (Transmission
Control Protocol).
With datagram sockets, individual packets of information are transmitted. The protocol
used, UDP (the User Datagram Protocol), is a connectionless service and, unlike TCP, does
not guarantee delivery: packets can be lost, can be duplicated and can arrive out of
sequence. With UDP, significant extra programming is therefore required on the user’s
part to deal with these problems (if the user chooses to do so). For this reason, stream
sockets and the TCP protocol will be the most desirable for the vast majority of Python
programmers.
Performance Tip 20.1
Connectionless services generally offer greater performance but less reliability than
connection-oriented services.
Once again, we will see that many of the networking details for the examples in this
chapter are handled by the Python modules we use.
29
30     def getPage( self, event ):
31        "Parse the URL, add addressing scheme and retrieve file"
32
33        # parse the URL
34        myURL = event.widget.get()
35        components = urlparse.urlparse( myURL )
36        self.contents.text_state = NORMAL
37
38        # if addressing scheme not specified, use http
39        if components[ 0 ] == "":
40           myURL = "http://" + myURL
41
42        # connect and retrieve the file
43        try:
44           tempFile = urllib.urlopen( myURL )
45           self.contents.settext( tempFile.read() ) # show results
46        except IOError:
47           self.contents.settext( "Error finding file" )
48
49        self.contents.text_state = DISABLED
50
51  def main():
52     WebBrowser().mainloop()
53
54  if __name__ == "__main__":
55     main()
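The scheme check in method getPage can be tried on its own, without the GUI or a network connection. The following is a small sketch that assumes Python 3, where the parsing function lives in module urllib.parse rather than in the separate urlparse module used by the listing. Function addScheme is a hypothetical helper introduced for this illustration.

```python
from urllib.parse import urlparse


def addScheme(url, default="http"):
    """Prefix url with a default addressing scheme when it has none."""
    components = urlparse(url)
    if components.scheme == "":      # same test as components[ 0 ] == ""
        return default + "://" + url
    return url


print(addScheme("www.deitel.com"))
print(addScheme("https://www.python.org"))
```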
creates a new socket using the specified address family and type. Argument family can be
either AF_INET or AF_UNIX. In this chapter, we use only AF_INET. The most common
values for argument type are SOCK_STREAM (for stream sockets) and SOCK_DGRAM (for
datagram sockets). Note that these constants are defined in module socket. For the pur-
poses of our discussion, we assume that we have created a stream socket. Section 20.6 dis-
cusses datagram sockets.
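As a brief illustration of the family and type arguments, the following Python 3 sketch creates one socket of each kind and records what was created. The variable names are arbitrary; only the socket function and the constants come from module socket.

```python
import socket

# stream (TCP) socket and datagram (UDP) socket in the AF_INET family
streamSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
datagramSocket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# remember the properties before closing the sockets
streamFamily = streamSocket.family
streamKind = streamSocket.type
datagramKind = datagramSocket.type

streamSocket.close()
datagramSocket.close()

print(streamKind == socket.SOCK_STREAM, datagramKind == socket.SOCK_DGRAM)
```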
Once a socket is created, it must be bound to an address (step 2). A call to a socket
instance’s bind method such as
socket.bind( address )
binds the socket to the specified address. For a socket created by specifying family
AF_INET, address must be a two-element tuple in the form (host, port), where host is a
string representing the hostname or IP address of the machine on which the server runs,
and port is a port number (i.e., an integer). The preceding statement reserves a port
where the server waits for connections from clients. Each client asks to connect to the
server on this port. Method bind raises the exception socket.error if the port is already
in use, the hostname is incorrect or the port is reserved.
Software Engineering Observation 20.1
Port numbers can be between 0 and 65535. Many operating systems reserve port numbers
below 1024 for system services (such as email and World Wide Web servers). Generally,
these ports should not be specified as connection ports in user programs. In fact, some
operating systems require special access privileges to use port numbers below 1024.
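The "port already in use" failure is easy to provoke on one machine. The sketch below assumes Python 3, where socket.error is an alias of OSError, and it binds to port 0 so that the operating system picks a free port instead of assuming that a fixed port such as 5000 is available. That use of port 0 and the variable names are choices made for this illustration only.

```python
import socket

first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
host, port = first.getsockname()      # the address that was actually bound
first.listen(1)

second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind((host, port))         # same address: already in use
    bindFailed = False
except OSError:                       # socket.error in the chapter's Python 2
    bindFailed = True
finally:
    second.close()
    first.close()

print(port, bindFailed)
```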
Before the bound socket can receive a connection, it must prepare to listen for
connection requests (step 3). This is done with a call to socket method listen of
the form
socket.listen( backlog )
where backlog specifies the maximum number of connection requests that can wait in a
queue for the server. This value should be at least 1. As connection requests are received,
they are queued. If the queue is full, client connections are refused.
The server socket then waits for a client to connect (step 4) with a call to socket
method accept
The socket waits indefinitely (or blocks) when it calls method accept. When a client
requests a connection, the method accepts the connection and returns to the server. Method
accept returns a two-element tuple of the form (connection, address). The first element
of the returned tuple (connection) is a new socket object that the server uses to commu-
nicate with the client. The second element (address) corresponds to the client’s Internet ad-
dress.
Step 5 is the processing phase in which the server and the client communicate. The
server sends information to the client by invoking socket method send and passing the
information in the form of a string. Method send returns the number of bytes sent. The
server receives information from the client with socket method recv. When calling
recv, the server must specify an integer that corresponds to the maximum amount of data
that can be received at once. Method recv returns a string representing the received data.
If the amount of data sent is greater than recv allows, the data is truncated and recv
returns the maximum amount of data allowed. The excess data is buffered on the receiving
end. On a subsequent call to recv, the excess data is removed from the buffer (along with
any additional data the client may have sent since the previous call to recv).
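The buffering of excess data can be observed with two connected sockets in one process. This sketch assumes Python 3, where send and recv work with bytes objects rather than the strings used in this chapter, and it uses socket.socketpair to obtain the connected pair without setting up a server. The variable names are arbitrary.

```python
import socket

sender, receiver = socket.socketpair()   # two connected stream sockets

sent = sender.send(b"0123456789")        # returns the number of bytes sent
firstPart = receiver.recv(4)             # at most 4 bytes are returned
secondPart = receiver.recv(1024)         # the excess was buffered, not lost

sender.close()
receiver.close()

print(sent, firstPart, secondPart)
```

Because a stream socket delivers a stream of bytes rather than discrete messages, a program that needs whole messages typically calls recv in a loop until it has everything it expects.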
Common Programming Error 20.2
A socket’s send method accepts only a string argument. Trying to pass a value with a
different type (e.g., an integer) results in an error.
In step 6, when the transmission is complete, the server closes the connection by
invoking the close method on the socket.
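The six server steps can be put together as follows. This is a minimal Python 3 sketch under several assumptions, not the chapter's server program: it handles a single connection, exchanges bytes instead of strings, and binds to port 0 on the loopback address so that any free port is used. Functions serveOnce and demo and the addressReady callback are hypothetical names added so that the sketch can be run in one process.

```python
import socket
import threading


def serveOnce(addressReady, address=("127.0.0.1", 0)):
    """Accept one client, greet it, receive one message and return it."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # step 1
    server.bind(address)                                         # step 2
    server.listen(1)                                             # step 3
    addressReady(server.getsockname())    # report the port the OS chose
    connection, clientAddress = server.accept()                  # step 4
    connection.sendall(b"SERVER>>> Connection successful")       # step 5
    message = connection.recv(1024)
    connection.close()                                           # step 6
    server.close()
    return message


def demo():
    """Hypothetical driver: run the server in a thread and act as its client."""
    bound = []
    ready = threading.Event()
    received = []

    def report(address):
        bound.append(address)
        ready.set()

    serverThread = threading.Thread(
        target=lambda: received.append(serveOnce(report)))
    serverThread.start()
    ready.wait()                          # wait until the server is listening

    client = socket.create_connection(bound[0])
    greeting = client.recv(1024)
    client.sendall(b"CLIENT>>> TERMINATE")
    client.close()
    serverThread.join()
    return greeting, received[0]
```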
Software Engineering Observation 20.2
With Python’s multithreading capabilities, we can easily create multithreaded servers that
can manage many simultaneous connections with many clients; this multithreaded-server
architecture is precisely what is used in popular UNIX, Windows NT and OS/2 network
servers.
Step 2 connects to the server using socket method connect. Method connect
takes as input the address of the socket to connect to. For AF_INET client sockets, the call
to connect has the form
where host is a string representing the server’s hostname or IP address, and port is the in-
teger port number that corresponds to the server process. If the connection attempt is suc-
cessful, the client can now communicate with the server over the socket. A connection
attempt that fails raises the socket.error exception.
Common Programming Error 20.3
A socket.error exception is raised when a server address indicated by a client cannot
be resolved or when an error occurs while attempting to connect to a server.
Step 3 is the processing phase in which the client and the server communicate via
methods send and recv. In step 4 when the transmission is complete, the client closes the
connection by invoking the close method on the socket.
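The four client steps can be sketched in the same way. The code below assumes Python 3 and bytes messages; function talkToServer is a hypothetical name, and startEchoServer is a hypothetical one-shot helper that exists only so the client has something to connect to on the local machine.

```python
import socket
import threading


def talkToServer(address, message):
    """Send one message to the server at address and return its reply."""
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # step 1
    try:
        client.connect(address)                                  # step 2
        client.sendall(message)                                  # step 3
        reply = client.recv(1024)
    finally:
        client.close()                                           # step 4
    return reply


def startEchoServer():
    """Hypothetical helper: one-shot echo server on a free local port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    address = listener.getsockname()

    def echoOnce():
        connection, _ = listener.accept()
        connection.sendall(connection.recv(1024))   # send back what arrived
        connection.close()
        listener.close()

    threading.Thread(target=echoOnce).start()
    return address


print(talkToServer(startEchoServer(), b"CLIENT>>> Hi to person at server"))
```

Placing close in a finally clause ensures that the socket is closed even if connect or recv raises an exception.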
Fig. 20.2 The server portion of a stream socket connection between a client and a
server (part 2 of 2).
Lines 13–40 set up the server to receive a connection and to process the connection
when it is received. Line 13 creates socket object mySocket to wait for connections.
Integer counter (line 10) keeps track of the total number of connections processed.
Line 16 binds mySocket to port 5000. Note that HOST is the string "127.0.0.1".
This causes the socket to use localhost, the hostname that corresponds to the machine
on which the program is running. [Note: We chose to demonstrate the client/server
relationship by connecting between programs executing on the same computer (localhost).
Normally, the client would run on another computer, and the server would bind to one of
its own addresses that the client can reach.] Lines 18–31 contain a while loop in which the
server receives and processes
each client connection. Line 22 listens for a connection from a client at port 5000. The
argument to listen is the number of connections that can wait in a queue to connect to
the server (1 in this example). If the queue is full when a client requests a connection, the
connection is refused.
Method listen sets up a listener to wait for a client connection. Once a connection
is received, socket method accept (line 25) creates a socket object that manages the
connection. Recall that accept returns a two-element tuple. The first element is a new
socket instance that we call connection. The second element is the Internet address
of the client computer that connected to this server (in the form (host, port) for AF_INET
sockets). Once a new socket for the current connection exists, line 26 prints a message
displaying the connection number and the client address.
Line 29 calls socket method send to send the string "SERVER>>> Connection
successful" to the client. Line 30 calls socket method recv to receive a string from
the client of maximum size 1024 bytes. The while loop in lines 32–40 loops until the
server receives the message "CLIENT>>> TERMINATE". Lines 34–35 check whether the
connection has been closed by the client. When a connection has been closed, recv returns
an empty string. If this is the case, the break statement exits the loop. Otherwise, line 37
prints the message received from the client.
Function raw_input (line 38) reads a string from the user. The server sends this
string to the client (line 39) and receives a message from the client (line 40). When the
transmission is complete, line 44 closes the socket. The server awaits the next connection
attempt from a client.
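The server steps described above can be sketched in a few lines. This is a minimal sketch, not the listing from Fig. 20.2. It assumes Python 3 syntax (bytes instead of strings, sendall instead of send), echoes each message back instead of reading a reply with raw_input, and uses the hypothetical helper names make_server_socket and serve_one_client.

```python
import socket

HOST = "127.0.0.1"   # localhost, as in the chapter's example


def make_server_socket(port=0):
    """Create a bound, listening stream socket (port 0 = any free port)."""
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_socket.bind((HOST, port))
    server_socket.listen(1)            # queue at most one pending client
    return server_socket


def serve_one_client(server_socket):
    """Accept one connection, greet the client and echo until it leaves."""
    connection, address = server_socket.accept()   # (new socket, (host, port))
    with connection:                               # closes the socket on exit
        connection.sendall(b"SERVER>>> Connection successful")
        while True:
            data = connection.recv(1024)           # at most 1024 bytes
            if not data or data == b"CLIENT>>> TERMINATE":
                break                              # empty bytes: peer closed
            connection.sendall(b"SERVER>>> " + data)
    return address
```

Calling serve_one_client repeatedly on the same listening socket gives the receive, process, close and wait cycle that the text describes.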
In our example, the server receives a connection, processes the connection, closes the
connection and waits for the next connection. A more likely scenario would be a server that
receives a connection, sets up that connection to be processed as a separate thread of exe-
cution and then waits for new connections. The separate threads that process existing con-
nections can continue to execute while the server concentrates on new connection requests.
We leave it as an exercise to implement this multithreaded approach to the server applica-
tion.
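One possible shape for the multithreaded approach is sketched below. It is only an illustration under stated assumptions: Python 3 syntax, an echo protocol in place of the chapter's interactive chat, and the hypothetical function names handle_client and serve_forever.

```python
import socket
import threading


def handle_client(connection, address):
    """Run in its own thread: echo data until the client disconnects."""
    with connection:
        while True:
            data = connection.recv(1024)
            if not data:                # empty bytes: client closed the connection
                break
            connection.sendall(data)


def serve_forever(server_socket):
    """Accept connections and hand each one to a new thread."""
    while True:
        try:
            connection, address = server_socket.accept()
        except OSError:                 # accept failed, e.g. socket was closed
            break
        worker = threading.Thread(
            target=handle_client, args=(connection, address), daemon=True)
        worker.start()                  # main loop returns to accept() at once
```

Because each connection runs in its own thread, a slow client no longer delays the acceptance of new connections.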
The client is displayed in Fig. 20.3. Sample output from a client/server connection fol-
lows the code.
Attempting connection
Connected to Server
SERVER>>> Connection successful
CLIENT>>> Hi to person at server
Attempting connection
Connected to Server
SERVER>>> Connection successful
CLIENT>>> Hi to person at server
SERVER>>> Hi back to you--client!
CLIENT>>> TERMINATE
Fig. 20.3 Demonstrating the client portion of a stream socket connection between a
client and a server (part 2 of 2).
Lines 12–29 perform the work necessary to connect to the server, to receive data from
the server and to send data to the server. Line 12 creates a socket object—mySocket—
to establish a connection. Line 15 attempts to connect to the server by calling socket
method connect with one argument, a two-element tuple. Variable PORT is the same as
in Fig. 20.2 (5000). This ensures that the client socket attempts to connect to the server
on the port to which the server is bound.
If the connection is successful, line 16 prints a message to the screen. The socket
method recv (line 19) receives a message from the server (i.e., "SERVER>>> Connec-
tion successful"). The while loop (lines 21–29) executes until the client receives
the message "SERVER>>> TERMINATE". As in the server program, line 23 checks each
received message to see if the server has closed the connection. If so, the break statement
exits the while loop (line 24).
Each iteration of the loop prints the message received from the server and calls func-
tion raw_input to read a string from the user. Line 28 sends this string to the server by
invoking socket method send. The client then receives the next message from the server
(line 29). When the transmission is complete, line 33 closes the socket instance
mySocket.
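The client steps (connect, receive the greeting, then alternate send and recv, and finally close) can be sketched as follows. This is not the listing from Fig. 20.3. It assumes Python 3 syntax, takes its messages from a list instead of raw_input, and the function name talk_to_server is hypothetical.

```python
import socket


def talk_to_server(host, port, messages):
    """Connect, read the greeting, then send each message and collect replies."""
    replies = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as my_socket:
        my_socket.connect((host, port))        # one argument: a (host, port) tuple
        replies.append(my_socket.recv(1024))   # the server's greeting
        for message in messages:
            my_socket.sendall(message)
            reply = my_socket.recv(1024)
            if not reply:                      # empty bytes: server closed
                break
            replies.append(reply)
    # leaving the "with" block has closed the socket
    return replies
```

The function needs a server that is already listening on the given host and port, such as the one described for Fig. 20.2.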
Packet received:
From host: 127.0.0.1
Host port: 1645
Length: 20
Containing:
first message packet
Fig. 20.4 The server side of connectionless client/server computing with datagrams
(part 2 of 2).
The server (Fig. 20.4) defines one socket instance that sends and receives datagram
(SOCK_DGRAM) packets. Note that the specified socket type is SOCK_DGRAM. This
ensures that mySocket will be a datagram socket. Line 14 binds the socket to a port
(5000) where packets can be received from clients. Clients sending packets to this server
specify port 5000 in the packets they send.
The while loop in lines 16–31 receives packets from the client. First, line 19 waits
for a packet to arrive. The recvfrom method blocks until a packet arrives. Once a packet
arrives, recvfrom returns a string representing the data received and the address of the
socket sending the data. The server then prints a message to the screen that contains the
address of the client and the data sent.
Line 30 calls socket method sendto to echo the data back to the client. The
method’s first argument specifies the data to be sent. The second argument is a tuple that
specifies the client computer’s Internet address to which the packet will be sent and the port
where the client is waiting to receive packets.
The client (Fig. 20.5) works similarly to the server, except that the client sends packets
only when it is told to do so by the user typing a message and pressing the Enter key. The
while loop in lines 13–29 sends packets to the server using sendto (line 18) and waits
for packets using recvfrom at line 22, which blocks until a packet arrives.
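The exchange of datagrams with sendto and recvfrom can be sketched in a single script. This is a minimal sketch, not the code of Fig. 20.4 or Fig. 20.5. It assumes Python 3 syntax, keeps both sockets in one process for brevity, and lets the operating system pick the port instead of using 5000.

```python
import socket

# Server side: a datagram socket bound to a known address.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))          # port 0 asks the OS for a free port
server.settimeout(5)                   # avoid blocking forever in this demo
server_address = server.getsockname()

# Client side: no connect() call is needed before sending.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(5)
client.sendto(b"first message packet", server_address)

# recvfrom blocks until a packet arrives and returns (data, sender address).
packet, sender = server.recvfrom(1024)
server.sendto(packet, sender)          # echo the data back to the sender

echoed, source = client.recvfrom(1024)
print(len(echoed), echoed.decode())

client.close()
server.close()
```

Each sendto call names its destination explicitly, which is what distinguishes this connectionless exchange from the stream sockets shown earlier.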
Packet received:
From host: 127.0.0.1
Host port: 5000
Length: 20
Containing:
first message packet
Packet>>>
37
38         # wait for another player to arrive
39         if self.mark == "X":
40            self.connection.send( "Waiting for another player..." )
41            self.server.gameBeginEvent.wait()
42            self.connection.send(
43               "Other player connected. Your move." )
44         else:
45            self.server.gameBeginEvent.wait()   # wait for server
46            self.connection.send( "Waiting for first move..." )
47
48         # play game
49         while not self.server.gameOver():
50            location = self.connection.recv( 2 )
51
52            if not location:
53               break
54
55            if self.server.validMove( int( location ), self.number ):
56               self.server.display( "loc: " + location )
57               self.connection.send( "Valid move." )
58            else:
59               self.connection.send( "Invalid move, try again." )
60
61         self.connection.close()
62         self.server.display( "Game over." )
63         self.server.display( "Connection closed." )
64
65   class TicTacToeServer:
66      "Server that maintains a game of Tic-Tac-Toe for two clients"
67
68      def __init__( self ):
69         "Initialize variables and setup server"
70
71         HOST = ""
72         PORT = 5000
73
74         self.board = []
75         self.currentPlayer = 0
76         self.turnCondition = threading.Condition()
77         self.gameBeginEvent = threading.Event()
78
79         for i in range( 9 ):
80            self.board.append( None )
81
82         # setup server socket
83         self.server = socket.socket( socket.AF_INET,
84            socket.SOCK_STREAM )
85         self.server.bind( ( HOST, PORT ) )
86         self.display( "Server awaiting connections..." )
87
88      def execute( self ):
89         "Play the game--create and start both Player threads"
90
Fig. 20.6 Server side of client/server Tic-Tac-Toe program (part 2 of 4).
91         self.players = []
92
93         for i in range( 2 ):
94            self.server.listen( 1 )
95            connection, address = self.server.accept()
96            self.players.append( Player( connection, self, i ) )
97            self.players[ -1 ].start()
98
99         # players are suspended until player O connects
100        # resume players now
101        self.gameBeginEvent.set()
102
103     def display( self, message ):
104        "Display a message on the server"
105
106        print message
107
108     def validMove( self, location, player ):
109        "Determine if a move is valid--if so, make move"
110
111        # only one move can be made at a time
112        self.turnCondition.acquire()
113
114        while player != self.currentPlayer:
115           self.turnCondition.wait()
116
117        if not self.isOccupied( location ):
118
119           if self.currentPlayer == 0:
120              self.board[ location ] = "X"
121           else:
122              self.board[ location ] = "O"
123
124           self.currentPlayer = ( self.currentPlayer + 1 ) % 2
125           self.players[ self.currentPlayer ].otherPlayerMoved(
126              location )
127           self.turnCondition.notify()
128           self.turnCondition.release()
129           return 1
130        else:
131           self.turnCondition.notify()
132           self.turnCondition.release()
133           return 0
134
135     def isOccupied( self, location ):
136        "Determine if a space is occupied"
137
138        return self.board[ location ]   # an empty space is None
139
140     def gameOver( self ):
141        "Determine if the game is over"
142
143        # place code here testing for a game winner
144        # left as an exercise for the reader
Fig. 20.6 Server side of client/server Tic-Tac-Toe program (part 3 of 4).
145        return 0
146
147  def main():
148     TicTacToeServer().execute()
149
150  if __name__ == "__main__":
151     main()
We begin with a discussion of the server side of the Tic-Tac-Toe game (Fig. 20.6).
Line 148 instantiates a TicTacToeServer object and invokes its execute method.
The TicTacToeServer constructor (lines 68–86) creates data member current-
Player and condition variable turnCondition. The server uses these members to
restrict access to method validMove—ensuring that only the current player can make a
move. Line 77 creates gameBeginEvent—a threading.Event object used to syn-
chronize the start of the game. Lines 79–80 then initialize the Tic-Tac-Toe board—a list of
length 9. Note that each location of the board is initialized to None, indicating that the
space is not yet occupied by either player. Locations are maintained as numbers from 0 to
8 (0 through 2 for the first row, 3 through 5 for the second row and 6 through 8 for the third
row). Lines 83–86 prepare the socket on which the server listens for player connections
and then display a message that the server is now ready.
Method execute (lines 88–101) loops twice, waiting each time for a connection
from a client. When the server receives a connection, the server creates a new Player
instance (lines 8–63) to manage the connection as a separate thread. The Player con-
structor (lines 11–23) takes as arguments the socket instance representing the connection
to the client, the TicTacToeServer instance and a number indicating what player it is—
X or O. Line 14 initializes the thread.
After the server creates each Player (line 96), the server invokes that instance’s
start method (line 97). The Player’s run method (lines 31–63) controls the informa-
tion that is sent to and received from the client. First, the method passes to the client the
character that the client places on the board when a move is made, then the method tells the
client that a connection has been made (lines 35–36). Lines 39–43 then cause player X to
block until the game can begin (i.e., player O has joined). Lines 44–46 similarly cause
player O to block until the server begins the game. When both players have joined the
game, the server starts the game by calling Event method set (line 101).
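The role of the Event object can be illustrated in isolation. The following sketch assumes Python 3 syntax and replaces the socket traffic with a shared list named log, which is hypothetical and exists only to record the order of events.

```python
import threading

game_begin = threading.Event()
log = []
log_lock = threading.Lock()


def player(mark):
    """Each player thread blocks here until the server starts the game."""
    game_begin.wait()                  # returns once set() has been called
    with log_lock:
        log.append(mark + " released")


threads = [threading.Thread(target=player, args=(mark,)) for mark in "XO"]
for thread in threads:
    thread.start()

with log_lock:
    log.append("both players connected")
game_begin.set()                       # wake every thread waiting on the event

for thread in threads:
    thread.join()
```

Neither player thread can record its entry before set is called, so the first entry in log is always the one written by the main thread.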
At this point, each Player’s run method executes its while loop (lines 49–59).
Each iteration of this while loop receives a string representing the location where the
client wants to place a mark and invokes TicTacToeServer method validMove to
check the move. Lines 57 and 59 send a message to the client indicating whether or not the
move was valid. The game continues until TicTacToeServer method gameOver
(lines 140–145) indicates that the game is over. Lines 61–63 then close the connection to
the client and display a message on the server.
Method validMove (lines 108–133 in class TicTacToeServer) uses condition
variable methods acquire and release to allow only one move to be attempted at a
time. This prevents both players from modifying the state information of the game simul-
taneously. If the Player attempting to validate a move is not the current player (i.e., the
one allowed to make a move), the Player is placed in a wait state until it is that player’s
turn to move. If the position for the move being validated is already occupied on the board,
the method returns 0. Otherwise, the server places a mark for the player in its local repre-
sentation of the board, updates variable currentPlayer, calls Player method oth-
erPlayerMoved (lines 25–29) so the client can be notified, invokes the notify
method so the waiting Player (if there is one) can validate a move and returns 1 to indi-
cate that the move is valid (lines 124–129).
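The turn-taking logic can be sketched without any networking. This is an illustration rather than the validMove method of Fig. 20.6: it assumes Python 3 syntax, uses a with statement in place of explicit acquire and release calls, omits the call to otherPlayerMoved, and the names TurnBoard and valid_move are hypothetical.

```python
import threading


class TurnBoard:
    """Nine-square board on which two players (0 and 1) must alternate."""

    def __init__(self):
        self.board = [None] * 9
        self.current_player = 0
        self.turn_condition = threading.Condition()

    def valid_move(self, location, player):
        """Wait for this player's turn; place a mark if the square is free."""
        with self.turn_condition:              # acquire, and release on exit
            while player != self.current_player:
                self.turn_condition.wait()     # releases the lock while waiting
            if self.board[location] is not None:
                return False                   # occupied: same player tries again
            self.board[location] = "XO"[player]
            self.current_player = (self.current_player + 1) % 2
            self.turn_condition.notify()       # wake the other player, if waiting
            return True
```

A thread that calls valid_move out of turn simply blocks inside wait until the other player has completed a valid move.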
When a TicTacToeClient (Fig. 20.7) begins execution, it creates a Pmw
ScrolledText that displays messages from the server and creates a representation of the
board using nine Tkinter Buttons. Class TicTacToeClient inherits from class
threading.Thread so that a separate thread can be used to continually read messages
that are sent from the server to the client. The script’s run method (lines 54–82) opens a
connection to the server. After the client establishes a connection to the server, the method
reads the mark character (X or O) from the server (line 65), initializes attribute myTurn to
0 (line 68) and loops continually to read messages from the server (lines 71–77). The mes-
sages are passed to the script’s processMessage method for processing. When the
game is over (i.e., the server closes the connection), lines 79–82 close the connection and
display a message to the user.
18         Frame.__init__( self )
19         Pmw.initialise()
20         self.pack( expand = YES, fill = BOTH )
21         self.master.title( "Tic-Tac-Toe Client" )
22         self.master.geometry( "250x325" )
23
24         self.id = Label( self, anchor = W )
25         self.id.grid( columnspan = 3, sticky = W+E+N+S )
26
27         self.board = []
28
29         # create and add all buttons to the board
30         for i in range( 9 ):
31            newButton = Button( self, font = "Courier 20 bold",
32               height = 1, width = 1, relief = GROOVE,
33               name = str( i ) )
34            newButton.bind( "<Button-1>", self.sendClickedSquare )
35            self.board.append( newButton )
36
37         current = 0
38
39         # display all buttons in 3x3 grid
40         for i in range( 1, 4 ):
41
42            for j in range( 3 ):
43               self.board[ current ].grid( row = i, column = j,
44                  sticky = W+E+N+S )
45               current += 1
46
47         # area for server messages
48         self.display = Pmw.ScrolledText( self, text_height = 10,
49            text_width = 35, vscrollmode = "static" )
50         self.display.grid( row = 4, columnspan = 3 )
51
52         self.start()   # run thread
53
54      def run( self ):
55         "Control thread that allows continuous update of the display"
56
57         # setup connection to server
58         HOST = "127.0.0.1"
59         PORT = 5000
60         self.connection = socket.socket( socket.AF_INET,
61            socket.SOCK_STREAM )
62         self.connection.connect( ( HOST, PORT ) )
63
64         # first get player's mark ( X or O )
65         self.myMark = self.connection.recv( 2 )
66         self.id.config( text = 'You are player "%s"' % self.myMark )
67
68         self.myTurn = 0
69
70         # receive messages sent to client
71         while 1:
Fig. 20.7 Client side of a client/server Tic-Tac-Toe program (part 2 of 5).
72            message = self.connection.recv( 34 )
73
74            if not message:
75               break
76
77            self.processMessage( message )
78
79         self.connection.close()
80         self.display.insert( END, "Game over.\n" )
81         self.display.insert( END, "Connection closed.\n" )
82         self.display.yview( END )
83
84      def processMessage( self, message ):
85         "Interpret server message and perform necessary actions"
86
87         if message == "Valid move.":
88            self.display.insert( END, "Valid move, please wait.\n" )
89            self.display.yview( END )
90            self.board[ self.currentSquare ].config(
91               text = self.myMark, bg = "white" )
92         elif message == "Invalid move, try again.":
93            self.display.insert( END, message + "\n" )
94            self.display.yview( END )
95            self.myTurn = 1
96         elif message == "Opponent moved.":
97            location = int( self.connection.recv( 2 ) )
98
99            if self.myMark == "X":
100              self.board[ location ].config( text = "O",
101                 bg = "gray" )
102           else:
103              self.board[ location ].config( text = "X",
104                 bg = "gray" )
105
106           self.display.insert( END, message + " Your turn.\n" )
107           self.display.yview( END )
108           self.myTurn = 1
109        elif message == "Other player connected. Your move.":
110           self.display.insert( END, message + "\n" )
111           self.display.yview( END )
112           self.myTurn = 1
113        else:
114           self.display.insert( END, message + "\n" )
115           self.display.yview( END )
116
117     def sendClickedSquare( self, event ):
118        "Send attempted move to server"
119
120        if self.myTurn:
121           name = event.widget.winfo_name()
122           self.currentSquare = int( name )
123           self.connection.send( name )
124           self.myTurn = 0
125
Fig. 20.7 Client side of a client/server Tic-Tac-Toe program (part 3 of 5).
initially connects (lines 42–43). If the client receives any other message, the client simply
displays the message.
When the player clicks a space on the board (a Tkinter Button), method send-
ClickedSquare is invoked. Method sendClickedSquare (lines 117–124) first
tests whether it is the player’s turn. If so, line 121 obtains the name of the button pressed
by invoking Widget method winfo_name and stores the value in variable name. Lines
122–124 then update attribute currentSquare, send the move to the server and set
attribute myTurn to 0, so that the player cannot make another move until it has received
feedback from the server.
pythonhtp1_21.fm Page 777 Wednesday, August 29, 2001 4:16 PM
21
Security
Objectives
• To understand the basic concepts of security.
• To understand public-key/private-key cryptography.
• To learn about popular security protocols, such as
SSL.
• To understand digital signatures, digital certificates,
certificate authorities and public-key infrastructure.
• To understand Python programming security issues.
• To learn to write restricted Python code.
• To become aware of various threats to secure systems.
Three may keep a secret, if two of them are dead.
Benjamin Franklin
Attack—Repeat—Attack.
William Frederick Halsey, Jr.
Private information is practically the source of every large
modern fortune.
Oscar Wilde
There must be security for all—or not one is safe.
The Day the Earth Stood Still, screenplay by Edmund H.
North
No government can be long secure without formidable
opposition.
Benjamin Disraeli
Outline
21.1 Introduction
21.2 Ancient Ciphers to Modern Cryptosystems
21.3 Secret-key Cryptography
21.4 Public-key Cryptography
21.5 Cryptanalysis
21.6 Key Agreement Protocols
21.7 Key Management
21.8 Digital Signatures
21.9 Public-key Infrastructure, Certificates and Certificate Authorities
21.9.1 Smart Cards
21.10 Security Protocols
21.10.1 Secure Sockets Layer (SSL)
21.10.2 IPSec and Virtual Private Networks (VPN)
21.11 Authentication
21.11.1 Kerberos
21.11.2 Biometrics
21.11.3 Single Sign-On
21.11.4 Microsoft® Passport
21.12 Security Attacks
21.12.1 Denial-of-Service (DoS) Attacks
21.12.2 Viruses and Worms
21.12.3 Software Exploitation, Web Defacing and Cybercrime
21.13 Running Restricted Python Code
21.13.1 Module rexec
21.13.2 Module Bastion
21.13.3 Web browser example
21.14 Network Security
21.14.1 Firewalls
21.14.2 Intrusion Detection Systems
21.15 Steganography
21.16 Internet and World Wide Web Resources
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises •
Works Cited • Recommended Reading
21.1 Introduction
The explosion of e-business is forcing companies and consumers to focus on Internet and
network security. Consumers are buying products, trading stocks and banking online. They
are submitting their credit-card numbers, social-security numbers and other confidential in-
formation to vendors through Web sites. Businesses are sending confidential information
to clients and vendors using the Internet. At the same time, an increasing number of security
attacks are taking place on e-businesses, and companies and customers are vulnerable to
these attacks. Data theft and hacker attacks can corrupt files and even shut down businesses.
Preventing or protecting against such attacks is crucial to the success of e-business. In this
chapter, we explore Internet security, including securing electronic transactions and net-
works. We discuss how a Python programmer can secure programming code. We also ex-
amine the fundamentals of secure business and how to secure e-commerce transactions
using current technologies.
e-Fact 21.1
According to a study by International Data Corporation (IDC), organizations spent $6.2 bil-
lion on security consulting in 1999, and IDC expects the market to reach $14.8 billion by
2003.1
Modern computer security addresses the problems and concerns of protecting elec-
tronic communications and maintaining network security. There are four fundamental
requirements for a successful, secure transaction: privacy, integrity, authentication and
non-repudiation. The privacy issue is: How do you ensure that the information you transmit
over the Internet has not been captured or passed on to a third party without your knowl-
edge? The integrity issue is: How do you ensure that the information you send or receive
has not been compromised or altered? The authentication issue is: How do the sender and
receiver of a message prove their identities to each other? The non-repudiation issue is: How
do you legally prove that a message was sent or received?
In addition to these requirements, network security addresses the issue of availability:
How do we ensure that the network and the computer systems to which it connects will stay
in continuous operation?
Python applications potentially can access files on the local computer on which the
code is run. This chapter explains how a programmer can write Python code that runs
securely in a restricted environment.
e-Fact 21.2
Forrester Research predicts that organizations will spend 55% more on
security in 2002 than they spent in 2000.2
We encourage you to visit the Web resources provided in Section 21.16 to learn more
about the latest developments in e-business security. These resources include many infor-
mative and entertaining demos.
er encrypts a message using the secret key, then sends the encrypted message to the intend-
ed recipient. A fundamental problem with secret-key cryptography is that before two
people can communicate securely, they must find a secure way to exchange the secret key.
One approach is to have the key delivered by a courier, such as a mail service or FedEx.
While this approach may be feasible when two individuals communicate, it is not efficient
for securing communication in a large network, nor can it be considered completely secure.
The privacy and the integrity of the message would be compromised if the key is intercept-
ed as it is passed between the sender and the receiver over unsecure channels. Also, since
both parties in the transaction use the same key to encrypt and decrypt a message, one can-
not authenticate which party created the message. Finally, to keep communications private
with each receiver, a sender needs a different secret key for each receiver. As a result, or-
ganizations would have huge numbers of secret keys to maintain.
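A short calculation shows how quickly the number of keys grows. The sketch assumes that every pair of parties shares exactly one secret key, and the function name pairwise_secret_keys is hypothetical.

```python
def pairwise_secret_keys(parties):
    """Number of distinct secret keys if every pair of parties shares one."""
    return parties * (parties - 1) // 2


for parties in (2, 10, 1000):
    print(parties, pairwise_secret_keys(parties))
```

Under this assumption, two parties need one key, ten parties need 45 keys and one thousand parties need 499,500 keys.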
[Figure: the plaintext "Buy 100 shares of company X" is encrypted into ciphertext and sent over a communications medium (such as the Internet); the receiver recovers the plaintext using the same symmetric secret key.]
customer have the session key for the transaction, they can communicate with each other,
encrypting their messages using the shared session key.
[Figure: the session key (a symmetric secret key) is encrypted once with the sender's KDC key and once with the receiver's KDC key.]
Using a key distribution center reduces the number of courier deliveries (again, by
means such as mail or FedEx) of secret keys to each user in the network. In addition, users
can have a new secret key for each communication with other users in the network, which
greatly increases the overall security of the network. However, if the security of the key dis-
tribution center is compromised, then the security of the entire network is compromised.
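The way a key distribution center hands the same session key to two parties can be sketched as follows. This is a toy illustration only: XOR stands in for a real symmetric cipher and is not secure when a key is reused, and the names xor_bytes, kdc_keys and issue_session_key are hypothetical.

```python
import secrets


def xor_bytes(data, key):
    """Toy cipher: XOR equal-length byte strings (the same call decrypts)."""
    return bytes(d ^ k for d, k in zip(data, key))


# Long-term secret keys that each user already shares with the KDC.
kdc_keys = {"merchant": secrets.token_bytes(16),
            "customer": secrets.token_bytes(16)}


def issue_session_key(sender, receiver):
    """KDC side: create one session key and encrypt a copy for each party."""
    session_key = secrets.token_bytes(16)
    return (xor_bytes(session_key, kdc_keys[sender]),
            xor_bytes(session_key, kdc_keys[receiver]))


for_merchant, for_customer = issue_session_key("merchant", "customer")

# Each party decrypts its copy with its own KDC key and obtains the same key.
merchant_session = xor_bytes(for_merchant, kdc_keys["merchant"])
customer_session = xor_bytes(for_customer, kdc_keys["customer"])
```

The sketch also shows why the center is a single point of failure: anyone who obtains kdc_keys can recover every session key that the center issues.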
One of the most commonly used symmetric encryption algorithms is the Data Encryp-
tion Standard (DES). Horst Feistel of IBM created the Lucifer algorithm, which was chosen
as the DES by the United States government and the National Security Agency (NSA) in
the 1970s.4 DES has a key length of 56 bits and encrypts data in 64-bit blocks. This type of
encryption is known as a block cipher. A block cipher is an encryption method that creates
groups of bits from an original message, then applies an encryption algorithm to the block
as a whole, rather than as individual bits. This method reduces the amount of computer pro-
cessing power and time required, while maintaining a fair level of security. For many years,
DES was the encryption standard set by the U.S. government and the American National
Standards Institute (ANSI). However, due to advances in technology and computing speed,
DES is no longer considered secure. In the late 1990s, specialized DES cracker machines
were built that recovered DES keys after just several hours.5 As a result, the old standard
of symmetric encryption has been replaced by Triple DES, or 3DES, a variant of DES that
is essentially three DES systems in a row, each with its own secret key. Though 3DES is
more secure, the three passes through the DES algorithm result in slower performance. The
United States government recently selected a new, more secure standard for symmetric
encryption to replace DES. The new standard is called the Advanced Encryption Standard
(AES). The National Institute of Standards and Technology (NIST), which sets the crypto-
graphic standards for the U.S. government, is evaluating Rijndael as the encryption method
for AES. Rijndael is a block cipher developed by Dr. Joan Daemen and Dr. Vincent Rijmen
of Belgium. Rijndael can be used with key sizes and block sizes of 128, 192 or 256 bits.
Rijndael was chosen over four other finalists as the AES candidate because of its high secu-
rity, performance, efficiency, flexibility and low memory requirement for computing sys-
tems.6 For more information about AES, visit csrc.nist.gov/encryption/aes.
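The block sizes and key lengths quoted above can be made concrete with a little arithmetic. The sketch below does not perform any encryption. It only splits a message into 64-bit blocks and compares key-space sizes, and its zero-byte padding is a simplifying assumption rather than the padding scheme of any particular standard.

```python
BLOCK_BYTES = 8                     # DES works on 64-bit (8-byte) blocks


def split_into_blocks(message, block_bytes=BLOCK_BYTES):
    """Pad with zero bytes to a whole number of blocks, then split."""
    remainder = len(message) % block_bytes
    if remainder:
        message += b"\x00" * (block_bytes - remainder)
    return [message[i:i + block_bytes]
            for i in range(0, len(message), block_bytes)]


blocks = split_into_blocks(b"Buy 100 shares of company X")

des_keys = 2 ** 56                  # number of possible 56-bit DES keys
aes_keys = 2 ** 128                 # keys at the smallest Rijndael key size
```

A block cipher would then be applied to each element of blocks in turn. The ratio of aes_keys to des_keys is 2 to the power 72, which suggests why an exhaustive key search that is feasible against DES is not feasible against a 128-bit key.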
[Figure: the plaintext "Buy 100 shares of company X" is encrypted into ciphertext and sent over a communications medium (such as the Internet); the receiver recovers the plaintext using the receiver's private key.]
Either the public key or the private key can be used to encrypt or decrypt a message.
For example, if a customer uses a merchant’s public key to encrypt a message, only the
merchant can decrypt the message, using the merchant’s private key. Thus, the merchant’s
identity can be authenticated, since only the merchant knows the private key. However, the
merchant has no way of validating the customer’s identity, since the encryption key the cus-
tomer used is publicly available.
If the decryption key is the sender’s public key and the encryption key is the sender’s
private key, the sender of the message can be authenticated. For example, suppose a cus-
tomer sends a merchant a message encrypted using the customer’s private key. The mer-
chant decrypts the message using the customer’s public key. Since the customer encrypted
the message using his or her private key, the merchant can be confident of the customer’s
identity. This process authenticates the sender, but does not ensure confidentiality, as
anyone could decrypt the message with the sender’s public key. This system works as long
as the merchant can be sure that the public key with which the merchant decrypted the mes-
sage belongs to the customer, and not a third party posing as the customer.
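Both uses of a key pair can be tried with small numbers. The sketch below uses textbook RSA arithmetic, which is an assumption made for illustration because this passage does not name an algorithm. The primes are far too small for real use, and the built-in pow with a negative exponent requires Python 3.8 or later.

```python
# Toy key pair built from two small primes (far too small for real use).
p, q = 61, 53
n = p * q                            # modulus used with both exponents
phi = (p - 1) * (q - 1)
public_exponent = 17
private_exponent = pow(public_exponent, -1, phi)   # modular inverse

message = 65                         # a message encoded as a number below n

# Confidentiality: encrypt with the public key, decrypt with the private key.
ciphertext = pow(message, public_exponent, n)
recovered = pow(ciphertext, private_exponent, n)

# Authentication: encrypt with the private key, decrypt with the public key.
signed = pow(message, private_exponent, n)
checked = pow(signed, public_exponent, n)
```

In the first case only the holder of private_exponent can compute recovered. In the second case anyone can compute checked, but only the holder of private_exponent could have produced signed.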
These two methods of public-key encryption can actually be used together to authen-
ticate both participants in a communication (Fig. 21.4). Suppose a merchant wants to send
a message securely to a customer so that only the customer can read it, and suppose also
that the merchant wants to provide proof to the customer that the merchant (not an unknown
third party) actually sent the message. First, the merchant encrypts the message using the
customer's public key. This step guarantees that only the customer can read the message.
Then the merchant encrypts the result using the merchant’s private key, which proves the
identity of the merchant. The customer decrypts the message in reverse order. First, the cus-
tomer uses the merchant’s public key. Since only the merchant could have encrypted the
message with the inversely related private key, this step authenticates the merchant. Then
the customer uses the customer’s private key to decrypt the next level of encryption. This
step ensures that the content of the message was kept private in the transmission, since only
the customer has the key to decrypt the message. Although this system provides extremely
secure transactions, the setup cost and time required prevent widespread use.
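The two-layer scheme just described can be sketched with textbook RSA on deliberately tiny numbers. The primes, the exponents, the helper names make_key_pair and apply_key, and the integer standing in for the message are all illustrative assumptions; real systems use far larger keys and padding.

```python
# Toy illustration only: tiny textbook-RSA keys, no padding, not secure.
# Requires Python 3.8+ for pow(e, -1, phi).

def make_key_pair(p, q, e):
    """Return (public_key, private_key) built from two small primes."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)          # private exponent: inverse of e modulo phi
    return (e, n), (d, n)

def apply_key(value, key):
    """Encrypt or decrypt an integer with either half of a key pair."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

# Hypothetical keys. The merchant's modulus is the larger one so that the
# inner ciphertext always fits inside the outer encryption.
customer_public, customer_private = make_key_pair(61, 53, 17)
merchant_public, merchant_private = make_key_pair(89, 97, 5)

message = 1234                   # stand-in for the plaintext, below 61 * 53

# Merchant: encrypt for the customer, then encrypt with the merchant's private key.
inner = apply_key(message, customer_public)
signed_ciphertext = apply_key(inner, merchant_private)

# Customer: undo the layers in reverse order.
authenticated = apply_key(signed_ciphertext, merchant_public)
recovered = apply_key(authenticated, customer_private)

print(recovered == message)
```

Only the holder of the merchant's private key could have produced a value that the merchant's public key turns back into the inner ciphertext, and only the holder of the customer's private key can remove the inner layer.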
[Fig. 21.4: the doubly encrypted message travels as signed ciphertext.]
21.5 Cryptanalysis
Even if keys are kept secret, it may be possible to compromise the security of a system. Try-
ing to decrypt ciphertext without knowledge of the decryption key is known as cryptanal-
ysis. Commercial encryption systems are constantly being researched by cryptologists to
ensure that the systems are not vulnerable to a cryptanalytic attack. The most common
cryptanalytic attacks are those in which the encryption algorithm is analyzed to find
relations between bits of the encryption key and bits of the ciphertext. Often, these relations
are only statistical in nature and incorporate an analyzer’s outside knowledge about the
plaintext. The goal of such an attack is to determine the key from the ciphertext.
Weak statistical trends between ciphertext and keys can be exploited to gain knowl-
edge about the key if enough ciphertext is known. Proper key management and expiration
dates on keys help prevent cryptanalytic attacks. When a key is used for long periods of
time, more ciphertext is generated that can be beneficial to an attacker trying to derive a
key. If an attacker recovers a key without the owner's knowledge, the attacker can use it to
decrypt every message for the life of that key. Using public-key cryptography to exchange
secret keys securely allows a new secret key to encrypt every message.
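To make the idea of a statistical attack concrete, the following sketch recovers the key of a simple shift (Caesar) cipher from ciphertext alone. The cipher, the sample sentence and the function names are our own illustrative assumptions, and the only outside knowledge used is that "e" is the most common letter in the plaintext; commercial algorithms are designed so that no such simple trend exists.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase

def caesar(text, key):
    """Shift each lowercase letter by key positions; leave other characters alone."""
    shifted = []
    for character in text:
        if character in ALPHABET:
            position = (ALPHABET.index(character) + key) % 26
            shifted.append(ALPHABET[position])
        else:
            shifted.append(character)
    return "".join(shifted)

def guess_key(ciphertext):
    """Assume the most frequent ciphertext letter stands for plaintext 'e'."""
    letters = [character for character in ciphertext if character in ALPHABET]
    most_common_letter = Counter(letters).most_common(1)[0][0]
    return (ALPHABET.index(most_common_letter) - ALPHABET.index("e")) % 26

plaintext = "meet me here at seven then we see the seller"
secret_key = 11                              # unknown to the attacker
ciphertext = caesar(plaintext, secret_key)

recovered_key = guess_key(ciphertext)        # derived from the ciphertext alone
recovered_text = caesar(ciphertext, -recovered_key)
print(recovered_key, recovered_text)
```

The more ciphertext a single key produces, the more reliable such frequency counts become, which is one reason keys are given expiration dates.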
[Figure: the sender creates a digital envelope.]
To create a digital signature, a sender first takes the original plaintext message and runs
it through a hash function, which is a mathematical calculation that gives the message a
hash value. A one-way hash function generates a string of characters that is, for practical
purposes, unique to the input file. The Secure Hash Algorithm (SHA-1) is the current standard
for hashing functions. In using SHA-1, the phrase “Buy 100 shares of company X” would produce
the hash value D8 A9 B6 9F 72 65 0B D5 6D 0C 47 00 95 0D FD 31 96 0A FD B5. MD5 is another
popular hash function, which was developed by Ronald Rivest to verify data integrity
through a 128-bit hash value of the input file.10 (See userpages.umbc.edu/~mabzug1/cs/md5/md5.html.)
The following interactive session demonstrates how to obtain the MD5 hash of the same phrase
in Python.
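A minimal sketch of such a session in present-day Python follows. It assumes the standard-library hashlib module rather than the md5 module of this book's era, and the variable names are ours.

```python
import hashlib

message = b"Buy 100 shares of company X"

# MD5 yields a 128-bit value (32 hexadecimal digits);
# SHA-1 yields a 160-bit value (40 hexadecimal digits).
md5_digest = hashlib.md5(message).hexdigest()
sha1_digest = hashlib.sha1(message).hexdigest()
print("MD5:  ", md5_digest)
print("SHA-1:", sha1_digest)

# Changing a single character of the input changes the hash value.
altered_digest = hashlib.md5(b"Buy 900 shares of company X").hexdigest()
print("Altered message gives the same hash?", altered_digest == md5_digest)
```

Hashing the same bytes always gives the same value, so a receiver can recompute the hash and compare it with the one the sender supplied.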
Digital signatures do not provide proof that a message has been sent. Consider the fol-
lowing situation: A contractor sends a company a digitally signed contract, which the con-
tractor later would like to revoke. The contractor could do so by releasing the private key
and then claiming that the digitally signed contract came from an intruder who stole the
contractor’s private key. Timestamping, which binds a time and date to a digital document,
can help solve the problem of non-repudiation. For example, suppose the company and the
contractor are negotiating a contract. The company requires the contractor to sign the con-
tract digitally and then have the document digitally timestamped by a third party called a
timestamping agency. The contractor sends the digitally-signed contract to the time-
stamping agency. The privacy of the message is maintained since the timestamping agency
sees only the encrypted, digitally-signed message (as opposed to the original plaintext mes-
sage). The timestamping agency affixes the time and date of receipt to the encrypted, signed
message and digitally signs the whole package with the timestamping agency’s private key.
The timestamp cannot be altered by anyone except the timestamping agency, since no one
else possesses the timestamping agency's private key. Unless the contractor reports the pri-
vate key to have been compromised before the document was timestamped, the contractor
cannot legally prove that the document was signed by an unauthorized third party. The
sender could also require the receiver to sign the message digitally and timestamp it as
proof of receipt. To learn more about timestamping, visit AuthentiDate.com.
The U.S. government’s digital-authentication standard is called the Digital Signature
Algorithm (DSA). The U.S. government recently passed digital-signature legislation that
makes digital signatures as legally binding as handwritten signatures. This legislation is
expected to increase e-business dramatically. For the latest news about U.S. government
legislation in information security, visit www.itaa.org/infosec. For more informa-
tion about the bills, visit the following government sites:
thomas.loc.gov/cgi-bin/bdquery/z?d106:hr.01714:
thomas.loc.gov/cgi-bin/bdquery/z?d106:s.00761:
A digital certificate is a digital document used to identify a user and issued by a certif-
icate authority (CA). A digital certificate includes the name of the subject (the company or
individual being certified), the subject’s public key, a serial number, an expiration date, the
signature of the trusted certificate authority and any other relevant information (Fig. 21.6).
A CA is a financial institution or other trusted third party, such as VeriSign. Once issued,
the digital certificates are publicly available and are held by the certificate authority in cer-
tificate repositories.
The CA signs the certificate by encrypting either the subject’s public key or a hash
value of the public key using the CA’s own private key. The CA has to verify every sub-
ject’s public key. Thus, users must trust the public key of a CA. Usually, each CA is part
of a certificate authority hierarchy. This hierarchy is similar to a chain of trust in which
each link relies on another link to provide authentication information. A certificate
authority hierarchy is a chain of certificate authorities, starting with the root certificate
authority, which is the Internet Policy Registration Authority (IPRA). The IPRA signs cer-
tificates using the root key. The root key signs certificates only for policy creation author-
ities, which are organizations that set policies for obtaining digital certificates. In turn,
policy creation authorities sign digital certificates for CAs. CAs then sign digital certifi-
cates for individuals and organizations. The CA takes responsibility for authentication, so
it must check information carefully before issuing a digital certificate. In one case, human
error caused VeriSign to issue two digital certificates to an imposter posing as a Microsoft
employee.12 Such an error is significant; the inappropriately issued certificates can cause
users to download malicious code unknowingly onto their machines (see Authentication:
Microsoft Authenticode feature).
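The chain of trust described above can be sketched as a list of certificates, each signed by the next authority up, ending at a root key that the user already trusts. Everything in this sketch is an assumption made for illustration: the toy RSA keys built from small Mersenne primes, the dictionary layout of a certificate and the names issue and chain_is_valid. Real certificates follow the X.509 format and use much larger keys.

```python
import hashlib

PUBLIC_EXPONENT = 65537

def make_key_pair(p, q):
    """Toy RSA key pair from two primes (Python 3.8+ for the modular inverse)."""
    n = p * q
    d = pow(PUBLIC_EXPONENT, -1, (p - 1) * (q - 1))
    return (PUBLIC_EXPONENT, n), (d, n)

def hash_value(data, modulus):
    """Hash the data and reduce it so that it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus

def certificate_body(subject, public_key):
    return f"{subject}|{public_key}".encode()

def issue(subject, subject_public, issuer, issuer_private):
    """Sign a hash of the subject's name and public key with the issuer's private key."""
    d, n = issuer_private
    body = certificate_body(subject, subject_public)
    return {"subject": subject, "public_key": subject_public,
            "issuer": issuer, "signature": pow(hash_value(body, n), d, n)}

def chain_is_valid(chain, trusted_root_public):
    """chain runs from the end entity up to the certificate signed by the root."""
    for position, certificate in enumerate(chain):
        if position + 1 < len(chain):
            e, n = chain[position + 1]["public_key"]   # issuer is the next link
        else:
            e, n = trusted_root_public                 # last link is signed by the root
        body = certificate_body(certificate["subject"], certificate["public_key"])
        if pow(certificate["signature"], e, n) != hash_value(body, n):
            return False
    return True

# Hypothetical hierarchy: root -> policy creation authority -> CA -> subject.
root_public, root_private = make_key_pair(2**89 - 1, 2**107 - 1)
pca_public, pca_private = make_key_pair(2**61 - 1, 2**89 - 1)
ca_public, ca_private = make_key_pair(2**31 - 1, 2**61 - 1)
subject_public, subject_private = make_key_pair(2**61 - 1, 2**107 - 1)

chain = [issue("Example Merchant", subject_public, "Example CA", ca_private),
         issue("Example CA", ca_public, "Example PCA", pca_private),
         issue("Example PCA", pca_public, "Root", root_private)]
print(chain_is_valid(chain, root_public))

forged = [dict(chain[0], subject="Imposter")] + chain[1:]
print(chain_is_valid(forged, root_public))
```

Note that the check proves only that each signature is genuine. As the VeriSign incident shows, it cannot detect a certificate that a CA issued to the wrong party.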
VeriSign, Inc., is a leading certificate authority. For more information about VeriSign,
visit www.verisign.com. For a listing of other digital-certificate vendors, please see
Section 21.16.
e-Fact 21.4
It can take a year and cost from $5 million to $10 million for a financial firm to build a digital
certificate infrastructure, according to Identrus, a consortium of global financial companies
that is providing a framework for trusted business-to-business e-commerce.13
To obtain a digital certificate for your personal e-mail messages, visit www.verisign.com
or www.thawte.com. VeriSign offers a free 60-day trial, or you can purchase the service
for a yearly fee. Thawte offers free digital certificates for personal e-mail.
Web server certificates may also be purchased through VeriSign and Thawte; however,
they are more expensive than e-mail certificates.
device and the card is necessary. The alternative to this method is a contactless interface,
in which data is transferred to a reader via an embedded wireless device in the card, without
the card and the device having to make physical contact.15
Smart cards store private keys, digital certificates and other information necessary for
implementing PKI. They may also store credit card numbers, personal contact information,
etc. Each smart card is used in combination with a personal identification number (PIN).
This application provides two levels of security by requiring the user to both possess a
smart card and know the corresponding PIN to access the information stored on the card.
As an added measure of security, some microprocessor cards will delete or corrupt stored
data if malicious attempts at tampering with the card occur. Because the PKI information is
stored on the card, it is portable, allowing users to access information from multiple devices
using the same smart card.
e-Fact 21.5
According to Dataquest, use of smart cards is growing 30% per year, and it is expected that
3.4 billion smart cards will be in use worldwide in 2001.16
the packets have arrived, puts them in sequential order and determines if the packets have
arrived without alteration. If the packets have been accidentally altered or any data has been
lost, TCP requests retransmission. However, TCP is not sophisticated enough to determine
if packets have been maliciously altered during transmission, as malicious packets can be
disguised as valid ones. When all of the data successfully reaches TCP/IP, the message is
passed to the socket at the receiver end. The socket translates the message back into a form
that can be read by the receiver’s application.19 In a transaction using SSL, the sockets are
secured using public-key cryptography.
SSL implements public-key technology using the RSA algorithm and digital certifi-
cates to authenticate the server in a transaction and to protect private information as it
passes from one party to another over the Internet. SSL transactions do not require client
authentication; many servers consider a valid credit-card number to be sufficient for
authentication in secure purchases. To begin, a client sends a message to a server. The
server responds and sends its digital certificate to the client for authentication. Using
public-key cryptography to communicate securely, the client and server negotiate session
keys to continue the transaction. Session keys are secret keys that are used for the duration
of that transaction. Once the keys are established, the communication proceeds between the
client and the server by using the session keys and digital certificates. Encrypted data is
passed through TCP/IP, just as regular packets travel over the Internet. However, before
sending a message with TCP/IP, the SSL protocol breaks the information into blocks, com-
presses it and encrypts it. Conversely, after the data reaches the receiver through TCP/IP,
the SSL protocol decrypts the packets, then decompresses and assembles the data. These
extra processes provide an extra layer of security between TCP/IP and applications. SSL is
primarily used to secure point-to-point connections—transmissions of data from one com-
puter to another.20 SSL allows for the authentication of the server, the client, both or nei-
ther; in most Internet SSL sessions, only the server is authenticated. The Transport Layer
Security (TLS) protocol, designed by the Internet Engineering Task Force, is similar to
SSL. For more information on TLS, visit: www.ietf.org/rfc/rfc2246.txt.
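In present-day Python, the server-authentication step described above is handled by the standard-library ssl module, which negotiates TLS, the successor to SSL mentioned in the text. The following is a minimal sketch rather than a complete client: the function names are our own, and fetch_server_certificate is defined but not called here because it needs a network connection.

```python
import socket
import ssl

def make_client_context():
    """Build a context that insists on a valid server certificate."""
    return ssl.create_default_context()

def fetch_server_certificate(host, port=443):
    """Open a secured socket and return the certificate the server presented."""
    context = make_client_context()
    with socket.create_connection((host, port)) as plain_socket:
        # The handshake, including certificate checking and the negotiation
        # of session keys, happens inside wrap_socket.
        with context.wrap_socket(plain_socket, server_hostname=host) as secure_socket:
            return secure_socket.getpeercert()

context = make_client_context()
print("Server certificate required:", context.verify_mode == ssl.CERT_REQUIRED)
print("Host name checked against certificate:", context.check_hostname)
```

With these defaults only the server is authenticated, which matches the common case described in the text; client certificates must be configured separately.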
Although SSL protects information as it is passed over the Internet, it does not protect
private information, such as credit-card numbers, once the information is stored on the mer-
chant’s server. When a merchant receives credit-card information with an order, the infor-
mation is often decrypted and stored on the merchant’s server until the order is placed. If
the server is not secure and the data is not encrypted, an unauthorized party can access the
information. Hardware devices, such as peripheral component interconnect (PCI) cards
designed for use in SSL transactions, can be installed on Web servers to process SSL trans-
actions, thus reducing processing time and leaving the server free to perform other tasks.21
Visit www.sonicwall.com/products/trans.asp for more information on these
devices. For more information about the SSL protocol, check out the Netscape SSL tutorial
at developer.netscape.com/tech/security/ssl/protocol.html and
the Netscape Security Center site at www.netscape.com/security/index.html.
frastructure of the Internet—the publicly available wires—to create Virtual Private Net-
works (VPNs), linking multiple networks, wireless users and other remote users. VPNs use
the Internet infrastructure that is already in place, so they are more economical than
private networks such as WANs.22 The encryption allows for VPNs to provide the same
services as private networks over a public network.
A VPN is created by establishing a secure tunnel through which data passes between
multiple networks over the Internet. IPSec (Internet Protocol Security) is one of the tech-
nologies used to secure the tunnel through which the data passes, ensuring the privacy and
integrity of the data, as well as authenticating the users.23 IPSec, developed by the Internet
Engineering Task Force (IETF), uses public-key and symmetric key cryptography to
ensure authentication of the users, data integrity and confidentiality. The technology takes
advantage of the standard that is already in place, in which information travels between two
networks over the Internet via the Internet Protocol (IP). Information sent using IP, how-
ever, can easily be intercepted. Unauthorized users can access the network by using a
number of well-known techniques, such as IP spoofing—a method in which an attacker
simulates the IP of an authorized user or host to get access to resources that would other-
wise be off-limits. The SSL protocol enables secure, point-to-point connections between
two applications; IPSec enables the secure connection of an entire network. The Diffie-
Hellman and RSA algorithms are commonly used in the IPSec protocol for key exchange,
and DES or 3DES are used for secret-key encryption (depending on system and encryption
needs). An IP packet is encrypted, then sent inside a regular IP packet that creates the
tunnel. The receiver discards the outer IP packet, then decrypts the inner IP packet.24 VPN
security relies on three concepts—authentication of the user, encryption of the data sent
over the network and controlled access to corporate information.25 To address these three
security concepts, IPSec is composed of three pieces. The Authentication Header (AH)
attaches additional information to each packet, which verifies the identity of the sender and
proves that data was not modified in transit. The Encapsulating Security Payload (ESP)
encrypts the data using symmetric key ciphers to protect the data from eavesdroppers while
the IP packet is being sent from one computer to another. The Internet Key Exchange (IKE)
is the key-exchange protocol used in IPSec to determine security restrictions and to authen-
ticate the encryption keys.
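The Diffie-Hellman key exchange mentioned above lets two hosts agree on a shared secret key while sending only public values over the network. The sketch below is a toy version under our own assumptions: the prime, the generator and the function names are illustrative, and the parameters are far too small and unvetted for real use.

```python
import secrets

PRIME = 2**127 - 1        # a Mersenne prime, used here only as a toy modulus
GENERATOR = 3

def make_private_value():
    """Pick a random private value in the range 2 .. PRIME - 2."""
    return secrets.randbelow(PRIME - 3) + 2

def public_value(private_value):
    """Value that may be sent openly to the other party."""
    return pow(GENERATOR, private_value, PRIME)

def shared_secret(their_public_value, my_private_value):
    """Both sides arrive at GENERATOR ** (a * b) mod PRIME."""
    return pow(their_public_value, my_private_value, PRIME)

# Each gateway keeps its private value to itself and transmits only the public one.
gateway_a_private = make_private_value()
gateway_b_private = make_private_value()
gateway_a_public = public_value(gateway_a_private)
gateway_b_public = public_value(gateway_b_private)

secret_at_a = shared_secret(gateway_b_public, gateway_a_private)
secret_at_b = shared_secret(gateway_a_public, gateway_b_private)
print(secret_at_a == secret_at_b)
```

The agreed value can then seed a symmetric cipher for the bulk of the traffic. By itself the exchange does not authenticate either party.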
VPNs are becoming increasingly popular in businesses. However, VPN security is dif-
ficult to manage. To establish a VPN, all of the users on the network must have similar soft-
ware or hardware. Although it is convenient for a business partner to connect to another
company’s network via VPN, access to specific applications and files should be limited to
certain authorized users rather than to all users on a VPN.26 Firewalls, intrusion detection software
and authorization tools can be used to secure valuable data (Section 21.14).
For more information about IPSec, visit the IPSec Developers Forum at www.ipsec.com.
Also, check out the Web site for the IPSec Working Group of the IETF at
www.ietf.org/html.charters/ipsec-charter.html.
21.11 Authentication
As we discussed throughout the chapter, authentication is one of the fundamental require-
ments for e-business and m-business security. In this section, we will discuss some of the
technologies used to authenticate users in a network, such as Kerberos, biometrics and single
sign-on. We conclude the section with a discussion of Microsoft Passport, a technology
that combines several methods of authentication.
21.11.1 Kerberos
Firewalls do not protect users from internal security threats to their local area network. In-
ternal attacks are common and can be extremely damaging. For example, disgruntled em-
ployees with network access can wreak havoc on an organization’s network or steal
valuable proprietary information. It is estimated that 70 percent to 90 percent of attacks on
corporate networks are internal.27 Kerberos is a freely available, open-source protocol de-
veloped at MIT. It employs secret-key cryptography to authenticate users in a network and
to maintain the integrity and privacy of network communications.
Authentication in a Kerberos system is handled by a main Kerberos system and a sec-
ondary Ticket Granting Service (TGS). This system is similar to the key distribution centers
described in Section 21.3. The main Kerberos system authenticates a client’s identity to the
TGS; the TGS authenticates the client’s rights to access specific network services.
Each client in the network shares a secret key with the Kerberos system. This secret
key may be used by multiple TGSs in the Kerberos system. The client starts by entering a
login name and password into the Kerberos authentication server. The authentication server
maintains a database of all clients in the network. The authentication server returns a
Ticket-Granting Ticket (TGT) encrypted with the client’s secret key that it shares with the
authentication server. Since the secret key is known only by the authentication server and
the client, only the client can decrypt the TGT, thus authenticating the client’s identity.
Next, the client’s system sends the decrypted TGT to the Ticket Granting Service to request
a service ticket. The service ticket authorizes the client’s access to specific network ser-
vices. Service tickets have a set expiration time. Tickets may be renewed by the TGS.
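The ticket exchange above can be sketched in a few functions. This is a simplified model under several assumptions of ours, not the Kerberos wire protocol: the toy cipher built from SHA-256, the key derived from the password, the seal that lets the TGS recognize a genuine TGT, the one-hour and five-minute lifetimes and all of the names are hypothetical.

```python
import hashlib
import hmac
import json
import time

def derive_key(password):
    """Turn a password into a secret key (assumed scheme, for illustration)."""
    return hashlib.sha256(password.encode()).digest()

def toy_cipher(key, data):
    """XOR data with a SHA-256 keystream; the same call encrypts and decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

CLIENT_KEYS = {"alice": derive_key("correct horse")}   # authentication server's database
TGS_KEY = derive_key("known to the authentication server and the TGS")

def issue_tgt(login):
    """Authentication server: return a TGT encrypted with the client's secret key."""
    body = json.dumps({"client": login, "expires": time.time() + 3600})
    seal = hmac.new(TGS_KEY, body.encode(), hashlib.sha256).hexdigest()
    packet = json.dumps({"body": body, "seal": seal}).encode()
    return toy_cipher(CLIENT_KEYS[login], packet)

def open_tgt(encrypted_tgt, password):
    """Client: decrypt the TGT; a wrong password yields unreadable bytes."""
    return json.loads(toy_cipher(derive_key(password), encrypted_tgt))

def request_service_ticket(tgt, service):
    """TGS: check the TGT, then issue a service ticket with its own expiration."""
    expected = hmac.new(TGS_KEY, tgt["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tgt["seal"]):
        raise PermissionError("TGT was not issued by the authentication server")
    claims = json.loads(tgt["body"])
    if claims["expires"] < time.time():
        raise PermissionError("TGT has expired")
    return {"client": claims["client"], "service": service,
            "expires": time.time() + 300}

encrypted_tgt = issue_tgt("alice")
tgt = open_tgt(encrypted_tgt, "correct horse")
service_ticket = request_service_ticket(tgt, "printing")
print(service_ticket["client"], service_ticket["service"])

try:
    open_tgt(encrypted_tgt, "wrong guess")
    impostor_rejected = False
except ValueError:            # decrypted bytes are not a readable ticket
    impostor_rejected = True
print(impostor_rejected)
```

Being able to decrypt the TGT is what demonstrates the client's identity: only a party holding the client's secret key can produce a ticket that the TGS will accept.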
21.11.2 Biometrics
An innovation in security is likely to be biometrics. Biometrics uses unique personal infor-
mation, such as fingerprints, eyeball iris scans or face scans, to identify a user. This system
eliminates the need for passwords, which are much easier to steal. Have you ever written
down your passwords on a piece of paper and put the paper in your desk drawer or wallet?
These days, people have passwords and PIN codes for everything—Web sites, networks,
e-mail, ATM cards and even for their cars. Managing all of those codes can become a bur-
den. Recently, the cost of biometrics devices has dropped significantly. Keyboard-mounted
fingerprint scanning, face scanning and eye scanning devices are being used in place of
passwords to log into systems, check e-mail or access secure information over a network.
Each user’s iris scan, face scan or fingerprint is stored in a secure database. Each time a user
logs in, his or her scan is compared with the database. If a match is made, the login is suc-
cessful. Two companies that specialize in biometrics devices are IriScan
(www.iriscan.com) and Keytronic (www.keytronic.com). For additional resourc-
es, see Section 21.16.
Currently, passwords are the predominant means of authentication; however, we are
beginning to see a shift to smart cards and biometrics. Microsoft recently announced that
it will include the Biometric Application Programming Interface (BAPI) in future versions
of Windows, which will make it possible for companies to integrate biometrics into their
systems.28 Two-factor authentication uses two means to authenticate the user, such as bio-
metrics or a smart card used in combination with a password. Though this system could
potentially be compromised, using two methods of authentication is more secure than
using passwords alone.
Keyware Inc. has already implemented a wireless biometrics system that stores user
voiceprints on a central server. Keyware also created layered biometric verification (LBV),
which uses multiple physical measurements—face, finger and voice prints—simulta-
neously. The LBV feature enables a wireless biometrics system to combine biometrics with
other authentication methods, such as PIN and PKI.29
Identix Inc. also provides biometrics authentication technology for wireless transac-
tions. The Identix fingerprint scanning device is embedded in handheld devices. The
Identix service offers transaction management and content protection services. Transac-
tion management services prove that transactions took place, and content protection ser-
vices control access to electronic documents, including limiting a user’s ability to
download or copy documents.30
Wireless biometrics is not widely used at this point. Fingerprint scanners must be
accompanied by fingerprint readers installed in mobile devices. Wireless device manufac-
turers are hesitant to build in fingerprint readers because the technology is expensive. Lap-
tops have begun to accommodate biometric security, but cell phones are slower to advance
due to limited memory and processing power.31
One of the major concerns with biometrics is the issue of privacy. Implementing fin-
gerprint scanners means that organizations will be keeping databases with each employee’s
fingerprint. Do people want to provide their employers with such personal information?
What if that data is compromised? To date, most organizations that have implemented bio-
metrics systems have received little, if any, resistance from employees.
The logon for creating the token is secured with encryption or with a single password,
which is the only password the user needs to remember or change. The only problem with
token authentication is that all applications must be built to accept tokens instead of tradi-
tional logon passwords.32
There are many classes of computer viruses. A transient virus attaches itself to a spe-
cific computer program. The virus is activated when the program is run and deactivated
when the program is terminated. A more powerful type of virus is a resident virus, which,
once loaded into the memory of a computer, operates for the duration of the computer's use.
Another type of virus is the logic bomb, which triggers when a given condition is met, such
as a time bomb that is activated when the clock on the computer matches a certain time or
date.
A Trojan horse is a malicious program that hides within a friendly program or simu-
lates the identity of a legitimate program or feature, while actually causing damage to the
computer or network in the background. The Trojan horse gets its name from the story of
the Trojan War in Greek history. In this story, Greek warriors hid inside a wooden horse,
which the Trojans took within the walls of the city of Troy. When night fell and the Trojans
were asleep, the Greek warriors came out of the horse and opened the gates to the city, let-
ting the Greek army enter the gates and destroy the city of Troy. Trojan horse programs can
be particularly difficult to detect, since they appear to be legitimate and useful applications.
Also commonly associated with Trojan horses are backdoor programs, which are usually
resident viruses that give the sender complete, undetected access to the victim’s computer
resources. These types of viruses are especially threatening to the victim, as they can be set
up to log every keystroke (capturing all passwords, credit card numbers, etc.). No matter
how secure the connection between a PC supplying private information and the server
receiving the information, if a backdoor program is running on a computer, the data is inter-
cepted before any encryption is implemented. In June 2000, news spread of a Trojan horse
virus disguised as a video clip sent as an e-mail attachment. The Trojan horse virus was
designed to give the attacker access to infected computers, potentially to launch a denial-
of-service attack against Web sites.36
Two of the most famous viruses to date are Melissa, which struck in March 1999, and
the ILOVEYOU virus that hit in May 2000. Both viruses cost organizations and individuals
billions of dollars. The Melissa virus spread in Microsoft Word documents sent via e-mail.
When the document was opened, the virus was triggered. Melissa accessed the Microsoft
Outlook address book on that computer and automatically sent the infected Word attach-
ment by e-mail to the first 50 people in the address book. Each time another person opened
the attachment, the virus would send out another 50 messages. Once in a system, the virus
infected any subsequently saved files.
The ILOVEYOU virus was sent as an attachment to an e-mail posing as a love letter.
The message in the e-mail said “Kindly check the attached love letter coming from me.”
Once opened, the virus accessed the Microsoft Outlook address book and sent out messages
to the addresses listed, helping to spread the virus rapidly worldwide. The virus corrupted
all types of files, including system files. Networks at companies and government organiza-
tions worldwide were shut down for days trying to remedy the problem and contain the
virus. This virus accentuated the importance of scanning file attachments for security
threats before opening them.
e-Fact 21.7
Estimates for damage caused by the ILOVEYOU virus were as high as $10 billion to $15 billion,
with the majority of the damage done in just a few hours.
Why do these viruses spread so quickly? One reason is that many people are too
willing to open executable files from unknown sources. Have you ever opened an audio clip
or video clip from a friend? Have you ever forwarded that clip to other friends? Do you
know who created the clip and if any viruses are embedded in it? Did you open the ILOVEYOU
file to see what the love letter said?
Most antivirus software is reactive, going after viruses once they are discovered, rather
than protecting against unknown viruses. New antivirus software, such as Finjan Soft-
ware’s SurfinGuard® (www.finjan.com), looks for executable files attached to e-mail
and runs the executables in a secure area to test if they attempt to access and harm files. For
more information about antivirus software, see the McAfee.com: Antivirus Utilities fea-
ture.
The rise in cybercrimes has prompted the U.S. government to take action. Under the
National Information Infrastructure Protection Act of 1996, denial-of-service attacks and
distribution of viruses are federal crimes punishable by fines and jail time. For more
information about the U.S. government’s efforts against cybercrime or to read about recently
prosecuted cases, visit the U.S. Department of Justice Web site, at
www.usdoj.gov/criminal/cybercrime/compcrime.html. Also check out www.cybercrime.gov,
a site maintained by the Criminal Division of the U.S. Department of Justice.
The CERT® (Computer Emergency Response Team) Coordination Center at Carnegie
Mellon University’s Software Engineering Institute responds to reports of viruses and
95
96 if __name__ == "__main__":
97 main()
Line 34 creates an instance of class RExec. The instance defines a restricted environment
that contains a list of accessible modules and built-in functions (e.g., raw_input or abs).
Line 35 obtains the __main__ module of that environment. Method
add_module adds a new module to the list of the modules allowed in the restricted envi-
ronment and returns a reference to that module. If the environment already permits access
to the module, method add_module simply returns a reference to the specified module.
Method add_module does not import the module into the restricted environment; the
method only modifies the list of modules that the restricted code may import.
Line 34 gets the reference to the dictionary __dict__ that contains the module-
global bindings for the restricted environment. A Bastion module wraps a Web browser
component and adds it to the module-global namespace of the restricted environment (line
39). The restricted code now may access and manipulate the Web browser component. By
wrapping the Web browser component with class Bastion, we allow the program to control
how the restricted code accesses the browser. By default, code may not access a Bastion
instance’s data members or any methods that begin with the underscore (_) character. The
code may access methods that do not begin with the underscore character.
To demonstrate code execution, lines 41–49 add two methods to the WebBrowser.
Both setColor (lines 41–44) and _setColor (lines 46–49) set the foreground color of
the WebBrowser. By default, code may not access a Bastion-wrapped browser object’s
_setColor method.
The screenshots in Fig. 21.7 demonstrate the result of running the code in Fig. 21.8 and
Fig. 21.9. The first screenshot is the browser in its original state. The second screenshot is
the result of running the code in Fig. 21.8. The browser has changed its background color
to blue. The final screenshot demonstrates what happens when the code in Fig. 21.9
attempts to change the color using the restricted method _setColor.
1 browser.setColor( "blue" )
Fig. 21.8
1 browser._setColor( "red" )
Fig. 21.9
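The underscore rule described above can be illustrated with a small stand-in. The sketch below is not the Bastion module itself: the function make_proxy and the simplified WebBrowser class are hypothetical names chosen for illustration, the code is written for a current Python interpreter, and the proxy should not be treated as a real security boundary. It only shows how a wrapper might expose public methods while refusing data members and names that begin with an underscore.

```python
class WebBrowser:
    """Hypothetical stand-in for the wrapped Web browser component."""

    def __init__(self):
        self.color = "white"

    def setColor(self, color):
        self.color = color

    def _setColor(self, color):
        self.color = color


def make_proxy(target):
    """Return an object that forwards only public method calls to target."""

    class Proxy:
        def __getattr__(self, name):
            # Refuse names that begin with an underscore.
            if name.startswith("_"):
                raise AttributeError(name)
            attribute = getattr(target, name)
            # Refuse data members; only methods are exposed.
            if not callable(attribute):
                raise AttributeError(name)
            return attribute

    return Proxy()


component = WebBrowser()
browser = make_proxy(component)

browser.setColor("blue")          # public method: permitted
try:
    browser._setColor("red")      # underscore method: refused
except AttributeError:
    print("access to _setColor denied")
```

Because the proxy holds the real component only inside a closure, code that receives browser has no direct name for the underlying object, which is the design idea the wrapper approach relies on.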
21.14.1 Firewalls
A basic tool in network security is the firewall. The purpose of a firewall is to protect a local
area network (LAN) from intruders outside the network. For example, most companies
have internal networks that allow employees to share files and access company informa-
tion. Each LAN can be connected to the Internet through a gateway, which usually includes
a firewall. For years, one of the biggest threats to security came from employees inside the
firewall. Now that businesses rely heavily on access to the Internet, an increasing number
of security threats are originating outside the firewall—from the hundreds of millions of
people connected to the company network by the Internet.51 A firewall acts as a safety bar-
rier for data flowing into and out of the LAN. Firewalls can prohibit all data flow not ex-
pressly allowed, or can allow all data flow that is not expressly prohibited. The choice
between these two models is up to the network security administrator and should be based
on the need for security versus the need for functionality.
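The two policy models can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function is_allowed, the use of port numbers to identify a data flow and the sample rule tables are all hypothetical and do not come from any particular firewall product.

```python
def is_allowed(port, rules, default_allow):
    """Return True if traffic to the given port may pass.

    rules maps a port number to True (expressly allowed) or False
    (expressly prohibited). Ports without a rule fall back to the
    default policy.
    """
    return rules.get(port, default_allow)


# Model 1: prohibit all data flow not expressly allowed.
strict_rules = {80: True, 443: True}

# Model 2: allow all data flow that is not expressly prohibited.
permissive_rules = {23: False}

print(is_allowed(8080, strict_rules, default_allow=False))
print(is_allowed(8080, permissive_rules, default_allow=True))
```

The same unlisted port is blocked under the first model and passed under the second, which is the security-versus-functionality trade-off the administrator must weigh.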
There are two main types of firewalls: packet-filtering firewalls and application-level
gateways. A packet-filtering firewall examines all data sent from outside the LAN and
rejects any data packets that have local network addresses. For example, if a hacker from
outside the network obtains the address of a computer inside the network and tries to sneak
a harmful data packet through the firewall, the packet-filtering firewall will reject the data
packet, since it has an internal address, but originated from outside the network. A problem
with packet-filtering firewalls is that they consider only the source of data packets; they do
not examine the actual data. As a result, malicious viruses can be installed on an authorized
user’s computer, giving the hacker access to the network without the authorized user’s
knowledge. The goal of an application-level gateway is to screen the actual data. If the mes-
sage is deemed safe, then the message is sent through to the intended receiver.
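The source-address check performed by a packet-filtering firewall can be sketched as follows. The address range 192.168.1.0/24 and the function name accept_inbound are assumptions made for this illustration; a real firewall inspects packet headers rather than strings.

```python
import ipaddress

# Assumed address range of the protected LAN.
LOCAL_NETWORK = ipaddress.ip_network("192.168.1.0/24")


def accept_inbound(source_address, local_network=LOCAL_NETWORK):
    """Decide whether a packet arriving from outside the LAN may enter.

    A packet that arrives from outside but claims a local source
    address is rejected.
    """
    return ipaddress.ip_address(source_address) not in local_network


print(accept_inbound("203.0.113.7"))    # outside source address
print(accept_inbound("192.168.1.25"))   # claims an internal address
```

Note that the function looks only at the source address and never at the packet contents, which mirrors the limitation of packet filtering described above.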
Using a firewall is probably the most effective and easiest way to add security to a
small network.52 Often, small companies or home users who are connected to the Internet
through permanent connections, such as DSL lines, do not employ strong security mea-
sures. As a result, their computers are prime targets for crackers to use in denial-of-service
attacks or to steal information. It is important for all computers connected to the Internet to
have some degree of security for their systems. Numerous firewall software products are
available. Several products are listed in the Web resources in Section 6.15.
Air gap technology is a network security solution that complements the firewall. It
secures private data from external traffic accessing the internal network. The air gap sepa-
rates the internal network from the external network, and the organization decides which
information will be made available to external users. Whale Communications created the
e-Gap System, which is composed of two computer servers and a memory bank. The memory
bank does not run an operating system; therefore, hackers cannot take advantage of common
operating system weaknesses to access network information.
Air gap technology does not allow outside users to view the network’s structure, pre-
venting hackers from searching the layout for weak spots or specific data. The e-Gap Web
Shuttle feature allows safe external access by restricting the system’s back office, which is
where an organization’s most sensitive information and IT-based business processes are
controlled. Users who want to access a network hide behind the air gap, where the authen-
tication server is located. Authorized users gain access through a single sign-on capability,
allowing them to use one log-in password to access authorized areas of the network.
The e-Gap Secure File Shuttle feature moves files in and out of the network. Each file
is inspected behind the air gap. If the file is deemed safe, it is carried by the File Shuttle into
the network.53
Air gap technology is used by e-commerce organizations to allow their clients and
partners to access information automatically, thus reducing the cost of inventory manage-
ment. Military, aerospace and government industries, which store highly sensitive informa-
tion, use air gap technology.
restricted applications? Intrusion detection systems monitor networks and application log
files—files containing information on files, including who accessed them and when—so if
an intruder makes it into the network or an unauthorized application, the system detects the
intrusion, halts the session and sets off an alarm to notify the system administrator.54
Host-based intrusion detection systems monitor system and application log files. They
can be used to scan for Trojan horses, for example. Network-based intrusion detection soft-
ware monitors traffic on a network for any unusual patterns that might indicate DoS attacks
or attempted entry into a network by an unauthorized user. Companies can then check their
log files to determine if indeed there was an intrusion and if so, they can attempt to track
the offender. Check out the intrusion detection products from Cisco
(www.cisco.com/warp/public/cc/pd/sqsw/sqidsz), Hewlett-Packard
(www.hp.com/security/home.html) and Symantec (www.symantec.com).
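The idea of scanning a log file for suspicious activity can be sketched in Python. The log format shown here (date, time, user, file and result, separated by spaces), the function name find_suspects and the threshold of three denied attempts are all assumptions made for illustration; commercial intrusion detection products use far richer data and rules.

```python
from collections import Counter


def find_suspects(log_lines, threshold=3):
    """Return the users with at least threshold denied access attempts."""
    denied = Counter()
    for line in log_lines:
        fields = line.split()
        # Assumed format: <date> <time> <user> <file> <GRANTED|DENIED>
        if len(fields) == 5 and fields[4] == "DENIED":
            denied[fields[2]] += 1
    return sorted(user for user, count in denied.items() if count >= threshold)


sample_log = [
    "2001-08-29 16:01 alice payroll.db GRANTED",
    "2001-08-29 16:02 mallory payroll.db DENIED",
    "2001-08-29 16:03 mallory payroll.db DENIED",
    "2001-08-29 16:04 mallory secrets.txt DENIED",
    "2001-08-29 16:05 bob notes.txt DENIED",
]

print(find_suspects(sample_log))
```

A single denied request, such as the one from bob, is not flagged; only repeated failures cross the threshold and would be reported to the administrator.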
The OCTAVE℠ (Operationally Critical Threat, Asset and Vulnerability Evaluation)
method, under development at the Software Engineering Institute at Carnegie Mellon
University, is a process for evaluating security threats to a system. There are three phases in
OCTAVE: building threat profiles, identifying vulnerabilities, and developing security
solutions and plans. In the first phase, the organization identifies its important information
and assets, then evaluates the levels of security required to protect them. In the second
phase, the system is examined for weaknesses that could compromise the valuable data.
The third phase is to develop a security strategy as advised by an analysis team of three to
five security experts assigned by OCTAVE. This approach is one of the first of its kind,
in which the owners of computer systems not only have professionals analyze their
systems, but also participate in prioritizing the protection of crucial information.55
21.15 Steganography
Steganography is the practice of hiding information within other information. The term lit-
erally means “covered writing.” Like cryptography, steganography has been used since an-
cient times. Steganography allows you to take a piece of information, such as a message or
image, and hide it within another image, message or even an audio clip. Steganography
takes advantage of insignificant space in digital files, in images or on removable disks.56
Consider a simple example: If you have a message that you want to send secretly, you can
hide the information within another message, so that no one but the intended receiver can
read it. For example, if you want to tell your stockbroker to buy a stock and your message
must be transmitted over an insecure channel, you could send the message “BURIED
UNDER YARD.” If you have agreed in advance that your message is hidden in the first letter
of each word, the stockbroker picks these letters off and sees “BUY.”
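The receiver's side of this first-letter scheme takes one line of Python. The function name first_letters is our own choice for this sketch.

```python
def first_letters(message):
    """Recover the hidden message from the first letter of each word."""
    return "".join(word[0] for word in message.split())


print(first_letters("BURIED UNDER YARD"))   # prints BUY
```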
An increasingly popular application of steganography is digital watermarks for intel-
lectual property protection. An example of a conventional watermark is shown in
Fig. 21.10. A digital watermark can be either visible or invisible. It is usually a company
logo, copyright notification or other mark or message that indicates the owner of the docu-
ment. The owner of a document could show the hidden watermark in a court of law, for
example, to prove that the watermarked item was stolen.
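One common way to hide a mark in the "insignificant space" of a digital file is to overwrite the least significant bit of each data byte. The sketch below illustrates that general idea only; it is not the scheme used by any particular watermarking product, the function names embed and extract are hypothetical, and a plain byte string stands in for image or audio sample data.

```python
def embed(cover, secret):
    """Hide the bits of secret in the lowest bit of successive cover bytes."""
    bits = [(byte >> shift) & 1 for byte in secret for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover data is too small for the secret")
    stego = bytearray(cover)
    for index, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the hidden bit.
        stego[index] = (stego[index] & 0xFE) | bit
    return bytes(stego)


def extract(stego, length):
    """Recover length bytes previously hidden by embed."""
    secret = bytearray()
    for start in range(0, length * 8, 8):
        value = 0
        for byte in stego[start:start + 8]:
            value = (value << 1) | (byte & 1)
        secret.append(value)
    return bytes(secret)


cover = bytes(range(64))            # stand-in for pixel or sample values
marked = embed(cover, b"BUY")
print(extract(marked, 3))
```

Each cover byte changes by at most one, which is why such a mark is hard to notice in an image or a recording. A mark this simple is also easy to destroy, so it should be read as a teaching example rather than a robust watermark.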
Digital watermarking could have a substantial impact on e-commerce. Consider the
music industry. Music publishers are concerned that MP3 technology is allowing people to
distribute illegal copies of songs and albums. As a result, many publishers are hesitant to
put content online, as digital content is easy to copy. Also, since CD-ROMs are digital,
people are able to upload their music and share it over the Web. Using digital watermarks,
music publishers can make indistinguishable changes to a part of a song at a frequency that
is not audible to humans, to show that the song was, in fact, copied. Microsoft Research is
developing a watermarking system for digital audio, which would be included with default
Windows media players. In this digital watermarking system, data such as licensing infor-
mation is embedded into a song; the media player will not play files with invalid informa-
tion.
e-Fact 21.9
Record companies are losing approximately $5 billion per year due to piracy.57
www.esecurityonline.com
This site is a great resource for information on online security. The site has links to news, tools, events,
training and other valuable security information and resources.
theory.lcs.mit.edu/~rivest/crypto-security.html
The Ronald L. Rivest: Cryptography and Security site has an extensive list of links to security resourc-
es, including newsgroups, government agencies, FAQs, tutorials and more.
www.w3.org/Security/Overview.html
The W3C Security Resources site has FAQs, information about W3C security and e-commerce initi-
atives and links to other security related Web sites.
web.mit.edu/network/ietf/sa
The Internet Engineering Task Force (IETF), which is an organization concerned with the architecture
of the Internet, has working groups dedicated to Internet Security. Visit the IETF Security Area to
learn about the working groups, join the mailing list or check out the latest drafts of the IETF’s work.
dir.yahoo.com/Computers_and_Internet/Security_and_Encryption
The Yahoo Security and Encryption page is a great resource for links to Web sites on security and
encryption.
www.counterpane.com/hotlist.html
The Counterpane Internet Security, Inc., site includes links to downloads, source code, FAQs, tutori-
als, alert groups, news and more.
www.rsasecurity.com/rsalabs/faq
This site is an excellent set of FAQs about cryptography from RSA Laboratories, one of the leading
makers of public key cryptosystems.
www.nsi.org/compsec.html
Visit the National Security Institute’s Security Resource Net for the latest security alerts, government
standards and legislation, as well as security FAQs, links and other helpful resources.
www.itaa.org/infosec
The Information Technology Association of America (ITAA) InfoSec site has information about the
latest U.S. government legislation related to information security.
staff.washington.edu/dittrich/misc/ddos
The Distributed Denial of Service Attacks site has links to news articles, tools, advisory organizations
and even a section on security humor.
www.infoworld.com/cgi-bin/displayNew.pl?/security/links/security_corner.htm
The Security Watch site on InfoWorld.com has loads of links to security resources.
www.antionline.com
AntiOnline has security-related news and information, a tutorial titled “Fight-back! Against Hack-
ers,” information about hackers and an archive of hacked sites.
www.microsoft.com/security/default.asp
The Microsoft security site has links to downloads, security bulletins and tutorials.
www.grc.com
This site offers a service to test the security of your computer’s Internet connection.
www.sans.org/giac.html
The SANS Institute presents information on system and security updates, along with new research and
discoveries. The site offers current publications, projects, and weekly digests.
www.pactetstorm.securify.com
The Packet Storm page describes the twenty latest advisories, tools, and exploits. This site also pro-
vides links to the top security news stories.
www.xforce.iss.net
This site allows one to search for a virus by name, reported date, expected risk, or affected platforms.
Updated news reports can be found on this page.
www.ntbugtraq.com
This site provides a list and description of various Windows NT Security Exploits/Bugs encountered
by Windows NT users. One can download updated service applications.
nsi.org/compsec.html
The Security Resource Net page lists warnings, threats, legislation and documents on viruses
and security in an organized outline.
www.securitystats.com
This computer security site provides statistics on viruses, web defacements and security spending.
www.epm.ornl.gov/~dunigan/security.html
This site has links to loads of security-related sites. The links are organized by subject and include
resources on digital signatures, PKI, smart cards, viruses, commercial providers, intrusion detection
and several other topics.
www.alw.nih.gov/Security
The Computer Security Information page is an excellent resource, providing links to news, news-
groups, organizations, software, FAQs and an extensive number of Web links.
www.fedcirc.gov
The Federal Computer Incident Response Capability deals with the security of government and civil-
ian agencies. This site has information about incident statistics, advisories, tools, patches and more.
axion.physics.ubc.ca/pgp.html
This site has a list of freely available cryptosystems, along with a discussion of each system and links
to FAQs and tutorials.
www.ifccfbi.gov
The Internet Fraud Complaint Center, founded by the Justice Department and the FBI, fields reports
of Internet fraud.
www.disa.mil/infosec/iaweb/default.html
The Defense Information Systems Agency’s Information Assurance page includes links to sites on
vulnerability warnings, virus information and incident-reporting instructions, as well as other helpful
links.
www.nswc.navy.mil/ISSEC/
The objective of this site is to provide information on protecting your computer systems from security
hazards. Contains a page on hoax versus real viruses.
www.cit.nih.gov/security.html
You can report security issues at this site. The site also lists official federal security policies, regula-
tions, and guidelines.
cs-www.ncsl.nist.gov/
The Computer Security Resource Center provides services for vendors and end users. The site in-
cludes information on security testing, management, technology, education and applications.
www.checkpoint.com
Check Point™ Software Technologies Ltd. is a leading provider of Internet security products and ser-
vices.
www.opsec.com
The Open Platform for Security (OPSEC) has over 200 partners that develop security products and
solutions using the OPSEC to allow for interoperability and increased security over a network.
www.baltimore.com
Baltimore Security is an e-commerce security solutions provider. Their UniCERT digital certificate
product is used in PKI applications.
www.ncipher.com
nCipher is a vendor of hardware and software products, including an SSL accelerator that increases
the speed of secure Web server transactions and a secure key management system.
www.entrust.com
Entrust Technologies provides e-security products and services.
www.antivirus.com
ScanMail® is an e-mail virus detection program for Microsoft Exchange.
www.zixmail.com
Zixmail™ is a secure e-mail product that allows you to encrypt and digitally sign your messages using
different e-mail programs.
web.mit.edu/network/pgp.html
Visit this site to download Pretty Good Privacy® freeware. PGP allows you to send messages and
files securely.
www.certicom.com
Certicom provides security solutions for the wireless Internet.
www.raytheon.com
Raytheon Corporation’s SilentRunner monitors activity on a network to find internal threats, such as
data theft or fraud.
SSL
developer.netscape.com/tech/security/ssl/protocol.html
This Netscape page has a brief description of SSL, plus links to an SSL tutorial and FAQs.
www.netscape.com/security/index.html
The Netscape Security Center is an extensive resource for Internet and Web security. You will find
news, tutorials, products and services on this site.
psych.psy.uq.oz.au/~ftp/Crypto
This FAQs page has an extensive list of questions and answers about SSL technology.
www.visa.com/nt/ecomm/security/main.html
Visa International’s security page includes information on SSL and SET. The page includes a dem-
onstration of an online shopping transaction, which explains how SET works.
www.openssl.org
The OpenSSL Project provides a free, open-source toolkit for SSL.
Public-key Cryptography
www.entrust.com
Entrust produces effective security software products using Public Key Infrastructure (PKI).
www.cse.dnd.ca
The Communication Security Establishment has a short tutorial on Public Key Infrastructure (PKI)
that defines PKI, public-key cryptography and digital signatures.
www.magnet.state.ma.us/itd/legal/pki.htm
The Commonwealth of Massachusetts Information Technology page has loads of links to sites related
to PKI that contain information about standards, vendors, trade groups and government organizations.
www.ftech.net/~monark/crypto/index.htm
The Beginner’s Guide to Cryptography is an online tutorial and includes links to other sites on privacy
and cryptography.
www.faqs.org/faqs/cryptography-faq
The Cryptography FAQ has an extensive list of questions and answers.
www.pkiforum.org
The PKI Forum promotes the use of PKI.
www.counterpane.com/pki-risks.html
Visit the Counterpane Internet Security, Inc.’s site to read the article “Ten Risks of PKI: What You're
Not Being Told About Public Key Infrastructure.”
Digital Signatures
www.ietf.org/html.charters/xmldsig-charter.html
The XML Digital Signatures site was created by a group working to develop digital signatures using
XML. You can view the group’s goals and drafts of their work.
www.elock.com
E-Lock Technologies is a vendor of digital-signature products used in Public Key Infrastructure. This
site has an FAQs list covering cryptography, keys, certificates and signatures.
www.digsigtrust.com
The Digital Signature Trust Co. is a vendor of Digital Signature and Public Key Infrastructure prod-
ucts. It has a tutorial titled “Digital Signatures and Public Key Infrastructure (PKI) 101.”
Digital Certificates
www.verisign.com
VeriSign creates digital IDs for individuals, small businesses and large corporations. Check out its
Web site for product information, news and downloads.
www.thawte.com
Thawte Digital Certificate Services offers SSL, developer and personal certificates.
www.silanis.com/index.htm
Silanis Technology is a vendor of digital-certificate software.
www.belsign.be
Belsign issues digital certificates in Europe. It is the European authority for digital certificates.
www.certco.com
Certco issues digital certificates to financial institutions.
www.openca.org
Set up your own CA using open-source software from The OpenCA Project.
Digital Wallets
www.globeset.com
GlobeSet is a vendor of digital-wallet software. Its site has an animated tutorial demonstrating the use
of an electronic wallet in an SET transaction.
www.trintech.com
Trintech digital wallets handle SSL and SET transactions.
wallet.yahoo.com
The Yahoo! Wallet is a digital wallet that can be used at thousands of Yahoo! Stores worldwide.
Firewalls
www.interhack.net/pubs/fwfaq
This site provides an extensive list of FAQs on firewalls.
www.spirit.com/cgi-bin/report.pl
Visit this site to compare firewall software from a variety of vendors.
www.zeuros.co.uk/generic/resource/firewall
Zeuros is a complete resource for information about firewalls. You will find FAQs, books, articles,
training and magazines on this site.
www.thegild.com/firewall
The Firewall Product Overview site has an extensive list of firewall products, with links to each ven-
dor’s site.
csrc.ncsl.nist.gov/nistpubs/800-10
Check out this firewall tutorial from the U.S. Department of Commerce.
www.watchguard.com
WatchGuard® Technologies, Inc., provides firewalls and other security solutions for medium to large
organizations.
Kerberos
www.nrl.navy.mil/CCS/people/kenh/kerberos-faq.html
This site is an extensive list of FAQs on Kerberos from the Naval Research Laboratory.
web.mit.edu/kerberos/www
Kerberos: The Network Authentication Protocol is a list of FAQs provided by MIT.
www.contrib.andrew.cmu.edu/~shadow/kerberos.html
The Kerberos Reference Page has links to several informational sites, technical sites and other helpful
resources.
www.pdc.kth.se/kth-krb
Visit this site to download various Kerberos white papers and documentation.
Biometrics
www.iosoftware.com/products/integration/fiu500/index.htm
This site describes a security device that scans a user’s fingerprint to verify identity.
www.identix.com/flash_index.html
Identix specializes in fingerprinting systems for law enforcement, access control and network securi-
ty. Using its fingerprint scanners, you can log on to your system, encrypt and decrypt files and lock
applications.
www.iriscan.com
Iriscan’s PR Iris™ can be used for e-commerce, network and information security. The scanner takes
an image of the user’s eye for authentication.
www.keytronic.com
Key Tronic manufactures keyboards with fingerprint recognition systems.
Newsgroups
news:comp.security.firewalls
news:comp.security.unix
news:comp.security.misc
news:comp.protocols.kerberos
TERMINOLOGY
128-bit IV
3DES
ActiveShield
Advanced Encryption Standard (AES)
application-level gateway
assemblies
asymmetric algorithm
authentication
authentication header (AH)
availability
backdoor program
binary string
biometrics
bit
block
block cipher
brute-force cracking
buffer overflow
BugTraq
Caesar cipher
CERT (Computer Emergency Response Team)
CERT Security Improvement Modules
certificate authority (CA)
certificate authority hierarchy
certificate repository
certificate revocation list (CRL)
cipher
ciphertext
collision
contact interface
contactless interface
content protection
CPU
cracker
cryptanalysis
cryptanalytic attack
cryptography
cryptosystem
Data Encryption Standard (DES)
data packet
decryption
denial-of-service (DoS) attack
denial-of-service attack
DES cracker machine
Diffie-Hellman Key Agreement Protocol
digital certificate
digital envelope
digital ID
digital signature
Digital Signature Algorithm (DSA)
digital watermarking
distributed denial-of-service attack
Dynamic Proxy Navigation (DPN)
electronic shopping cart
Elliptic Curve Cryptography (ECC)
Encapsulating Security Payload (ESP)
encryption
Enhanced Security Network (ESN)
firewall
gateway
GSM (Global System for Mobile Communications)
hacker
hash function
hash value
identity permissions
ILOVEYOU Virus
initialization vector (IV)
integrity
integrity check (IC)
Internet Engineering Task Force (IETF)
Internet Key Exchange (IKE)
Internet Policy Registration Authority (IPRA)
Internet Protocol (IP)
Internet Security, Applications, Authentication and Cryptography (ISAAC)
IP address
IP spoofing
IPSec (Internet Protocol Security)
IV collision
Kerberos
key
key agreement protocol
key distribution center
key generation
key length
key management
layered biometric verification (LBV)
Liberty Trojan horse
Lightweight Extensible Authentication Protocol (LEAP)
local area network (LAN)
logic bomb
Lucifer
man-in-the-middle attack
masquerading
MD5 hashing algorithm
Melissa Virus
memory card
message digest
message integrity
microprocessor card
Microsoft Authenticode
Microsoft Intermediate Language (MSIL)
Microsoft Passport
mobile code
Mobile Wireless Internet Forum
Mobiletrust certificate authority
National Institute of Standards and Technology (NIST)
network security
nonrepudiation
Online Certificate Status Protocol (OCSP)
packet
packet-filtering firewall
PCI (peripheral component interconnect) card
permissions
personal identification number (PIN)
plaintext
point-to-point connection
policy creation authority
Pretty Good Privacy (PGP)
privacy
private key
protocol
public key
Public Key Infrastructure (PKI)
public-key algorithms
public-key cryptography
resident virus
restricted algorithms
Rijndael
role based access control (RBAC)
root certificate authority
root key
routing table
RSA encryption memory
RSA Security, Inc.
secret key
Secure Enterprise Proxy
Secure Sockets Layer (SSL)
security policy file
service ticket
session key
single sign-on
smart card
socket
software exploit
steganography
substitution cipher
symmetric encryption algorithm
TCP/IP (Transmission Control Protocol/Internet Protocol)
Ticket Granting Service (TGS)
Ticket Granting Ticket (TGT)
time bomb
timestamping
timestamping agency
transaction management
transient virus
transposition cipher
Triple DES
Trojan horse virus
Trustpoint
VeriSign
Virtual Private Network (VPN)
virus
Web defacing
Wide area network (WAN)
worm
SELF-REVIEW EXERCISES
21.1 State whether the following are true or false. If the answer is false, explain why.
a) In a public-key algorithm, one key is used for both encryption and decryption.
b) Digital certificates are intended to be used indefinitely.
c) Secure Sockets Layer protects data stored on a merchant’s server.
d) Digital signatures can be used to provide undeniable proof of the author of a document.
e) In a network of 10 users communicating using public-key cryptography, only 10 keys are
needed in total.
f) The security of modern cryptosystems lies in the secrecy of the algorithm.
g) Increasing the security of a network often decreases its functionality and efficiency.
h) Firewalls are the single most effective way to add security to a small computer network.
i) Kerberos is an authentication protocol that is used over TCP/IP networks.
j) SSL can be used to connect a network of computers over the Internet.
k) Hacker attacks, such as denial-of-service attacks and viruses, can cause e-businesses to lose
billions of dollars.
21.2 Fill in the blanks in each of the following statements:
a) Cryptographic algorithms in which the message’s sender and receiver both hold an
identical key are called ________.
b) A ________ is used to authenticate the sender of a document.
c) In a ________, a document is encrypted using a secret key and sent with that secret key,
encrypted using a public-key algorithm.
d) A certificate that needs to be revoked before its expiration date is placed on a
________.
e) The recent wave of network attacks that have hit companies such as eBay and Yahoo are
known as ________.
f) A digital fingerprint of a document can be created using a ________.
g) The four main issues addressed by cryptography are ________, ________,
________ and ________.
h) A customer can store purchase information and data on multiple credit cards in an
electronic purchasing and storage device called a ________.
i) Trying to decrypt ciphertext without knowing the decryption key is known as
________.
j) A barrier between a small network and the outside world is called a ________.
k) A hacker that tries every possible solution to crack a code is using a method known as
________.
EXERCISES
21.3 What can online businesses do to prevent hacker attacks, such as denial-of-service attacks
and virus attacks?
21.4 Define the following security terms:
a) digital signature
b) hash function
c) symmetric key encryption
d) digital certificate
e) denial-of-service attack
f) worm
g) message digest
h) collision
i) triple DES
j) session keys
21.5 Define each of the following security terms, and give an example of how it is used:
a) secret-key cryptography
b) public-key cryptography
c) digital signature
d) digital certificate
e) hash function
f) SSL
g) Kerberos
h) firewall
21.6 Write the full name and describe each of the following acronyms:
a) PKI
b) IPSec
c) CRL
d) AES
e) SSL
21.7 List the four problems addressed by cryptography, and give a real-world example of each.
21.8 Compare symmetric-key algorithms with public-key algorithms. What are the benefits and
drawbacks of each type of algorithm? How are these differences manifested in the real-world uses of
the two types of algorithms?
WORKS CITED
1. A. Harrison, “Xerox Unit Farms Out Security in $20M Deal,” Computerworld 5 June 2000: 24.
2. “What the Experts are Saying About Security: Facts and Quotes,” from an OKENA company
Press kit.
3. “RSA Laboratories’ Frequently Asked Questions About Today’s Cryptography, Version 4.1,”
2000 <www.rsasecurity.com/rsalabs/faq>.
4. <www-math.cudenver.edu/~wcherowi/courses/m5410/m5410des.html>
5. M. Dworkin, “Advanced Encryption Standard (AES) Fact Sheet,” 5 March 2001.
6. <www.esat.kuleuven.ac.be/~rijmen/rijndael>
7. <www.rsasecurity.com/rsalabs/rsa_algorithm>
8. <www.pgpi.org/doc/overview>
9. <www.rsasecurity.com/rsalabs/faq>.
10. <userpages.umbc.edu/~mabzug1/cs/md5/md5.html>.
11. T. Russell, “The Cryptographic Landscape for PKI Smart Cards,” Internet Security Advisor
March/April 2001: 22.
12. G. Hulme, “VeriSign Gave Microsoft Certificates to Imposter,” Information Week 3 March
2001.
13. R. Yasin, “PKI Rollout to Get Cheaper, Quicker,” InternetWeek 24 July 2000: 28.
14. C. Ellison and B. Schneier, “Ten Risks of PKI: What You’re not Being Told about Public Key
Infrastructure,” Computer Security Journal 2000.
15. “What’s So Smart About Smart Cards?” Smart Card Forum.
16. T. Russell, “The Cryptographic Landscape for PKI Smart Cards,” Internet Security Advisor,
March/April 2001: 22.
17. S. Abbot, “The Debate for Secure E-Commerce,” Performance Computing February 1999: 37-42.
18. T. Wilson, “E-Biz Bucks Lost Under the SSL Train,” Internet Week 24 May 1999: 1, 3.
19. H. Gilbert, “Introduction to TCP/IP,” 2 February 1995
<www.yale.edu/pclt/COMM/TCPIP.HTM>.
20. RSA Laboratories, “Security Protocols Overview,” 1999
<www.rsasecurity.com/standards/protocols>.
21. M. Bull, “Ensuring End-to-End Security with SSL,” Network World 15 May 2000: 63.
22. <www.cisco.com/warp/public/44/solutions/network/vpn.shtml>.
23. S. Burnett and S. Paine, RSA Security’s Official Guide to Cryptography (Berkeley: Osborne/
McGraw-Hill, 2001) 210.
24. D. Naik, Internet Standards and Protocols Microsoft Press 1998: 79-80.
25. M. Grayson, “End the PDA Security Dilemma,” Communication News February 2001: 38-40.
26. T. Wilson, “VPNs Don’t Fly Outside Firewalls,” Internet Week, 28 May 2001.
27. S. Gaudin, “The Enemy Within,” Network World 8 May 2000: 122-126.
28. D. Deckmyn, “Companies Push New Approaches to Authentication,” Computerworld 15 May
2000: 6.
29. “Centralized Authentication,” <www.keyware.com>.
30. J. Vijayan, “Biometrics Meet Wireless Internet,” Computerworld 17 July 2000: 14.
31. C. Nobel, “Biometrics Targeted For Wireless Devices,” eweek 31 July 2000: 22.
32. F. Trickey, “Secure Single Sign-On: Fantasy or Reality,” CSI <www.gocsi.com>
33. D. Moore, G. Voelker and S. Savage, “Inferring Internet Denial-of-Service Activity.”
34. J. Schwartz, “Computer Vandals Clog Antivandalism Web Site,” The New York Times 24 May
2001.
35. “Securing B2B,” Global Technology Business July 2000: 50-51.
36. H. Bray, “Trojan Horse Attacks Computers, Disguised as a Video Chip,” The Boston Globe 10
June 2000: C1+.
37. T. Bridis, “U.S. Archive of Hacker Attacks To Close Because It Is Too Busy,” The Wall Street
Journal 24 May 2001: B10.
38. R. Marshland, “Hidden Cost of Technology,” Financial Times 2 June 2000: 5.
39. F. Avolio, “Best Practices in Network Security,” Network Computing 20 March 2000: 60-72.
40. “Industry Statistics,” from an AbsoluteSoftware company Press kit.
41. J. Singer, R. Fink, “A Security Analysis of C#”
42. <msdn.microsoft.com/library/default.asp?url=/library/en-us/dnc-
sspec/html/vclrfcsharpspec_a.asp>
43. J. Singer, R. Fink, “A Security Analysis of C#”
44. <msdn.microsoft.com/msdnmag/issues/01/02/CAS/CAS.asp>
45. <msdn.microsoft.com/library/default.asp?url=/library/en-us/
cpref/html/frlrfSystemSecurityPermissionsFileIOPermissionClass-
Topic.asp>
46. <www.msdn.microsoft.com/library/dotnet/cpguide/cpconpermis-
sions.html>
47. <msdn.microsoft.com/msdnmag/issues/01/02/CAS/CAS.asp>
48. <msdn.microsoft.com/library/default.asp?url=/library/en-us/
cpref/html/frlrfsystemsecuritycodeaccesspermissionmember-
stopic.asp>
49. R. Yasin, “Security First for Visa,” InternetWeek 13 November 2000.
50. L. Lorek, “E-Commerce Insecurity,” Interactive Week 23 April 2001.
51. R. Marshland, 5.
52. T. Spangler, “Home Is Where the Hack Is,” Inter@ctive Week 10 April 2000: 28-34.
53. “Air Gap Technology,” Whale Communications <www.whale-com.com>.
54. O. Azim and P. Kolwalkar, “Network Intrusion Monitoring,” Advisor.com/Security March/April
2001: 16-19.
55. “OCTAVE Information Security Risk Evaluation,” 30 January 2001 <www.cert.org/
octave/methodintro.html>.
56. S. Katzenbeisser and F. Petitcolas, Information Hiding: Techniques for Steganography and
Digital Watermarking (Norwood: Artech House, Inc., 2000) 1-2.
57. D. McCullagh, “MS May Have File-Trading Answer,” 1 May 2001 <www.wired.com/
news/print/0,1294,43389,00.html>.
22
Data Structures
Objectives
• To be able to form linked data structures using self-
referential classes and recursion.
• To be able to create and manipulate dynamic data
structures such as linked lists, queues, stacks and
binary trees.
• To understand various important applications of
linked data structures.
• To understand how to create reusable data structures
with inheritance and composition.
Much that I bound, I could not free;
Much that I freed returned to me.
Lee Wilson Dodd
‘Will you walk a little faster?’ said a whiting to a snail,
‘There’s a porpoise close behind us, and he’s treading on my
tail.’
Lewis Carroll
There is always room at the top.
Daniel Webster
Push on — keep moving.
Thomas Morton
I think that I shall never see
A poem lovely as a tree.
Joyce Kilmer
© Copyright 1992–2002 by Deitel & Associates, Inc. All Rights Reserved. 8/11/01
804 Data Structures Chapter 22
Outline
22.1 Introduction
22.2 Self-Referential Classes
22.3 Linked Lists
22.4 Stacks
22.5 Queues
22.6 Trees
Summary • Terminology • Common Programming Errors • Good Programming Practices • Per-
formance Tips • Portability Tip • Self-Review Exercises • Answers to Self-Review Exercises • Ex-
ercises • Special Section: Building Your Own Compiler
22.1 Introduction
We have studied Python’s high-level data types such as lists, tuples and dictionaries. This
chapter introduces the general topic of data structures, which underlie Python’s basic data
types. Linked lists are collections of data items “lined up in a row”; insertions and removals
can be made anywhere in a linked list. Stacks are important in compilers and operating
systems; insertions and removals are made only at one end of a stack, its top. Queues
represent waiting lines; insertions are made at the back (also referred to as the tail) of a
queue, and removals are made from the front (also referred to as the head) of a queue. Binary
trees facilitate high-speed searching and sorting of data, efficient elimination of duplicate
data items, representation of file system directories and compilation of expressions into
machine language. These data structures have many other interesting applications.
We will discuss the major types of data structures and implement programs that create
and manipulate these data structures. We use classes and inheritance to create and package
these data structures for reusability and maintainability.
Although basic Python lists can serve as stacks and queues, studying this chapter and
creating these structures “from scratch” is solid preparation for higher-level computer
science courses. The chapter examples are practical programs that you will be able to use in
more advanced courses and in industry applications. The exercises include a rich collection
of useful applications.
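The remark that basic Python lists can serve as stacks and queues can be made concrete with a short sketch. The variable names are illustrative only; append and pop are standard list methods.

```python
# A built-in list used as a stack: add and remove at the same end.
stack = []
stack.append( 1 )
stack.append( 2 )
stack.append( 3 )
top = stack.pop()          # removes the most recently added item

# A built-in list used as a queue: add at the back, remove from the front.
queue = []
queue.append( "a" )
queue.append( "b" )
queue.append( "c" )
front = queue.pop( 0 )     # removes the item that has waited longest
```

Note that pop( 0 ) must shift every remaining element, so removing from the front of a long list costs more than removing from the back.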
data, a setNextNode method to set the value of member nextNode and a getNextNode
method to return the value of member nextNode.
Self-referential class objects can be linked together to form useful data structures such
as lists, queues, stacks and trees. Figure 22.1 illustrates two self-referential class instances
linked together to form a list. Note that a slash—representing a reference to None—is
placed in the link member of the second self-referential class instance to indicate that the
link does not refer to another instance. The slash is only for illustration purposes; it does
not correspond to the backslash character in Python. A None reference normally indicates
the end of a data structure.
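The following minimal sketch shows how two self-referential instances might be linked as in Fig. 22.1. The constructor and the getData method are assumptions modeled on the method names mentioned above; the complete class belongs to Fig. 22.3.

```python
class Node:
    "Single node in a linked data structure (illustrative sketch)"

    def __init__( self, data ):
        self.data = data
        self.nextNode = None       # a new node refers to no other node

    def getData( self ):
        return self.data

    def setNextNode( self, newNode ):
        self.nextNode = newNode

    def getNextNode( self ):
        return self.nextNode

# link a node containing 15 to a node containing 10
first = Node( 15 )
second = Node( 10 )
first.setNextNode( second )

# follow the links until the None reference marks the end
values = []
current = first

while current is not None:
    values.append( current.getData() )
    current = current.getNextNode()
```

The loop stops only because the second node's link is None, which is why that reference matters.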
Common Programming Error 22.1
Not setting the link in the last node of a list to None.
Fig. 22.1 Two self-referential class instances linked together (nodes containing 15 and 10).
Linked list nodes are normally not stored contiguously in memory. Logically, how-
ever, the nodes of a linked list appear to be contiguous. Figure 22.2 illustrates a linked list
with several nodes.
Fig. 22.2 A linked list with several nodes: firstNode refers to the node containing H, which links to D and so on through Q, the node to which lastNode refers.
        self.nextNode = newNode

class List:
    "Linked list"

    def __init__( self ):
        "List constructor"

        self.firstNode = None
        self.lastNode = None

    def __str__( self ):
        "Override print statement"

        if self.isEmpty():
            return "The list is empty"

        currentNode = self.firstNode
        string = "The list is: "

        while currentNode is not None:
            string += str( currentNode.getData() ) + " "
            currentNode = currentNode.getNextNode()

        return string

    def insertAtFront( self, value ):
        "Insert node at front of list"

        newNode = Node( value )

        if self.isEmpty():  # List is empty
            self.firstNode = self.lastNode = newNode
        else:   # List is not empty
            newNode.setNextNode( self.firstNode )
            self.firstNode = newNode

    def insertAtBack( self, value ):
        "Insert node at back of list"

        newNode = Node( value )

        if self.isEmpty():  # List is empty
            self.firstNode = self.lastNode = newNode
        else:   # List is not empty
            self.lastNode.setNextNode( newNode )
            self.lastNode = newNode

    def removeFromFront( self ):
        "Delete node from front of list"

        if self.isEmpty():  # raise error on empty list
            raise IndexError, "remove from empty list"

        firstNodeValue = self.firstNode.getData()

        if self.firstNode is self.lastNode:  # one node in list
            self.firstNode = self.lastNode = None
        else:
            self.firstNode = self.firstNode.getNextNode()

        return firstNodeValue

    def removeFromBack( self ):
        "Delete node from back of list"

        if self.isEmpty():  # raise error on empty list
            raise IndexError, "remove from empty list"

        lastNodeValue = self.lastNode.getData()

        if self.firstNode is self.lastNode:  # one node in list
            self.firstNode = self.lastNode = None
        else:
            currentNode = self.firstNode

            # locate second-to-last node
            while currentNode.getNextNode() is not self.lastNode:
                currentNode = currentNode.getNextNode()

            currentNode.setNextNode( None )
            self.lastNode = currentNode

        return lastNodeValue

    def isEmpty( self ):
        "Is the list empty?"

        return self.firstNode is None
Fig. 22.3 Manipulating a linked list: List.py.
listObject = List()

instructions()
choice = raw_input("? ")

while choice != "5":

    if choice == "1":
        listObject.insertAtFront( raw_input( "Enter value: " ) )
        print listObject
    elif choice == "2":
        listObject.insertAtBack( raw_input( "Enter value: " ) )
        print listObject
    elif choice == "3":

        try:
            value = listObject.removeFromFront()
        except IndexError, message:
            print "Failed to remove:", message
        else:
            print value, "removed from list"
            print listObject

    elif choice == "4":

        try:
            value = listObject.removeFromBack()
        except IndexError, message:
            print "Failed to remove:", message
        else:
            print value, "removed from list"
            print listObject

    else:
        print "Invalid choice:", choice

    choice = raw_input("\n? ")

print "End list test\n"
? 1
Enter value: 1
The list is: 1
? 1
Enter value: 2
The list is: 2 1
? 2
Enter value: 3
The list is: 2 1 3
? 2
Enter value: 4
The list is: 2 1 3 4
? 3
2 removed from list
The list is: 1 3 4
? 3
1 removed from list
The list is: 3 4
? 4
4 removed from list
The list is: 3
? 4
3 removed from list
The list is empty
? 5
End list test
Figure 22.3 consists of two classes—Node and List. Encapsulated in each List
object is a linked list of Node instances. Node member nextNode stores a reference to
the next Node instance in the linked list.
The List class consists of members firstNode (a reference to the first Node in a
List instance) and lastNode (a reference to the last Node in a List instance). The
constructor initializes both links to None. The primary methods of the List class are
insertAtFront, insertAtBack, removeFromFront, and removeFromBack.
Method isEmpty is called a predicate method—it does not alter the List; rather, it
determines if the List is empty (i.e., the reference to the first Node of the List is None).
If the List is empty, 1 is returned; otherwise, 0 is returned. Method __str__ displays
the List’s contents.
Good Programming Practice 22.1
Assign None to the link member of a new node.
Over the next several pages, we discuss each of the methods of the List class in
detail. Method insertAtFront places a new node at the front of the list. The method
consists of several steps:
1. Create a new Node instance and store the reference in variable newNode.
2. If the list is empty, then both firstNode and lastNode are set to newNode.
3. If the list is not empty, then the node referenced by newNode is threaded into the
list by copying firstNode to newNode.nextNode so that the new node refers
to what used to be the first node of the list, and copying newNode to firstNode
so that firstNode now refers to the new first node of the list.
Figure 22.4 illustrates method insertAtFront. Part a) of the figure shows the list
and the new node before the insertAtFront operation. The dotted arrows in part b)
illustrate steps 2 and 3 of the insertAtFront operation that enable the node containing
12 to become the new list front.
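As a rough illustration of steps 1 through 3, the sketch below applies them to the list of Fig. 22.4 (7, 11) and a new value 12. It uses a stripped-down node class (SketchNode) and a free function; both are stand-ins for illustration, not the classes of Fig. 22.3.

```python
class SketchNode:
    "Hypothetical stand-in for a list node"

    def __init__( self, data ):
        self.data = data
        self.nextNode = None

def insertAtFront( firstNode, lastNode, value ):
    "Return the updated ( firstNode, lastNode ) pair"

    newNode = SketchNode( value )         # step 1: create the node

    if firstNode is None:                 # step 2: list is empty
        return newNode, newNode

    newNode.nextNode = firstNode          # step 3: thread the node in
    return newNode, lastNode

# build the list 7 -> 11, then insert 12 at the front
firstNode, lastNode = insertAtFront( None, None, 11 )
firstNode, lastNode = insertAtFront( firstNode, lastNode, 7 )
firstNode, lastNode = insertAtFront( firstNode, lastNode, 12 )

# walk the list to confirm the new order
contents = []
node = firstNode

while node is not None:
    contents.append( node.data )
    node = node.nextNode
```

Only the first insertion changes lastNode; inserting at the front of a nonempty list leaves the back of the list untouched.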
[Fig. 22.4: the insertAtFront operation. a) firstNode refers to the list 7, 11; newNode refers to a new node containing 12. b) The same nodes after the operation.]
[Figure: a list containing 12, 7, 11 and 5.]
[Figure: a) firstNode and lastNode refer to the list 12, 7, 11, 5. b) The same list with tempNode.]
[Figure: a) firstNode and lastNode refer to the list 12, 7, 11, 5. b) The same list with tempNode.]
Method __str__ first determines if the list is empty. If so, the method returns "The
list is empty". Otherwise, it returns a string that contains each node’s data. The
method initializes currentNode as a copy of firstNode and then initializes the string
"The list is: ". While currentNode is not None, currentNode.data is added
to the string and the value of currentNode.nextNode is assigned to currentNode.
Note that if the link in the last node of the list is not None, the string creation algorithm
will erroneously continue past the end of the list. The string creation algorithm is identical
for linked lists, stacks and queues.
The kind of linked list we have been discussing is a singly linked list—the list begins
with a reference to the first node, and each node contains a reference to the next node “in
sequence.” This list terminates with a node whose reference member is None. A singly
linked list may be traversed in only one direction.
A circular, singly linked list begins with a reference to the first node, and each node
contains a reference to the next node. The “last node” does not contain a reference to None;
rather, the reference in the last node refers back to the first node, thus closing the “circle.”
A doubly linked list allows traversals both forwards and backwards. Such a list is often
implemented with two “start references”—one that refers to the first element of the list to
allow front-to-back traversal of the list, and one that refers to the last element of the list to
allow back-to-front traversal of the list. Each node has both a forward reference to the next
node in the list in the forward direction and a backward reference to the next node in the
list in the backward direction. If the list contains an alphabetized telephone directory, for
example, searching for someone whose name begins with a letter near the front of the
alphabet might begin from the front of the list. Searching for someone whose name begins
with a letter near the end of the alphabet might begin from the back of the list.
In a circular, doubly linked list, the forward reference of the last node refers to the first
node, and the backward reference of the first node refers to the last node, thus closing the
“circle.”
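A doubly linked list might be sketched as follows. The class and attribute names (DoubleNode, previousNode, DoublyLinkedList) are hypothetical; the chapter's List class is singly linked, so this is an illustration of the idea rather than code from the book.

```python
class DoubleNode:
    "Hypothetical node with forward and backward references"

    def __init__( self, data ):
        self.data = data
        self.nextNode = None
        self.previousNode = None

class DoublyLinkedList:
    "Minimal doubly linked list supporting insertion at the back"

    def __init__( self ):
        self.firstNode = None      # start reference, front-to-back
        self.lastNode = None       # start reference, back-to-front

    def insertAtBack( self, value ):
        newNode = DoubleNode( value )

        if self.firstNode is None:
            self.firstNode = self.lastNode = newNode
        else:
            newNode.previousNode = self.lastNode
            self.lastNode.nextNode = newNode
            self.lastNode = newNode

    def forward( self ):
        "Collect values from front to back"

        values = []
        node = self.firstNode

        while node is not None:
            values.append( node.data )
            node = node.nextNode

        return values

    def backward( self ):
        "Collect values from back to front"

        values = []
        node = self.lastNode

        while node is not None:
            values.append( node.data )
            node = node.previousNode

        return values

directory = DoublyLinkedList()

for name in [ "Adams", "Baker", "Young" ]:
    directory.insertAtBack( name )
```

Making this list circular would mean setting the last node's nextNode to the first node and the first node's previousNode to the last node; the traversal loops would then need a different stopping test than None.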
22.4 Stacks
A stack is a constrained version of a linked list—new nodes can be added to a stack and
removed from a stack only at the top. For this reason, a stack is referred to as a last-in, first-
out (LIFO) data structure. The link member in the last node of the stack is set to None to
indicate the bottom of the stack.
Common Programming Error 22.2
Not setting the link in the bottom node of a stack to None.
The primary methods used to manipulate a stack are push and pop. Method push
adds a new node to the top of the stack. Method pop removes a node from the top of the
stack and returns the popped value to the caller. The method raises an IndexError if the
stack is empty.
Stacks have many interesting applications. For example, when a function call is made,
the called function must know how to return to its caller, so the return address is pushed
onto a stack. If a series of function calls occurs, the successive return addresses are pushed
onto the stack in last-in, first-out order so that each function can return to its caller. Stacks
support recursive function calls in the same manner as conventional nonrecursive calls.
Stacks contain the space created for local variables on each invocation of a function.
When the function returns to its caller or raises an exception, the space for that function's
local variables is popped off the stack and those variables are no longer known to the
program. An object to which a local variable referred is destroyed (and its destructor, if
any, called) only when no references to it remain. Stacks are used by compilers
in the process of evaluating expressions and generating machine language code. The
exercises explore several applications of stacks.
We will take advantage of the close relationship between lists and stacks to implement
a stack class primarily by reusing a list class. We implement the stack class through
inheritance of the list class.
The program of Figure 22.8 creates a Stack class primarily through inheritance of
class List of Fig. 22.3. We want the Stack to have methods push and pop. Note that
these are essentially the insertAtFront and removeFromFront methods of class
List. When we implement the Stack’s methods, we have each of these call the
appropriate method of class List: push calls insertAtFront and pop calls
removeFromFront. Of course, class List contains other methods (i.e., insertAtBack and
removeFromBack) that we would not use when manipulating instances of class Stack.
The driver program uses class Stack to instantiate a stack instance. Integers 0 through 3
are pushed onto the stack and then popped off the stack.
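The listing for Fig. 22.8 does not appear in this section, so the following is only a sketch of the approach just described. SimpleList is a cut-down, hypothetical stand-in for class List of Fig. 22.3 that keeps just the operations a stack needs, and it stores each node as a two-element list for brevity.

```python
class SimpleList:
    "Cut-down stand-in for class List (front operations only)"

    def __init__( self ):
        self.firstNode = None      # each node is a [ data, next ] pair

    def isEmpty( self ):
        return self.firstNode is None

    def insertAtFront( self, value ):
        self.firstNode = [ value, self.firstNode ]

    def removeFromFront( self ):
        if self.isEmpty():
            raise IndexError( "remove from empty list" )

        node = self.firstNode
        self.firstNode = node[ 1 ]     # second node becomes the front
        return node[ 0 ]

class Stack( SimpleList ):
    "Stack built by inheriting from the list class"

    def push( self, element ):
        "Push element onto the top of the stack"

        self.insertAtFront( element )

    def pop( self ):
        "Pop and return the element at the top of the stack"

        return self.removeFromFront()

stack = Stack()

for i in range( 4 ):       # push 0 through 3
    stack.push( i )

popped = []

while not stack.isEmpty():
    popped.append( stack.pop() )
```

Because Stack inherits every list method, a caller could still invoke insertAtFront directly; wrapping a list instance inside the stack (composition) is one way to hide the unused methods.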
Processing a Stack
The list is: 0
The list is: 1 0
The list is: 2 1 0
The list is: 3 2 1 0
3 popped from stack
The list is: 2 1 0
2 popped from stack
The list is: 1 0
1 popped from stack
The list is: 0
0 popped from stack
The list is empty
22.5 Queues
A queue is similar to a supermarket checkout line: the first person in line is serviced first,
and other customers enter the line at the end and wait to be serviced. Queue nodes are
removed only from the head of the queue and are inserted only at the tail of the queue. For
this reason, a queue is referred to as a first-in, first-out (FIFO) data structure. The insert and
remove operations are known as enqueue and dequeue.
Queues have many applications in computer systems. Most computers have only a
single processor, so only one user at a time can be served. Entries for the other users are
placed in a queue. Each entry gradually advances to the front of the queue as users receive
service. The entry at the front of the queue is the next to receive service.
Queues are also used to support print spooling. A multiuser environment may have
only a single printer. Many users may be generating outputs to be printed. If the printer is
busy, other outputs may still be generated. These are “spooled” to disk (much as thread is
wound onto a spool) where they wait in a queue until the printer becomes available.
Information packets also wait in queues in computer networks. Each time a packet
arrives at a network node, it must be routed to the next node on the network along the path
to the packet’s final destination. The routing node routes one packet at a time, so additional
packets are enqueued until the router can route them.
A file server in a computer network handles file access requests from many clients
throughout the network. Servers have a limited capacity to service requests from clients.
When that capacity is exceeded, client requests wait in queues.
Figure 22.9 creates class Queue primarily through inheritance of class List of
Fig. 22.3. We want the Queue to have methods enqueue and dequeue. We note that
these are essentially the insertAtBack and removeFromFront methods of class
List. When we implement the Queue’s methods, we have each of these call the
appropriate method of class List: enqueue calls insertAtBack and dequeue calls
removeFromFront. Of course, class List contains other methods (i.e., insertAtFront
and removeFromBack) that we would not use when manipulating instances of
class Queue. The main portion of the program uses class Queue to instantiate a queue
instance. We enqueue integer values 0 through 3, then dequeue the values in first-in,
first-out order.
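For comparison with the inheritance-based design described above, the sketch below writes a queue directly in terms of head and tail references. It is a standalone illustration with hypothetical names (QueueNode, SketchQueue), not the listing of Fig. 22.9.

```python
class QueueNode:
    "Hypothetical node for the queue sketch"

    def __init__( self, data ):
        self.data = data
        self.nextNode = None

class SketchQueue:
    "First-in, first-out queue with head and tail references"

    def __init__( self ):
        self.head = None       # next node to be removed
        self.tail = None       # most recently added node

    def isEmpty( self ):
        return self.head is None

    def enqueue( self, value ):
        "Add value at the tail of the queue"

        newNode = QueueNode( value )

        if self.isEmpty():
            self.head = self.tail = newNode
        else:
            self.tail.nextNode = newNode
            self.tail = newNode

    def dequeue( self ):
        "Remove and return the value at the head of the queue"

        if self.isEmpty():
            raise IndexError( "dequeue from empty queue" )

        value = self.head.data

        if self.head is self.tail:     # one node in queue
            self.head = self.tail = None
        else:
            self.head = self.head.nextNode

        return value

queue = SketchQueue()

for i in range( 4 ):       # enqueue 0 through 3
    queue.enqueue( i )

served = []

while not queue.isEmpty():
    served.append( queue.dequeue() )
```

Both operations touch only one end of the structure, so neither needs to walk the list.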
        "Dequeue element"

        return self.removeFromFront()
Processing a Queue
The list is: 0
The list is: 0 1
The list is: 0 1 2
The list is: 0 1 2 3
0 dequeued
The list is: 1 2 3
1 dequeued
The list is: 2 3
2 dequeued
The list is: 3
3 dequeued
The list is empty
22.6 Trees
Linked lists, stacks and queues are linear data structures. A tree is a nonlinear,
two-dimensional data structure with special properties. Tree nodes contain two or more links. This
section discusses binary trees (Fig. 22.10), trees whose nodes all contain two links (one
or both of which may be None). The root node is the first node in a tree. Each link in the
root node refers to a child. The left child is the root node of the left subtree, and the right
child is the root node of the right subtree. The children of a single node are called siblings.
A node with no children is called a leaf node. Computer scientists normally draw trees from
the root node down, exactly the opposite of trees in nature.
[Fig. 22.10: a binary tree whose nodes include A and D.]
[Figure: a binary search tree. Root: 47. Second level: 25, 77. Third level: 11, 43, 65, 93. Fourth level: 7, 17, 31, 44, 68.]
The program of Figure 22.12 creates a binary search tree and traverses it (i.e., walks
through all its nodes) three ways: using recursive inorder, preorder and postorder
traversals.
        self.data = data
        self.right = None

    def getData( self ):
        "Get node data"

        return self.data

    def setData( self, newData ):
        "Set node data"

        self.data = newData

    def getLeftNode( self ):
        "Get left child"

        return self.left

    def setLeftNode( self, node ):
        "Set left child"

        self.left = node

    def getRightNode( self ):
        "Get right child"

        return self.right

    def setRightNode( self, node ):
        "Set right child"

        self.right = node
Preorder Traversal
50 25 12 6 13 33 75 67 68 88
Inorder Traversal
6 12 13 25 33 50 67 68 75 88
Postorder Traversal
6 13 12 33 25 68 67 88 75 50
The main program begins by instantiating a binary tree. The program prompts for 10
integers, each of which is inserted in the binary tree through a call to insertNode. The
program then performs preorder, inorder and postorder traversals (these are explained
shortly) of tree.
Now we discuss the class definitions. Class TreeNode has as data the node’s data
value, and references left (to the node’s left subtree) and right (to the node’s right
subtree). The constructor sets member data to the value supplied as a constructor argument,
and sets references left and right to None (thus initializing this node to be a leaf node).
Method getData returns the data value, and method setData sets the data value.
Class Tree has data rootNode, a reference to the root node of the tree. The class has
methods insertNode (which inserts a new node in the tree) and preorderTraversal,
inorderTraversal and postorderTraversal, each of which walks the tree in
the designated manner. Each of these methods calls its own separate recursive utility
method to perform the appropriate operations on the internal representation of the tree. The
Tree constructor initializes rootNode to None to indicate that the tree is initially empty.
The Tree class’s utility method insertNodeHelper recursively inserts a node into
the tree. A node can only be inserted as a leaf node in a binary search tree. If the tree is
empty, a new TreeNode is created, initialized and inserted in the tree.
If the tree is not empty, the program compares the value to be inserted with the data
value in the root node. If the insert value is smaller, the program recursively calls
insertNodeHelper to insert the value in the left subtree. If the insert value is larger, the program
recursively calls insertNodeHelper to insert the value in the right subtree. If the value
to be inserted is identical to the data value in the root node, the program prints the message
"duplicate" and returns without inserting the duplicate value into the tree.
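Each of the methods inOrderTraversal, preOrderTraversal and
postOrderTraversal traverses the tree (Fig. 22.13) and prints the node values.
[Fig. 22.13: a binary search tree. Root: 27. Second level: 13, 42. Third level: 6, 17, 33, 48.]
The inOrderTraversal of the tree in Fig. 22.13 is:
6 13 17 27 33 42 48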
Note that the inOrderTraversal of a binary search tree prints the node values in
ascending order. The process of creating a binary search tree actually sorts the data—and
thus this process is called the binary tree sort.
The steps for a preOrderTraversal are:
1. Process the value in the node.
2. Traverse the left subtree with a preOrderTraversal.
3. Traverse the right subtree with a preOrderTraversal.
The value in each node is processed as the node is visited. After the value in a given node
is processed, the values in the left subtree are processed, and then the values in the right
subtree are processed. The preOrderTraversal of the tree in Fig. 22.13 is:
27 13 6 17 42 33 48
The postOrderTraversal of the tree in Fig. 22.13 is:
6 17 13 33 48 42 27
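The three traversal orders can be compared side by side in a short sketch. The node class and the free functions below are hypothetical stand-ins (the chapter's Tree methods are not reproduced in this section), and the tree is built by hand to match Fig. 22.13.

```python
class SketchTreeNode:
    "Hypothetical tree node that accepts its children directly"

    def __init__( self, data, left = None, right = None ):
        self.left = left
        self.data = data
        self.right = right

def preOrder( node, visited ):
    "Process the node, then its left subtree, then its right subtree"

    if node is not None:
        visited.append( node.data )
        preOrder( node.left, visited )
        preOrder( node.right, visited )

    return visited

def inOrder( node, visited ):
    "Process the left subtree, then the node, then the right subtree"

    if node is not None:
        inOrder( node.left, visited )
        visited.append( node.data )
        inOrder( node.right, visited )

    return visited

def postOrder( node, visited ):
    "Process the left subtree, then the right subtree, then the node"

    if node is not None:
        postOrder( node.left, visited )
        postOrder( node.right, visited )
        visited.append( node.data )

    return visited

# the tree of Fig. 22.13
root = SketchTreeNode( 27,
    SketchTreeNode( 13, SketchTreeNode( 6 ), SketchTreeNode( 17 ) ),
    SketchTreeNode( 42, SketchTreeNode( 33 ), SketchTreeNode( 48 ) ) )

preOrderValues = preOrder( root, [] )
inOrderValues = inOrder( root, [] )
postOrderValues = postOrder( root, [] )
```

The three functions differ only in where the append call sits relative to the two recursive calls.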
The binary search tree facilitates duplicate elimination. As the tree is being created, an
attempt to insert a duplicate value will be recognized because a duplicate will follow the
same “go left” or “go right” decisions on each comparison as the original value did. Thus,
the duplicate will eventually be compared with a node containing the same value. The
duplicate value may be discarded at this point.
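The insertion and duplicate-elimination behavior described above might look roughly like the following. The Tree listing itself does not appear in this section, so the method bodies are assumptions that follow the prose; SketchTreeNode and SketchTree are hypothetical names, and insertNode returns a flag here instead of printing "duplicate".

```python
class SketchTreeNode:
    "Hypothetical tree node; a new node starts as a leaf"

    def __init__( self, data ):
        self.left = None
        self.data = data
        self.right = None

class SketchTree:
    "Binary search tree that discards duplicate values"

    def __init__( self ):
        self.rootNode = None       # tree is initially empty

    def insertNode( self, value ):
        "Insert value; return False if it was a duplicate"

        if self.rootNode is None:
            self.rootNode = SketchTreeNode( value )
            return True

        return self.insertNodeHelper( self.rootNode, value )

    def insertNodeHelper( self, node, value ):
        "Recursively locate the leaf position for value"

        if value < node.data:              # go left
            if node.left is None:
                node.left = SketchTreeNode( value )
                return True

            return self.insertNodeHelper( node.left, value )

        elif value > node.data:            # go right
            if node.right is None:
                node.right = SketchTreeNode( value )
                return True

            return self.insertNodeHelper( node.right, value )

        return False                       # same value: discard

tree = SketchTree()
results = []

for value in [ 27, 13, 42, 6, 17, 33, 48, 17 ]:
    results.append( tree.insertNode( value ) )
```

The second 17 follows the same left and right decisions as the first, reaches the node that already holds 17 and is rejected there.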
Searching a binary search tree for a value that matches a key value is fast. If the tree is
balanced, then each level contains about twice as many elements as the previous level. So a
balanced binary search tree with n elements has about $\log_2 n$ levels, and thus at most about
$\log_2 n$ comparisons have to be made either to find a match or to determine
that no match exists. This means, for example, that when searching a (balanced) 1000-element
binary search tree, no more than 10 comparisons need to be made because $2^{10} > 1000$.
When searching a (balanced) 1,000,000-element binary search tree, no more than 20
comparisons need to be made because $2^{20} > 1,000,000$.
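The 1000-element claim can be checked with a small experiment. The sketch below is an illustration under stated assumptions: it represents each node as a ( left, data, right ) tuple, builds a balanced tree from sorted values by always choosing the middle element as the root, and counts each node examined as one comparison. The function names are hypothetical.

```python
def buildBalanced( values, low, high ):
    "Build a balanced search tree from sorted values[ low:high ]"

    if low >= high:
        return None

    middle = ( low + high ) // 2

    return ( buildBalanced( values, low, middle ),
             values[ middle ],
             buildBalanced( values, middle + 1, high ) )

def countComparisons( node, key ):
    "Return the number of nodes examined while searching for key"

    comparisons = 0

    while node is not None:
        left, data, right = node
        comparisons += 1

        if key == data:
            break
        elif key < data:
            node = left
        else:
            node = right

    return comparisons

values = list( range( 1000 ) )
tree = buildBalanced( values, 0, len( values ) )

# most nodes examined when searching for any value in the tree
worstCase = max( [ countComparisons( tree, key ) for key in values ] )

# nodes examined when the key is not in the tree at all
missing = countComparisons( tree, 1000 )
```

With this construction worstCase comes to 10, and an unsuccessful search stays within the same bound.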
In the exercises, algorithms are presented for several other binary tree operations such
as deleting an item from a binary tree, printing a binary tree in a two-dimensional tree
format and performing a level-order traversal of a binary tree. The level-order traversal of
a binary tree visits the nodes of the tree row by row, starting at the root node level. On each
level of the tree, the nodes are visited from left to right. Other binary tree exercises include
allowing a binary search tree to contain duplicate values, inserting string values in a binary
tree and determining how many levels are contained in a binary tree.
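A level-order traversal is commonly written with a queue of nodes waiting to be visited. The sketch below is one possible approach rather than the exercise's intended solution: it uses collections.deque from the standard library as the queue and represents each node as a ( left, data, right ) tuple, both of which are choices made for this illustration.

```python
from collections import deque

def levelOrder( root ):
    "Return node values row by row, left to right within each row"

    visited = []
    pending = deque()          # nodes waiting to be visited

    if root is not None:
        pending.append( root )

    while pending:
        left, data, right = pending.popleft()
        visited.append( data )

        # children join the back of the queue, left child first
        if left is not None:
            pending.append( left )

        if right is not None:
            pending.append( right )

    return visited

# the tree of Fig. 22.13 as nested ( left, data, right ) tuples
tree = ( ( ( None, 6, None ), 13, ( None, 17, None ) ),
         27,
         ( ( None, 33, None ), 42, ( None, 48, None ) ) )

rowByRow = levelOrder( tree )
```

Because the queue is first-in, first-out, every node on one level is visited before any node on the next.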
SUMMARY
• Self-referential classes contain members called links that point to objects of the same class type.
• Self-referential classes enable many objects to be linked together in stacks, queues, lists and trees.
• A linked list is a linear collection of self-referential class objects.
• A linked list is a dynamic data structure—the length of the list increases or decreases as necessary.
• Linked lists can continue to grow until memory is exhausted.
• Linked lists provide a mechanism for insertion and deletion of data by reference manipulation.
• A singly linked list begins with a link to the first node, and each node contains a link to the next
node “in sequence.” This list terminates with a node whose reference member is None. A singly
linked list may be traversed in only one direction.
• A circular, singly linked list begins with a link to the first node, and each node contains a link to
the next node. The link in the last node references the first node, thus closing the “circle.”
• A doubly linked list allows traversals both forwards and backwards. Each node has both a forward
link to the next node in the list in the forward direction, and a backward link to the next node in
the list in the backward direction.
• In a circular, doubly linked list, the forward link of the last node points to the first node, and the
backward link of the first node points to the last node, thus closing the “circle.”
• Stacks and queues are constrained versions of linked lists.
• New stack nodes are added to a stack and are removed from a stack only at the top of the stack.
For this reason, a stack is referred to as a last-in, first-out (LIFO) data structure.
• The link member in the last node of the stack is set to None to indicate the bottom of the stack.
• The two primary operations used to manipulate a stack are push and pop. The push operation
creates a new node and places it on the top of the stack. The pop operation removes a node from
the top of the stack and returns the popped value.
• In a queue data structure, nodes are removed from the head and added to the tail. For this reason,
a queue is referred to as a first-in, first-out (FIFO) data structure. The add and remove operations
are known as enqueue and dequeue.
• Trees are two-dimensional data structures requiring two or more links per node.
• Binary trees contain two links per node.
• The root node is the first node in the tree.
• Each of the references in the root node refers to a child. The left child is the first node in the left
subtree, and the right child is the first node in the right subtree. The children of a node are called
siblings. Any tree node that does not have any children is called a leaf node.
• A binary search tree has the characteristic that the value in the left child of a node is less than the
value in its parent node, and the value in the right child of a node is greater than or equal to the
value in its parent node. If there are no duplicate data values, the value in the right child is greater
than the value in its parent node.
• An inorder traversal of a binary tree traverses the left subtree inorder, processes the value in the
root node and then traverses the right subtree inorder. The value in a node is not processed until
the values in its left subtree are processed.
• A preorder traversal processes the value in the root node, traverses the left subtree preorder and
then traverses the right subtree preorder. The value in each node is processed as the node is en-
countered.
• A postorder traversal traverses the left subtree postorder, traverses the right subtree postorder then
processes the value in the root node. The value in each node is not processed until the values in
both its subtrees are processed.
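The binary search tree and traversal points above can be sketched as follows. This is an illustrative Python 3 sketch, not the chapter's Tree.py; the names TreeNode, insert, inorder, preorder and postorder are assumptions, and duplicates are simply placed in the right subtree.

```python
class TreeNode:
    """Hypothetical node with a value and two links that start as None."""

    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(node, value):
    """Insert value below node and return the root of the resulting subtree."""
    if node is None:
        return TreeNode(value)
    if value < node.value:
        node.left = insert(node.left, value)
    else:                                   # equal values go to the right
        node.right = insert(node.right, value)
    return node


def inorder(node):
    """Left subtree, then the node's value, then right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)


def preorder(node):
    """The node's value, then left subtree, then right subtree."""
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)


def postorder(node):
    """Left subtree, then right subtree, then the node's value."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.value]


root = None
for value in [27, 13, 42, 6, 17, 33, 48]:
    root = insert(root, value)

print(inorder(root))     # [6, 13, 17, 27, 33, 42, 48]
print(preorder(root))    # [27, 13, 6, 17, 42, 33, 48]
print(postorder(root))   # [6, 17, 13, 33, 48, 42, 27]
```

Note that the inorder traversal of a binary search tree yields the values in ascending order.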
TERMINOLOGY
[***To be done for second round of review***]
SELF-REVIEW EXERCISES
EXERCISES
[***To be done for second round of review***]
Notes to Reviewers:
• Please mark your comments in place on a paper copy of the chapter.
• Please return only marked pages to Deitel & Associates, Inc.
• Please do not send e-mails with detailed, line-by-line comments; mark these directly on the paper
pages.
• Please feel free to send any lengthy additional comments by e-mail to
[email protected].
• Please run all the code examples.
• Please check that we are using the correct programming idioms.
• Please check that there are no inconsistencies, errors or omissions in the chapter discussions.
• The manuscript is being copy edited by a professional copy editor in parallel with your reviews.
That person will probably find most typos, spelling errors, grammatical errors, etc.
• Please do not rewrite the manuscript. We are mostly concerned with technical correctness and cor-
rect use of idiom. We will not make significant adjustments to our writing or coding style on a
global scale. Please send us a short e-mail if you would like to make a suggestion.
• If you find something incorrect, please show us how to correct it.
• In the later round(s) of review, please read all the back matter, including the exercises and any so-
lutions we provide.
• Please review the index we provide with each chapter to be sure we have covered the topics you
feel are important.
Additional Comments:
• The goal of this chapter is to teach the concept of data structures. However, it would be a good
idea to include more performance tips, throughout, to demonstrate why Python may or may not be
the best language in which to actually implement these data structures.
• Currently, we are reorganizing our object-oriented chapters to better capture the Python OOP idi-
om (specifically, attribute access). The implications for this chapter are:
1. "Private data"
1. Access methods go away.
2. We may be able to prevent access to un-needed base class methods from clients
of a derived class?
© Copyright 1992–2002 by Deitel & Associates, Inc. All Rights Reserved. 8/31/01
23
Case Study: Online
Bookstore
Objectives
• To build a three-tier, client/server, distributed Web
application using Python and CGI.
• To understand the concept of an HTTP session.
• To be able to use a Session class to keep track of an
HTTP session between pages.
• To be able to create XML from a script and XSL
transformations to convert the XML into a format the
client can display.
• To be able to deploy an application on an Apache Web
server.
[*** NEED QUOTES. ***]
Outline
23.1 Introduction
23.2 HTTP Sessions and Session Tracking Technologies
23.3 Tracking Sessions with Python Session Class
23.4 Bookstore Architecture
23.5 Setting up the Bookstore
23.6 Entering the Bookstore
23.7 Obtaining the Book List from the Database
23.8 Viewing a Book’s Details
23.9 Adding an Item to the Shopping Cart
23.10 Viewing the Shopping Cart
23.11 Checking Out
23.12 Processing the Order
23.13 Error Handling
23.14 Handling Wireless Clients (XHTML Basic and WML)
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
23.1 Introduction
In this chapter, we implement a bookstore Web application that integrates many technolo-
gies we cover in this book while serving as a capstone for our presentation of Python CGI.
The technologies used in the application include CGI (Chapter 6), XML, XSL and XSLT
(Chapters 15–16), MySQL and the Python DB-API (Chapter 17), HTML and XHTML
(Chapters 26–27) and Cascading Style Sheets (Chapter 28). The case study also introduces
additional features—we will discuss the new elements as we encounter them. We demon-
strate how to deploy this application on an Apache server so that after reading this chapter,
you will be able to implement a substantial distributed Web application containing many
components on an Apache server.
nates. This means that the client must identify itself with each request while connecting to
the server using HTTP.
One session tracking method uses cookies. Cookies are small text files sent by a Python
CGI script as part of a response to a client. Cookies can store information on the client’s
computer for retrieval later in the same browsing session or in future browsing sessions.
For example, because cookies can be retrieved later in the same session, a shopping application
could use them to record the client’s movements (which pages have been visited and which links
have been clicked). When the Python script receives the client’s next communication, the script
can examine the cookie information, infer the client’s preferences and display products that
may be of interest to the client, based on the pages the client has viewed.
Every HTTP-based interaction between a client and a server includes a header that
contains information about the request (communication from the client to the server) or
information about the response (communication from the server to the client). When a
Python script receives a request, the header includes information such as the request type
(e.g., GET or POST) and cookies stored on the client machines by the server. When the
server formulates its response, the header information includes any cookies the server will
store on the client computer.
Depending on the maximum age of a cookie, the Web browser either maintains the
cookie for the duration of the browsing session (i.e., until the user closes the Web browser)
or stores the cookie on the client computer to access in a future session. When the browser
makes a request of a server, cookies previously sent to the client by that server are returned
to the server (if they have not expired) as part of the request formulated by the browser.
Cookies are automatically deleted when they expire (i.e., reach their maximum age).
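The cookie exchange described above can be sketched with the standard library. This is a hedged Python 3 illustration (in the Python 2 used elsewhere in this book, the equivalent module is named Cookie); the helper names make_cookie_header and read_cookies and the cookie name lastViewed are hypothetical.

```python
from http.cookies import SimpleCookie


def make_cookie_header(name, value, max_age=None):
    """Build the response header line that asks the browser to store a cookie."""
    cookie = SimpleCookie()
    cookie[name] = value

    # without a maximum age, the browser keeps the cookie only for the session
    if max_age is not None:
        cookie[name]["max-age"] = max_age

    return cookie.output()


def read_cookies(header_value):
    """Parse the value of a request's Cookie header into a plain dictionary."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {key: morsel.value for key, morsel in cookie.items()}


print(make_cookie_header("lastViewed", "python", max_age=3600))
print(read_cookies("lastViewed=python; theme=dark"))
```

In a CGI script, the header line would be printed before the blank line that ends the response headers, and the request's cookies would typically be read from the HTTP_COOKIE environment variable.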
Cookies often are the easiest way for a Python programmer to distinguish clients.
However, cookies are not accepted by all client types or browsers. Also, users may disable
cookies, which may make users unable to view content on cookie-dependent sites—some
sites require cookies for clients to even access home pages. For these reasons, we have
chosen not to use cookies to track sessions in our online bookstore.
Portability Tip 23.1
Not all browsers support cookies. Designing a server that uses cookies may exclude some
users from accessing your site.
Another method for session tracking involves embedding state information. The first
time a client connects to a server, it is assigned a unique session ID by the server. When the
client makes additional requests, the client’s session ID is compared against the session IDs
stored on the server.
The ID must be passed from page to page so each Web page file will know the session
ID of the current client, thereby distinguishing clients. This can be done in different ways.
One method of passing the ID is to place it in a hidden form field. Then the next page can access
the ID as a normal CGI parameter. Another method is to add the ID to the URL by adding
the ID to a hyperlink that points to the next page. The next page can then extract the ID from
the URL. If the ID is appended to the URL as part of a query string, however, the next page
can access the ID as a normal CGI parameter.
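The two ways of passing the ID can be sketched as below. This is a minimal Python 3 illustration under stated assumptions: the parameter name ID matches the one used later in this chapter, while the function names are hypothetical.

```python
from html import escape
from urllib.parse import parse_qs, urlencode


def link_with_session(page, session_id):
    """Append the session ID to a hyperlink target as a query string."""
    return "%s?%s" % (page, urlencode({"ID": session_id}))


def hidden_field(session_id):
    """Return a hidden form field that carries the session ID to the next page."""
    return '<input type="hidden" name="ID" value="%s" />' % escape(session_id, quote=True)


def session_id_from_query(query_string):
    """Extract the session ID from a query string, or return None if it is absent."""
    values = parse_qs(query_string).get("ID")
    return values[0] if values else None


print(link_with_session("allBooks.py", "abc123"))      # allBooks.py?ID=abc123
print(hidden_field("abc123"))
print(session_id_from_query("ID=abc123&page=2"))       # abc123
```

Either way, the receiving script sees the ID as an ordinary CGI parameter, so the same extraction code serves both methods.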
Although more extensible than cookies, tracking session information using embedded
session IDs has disadvantages. One disadvantage is that embedding the session ID in every
hyperlink makes Web page addresses much longer than they normally would be. Embedding
information also presents a potential security risk. Storing the session ID in the Web page
or URL creates the possibility that a person other than the user may see the ID and gain
access to the user’s data. Nonetheless, we have chosen this method to track HTTP sessions
in our online bookstore.
Good Programming Practice 23.1
Every session-tracking method has advantages and disadvantages. Research and carefully
consider each technique before selecting one for a site.
30
31     try:
32        file = open( getClientType()[ 0 ] + "/contentType.txt" )
33     except:
34        raise SessionError( "Missing+content+type+file" )
35
36     contentType = file.read()
37     file.close()
38     return contentType
39
40  def redirect( URL ):
41     """Redirect the client to a relative URL"""
42
43     print "Location: %s\n" % \
44        urlparse.urljoin( "http://" + os.environ[ "HTTP_HOST" ] +
45        os.environ[ "REQUEST_URI" ], URL )
46
47  class SessionError( Exception ):
48     """User-defined exception for Session class"""
49
50     def __init__( self, error ):
51        """Set error message"""
52
53        self.error = error
54
55     def __str__( self ):
56        """Return error message"""
57
58        return self.error
59
60  class Session( UserDict ):
61     """Session class keeps track of an HTTP session"""
62
63     def __init__( self, createNew = 0 ):
64        """Create a new session or load an existing session"""
65
66        # attempt to load previously created session
67        if not createNew:
68
69           # session ID is passed in query string
70           queryString = cgi.parse_qs( os.environ[ "QUERY_STRING" ] )
71
72           # no ID has been supplied in query string
73           if not queryString.has_key( "ID" ):
74              raise SessionError( "No+ID+given" )
75
76           self.sessionID = queryString[ "ID" ][ 0 ]
77           self.fileName = os.getcwd() + "/sessions/." + \
78              self.sessionID
79
80           # supplied ID is invalid
81           if not self.sessionExists():
82              raise SessionError( "Nonexistent+ID+given" )
83
Fig. 23.1 Utility functions and Session class that track an HTTP session.
When a Session object is created, createNew (the argument passed to the constructor)
can be set to a value other than 0 (the default) to create a new session. In this
case, execution begins at line 79 with a call to method generateID. Method generateID
(lines 131–136) uses module md5 to generate a unique ID. Lines 134–135 create a
string from the time of the session, the client address and the client port. Lines 136–137
then create and return a unique ID using this string. For more information on md5, review
Chapter 21.
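The ID generation described above might look like the following sketch. It is written for Python 3, where the md5 functionality lives in module hashlib; the function name generate_id and its parameters are assumptions, and the exact string the chapter's generateID hashes may differ.

```python
import hashlib
import time


def generate_id(client_address, client_port, now=None):
    """Hash the request time, client address and client port into a hex ID."""
    if now is None:
        now = time.time()

    # the three values are concatenated into one seed string, then hashed
    seed = "%s%s%s" % (now, client_address, client_port)
    return hashlib.md5(seed.encode("utf-8")).hexdigest()


session_id = generate_id("127.0.0.1", 4242)
print(len(session_id))   # 32
```

Because the seed is built from guessable values, an ID produced this way is unique in practice but not secret. Python 3's secrets module (for example, secrets.token_hex) is the usual choice when IDs must be hard to guess.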
When the Session obtains its new ID from generateID, it stores the name of its
session file, fileName, and checks if the session already exists. Note that the filename of
a session is a period (.) followed by the session ID. All session files are stored in a subdi-
rectory, sessions, of the current working directory. If the session file already exists,
Session raises the user-defined exception SessionError (line 82).
Class Session inherits from class UserDict. UserDict is a class defined in
module UserDict that simulates a dictionary. The contents of each instance are stored in
a Python dictionary called data. Line 85 initializes an instance of UserDict, creating an
empty session dictionary (data). The dictionary then stores the session ID (line 89). Lines 100–101
obtain the client type from function getClientType and store it in the session dictio-
nary. Function getClientType searches the HTTP_USER_AGENT environment vari-
able for certain keywords to determine the client type (lines 15–26). Line 91 stores the
results of function getContentType in data. Function getContentType opens the
contentType.txt file, which resides in a subdirectory named after the client type, and
returns the contents of the file (lines 25–35). Figure 23.2 contains an example of such a file.
Line 96 creates an empty shopping cart (an empty dictionary).
1 Content-type: text/html
2
To save session data between pages, method saveSession must be called (lines
119–124). This method creates a new session file corresponding to the value of attribute
fileName. Line 123 uses module cPickle to pickle the session dictionary and dump it
into the session file.
To open an existing session from a different script, create a Session with creat-
eNew set to 0 (default). If createNew is 0, execution begins in line 67. Session obtains
the query string and parses it. If no ID is specified, the constructor raises a Session-
Error. Otherwise, the session ID is extracted and the filename is determined (lines 76–
78). If the session does not exist, the constructor raises a SessionError (lines 81–82).
Otherwise, the constructor calls the UserDict base-class constructor (line 85). The value
of the session dictionary (data) is the value returned from method loadSession. This
method (lines 110–117) opens the session file (line 114). It then uses cPickle to unpickle
and return the session dictionary it contains (lines 115–117).
When a session is no longer needed, it can be removed from the server by invoking
method deleteSession (lines 126–129). This method deletes the session file by calling
os.remove.
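The save, load and delete cycle described above can be sketched as follows. This is a simplified Python 3 stand-in, not the chapter's Session class: it subclasses dict instead of UserDict, uses pickle (the Python 3 successor to cPickle), and the class name SessionStore and its method names are hypothetical.

```python
import os
import pickle
import tempfile


class SessionStore(dict):
    """Hypothetical dictionary-like session that is pickled to a dot-file."""

    def __init__(self, session_id, directory):
        super().__init__()
        self["ID"] = session_id
        # the file name is a period followed by the session ID
        self.file_name = os.path.join(directory, "." + session_id)

    def exists(self):
        return os.path.exists(self.file_name)

    def save(self):
        """Pickle the session dictionary and dump it into the session file."""
        with open(self.file_name, "wb") as session_file:
            pickle.dump(dict(self), session_file)

    def load(self):
        """Unpickle the stored dictionary and merge it into this session."""
        with open(self.file_name, "rb") as session_file:
            self.update(pickle.load(session_file))

    def delete(self):
        os.remove(self.file_name)


with tempfile.TemporaryDirectory() as sessions:
    first_page = SessionStore("abc123", sessions)
    first_page["cart"] = {"sample-isbn": 2}
    first_page.save()

    next_page = SessionStore("abc123", sessions)   # a later script, same ID
    next_page.load()
    print(next_page["cart"])                       # {'sample-isbn': 2}

    next_page.delete()
    print(next_page.exists())                      # False
```

Unpickling is safe here only because the server itself wrote the session files; pickle should not be used on data that a client could have supplied.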
viewCart
XML with XSLT) containing the cart contents, the subtotal dollar cost of each item and the
total dollar cost of all the items in the cart. When the user adds an item to the shopping cart,
the addToCart script processes the user’s request, then forwards the request to view-
Cart to create the document that displays the current cart. At this point, the user can either
continue shopping (allBooks.py) or proceed to checkout (order.py). In the latter
case, the user is presented with a form to input name, address and credit-card information.
Then, the user submits the form to invoke process.py, which completes the transaction
by sending a confirmation document to the user. Figure 23.4 overviews the scripts and
other files used in this case study.
File               Description
contentType.txt    Contains the line that specifies to the browser the content type
                   of the data. There is one of these files for each client type.
bookstore.py       This is the default home page for the bookstore, which is displayed
                   by entering the following URL in the client’s Web browser:
                   http://localhost/cgi-bin/bookstore/bookstore.py
                   Here, a new Session is created for the user to track the
                   HTTP session. The user is then forwarded to allBooks.py.
styles.css         This Cascading Style Sheet (CSS) file is linked to all XHTML
                   and XHTML Basic documents rendered on the client. The CSS
                   file allows us to apply uniform formatting across all the static
                   and dynamic documents rendered.
allBooks.py        This script uses Book objects to create a document containing
                   the product list. It queries the catalog database to obtain the
                   list of titles in the database. The results are processed and
                   placed into a list of Book objects. The list is stored as a session
                   attribute for the client. The script creates an XML document
                   which represents all the books, then applies a client-specific
                   XSLT transformation (allBooks.xsl) to the XML to produce
                   a document that can be rendered by the client.
allBooks.xsl       This XSLT style sheet transforms the XML representation of
                   the entire catalog of books into a document that the client
                   browser can render. There is one of these files for each client
                   type.
http://localhost/cgi-bin/bookstore/bookstore.py
1   #!c:\Python\python.exe
2   # Fig. 23.5: bookstore.py
3   # Create a new Session for client.
4
5   import sys
6   import time
7   import Session
8
9   # create new Session
10  try:
11     session = Session.Session( 1 )
12  except Session.SessionError, message:   # ID already exists
13     time.sleep( 1 )                      # wait 1 second
14     Session.redirect( "bookstore.py" )   # try again
15     sys.exit()
16
17  # re-direct to allBooks.py
18  nextPage = "allBooks.py?ID=%s" % session[ "ID" ]
19  session.saveSession()
20  Session.redirect( nextPage )
Line 11 creates a new Session for the client (see Section 23.3). Recall that if the
generated session ID already exists, a SessionError is raised. In this case, the program
sleeps for one second (so that the seed for md5 changes), redirects the client to
bookstore.py (to make another attempt) and exits (lines 13–15). Otherwise, lines 18–20 are
executed. Line 18 creates the redirection string to send the client to allBooks.py. Note
that the session ID is stored in the URL as part of the query string. This ensures that
allBooks.py can determine the client’s identity. Lines 19–20 save the session and print the
redirection string, sending the client to allBooks.py.
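The redirection string can be sketched as follows. This Python 3 illustration mirrors function redirect of Fig. 23.1 but returns the header instead of printing it; the function name redirect_header and the explicit environ parameter are assumptions made so that the example runs outside a Web server.

```python
from urllib.parse import urljoin


def redirect_header(environ, relative_url):
    """Build a Location header that resolves relative_url against the current request."""
    base = "http://" + environ["HTTP_HOST"] + environ["REQUEST_URI"]

    # urljoin replaces the last path segment of base with relative_url;
    # the trailing blank line ends the CGI response headers
    return "Location: %s\n" % urljoin(base, relative_url)


environ = {"HTTP_HOST": "localhost",
           "REQUEST_URI": "/cgi-bin/bookstore/bookstore.py"}
print(redirect_header(environ, "allBooks.py?ID=abc123"))
```

With these sample values, the header directs the browser to http://localhost/cgi-bin/bookstore/allBooks.py?ID=abc123.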
Fig. 23.7 Book that represents a single book’s information and defines the XML
format of that information (part 3 of 3).
Method getXML (lines 77–124) uses the DOM Document and Element interfaces
to create an XML representation of the book data as part of the Document that is passed
as an argument to the method. The complete information for one book is placed in a
product element (created in line 81). The elements for the individual properties of a book
are appended to the product element as children. For example, line 84 uses Document
method createElement to create element isbn. Line 85 uses Document method
createTextNode to specify the text in the isbn element, and uses Element method
appendChild to append the text to element isbn. Then, line 86 appends element isbn
as a child of element product with Element method appendChild. Similar opera-
tions are performed for the other book properties. Line 124 returns element product to
the caller. For more information about XML and Python, refer to Chapters 15 and 16.
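The element-building steps described above can be sketched with the standard library's xml.dom.minidom, which provides the same createElement, createTextNode and appendChild methods. This is a Python 3 illustration, not the chapter's getXML; the function name product_element and the sample values are hypothetical, and only two of the book's properties are shown.

```python
from xml.dom.minidom import getDOMImplementation


def product_element(document, isbn, title):
    """Create a product element whose children hold one book's properties."""
    product = document.createElement("product")

    # each property becomes a child element that contains one text node
    isbn_element = document.createElement("isbn")
    isbn_element.appendChild(document.createTextNode(isbn))
    product.appendChild(isbn_element)

    title_element = document.createElement("title")
    title_element.appendChild(document.createTextNode(title))
    product.appendChild(title_element)

    return product


document = getDOMImplementation().createDocument(None, None, None)
product = product_element(document, "0000000000", "Sample Title")
print(product.toxml())
# <product><isbn>0000000000</isbn><title>Sample Title</title></product>
```

The element is created by the Document that will eventually contain it, which is why the Document is passed to the function as an argument.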
Recall that after creating a session for the client, bookstore.py redirects the user
to allBooks.py. This program retrieves the list of books from the catalog database
and dynamically generates an XML document that represents it. This document is then pro-
cessed against a client-specific XSLT stylesheet called allBooks.xsl. The results are
then rendered on the client.
1 #!c:\Python\python.exe
2 # Fig. 23.8: allBooks.py
3 # Retrieve all books from database and store in session.
4 # Display book list to client by retrieving XML and converting
5 # to required format using client-specific XSLT stylesheet.
6
7 import sys
Fig. 23.8 allBooks.py returns to the client a document containing the book list
(part 1 of 3).
8   import Book
9   import Session
10  import MySQLdb
11  from xml.xslt import Processor
12  from xml.dom.DOMImplementation import implementation
13
14  # load Session
15  try:
16     session = Session.Session()
17  except Session.SessionError, message:   # invalid/no session ID
18     Session.redirect( "error.py?message=%s" % message )
19     sys.exit()
20
21  # setup MySQL statement
22  query = """SELECT isbn, title, editionNumber,
23             copyRight, publisherID, imageFile, price
24             FROM titles ORDER BY title"""
25
26  # attempt database connection and retrieve list of Books
27  try:
28
29     # connect to the database, retrieve a cursor and execute query
30     connection = MySQLdb.connect( db = "books" )
31     cursor = connection.cursor()
32     cursor.execute( query )
33
34     # acquire results and close database connection
35     results = cursor.fetchall()
36     cursor.close()
37     connection.close()
38  except MySQLdb.OperationalError, message:
39     Session.redirect( "error.py?message=%s" % message )
40     sys.exit()
41
42  allBooks = []
43
44  # Get row data
45  for row in results:
46     book = Book.Book()
47     book.setISBN( row[ 0 ] )
48     book.setTitle( row[ 1 ] )
49     book.setEditionNumber( str( row[ 2 ] ) )
50     book.setCopyright( row[ 3 ] )
51     book.setPublisherID( str( row[ 4 ] ) )
52     book.setImageFile( row[ 5 ] )
53     book.setPrice( str( row[ 6 ] ) )
54
55     allBooks.append( book )
56
57  session[ "titles" ] = allBooks
58
59  # generate XML
60  document = implementation.createDocument( None, None, None )
Fig. 23.8 allBooks.py returns to the client a document containing the book list
(part 2 of 3).
Fig. 23.8 allBooks.py returns to the client a document containing the book list
(part 3 of 3).
Lines 15–19 load the session. If the session ID is not specified in the query string or if
the specified ID is invalid, the user is redirected to error.py, which displays the error
message.
Lines 22–24 prepare the MySQL statement that allBooks.py uses to query the catalog
database. Lines 30–37 then connect to the database and retrieve the list of books. If
an error occurs, the user is redirected to error.py and the program exits (lines 39–40).
Lines 45–55 create a Book object for each book in the database, set its attributes and
append it to list allBooks. Note that the edition number, publisher ID and price
attributes must first be converted to strings. This is because the values are stored as
integer and float values in the database; however, each Book’s getXML method creates a
TextNode for each of these attributes, and createTextNode accepts only strings. Line 57
stores the list of Book objects in the session dictionary with key titles.
We then create an XML Document representing the entire catalog of books. Line 60
uses the createDocument method of xml.dom.DOMImplementation.imple-
mentation to create a blank DOM Document called document. Document method
createElement creates the catalog element (line 61). Line 62 appends the cat-
alog element to document. Lines 65–66 retrieve the product element for each book
and use method appendChild to append the element to catalog.
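Assembling the catalog document can be sketched as shown below. This Python 3 illustration uses xml.dom.minidom rather than the DOM implementation imported in Fig. 23.8, and it reads a reduced, hypothetical set of three columns per row instead of calling each Book's getXML method; the function name build_catalog is also an assumption.

```python
from xml.dom.minidom import getDOMImplementation


def build_catalog(rows):
    """Return a Document whose catalog element holds one product per row.

    rows is a sequence of (isbn, title, price) tuples.
    """
    document = getDOMImplementation().createDocument(None, None, None)
    catalog = document.createElement("catalog")
    document.appendChild(catalog)

    for isbn, title, price in rows:
        product = document.createElement("product")

        for name, value in (("isbn", isbn), ("title", title), ("price", price)):
            element = document.createElement(name)
            # numeric column values are converted first: text nodes hold strings
            element.appendChild(document.createTextNode(str(value)))
            product.appendChild(element)

        catalog.appendChild(product)

    return document


sample_rows = [("0000000001", "First Sample Title", 9.5),
               ("0000000002", "Second Sample Title", 20)]
catalog_document = build_catalog(sample_rows)
print(catalog_document.documentElement.toprettyxml(indent="   "))
```

The explicit str call reflects the conversion discussed earlier for the edition number, publisher ID and price.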
A client-specific XSLT stylesheet processes the XML Document (lines 69–73). An
XSLT Processor is created (line 69) and retrieves the XSLT stylesheet called all-
Books.xsl (line 70). Note that the copy of allBooks.xsl opened is the one found in
the directory named after the client type. This ensures that the XSLT stylesheet will trans-
form our XML Document into a format that is accepted by various clients. Line 71
appends the stylesheet to the list of stylesheets the processor may use. The session ID must
be inserted into the stylesheet first, because the ID is not contained in the XML Document
that the stylesheet will transform. Lines 72–73 run the processor on document and close
the stylesheet file, respectively. We then display the transformed XML to the client. Line
76 creates the string that contains the content type specification and the processor results.
Lines 77 and 78 save the session and display the page to the user.
Figure 23.9 contains the XSLT stylesheet used to transform the XML catalog representation
into XHTML. The resulting XHTML document is shown in the screen capture at the end
of Fig. 23.9.
46 select = "editionNumber"/>e</strong>
47 </a><br/>
48
49 </xsl:template>
50 </xsl:stylesheet>
Fig. 23.9 allBooks.xsl for an HTML client type which transforms the XML
representation of the catalog into XHTML.
color is represented by the hexadecimal number #b0c4de. Line 3 defines class .bold to
apply bold font weight to text. Lines 4–7 define class .bigFont with four CSS attributes.
Elements to which this class is applied appear in a bold Helvetica font that is double
the size of the base-text font. The color of the font is dark blue (represented by the hexa-
decimal number #00008b). If Helvetica font is not available, the browser will attempt to
use Arial, then the generic font sans-serif as a last resort. Class .italic applies
italic font style to text (line 8). Class .right right justifies text (line 9). Lines 10–11 indi-
cate that all table, th (table head data) and td (table data) elements should have a three-
pixel, grooved border with five pixels of internal padding between the text in a table cell
and the border of that cell. Lines 12–14 indicate that all table elements should have a bright
blue background color (represented by the hexadecimal number #6495ed), and that all
table elements should use automatically determined margins on both their left and right
sides. This causes the table to be centered on the page. Not all of these styles are used in
every XHTML document. However, using a single linked style sheet allows us to change
the look and feel of our store quickly and easily by modifying the CSS file. For more infor-
mation on CSS see Chapter 28.
Portability Tip 23.2
Different browsers have different levels of support for Cascading Style Sheets.
Fig. 23.10 Shared cascading style sheet (styles.css) used to apply common
formatting across XHTML documents rendered on the client.
1 #!c:\Python\python.exe
Fig. 23.11 displayBook.py converts the XML representation of the selected book
to a client-specific format using an XSLT stylesheet (part 1 of 3).
If the ISBN has not been specified, the user is forwarded to error.py (line 17). Oth-
erwise, displayBook loads the session. If successful, displayBook obtains the list of
Books from variable session (line 27). Line 28 sets the session dictionary key
bookToAdd to value None, indicating that the specified ISBN has not yet been found in
the list of Books stored in variable titles.
Lines 31–35 iterate over titles, searching for a Book with the correct ISBN (spec-
ified in the query string). If a book is found that has the specified ISBN, session attribute
bookToAdd is set to the matching Book object and the loop terminates.
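The search in lines 31–35 can be sketched as follows. This is a minimal illustration in present-day Python, not the figure's code: class Book is a hypothetical stand-in that provides only getISBN, function find_book is a name invented for this example, and the ISBN strings are placeholders.

```python
class Book:
    """Hypothetical stand-in for the bookstore's Book class."""

    def __init__(self, isbn, title):
        self.isbn = isbn
        self.title = title

    def getISBN(self):
        return self.isbn


def find_book(titles, isbn):
    """Return the first Book in titles with a matching ISBN, or None."""
    for book in titles:
        if book.getISBN() == isbn:
            return book  # stop at the first match, as the loop in the figure does
    return None


# placeholder catalog data
titles = [Book("111", "Sample Book A"), Book("222", "Sample Book B")]

session = {"bookToAdd": None}  # None indicates that no match has been found yet
session["bookToAdd"] = find_book(titles, "222")
```

If no Book matches, bookToAdd remains None, which is the condition that line 38 tests before redirecting to error.py.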
Line 38 checks whether a matching book has been found. If not, the user is redirected
to error.py (line 59). Otherwise, lines 41–55 execute. Line 41 creates a new XML Doc-
ument. Lines 42–43 append the product element of the matching Book to the Docu-
ment, using the appendChild method. Lines 46–51 process the XML Document
against a client-specific XSLT stylesheet called displayBook.xsl. The correct
stylesheet resides in the subfolder of the current directory named after the client type. Note
that we must format the stylesheet, inserting the session ID, before processing. We then dis-
play the results to the client and save the session (lines 54–55).
Figure 23.12 contains the displayBook.xsl style sheet file used in the XSLT
transformation. The values of six elements in the XML document are placed in the resulting
XHTML document. The resulting XHTML document is shown in the screen capture at the
end of Fig. 23.12.
19
20 <!-- obtain book title from script to place in title -->
21 <title><xsl:value-of select = "title"/></title>
22
23 <link rel = "stylesheet" href = "/bookstore/styles.css"
24 type = "text/css" />
25 </head>
26
27 <body>
28 <p class = "bigFont"><xsl:value-of select = "title"/></p>
29
30 <table>
31 <tr>
32 <!-- create table cell for product image -->
33 <td rowspan = "5"> <!-- cell spans 5 rows -->
34 <img src = "/bookstore/images/{ imageFile }"
35 alt = "{ title }" />
36 </td>
37
38 <!-- create table cells for price in row 1 -->
39 <td class = "bold">Price:</td>
40
41 <td><xsl:value-of select = "price"/></td>
42 </tr>
43
44 <tr>
45
46 <!-- create table cells for ISBN in row 2 -->
47 <td class = "bold">ISBN #:</td>
48
49 <td><xsl:value-of select = "isbn"/></td>
50 </tr>
51
52 <tr>
53
54 <!-- create table cells for edition in row 3 -->
55 <td class = "bold">Edition:</td>
56
57 <td><xsl:value-of select = "editionNumber"/></td>
58 </tr>
59
60 <tr>
61
62 <!-- create table cells for copyright in row 4 -->
63 <td class = "bold">Copyright:</td>
64
65 <td><xsl:value-of select = "copyright"/></td>
66 </tr>
67
68 <tr>
69
70 <!-- create Add to Cart button in row 5 -->
71 <td>
Fig. 23.12 XSLT stylesheet that transforms a book’s XML representation into an XHTML
document.
Lines 21 and 28 place the book’s title in the document’s title element and in a
paragraph at the beginning of the document’s body element, respectively. Line 34 speci-
fies an img element that holds the value of the imageFile element of an XML docu-
ment. This element specifies the name of the file representing the book’s cover image. Line
35 specifies the alt attribute of the img element using the book’s title. Lines 41, 49,
57 and 65 place the book’s price, isbn, editionNumber and copyright in table
cells, respectively. Lines 72–75 and lines 80–82 create Add to Cart (addToCart.py)
and View Cart (viewCart.py) buttons, respectively. Both buttons use the POST form
method to pass the session ID to their target file.
Fig. 23.13 CartItems contain an item and the quantity of an item in the
shopping cart.
1 #!c:\Python\python.exe
2 # Fig. 23.14: addToCart.py
3 # Create new/update CartItem for selected Book object
4
5 import sys
6 import Session
7 import CartItem
8
9 # load Session
10 try:
11 session = Session.Session()
12 except Session.SessionError, message: # invalid/no session ID
13 Session.redirect( "error.py?message=%s" % message )
14 sys.exit()
15
16 book = session[ "bookToAdd" ]
17 bookISBN = book.getISBN()
18 cart = session[ "cart" ]
19 alreadyInCart = 0 # book has not been found in cart
20
21 # determine if book is in cart
22 for isbn in cart.keys():
23
24 if isbn == bookISBN:
25 alreadyInCart = 1
26 cartItem = cart[ isbn ]
27 break
28
29 # if book is already in cart, update quantity
30 if alreadyInCart:
31 cartItem.setQuantity( cartItem.getQuantity() + 1 )
32
33 # otherwise, create and add a new CartItem to cart
34 else:
35 cart[ book.getISBN() ] = CartItem.CartItem( book, 1 )
36
37 # update cart attribute
38 session[ "cart" ] = cart
39
40 # send user to viewCart.py
41 nextPage = "viewCart.py?ID=%s" % session[ "ID" ]
42 session.saveSession() # save Session data
43 Session.redirect( nextPage )
Fig. 23.14 addToCart.py places an item in the shopping cart and invokes
viewCart.py to display the cart contents.
The program first obtains the Session object for the current client (lines 10–14). If
a session does not exist for this client, function Session.redirect sends the client to
error.py (line 13). Otherwise, line 16 obtains the value of session attribute
bookToAdd, the Book representing the book to add to the shopping cart. Line 17 obtains this
Book’s ISBN. Line 18 obtains the value of session attribute cart, the dictionary that
represents the shopping cart. Lines 22–27 locate the CartItem for the book being added
to the cart. If the shopping cart already contains an item for the specified book, line 31
increments the quantity for that CartItem. Otherwise, line 35 creates a new CartItem
with a quantity of 1 and puts the item into the shopping cart, keyed by the book ISBN. Line
38 sets the cart session attribute to reference the dictionary cart. Then, lines 41–43 forward
the user to viewCart.py to display the cart contents.
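The cart update can also be expressed with a dictionary membership test in place of the loop over cart.keys(). The following is a minimal sketch in present-day Python under stated assumptions: class CartItem is a hypothetical stand-in that offers only the methods the figure calls, and function add_to_cart is a name invented for this example.

```python
class CartItem:
    """Hypothetical stand-in: pairs an item with its quantity."""

    def __init__(self, item, quantity):
        self.item = item
        self.quantity = quantity

    def getItem(self):
        return self.item

    def getQuantity(self):
        return self.quantity

    def setQuantity(self, quantity):
        self.quantity = quantity


def add_to_cart(cart, isbn, book):
    """Increment the quantity for isbn, or add a new CartItem with quantity 1."""
    if isbn in cart:  # same effect as searching cart.keys() for isbn
        item = cart[isbn]
        item.setQuantity(item.getQuantity() + 1)
    else:
        cart[isbn] = CartItem(book, 1)
    return cart


cart = {}
add_to_cart(cart, "111", "Sample Book A")  # first copy creates the CartItem
add_to_cart(cart, "111", "Sample Book A")  # second copy increments the quantity
add_to_cart(cart, "222", "Sample Book B")
```

Because the dictionary is keyed by ISBN, each book appears in the cart at most once, no matter how many copies are ordered.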
1 #!c:\Python\python.exe
2 # Fig. 23.15: viewCart.py
3 # Generate XML representing cart, convert
4 # to required format using client-specific XSLT
5 # stylesheet and display results.
6
7 import sys
8 import Session
9 from xml.xslt import Processor
10 from xml.dom.DOMImplementation import implementation
11
12 # load Session
13 try:
14 session = Session.Session()
15 except Session.SessionError, message: # invalid/no session ID
16 Session.redirect( "error.py?message=%s" % message )
17 sys.exit()
18
19 cart = session[ "cart" ]
20 total = 0 # total for all ordered items
21
22 # generate XML representing cart object
23 document = implementation.createDocument( None, None, None )
24 cartNode = document.createElement( "cart" )
25 document.appendChild( cartNode )
26
27 # add XML representation for each cart item
28 for item in cart.values():
29
30 # get book data, calculate subtotal and total
31 book = item.getItem()
32 quantity = item.getQuantity()
33 price = float( book.getPrice() )
34 subtotal = quantity * price
35 total += subtotal
36
37 # create an orderProduct element
38 orderProduct = document.createElement( "orderProduct" )
39
40 # create a product element and append to orderProduct
41 productNode = book.getXML( document )
42 orderProduct.appendChild( productNode )
43
44 # create a quantity element and append to orderProduct
45 quantityNode = document.createElement( "quantity" )
46 quantityNode.appendChild( document.createTextNode( "%d" %
47 quantity ) )
48 orderProduct.appendChild( quantityNode )
49
50 # create a subtotal element and append to orderProduct
51 subtotalNode = document.createElement( "subtotal" )
52 subtotalNode.appendChild( document.createTextNode( "%.2f" %
53 subtotal ) )
54 orderProduct.appendChild( subtotalNode )
55
56 # append orderProduct to cartNode
57 cartNode.appendChild( orderProduct )
58
59 # set the total attribute of cart element
60 cartNode.setAttribute( "total", "%.2f" % total )
61
62 # make current total a session attribute
63 session[ "total" ] = total
64
65 # process generated XML against XSLT stylesheet
66 processor = Processor.Processor()
67 style = open( session[ "agent" ] + "/viewCart.xsl" )
68 processor.appendStylesheetString( style.read() % ( session[ "ID" ],
69 session[ "ID" ] ) )
70 results = processor.runNode( document )
71 style.close()
72
73 # display content type and processed XML
74 pageData = session[ "content type" ] + results
75 session.saveSession() # save Session data
76 print pageData
Fig. 23.15 viewCart.py obtains the shopping cart and outputs a document with
the cart contents in tabular format.
We first load the session (lines 13–17). If an error occurs, the client is redirected to
error.py. Line 19 obtains the shopping cart attribute of the session. We then create a
new XML Document and append a cart element to Document (lines 23–25).
Lines 28–57 compute the total of the items in the cart. Lines 31, 32 and 33 retrieve the
Book object, the quantity and the price from the CartItem, respectively. Line 34 calcu-
lates the subtotal for the CartItem. Line 35 updates the total cost of all cart items. Line
38 creates an XML orderProduct element for each item in the cart.
Each orderProduct element contains three child elements: product, quantity
and subtotal. We first retrieve and append the product child of orderProduct
(lines 41–42). Lines 45–48 then create and append the quantity element. Note that the
quantity of the current CartItem must be formatted to a string before creating the ele-
ment. Lines 51-54 create and append the subtotal child of orderProduct. The
subtotal element contains the subtotal of the current CartItem, formatted to two dec-
imal places. Line 57 appends the current orderProduct to the cart element.
When an orderProduct element has been created and appended to the cart ele-
ment for each CartItem, the total attribute of the cart element is then set (line 60).
Line 63 stores the current sales total in the total attribute of the session. Lines 66–71 pro-
cess the XML Document against a client-specific XSLT stylesheet (viewCart.xsl).
Note that the session ID must once again be inserted into the stylesheet before processing.
Lines 74–76 save the session and display the translated XML to the client.
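The construction of the cart document can be tried without the XSLT processor. The sketch below is an illustration only: it substitutes the standard library's xml.dom.minidom for the DOM implementation that the figure imports, it reduces each product element to a single title child, and the function name build_cart_document and the (title, price, quantity) tuples are assumptions made for this example.

```python
from xml.dom.minidom import getDOMImplementation


def build_cart_document(items):
    """Build a cart document from (title, price, quantity) tuples.

    The tuple layout is an assumption made for this sketch.
    """
    # createDocument builds the document and its root cart element together
    document = getDOMImplementation().createDocument(None, "cart", None)
    cart_node = document.documentElement
    total = 0.0

    for title, price, quantity in items:
        subtotal = quantity * price
        total += subtotal

        order_product = document.createElement("orderProduct")

        # simplified product element that holds only a title
        product = document.createElement("product")
        title_node = document.createElement("title")
        title_node.appendChild(document.createTextNode(title))
        product.appendChild(title_node)
        order_product.appendChild(product)

        # text nodes hold strings, so the numbers are formatted first
        for tag, text in (("quantity", "%d" % quantity),
                          ("subtotal", "%.2f" % subtotal)):
            node = document.createElement(tag)
            node.appendChild(document.createTextNode(text))
            order_product.appendChild(node)

        cart_node.appendChild(order_product)

    # the total is stored as an attribute of the cart element
    cart_node.setAttribute("total", "%.2f" % total)
    return document


cart_document = build_cart_document([("Sample Title A", 10.00, 2),
                                     ("Sample Title B", 5.50, 1)])
```

Calling cart_document.toprettyxml() displays the generated markup, which is convenient for checking the structure before a stylesheet is applied.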
Figure 23.16 contains the viewCart.xsl style sheet file used in the XSLT transfor-
mation for an html client. The resulting XHTML document is shown in the screen capture
at the end of Fig. 23.16.
38 <th>Total</th>
39 </tr>
40
41 <xsl:apply-templates select = "orderProduct">
42
43 <!-- sort orderProducts by product/title -->
44 <xsl:sort select = "product/title"/>
45
46 </xsl:apply-templates>
47
48 <tr>
49 <td colspan = "4"
50 class = "bold right">Total: <xsl:value-of
51 select = "@total"/></td>
52 </tr>
53 </table>
54
55 </xsl:otherwise>
56 </xsl:choose>
57
58 <p class = "bold green">
59 <a href = "allBooks.py?ID=%s">Continue Shopping</a>
60 </p>
61
62 <form method = "post" action = "order.py?ID=%s">
63 <p><input type = "submit" value = "Check Out" /></p>
64 </form>
65
66 </body>
67 </html>
68 </xsl:template>
69
70 <xsl:template match = "orderProduct">
71
72 <tr>
73 <td><xsl:value-of select = "product/title"/>,
74 <xsl:value-of select = "product/editionNumber"/>e</td>
75 <td><xsl:value-of select = "quantity"/></td>
76 <td class = "right"><xsl:value-of select =
77 "product/price"/></td>
78 <td class = "bold right"><xsl:value-of select =
79 "subtotal"/></td>
80 </tr>
81
82 </xsl:template>
83 </xsl:stylesheet>
Fig. 23.16 XSLT stylesheet that transforms a cart’s XML representation into an XHTML
document.
The first xsl:template (lines 12–68) matches cart elements. Line 26 begins an
xsl:choose element. If the cart attribute total (attributes are denoted by @) is equal to "0.00",
lines 28–29 execute and display a message to the client indicating that the shopping
cart is currently empty. If, however, total is not "0.00", lines 33–54 execute,
creating a table for all the items in the cart.
Lines 41–46 insert all matches to the orderProduct template, sorted by their
product/title element. The orderProduct template (lines 70–82) matches
orderProduct elements. Lines 73–79 insert the orderProduct’s product/
title, product/editionNumber, quantity, product/price and sub-
total in table cells. Lines 49–51 then insert a table row displaying the total for all items.
We then create two options for the user. The first is a hyperlink that points to
allBooks.py (line 59). The second is a Check Out button that takes the user to order.py
(lines 62–64).
validation of form elements or a combination of both. When the user presses the button, the
browser requests process.py to finalize the book order.
1 #!c:\Python\python.exe
2 # Fig. 23.17: order.py
3 # Display order form to get information from customer
4
5 import sys
6 import Session
7
8 # load Session
9 try:
10 session = Session.Session()
11 except Session.SessionError, message: # invalid/no session ID
12 Session.redirect( "error.py?message=%s" % message )
13 sys.exit()
14
15 # display content type and orderForm for specific client-type
16 content = open( "%s/orderForm.%s" % ( session[ "agent" ],
17 session[ "extension" ] ) )
18 pageData = session[ "content type" ] + content.read() % \
19 session[ "ID" ]
20 content.close()
21
22 session.saveSession() # save Session data
23 print pageData
Fig. 23.17 order.py retrieves, formats and displays a static order form page to the
client.
Lines 9–13 first load the session. If an error occurs, the client is forwarded to
error.py. Line 16 opens the client-specific order form. Note that for convenience, the
directory name is the same as the file extension. Lines 18–19 create the string that contains
the client content type and the contents of orderForm, formatted with the session ID. The
session is then saved and the order form is displayed (lines 22–23).
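The formatting step relies on Python's % string operator: each %s in the file is replaced by the session ID. A minimal sketch follows, with several assumptions: the template text is an in-memory string rather than a file, the content-type header value is a guess at what the session stores, and function render is a name invented for this example. Note that any literal percent sign in such a template must be written as %% so that the operator does not treat it as a format specifier.

```python
CONTENT_TYPE = "Content-type: text/html\n\n"  # assumed header value

# hypothetical template text; the real order form is read from a file
template = ('<form method = "post" action = "process.py?ID=%s">\n'
            '<p>Discount: 10%%</p>\n'
            '</form>')


def render(template_text, session_id):
    """Insert the session ID into the template and prepend the header."""
    # a one-element tuple keeps the operation safe for any type of ID
    return CONTENT_TYPE + template_text % (session_id,)


page = render(template, "abc123")
```

A template that contains two %s placeholders, as viewCart.xsl does, needs a two-element tuple on the right-hand side of the % operator.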
Figure 23.18 shows orderForm.html, the order form displayed by order.py to
HTML clients. The resulting XHTML document is displayed in the screenshot below.
12
13 <link rel = "stylesheet" href = "/bookstore/styles.css"
14 type = "text/css" />
15
16 </head>
17
18 <body>
19 <p class = "bigFont">Shopping Cart Check Out</p>
20
21 <!-- Form to input user information and credit card. -->
22 <!-- Note: No need to input real data in this example. -->
23 <form method = "post" action = "process.py?ID=%s">
24 <p style = "font-weight: bold">
25 Please input the following information</p>
26
27 <table>
28 <tr>
29 <td class = "right bold">First name:</td>
30
31 <td>
32 <input type = "text" name = "firstname"
33 size = "25" />
34 </td>
35 </tr>
36 <tr>
37 <td class = "right bold">Last name:</td>
38
39 <td>
40 <input type = "text" name = "lastname"
41 size = "25" />
42 </td>
43 </tr>
44 <tr>
45 <td class = "right bold">Street:</td>
46
47 <td>
48 <input type = "text" name = "street"
49 size = "25" />
50 </td>
51 </tr>
52 <tr>
53 <td class = "right bold">City:</td>
54
55 <td>
56 <input type = "text" name = "city"
57 size = "25" />
58 </td>
59 </tr>
60 <tr>
61 <td class = "right bold">State:</td>
62
63 <td>
64 <input type = "text" name = "state"
Fig. 23.18 orderForm.html is the order form displayed by order.py for html
clients.
firmed by the credit-card company. Figure 23.20 shows file thankYou for an HTML cli-
ent. The resulting XHTML document is displayed in the screenshot below.
1 #!c:\Python\python.exe
2 # Fig. 23.19: process.py
3 # Display thank you page to customer and delete session
4
5 import sys
6 import Session
7
8 # load session
9 try:
10 session = Session.Session()
11 except Session.SessionError, message: # invalid/no session ID
12 Session.redirect( "error.py?message=%s" % message )
13 sys.exit()
14
15 # display content type and thankYou for specific client-type
16 content = open( "%s/thankYou.%s" % ( session[ "agent" ],
17 session[ "extension" ] ) )
18 pageData = session[ "content type" ] + content.read() % \
19 session[ "total" ]
20 content.close()
21
22 # delete session because processing is complete
23 session.deleteSession()
24 print pageData
Fig. 23.19 process.py retrieves, formats and displays a static thank you page to
the client.
Fig. 23.20 thankYou.html is the exit page displayed by process.py for HTML
clients.
1 #!c:\Python\python.exe
2 # Fig. 23.21: error.py
3 # Generate XML error message and display to user
4 # using client-specific XSLT stylesheet.
5
6 import cgi
7 import Session
8 from xml.xslt import Processor
9 from xml.dom.DOMImplementation import implementation
10
11 form = cgi.FieldStorage()
12
13 if form.has_key( "message" ):
14
Fig. 23.21 error.py displays a dynamically created error page.
21 <body>
22
23 <p class = "bigFont">Error message:</p>
24
25 <p class = "bold">
26 <xsl:value-of select = "message"/>
27 </p>
28
29 </body>
30 </html>
31 </xsl:template>
32 </xsl:stylesheet>
Fig. 23.22 XSLT stylesheet that transforms the XML representation of an error into an
XHTML document.
24
Multimedia
Objectives
• To introduce multimedia applications in Python.
• To understand how to create 3D objects with module
PyOpenGL.
• To manipulate Alice 3D objects.
• To create a CD player with module pygame.
• To use module pygame to create a 2D Space Cruiser
game.
Outline
24.1 Introduction
24.2 Introduction to PyOpenGL
24.3 PyOpenGL Examples
24.4 Introduction to Alice
24.5 Fox, Chicken and Seed Problem
24.6 Introduction to pygame
24.7 Python CD Player
24.8 Pygame Space Cruiser
24.9 Internet and World Wide Web Resources
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
24.1 Introduction
In addition to its many other capabilities, Python allows programmers to create interactive
multimedia applications. It is increasingly important for programmers to be able to create
multimedia components. We provide examples using PyOpenGL and Alice.
Tkinter as the OpenGL context, the program structure is similar to programs found in
Chapters 10 and 11.
1 #!c:\Python\python.exe
2 # A colored, rotating box (with open top and bottom)
3
4 from Tkinter import *
5 from OpenGL.GL import *
6 from OpenGL.Tk import *
7
8 class ColorBox( Frame ):
9 """A colored, rotating box"""
10
11 def __init__( self ):
12 """Initialize GUI and OpenGL"""
13
14 Frame.__init__( self )
15 self.master.title( "Color Box" )
16 self.master.geometry( "300x300" )
17 self.pack( expand = YES, fill = BOTH )
18
19 # create and pack Opengl -- use double buffering
20 self.openGL = Opengl( self, double = 1 )
21 self.openGL.pack( expand = YES, fill = BOTH )
22
23 self.openGL.redraw = self.redraw # set redraw function
24 self.openGL.set_eyepoint( 20 ) # move away from object
25
26 self.amountRotated = 0 # alternate rotating left/right
27 self.increment = 2 # rotate amount
28 self.update() # begin rotation
29
30 def redraw( self, openGL ):
31 """Draw box on black background"""
32
33 # clear background and disable lighting
34 glClearColor( 0.0, 0.0, 0.0, 0.0 ) # select clear color
35 glClear( GL_COLOR_BUFFER_BIT ) # paint background
36 glDisable( GL_LIGHTING )
37
38 # constants
39 red = ( 1.0, 0.0, 0.0 )
40 green = ( 0.0, 1.0, 0.0 )
41 blue = ( 0.0, 0.0, 1.0 )
42 purple = ( 1.0, 0.0, 1.0 )
43
44 vertices = \
45 [ ( ( -3.0, 3.0, -3.0 ), red ),
46 ( ( -3.0, -3.0, -3.0 ), green ),
47 ( ( 3.0, 3.0, -3.0 ), blue ),
48 ( ( 3.0, -3.0, -3.0 ), purple ),
49 ( ( 3.0, 3.0, 3.0 ), red ),
50 ( ( 3.0, -3.0, 3.0 ), green ),
Fig. 24.1 Using Opengl with Tkinter context.
Line 83 creates an instance of class ColorBox (lines 11–80) and enters its main-
loop. The ColorBox constructor (lines 11–28) first initializes the window (lines 14–17).
Lines 20–21 create and pack an Opengl component—openGL—which is used to render
the OpenGL objects. openGL attribute double is set to 1 to ensure that double buffering
is used. With double buffering, OpenGL maintains two screen buffers—one to display and
one to update. When the display is updated, the two buffers are simply switched. This
ensures that the user does not see the screen being updated (which can cause a choppy display).
Line 23 sets openGL’s redraw method. This method, redraw, will be called when
the scene must be redrawn (i.e., something has changed). Method redraw (lines 30–65)
draws the box on the background. Line 34 calls PyOpenGL function glClearColor to
specify the color which will be used by function glClear (line 35). Colors are represented
by a three-element tuple or four-element tuple in the form ( R, G, B ) and ( R, G, B, A ),
respectively. R, G, B and A stand for red, green, blue and alpha (transparency). Possible
values are decimal values between 0.0 (none) and 1.0 (full). By combining different values,
different colors are achieved. The representation for black is ( 0.0, 0.0, 0.0, 0.0 ). Lines 39–
42 define some other colors. Line 35 calls PyOpenGL function glClear to color the
background with the previously selected color—black (line 34). The value passed to
glClear—GL_COLOR_BUFFER_BIT—specifies that the color specified should be
used to color the background. Line 36 calls PyOpenGL function glDisable to disable
lighting (GL_LIGHTING) for this example.
Lines 44–54 create a list of vertices that define the box. Each element of the list
contains a vertex location and designated color. Lines 56–64 draw the box. Line 56 calls
PyOpenGL function glBegin with argument GL_QUAD_STRIP. This ensures that any
points defined before a subsequent call to function glEnd (line 64) will be connected by a
strip of polygons. For other acceptable values, review OpenGL documentation. In
PyOpenGL, three-dimensional points are defined with function glVertex3f. Line 60
obtains the vertex location and color for each vertex. Line 61 uses function apply to call
PyOpenGL function glColor3f to change the current drawing color. glColor3f
takes as arguments three floating-point numbers representing an RGB color. Line 62 then
calls function glVertex3f to draw a point in three-dimensional space. The color of the
point is the color specified by glColor3f (line 61). Because each vertex has a unique
color, PyOpenGL will interpolate between the colors. Line 64 calls PyOpenGL function
glEnd, ending the GL_QUAD_STRIP. Finally, line 65 calls PyOpenGL function
glEnable to re-enable lighting.
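The blending between vertex colors is performed by OpenGL itself, but the arithmetic can be illustrated in a few lines of Python. The sketch below is an illustration only and is not part of PyOpenGL: function interpolate_color is a name invented for this example, and it shows simple linear interpolation between two ( R, G, B ) tuples.

```python
def interpolate_color(color_a, color_b, t):
    """Blend two ( R, G, B ) tuples linearly.

    t = 0.0 yields color_a, t = 1.0 yields color_b.
    """
    return tuple((1.0 - t) * a + t * b for a, b in zip(color_a, color_b))


red = (1.0, 0.0, 0.0)
green = (0.0, 1.0, 0.0)

# color halfway along an edge that runs from a red vertex to a green vertex
midpoint = interpolate_color(red, green, 0.5)
```

Halfway between the red and green vertices the color is ( 0.5, 0.5, 0.0 ), a dark yellow, which is why the faces of the box show gradual transitions instead of solid blocks of color.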
Line 24 calls Opengl method set_eyepoint. This method moves the camera
away from the scene by a specified amount. Lines 26–27 initialize variables amountRo-
tated and increment. These values will be used to control the rotation of the box.
Finally, line 28 invokes method update.
Method update (lines 67–80) rotates the box. Lines 70–73 alter the rotational direction,
represented by variable increment. Function glRotate (line 76) accepts four
parameters. The first parameter, in this case variable increment, sets the angle of rotation
in degrees. The last three floating-point numbers are the x-, y- and z-components of the
vector around which the shape rotates. Line 77 increments variable amountRotated, which
keeps track of how much the box has been rotated. The call to method tkRedraw (line 79)
causes the Opengl component to be redrawn with the rotated shape. Method after (line 80)
takes 10 and method update as parameters. As a result, update is called again after 10 ms;
because update reschedules itself on every call, the box is redrawn approximately every 10 ms.
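The back-and-forth motion can be simulated without a window. The following sketch is an assumption-laden illustration: the reversal limit of 90 degrees and the function name next_rotation are invented for this example and are not taken from the figure, whose lines 70–73 may use different values.

```python
def next_rotation(amount_rotated, increment, limit=90):
    """Return the updated (amount_rotated, increment) for one animation step.

    limit is a hypothetical bound chosen for this sketch.
    """
    if amount_rotated >= limit or amount_rotated <= -limit:
        increment = -increment  # reverse direction at either extreme
    return amount_rotated + increment, increment


amount, step = 0, 2
history = []

# each iteration stands in for one scheduled call to update
for _ in range(100):
    amount, step = next_rotation(amount, step)
    history.append(amount)
```

In the GUI program each step would be followed by a redraw and by a call such as self.after( 10, self.update ), which asks Tkinter to run the method again after 10 ms.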
Figure 24.2 demonstrates several methods of the OpenGL.GLUT module that create
three-dimensional shapes. GLUT is the OpenGL Utility Toolkit. The example creates a
GUI that allows the user to preview colors and shapes.
1 #!c:\Python\python.exe
2 # Demonstrating various GLUT shapes
3
4 from Tkinter import *
5 import Pmw
6 from OpenGL.GL import *
7 from OpenGL.Tk import *
8 from OpenGL.GLUT import *
9
10 class ChooseShape( Frame ):
11 """Allow user to preview different shapes and colors"""
12
13 def __init__( self ):
14 """Create GUI with MenuBar"""
15
16 Frame.__init__( self )
17 Pmw.initialise()
18 self.master.title( "Choose a shape and color" )
19 self.master.geometry( "300x300" )
20
21 # initialize openGL
22 self.openGL = Opengl( double = 1 ) # use double-buffering
23 self.openGL.redraw = self.redraw # set redraw function
24 self.openGL.pack( expand = YES, fill = BOTH )
25 self.openGL.set_eyepoint( 20 ) # move away from object
26 self.openGL.autospin_allowed = 1 # allow auto-spin
27
28 # create and pack MenuBar
Fig. 24.2 Creating various shapes with GLUT.
83 color = self.selectedColor.get()
84 apply( glColor3f, self.colors[ color ] )
85
86 # obtain and draw selected shape
87 shape = self.selectedShape.get()
88 apply( eval( shape ), self.shapes[ shape ] )
89
90 glEnable( GL_LIGHTING ) # re-enable lighting
91
92 def main():
93 ChooseShape().mainloop()
94
95 if __name__ == "__main__":
96 main()
Line 93 creates an instance of class ChooseShape (lines 10–90) and enters its
mainloop. Lines 22–25 of the constructor create and pack an Opengl component in the
same way as Fig. 24.1. Line 26 sets autospin_allowed to 1. As a result, the user can
cause a shape to rotate continuously by holding down the middle mouse button, dragging
it in the direction of the rotation and releasing it.
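Line 88 of Fig. 24.2 selects a drawing function by name with eval and calls it with apply. Built-in function apply no longer exists in Python 3, where the call function( *arguments ) has the same effect, and storing the function objects themselves in the dictionary removes the need for eval. The sketch below illustrates this approach with stand-in functions that return descriptions instead of drawing; the function names, dictionary keys and cone arguments are assumptions made for this example.

```python
def wire_cube(size):
    """Stand-in for a GLUT drawing function; returns a description."""
    return "wire cube, side %s" % size


def solid_cone(base, height, slices, stacks):
    """Stand-in for a GLUT drawing function; returns a description."""
    return "solid cone, base %s, height %s, %s slices, %s stacks" % (
        base, height, slices, stacks)


# map each menu label to a function object and its argument tuple
shapes = {
    "wire_cube": (wire_cube, (3,)),
    "solid_cone": (solid_cone, (3, 6, 15, 10)),
}


def draw(shape_name):
    """Look up the selected shape and call its function."""
    function, arguments = shapes[shape_name]
    return function(*arguments)  # replaces apply( function, arguments )


description = draw("solid_cone")
```

Looking up function objects also means that an unexpected menu value raises a KeyError instead of being evaluated as an arbitrary expression.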
Dictionary shapes (lines 35–44) contains the names of GLUT shape functions as its keys. The values are
the arguments to pass to the functions with those names (line 88). Methods
glutWireCube and glutSolidCube (lines 35–36) accept the length of the cube’s
side as a parameter—3 in this case. Methods glutWireIcosahedron and glutSo-
lidIcosahedron (lines 37–38) accept no parameters and create a 20-sided shape with
a radius of 1.0. Methods glutWireCone and glutSolidCone (lines 39–40) accept
four parameters—the base, the height, the number of slices and the number of stacks, i.e.,
the number of subdivisions of the cone, along the third axis. Methods glutWireTorus
and glutSolidTorus (lines 41–42) accept four parameters as well. The first two
specify the inner and outer radii of the doughnut shape. The last two arguments specify the
number of sides in each section and the number of divisions in each section. Methods
are left on the shore alone. For the same reason, the chicken cannot be left alone with the
flower.
The initial scene was created in the Alice world, using predefined objects. Alice Lid-
dell is attached to the boat. The chicken, fox and flower are initially placed next to the boat
on the same shore. All other items are inserted merely for decoration. The movements can
be controlled using the buttons on controlPanel (line 139). The menu allows the user
to move the objects in and out of the boat and to send Alice across the river. Alice generates
the comment in line 1. Code automatically generated by Alice is placed above this com-
ment.
Lines 5–6 continuously point the camera at Alice Liddell. This loop ensures that the
camera follows Alice Liddell as she moves on the boat. Alice adds a loop to the list of
currently running animations and the loop runs until explicitly stopped.
Method AnimateWithPause (lines 9–12) combines two animations into one loop.
The animations run concurrently and then pause for a given time. This method animates
fish movement. Lines 14–21 create the animations for two jumping fish.
Lines 24–29 create lists and initialize them to the starting values. These lists keep track
of the objects on the shores and on the boat. Variable selected (line 30) holds the cur-
rently selected object. Method animalSelect (lines 33–36) is a callback for the radio
buttons allowing the user to select an object. Method ObjectInBoat (lines 39–43)
moves a given object into the boat. Line 41 sets Object.Stop as the response to collision
with the deck of the FishBoat. Lines 42–43 move the object above the deck and move it
down toward a collision with the deck.
Method ObjectOutOfBoat (lines 46–51) moves a given object out of the boat to
the shore. The boat movement is symmetric so there is no need to distinguish between the
shores when moving the object. Line 48 sets the response to collision with the ground to
Object.Stop. Line 49 displaces the object based on the name length so that the objects
land in different positions on the shore. Lines 50–51 move the object back and down
accordingly.
Method getIntoBoat (lines 55–61) checks whether a currently selected object can
be moved into the boat. If the object can be moved into the boat, the method performs the
necessary adjustment to the lists. The call ObjectInBoat.eval( selected ) (line
61) returns the object associated in the Python environment with selected. Line 57 checks
whether selected is on the current bank of the river. Line 58 checks whether the boat is
empty and whether the animation has finished moving the boat across the river (method
boatArrived). Lines 59–60 move the selected object from the currentBank list to
theBoat list.
Method getOutOfBoat (lines 64–69) checks if a currently selected object can be
moved to the shore from the boat. If the object can be moved to the shore, the method per-
forms the necessary adjustment to the lists and calls ObjectOutOfBoat. Line 66 checks
whether selected is in the boat and whether the boat has arrived at the shore. Lines 67–
68 move the selected object from list theBoat to list currentBank.
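The list bookkeeping performed by getIntoBoat and getOutOfBoat can be sketched without Alice. The following is only a minimal sketch: the function and parameter names are hypothetical stand-ins for the lists and methods described above, the boat is assumed to carry one object at a time, and the animation calls are omitted.

```python
# Minimal sketch of the list bookkeeping (hypothetical names, no Alice calls).
# Assumption: the boat carries at most one object besides Alice.

def get_into_boat( selected, current_bank, the_boat, boat_arrived = True ):
   """Move selected from the bank list to the boat list if allowed."""

   if selected in current_bank and not the_boat and boat_arrived:
      current_bank.remove( selected )
      the_boat.append( selected )
      return True

   return False

def get_out_of_boat( selected, current_bank, the_boat, boat_arrived = True ):
   """Move selected from the boat list to the bank list if allowed."""

   if selected in the_boat and boat_arrived:
      the_boat.remove( selected )
      current_bank.append( selected )
      return True

   return False

bank = [ "Fox", "Chicken", "Flower" ]
boat = []
get_into_boat( "Fox", bank, boat )       # Fox boards the empty boat
get_into_boat( "Chicken", bank, boat )   # refused: the boat is occupied
print( bank, boat )
```

Keeping the checks and the list updates in one place means the lists never disagree about where an object is.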
Method finishGame (lines 72–79) cleans up once the player loses or wins the game.
Line 74 destroys the controlPanel, and line 75 stops the camera animation that follows
the boat. Line 76 displays the variable final, the result of the game. Lines 77–79 display
the two animation parameters and then point the camera at final.
Lines 82–103 define method checkRules. Lines 84–85 declare empty animations.
Lines 87–95 check if one of the rules has been violated and change Animation1 and
Animation2 accordingly. If one of the conditions has been violated, lines 97–99 call
finishGame with GAMEOVER as the result parameter. Lines 101–103 call finish-
Game with CONGRATULATIONS as a parameter if all objects were successfully trans-
ported to the other shore.
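A rule check of this kind can be sketched in plain Python. The two rules below are assumptions chosen for illustration (the text above does not state them), and the function and constant names are hypothetical; the sketch returns a result string instead of starting animations.

```python
# Sketch of a rule check. The two rules are ASSUMPTIONS for illustration;
# they are not taken from the program listing.

GAMEOVER = "GAMEOVER"
CONGRATULATIONS = "CONGRATULATIONS"
ALL_OBJECTS = [ "Fox", "Chicken", "Flower" ]

def check_rules( unattended_bank, far_bank ):
   """Return a result string, or None if the game continues."""

   # assumed rule 1: the fox is not left alone with the chicken
   if "Fox" in unattended_bank and "Chicken" in unattended_bank:
      return GAMEOVER

   # assumed rule 2: the chicken is not left alone with the flower
   if "Chicken" in unattended_bank and "Flower" in unattended_bank:
      return GAMEOVER

   # every object has been transported to the other shore
   if sorted( far_bank ) == sorted( ALL_OBJECTS ):
      return CONGRATULATIONS

   return None

print( check_rules( [ "Fox", "Chicken" ], [ "Flower" ] ) )   # GAMEOVER
```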
Method toOtherShore moves the boat between the river shores. Lines 108–109
return without changing anything if the boat is in transit. Line 111 makes global variables
accessible within the method. Lines 113–116 check whether there is an object on the boat and,
if there is, animate that object with the boat. An object on the boat is not a part of the boat, so
a separate animation is created to synchronize the object with the boat.
Lines 119–121 create the animation that moves the boat to the other shore. Line 121
sets an alarm so that, a second after the boat leaves the shore, the program checks whether
the rules have been violated. Alarms are timed events in Alice. Alice.SetAlarm takes
the time to wait until setting off an alarm and a function to call at that time. Optionally,
parameters for that function can be provided. Lines 123–126 switch the current bank
pointer to the other shore.
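The idea of a timed event (wait a given time, then call a function with optional parameters) has a close analogue in the Python standard library. The sketch below uses threading.Timer to imitate that behavior; it is not Alice's SetAlarm, and the names set_alarm and check are hypothetical.

```python
import threading

def set_alarm( delay, function, *arguments ):
   """Call function( *arguments ) once, delay seconds from now."""

   timer = threading.Timer( delay, function, arguments )
   timer.start()
   return timer

results = []

def check( label ):
   """Stand-in for a rule check triggered by the alarm."""
   results.append( label )

alarm = set_alarm( 0.1, check, "rules checked" )
alarm.join()   # wait here only so the sketch can show the result
print( results )
```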
Method boatArrived checks the status of the boat. If the boat is moving across the
river, this method returns 0; otherwise, it returns 1. This is done using two period objects at
the two sides of the river. These objects are placed where Alice is located when at the shore;
by checking the distance between them, the program can determine whether she has arrived.
Line 134 creates the controlPanel for user input. Lines 141–142 create the set of
radio buttons using the list of the objects at the bank and the callback animalSelect.
Lines 143–146 create the buttons for getting the selected object in and out of the boat. Lines
147–148 create a button that sends the boat across the river. Lines 150–152 set the callbacks
for these buttons. Finally, lines 155–156 set the initial selection to "Fox". Figure 24.4
demonstrates what the example world looks like.
1   #!c:\Python\python.exe
2   # CDPlayer.py: A simple CD player using Tkinter and pygame
3
4   import sys
5   import string
6   import pygame, pygame.cdrom
7   from Tkinter import *
8   from tkMessageBox import *
9   import Pmw
10
11  class CDPlayer( Frame ):
12     """A GUI CDPlayer class using Tkinter and pygame"""
13
14     def __init__( self ):
15        """Initialize pygame.cdrom and get CDROM if one exists"""
16
17        pygame.cdrom.init()
18
19        if pygame.cdrom.get_count() > 0:
20           self.CD = pygame.cdrom.CD( 0 )
21        else:
22           sys.exit( "There are no available CDROM drives." )
23
24        self.createGUI()
25        self.updateTime()
26
27     def destroy( self ):
28        """Stop CD, uninitialize pygame.cdrom and destroy GUI"""
29
30        if self.CD.get_init():
31           self.CD.stop()
32
33        pygame.cdrom.quit()
34        Frame.destroy( self )
35
36     def createGUI( self ):
37        """Create CDPlayer widgets"""
38
39        Frame.__init__( self )
40        self.pack( expand = YES, fill = BOTH )
41        self.master.title( "CD Player" )
42
43        # display current track playing
Fig. 24.5 Python CD player (part 1 of 5).
44        self.trackLabel = IntVar()
45        self.trackLabel.set( 1 )
46        self.trackDisplay = Label( self, font = "Courier 14",
47           textvariable = self.trackLabel, bg = "black",
48           fg = "green" )
49        self.trackDisplay.grid( sticky = W+E+N+S )
50
51        # display current time of track playing
52        self.timeLabel = StringVar()
53        self.timeLabel.set( "00:00/00:00" )
54        self.timeDisplay = Label( self, font = "Courier 14",
55           textvariable = self.timeLabel, bg = "black",
56           fg = "green" )
57        self.timeDisplay.grid( row = 0, column = 1, columnspan = 3,
58           sticky = W+E+N+S )
59
60        # play/pause CD
61        self.playLabel = StringVar()
62        self.playLabel.set( "Play" )
63        self.play = Button( self, textvariable = self.playLabel,
64           command = self.playCD, width = 10 )
65        self.play.grid( row = 1, column = 0, columnspan = 2,
66           sticky = W+E+N+S )
67
68        # stop CD
69        self.stop = Button( self, text = "Stop", width = 10,
70           command = self.stopCD )
71        self.stop.grid( row = 1, column = 2, columnspan = 2,
72           sticky = W+E+N+S )
73
74        # skip to previous track
75        self.previous = Button( self, text = "<<<", width = 5,
76           command = self.previousTrack )
77        self.previous.grid( row = 2, column = 0, sticky = W+E+N+S )
78
79        # skip to next track
80        self.next = Button( self, text = ">>>", width = 5,
81           command = self.nextTrack )
82        self.next.grid( row = 2, column = 1, sticky = W+E+N+S )
83
84        # eject CD
85        self.eject = Button( self, text = "Eject", width = 10,
86           command = self.ejectCD )
87        self.eject.grid( row = 2, column = 2, columnspan = 2,
88           sticky = W+E+N+S )
89
90        # pulldown menu of all tracks on CD
91        self.trackChoices = Pmw.ComboBox( self, label_text = "Track",
92           labelpos = "w", selectioncommand = self.changeTrack,
93           fliparrow = 1, listheight = 100 )
94        self.trackChoices.grid( row = 3, columnspan = 4,
95           sticky = W+E+N+S )
96
97        self.trackChoices.component( "entry" ).config( bg = "grey",
Fig. 24.5 Python CD player (part 2 of 5).
206          else:
207             minutes = seconds / 60
208             endMinutes = endSeconds / 60
209             seconds = seconds - ( minutes * 60 )
210             endSeconds = endSeconds - ( endMinutes * 60 )
211
212             # display time in format mm:ss/mm:ss
213             trackTime = string.zfill( str( minutes ), 2 ) + \
214                ":" + string.zfill( str( seconds ), 2 )
215             endTime = string.zfill( str( endMinutes ), 2 ) + \
216                ":" + string.zfill( str( endSeconds ), 2 )
217
218             if self.CD.get_paused():
219
220                # alternate pause symbol and time in display
221                if not self.timeLabel.get() == " || ":
222                   self.timeLabel.set( " || " )
223                else:
224                   self.timeLabel.set( trackTime + "/" + endTime )
225
226             else:
227                self.timeLabel.set( trackTime + "/" + endTime )
228
229       # call updateTime method again after 1000ms ( 1 second )
230       self.after( 1000, self.updateTime )
231
232 def main():
233    CDPlayer().mainloop()
234
235 if __name__ == "__main__":
236    main()
Line 233 creates a CDPlayer object and enters its mainloop. The CDPlayer con-
structor (lines 14–25) initializes the cdrom module (line 17). The if/else statement in
lines 19–22 checks to see if there are any available CD-ROM drives by invoking cdrom’s
get_count function. Function get_count returns the number of CD-ROMs on the
system. If there is at least one CD-ROM, line 20 instantiates a CD object called CD. The
value passed to the CD constructor is the ID of the CD-ROM. The program uses the first
CD-ROM installed on the system if there is more than one. The constructor receives 0 as
an argument because the first ID is always 0. The program exits (line 22) if no CD-ROM
exists.
Line 24 invokes method createGUI to create the CD player interface. createGUI
(lines 36–100) creates various GUI components for the CD player and adds them to the dis-
play. Each component’s action will be discussed later. Note that the Label created to dis-
play the track number (trackDisplay) and the Label created to display the current
track time (timeDisplay) both have textvariables—trackLabel and time-
Label—which will be used to update the CD player display. Notice also that Button
play has a textvariable—playLabel—which will be used to change its display
when the CD player is paused or playing. Lines 91–93 create trackChoices, a Pmw
ComboBox which will be used as a "drop-down" box of track choices. Lines 97–100 use
common “mega-widget” method component to customize the colors of the drop-down
box.
Once the GUI has been created, the constructor calls method updateTime (dis-
cussed later) and returns, entering the mainloop. Once here, the GUI components created
can be used.
The Play button has callback method playCD. playCD (lines 102–135) plays or
pauses the CD. Line 106 checks whether the CD-ROM is initialized by invoking CD method
get_init. If the CD-ROM is not initialized, playCD initializes it and sets current-
Track to 1. currentTrack stores the number of the currently playing track. Line 111
checks if the CD-ROM is empty by invoking CD method get_empty. If the CD-ROM is
empty, line 112 uninitializes the CD-ROM with CD method quit and returns. Otherwise,
line 117 obtains the total number of tracks on the disc from CD method get_numtracks
and stores that value in variable totalTracks. Lines 118–120 then add the track numbers to the
drop-down box of track choices (trackChoices) and select the first one (track 1).
Line 123 checks if CD is not playing and not paused with methods get_busy and
get_paused, respectively. If this is the case, playCD invokes CD method play, specifying
which track to play. Note that because track numbers for a CD object begin with 0 and
people generally expect track numbers to begin with 1, the value passed to play is 1 less than
currentTrack. Line 125 sets the Play button to read "| |", the symbol for pause.
If the CD is playing and not paused, however, lines 129–130 pause the CD (with CD
method pause) and set the Play button to read "Play" again.
If neither condition is met, however, the CD is paused. If this is the case, lines 134 and
135 resume play with method resume and set the Play button to read "| |" once more.
Note that if the CD is currently playing, the Play button reads "| |", and if the CD is cur-
rently paused, the Play button reads "Play".
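The three cases handled by playCD (stopped, playing, paused) can be sketched with a stand-in object so that no CD-ROM drive is needed. FakeCD and toggle_play are hypothetical names; FakeCD imitates only the get_busy, get_paused, play, pause and resume calls named above and makes no claim about how pygame implements them.

```python
# Sketch of the three-way branch in playCD, using a stand-in object.

class FakeCD:
   """Hypothetical stand-in for a CD object (no hardware access)."""

   def __init__( self ):
      self.busy = False
      self.paused = False

   def get_busy( self ):
      return self.busy

   def get_paused( self ):
      return self.paused

   def play( self, track ):
      self.busy, self.paused = True, False

   def pause( self ):
      self.paused = True

   def resume( self ):
      self.paused = False

def toggle_play( cd, currentTrack ):
   """Return the text the Play button should show afterwards."""

   if not cd.get_busy() and not cd.get_paused():
      cd.play( currentTrack - 1 )   # CD tracks are numbered from 0
      return "| |"
   elif cd.get_busy() and not cd.get_paused():
      cd.pause()
      return "Play"
   else:
      cd.resume()
      return "| |"

cd = FakeCD()
labels = [ toggle_play( cd, 1 ) for press in range( 3 ) ]
print( labels )   # ['| |', 'Play', '| |']
```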
The Stop button has callback stopCD (lines 137–142). Line 140 checks if CD is ini-
tialized. If so, CD method stop is invoked to stop the CD and the Play button is set to read
"Play" once more. Note that calling stop on a CD which is not playing does nothing.
However, line 140 checks if the CD-ROM is initialized because if it is not, calling stop
generates an error.
The >>> button has callback nextTrack. nextTrack (lines 159–164) skips to the
next track on the CD. If CD is initialized and the current track is not the last one, method
playTrack is invoked, with the next track number specified (currentTrack + 1).
Similarly, the <<< button has callback previousTrack. previousTrack (lines
166–170) skips to the previous track on a CD. If CD is initialized and the current track is
not the first one, method playTrack is invoked, with the previous track number specified
(currentTrack - 1).
Method playTrack (lines 144–157) plays a specified track of the CD. If the CD is
initialized, line 148 sets currentTrack to the specified track number. Lines 149–150
then set trackLabel to the new track number and select the specified track number from
the dropdown box. If the CD is currently playing another track, line 154 simply plays the
specified track instead. If the CD is paused, however, lines 156–157 begin play of the spec-
ified track and then call method playCD to re-pause the disc.
The dropdown box (trackChoices) has callback method changeTrack. When
the user selects a track number from the listbox, changeTrack (lines 172–178) is
invoked. If CD is initialized, lines 176–177 obtain the index of the selection with Tkinter
Listbox method curselection. Line 178 invokes method playTrack to play the
selected track (index + 1).
The Eject button has callback method ejectCD (lines 180–193). Line 183 displays
a tkMessageBox window which asks the user if the CD should be ejected. This is a safe-
guard against accidental ejection. If the user chooses to eject the CD, CD is initialized (the
CD may not be playing), the disc is ejected with CD method eject and CD is uninitialized
(lines 186–188). Lines 189–193 set the CD player interface to its initial appearance.
The CD player updates its display with method updateTime, originally called in line
25. updateTime (lines 195–230) updates the CD player display (lines 198–227) and
invokes common widget method after. after registers a callback that is called after a
specified number of milliseconds. Line 230 ensures that method updateTime is called
every 1000 milliseconds (one second). Line 198 checks if CD is initialized. If not, execution
skips to line 230.
Otherwise, the current number of seconds into the currently playing track is obtained
from CD method get_current and stored in variable seconds (line 199).
get_current returns a two-element tuple of the current track number and the number
of seconds into that track. Lines 200–201 obtain the track length from CD method
get_track_length, specifying the current track (currentTrack - 1). This value is
stored in variable endSeconds. Lines 204–205 ensure that one track plays consecutively
after another until the entire disc has been played. Lines 207–210 use seconds and
endSeconds to determine the current time and end time in minutes and seconds.
Lines 213–214 create a string for the current track time (trackTime). The string has
the form mm:ss where mm is minutes and ss is seconds. Note that string function zfill
pads the string with zeros so that it occupies the correct number of spaces. This ensures that
minutes or seconds in the range 0–9 (inclusive) result in strings of the same length as other
minute or second values.
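The conversion from a number of seconds to an mm:ss string can be tried on its own. This sketch uses built-in function divmod and string method zfill rather than the string module functions in the listing; format_time is a hypothetical helper name.

```python
def format_time( seconds ):
   """Return seconds formatted as mm:ss, padded with zeros."""

   minutes, seconds = divmod( int( seconds ), 60 )
   return str( minutes ).zfill( 2 ) + ":" + str( seconds ).zfill( 2 )

print( format_time( 75 ) + "/" + format_time( 245 ) )   # 01:15/04:05
```

Because every field is padded to two characters, the display keeps a constant width as the track plays.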
Line 218 determines if the CD is paused. If not, timeDisplay is updated to display
the current time (line 227). Otherwise, timeDisplay is updated to either the current time
or a symbol representing pause (lines 221–224). This ensures that the display flashes
between the track time and the pause symbol when paused.
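The flashing effect depends only on what the display currently shows. The following sketch isolates that decision in a hypothetical function, next_display, which returns the text for the next update; the Tkinter variables are left out.

```python
PAUSE_SYMBOL = " || "

def next_display( currentText, trackTime, endTime, paused ):
   """Return the text the time display should show on the next update."""

   # while paused, alternate between the pause symbol and the time
   if paused and currentText != PAUSE_SYMBOL:
      return PAUSE_SYMBOL

   return trackTime + "/" + endTime

text = "01:15/04:05"

for update in range( 3 ):
   text = next_display( text, "01:15", "04:05", True )
   print( text )
```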
When finished using the CD player, the user destroys the window, invoking the
CDPlayer’s destroy method (lines 27–34). Line 30 checks if CD is initialized. If so,
CD method stop is invoked to stop the CD. If this were not done, the CD would continue
to play after the user destroyed the window. Lines 33–34 uninitialize the pygame cdrom
module and destroy the frame with Frame method destroy.
1   #!c:\Python\python.exe
2   # SpaceCruiser.py: Space Cruiser game using pygame
3
4   import os
5   import sys
6   import random
7   import pygame, pygame.image, pygame.font, pygame.mixer
8   from pygame.locals import *
9
10  class Sprite:
11     """An object to place on the screen"""
12
13     def __init__( self, image ):
14        """Initialize object image and calculate rectangle"""
15
16        self.image = image
17        self.rectangle = image.get_rect()
18
19     def place( self, screen ):
20        """Place the object on the screen"""
21
22        return screen.blit( self.image, self.rectangle )
23
24     def remove( self, screen, background ):
25        """Place the background over the image to remove it"""
26
27        return screen.blit( background, self.rectangle,
28           self.rectangle )
29
30  class Player( Sprite ):
31     """A Player Sprite with 4 different states"""
32
33     def __init__( self, images, crashImage,
34        centerX = 0, centerY = 0 ):
35        """Store all images and set the initial Player state"""
Fig. 24.6 Pygame example (part 1 of 9).
36
37        self.movingImages = images
38        self.crashImage = crashImage
39        self.centerX = centerX
40        self.centerY = centerY
41        self.playerPosition = 1 # start player facing down
42        self.speed = 0
43        self.loadImage()
44
45     def loadImage( self ):
46        """Load Player image and calculate rectangle"""
47
48        if self.playerPosition == -1: # player has crashed
49           image = self.crashImage
50        else:
51           image = self.movingImages[ self.playerPosition ]
52
53        Sprite.__init__( self, image )
54        self.rectangle.centerx = self.centerX
55        self.rectangle.centery = self.centerY
56
57     def moveLeft( self ):
58        """Change Player image to face one position to the left"""
59
60        if self.playerPosition == -1: # player has crashed
61           self.speed = 1
62           self.playerPosition = 0 # move left of obstacle
63        elif self.playerPosition > 0:
64           self.playerPosition -= 1
65
66        self.loadImage()
67
68     def moveRight( self ):
69        """Change Player image to face one position to the right"""
70
71        if self.playerPosition == -1: # player has crashed
72           self.speed = 1
73           self.playerPosition = 2 # move right of obstacle
74        elif self.playerPosition < ( len( self.movingImages ) - 1 ):
75           self.playerPosition += 1
76
77        self.loadImage()
78
79     def decreaseSpeed( self ):
80
81        if self.speed > 0:
82           self.speed -= 1
83
84     def increaseSpeed( self ):
85
86        if self.speed < 10:
87           self.speed += 1
88
89        # player has crashed, start player facing down
Fig. 24.6 Pygame example (part 2 of 9).
90        if self.playerPosition == -1:
91           self.playerPosition = 1
92           self.loadImage()
93
94     def collision( self ):
95        """Change Player image to crashed player"""
96
97        self.speed = 0
98        self.playerPosition = -1
99        self.loadImage()
100
101    def collisionBox( self ):
102       """Return smaller bounding box for collision tests"""
103
104       return self.rectangle.inflate( -20, -20 )
105
106    def isMoving( self ):
107       """Player is not moving if speed is 0"""
108
109       if self.speed == 0:
110          return 0
111       else:
112          return 1
113
114    def distanceMoved( self ):
115       """Player moves twice as fast when facing straight down"""
116
117       xIncrement, yIncrement = 0, 0
118
119       if self.isMoving():
120
121          if self.playerPosition == 1:
122             xIncrement = 0
123             yIncrement = 2 * self.speed
124          else:
125             xIncrement = ( self.playerPosition - 1 ) * self.speed
126             yIncrement = self.speed
127
128       return xIncrement, yIncrement
129
130 class Obstacle( Sprite ):
131    """A moveable Obstacle Sprite"""
132
133    def __init__( self, image, centerX = 0, centerY = 0 ):
134       """Load Obstacle image and initialize rectangle"""
135
136       Sprite.__init__( self, image )
137
138       # move Obstacle to specified location
139       self.positiveRectangle = self.rectangle
140       self.positiveRectangle.centerx = centerX
141       self.positiveRectangle.centery = centerY
142
143       # display Obstacle in moved position to buffer visible area
Fig. 24.6 Pygame example (part 3 of 9).
414       dirtyRectangles = []
415
416       # check for course end
417       if distanceTraveled > COURSE_DEPTH:
418          courseOver = 1
419
420       # check for game over
421       elif timeLeft <= 0:
422          break
423
424    if courseOver:
425       applauseSound.play()
426       message = "Asteroid Field Crossed!"
427    else:
428       gameOverSound.play()
429       message = "Game Over!"
430
431    pygame.display.update( displayMessage( message, screen,
432       background ) )
433
434    # wait until player wants to close program
435    while 1:
436       event = pygame.event.poll()
437
438       if event.type == QUIT or \
439          ( event.type == KEYDOWN and event.key == K_ESCAPE ):
440          break
441
442 if __name__ == "__main__":
443    main()
Lines 255–256 create the black background for the game. First, the program creates a
pygame Surface that is the same size as the window. The size of the window is obtained
from screen method get_size. Surface method convert is then invoked on the
background. convert is used to convert a surface’s pixel format to the display format so
that blits are performed faster. Blits will be discussed later. The call to background’s
fill method fills the background with the color black. The argument passed to fill is
a three-element tuple representing the RGB values of the desired color. Because black has
no red, green or blue, it is represented by (0, 0, 0).
Line 259 blits the background onto the screen. Blitting can be thought of as drawing
an object on a surface. The call to screen’s blit method in line 259 draws the back-
ground onto the screen at position (0, 0). Position (0, 0) represents the upper-left
corner of the screen. Because the background is the same size as the screen, the background
will fill the screen. However, if the screen were visible at this point, it would not yet be
black. Although the background has been blitted, the display has not been updated. This is
done in line 260. The pygame.display function update updates the display. If passed
no arguments, update will update the entire display Surface. We will see later that this
is not always necessary (or efficient).
Lines 262–266 load all necessary sound files. Each line creates a Sound object
(defined in pygame.mixer) from a path created in lines 224–228. Lines 269–278 load
the ship images. In our game, the ship has four possible states: moving left, moving down,
moving right and crashed. Because of the implementation of class Player (discussed
later), the paths to the images representing the first three states are appended to list
shipFiles (lines 232–234). The for/in loop at lines 271–274 iterates over this list, loading
each image. Line 272 loads an image with pygame.image function load. Note that just
as the background’s pixel format was converted, the pixel format of each image loaded
must be converted. The value returned by load is a pygame Surface, which is stored in
variable surface. Line 273 invokes surface method get_at to obtain the color of
the image at position (0, 0). For each image, the color at this position is white. surface
method set_colorkey is then passed this color. The effect is that the color white will
appear transparent for each surface. Each surface is appended to list loadedImages.
Lines 277–278 similarly load the image representing the crashed state.
Line 281 invokes screen method get_width to obtain the width of the window.
Because we want our ship to appear halfway across the screen, centerX is assigned half
of this value. Line 282 creates a Player object and assigns it to variable theShip. The
arguments passed to the Player constructor ensure that the ship appears halfway across
the screen, 25 pixels from the top. We will now discuss two classes, Sprite and Player.
Class Sprite (lines 10–28) defines any object that we place on the screen. The
Sprite constructor takes as input a pygame Surface called image. Line 16 stores this
Surface in class attribute image. Line 17 computes the image’s bounding rectangle
with Surface method get_rect, and stores it in attribute rectangle. The object
returned by get_rect is a pygame rectstyle.
A pygame rectstyle represents a rectangular area and may have three possible forms.
The first is a four-element sequence of the form [ xpos, ypos, width, height ], where xpos
and ypos are the coordinates of the upper-left corner of the rectangle, and width and height
are the dimensions of the rectangle. The second is a pair of sequences of the form [ [ xpos,
ypos ], [ width, height ] ]. The third is an instance of class pygame.Rect. A Rect object
represents a rectangle as well, but also has several useful methods. The rectstyle returned
by get_rect is a Rect object with xpos and ypos of 0. Many pygame functions accept
rectstyles as arguments rather than just Rect objects (including the Rect constructor). In
this case, it is possible (and more convenient) to simply pass the function a four-element
sequence.
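To make the two sequence forms concrete, the sketch below reduces either one to a single four-element tuple. It is plain Python with a hypothetical helper name, normalize_rectstyle, and does not show how pygame itself interprets rectstyles. A Rect object, which can be treated as a four-element sequence, should take the first branch.

```python
def normalize_rectstyle( rectstyle ):
   """Return ( xpos, ypos, width, height ) for a sequence rectstyle."""

   if len( rectstyle ) == 4:

      # form 1: [ xpos, ypos, width, height ]
      xpos, ypos, width, height = rectstyle
   elif len( rectstyle ) == 2:

      # form 2: [ [ xpos, ypos ], [ width, height ] ]
      ( xpos, ypos ), ( width, height ) = rectstyle
   else:
      raise ValueError( "not a recognized rectstyle" )

   return ( xpos, ypos, width, height )

print( normalize_rectstyle( [ 10, 20, 30, 40 ] ) )
print( normalize_rectstyle( [ [ 10, 20 ], [ 30, 40 ] ] ) )
```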
Sprite method place (lines 19–22) “places” the object on the screen. place takes
as an argument Surface screen. screen’s blit method is invoked (line 22) to draw
the object at position rectangle. Note that changes to rectangle will change where
the object is drawn. place then returns the value returned by blit, a Rect representing
the area blitted.
Sprite method remove (lines 24–28) “removes” an object from the screen by
drawing the background over it (lines 27–28). Note that this call to blit has three argu-
ments, two of which are rectangle. The third argument specifies what section of
background to draw at position rectangle. If no third argument were specified, the
entire background would be drawn at rectangle. remove returns a Rect representing
the area blitted.
Class Player (lines 30–128) represents the object controlled by the player which
appears to move across the screen. In the game, this object is a spaceship. Player inherits
from class Sprite. Line 282 creates a Player object, invoking Player’s constructor
(lines 33–43). Lines 37–40 store the image surfaces and starting position into class
attributes. Line 41 sets playerPosition to 1. playerPosition is the index of the
current image being displayed. Because movingImages is a list of length 3, the indices
0, 1, 2 represent moving left, moving down and moving right, respectively. Thus, line 41
starts the Player in state moving down. playerPosition of –1 indicates the player
has crashed. Line 42 sets attribute speed to 0, and line 43 calls method loadImage.
loadImage (lines 45–55) updates attributes of Player. Lines 48–51 determine the
correct image to use. If the player has not crashed, the image representing the current player
state is used (line 51). Line 53 invokes Sprite’s constructor to update the image and
rectangle attributes. Lines 54–55 move the object to the correct position by changing
rectangle’s centerx and centery attributes.
Player methods moveLeft and moveRight are called when the player presses
the left and right arrow keys, respectively. Because they are similar, we will discuss them
together. First, an if statement checks if the player has crashed (i.e., playerPosition
is –1). If so, speed is set to 1 and the player is moved either to the left (line 62) or right (line
73) of the obstacle. Otherwise, if the player is not as far left or right as possible, we move
the player left (line 64) or right (line 75) one position. Finally, method loadImage
updates the image.
Method decreaseSpeed (lines 79–82) is called when the user presses the up arrow
key. decreaseSpeed decreases attribute speed by 1. Pressing the down arrow key
invokes method increaseSpeed (lines 84–92). increaseSpeed increases speed
by 1. Lines 90–92 test if the player has crashed. If so, playerPosition is set to 1
(moving down) and the image is updated (line 92).
Player method collision (lines 94–99) is called when the ship collides with an
asteroid. collision sets speed to 0, sets playerPosition to –1 (crashed) and
invokes method loadImage. Collisions are tested for with the Rect returned by method
collisionBox (lines 101–104). collisionBox calls Rect method inflate and
returns the results. inflate returns a new Rect which represents the calling Rect
reduced or enlarged around its center by a specified amount. Note that we test for collisions
with smaller bounding rectangles for playability purposes. Most likely, the image the
player is using does not completely fill its rectangle. It would become frustrating to the
player if collisions were to occur when bounding rectangles intersected, but images did not.
Using smaller bounding rectangles for collision detection is sometimes referred to as sub-
rectangle collision.
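The effect of testing with smaller rectangles can be seen in a plain-Python sketch. The functions inflate and collide below are simplified stand-ins written for this illustration; they operate on ( xpos, ypos, width, height ) tuples and are not pygame's own implementations. The coordinates are made-up sample values.

```python
def inflate( rect, dx, dy ):
   """Return rect resized by dx, dy around its center (negative shrinks)."""

   xpos, ypos, width, height = rect
   return ( xpos - dx // 2, ypos - dy // 2, width + dx, height + dy )

def collide( a, b ):
   """Return True if rectangles a and b overlap."""

   ax, ay, aw, ah = a
   bx, by, bw, bh = b
   return ax < bx + bw and bx < ax + aw and \
      ay < by + bh and by < ay + ah

ship = ( 100, 25, 60, 60 )       # sample values only
asteroid = ( 155, 30, 60, 60 )

# the full bounding rectangles overlap by a few pixels
print( collide( ship, asteroid ) )

# the reduced rectangles do not, so no crash is reported
print( collide( inflate( ship, -20, -20 ),
   inflate( asteroid, -20, -20 ) ) )
```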
Method distanceMoved (lines 114–128) determines the current change in player
position. Line 119 invokes method isMoving (lines 106–112) to test if the player is
moving. If so, xIncrement and yIncrement must be calculated. Lines 121–126 use
playerPosition and speed to determine the distance moved. Note that when moving
down, the player moves twice as fast in the vertical direction as when moving left or moving
right.
Once a Player is instantiated (line 282), the program creates the asteroids. Lines 285–286
load the asteroid image, setting white to transparent. The for/in loop in lines 289–
291 creates NUMBER_ASTEROID asteroids. Each asteroid is an instance of class
Obstacle (discussed later). The arguments passed to Obstacle’s constructor ensure
that each asteroid will be randomly placed on the screen. Note that the values passed to
random.randrange are larger than the screen size in order to buffer the visible area.
The game will simulate ship movement by moving these asteroids up the screen. The direc-
tion the asteroids move depends upon the current state of the ship. When an asteroid moves
off the top of the screen, it will be placed on the bottom of the screen again, creating a
scrolling effect.
Class Obstacle (lines 130–167) inherits from Sprite. An Obstacle represents
an object which the player must avoid. In our game, this object is an asteroid. When an
Obstacle is created, its constructor (lines 133–144) is invoked. Line 136 calls the
Sprite constructor to initialize the image and rectangle attributes.
Because we want asteroids to move off the screen completely (i.e., into negative screen
coordinates) before removing them and placing them back on the screen, we must buffer
the visible area. In order to do so, we must keep track of two locations for each Obstacle.
rectangle represents the actual location of the asteroid. This is where we place the
object. positiveRectangle represents the coordinates of rectangle shifted into
positive screen coordinates. Lines 139–141 create and initialize the position of posi-
tiveRectangle. Line 144 updates rectangle by invoking Rect method move.
The effect is that rectangle is now a rectangle of the same dimensions as
positiveRectangle, but shifted by –60 pixels in both the x and y directions.
Obstacle method move (lines 146–162) is used to move the object. move requires
arguments xIncrement and yIncrement. Recall that class Player has method
distanceMoved. This method returns the necessary values. Lines 149–150 move the
position of positiveRectangle up the screen by the specified amounts. The if state-
ment at line 153 checks if the asteroid has reached the top of the screen. If so, lines 154–
155 add a random integer to the xpos of positiveRectangle. This ensures that the
next time the asteroid appears on the screen, it will not have the same x coordinate as its
previous pass. If these lines were omitted, the asteroid positions would appear to loop,
making gameplay boring. Notice that the program treats positiveRectangle, a Rect
object, as if it were a four-element sequence of the form [ xpos, ypos, width, height ]. Lines
158–159 make sure that the xpos and ypos of positiveRectangle are within range.
Finally, now that positiveRectangle has been updated, Rect method move (line
162) obtains the new rectangle value.
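The wraparound logic described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the listing from the game: the screen size, the function name move_obstacle and the use of (x, y) tuples in place of Rect objects are assumptions made for the example. Only the 60-pixel buffer comes from the discussion above.

```python
import random

SCREEN_WIDTH, SCREEN_HEIGHT = 640, 480      # hypothetical screen size
BUFFER = 60                                 # the 60-pixel shift described above
FIELD_WIDTH = SCREEN_WIDTH + 2 * BUFFER     # buffered area, in positive coordinates
FIELD_HEIGHT = SCREEN_HEIGHT + 2 * BUFFER


def move_obstacle(positive, x_increment, y_increment, rng=random):
    """Return (new positive position, new actual position) as (x, y) tuples."""
    x, y = positive
    x -= x_increment
    y -= y_increment                        # asteroids scroll up the screen

    if y <= 0:                              # reached the top of the buffered area
        x += rng.randrange(1, FIELD_WIDTH)  # re-enter at a different x coordinate

    x %= FIELD_WIDTH                        # keep both coordinates within range
    y %= FIELD_HEIGHT

    # the actual position is the positive position shifted by the buffer
    return (x, y), (x - BUFFER, y - BUFFER)
```

Keeping the bookkeeping in positive coordinates means that the modulus operator alone performs the wrap; the shift into negative screen coordinates happens only when the actual position is computed.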
As with class Player, collisions with an Obstacle are tested using the Rect returned
by method collisionBox (lines 164–167). collisionBox calls Rect method
inflate and returns the result.
After the creation of the asteroids (lines 289–291), methods load and convert load
and convert the energy pack image (line 294), setting white to transparent (line 295).
During gameplay, energy packs will be created from class Objective. Objective
(lines 169–184) has a constructor (lines 173–179) and method move (lines 181–185)
similar to those of class Obstacle. Line 297 invokes Sound method play to play
startSound. The player will hear this sound when the game begins. Line 298 invokes
pygame.time function set_timer to generate a USEREVENT event every 1000ms
(one second). USEREVENT is a pygame constant which represents a user-defined event.
The effect of line 298 is that every second, a USEREVENT event will be placed onto SDL’s
event queue. pygame’s event system will be discussed in detail later.
The while loop in lines 300–422 plays the game. Each iteration checks that
courseOver is still 0. If it is, the asteroid field has not yet been crossed, and gameplay
continues. Lines 303–308 use pygame module time to ensure that the game does not run
too fast. Line 302 invokes pygame.time function get_ticks. get_ticks returns
the time, in milliseconds, since pygame.time was imported. This value is stored in vari-
able currentTime. If currentTime is less than nextTime, the previous number of
"ticks" plus a constant (WAIT_TIME), we invoke time function delay (line 306).
delay pauses the execution for a given number of milliseconds. The value passed to
delay is the number of milliseconds remaining until nextTime.
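The timing technique just described can be tried without pygame. The following sketch is an approximation under stated assumptions: get_ticks here is a stand-in built on the standard time module, the value of WAIT_TIME is hypothetical, and the names remaining_delay and run_frames do not appear in the game.

```python
import time

WAIT_TIME = 20                    # hypothetical minimum milliseconds per frame
_START = time.monotonic()


def get_ticks():
    """Milliseconds since this module was loaded (stand-in for get_ticks)."""
    return int((time.monotonic() - _START) * 1000)


def remaining_delay(current_time, next_time):
    """Milliseconds still to wait before next_time; never negative."""
    return max(0, next_time - current_time)


def run_frames(frame_count):
    """Run frame_count iterations, pausing so that none starts too early."""
    next_time = 0
    for _ in range(frame_count):
        current_time = get_ticks()
        if current_time < next_time:
            # pause for the time remaining until next_time
            time.sleep(remaining_delay(current_time, next_time) / 1000.0)
        next_time = get_ticks() + WAIT_TIME
        # ... event handling, movement and drawing would go here ...
```

Because the pause is computed from the time remaining, a slow iteration is not delayed further; only iterations that finish early are held back.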
Next, the program updates the display. In order to update the positions of all objects
on the screen, it would be possible to remove each object, change its position and place
(i.e., blit) it on the screen again. Then pygame.display.update (as in line 260) could
update the entire display. However, updating the entire display is inefficient and slow. A
popular method used to speed up screen updates is called dirty rectangle animation. In dirty
rectangle animation, we maintain a list of rectangles (representing areas of the display)
which have been altered (i.e., have become "dirty"). After removing an object from the
screen, its current rectangle is appended to the list and the object’s position is updated.
Finally, the program places the object back on the screen and appends its new rectangle to
the list. Method update is called with the list of "dirty" rectangles. The effect is that
update will only update those parts of the display which have changed, dramatically
improving game performance. Note that the list of rectangles passed to update can be a list
of any rectstyle.
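To make the idea concrete, the following sketch simulates dirty rectangle animation on a small grid of characters instead of a real display. It is an illustration only: the functions make_surface, blit and update are simplified stand-ins written for this example, not pygame calls, and rectangles are plain (x, y, width, height) tuples.

```python
def make_surface(width, height, fill="."):
    """Create a grid of characters standing in for a drawing surface."""
    return [[fill] * width for _ in range(height)]


def blit(surface, char, rect):
    """Draw char over rect = (x, y, width, height); return the area changed."""
    x, y, w, h = rect
    for row in range(y, y + h):
        for col in range(x, x + w):
            surface[row][col] = char
    return rect


def update(display, surface, dirty_rectangles):
    """Copy only the dirty areas to the display; return the cells copied."""
    copied = 0
    for x, y, w, h in dirty_rectangles:
        for row in range(y, y + h):
            display[row][x:x + w] = surface[row][x:x + w]
            copied += w
    return copied


surface = make_surface(20, 10)              # where objects are drawn
display = make_surface(20, 10)              # what the player sees
old_rect = (2, 4, 2, 2)
update(display, surface, [blit(surface, "#", old_rect)])   # initial drawing

dirty_rectangles = []
dirty_rectangles.append(blit(surface, ".", old_rect))      # remove the object
new_rect = (5, 4, 2, 2)                                    # update its position
dirty_rectangles.append(blit(surface, "#", new_rect))      # place it back
cells_copied = update(display, surface, dirty_rectangles)
dirty_rectangles = []                       # re-initialize for the next frame
```

In this run only 8 of the 200 cells are copied to the display, which is the saving that dirty rectangle animation is designed to achieve.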
The game implements dirty rectangle animation. Lines 311–320 remove the ship, each
asteroid and the energy pack (if one is present) from the screen by invoking their remove
methods. Each time, remove returns a Rect representing the area changed. Each Rect
is appended to list dirtyRectangles.
We now discuss pygame event handling. As with Tkinter, events can be generated
from the keyboard or mouse. pygame also handles various other events, including joystick
events. One method of pygame event handling uses the SDL event queue. As events are
detected, they are placed on the queue. Each Event object on the queue has a type
attribute. Keypress Events have type KEYDOWN. Most user-defined events have type
USEREVENT. A request to quit the game results in a QUIT event.
Line 323 invokes pygame.event function poll. poll returns the next Event
waiting on the queue. This object is stored in variable event. If event is a request to quit
the game (QUIT) or a KEYDOWN event with key attribute K_ESCAPE, the program exits
(line 328). Lines 330–344 check if event was generated by any of the four arrow keys
(K_UP, K_DOWN, K_RIGHT or K_LEFT). If so, the corresponding Player method is
invoked. Recall now that line 298 causes one USEREVENT event to be placed on the event
queue every second. Line 347 checks if event is one of these. If so, timeLeft, the time
remaining to cross the asteroid field, is reduced by 1 (line 348).
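The structure of this event handling can be sketched without pygame. In the example below, the event queue, the Event type and the constants are stand-ins defined for illustration (in the game they are supplied by pygame), and the state dictionary is a hypothetical replacement for the Player object and the timeLeft variable. As in pygame, poll returns an event of type NOEVENT when nothing is waiting.

```python
from collections import deque, namedtuple

# Stand-in constants; in the game these names come from pygame.
NOEVENT, QUIT, KEYDOWN, USEREVENT = "NOEVENT", "QUIT", "KEYDOWN", "USEREVENT"
K_ESCAPE, K_UP = "K_ESCAPE", "K_UP"

Event = namedtuple("Event", ["type", "key"], defaults=[None])
event_queue = deque()


def poll():
    """Return the next waiting event, or a NOEVENT event if none is waiting."""
    return event_queue.popleft() if event_queue else Event(NOEVENT)


def handle(event, state):
    """Apply one event to state; return False when the game should exit."""
    if event.type == QUIT or (event.type == KEYDOWN and event.key == K_ESCAPE):
        return False
    if event.type == KEYDOWN and event.key == K_UP:
        state["speed"] += 1          # hypothetical stand-in for a Player method
    elif event.type == USEREVENT:
        state["time_left"] -= 1      # one second has elapsed
    return True


state = {"speed": 0, "time_left": 60}
event_queue.extend([Event(USEREVENT), Event(KEYDOWN, K_UP), Event(QUIT)])

running = True
while running:                       # one event is handled per iteration
    running = handle(poll(), state)
```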
Lines 351–353 attempt to create a new energy pack. If an energy pack does not exist
(energyPack is None) and randrange returns 0, the program creates a new
energyPack from class Objective. The arguments passed to Objective’s constructor
ensure that the pack will start at a random position at the bottom of the screen. Note that
because the function call passes 100 to randrange, the odds of creating a new pack on any
given iteration, if one does not exist, are 1 in 100.
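The test for creating an energy pack can be isolated as follows. This is a small sketch, assuming a hypothetical function name should_create_pack; the only details taken from the discussion above are the None check and the call to randrange with an argument of 100.

```python
import random


def should_create_pack(energy_pack, rng=random):
    """Return True about once in every 100 calls while no pack exists."""
    return energy_pack is None and rng.randrange(100) == 0
```

Because randrange(100) returns one of the 100 integers from 0 to 99 with equal likelihood, comparing the result with 0 succeeds, on average, once in every 100 iterations of the game loop.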
We then update the positions of the asteroids and energy pack (if one exists). If the ship
is moving (i.e., speed > 0), we retrieve the xIncrement and yIncrement from
Player method distanceMoved (line 357). We update the position of each asteroid
(lines 358–359) and the position of the energy pack (lines 362–363). Line 366 checks if the
energy pack has moved off the top of the screen. If so, we destroy the current energy pack
(line 366). Line 368 increments distanceTraveled.
The next section tests for asteroid collisions. Lines 372–375 create a list,
asteroidBoxes, of Rects returned from each asteroid’s collisionBox method. The call then
passes this list to Rect method collidelist. collidelist returns the index of the
first rectstyle in a list which overlaps the base rectangle. In lines 378–379, the base
rectangle is the Rect returned from the ship’s collisionBox method. When an overlap is
found, collidelist stops checking the remaining list. If no overlap is found,
collidelist returns -1. If the ship has collided with an asteroid (line 383), we play a
collision sound (line 383) and move the offending asteroid out of the way (line 384). Lines 384
and 385 invoke the ship’s collision method and deduct 5 extra seconds from the time
remaining.
Lines 389–395 check if the player has gotten an energy pack. Lines 391–392 invoke
Rect method colliderect. colliderect returns true if the calling Rect overlaps
the argument rectstyle. If the player has, indeed, gotten the energy pack, the game plays
chimeSound, removes the energy pack and adds 5 seconds to the clock (lines 393–395).
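The two collision tests described above can be expressed in plain Python. The functions below are simplified stand-ins written for this discussion, not the pygame implementations; rectangles are (x, y, width, height) tuples, and rectangles that merely touch along an edge are assumed not to overlap.

```python
def collide_rect(a, b):
    """Return True if rectangles a and b overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def collide_list(base, rectangles):
    """Return the index of the first rectangle overlapping base, or -1."""
    for index, rectangle in enumerate(rectangles):
        if collide_rect(base, rectangle):
            return index             # stop checking at the first overlap
    return -1


ship_box = (100, 100, 40, 40)
asteroid_boxes = [(0, 0, 30, 30), (120, 120, 50, 50), (110, 110, 10, 10)]
hit = collide_list(ship_box, asteroid_boxes)    # second asteroid is hit first
```

Returning an index rather than a true/false value lets the caller identify which asteroid was struck, so that asteroid can then be moved out of the way.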
Lines 398–404 place all the objects back on the screen, appending their rectangles
to dirtyRectangles. Lines 407–410 update the clock in the upper-left corner of the
screen. Function updateClock (lines 196–203) removes the previous clock Surface,
creates a new one and blits it onto the screen. A pygame.font.Font object (line 198)
allows the program to render text into a Surface. The Font constructor takes two
arguments. The first is the name of the font file to use. If None is specified, Font will use the
pygame default font file (bluebold.ttf). The second argument is the size of the font. Line
198 creates a Font of type bluebold and size 48. Lines 199–200 invoke font’s render
method to create a new Surface with specified text. render accepts up to four
arguments. The first is the text to create. The second specifies whether to use antialiasing (edge
smoothing) or not. The third is the RGB color to render the font in. The fourth is the RGB
color of the background. If no fourth argument is specified, the text background will be
transparent. updateClock returns both the old (remove) and new (post) rectangles.
Once the clock has been created and blitted on the screen, lines 409–410 append the
clock’s previous rectangle and current rectangle to dirtyRectangles. Line 413 is the
final step in dirty rectangle animation. Every altered area of the display is updated. Without
this line, the player would not see any change in the display. Line 414 re-initializes
dirtyRectangles for the next iteration.
If the player has crossed the asteroid field (line 417), the program sets courseOver
to 1. This ensures that the while loop exits after the current iteration. If not, the program
checks whether the player has run out of time (line 421). If so, the program exits the while
loop.
Once the while loop has been broken, execution continues at line 424 and checks if
the player has won or lost the game. If the player has won, the game plays
applauseSound and sets message to "Asteroid Field Crossed!". Otherwise, the
program plays gameOverSound and sets message to "Game Over!". Lines 431–432
invoke pygame.display function update to display message to the player.
Function displayMessage returns the rectstyle passed to update. displayMessage
(lines 187–193) blits a message on the screen and returns the area of the screen which has
been modified. displayMessage is similar to updateClock. The while loop in
lines 435–440 waits for the user to exit the program.
25
Accessibility
Objectives
• To introduce the World Wide Web Consortium’s Web
Content Accessibility Guidelines 1.0 (WCAG 1.0).
• To understand how to use the alt attribute of the
<img> tag to describe images to people with visual
impairments, mobile-Web-device users, search
engines, etc.
• To understand how to make XHTML tables more
accessible to page readers.
• To understand how to verify that XHTML tags are
used properly and to ensure that Web pages are
viewable on any type of display or reader.
• To understand how VoiceXML™ and CallXML™ are
changing the way people with disabilities access
information on the Web.
• To introduce the various accessibility aids offered in
Windows 2000.
’Tis the good reader that makes the good book...
Ralph Waldo Emerson
Outline
25.1 Introduction
25.2 Web Accessibility
25.3 Web Accessibility Initiative
25.4 Providing Alternatives for Images
25.5 Maximizing Readability by Focusing on Structure
25.6 Accessibility in XHTML Tables
25.7 Accessibility in XHTML Frames
25.8 Accessibility in XML
25.9 Using Voice Synthesis and Recognition with VoiceXML™
25.10 CallXML™
25.11 JAWS® for Windows
25.12 Other Accessibility Tools
25.13 Accessibility in Microsoft® Windows® 2000
25.13.1 Tools for People with Visual Impairments
25.13.2 Tools for People with Hearing Impairments
25.13.3 Tools for Users Who Have Difficulty Using the Keyboard
25.13.4 Microsoft Narrator
25.13.5 Microsoft On-Screen Keyboard
25.13.6 Accessibility Features in Microsoft Internet Explorer 5.5
25.14 Internet and World Wide Web Resources
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
25.1 Introduction
Enabling a Web site to meet the needs of individuals with disabilities is a concern for all
businesses. People with disabilities are a significant portion of the population, and legal
ramifications exist for Web sites that discriminate by not providing adequate and universal
access to their resources. In this chapter, we explore the Web Accessibility Initiative, its
guidelines, various laws regarding businesses and their availability to people with disabil-
ities and how some companies have developed systems, products and services to meet the
needs of this demographic.
Act                               Purpose

Americans with Disabilities Act   The ADA prohibits discrimination on the basis of disability
                                  in employment, state and local government, public
                                  accommodations, commercial facilities, transportation and
                                  telecommunications.

Telecommunications Act of 1996    The Telecommunications Act of 1996 contains two
                                  amendments to Section 255 and Section 251(a)(2) of the
                                  Communications Act of 1934. These amendments require that
                                  communication devices, such as cell phones, telephones and
                                  pagers, be accessible to individuals with disabilities.

Individuals with Disabilities     Education materials in schools must be made accessible to
Education Act of 1997             children with disabilities.

Rehabilitation Act                Section 504 of the Rehabilitation Act states that college
                                  sponsored activities receiving federal funding cannot
                                  discriminate against individuals with disabilities. Section 508
                                  mandates that all government institutions receiving federal
                                  funding design their Web sites such that they are accessible
                                  to individuals with disabilities. Businesses that service the
                                  government also must abide by this act.

Fig. 25.1 Acts designed to protect access to the Internet for people with disabilities.
difficult to achieve because people have varying types of disabilities, language barriers and
hardware and software inconsistencies. However, a high level of accessibility is attainable.
As more people with disabilities use the Internet, it is imperative that Web site designers
increase the accessibility of their sites. The WAI aims for such accessibility, as discussed
in its mission statement described at www.w3.org/WAI.
This chapter explains some of the techniques for developing accessible Web sites. The
WAI published the Web Content Accessibility Guidelines (WCAG) 1.0 to help businesses
determine if their Web sites are accessible to everyone. The WCAG 1.0 (www.w3.org/
TR/WCAG10) uses checkpoints to indicate specific accessibility requirements. Each
checkpoint has an associated priority indicating its importance. Priority-one checkpoints
are goals that must be met to ensure accessibility; we focus on these points in this chapter.
Priority-two checkpoints, though not essential, are highly recommended. If these
checkpoints are not satisfied, people with certain disabilities will experience difficulty
accessing Web sites. Priority-three checkpoints slightly improve accessibility.
At the time of this writing, the WAI is working on the WCAG 2.0 draft. A single check-
point in the WCAG 2.0 Working Draft may encompass several checkpoints from WCAG
1.0; WCAG 2.0 checkpoints will supersede those in WCAG 1.0. Also, WCAG 2.0
supports a wider range of markup languages (e.g., XML and WML) and content types than its
predecessor. To obtain more information about the WCAG 2.0 Working Draft, visit
www.w3.org/TR/WCAG20.
The WAI also presents a supplemental checklist of quick tips, which reinforce ten
important points for accessible Web site design. More information on the WAI Quick Tips
resides at www.w3.org/WAI/References/Quicktips.
Emacspeak is a screen interface that allows greater Internet access to individuals with
visual disabilities by translating text to voice data. The open source product also imple-
ments auditory icons that play various sounds. Emacspeak can be customized with Linux
operating systems and provides support for the IBM ViaVoice speech engine. The
Emacspeak Web site is located at www.cs.cornell.edu/home/raman/emacspeak/emacspeak.html.
In March 2001, We Media introduced the “WeMedia Browser,” which allows people
with poor vision and cognitive disabilities (e.g., dyslexia) to use the Internet more conve-
niently. The WeMedia Browser improves upon the traditional browser by providing over-
sized buttons and keystroke commands for navigation. The user can control the speed and
volume at which the browser “reads” Web page text. The WeMedia Browser is available
for free download at www.wemedia.com.
IBM Home Page Reader (HPR) is another browser that “reads” text selected by the
user. The HPR uses the IBM ViaVoice technology to synthesize a voice. A trial version of
HPR is available at www-3.ibm.com/able/hpr.html.
signed. For example, the CAST eReader, a screen reader developed by the Center for Ap-
plied Special Technology (www.cast.org), starts at the top-left-hand cell and reads
columns from top to bottom, left to right. This procedure is known as reading a table in a
linearized manner. The CAST eReader reads the table in Fig. 25.3 as follows:
44 <tr>
45 <td>Pineapple</td>
46 <td>$2.00</td>
47 </tr>
48
49 </table>
50
51 </body>
52 </html>
This reading does not present the content of the table adequately. WCAG 1.0 recom-
mends using CSS instead of tables, unless the tables’ content linearizes in an understand-
able manner.
If the table in Fig. 25.3 were large, the screen reader’s linearized reading would be
even more confusing to users. By modifying the <td> tag with the headers attribute and
modifying header cells (cells specified by the <th> tag) with the id attribute, a table will
be read as intended. Figure 25.4 demonstrates how these modifications change the way a
table is interpreted.
This table does not appear to be different from a standard XHTML table. However, a
screen reader reads the table in a more intelligent manner. A screen reader
vocalizes the data from the table in Fig. 25.4 as follows:
Every cell in the table is preceded by its corresponding header when read by the screen
reader. This format helps the listener understand the table. The headers attribute is
intended specifically for tables that hold large amounts of data. Most small tables linearize
well as long as the <th> tag is used properly. The summary attribute and the caption
element are also suggested. For more examples demonstrating how to make tables acces-
sible, visit www.w3.org/TR/WCAG.
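The association between data cells and header cells can be modeled in a few lines of Python. This sketch only illustrates the pairing that the headers and id attributes make possible; it does not reproduce the exact words spoken by any screen reader. The function name read_with_headers and the output format are assumptions, and the data are the two rows visible in the listing.

```python
def read_with_headers(header_ids, rows):
    """Pair every data cell with the id of its corresponding header cell."""
    spoken = []
    for row in rows:
        for header, cell in zip(header_ids, row):
            spoken.append("%s: %s" % (header, cell))
    return spoken


header_ids = ["fruit", "price"]                       # id values of the <th> cells
rows = [("Banana", "$1.00"), ("Pineapple", "$2.00")]  # contents of the <td> cells

for phrase in read_with_headers(header_ids, rows):
    print(phrase)
```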
Fig. 25.4 Table optimized for screen reading using attribute headers (part 1 of 2).
47 <tr>
48 <td headers = "fruit">Banana</td>
49 <td headers = "price">$1.00</td>
50 </tr>
51
52 <tr>
53 <td headers = "fruit">Pineapple</td>
54 <td headers = "price">$2.00</td>
55 </tr>
56
57 </table>
58
59 </body>
60 </html>
Fig. 25.4 Table optimized for screen reading using attribute headers (part 2 of 2).
WCAG 1.0 suggests using Cascading Style Sheets (CSS) as an alternative to frames,
because CSS provides similar functionality and is highly customizable. Unfortunately, the
ability to display multiple XHTML documents in a single browser window requires the
complete support of HTML 4, which is not widespread. The second generation of
Cascading Style Sheets (CSS2) can display a single document as if it were several documents,
but CSS2 is not yet fully supported by many user agents.
96 </block>
97 </form>
98
99 </vxml>
80 object-oriented development.
81 <assign name = "currentOption" expr = "'c'"/>
82 <goto next = "#repeat"/>
83 </block>
84 </form>
85
86 <form id = "cplus">
87 <block>
88 The C++ how to program, second edition.
89 With nearly 250,000 sold, Harvey and Paul Deitel's C++
90 How to Program is the world's best-selling introduction
91 to C++ programming. Now, this classic has been thoroughly
92 updated! The new, full-color Third Edition has been
93 completely revised to reflect the ANSI C++ standard, add
94 powerful new coverage of object analysis and design with
95 UML, and give beginning C++ developers even better live
96 code examples and real-world projects. The Deitels' C++
97 How to Program is the most comprehensive, practical
98 introduction to C++ ever published with hundreds of
99 hands-on exercises, roughly 250 complete programs written
100 and documented for easy learning, and exceptional insight
101 into good programming practices, maximizing performance,
102 avoiding errors, debugging, and testing. This new Third
103 Edition covers every key concept and technique ANSI C++
104 developers need to master: control structures, functions,
105 arrays, pointers and strings, classes and data
106 abstraction, operator overloading, inheritance, virtual
107 functions, polymorphism, I/O, templates, exception
108 handling, file processing, data structures, and more. It
109 also includes a detailed introduction to Standard
110 Template Library containers, container adapters,
111 algorithms, and iterators.
112 <assign name = "currentOption" expr = "'cplus'"/>
113 <goto next = "#repeat"/>
114 </block>
115 </form>
116
117 <form id = "repeat">
118 <field name = "confirm" type = "boolean">
119
120 <prompt>
121 To repeat say yes. Say no, to go back to home.
122 </prompt>
123
124 <filled>
125 <if cond = "confirm == true">
126 <goto expr = "'#' + currentOption"/>
127 <else/>
128 <goto next = "#publication"/>
129 </if>
130 </filled>
Fig. 25.6 Publication page of Deitel’s VoiceXML page (part 3 of 4).
131 </field>
132 </form>
133 </vxml>
Computer:
Welcome to the voice page of Deitel and Associates. To exit any time
say exit. To go to the home page any time say home.
User:
Home
Computer:
You have just entered the Deitel home page. Please make a selection by
speaking one of the following options: About us, Driving directions,
Publications.
User:
Driving directions
Computer:
Directions to Deitel and Associates, Inc.
We are located on Route 20 in Sudbury,
Massachusetts, equidistant from route 128
and route 495.
To repeat say yes. To go back to home, say no.
The menu element on line 26 enables users to select the page to which they would like
to link. The choice element, which is always part of either a menu or a form, presents
the options. The next attribute indicates the page to be loaded when a user makes a selec-
tion. The user selects a choice element by speaking the text marked up between the tags
into a microphone. In this example, the first and second choice elements on lines 41–42
transfer control to a local dialog (i.e., a location within the same document) when they are
selected. The third choice element transfers the user to the document publica-
tions.vxml. Lines 27–33 use element prompt to instruct the user to make a selection.
Attribute count maintains the number of times a prompt is spoken (i.e., each time a
prompt is read, count increments by one). The count attribute transfers control to
another prompt once a certain limit has been reached. Attribute timeout specifies how
long the program should wait after outputting the prompt for users to respond. In the event
that the user does not respond before the timeout period expires, lines 35–39 provide a
second, shorter prompt to remind the user to make a selection.
When the user chooses the publications option, the publications.vxml
(Fig. 25.6) loads into the browser. Lines 106–111 define link elements that provide links
to main.vxml. Lines 112–114 provide links to the menu element (lines 118–138), which
asks users to select one of the publications: Java, C or C++. The form elements on lines
140–214 describe each of the books on these topics. Once the browser speaks the descrip-
tion, control transfers to the form element with an id attribute that has a value equal to
repeat (lines 216–231).
Figure 25.7 provides a brief description of each VoiceXML tag used in the previous
example (Fig. 25.6).
25.10 CallXML™
Another advancement in voice technology for people with visual impairments is CallXML,
a technology created and supported by Voxeo (www.voxeo.com). CallXML creates
phone-to-Web applications that control incoming and outgoing telephone calls. Some ex-
amples of CallXML applications include voice mail, interactive voice response systems
and Internet call waiting. While VoiceXML assists individuals with visual impairments by
reading Web pages, CallXML provides individuals with visual impairments access to Web-
based content through telephones.
When users access CallXML applications, a text-to-speech (TTS) engine reads infor-
mation contained within CallXML elements. A TTS engine converts text to an automated
voice. Web applications respond to the caller’s input. [Note: A touch-tone phone is required
to access CallXML applications.]
Typically, CallXML applications play pre-recorded audio clips or text as output,
requesting a response as input. An audio clip may contain a greeting that introduces callers
to the application or to a menu of options that requires callers to make touch-tone entries.
Certain applications, such as voice mail, may require verbal and touch-tone input. Once the
input is received, the application responds by invoking CallXML elements such as text,
which contains the information a TTS engine reads to users. If the application does not
receive input within a designated time frame, it prompts the user to enter valid input.
When a user accesses a CallXML application, the incoming telephone call is referred
to as a session. A CallXML application can support multiple sessions, enabling the appli-
cation to receive multiple telephone calls simultaneously. Each session is independent of
the others and is assigned a unique sessionID for identification. A session terminates either
when the user hangs up the telephone or when the CallXML application invokes the
hangup element. Our first CallXML example shows the classic Hello World example
(Fig. 25.8).
Line 1 contains the optional XML declaration. Value version indicates the XML
version to which the document conforms. Currently, this is version = 1.0. Value
encoding indicates the type of Unicode encoding to use. This example uses UTF-8,
which represents each character as a sequence of one or more 8-bit units. More information
on Unicode may be found in Appendix G, Unicode®.
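A short Python check shows how UTF-8 uses 8-bit units. The function name utf8_length is chosen for this illustration; the sample characters are written as Unicode escapes so that the example does not depend on the encoding of the source file.

```python
def utf8_length(text):
    """Return the number of 8-bit units (bytes) in the UTF-8 form of text."""
    return len(text.encode("utf-8"))


# an ASCII letter, an accented letter and the euro sign
for sample in ("A", "\u00e9", "\u20ac"):
    print(ascii(sample), utf8_length(sample))
```

ASCII characters such as those in Hello World occupy one byte each, whereas characters outside the ASCII range occupy two or more bytes.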
The <callxml> tag on line 6 declares the contents of a CallXML document. Line 7
contains the Hello World text. All text spoken by a text-to-speech (TTS) engine needs
to reside within <text> tags.
To deploy a CallXML application, register with the Voxeo Community
(community.voxeo.com), a Web resource for creating, debugging and deploying phone
applications. For the most part, Voxeo is a free Web resource. However, the company charges
fees when CallXML applications are deployed commercially. The Voxeo Community
assigns a unique telephone number to each CallXML application so that external users may
access and interact with the application. [Note: Voxeo assigns telephone numbers to appli-
cations that reside on the Internet. If you have access to a Web server (IIS, PWS, Apache,
etc.), use it to post your CallXML application. Otherwise, open an Internet account using
one of the many Internet-service companies (e.g., www.geocities.com,
www.angelfire.com). These companies allow you to post documents on the Internet
by using their Web servers.]
Figure 25.8 demonstrates the logging feature of the Voxeo Account Manager,
which is accessible to registered members. The logging feature records and displays the
“conversation” between the user and the application. The first row of the logging feature
displays the URL of the CallXML application and the global variables associated with each
session. The application (program) creates and assigns values to global variables, which the
entire application can access and modify, at the start of each session. The subsequent row(s)
display(s) the “conversation.” This example shows a one-way conversation (because the
application does not accept any input from the user) in which the TTS says Hello World.
The last row shows the end of session message, which states that the phone call has ter-
minated. The logging feature assists developers in debugging their applications. By
observing the “conversation,” a developer can determine at which point the application ter-
minates. If the application terminates abruptly (“crashes”), the logging feature states the
type and location of the error, so that a developer knows the particular section of the appli-
cation on which to focus.
The next example (Fig. 25.9) shows a CallXML application that reads the ISBN values
of three Deitel textbooks—Internet and World Wide Web How to Program: Second Edi-
tion, XML How to Program and Java How to Program: Fourth Edition—based on the
user’s touch-tone input. [Note: The following code has been formatted for presentation pur-
poses.]
© Copyright 1992–2002 by Deitel & Associates, Inc. All Rights Reserved. 8/29/01
pythonhtp1_25.fm Page 1129 Wednesday, August 29, 2001 3:08 PM
7 <block>
8 <text>
9 Welcome. To obtain the ISBN of the Internet and World
10 Wide Web How to Program: Second Edition, please enter 1.
11 To obtain the ISBN of the XML How to Program,
60 </text>
61 </onTermDigit>
62
63 <onTermDigit value = "4">
64 <text>
65 Thank you for calling our CallXML application.
66 Good-bye.
67 </text>
68 </onTermDigit>
69 </block>
70
71 <!-- event handler that terminates the call -->
72 <onHangup />
73 </callxml>
Fig. 25.9 CallXML example that reads three ISBN values (part 3 of 3).
The <block> tag (line 7) encapsulates other CallXML tags. Usually, CallXML tags
that perform a similar task should be enclosed within <block>...</block>. The block
element in this example encapsulates the <text>, <getDigits>, <onMaxSilence>
and <onTermDigit> tags. A block element can contain nested block elements.
Lines 20–23 show some attributes of the <getDigits> tag. The getDigits ele-
ment obtains the user’s touch-tone response and stores it in the variable declared by the
var attribute (i.e., ISBN). The maxDigits attribute (line 21) indicates the maximum
number of digits that the application can accept. This application accepts only one char-
acter. If no number is stated, then the application uses the default value—nolimit.
The termDigits attribute (line 22) contains the list of characters that terminate user
input. When a character from this list is received as input, the CallXML application is noti-
fied that the last acceptable input has been received and that any character entered after this
point is invalid. These characters do not terminate the call; they simply notify the applica-
tion to proceed to the next step because the necessary input has been received. In our
example, the values for termDigits are one, two, three or four. The default value for
termDigits is the null value ("").
The maxTime attribute (line 23) indicates the maximum amount of time to wait for a
user response (i.e., 60 seconds). If no input is received within the given time frame, then
the CallXML application may terminate—a drastic measure. The default value for this
attribute is 30 seconds.
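The way maxDigits, termDigits and maxTime interact can be modeled in a few lines of Python. The following is only a sketch of the behavior described above, not part of CallXML or of any Voxeo software. The names Collected and collect_digits are hypothetical, each key press is represented as a (seconds of silence before the key, key) pair, and we assume that the terminating key is reported separately rather than appended to the collected digits.

```python
from typing import NamedTuple, Optional


class Collected(NamedTuple):
    digits: str                        # keys accepted before collection stopped
    event: str                         # "termDigit", "maxDigits" or "maxTime"
    term_digit: Optional[str] = None   # key that ended the input, if any


def collect_digits(presses, max_digits=None, term_digits="", max_time=30.0):
    """Model touch-tone collection (illustrative only, not a CallXML API).

    presses is a sequence of (delay_in_seconds, key) pairs. The defaults
    mirror the defaults named in the text: no digit limit, no terminating
    characters and a 30-second wait.
    """
    digits = ""
    for delay, key in presses:
        if delay > max_time:                 # silence lasted too long
            return Collected(digits, "maxTime")
        if key in term_digits:               # terminating character received
            return Collected(digits, "termDigit", key)
        digits += key
        if max_digits is not None and len(digits) >= max_digits:
            return Collected(digits, "maxDigits")
    # Assumption: running out of key presses is treated as silence.
    return Collected(digits, "maxTime")


# Settings similar to those discussed for the ISBN example.
print(collect_digits([(2.0, "1")], max_digits=1,
                     term_digits="1234", max_time=60.0))
```

The term-digit test is performed before the digit is stored, so a key listed in termDigits always ends the input even when maxDigits is 1.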
The onMaxSilence element (lines 27–37) is an event handler that is invoked when
the maxTime (or maxSilence) expires. An event handler notifies the application of the
appropriate action to perform. In this case, the application asks the user to enter a value
because the maxTime has expired. After receiving input, getDigits (line 32) stores the
value in the ISBN variable.
The onTermDigit element (lines 39–68) is an event handler that notifies the appli-
cation of the appropriate action to perform when users select one of the termDigits
characters. At least one <onTermDigit> tag must be associated with the getDigits
element, even if the default value ("") is used. We provide four actions that the application
can perform depending on the user-entered value. For example, if the user enters 1, the
application reads the ISBN value of the Internet and World Wide Web How to Program:
Second Edition textbook.
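Event handlers of this kind behave much like a dispatch table that maps an event to the action the application should perform. The Python sketch below illustrates the idea under stated assumptions: the function handle_event and the table TERM_DIGIT_ACTIONS are hypothetical names, the spoken strings are placeholders rather than real ISBN values, and the assignment of the digits 2 and 3 to particular titles is assumed from the order in which the titles are introduced.

```python
# Placeholder responses; the real application would speak actual ISBN values.
TERM_DIGIT_ACTIONS = {
    "1": "ISBN of Internet and World Wide Web How to Program: Second Edition",
    "2": "ISBN of XML How to Program",                    # assumed mapping
    "3": "ISBN of Java How to Program: Fourth Edition",   # assumed mapping
    "4": "Thank you for calling our CallXML application. Good-bye.",
}


def handle_event(event, value=None):
    """Return the text to speak for an event, or None if nothing is spoken."""
    if event == "onMaxSilence":
        # The time limit expired, so the caller is asked for input again.
        return "Please enter a value."   # placeholder wording
    if event == "onTermDigit":
        if value not in TERM_DIGIT_ACTIONS:
            raise ValueError("no onTermDigit handler for %r" % (value,))
        return TERM_DIGIT_ACTIONS[value]
    if event == "onHangup":
        return None                      # empty handler: no action to perform
    raise ValueError("unknown event %r" % (event,))


print(handle_event("onTermDigit", "4"))
```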
Line 72 contains the <onHangup/> event handler, which terminates the telephone
call when the user hangs up the telephone. Our <onHangup> event handler is an empty
tag (i.e., there is no action to perform when this tag is invoked).
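Because a CallXML application is an XML document, its structure can be examined with Python’s standard xml.etree.ElementTree module. The following sketch parses a shortened, hypothetical document that uses only the elements and attributes discussed above. The prompt wording and the attribute values (for example, "60s" for maxTime) are assumptions made for illustration and are not copied from Fig. 25.9.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened CallXML document. Attribute value formats
# such as "60s" are assumptions, not taken from the original listing.
CALLXML_SOURCE = """
<callxml>
   <block>
      <text>Please enter 1, 2, 3 or 4.</text>
      <getDigits var = "ISBN" maxDigits = "1"
         termDigits = "1234" maxTime = "60s" />
      <onMaxSilence>
         <text>Please enter a value.</text>
      </onMaxSilence>
      <onTermDigit value = "4">
         <text>Good-bye.</text>
      </onTermDigit>
   </block>
   <onHangup />
</callxml>
"""


def describe(element, depth=0):
    """Return one indented line per element, listing its attributes."""
    attributes = " ".join(
        '%s="%s"' % (name, value) for name, value in element.attrib.items())
    line = "   " * depth + element.tag
    if attributes:
        line += " " + attributes
    lines = [line]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(CALLXML_SOURCE)
outline = describe(root)
print("\n".join(outline))
```

The printed outline makes the nesting visible: the block element contains the text, getDigits, onMaxSilence and onTermDigit elements, whereas onHangup is a direct child of callxml.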
The logging feature in Fig. 25.9 displays the “conversation” between the application
and the user. The first row displays the URL of the application and the global variables of
the session. The subsequent rows display the “conversation”—the application asks the
caller which ISBN value to read, the caller enters 1 (Internet and World Wide Web How to
Program: Second Edition) and the application reads the corresponding ISBN. The end of
session message states that the application has terminated.
Brief descriptions of several logic and action CallXML elements are provided in
Fig. 25.10. Logic elements assign values to, and clear values from, the session variables,
and action elements perform specified tasks, such as answering and terminating a telephone
call during the current session. A complete list of CallXML elements is available at:
www.oasis-open.org/cover/callxmlv2.html
The JAWS demo is fully functional and includes an extensive, highly customized help
system. Users can select which voice to use and the rate at which text is spoken. Users also
can create keyboard shortcuts. Although the demo is in English, the full version of JAWS
3.7 allows the user to choose one of several supported languages.
JAWS also includes special key commands for popular programs such as Microsoft
Internet Explorer and Microsoft Word. For example, when browsing in Internet Explorer,
JAWS’ capabilities extend beyond reading the content on the screen. If JAWS is enabled,
pressing Insert + F7 in Internet Explorer opens a Links List dialog, which displays all the
links available on a Web page. For more information about JAWS and the other products
offered by Henter-Joyce, visit www.hj.com.
Despite the existence of adaptive software and hardware for people with visual impair-
ments, the accessibility of computers and the Internet is still hampered by the high costs,
rapid obsolescence and unnecessary complexity of current technology. Moreover, almost
all software currently available requires installation by a person who can see. Ocularis is a
project launched in the open-source community to help address these problems. Open
source software for people with visual impairments already exists, and although it is often
superior to its proprietary, closed-source counterparts, it has not yet reached its full poten-
tial. Ocularis ensures that the blind can use the Linux operating system fully, by providing
an Audio User Interface (AUI). Products that integrate with Ocularis include a word pro-
cessor, calculator, basic finance application, Internet browser and e-mail client. A screen
reader will also be included with programs that have a command-line interface. The official
Ocularis Web site is located at ocularis.sourceforge.net.
People with visual impairments are not the only beneficiaries of the effort being made
to improve markup languages. People with hearing impairments also have a number of
tools to help them interpret auditory information delivered over the Web, such as Synchro-
nized Multimedia Integration Language (SMIL™), discussed in Chapter 33, Multimedia.
This markup language adds extra tracks—layers of content found within a single audio or
video file—to multimedia content. The additional tracks can contain closed captioning.
Technologies also are being designed to help people with severe disabilities, such as
quadriplegia, a form of paralysis that affects the body from the neck down. One such tech-
nology, EagleEyes, developed by researchers at Boston College (www.bc.edu/
eagleeyes), is a system that translates eye movements into mouse movements. Users
move the mouse cursors by moving their eyes or heads and thereby can control computers.
The company CitXCorp is developing new technology that translates Web information
through the telephone. Information on a specific topic can be accessed by dialing the des-
ignated number. The new software is expected to be made available to users for $10 per
month. For more information on regulations governing the design of Web sites to accom-
modate people with disabilities, visit www.access-board.gov.
In alliance with Microsoft, GW Micro, Henter-Joyce and Adobe Systems, Inc. are also
working on software to aid people with disabilities. JetForm Corp also is accommodating
the needs of people with disabilities by developing server-based XML software. The new
software allows users to download a format that best meets their needs.
There are many services on the Web that assist e-business owners in designing their
Web sites to be accessible to individuals with disabilities. For additional information, the
U.S. Department of Justice (www.usdoj.gov) provides extensive resources detailing
legal issues and current technologies related to people with disabilities.
These examples are just a few of the accessibility projects and technologies that cur-
rently exist. For more information on Web and general computer accessibility, see the
resources provided in Section 25.14, Internet and World Wide Web Resources.
Accessibility Wizard, which guides users through all the Windows 2000 accessibility
features and configures their computers according to the chosen specifications. This section
guides users through the configuration of their Windows 2000 accessibility options using
the Accessibility Wizard.
To access the Accessibility Wizard, users must have Microsoft Windows 2000.
Select the Start button and select Programs followed by Accessories, Accessibility
and Accessibility Wizard. When the wizard starts, the Welcome screen is displayed.
Select Next to display a dialog (Fig. 25.11) that asks the user to select a font size. Click
Next.
Figure 25.12 shows the next dialog displayed. This dialog allows the user to activate
the font size settings chosen in the previous window, change the screen resolution, enable
the Microsoft Magnifier (a program that displays an enlarged section of the screen in a separate
window) and disable personalized menus (a feature that hides rarely used programs from the
Start menu and that can be a hindrance to users with disabilities). Make selections
and select Next.
25.13.3 Tools for Users Who Have Difficulty Using the Keyboard
The next dialog is StickyKeys (Fig. 25.20). StickyKeys helps users who have difficulty
pressing multiple keys at the same time. Many important computer commands are invoked
by pressing specific key combinations. For example, the reboot command requires pressing
Ctrl+Alt+Delete simultaneously. StickyKeys allows users to press key combinations in
sequence rather than simultaneously. Select Next to continue to the BounceKeys dialog
(Fig. 25.21).
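The idea behind StickyKeys, pressing the keys of a combination one after another instead of at the same time, can be sketched in Python. This is only a simplified model and not the Windows implementation: the function name sticky_keys and the set of modifier keys are assumptions, and features such as locking a modifier are not modeled.

```python
MODIFIERS = {"Ctrl", "Alt", "Shift"}   # assumed set of modifier keys


def sticky_keys(presses):
    """Combine sequential key presses into key combinations.

    A modifier key is "latched" until the next non-modifier key arrives;
    that key is then reported together with all latched modifiers.
    """
    latched = []   # modifiers waiting for a non-modifier key
    combos = []
    for key in presses:
        if key in MODIFIERS:
            if key not in latched:
                latched.append(key)
        else:
            combos.append("+".join(latched + [key]))
            latched.clear()
    return combos


# The keys are pressed one at a time, yet a single combination results.
print(sticky_keys(["Ctrl", "Alt", "Delete"]))
```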
Another common problem for certain users with disabilities is accidentally pressing the
same key more than once. This problem typically results from pressing a key for a long
period of time. BounceKeys forces the computer to ignore repeated keystrokes. Select
Next.
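Ignoring repeated keystrokes amounts to filtering a stream of key events by time. The following Python sketch shows one way such a filter could work. It is a model under stated assumptions, not the Windows implementation: the function name bounce_keys and the half-second interval are hypothetical, and each event is a (timestamp in seconds, key) pair.

```python
def bounce_keys(events, ignore_interval=0.5):
    """Drop a keystroke that repeats the previous key too quickly.

    events is a sequence of (timestamp_in_seconds, key) pairs in
    chronological order. A keystroke is ignored if it is the same key as
    the previous event and arrives less than ignore_interval seconds
    after it.
    """
    accepted = []
    last_key = None
    last_time = None
    for timestamp, key in events:
        is_bounce = (key == last_key
                     and timestamp - last_time < ignore_interval)
        if not is_bounce:
            accepted.append(key)
        # Remember every event, so a held key keeps being ignored.
        last_key, last_time = key, timestamp
    return accepted


print(bounce_keys([(0.0, "a"), (0.1, "a"), (0.2, "a"), (1.0, "a"), (1.1, "b")]))
```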
Mouse speed is adjusted by using the MouseSpeed (Fig. 25.26) dialog of the
Accessibility Wizard. Dragging the scroll bar changes the speed. Selecting the Next
button sets the speed and displays the wizard’s Set Automatic Timeouts window
(Fig. 25.27).
Although accessibility tools are important to users with disabilities, they can be a hin-
drance to users who do not need them. In situations where varying accessibility needs exist,
it is important that users be able to turn the accessibility tools on and off as necessary. The
Set Automatic Timeouts window specifies a timeout period for the tools. A timeout
either enables or disables a certain action after the computer has idled for a specified
amount of time. A screen saver is a common example of a program with a timeout period.
Here, a timeout is set to toggle the accessibility tools.
After selecting Next, the Save Settings to File dialog appears (Fig. 25.28). This
dialog determines whether the accessibility settings should be used as the default settings,
which are loaded when the computer is rebooted, or after a timeout. Set the accessibility
settings as the default if the majority of users need them. Users also can save the accessibility
settings by creating an .acw file, which, when clicked, activates the saved accessibility
settings on any Windows 2000 computer.
The accessibility options in IE5.5 augment users’ Web browsing. Users can ignore
Web colors, Web fonts and font size tags. This eliminates problems that arise from poor
Web page design and allows users to customize their Web browsers. Users can even specify
a style sheet, which formats every Web site visited according to users’ personal prefer-
ences.
These are not the only accessibility options offered in IE5.5. In the Internet Options
dialog click the Advanced tab. This opens the dialog shown in Fig. 25.33. The first option
that can be set is labeled Always expand ALT text for images. By default, IE5.5 hides
some of the alt text if it exceeds the size of the image it describes. This option forces
all the text to be shown. The second option reads: Move system caret with focus/
selection changes. This option is intended to make screen reading more effective. Some
screen readers use the system caret (the blinking vertical bar associated with editing text)
to decide what is read. If this option is not activated, screen readers may not read Web pages
correctly.
Web designers often forget to take accessibility into account when creating Web sites
and they use fonts that are too small. Many user agents have addressed this problem by
allowing the user to adjust the text size. Select the View menu and then Text Size to
change the font size using IE5.5. By default, the text size is set to Medium.
www.synapseadaptive.com/joel/natlink.htm
Python module natlink allows the user to access and control Dragon NaturallySpeaking, software
that provides a speech recognition system for Windows 95/98/NT.
www.w3.org/WAI
The World Wide Web Consortium’s Web Accessibility Initiative (WAI) site promotes the design of
universally accessible Web sites. This site contains the current guidelines and forthcoming standards
for Web accessibility.
deafness.about.com/health/deafness/msubmenu6.htm
This is the home page of deafness.about.com. It is a resource to find information pertaining to
deafness.
www.cast.org
CAST (Center for Applied Special Technology) offers software, including a valuable accessibility
checker, that helps individuals with disabilities use a computer. The accessibility checker is a Web-
based program that validates the accessibility of Web sites.
www.trainingpost.org/3-2-inst.htm
This site presents a tutorial on the Gunning Fog Index. The Gunning Fog Index grades text based on
its readability.
www.w3.org/TR/REC-CSS2/aural.html
This page discusses Aural Style Sheets, outlining the purpose and uses of this new technology.
laurence.canlearn.ca/English/learn/newaccessguide/indie
INDIE stands for “Integrated Network of Disability Information and Education.” This site is home to
a search engine that helps users find information on disabilities.
java.sun.com/products/java-media/speech/forDevelopers/JSML
This site outlines the specifications for JSML, Sun Microsystems’ Java Speech Markup Language.
This language, like VoiceXML, could drastically improve accessibility for people with visual impair-
ments.
www.slcc.edu/webguide/lynxit.html
Lynxit is a development tool that allows users to view any Web site as a text-only browser would. The
site’s form allows you to enter a URL and returns the Web site in text-only format.
www.trill-home.com/lynx/public_lynx.html
This site allows users to browse the Web with a Lynx browser. Users can view how Web pages appear
to users without the most current technologies.
www.wgbh.org/wgbh/pages/ncam/accesslinks.html
This site provides links to other accessibility pages across the Web.
ocfo.ed.gov/coninfo/clibrary/software.htm
This page is the U.S. Department of Education’s Web site for software accessibility requirements. It
helps developers produce accessible products.
www-3.ibm.com/able/access.html
The homepage of IBM’s accessibility site provides information on IBM products and their accessi-
bility and discusses hardware, software and Web accessibility.
www.w3.org/TR/voice-tts-reqs
This page explains the speech synthesis markup requirements for voice markup languages.
www.voicexmlcentral.com
This site contains information about VoiceXML, such as the specification and the document type def-
inition (DTD).
deafness.about.com/health/deafness/msubvib.htm
This site provides information on vibrotactile devices, which allow individuals with hearing impair-
ments to experience audio in the form of vibrations.
web.ukonline.co.uk/ddmc/software.html
This site provides links to software for people with disabilities.
www.hj.com
Henter-Joyce is a division of Freedom Scientific that provides software for people with visual impair-
ments. It is the home of JAWS.
www.abledata.com/text2/icg_hear.htm
This page contains a consumer guide that discusses technologies for people with hearing impair-
ments.
www.washington.edu/doit
The University of Washington’s DO-IT (Disabilities, Opportunities, Internetworking and Technolo-
gy) site provides information and Web development resources for creating universally accessible Web
sites.
www.webable.com
WebABLE contains links to many disability-related Internet resources and is geared towards those de-
veloping technologies for people with disabilities.
www.webaim.org
The WebAIM site provides a number of tutorials, articles, simulations and other useful resources that
demonstrate how to design accessible Web sites. The site provides a screen reader simulation.
www.speech.cs.cmu.edu/comp.speech/SpeechLinks.html
The Speech Technology Hyperlinks page has over 500 links to sites related to computer-based speech
and speech recognition tools.
www.islandnet.com/~tslemko
The Micro Consulting Limited site contains shareware speech synthesis software.
www.chantinc.com/technology
This page is the Chant Web site, which discusses speech technology and how it works. Chant also
provides speech synthesis and speech recognition software.
whatis.techtarget.com/definition
This site provides definitions and information about several topics, including CallXML. Its thorough
definition of CallXML differentiates CallXML from VoiceXML, another voice markup language that
Voxeo supports. The site contains links to other published articles discussing CallXML.
www.oasis-open.org/cover/callxmlv2.html
This site provides a comprehensive list of the CallXML tags complete with descriptions of each tag.
Short examples on how to apply the tags in various applications are provided.
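The Gunning Fog Index mentioned among the resources above is commonly computed as 0.4 times the sum of the average sentence length (in words) and the percentage of “complex” words, that is, words of three or more syllables. The Python sketch below estimates the index for a piece of text. The syllable counter is a rough heuristic that counts groups of vowels, and the function names are hypothetical, so the result should be read as an approximation rather than an authoritative score.

```python
import re


def count_syllables(word):
    """Estimate syllables by counting groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def gunning_fog(text):
    """Approximate Gunning Fog Index of text.

    index = 0.4 * (words / sentences + 100 * complex_words / words)
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence")
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))


sample = "The cat sat. The dog ran."
print(round(gunning_fog(sample), 1))
```

A lower index suggests text that is easier to read, which is why short sentences and short words tend to make a Web page more accessible.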
SUMMARY
• Enabling a Web site to meet the needs of individuals with disabilities is an issue relevant to all
business owners.
• Legal ramifications exist for Web sites that discriminate against people with disabilities (i.e., by
not providing them with adequate access to the site’s resources).
• Technologies such as voice activation, visual enhancers and auditory aids enable individuals with
disabilities to work in more positions.
• On April 7, 1997, the World Wide Web Consortium (W3C) launched the Web Accessibility Ini-
tiative (WAI). The WAI is an attempt to make the Web more accessible; its mission is described
at www.w3.org/WAI.
• Accessibility refers to the level of usability of an application or Web site for people with disabili-
ties. Total accessibility is difficult to achieve because there are many different disabilities, lan-
guage barriers, and hardware and software inconsistencies.
• The majority of Web sites are considered either partially or totally inaccessible to people with vi-
sual, learning or mobility impairments.
• The WAI publishes the Web Content Accessibility Guidelines 1.0, which assigns priorities to a
three-tier structure of checkpoints. The WAI currently is working on a draft of the Web Content
Accessibility Guidelines 2.0.
• One important WAI requirement is to ensure that every image, movie and sound on a Web site is
accompanied by a description that clearly defines the object’s purpose; this description is
provided through the alt attribute.
• Specialized user agents, such as screen readers (programs that allow users to hear what is being
displayed on their screen) and braille displays (devices that receive data from screen-reading soft-
ware and output the data as braille), allow people with visual impairments to access text-based in-
formation that is normally displayed on the screen.
• Using a screen reader to navigate a Web site can be time consuming and frustrating, because
screen readers are unable to interpret pictures and other graphical content that do not have alterna-
tive text.
• Including links at the top of each Web page provides easy access to the page’s main content.
• Web pages with large amounts of multimedia content are difficult for user agents to interpret un-
less they are designed properly. Images, movies and most non-XHTML objects cannot be read by
screen readers.
• Web designers should avoid misuse of the alt attribute; it is intended to provide a short descrip-
tion of an XHTML object that may not load properly on all user agents.
• The value of the longdesc attribute is a text-based URL, linked to a Web page, that describes
the image associated with the attribute.
• When creating a Web page intended for the general public, it is important to consider the reading
level at which it is written. Web site designers can make their sites more readable through the use
of shorter words, as some users may have difficulty reading long words. In addition, users from
other countries may have difficulty understanding slang and other nontraditional language.
• Web designers often use frames to display more than one XHTML file at a time; frames are a
convenient way to ensure that certain content is always on screen. Unfortunately, frames often
lack proper descriptions, which prevents users with text-based browsers, or users who lack sight,
from navigating the Web site.
• The <noframes> tag allows the designer to offer alternative content to users whose browsers do
not support frames.
• VoiceXML has tremendous implications for people with visual impairments and for the illiterate.
VoiceXML, a speech recognition and synthesis technology, reads Web pages to users and under-
stands words spoken into a microphone.
• A VoiceXML document is made up of a series of dialogs and subdialogs, which result in spoken
interaction between the user and the computer. VoiceXML is a voice-recognition technology.
• CallXML, a language created and supported by Voxeo, creates phone-to-Web applications.
• When a user accesses a CallXML application, the incoming telephone call is referred to as a ses-
sion. A CallXML application can support multiple sessions that enable the application to receive
multiple telephone calls at any given time.
• A session terminates either when the user hangs up the telephone or when the CallXML applica-
tion invokes the hangup element.
• The contents of a CallXML application are inserted within the <callxml> tag.
• CallXML tags that perform similar tasks should be enclosed within the <block> and </block>
tags.
• To deploy a CallXML application, register with the Voxeo Community, which assigns a telephone
number to the application so that other users may access it.
• Voxeo’s logging feature enables developers to debug their telephone application by observing the
“conversation” between the user and the application.
• Braille keyboards are similar to standard keyboards, except that in addition to having each key la-
beled with the letter it represents, braille keyboards have the equivalent braille symbol printed on
the key. Most often, braille keyboards are combined with a speech synthesizer or a braille display,
so users can interact with the computer to verify that their typing is correct.
• People with visual impairments are not the only beneficiaries of the effort to improve markup lan-
guages. Individuals with hearing impairments also have a great number of tools to help them in-
terpret auditory information delivered over the Web.
• Speech synthesis is another research area that helps people with disabilities.
• Open-source software for people with visual impairments already exists and is often superior to
most of its proprietary, closed-source counterparts.
• People with hearing impairments benefit from Synchronized Multimedia Integration Language
(SMIL). This markup language adds extra tracks—layers of content found within a single audio
or video file. The additional tracks can contain data such as closed captioning.
• EagleEyes, developed by researchers at Boston College (www.bc.edu/eagleeyes), translates
eye movements into mouse movements. Users move the mouse cursor by moving their eyes
or heads and are thereby able to control computers.
• All of the accessibility options provided by Windows 2000 are available through the Accessibil-
ity Wizard. The Accessibility Wizard takes a user step by step through all of the Windows ac-
cessibility features and configures his or her computer according to the chosen specifications.
• Microsoft Magnifier enlarges the section of your screen surrounding the mouse cursor.
• To solve problems seeing the mouse cursor, Microsoft offers the ability to use larger cursors, black
cursors and cursors that invert objects underneath them.
• SoundSentry is a tool that creates visual signals when system events occur.
• ShowSounds adds captions to spoken text and other sounds produced by today’s multimedia-
rich software.
• StickyKeys is a program that helps users who have difficulty pressing multiple keys at the same
time.
• BounceKeys forces the computer to ignore repeated keystrokes, solving the problem of acci-
dentally pressing the same key more than once.
• ToggleKeys causes an audible beep to alert users that they have pressed one of the lock keys (i.e.,
Caps Lock, Num Lock, or Scroll Lock).
• MouseKeys is a tool that uses the keyboard to emulate mouse movements.
• The Mouse Button Settings tool allows you to create a virtual left-handed mouse by swapping
the button functions.
• A timeout either enables or disables a certain action after the computer has idled for a specified
amount of time. A common example of a timeout is a screen saver.
• You can create an .acw file that, when clicked, automatically activates the saved accessibility
settings on any Windows 2000 computer.
• Microsoft Narrator is a text-to-speech program for people with visual impairments. It reads text,
describes the current desktop environment and alerts the user when certain Windows events occur.
TERMINOLOGY
accessibility
Accessibility Wizard
Accessibility Wizard: Display Color Settings
Accessibility Wizard: Icon Size
Accessibility Wizard: Mouse Cursor
Accessibility Wizard: Scroll Bar and Window Border Size
action element
alt attribute
Americans with Disabilities Act (ADA)
<assign> tag in VoiceXML
AuralCSS
<block> tag in VoiceXML
BounceKeys
braille display
braille keyboard
<break> tag in VoiceXML
<b> tag (bold)
CallXML
<callxml> tag in CallXML
caption
Cascading Style Sheets (CSS)
count attribute in VoiceXML
<choice> tag in VoiceXML
CSS2
D-link
default setting
EagleEyes
encoding
<enumerate> tag in VoiceXML
event handler
<exit> tag in VoiceXML
field variable
<filled> tag in VoiceXML
<form> tag in VoiceXML
frames
get request type
<getDigits> tag in CallXML
global variable
<goto> tag in VoiceXML
<grammar> tag in VoiceXML
Gunning Fog Index
header cells
headers attribute
<h1> tag
IBM ViaVoice
id attribute
<img> tag
JAWS (Job Access With Sound)
level attribute in VoiceXML
linearize
<link> tag in VoiceXML
local dialog
logging feature
logic element
longdesc attribute
Lynx
markup language
maxDigits attribute in CallXML
maxTime attribute in CallXML
<menu> tag in VoiceXML
Microsoft Magnifier
Microsoft Narrator
Mouse Button Settings window
MouseKeys
Narrator
<next> tag in VoiceXML
nolimit (default value)
<noframes> tag
Ocularis
<onHangup> tag in CallXML
<onMaxSilence> tag in CallXML
On-Screen Keyboard
<onTermDigit> tag in CallXML
post request type
priority 1 checkpoint
priority 2 checkpoint
priority 3 checkpoint
<prompt> tag in VoiceXML
quick tip
readability
Read typed characters
screen reader
session
sessionID
Set Automatic Timeouts window
ShowSounds
SoundSentry
speech recognition
speech synthesizer
StickyKeys
<strong> tag
style sheet
system caret
<subdialog> tag in VoiceXML
summary attribute
Synchronized Multimedia Integration Language (SMIL)
tables
<td> tag
termDigits attribute in CallXML
<text> tag in CallXML
text-to-speech (TTS)
<th> tag
track
Unicode
user agent
<var> tag in VoiceXML
var attribute in CallXML
version
ViaVoice
voice server
Voice Server SDK
VoiceXML
Voxeo Community
<vxml> tag in VoiceXML
Web Accessibility Initiative (WAI)
SELF-REVIEW EXERCISES
25.1 Expand the following acronyms:
a) W3C.
b) WAI.
c) JAWS.
d) SMIL.
e) CSS.
25.2 Fill in the blanks in each of the following statements.
a) The highest priority of the Web Accessibility Initiative ensures that each ________,
________ and ________ is accompanied by a description that clearly defines its purpose.
b) Technologies such as ________, ________ and ________ enable individuals with
disabilities to work in a large number of positions.
c) Although they can be used as a great layout tool, ________ are difficult for screen readers
to interpret and convey clearly to a user.
d) To make a frame accessible to individuals with disabilities, it is important to include
________ tags on a Web page.
e) ________ and ________ often assist blind people using computers.
k) CallXML creates ________ applications that allow businesses to receive and send telephone
calls.
l) A ________ tag must be associated with the <getDigits> tag.
25.3 State whether each of the following is true or false. If false, explain why.
a) Screen readers have no problem reading and translating images.
b) When writing pages for the general public, it is important to consider the reading diffi-
culty level of the text.
c) The <alt> tag helps screen readers describe images in a Web page.
d) Left-handed people have been helped by the improvements made in speech-recognition
technology more than any other group of people.
e) VoiceXML lets users interact with Web content using speech recognition and speech
synthesis technologies.
f) Elements such as onMaxSilence and onTermDigit are event handlers because they
perform a specified task when invoked.
g) The debugging feature of the Voxeo Account Manager assists developers in de-
bugging their CallXML application.
ogy as everyone else can, speech-recognition technology has had the largest impact on the blind and
on people who have trouble typing. e) True. f) True. g) False. The logging feature assists developers
in debugging their CallXML application.
EXERCISES
25.4 Insert XHTML markup into each segment to make the segment accessible to someone with
disabilities. The contents of images and frames should be apparent from the context and filenames.
a) <img src = "dogs.jpg" width = "300" height = "250" />
b) <table width = "75%">
<tr><th>Language</th><th>Version</th></tr>
<tr><td>XHTML</td><td>1.0</td></tr>
<tr><td>Perl</td><td>5.6.0</td></tr>
<tr><td>Java</td><td>1.3</td></tr>
</table>
c) <map name = "links">
<area href = "index.html" shape = "rect"
coords = "50, 120, 80, 150" />
<area href = "catalog.html" shape = "circle"
coords = "220, 30" />
</map>
<img src = "antlinks.gif" width = "300" height = "200"
usemap = "#links" />
25.5 Define the following terms:
a) Action element.
b) Gunning Fog Index.
c) Screen reader.
d) Session.
e) Web Accessibility Initiative (WAI).
25.6 Describe the three-tier structure of checkpoints (priority-one, priority-two and priority-three)
set forth by the WAI.
25.7 Why do misused <h1> heading tags create problems for screen readers?
25.8 Use CallXML to create a voice mail system that plays a voice mail greeting and records the
message. Have friends and classmates call your application and leave a message.
[***Notes To Reviewers***]
• Please list URLs that discuss Python-specific accessibility issues. We are conducting our own re-
search and will post this chapter for second round reviews after the inclusion of Python-specific
material.
• Please mark your comments in place on a paper copy of the chapter.
• Please return only marked pages to Deitel & Associates, Inc.
• Please do not send us e-mails with detailed, line-by-line comments; mark these directly on the pa-
per pages.
• Please feel free to send any lengthy additional comments by e-mail to
cheryl.yaeger@deitel.net.
• Please run all the code examples.
• Please check that we are using the correct programming idioms.
• Please check that there are no inconsistencies, errors or omissions in the chapter discussions.
• The manuscript is being copyedited by a professional copy editor in parallel with your reviews.
That person will probably find most typos, spelling errors, grammatical errors, etc.
• Please do not rewrite the manuscript. We are concerned mostly with technical correctness and cor-
rect use of idiom. We will not make significant adjustments to our writing style on a global scale.
Please send us a short e-mail if you would like to make such a suggestion.
• Please be constructive. This book will be published soon. We all want to publish the best possible
book.
• If you find something that is incorrect, please show us how to correct it.
• Please read all the back matter including the exercises and any solutions we provide.
• Please review the index we provide with each chapter to be sure we have covered the topics you
feel are important.
Index
S
screen reader 1113, 1114, 1132, 1145, 1148
scroll bar and window border size dialog 1137
sendEvent element 1132
session 1127
session attribute 1132
sessionID 1127
V
value attribute 1132, 1133
<var> tag (<var>…</var>) 1126
var attribute 1131, 1132
version declaration 1128
ViaVoice 1114, 1119
voice server 1119
Voice Server SDK 1.0 1119
26
Introduction to XHTML:
Part 1
Objectives
• To understand important components of XHTML
documents.
• To use XHTML to create World Wide Web pages.
• To add images to Web pages.
• To understand how to create and use hyperlinks to
navigate Web pages.
• To mark up lists of information.
To read between the lines was easier than to follow the text.
Henry James
High thoughts must have high language.
Aristophanes
Outline
26.1 Introduction
26.2 Editing XHTML
26.3 First XHTML Example
26.4 W3C XHTML Validation Service
26.5 Headers
26.6 Linking
26.7 Images
26.8 Special Characters and More Line Breaks
26.9 Unordered Lists
26.10 Nested and Ordered Lists
26.11 Internet and World Wide Web Resources
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
26.1 Introduction
Welcome to the world of opportunity created by the World Wide Web. The Internet is now
three decades old, but it was not until the World Wide Web became popular in the 1990s
that the explosion of opportunity that we are still experiencing began. Exciting new devel-
opments occur almost daily—the pace of innovation is unprecedented by any other tech-
nology. In this chapter, you will develop your own Web pages. As the book proceeds, you
will create increasingly appealing and powerful Web pages. In the later portion of the book,
you will learn how to create complete Web-based applications.
In this chapter, we begin unlocking the power of Web-based application development
with XHTML¹ (the Extensible Hypertext Markup Language). In later chapters, we introduce
more sophisticated XHTML techniques, such as tables, which are particularly useful
for structuring information from databases (i.e., software that stores structured sets of
data), and Cascading Style Sheets (CSS), which make Web pages more visually appealing.
Unlike procedural programming languages such as C, Fortran, Cobol and Pascal,
XHTML is a markup language that specifies the format of text that is displayed in a Web
browser such as Microsoft’s Internet Explorer or Netscape’s Communicator.
One key issue when using XHTML² is the separation of the presentation of a document
(i.e., the document’s appearance when rendered by a browser) from the structure of
the document’s information. Over the next several chapters, we discuss this issue in depth.
1. XHTML has replaced the HyperText Markup Language (HTML) as the primary means of describ-
ing Web content. XHTML provides more robust, richer and extensible features than HTML. For
more on XHTML/HTML visit www.w3.org/markup.
Machines running specialized software called Web servers store XHTML documents.
Clients (e.g., Web browsers) request specific resources from the Web server. For example,
typing www.deitel.com/books/downloads.htm into a Web browser’s address
field requests downloads.htm from the Web server running at www.deitel.com.
This document resides in a directory named books. For now, we simply place the XHTML
documents on our machine and open them using Internet Explorer.
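The way such an address breaks down into a server, a directory and a document can be sketched with Python's standard urllib.parse module. This is only an illustration, not one of the book's examples; it assumes the browser supplies the http:// scheme that the typed address omits.

```python
from urllib.parse import urlsplit
import posixpath

# The address as typed in the text. The "http://" prefix is an assumption:
# browsers add a default scheme when none is typed.
url = "http://" + "www.deitel.com/books/downloads.htm"

parts = urlsplit(url)
host = parts.netloc                                 # the Web server to contact
directory, document = posixpath.split(parts.path)   # where the resource resides

print("server:   ", host)
print("directory:", directory)
print("document: ", document)
```

The server name identifies the machine running the Web server, and the remaining path tells that server which directory and file to return.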
2. As this book was being submitted to the publisher, XHTML 1.1 became a World Wide Web Con-
sortium (W3C) Recommendation. The XHTML examples presented in this book are based upon
the XHTML 1.0 Recommendation, because Internet Explorer 5.5 does not support the full set of
XHTML 1.1 features. In the future, Internet Explorer and other browsers will support XHTML
1.1. When this occurs, we will update our Web site (www.deitel.com) with XHTML 1.1 ex-
amples and information.
3. All of the examples presented in this book are available at www.deitel.com and on the CD-
ROM that accompanies this book.
XHTML markup contains text that represents the content of a document and elements
that specify a document’s structure. Some important elements of an XHTML document
include the html element, the head element and the body element. The html element
encloses the head section (represented by the head element) and the body section (repre-
sented by the body element). The head section contains information about the XHTML
document, such as the title of the document. The head section also can contain special doc-
ument formatting instructions called style sheets and client-side programs called scripts for
creating dynamic Web pages. (We introduce style sheets in Chapter 28.) The body section
contains the page’s content that the browser displays when the user visits the Web page.
XHTML documents delimit an element with start and end tags. A start tag consists of
the element name in angle brackets (e.g., <html>). An end tag consists of the element
name preceded by a / in angle brackets (e.g., </html>). In this example, lines 8 and 16
define the start and end of the html element. Note that the end tag on line 16 has the same
name as the start tag, but is preceded by a / inside the angle brackets. Many start tags define
attributes that provide additional information about an element. Browsers can use this addi-
tional information to determine how to process the element. Each attribute has a name and
a value separated by an equal sign (=). Line 8 specifies a required attribute (xmlns) and
value (http://www.w3.org/1999/xhtml) for the html element in an XHTML
document. Simply copy and paste the html element start tag on line 8 into your XHTML
documents.
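Start tags, end tags and the xmlns attribute can be examined with Python's standard xml.etree.ElementTree module. The sketch below uses a small document written for this illustration (it is not the book's figure, and its line numbers do not match the ones cited above). The helper name is_well_formed is our own.

```python
import xml.etree.ElementTree as ET

# A minimal XHTML document written for this sketch (not the book's figure).
document = """<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <title>Welcome</title>
   </head>
   <body>
      <p>Welcome to XHTML!</p>
   </body>
</html>"""

def is_well_formed(text):
    """Return True if every start tag in text has a matching end tag."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

root = ET.fromstring(document)

# ElementTree reports namespaced names in the form "{namespace}name".
namespace, _, name = root.tag[1:].partition("}")
children = [child.tag.partition("}")[2] for child in root]

print("root element:", name)
print("xmlns value: ", namespace)
print("children:    ", children)
print("missing </body> accepted?", is_well_formed("<html><body></html>"))
```

The parser rejects the last string because the body start tag is never closed, which is the same pairing rule the text describes for html.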
An XHTML document divides the html element into two sections—head and body.
Lines 9–11 define the Web page’s head section with a head element. Line 10 specifies a
title element. This is called a nested element, because it is enclosed in the head ele-
ment’s start and end tags. The head element also is a nested element, because it is enclosed
in the html element’s start and end tags. The title element describes the Web page.
Titles usually appear in the title bar at the top of the browser window and also as the text
identifying a page when users add the page to their list of Favorites or Bookmarks,
which enable users to return to their favorite sites. Search engines (i.e., sites that allow users
to search the Web) also use the title for cataloging purposes.
Good Programming Practice 26.2
Indenting nested elements emphasizes a document’s structure and promotes readability.
Line 13 opens the document’s body element. The body section of an XHTML docu-
ment specifies the document’s content, which may include text and tags.
Some tags, such as the paragraph tags (<p> and </p>) in line 14, mark up text for display
in a browser. All text placed between the <p> and </p> tags forms one paragraph.
When the browser renders a paragraph, a blank line usually precedes and follows paragraph
text.
This document ends with two closing tags (lines 15–16). These tags close the body
and html elements, respectively. The ending </html> tag in an XHTML document
informs the browser that the XHTML markup is complete.
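The nesting described above (title inside head, head and body inside html, p inside body) can be made visible by printing one element name per line, indented by nesting level. This is a sketch under our own assumptions: the function name outline and the sample markup are illustrative, not taken from the book.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0):
    """Return one line per element, indented three spaces per nesting level."""
    name = element.tag.rpartition("}")[2]      # drop any "{namespace}" prefix
    lines = ["   " * depth + name]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

# Sample markup written for this sketch.
page = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Welcome</title></head>"
    "<body><p>Welcome to XHTML!</p></body>"
    "</html>"
)

print("\n".join(outline(page)))
```

The indentation in the output mirrors the indentation the Good Programming Practice recommends for the markup itself.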
To view this example in Internet Explorer, perform the following steps:
1. Copy the Chapter 26 examples onto your machine from the CD that accompanies
this book (or download the examples from www.deitel.com).
2. Launch Internet Explorer and select Open... from the File Menu. This displays
the Open dialog.
3. Click the Open dialog’s Browse... button to display the Microsoft Internet
Explorer file dialog.
4. Navigate to the directory containing the Chapter 26 examples and select the file
main.html, then click Open.
5. Click OK to have Internet Explorer render the document. Other examples are
opened in a similar manner.
At this point your browser window should appear similar to the sample screen capture
shown in Fig. 26.1. (Note that we resized the browser window to save space in the book.)
4. HTML (HyperText Markup Language) is the predecessor of XHTML designed for marking up
Web content. HTML is a deprecated technology.
26.5 Headers
Some text in an XHTML document may be more important than other text. For example, the
text in this section is considered more important than a footnote. XHTML provides six
headers, called header elements, for specifying the relative importance of information.
Figure 26.5 demonstrates these elements (h1 through h6).
Portability Tip 26.1
The text size used to display each header element can vary significantly between browsers.
In Chapter 28, we discuss how to control the text size and other text properties.
Header element h1 (line 15) is considered the most significant header and is rendered
in a larger font than the other five headers (lines 16–20). Each successive header element
(i.e., h2, h3, etc.) is rendered in a smaller font.
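Because the six header elements differ only in the digit in their names, the markup for all of them can be generated in a loop. The following is a minimal sketch; the function name header and the sample text are assumptions made for illustration.

```python
from xml.sax.saxutils import escape

def header(level, text):
    """Return an XHTML header element; only levels 1 through 6 exist."""
    if not 1 <= level <= 6:
        raise ValueError("XHTML defines only h1 through h6")
    return "<h{0}>{1}</h{0}>".format(level, escape(text))

markup = "\n".join(header(n, "Level %d Header" % n) for n in range(1, 7))
print(markup)
```

Requesting a level outside 1 through 6 raises an error rather than producing an element that XHTML does not define.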
Look-and-Feel Observation 26.1
Placing a header at the top of every XHTML page helps viewers understand the purpose of
each page.
26.6 Linking
One of the most important XHTML features is the hyperlink, which references (or links to)
other resources such as XHTML documents and images. In XHTML, both text and images
can act as hyperlinks. Web browsers typically underline text hyperlinks and color their text
blue by default, so that users can distinguish hyperlinks from plain text. In Fig. 26.6, we
create text hyperlinks to four different Web sites.
Line 17 introduces the <strong> tag. Browsers typically display text marked up
with <strong> in a bold font.
Links are created using the a (anchor) element. Line 20 defines a hyperlink that links
the text Deitel to the URL assigned to attribute href, which specifies the location of a
linked resource, such as a Web page, a file or an e-mail address. This particular anchor
element links to a Web page located at http://www.deitel.com. When a URL does not
indicate a specific document on the Web site, the Web server returns a default Web page.
This page often is called index.html; however, most Web servers can be configured
to use any file as the default Web page for the site. (Open http://www.deitel.com
in one browser window and http://www.deitel.com/index.html in a second
browser window to confirm that they are identical.) If the Web server cannot locate a
requested document, the server returns an error indication to the Web browser and the
browser displays an error message to the user.
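The relationship between an anchor's href attribute and its link text can be seen by collecting both from a piece of markup. The sketch below uses Python's standard html.parser module; the class name LinkCollector and the one-line sample are our own, not part of the book's figure.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from the a elements in a document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None      # href of the anchor being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

# Sample markup written for this sketch.
sample = '<p><a href = "http://www.deitel.com">Deitel</a></p>'

collector = LinkCollector()
collector.feed(sample)
collector.close()
print(collector.links)
```

Each pair holds the location the browser would request and the text the user would click.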
Anchors can link to e-mail addresses using a mailto: URL. When someone clicks
this type of anchored link, most browsers launch the default e-mail program (e.g., Outlook
Express) to enable the user to write an e-mail message to the linked address. Figure 26.7
demonstrates this type of anchor.
Lines 17–19 contain an e-mail link. The form of an e-mail anchor is <a href =
"mailto:emailaddress">…</a>. In this case, we link to the e-mail address
[email protected].
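The form of an e-mail anchor can be sketched as a small Python function. The address someone@example.com is a placeholder chosen for this illustration, and the function name email_anchor is hypothetical.

```python
from urllib.parse import urlsplit

def email_anchor(address, text):
    """Return an anchor element whose href is a mailto: URL."""
    return '<a href = "mailto:{0}">{1}</a>'.format(address, text)

# Placeholder address, not the one used in the book's figure.
anchor = email_anchor("someone@example.com", "someone@example.com")
target = urlsplit("mailto:someone@example.com")

print(anchor)
print("scheme:", target.scheme, "| address:", target.path)
```

The mailto scheme is what tells the browser to hand the address to an e-mail program instead of requesting a document.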
26.7 Images
The examples discussed so far demonstrated how to mark up documents that contain only
text. However, most Web pages contain both text and images. In fact, images are an equally
important, if not essential, part of Web-page design. The two most popular image formats used by
Web developers are Graphics Interchange Format (GIF) and Joint Photographic Experts
Group (JPEG) images. Users can create images using specialized pieces of software such
as Adobe Photoshop Elements and Jasc Paint Shop Pro⁵. Images may also be acquired
from various Web sites, such as gallery.yahoo.com. Figure 26.8 demonstrates how
to incorporate images into Web pages.
Lines 15–16 use an img element to insert an image in the document. The image file’s
location is specified with the img element’s src attribute. In this case, the image is located
in the same directory as this XHTML document, so only the image’s file name is required.
Optional attributes width and height specify the image’s width and height, respec-
tively. The document author can scale an image by increasing or decreasing the values of
the image width and height attributes. If these attributes are omitted, the browser uses
the image’s actual width and height. Images are measured in pixels (“picture elements”),
which represent dots of color on the screen. The image in Fig. 26.8 is 183 pixels wide and
238 pixels high.
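When scaling an image through the width and height attributes, keeping the two values in proportion avoids distorting the picture. The arithmetic can be sketched as follows, using the 183-by-238 size quoted above; the file name and alt text are placeholders, and the function names are our own.

```python
def scaled_size(width, height, new_width):
    """Return (new_width, new_height) preserving the width-to-height ratio."""
    new_height = round(height * new_width / width)
    return new_width, new_height

def img_tag(src, alt, width, height):
    """Return an img element with explicit pixel dimensions."""
    return ('<img src = "{0}" alt = "{1}" width = "{2}" height = "{3}" />'
            .format(src, alt, width, height))

actual = (183, 238)                            # pixel size quoted in the text
doubled = scaled_size(*actual, new_width=366)  # twice the actual width

# "picture.jpg" and the alt text are placeholders for this sketch.
print(img_tag("picture.jpg", "Sample picture", *doubled))
```

Setting only one of the two attributes to a new value, without adjusting the other, would stretch the image.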
5. The CD-ROM that accompanies this book contains a 90-day evaluation version of Paint Shop
Pro™.
Every img element in a valid XHTML document must have an alt attribute. If a browser cannot
render an image, the browser displays the alt attribute’s value. A browser may not be able
to render an image for several reasons. It may not support images—as is the case with a
text-based browser (i.e., a browser that can display only text)—or the client may have dis-
abled image viewing to reduce download time. Figure 26.8 shows Internet Explorer 5.5
rendering the alt attribute’s value when a document references a non-existent image file
(jhtp.jpg).
Some XHTML elements (called empty elements) contain only attributes and do not
mark up text (i.e., text is not placed between the start and end tags). Empty elements (e.g.,
img) must be terminated, either by using the forward slash character (/) inside the closing
right angle bracket (>) of the start tag or by explicitly including the end tag. When using
the forward slash character, we add a space before the forward slash to improve readability
(as shown at the ends of lines 16 and 18). Rather than using the forward slash character,
lines 17–18 could be written with a closing </img> tag as follows:
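That the two termination styles describe the same element can be checked with an XML parser. The sketch below is an illustration only: the file name and alt text are hypothetical and are not the attributes used in the book's figure.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute values chosen for this sketch.
slash_form = ET.fromstring('<img src = "picture.jpg" alt = "A picture" />')
end_tag_form = ET.fromstring('<img src = "picture.jpg" alt = "A picture"></img>')

same = (slash_form.tag == end_tag_form.tag
        and slash_form.attrib == end_tag_form.attrib
        and len(slash_form) == len(end_tag_form) == 0)   # neither has children

print("equivalent elements:", same)
```

An unterminated img start tag, by contrast, would be rejected by the parser as not well formed.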
By using images as hyperlinks, Web developers can create graphical Web pages that
link to other resources. In Fig. 26.9, we create six different image hyperlinks.
Lines 17–20 create an image hyperlink by nesting an img element in an anchor
(a) element. The img element’s src attribute value specifies that this image
(links.jpg) resides in a directory named buttons. The buttons directory and the
XHTML document are in the same directory. Images from other Web documents also can
be referenced (after obtaining permission from the document’s owner) by setting the src
attribute to the name and location of the image.
On line 20, we introduce the br element, which most browsers render as a line break.
Any markup or text following a br element is rendered on the next line. Like the img ele-
ment, br is an example of an empty element terminated with a forward slash. We add a
space before the forward slash to enhance readability.
results in a syntax error because it uses the less-than character (<), which is reserved for
start tags and end tags such as <p> and </p>. XHTML provides special characters or en-
tity references (in the form &code;) for representing these characters. We could correct the
previous line by writing
which uses the special character &lt; for the less-than symbol. Figure 26.10 demonstrates
how to use special characters in an XHTML document.
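Python's standard html module performs the same substitution in both directions, which makes it a convenient way to experiment with entity references. The sample strings below are our own and are not taken from the book's figure.

```python
import html

# Text containing characters that are reserved in markup.
text = "3 < 5 & 5 > 3"
marked_up = "<p>" + html.escape(text) + "</p>"

# Going the other way: named and numeric references back to characters.
decoded = html.unescape("&lt; &amp; &copy; &frac14; &#38; &#x26;")

print(marked_up)
print(decoded)
```

Escaping replaces each reserved character with its entity reference, so the paragraph's content can no longer be mistaken for a tag.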
Lines 27–28 contain other special characters, which are expressed as either word
abbreviations (e.g., amp for ampersand and copy for copyright) or numeric character
references (e.g., &#38; is the decimal and &#x26; the hexadecimal reference for &). Hexadecimal numbers
are base 16 numbers—digits in a hexadecimal number have values from 0 to 15 (a total of
16 different values). The letters A–F represent the hexadecimal digits corresponding to dec-
imal values 10–15. Thus in hexadecimal notation we can have numbers like 876 consisting
solely of decimal-like digits, numbers like DA19F consisting of digits and letters, and num-
bers like DCB consisting solely of letters. We discuss hexadecimal numbers in detail in
Appendix C, Number Systems.
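The hexadecimal numbers mentioned above can be converted to decimal with Python's built-in int function, and the numeric references for the ampersand can be derived from its character code. This is a small sketch for experimentation, not part of the book's examples.

```python
# Convert the sample hexadecimal numbers from the text to decimal.
for text in ("876", "DA19F", "DCB"):
    print(text, "in base 16 is", int(text, 16), "in base 10")

# Build the decimal and hexadecimal character references for "&".
ampersand_code = ord("&")
decimal_ref = "&#{0};".format(ampersand_code)
hex_ref = "&#x{0:X};".format(ampersand_code)

print(decimal_ref, hex_ref)
```

Each hexadecimal digit is weighted by a power of 16, so 876 in base 16 is 8 × 256 + 7 × 16 + 6.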
In lines 34–36, we introduce three new elements. Most browsers render the del ele-
ment as strike-through text. With this format users can easily indicate document revisions.
To superscript text (i.e., raise text on a line with a decreased font size) or subscript text (i.e.,
lower text on a line with a decreased font size), use the sup and sub elements, respectively.
We also use special characters &lt; for a less-than sign and &frac14; for the
fraction 1/4 (line 38).
In addition to special characters, this document introduces a horizontal rule, indicated
by the <hr /> tag in line 24. Most browsers render a horizontal rule as a horizontal line.
The <hr /> tag also inserts a line break above and below the horizontal line.
The first ordered list begins on line 33. Attribute type specifies the sequence type
(i.e., the set of numbers or letters used in the ordered list). In this case, setting type to "I"
specifies uppercase Roman numerals. Line 46 begins the second ordered list and sets
attribute type to "a", specifying lowercase letters for the list items. The last ordered list
(lines 64–68) does not use attribute type. By default, the list’s items are enumerated from
one to three.
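The labels a browser produces for these three sequence types can be approximated in a few lines of Python. This is a sketch of the numbering rules only; the function names are our own, and the lowercase-letter branch handles just the first 26 items.

```python
def roman(number):
    """Return the uppercase Roman numeral for 1 <= number <= 3999."""
    symbols = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
               (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
               (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    result = []
    for value, symbol in symbols:
        count, number = divmod(number, value)
        result.append(symbol * count)
    return "".join(result)

def label(position, list_type=None):
    """Return the label for the item at a 1-based position in an ordered list."""
    if list_type == "I":
        return roman(position)
    if list_type == "a":
        return chr(ord("a") + position - 1)   # positions 1-26 only
    return str(position)                      # default: decimal numbers

for list_type in ("I", "a", None):
    print(list_type, [label(n, list_type) for n in range(1, 5)])
```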
A Web browser indents each nested list to indicate a hierarchical relationship. By
default, the items in the outermost unordered list (line 18) are preceded by discs. List items
nested inside the unordered list of line 18 are preceded by circles. Although not demon-
strated in this example, subsequent nested list items are preceded by squares. Unordered
list items may be explicitly set to discs, circles or squares by setting the ul element’s type
attribute to "disc", "circle" or "square", respectively.
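Nested lists lend themselves to a recursive description: a list item either holds text or holds text followed by another list. The sketch below generates indented ul markup from nested Python data and reports the default bullet for each depth as described above. The names default_marker and unordered_list, and the sample items, are assumptions made for illustration.

```python
from xml.sax.saxutils import escape

MARKERS = ["disc", "circle", "square"]

def default_marker(depth):
    """Bullet typically used at a nesting depth (0 = outermost list)."""
    return MARKERS[min(depth, len(MARKERS) - 1)]

def unordered_list(items, depth=0):
    """Render items as ul markup; a (text, sub_items) tuple nests a list in its li."""
    pad = "   " * (2 * depth)
    lines = [pad + "<ul>"]
    for item in items:
        if isinstance(item, tuple):
            text, sub_items = item
            lines.append(pad + "   <li>" + escape(text))
            lines.extend(unordered_list(sub_items, depth + 1))
            lines.append(pad + "   </li>")
        else:
            lines.append(pad + "   <li>" + escape(item) + "</li>")
    lines.append(pad + "</ul>")
    return lines

# Sample data written for this sketch.
lines = unordered_list(["XHTML", ("Links", ["Text links", "Image links"]), "Lists"])
print("\n".join(lines))
print([default_marker(depth) for depth in range(4)])
```

Note that the nested ul is placed inside the li element it belongs to, before that item's end tag.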
Note: XHTML is based on HTML (HyperText Markup Language)—a legacy tech-
nology of the World Wide Web Consortium (W3C). In HTML, it was common to specify
the document’s content, structure and formatting. Formatting might specify where the
browser places an element in a Web page or the fonts and colors used to display an element.
The so-called strict form of XHTML allows only a document’s content and structure to
appear in a valid XHTML document, and not that document’s formatting. Our first several
examples used only the strict form of XHTML. In fact, the purpose of lines 2–3 in each of
the examples before Fig. 26.12 was to indicate to the browser that each document con-
formed to the strict XHTML definition. This enables the browser to confirm that the docu-
ment is valid. There are other XHTML document types as well. This particular example
uses the XHTML transitional document type. This document type exists to enable XHTML
document creators to use legacy HTML technologies in an XHTML document. In this
example, the type attribute of the ol element (lines 33 and 46) is a legacy HTML technology.
Changing lines 2–3 as shown in this example enables us to demonstrate ordered
lists with different numbering formats. Normally, such formatting is specified with style
sheets (Chapter 28).
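The choice between the two document types can be expressed as a simple rule: if a document uses a legacy attribute, declare it transitional; otherwise, declare it strict. The sketch below uses the document type declarations from the XHTML 1.0 Recommendation. The set of legacy attributes is deliberately limited to the type attribute discussed here, and the function name document_type is our own.

```python
DOCTYPES = {
    "strict": ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
               '   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'),
    "transitional": ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
                     '   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'),
}

# Assumption for this sketch: only the legacy attribute discussed in the text.
LEGACY_ATTRIBUTES = {("ol", "type"), ("ul", "type")}

def document_type(used_attributes):
    """Return 'transitional' if any (element, attribute) pair is a legacy one."""
    if LEGACY_ATTRIBUTES & set(used_attributes):
        return "transitional"
    return "strict"

kind = document_type([("img", "alt"), ("ol", "type")])
print(kind)
print(DOCTYPES[kind])
```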
Testing and Debugging Tip 26.2
Most current browsers still attempt to render XHTML documents, even if they are invalid.
SUMMARY
• XHTML (Extensible Hypertext Markup Language) is a markup language for creating Web pages.
• A key issue when using XHTML is the separation of the presentation of a document (i.e., the doc-
ument’s appearance when rendered by a browser) from the structure of the information in the doc-
ument.
• In XHTML, text is marked up with elements, delimited by tags that are names contained in pairs
of angle brackets. Some elements may contain additional markup called attributes, which provide
additional information about the element.
• A machine that runs a specialized piece of software called a Web server stores XHTML documents.
• XHTML documents that are syntactically correct are more likely to render properly. XHTML
documents that contain syntax errors may not display properly.
• Validation services (e.g., validator.w3.org) check whether an XHTML document is
syntactically correct.
• Every XHTML document contains a start <html> tag and an end </html> tag.
• Comments in XHTML always begin with <!-- and end with -->. The browser ignores all text
inside a comment.
• Every XHTML document contains a head element, which generally contains information, such
as a title, and a body element, which contains the page content. Information in the head element
generally is not rendered in the display window but may be made available to the user through oth-
er means.
• The title element names a Web page. The title usually appears in the colored bar (called the
title bar) at the top of the browser window and also appears as the text identifying a page when
users add your page to their list of Favorites or Bookmarks.
• The body of an XHTML document is the area in which the document’s content is placed. The con-
tent may include text and tags.
• All text placed between the <p> and </p> tags forms one paragraph.
• XHTML provides six headers (h1 through h6) for specifying the relative importance of informa-
tion. Header element h1 is considered the most significant header and is rendered in a larger font
than the other five headers. Each successive header element (i.e., h2, h3, etc.) is rendered in a
smaller font.
• Web browsers typically underline text hyperlinks and color them blue by default.
• Browsers typically render text marked up with the <strong> tag in a bold font.
• Users can insert links with the a (anchor) element. The most important attribute for the a element
is href, which specifies the resource (e.g., page, file, e-mail address, etc.) being linked.
• Anchors can link to an e-mail address using a mailto: URL. When someone clicks this type of
anchored link, most browsers launch the default e-mail program (e.g., Outlook Express) to initiate
e-mail messages to the linked addresses.
• The img element’s src attribute specifies an image’s location. Optional attributes width and
height specify the image width and height, respectively. Images are measured in pixels (“picture
elements”), which represent dots of color on the screen. Every img element in a valid XHTML
document must have an alt attribute, which contains text that is displayed if the client cannot ren-
der the image.
• The alt attribute makes Web pages more accessible to users with disabilities, especially those
with vision impairments.
• Some XHTML elements are empty elements and contain only attributes and do not mark up text.
Empty elements (e.g., img) must be terminated, either by using the forward slash character (/) or
by explicitly writing an end tag.
• The br element causes most browsers to render a line break. Any markup or text following a br
element is rendered on the next line.
• XHTML provides special characters or entity references (in the form &code;) for representing
characters that have a reserved meaning in markup (e.g., <) or that are not easily typed.
• Most browsers render a horizontal rule, indicated by the <hr /> tag, as a horizontal line. The hr
element also inserts a line break above and below the horizontal line.
• The unordered list element ul creates a list in which each item in the list begins with a bullet sym-
bol (called a disc). Each entry in an unordered list is an li (list item) element. Most Web browsers
render these elements with a line break and a bullet symbol at the beginning of the line.
• Lists may be nested to represent hierarchical data relationships.
• Attribute type specifies the sequence type (i.e., the set of numbers or letters used in the ordered
list).
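Because XHTML is an XML vocabulary, the rule that every element must be terminated can be checked with any XML parser. The following minimal sketch uses Python's standard xml.etree.ElementTree module. The helper name is_well_formed and the sample markup are our own illustrations, and the check covers well-formedness only, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# An empty element terminated with a forward slash parses cleanly.
terminated = '<p>First line<br />Second line</p>'

# The same markup with an unterminated br element is rejected.
unterminated = '<p>First line<br>Second line</p>'

print(is_well_formed(terminated))
print(is_well_formed(unterminated))
```

The first call reports True and the second reports False, because the parser reaches </p> while br is still open.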
TERMINOLOGY
<!--…--> (XHTML comment)
a element (<a>…</a>)
alt attribute
&amp; (& special character)
anchor
angle brackets (< >)
attribute
body element
br (line break) element
comments in XHTML
&copy; (© special character)
disc
element
e-mail anchor
empty tag
Extensible Hypertext Markup Language (XHTML)
head element
header
header elements (h1 through h6)
height attribute
hexadecimal code
<hr /> tag (horizontal rule)
href attribute
.htm (XHTML file-name extension)
<html> tag
.html (XHTML file-name extension)
hyperlink
image hyperlink
img element
level of nesting
<li> (list item) tag
linked document
mailto: URL
markup language
nested list
ol (ordered list) element
p (paragraph) element
special character
src attribute (img)
<strong> tag
sub element
subscript
superscript
syntax
tag
text editor
title element
type attribute
unordered-list element (ul)
valid document
Web page
width attribute
World Wide Web (WWW)
XHTML (Extensible Hypertext Markup Language)
XHTML comment
XHTML markup
XHTML tag
XML declaration
xmlns attribute
SELF-REVIEW EXERCISES
26.1 State whether the following are true or false. If false, explain why.
a) Attribute type, when used with an ol element, specifies a sequence type.
b) An ordered list cannot be nested inside an unordered list.
c) XHTML is an acronym for XML HTML.
d) Element br represents a line break.
e) Hyperlinks are marked up with <link> tags.
26.2 Fill in the blanks in each of the following:
a) The ________ element inserts a horizontal rule.
b) A superscript is marked up using element ________ and a subscript is marked up using
element ________.
c) The least important header element is ________ and the most important header element
is ________.
d) Element ________ marks up an unordered list.
e) Element ________ marks up a paragraph.
EXERCISES
26.3 Use XHTML to create a document that marks up the following text:
Use h1 for the title (the first line of text), p for text (the second and third lines of text) and sub for
each word that begins with a capital letter. Insert a horizontal rule between the h1 element and the p
element. Open your new document in a Web browser to view the marked up document.
26.4 Why is the following markup invalid?
26.6 An image named deitel.gif is 200 pixels wide and 150 pixels high. Use the width and
height attributes of the <img> tag to (a) increase the size of the image by 100%; (b) increase the
size of the image by 50%; and (c) change the width-to-height ratio to 2:1, keeping the width attained
in part (a). Write separate XHTML statements for parts (a), (b) and (c).
26.7 Create a link to each of the following: (a) index.html, located in the files directory;
(b) index.html, located in the text subdirectory of the files directory; (c) index.html, lo-
cated in the other directory in your parent directory [Hint: .. signifies parent directory.]; (d) A
link to the President of the United States’ e-mail address ([email protected]); and
(e) An FTP link to the file named README in the pub directory of ftp.cdrom.com [Hint: Use
ftp://.].
26.8 Create an XHTML document that marks up your resume.
26.9 Create an XHTML document containing three ordered lists: ice cream, soft serve and frozen
yogurt. Each ordered list should contain a nested, unordered list of your favorite flavors. Provide a
minimum of three flavors in each unordered list.
26.10 Create an XHTML document that uses an image as an e-mail link. Use attribute alt to pro-
vide a description of the image and link.
26.11 Create an XHTML document that contains an ordered list of your favorite Web sites. Your
page should contain the header “My Favorite Web Sites.”
26.12 Create an XHTML document that contains links to all the examples presented in this chapter.
[Hint: Place all the chapter examples in one directory.]
26.13 Modify the XHTML document (picture.html) in Fig. 26.8 by removing all end tags.
Validate this document using the W3C validation service. What happens? Next remove the alt at-
tributes from the <img> tags and revalidate your document. What happens?
26.14 Identify each of the following as either an element or an attribute:
a) html.
b) width.
c) href.
d) br.
e) h3.
f) a.
g) src.
26.15 State which of the following statements are true and which are false. If false, explain why.
a) A valid XHTML document can contain uppercase letters in element names.
b) Tags need not be closed in a valid XHTML document.
c) XHTML documents can have the file extension .htm.
d) Valid XHTML documents can contain tags that overlap.
e) &less; is the special character for the less-than (<) character.
f) In a valid XHTML document, <li> can be nested inside either <ol> or <ul> tags.
26.16 Fill in the blanks for each of the following:
a) XHTML comments begin with <!-- and end with ________.
b) In XHTML, attribute values must be enclosed in ________.
c) ________ is the special character for an ampersand.
d) Element ________ can be used to bold text.
EXERCISES
26.3 Use XHTML to mark up the following text:
Python How to Program
Welcome to the world of Python programming. We have provided extensive coverage on Python.
Use h1 for the title, p for text and sub for each word that begins with a capital letter. Insert a
horizontal rule between the h1 element and the p element.
ANS:
ANS: According to the XHTML specification, the <p> start tag must have a closing
</p> tag.
26.5 Why is the following markup invalid?
ANS: According to the XHTML specification, the <br> tag must have a closing
</br> tag or be written as an empty element <br />.
26.6 An image named deitel.gif is 200 pixels wide and 150 pixels high. Use the width
and height attributes of the img element to
a) increase image size by 100%;
ANS: <img src = "deitel.gif" width = "400" height = "300" alt = "deitel.gif" />
b) increase image size by 50%;
ANS: <img src = "deitel.gif" width = "300" height = "225" alt = "deitel.gif" />
c) change the width-to-height ratio to 2:1, keeping the width attained in a).
ANS: <img src = "deitel.gif" width = "400" height = "200" alt = "deitel.gif" />
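The arithmetic behind these answers can be sketched in Python. This is a minimal illustration, not part of the exercise solution: the function names scale and img_tag are our own, and the alt text is a placeholder.

```python
def scale(width, height, percent_increase):
    """Return (width, height) after enlarging both by percent_increase."""
    factor = 1 + percent_increase / 100
    return round(width * factor), round(height * factor)


def img_tag(src, width, height, alt):
    """Build a terminated img element with the required alt attribute."""
    return (f'<img src = "{src}" width = "{width}" '
            f'height = "{height}" alt = "{alt}" />')


# Part (a): a 100% increase doubles both dimensions.
width_a, height_a = scale(200, 150, 100)

# Part (b): a 50% increase multiplies both dimensions by 1.5.
width_b, height_b = scale(200, 150, 50)

# Part (c): keep the width from part (a) and halve it for a 2:1 ratio.
width_c, height_c = width_a, width_a // 2

for w, h in [(width_a, height_a), (width_b, height_b), (width_c, height_c)]:
    print(img_tag("deitel.gif", w, h, "placeholder description"))
```

The three computed sizes are 400 by 300, 300 by 225 and 400 by 200 pixels.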
26.7 Create a link to each of the following:
a) index.html, located in the files directory;
ANS: <a href = "files/index.html">
b) index.html, located in the text subdirectory of the files directory;
ANS: <a href = "files/text/index.html">
c) index.html, located in the other directory in your parent directory (Hint: .. signifies
parent directory.);
ANS: <a href = "../other/index.html">
d) A link to the President’s email address ([email protected]);
ANS: <a href = "mailto:[email protected]">
e) An FTP link to the file named README in the pub directory of ftp.cdrom.com
(Hint: remember to use ftp://).
ANS: <a href = "ftp://ftp.cdrom.com/pub/README">
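To see how a browser would resolve the relative href values above, the following sketch uses urljoin from Python's standard urllib.parse module. The base address http://www.example.com/site/pages/page.html is a hypothetical location for the document that contains the links.

```python
from urllib.parse import urljoin

# Hypothetical address of the page that contains the links.
base = "http://www.example.com/site/pages/page.html"

hrefs = [
    "files/index.html",               # part (a)
    "files/text/index.html",          # part (b)
    "../other/index.html",            # part (c)
    "ftp://ftp.cdrom.com/pub/README", # part (e), already absolute
]

# Relative references are resolved against the base document's directory;
# an absolute URL such as the FTP link is returned unchanged.
resolved = [urljoin(base, href) for href in hrefs]

for href, url in zip(hrefs, resolved):
    print(href, "->", url)
```

Note that .. in part (c) moves up from the pages directory to site before descending into other.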
      <ul>
         <li>A position as an XHTML programmer</li>
      </ul>
      <h2>Education</h2>
      <ul>
         <li>Boston College, Chestnut Hill, MA<br />
            Computer Science Major, YOG 2003<br />
            Dean's List Fall 2001, Spring 2001</li>
      </ul>
      <h2>Skills</h2>
      <ul>
         <li>Computers</li>
         <li>Programming
            <ul> <!-- start of nested list -->
               <li>XHTML</li>
               <li>Python</li>
               <li>Cascading Style Sheets</li>
            </ul> <!-- end of nested list -->

         </li>

         <li>Typing, 55WPM</li>
         <li>Teamwork</li>
      </ul>
      <h2>Experience</h2>
      <ul>
         <li>Deitel &amp; Associates,
            Sudbury, MA, Summer 2000</li>
         <li>Microsoft, Seattle, WA, Summer
            1999</li>
         <li>Computer Plus, Waltham, MA,
            Spring 1999</li>
      </ul>
      <h2>Interests and Activities</h2>
      <ul>
         <li>Soccer</li>
         <li>Guitar</li>
         <li>Music</li>
         <li>Student Government</li>
      </ul>

      <!-- end of unordered lists -->

      <hr />

   </body>
</html>
26.9 Create an XHTML document containing three ordered lists: ice cream, soft serve and frozen
yogurt. Each ordered list should contain a nested, unordered list of your favorite flavors. Provide a
minimum of three flavors in each unordered list.
ANS:
         Flavors</h3>
      <!-- start of ordered list -->
      <ol>
         <li>Ice Cream

            <!-- start of nested unordered list -->
            <ul>
               <li>Cherry Garcia</li>
               <li>Cookie Dough</li>
               <li>Bubble Gum</li>
               <li>Coffee</li>
            </ul>

         </li>

         <li>Soft Serve

            <!-- another nested unordered list -->
            <ul>
               <li>Vanilla</li>
               <li>Chocolate</li>
               <li>Strawberry</li>
            </ul>

         </li>

         <li>Frozen Yogurt

            <!-- another nested unordered list -->
            <ul>
               <li>Vanilla</li>
               <li>Heathbar Crunch</li>
               <li>Chocolate</li>
            </ul>

         </li>
      </ol>
   </body>
</html>
26.10 Create an XHTML document that uses an image as an e-mail link. Use attribute alt to pro-
vide a description of the image and link.
ANS:
26.11 Create an XHTML document that contains an ordered list of your favorite Web sites. Your
page should contain the header “My Favorite Web Sites.”
ANS:
26.12 Create an XHTML document that contains links to all the examples presented in this chapter.
[Hint: Place all the chapter examples in one directory.]
ANS:
            Lists in XHTML</a></li>
         <li><a href = "list">Figure 26.12 Nested and
            ordered lists in XHTML</a></li>
      </ul>
   </body>
</html>
26.13 Modify the XHTML document (picture.html) in Fig. 26.8 by removing all end tags.
Validate this document using the W3C validation service. What happens? Next remove the alt at-
tributes from the <img> tags and re-validate your document. What happens?
ANS:
26.15 State which of the following statements are true and which are false. If false, explain why.
a) A valid XHTML document can contain uppercase letters in element names.
ANS: False. All XHTML element names must be in lowercase.
b) Tags need not be closed in a valid XHTML document.
ANS: False. All XHTML tags are required to have corresponding closing tags.
c) XHTML documents can have the file extension .htm.
ANS: True.
d) Valid XHTML documents can contain tags that overlap.
ANS: False. XHTML prohibits overlapping tags.
e) &less; is the special character for the less-than (<) character.
ANS: False. &lt; is the special character for less-than.
f) In a valid XHTML document, <li> can be nested inside either <ol> or <ul> tags.
ANS: True.
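The special characters discussed in part (e) can also be produced programmatically. As a brief sketch, Python's standard html module converts between literal characters and their entity references. The sample strings are our own.

```python
import html

# escape replaces characters that have special meaning in markup
# with the corresponding entity references.
less_than = html.escape("<")
company = html.escape("Deitel & Associates")

# unescape performs the reverse conversion.
copyright_symbol = html.unescape("&copy;")

print(less_than)
print(company)
print(copyright_symbol)
```

Here the less-than character becomes &lt;, the ampersand becomes &amp;, and &copy; is converted back to the copyright symbol.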
27
Introduction to XHTML:
Part 2
Objectives
• To create tables with rows and columns of data.
• To control table formatting.
• To create and use forms.
• To create and use image maps to aid in Web-page
navigation.
• To make Web pages accessible to search engines
using <meta> tags.
• To use the frameset element to display multiple
Web pages in a single browser window.
Yea, from the table of my memory
I’ll wipe away all trivial fond records.
William Shakespeare
Outline
27.1 Introduction
27.2 Basic XHTML Tables
27.3 Intermediate XHTML Tables and Formatting
27.4 Basic XHTML Forms
27.5 More Complex XHTML Forms
27.6 Internal Linking
27.7 Creating and Using Image Maps
27.8 meta Elements
27.9 frameset Element
27.10 Nested framesets
27.11 Internet and World Wide Web Resources
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
27.1 Introduction
In the previous chapter, we introduced XHTML. We built several complete Web pages fea-
turing text, hyperlinks, images, horizontal rules and line breaks. In this chapter, we discuss
more substantial XHTML features, including presentation of information in tables and in-
corporating forms for collecting information from a Web-page visitor. We also introduce
internal linking and image maps for enhancing Web page navigation and frames for dis-
playing multiple documents in the browser.
By the end of this chapter, you will be familiar with the most commonly used XHTML
features and will be able to create more complex Web documents. In Chapter 28, we dis-
cuss how to make Web pages more visually appealing by manipulating fonts, colors and
text.
61 <th>Total</th>
62 <th>$3.75</th>
63 </tr>
64 </tfoot>
65
66 </table>
67
68 </body>
69 </html>
[Figure: browser rendering of the table, with callouts labeling the table header, table body, table footer and table border.]
Tables are defined with the table element. Lines 16–18 specify the start tag for a
table element that has several attributes. The border attribute specifies the table’s border
width in pixels. To create a table without a border, set border to "0". This example
assigns attribute width the value "40%", which sets the table’s width to 40 percent of the
browser’s width. A developer can also set attribute width to a specified number of pixels.
Testing and Debugging Tip 27.1
Try resizing the browser window to see how the width of the window affects the width of the
table.
As its name implies, attribute summary (line 17) describes the table’s contents.
Speech devices use this attribute to make the table more accessible to users with visual
impairments. The caption element (line 22) describes the table’s content and helps text-
based browsers interpret the table data. Text inside the <caption> tag is rendered above
the table by most browsers. Attribute summary and element caption are two of many
XHTML features that make Web pages more accessible to users with disabilities.
A table has three distinct sections—head, body and foot. The head section (or header
cell) is defined with a thead element (lines 26–31), which contains header information
such as column names. Each tr element (lines 27–30) defines an individual table row. The
columns in the head section are defined with th elements. Most browsers center and dis-
play text formatted by th (table header column) elements in bold. Table header elements
are nested inside table row elements.
The body section, or table body, contains the table’s primary data. The table body
(lines 35–55) is defined in a tbody element. Data cells contain individual pieces of data
and are defined with td (table data) elements.
The foot section (lines 59–64) is defined with a tfoot (table foot) element and rep-
resents a footer. Common text placed in the footer includes calculation results and foot-
notes. Like other sections, the foot section can contain table rows and each row can contain
columns.
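The three table sections can also be generated from data. The following minimal sketch builds a table with Python's standard xml.etree.ElementTree module. The function name build_table and the fruit rows are hypothetical; only the footer row (Total, $3.75) and the border and width values come from the example above.

```python
import xml.etree.ElementTree as ET


def build_table(headers, rows, footer):
    """Return a table element with head, body and foot sections."""
    table = ET.Element("table", {"border": "1", "width": "40%"})

    # Head section: one row of th (table header) cells.
    head_row = ET.SubElement(ET.SubElement(table, "thead"), "tr")
    for text in headers:
        ET.SubElement(head_row, "th").text = text

    # Body section: one tr per data row, one td per data cell.
    tbody = ET.SubElement(table, "tbody")
    for row in rows:
        body_row = ET.SubElement(tbody, "tr")
        for text in row:
            ET.SubElement(body_row, "td").text = text

    # Foot section: a single row of th cells, as in the example.
    foot_row = ET.SubElement(ET.SubElement(table, "tfoot"), "tr")
    for text in footer:
        ET.SubElement(foot_row, "th").text = text

    return table


table = build_table(
    ["Fruit", "Price"],
    [["Apple", "$0.25"], ["Orange", "$0.50"]],  # hypothetical rows
    ["Total", "$3.75"],
)
print(ET.tostring(table, encoding="unicode"))
```

The sketch follows the section order used in this chapter's listing (head, body, foot). The XHTML 1.0 DTD lists tfoot before tbody, so a strict validator may expect the foot section to be moved up.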
74
75 </table>
76
77 </body>
78 </html>
Line 42 introduces attribute valign, which aligns data vertically and may be
assigned one of four values—"top" aligns data with the top of the cell, "middle" ver-
tically centers data (the default for all data and header cells), "bottom" aligns data with
the bottom of the cell and "baseline" ignores the fonts used for the row data and sets
the bottom of all text in the row on a common baseline (i.e., the horizontal line to which
each character in a word is aligned).
Data that users enter on a Web page normally is sent to a Web server that provides
access to a site’s resources (e.g., XHTML documents, images, etc.). These resources are
either located on the same machine as the Web server or on a machine that the Web server
can access through the network. When a browser requests a Web page or file that is located
on a server, the server processes the request and returns the requested resource. A request
contains the name and path of the desired resource and the method of communication
(called a protocol). XHTML documents use the HyperText Transfer Protocol (HTTP).
Figure 27.3 sends the form data to the Web server, which passes the form data to a CGI
(Common Gateway Interface) script (i.e., a program) written in Python, Perl, C or some other
language. The script processes the data received from the Web server and typically returns
information to the Web server. The Web server then sends the information in the form of
an XHTML document to the Web browser. [Note: This example demonstrates client-side
functionality. If the form is submitted (by clicking Submit Your Entries), an error occurs.
In later chapters, we present the server-side programming necessary to process information
entered into a form.]
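As a rough illustration of the script's role in this exchange, the following sketch decodes submitted form data and returns an XHTML document as a string. It is not a complete CGI program: the function name handle_form and the wording of the response are our own assumptions, and a real script would also read the data from the Web server and write HTTP headers.

```python
from html import escape
from urllib.parse import parse_qs


def handle_form(encoded_data):
    """Decode URL-encoded form data and build an XHTML response."""
    fields = parse_qs(encoded_data)

    # parse_qs maps each field name to a list of submitted values.
    name = fields.get("name", ["(no name given)"])[0]

    # Escape the value so that user input cannot inject markup.
    return (
        '<html xmlns = "http://www.w3.org/1999/xhtml">\n'
        '   <head><title>Form Results</title></head>\n'
        '   <body><p>Thank you, ' + escape(name) + '.</p></body>\n'
        '</html>'
    )


print(handle_form("name=Bob"))
```

Escaping the submitted value matters because anything the user types is otherwise copied straight into the markup that the browser renders.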
34 </p>
35
36 <!-- <input type = "text"> inserts a text box -->
37 <p><label>Name:
38 <input name = "name" type = "text" size = "25"
39 maxlength = "30" />
40 </label></p>
41
42 <p>
43 <!-- input types "submit" and "reset" insert -->
44 <!-- buttons for submitting and clearing the -->
45 <!-- form's contents -->
46 <input type = "submit" value =
47 "Submit Your Entries" />
48 <input type = "reset" value =
49 "Clear Your Entries" />
50 </p>
51
52 </form>
53
54 </html>
Fig. 27.3 Simple form with hidden fields and a text box (part 2 of 2).
Forms can contain visual and non-visual components. Visual components include
clickable buttons and other graphical user interface components with which users interact.
Non-visual components, called hidden inputs, store data that the document author specifies,
such as e-mail addresses and XHTML document file names that act as links. The form
begins on line 23 with the form element. Attribute method specifies how the form’s data
is sent to the Web server.
Using method = "post" sends the form data in the body of the browser request, which also
contains the protocol (i.e., HTTP) and the requested resource’s URL. Scripts located on the Web
server’s computer (or on a computer accessible through the network) can access the form
data sent as part of the request. For example, a script may take the form information and
update an electronic mailing list. The other possible value, method = "get", appends the
form data directly to the end of the URL. For example, the URL /cgi-bin/formmail
might have the form information name=bob appended to it, giving /cgi-bin/formmail?name=bob.
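The difference between the two methods can be sketched with urlencode from Python's standard urllib.parse module, which produces the same name=value encoding that a browser uses. The variable names are our own.

```python
from urllib.parse import urlencode

form_data = {"name": "bob"}

# Both methods encode the fields as name=value pairs joined by &.
encoded = urlencode(form_data)

# With "get", the encoded data follows a question mark in the URL.
get_url = "/cgi-bin/formmail" + "?" + encoded

# With "post", the URL is unchanged and the same encoded data
# travels in the body of the request.
post_url = "/cgi-bin/formmail"
post_body = encoded

print(get_url)
print(post_url, post_body)
```

Spaces and other reserved characters are encoded as well, so a value such as bob smith is sent as bob+smith.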
The action attribute in the <form> tag specifies the URL of a script on the Web
server; in this case, it specifies a script that e-mails form data to an address. Most Internet
Service Providers (ISPs) have a script like this on their site; ask the Web site system admin-
istrator how to set up an XHTML document to use the script correctly.
Lines 28–33 define three input elements that specify data to provide to the script that
processes the form (also called the form handler). These three input elements have type
attribute "hidden", which allows the document author to send form data that is not
entered by a user to a script.
The three hidden inputs are: an e-mail address to which the data will be sent, the e-
mail’s subject line and a URL where the browser will be redirected after submitting the
form. Two other input attributes are name, which identifies the input element, and
value, which provides the value that will be sent (or posted) to the Web server.
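To inspect the hidden inputs of a form, one option is Python's standard html.parser module. In the sketch below, the class name HiddenInputCollector is our own, and the form markup, including the field names and values, is a hypothetical stand-in for the three hidden inputs described above.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect the name/value pairs of hidden input elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("type") == "hidden":
            self.hidden[attributes.get("name")] = attributes.get("value")


# Hypothetical form fragment; names and values are placeholders.
form_markup = """
<form method = "post" action = "/cgi-bin/formmail">
   <input type = "hidden" name = "recipient"
      value = "someone@example.com" />
   <input type = "hidden" name = "subject"
      value = "Feedback Form" />
   <input type = "hidden" name = "redirect"
      value = "main.html" />
   <input type = "text" name = "name" size = "25" />
</form>
"""

collector = HiddenInputCollector()
collector.feed(form_markup)
print(collector.hidden)
```

The text box is skipped because its type attribute is "text", so only the three hidden name/value pairs are collected.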
Good Programming Practice 27.1
Place hidden input elements at the beginning of a form, immediately after the opening
<form> tag. This placement allows document authors to locate hidden input elements
quickly.
We introduce another type of input in lines 38–39. The "text" input inserts a
text box into the form. Users can type data in text boxes. The label element (lines 37–40)
provides users with information about the input element’s purpose.
Common Programming Error 27.2
Forgetting to include a label element for each form element is a design error. Without
these labels, users cannot determine the purpose of individual form elements.
The input element’s size attribute specifies the number of characters visible in the
text box. Optional attribute maxlength limits the number of characters input into the text
box. In this case, the user is not permitted to type more than 30 characters into the text box.
There are two types of input elements in lines 46–49. The "submit" input ele-
ment is a button. When the user presses a "submit" button, the browser sends the data in
the form to the Web server for processing. The value attribute sets the text displayed on
the button (the default value is Submit). The "reset" input element allows a user to
reset all form elements to their default values. The value attribute of the "reset"
input element sets the text displayed on the button (the default value is Reset).
place the text between the <textarea> and </textarea> tags. Default text can be
specified in other input types, such as text boxes, by using the value attribute.
Fig. 27.4 Form with textareas, password boxes and checkboxes (part 1 of 3).
50 <p>
51 <strong>Things you liked:</strong><br />
52
53 <label>Site design
54 <input name = "thingsliked" type = "checkbox"
55 value = "Design" /></label>
56
57 <label>Links
58 <input name = "thingsliked" type = "checkbox"
59 value = "Links" /></label>
60
61 <label>Ease of use
62 <input name = "thingsliked" type = "checkbox"
63 value = "Ease" /></label>
64
65 <label>Images
66 <input name = "thingsliked" type = "checkbox"
67 value = "Images" /></label>
68
69 <label>Source code
70 <input name = "thingsliked" type = "checkbox"
71 value = "Code" /></label>
72 </p>
73
74 <p>
75 <input type = "submit" value =
76 "Submit Your Entries" />
77 <input type = "reset" value =
78 "Clear Your Entries" />
79 </p>
80
81 </form>
82 </html>
Fig. 27.4 Form with textareas, password boxes and checkboxes (part 2 of 3).
Fig. 27.4 Form with textareas, password boxes and checkboxes (part 3 of 3).
The "password" input in lines 46–47 inserts a password box with the specified
size. A password box allows users to enter sensitive information, such as credit card num-
bers and passwords, by “masking” the information input with asterisks. The actual value
input is sent to the Web server, not the asterisks that mask the input.
Lines 54–71 introduce the checkbox form element. Checkboxes enable users to select
from a set of options. When a user selects a checkbox, a check mark appears in the checkbox.
Otherwise, the checkbox remains empty. Each "checkbox" input creates a new
checkbox. Checkboxes can be used individually or in groups. Checkboxes that belong to a
group are assigned the same name (in this case, "thingsliked").
Common Programming Error 27.3
When your form has several checkboxes with the same name, you must make sure that they
have different values, or the scripts running on the Web server will not be able to distinguish
between them.
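The following sketch shows why distinct values matter. It uses parse_qs from Python's standard urllib.parse module to decode data as a server-side script might receive it. The submitted string is a hypothetical example in which the user checked Site design and Links.

```python
from urllib.parse import parse_qs

# Each selected checkbox contributes one name=value pair, so a
# group that shares a name appears several times in the data.
submitted = "thingsliked=Design&thingsliked=Links&name=bob"

fields = parse_qs(submitted)

# All values sent under the shared name are gathered into one list.
liked = fields["thingsliked"]
print(liked)
```

The script sees the list ['Design', 'Links']. If every checkbox had the same value, the list would contain identical entries and the script could not tell which boxes were checked.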
We continue our discussion of forms by presenting a third example that introduces sev-
eral more form elements from which users can make selections (Fig. 27.5). In this example,
we introduce two new input types. The first type is the radio button (lines 76–94) spec-
ified with type "radio". Radio buttons are similar to checkboxes, except that only one
radio button in a group of radio buttons may be selected at any time. All radio buttons in a
group have the same name attribute and are distinguished by their different value
attributes. The attribute-value pair checked = "checked" (line 77) indicates which
radio button, if any, is selected initially. The checked attribute also applies to check-
boxes.
Fig. 27.5 Form including radio buttons and drop-down lists (part 1 of 5).
Fig. 27.5 Form including radio buttons and drop-down lists (part 2 of 5).
78 </label>
79
80 <label>Links from another site
81 <input name = "howtosite" type = "radio"
82 value = "link" /></label>
83
84 <label>Deitel.com Web site
85 <input name = "howtosite" type = "radio"
86 value = "deitel.com" /></label>
87
88 <label>Reference in a book
89 <input name = "howtosite" type = "radio"
90 value = "book" /></label>
91
92 <label>Other
93 <input name = "howtosite" type = "radio"
94 value = "other" /></label>
95
96 </p>
97
98 <p>
99 <label>Rate our site:
100
101 <!-- the <select> tag presents a drop-down -->
102 <!-- list with choices indicated by the -->
103 <!-- <option> tags -->
104 <select name = "rating">
105 <option selected = "selected">Amazing</option>
106 <option>10</option>
107 <option>9</option>
108 <option>8</option>
109 <option>7</option>
110 <option>6</option>
111 <option>5</option>
112 <option>4</option>
113 <option>3</option>
114 <option>2</option>
115 <option>1</option>
116 <option>Awful</option>
117 </select>
118
119 </label>
120 </p>
121
122 <p>
123 <input type = "submit" value =
124 "Submit Your Entries" />
125 <input type = "reset" value = "Clear Your Entries" />
126 </p>
127
128 </form>
129
130 </body>
Fig. 27.5 Form including radio buttons and drop-down lists (part 3 of 5).
131 </html>
Fig. 27.5 Form including radio buttons and drop-down lists (part 4 of 5).
Fig. 27.5 Form including radio buttons and drop-down lists (part 5 of 5).
The select element (lines 104–117) provides a drop-down list of items from which
the user can select an item. The name attribute identifies the drop-down list. The option
element (lines 105–116) adds items to the drop-down list. The option element’s
selected attribute specifies which item initially is displayed as the selected item in the
select element.
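Long drop-down lists are tedious to type by hand. As a sketch, the rating list from this example can be generated with Python's standard xml.etree.ElementTree module. The variable names are our own.

```python
import xml.etree.ElementTree as ET

# The same choices as the example: Amazing, 10 down to 1, Awful.
labels = ["Amazing"] + [str(n) for n in range(10, 0, -1)] + ["Awful"]

select = ET.Element("select", {"name": "rating"})

for index, label in enumerate(labels):
    option = ET.SubElement(select, "option")
    option.text = label

    # Mark the first item as the one displayed initially.
    if index == 0:
        option.set("selected", "selected")

print(ET.tostring(select, encoding="unicode"))
```

The generated element contains 12 option elements, with Amazing carrying the attribute-value pair selected = "selected".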
Fig. 27.6 Using internal hyperlinks to make pages more navigable (part 1 of 3).
44 <li>Scripts</li>
45 <li>New languages</li>
46 </ul>
47 </li>
48 </ul>
49 </li>
50
51 <li>Links</li>
52 <li>Keeping in touch with old friends</li>
53 <li>It is the technology of the future!</li>
54 </ul>
55
56 <!-- named anchor -->
57 <p><a name = "ceos"></a></p>
58 <h1>My 3 Favorite <em>CEOs</em></h1>
59
60 <p>
61
62 <!-- internal hyperlink to features -->
63 <a href = "#features">Go to <em>Favorite Features</em>
64 </a></p>
65
66 <ol>
67 <li>Bill Gates</li>
68 <li>Steve Jobs</li>
69 <li>Michael Dell</li>
70 </ol>
71
72 </body>
73 </html>
Fig. 27.6 Using internal hyperlinks to make pages more navigable (part 2 of 3).
Fig. 27.6 Using internal hyperlinks to make pages more navigable (part 3 of 3).
Although not demonstrated in this example, a hyperlink can specify an internal link in
another document by specifying the document name followed by a pound sign and the
named anchor, as in:
href = "page.html#name"
For example, to link to a named anchor called booklist in books.html, href is as-
signed "books.html#booklist".
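To see how such a reference divides into a document name and a named anchor, here is a small sketch using urlsplit from Python 3's standard urllib.parse module (an assumption; the book targets an earlier Python). The variable names are illustrative.

```python
from urllib.parse import urlsplit

# A link to a named anchor in another document.
external = urlsplit("books.html#booklist")
document = external.path       # the part before the pound sign
anchor = external.fragment     # the named anchor after the pound sign

# An internal link has no document name, only the anchor.
internal = urlsplit("#features")

print(document, anchor)
print(repr(internal.path), internal.fragment)
```

An empty path, as in the second case, means the anchor is looked up in the current document.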
Fig. 27.7 Image with links anchored to an image map (part 1 of 2).
Fig. 27.7 Image with links anchored to an image map (part 2 of 2).
Lines 20–48 define image maps using a map element. Attribute id (line 20) identifies
the image map. If id is omitted, the map cannot be referenced by an image. We discuss
how to reference an image map momentarily. Hotspots are defined with area elements (as
shown on lines 25–27). Attribute href (line 25) specifies the link’s target (i.e., the
resource to which to link). Attributes shape (line 25) and coords (line 26) specify the
hotspot’s shape and coordinates, respectively. Attribute alt (line 27) provides alternate
text for the link.
Common Programming Error 27.5
Not specifying an id attribute for a map element prevents an img element from using the
map’s area elements to define hotspots.
The markup on lines 25–27 creates a rectangular hotspot (shape = "rect") for the
coordinates specified in the coords attribute. A coordinate pair consists of two numbers
representing the location of a point on the x-axis and the y-axis, respectively. The x-axis
extends horizontally and the y-axis extends vertically from the upper-left corner of the
image. Every point on an image has a unique x-y-coordinate. For rectangular hotspots, the
required coordinates are those of the upper-left and lower-right corners of the rectangle. In
this case, the upper-left corner of the rectangle is located at 2 on the x-axis and 123 on the
y-axis, annotated as (2, 123). The lower-right corner of the rectangle is at (54, 143). Coor-
dinates are measured in pixels.
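As a rough illustration of how a rectangular hotspot's coordinates can be interpreted, the sketch below tests whether a clicked pixel falls inside the rectangle described above. It assumes Python 3; the function names parse_coords and in_rect are hypothetical, and real browsers may treat edge pixels differently.

```python
def parse_coords(coords):
    """Turn a coords attribute value such as "2, 123, 54, 143" into integers."""
    return [int(value) for value in coords.split(",")]


def in_rect(coords, x, y):
    """Return True if pixel (x, y) lies inside a rect hotspot.

    coords holds the upper-left corner followed by the lower-right corner,
    measured in pixels from the upper-left corner of the image.
    """
    left, top, right, bottom = parse_coords(coords)
    return left <= x <= right and top <= y <= bottom


hotspot = "2, 123, 54, 143"
print(in_rect(hotspot, 30, 130))   # a point between the two corners
print(in_rect(hotspot, 0, 0))      # the upper-left corner of the image
```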
Common Programming Error 27.6
Overlapping coordinates of an image map cause the browser to render the first hotspot it
encounters for the area.
The map area (lines 39–41) assigns the shape attribute "poly" to create a hotspot
in the shape of a polygon using the coordinates in attribute coords. These coordinates
represent each vertex, or corner, of the polygon. The browser connects these points with
lines to form the hotspot’s area.
The map area (lines 45–47) assigns the shape attribute "circle" to create a cir-
cular hotspot. In this case, the coords attribute specifies the circle’s center coordinates
and the circle’s radius, in pixels.
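The circle and polygon shapes can be sketched the same way. The code below assumes Python 3; the function names and the coordinate values are made up for illustration and are not taken from Fig. 27.7. The polygon test uses the common ray-casting technique, which is one reasonable way to decide whether a point is inside the connected vertices, not necessarily what any particular browser does.

```python
def in_circle(coords, x, y):
    """coords is "center-x, center-y, radius", all in pixels."""
    cx, cy, radius = [int(value) for value in coords.split(",")]
    return (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2


def in_poly(coords, x, y):
    """coords lists the x-y pair of each vertex in order."""
    values = [int(value) for value in coords.split(",")]
    vertices = list(zip(values[0::2], values[1::2]))
    inside = False
    # Pair each vertex with the next one, wrapping around to close the shape.
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        if (y1 > y) != (y2 > y):
            # The edge crosses the horizontal line through the point.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


print(in_circle("100, 50, 20", 110, 50))
print(in_poly("0, 0, 10, 0, 10, 10, 0, 10", 5, 5))
```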
To use an image map with an img element, the img element’s usemap attribute is
assigned the id of a map. Lines 52–53 reference the image map named "picture". The
image map resides within the same document, so we use internal linking.
Fig. 27.8 Using meta to provide keywords and a description (part 1 of 2).
Fig. 27.8 Using meta to provide keywords and a description (part 2 of 2).
cifically for framesets. This new document type is specified in lines 2–3 and is required for
documents that define framesets.
Fig. 27.9 Web document containing two frames—navigation and content (part 1 of
2).
[Screen capture labels: left frame (leftframe), right frame (main).]
Fig. 27.9 Web document containing two frames—navigation and content (part 2 of
2).
A document that defines a frameset normally consists of an html element that con-
tains a head element and a frameset element. The <frameset> tag (line 23) informs
the browser that a page contains frames. Attribute cols specifies the frameset’s column
layout. The value of cols gives the width of each frame, either in pixels or as a percentage
of the browser width. In this case, the attribute cols = "110,*" informs the browser that
there are two vertical frames. The first frame extends 110 pixels from the left edge of the
browser window and the second frame fills the remainder of the browser width (as indi-
cated by the asterisk). Similarly, frameset attribute rows specifies the number of rows
and the size of each row in a frameset.
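To make the arithmetic behind a value such as "110,*" concrete, the following sketch resolves a cols or rows value against a window size. It assumes Python 3; the function name frame_sizes is hypothetical, and the sketch handles only pixel values, percentages and a plain asterisk (weighted forms such as "2*" are deliberately left out).

```python
def frame_sizes(spec, total):
    """Resolve a cols/rows value such as "110,*" against a total size in pixels."""
    entries = [entry.strip() for entry in spec.split(",")]
    sizes = [None] * len(entries)

    for index, entry in enumerate(entries):
        if entry.endswith("%"):
            # Percentage of the browser width (or height, for rows).
            sizes[index] = total * int(entry[:-1]) // 100
        elif entry != "*":
            # Plain number: a size in pixels.
            sizes[index] = int(entry)

    # Each asterisk receives an equal share of whatever space remains.
    stars = sizes.count(None)
    if stars:
        used = sum(size for size in sizes if size is not None)
        share = max(total - used, 0) // stars
        sizes = [share if size is None else size for size in sizes]

    return sizes


print(frame_sizes("110,*", 800))   # two vertical frames in an 800-pixel window
print(frame_sizes("25%,*", 800))
```

For an 800-pixel-wide window, the first call yields a 110-pixel left frame and gives the remaining 690 pixels to the second frame.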
The documents that will be loaded into the frameset are specified with frame ele-
ments (lines 27–28 in this example). Attribute src specifies the URL of the page to display
in the frame. Each frame has name and src attributes. The first frame (which covers 110
pixels on the left side of the frameset) is named leftframe and displays the page
nav.html (Fig. 27.10). The second frame is named main and displays the page
main.html.
Attribute name identifies a frame, enabling hyperlinks in a frameset to specify the
target frame in which a linked document should display when the user clicks the link.
For example
Fig. 27.10 is the Web page displayed in the left frame of Fig. 27.9. This XHTML doc-
ument provides the navigation buttons that, when clicked, determine which document is
displayed in the right frame.
Fig. 27.10 XHTML document displayed in the left frame of Fig. 27.9 (part 1 of 2).
15 <body>
16
17 <p>
18 <a href = "links.html" target = "main">
19 <img src = "buttons/links.jpg" width = "65"
20 height = "50" alt = "Links Page" />
21 </a><br />
22
23 <a href = "list.html" target = "main">
24 <img src = "buttons/list.jpg" width = "65"
25 height = "50" alt = "List Example Page" />
26 </a><br />
27
28 <a href = "contact.html" target = "main">
29 <img src = "buttons/contact.jpg" width = "65"
30 height = "50" alt = "Contact Page" />
31 </a><br />
32
33 <a href = "header.html" target = "main">
34 <img src = "buttons/header.jpg" width = "65"
35 height = "50" alt = "Header Page" />
36 </a><br />
37
38 <a href = "table1.html" target = "main">
39 <img src = "buttons/table.jpg" width = "65"
40 height = "50" alt = "Table Page" />
41 </a><br />
42
43 <a href = "form.html" target = "main">
44 <img src = "buttons/form.jpg" width = "65"
45 height = "50" alt = "Feedback Form" />
46 </a><br />
47 </p>
48
49 </body>
50 </html>
Fig. 27.10 XHTML document displayed in the left frame of Fig. 27.9 (part 2 of 2).
Line 27 (Fig. 27.9) displays the XHTML page in Fig. 27.10. Anchor attribute target
(line 18 in Fig. 27.10) specifies that the linked documents are loaded in frame main (line
28 in Fig. 27.9). A target can be set to a number of preset values: "_blank" loads the
page into a new browser window, "_self" loads the page into the frame in which the
anchor element appears and "_top" loads the page into the full browser window (i.e.,
removes the frameset).
The outer frameset element (lines 23–41) defines two columns. The left frame extends
over the first 110 pixels from the left edge of the browser and the right frame occupies the
rest of the window’s width. The frame element on line 24 specifies that the document
nav.html (Fig. 27.10) displays in the left column.
Lines 28–31 define a nested frameset element for the second column of the outer
frameset. This frameset defines two rows. The first row extends 175 pixels from the top
of the browser window, as indicated by rows = "175,*". The second row occupies the
remainder of the browser window’s height. The frame element at line 29 specifies that the
first row of the nested frameset displays picture.html (Fig. 27.7). The frame ele-
ment at line 30 specifies that the second row of the nested frameset displays
main.html (Fig. 27.9).
Fig. 27.11 Framed Web site with a nested frameset (part 1 of 2).
39 </noframes>
40
41 </frameset>
42 </html>
[Screen capture labels: left frame (leftframe); the right frame contains the two nested frames.]
Fig. 27.11 Framed Web site with a nested frameset (part 2 of 2).
www.vbxml.com/xhtml/articles/xhtml_tables
The VBXML.com Web site contains a tutorial on creating XHTML tables.
www.webreference.com/xml/reference/xhtml.html
This Web page contains a list of the frequently used XHTML tags, such as header tags, table tags,
frame tags and form tags. It also provides a description of each tag.
SUMMARY
• XHTML tables mark up tabular data and are one of the most frequently used features in XHTML.
• The table element defines an XHTML table. Attribute border specifies the table’s border
width, in pixels. Tables without borders set this attribute to "0".
• Attribute summary summarizes the table’s contents and is used by speech devices to make the
table more accessible to users with visual impairments.
• Element caption describes the table’s content. The text inside the <caption> tag is rendered
above the table in most browsers.
• A table can be split into three distinct sections: head (thead), body (tbody) and foot (tfoot).
The head section contains information such as table titles and column headers. The table body con-
tains the primary table data. The table foot contains information such as footnotes.
• Element tr, or table row, defines individual table rows. Element th defines a header cell. Text in
th elements usually is centered and displayed in bold by most browsers. This element can be
present in any section of the table.
• Data within a row are defined with td, or table data, elements.
• Element colgroup groups and formats columns. Each col element can format any number of
columns (specified with the span attribute).
• The document author has the ability to merge data cells with the rowspan and colspan at-
tributes. The values assigned to these attributes specify the number of rows or columns occupied
by the cell. These attributes can be placed inside any data-cell tag.
• XHTML provides forms for collecting information from users. Forms contain visual components
such as buttons that users click. Forms may also contain non-visual components, called hidden in-
puts, which store data, such as e-mail addresses and XHTML document file names used for link-
ing.
• A form begins with the form element. Attribute method specifies how the form’s data is sent to
the Web server.
• The "text" input inserts a text box into the form. Text boxes allow the user to input data.
• The input element’s size attribute specifies the number of characters visible in the input el-
ement. Optional attribute maxlength limits the number of characters input into a text box.
• The "submit" input submits the data entered in the form to the Web server for processing. Most
Web browsers create a button that submits the form data when clicked. The "reset" input al-
lows a user to reset all form elements to their default values.
• The textarea element inserts a multiline text box, called a text area, into a form. The number
of rows in the text area is specified with the rows attribute and the number of columns (i.e., char-
acters) is specified with the cols attribute.
• The "password" input inserts a password box into a form. A password box allows users to enter
sensitive information, such as credit card numbers and passwords, by “masking” the information
input with another character. Asterisks are the masking character used for password boxes. The
actual value input is sent to the Web server, not the asterisks that mask the input.
• The "checkbox" input allows the user to make a selection. When the checkbox is selected, a check
mark appears in it; otherwise, the checkbox is empty. Checkboxes can be used individually or in
groups. Checkboxes that are part of the same group have the same name.
• A radio button is similar in function to a checkbox, except that only one radio button in a group
can be selected at any time. All radio buttons in a group have the same name attribute value and
different value attribute values.
• The select element provides a drop-down list of items. The name attribute identifies the
drop-down list. The option element adds items to the drop-down list. The selected attribute, like
the checked attribute for radio buttons and checkboxes, specifies which list item is displayed
initially.
• Image maps designate certain sections of an image as links. These links are more properly called
hotspots.
• Image maps are defined with map elements. Attribute id identifies the image map. Hotspots are
defined with the area element. Attribute href specifies the link’s target. Attributes shape and
coords specify the hotspot’s shape and coordinates, respectively, and alt provides alternate
text.
• One way that search engines catalog pages is by reading the contents of their meta elements. Two
important attributes of the meta element are name, which identifies the type of meta element, and
content, which provides information a search engine uses to catalog a page.
• Frames allow the browser to display more than one XHTML document simultaneously. The
frameset element informs the browser that the page contains frames. Not all browsers support
frames. XHTML provides the noframes element to specify alternate content for browsers that
do not support frames.
• You can use the frameset element to create more complex layouts in a Web page by nesting
framesets.
TERMINOLOGY
action attribute
area element
border attribute
browser request
<caption> tag
checkbox
checked attribute
col element
colgroup element
cols attribute
colspan attribute
coords attribute
form
form element
frame element
frameset element
header cell
hidden input element
hotspot
href attribute
image map
img element
input element
internal hyperlink
internal linking
map element
maxlength attribute
meta element
method attribute
name attribute
navigational frame
nested frameset element
nested tag
noframes element
password box
"radio" (attribute value)
rows attribute (textarea)
rowspan attribute (td)
selected attribute
size attribute (input)
table element
target = "_blank"
target = "_self"
target = "_top"
tbody element
td element
textarea
textarea element
tfoot (table foot) element
<thead>...</thead>
tr (table row) element
type attribute
usemap attribute
valign attribute (th)
value attribute
Web server
XHTML form
x-y-coordinate
SELF-REVIEW EXERCISES
27.1 State whether the following statements are true or false. If false, explain why.
a) The width of all data cells in a table must be the same.
b) Framesets can be nested.
c) You are limited to a maximum of 100 internal links per page.
d) All browsers can render framesets.
27.2 Fill in the blanks in each of the following statements:
a) Assigning attribute type ________ in an input element inserts a button that, when
clicked, clears the contents of the form.
b) The layout of a frameset is set by including the ________ attribute or the ________
attribute inside the <frameset> tag.
EXERCISES
27.4 Categorize each of the following as an element or an attribute:
a) width.
b) td.
c) th.
d) frame.
e) name.
f) select.
g) type.
27.5 What will the frameset produced by the following code look like? Assume that the pages
referenced are blank with white backgrounds and that the dimensions of the screen are 800 by 600.
Sketch the layout, approximating the dimensions.
27.6 Write the XHTML markup to create a frame with a table of contents on the left side of the
window, and have each entry in the table of contents use internal linking to scroll down the document
frame to the appropriate subsection.
27.7 Create XHTML markup that produces the table shown in Fig. 27.12. Use <em> and
<strong> tags as necessary. The image (camel.gif) is included in the Chapter 27 examples di-
rectory on the CD-ROM that accompanies this book.
27.8 Write an XHTML document that produces the table shown in Fig. 27.13.
27.9 A local university has asked you to create an XHTML document that allows potential stu-
dents to provide feedback about their campus visits. Your XHTML document should contain a form
with text boxes for names, addresses and e-mails. Provide check boxes that allow prospective stu-
dents to indicate what they liked most about the campus. These check boxes should include: students,
location, campus, atmosphere, dorm rooms and sports. Also, provide radio buttons that ask the pro-
spective student how they became interested in the university. Options should include: friends, tele-
vision, Internet and other. In addition, provide a text area for additional comments, a submit button
and a reset button.
27.10 Create an XHTML document titled “How to Get Good Grades.” Use <meta> tags to include
a series of keywords that describe your document.
27.11 Create an XHTML document that displays a tic-tac-toe table with player X winning. Use
<h2> to mark up both Xs and Os. Center the letters in each cell horizontally. Title the game using an
<h1> tag. This title should span all three columns. Set the table border to one.
SELF-REVIEW EXERCISES
27.1 State whether the following are true or false. If false, explain why.
a) The width of all data cells in a table must be the same.
ANS: False. You can specify the width of any column, either in pixels or as a percentage of
the table width.
b) Framesets can be nested.
ANS: True.
c) You are limited to a maximum of 100 internal links per page.
ANS: False. You can have an unlimited number of internal links.
d) All browsers can render framesets.
ANS: False. Some browsers are unable to render a frameset and must therefore rely on
the information that you include inside the <noframes>…</noframes> tags.
EXERCISES
27.4 Categorize each of the following as an element or an attribute:
a) width
ANS: Attribute.
b) td
ANS: Element.
c) th
ANS: Element.
d) frame
ANS: Element.
e) name
ANS: Attribute.
f) select
ANS: Element.
g) type
ANS: Attribute.
27.5 What will the frameset produced by the following code look like? Assume that the pages
imported are blank with white backgrounds and that the dimensions of the screen are 800 by 600.
Sketch the layout, approximating the dimensions.
ANS:
27.6 Write the XHTML markup to create a frame with a table of contents on the left side of the
window, and have each entry in the table of contents use internal linking to scroll down the document
frame to the appropriate subsection.
ANS:
3 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
4
5 <!-- Exercise 27.6 Solution -->
6
7 <html xmlns = "http://www.w3.org/1999/xhtml">
8 <head>
9 <title>Solution 27.6</title>
10 </head>
11 <frameset cols = "175, *">
12 <frame name = "sidebar" src = "sidebar.html" />
13 <frame name = "main" src = "main.html" />
14 </frameset>
15 </html>
16 </strong>
17 <img src = "camel.gif" alt = "Camel picture" />
18 </p>
19 <p>
20 <strong>
21 <a name = "tictactoe">
22 This is a tic tac toe example:</a>
23 </strong>
24 </p>
25 <table border = "1">
26
27 <!-- set all columns to be centered -->
28 <colgroup>
29 <col align = "center" />
30 <col align = "center" />
31 <col align = "center" />
32 </colgroup>
33
34 <!-- top row will span across 3 columns -->
35 <tr>
36 <th colspan = "3">
37 <h1>This is the head of the
38 Tic-Tac-Toe table
39 </h1>
40 </th>
41 </tr>
42
43 <!-- row one of the table -->
44 <tr>
45 <td><h2>X</h2></td>
46 <td><h2>O</h2></td>
47 <td><h2>O</h2></td>
48 </tr>
49
50 <!-- row two of the table -->
51 <tr>
52 <td><h2>X</h2></td>
53 <td><h2>X</h2></td>
54 <td><h2>O</h2></td>
55 </tr>
56
57 <!-- row three of the table -->
58 <tr>
59 <td><h2>O</h2></td>
60 <td><h2>O</h2></td>
61 <td><h2>X</h2></td>
62 </tr>
63 </table>
64 <p>
65 <strong>
66 <a name = "table">This is an example of a
67 table</a>
68 </strong>
69 </p>
27.7 Create XHTML markup that produces the table shown in Fig. 27.12. Use <em> and
<strong> tags as necessary. The image (bug.jpg) is included in the Chapter 27 examples direc-
tory on the CD-ROM that accompanies this book.
ANS:
27.8 Write an XHTML document that produces the table shown in Fig. 27.13.
ANS:
26 <th>
27 <strong>This is the table head</strong>
28 </th>
29 </tr>
30 </thead>
31
32 <!-- all of the main content goes in the <tbody> -->
33 <!-- use this tag to format the entire section -->
34 <!-- <td> inserts a data cell, with regular text -->
35 <tbody>
36 <tr><td align = "left">This is the body</td></tr>
37 </tbody>
38
39 </table>
40 </body>
41 </html>
27.9 A local university has asked you to create an XHTML document that allows potential stu-
dents to provide feedback about their campus visits. Your XHTML document should contain a form
with text boxes for names, addresses and e-mails. Provide check boxes that allow prospective stu-
dents to indicate what they liked most about the campus. These check boxes should include: students,
location, campus, atmosphere, dorm rooms and sports. Also, provide radio buttons that ask the pro-
spective student how they became interested in the university. Options should include: friends, tele-
vision, Internet and other. In addition, provide a text area for additional comments, a submit button
and a reset button.
ANS:
10 </head>
11 <body>
12 <h1>College Visit Feedback Form</h1>
13 <p>
14 Please fill out this form to let us know how your
15 visit was so that we can improve our facilities to
16 better suit you and your peers' needs.
17 </p>
18
19 <form method = "post" action = "">
20
21 <p>
22 <input type = "hidden" name = "recipient"
23 value = "[email protected]" />
24 <input type = "hidden" name = "subject"
25 value = "Visit Feedback" />
26 <input type = "hidden" name = "redirect"
27 value = "index.html" />
28 </p>
29
30 <!-- insert a textbox to gather information about -->
31 <!-- the user -->
32
33 <p><label>Full Name:
34 <input name = "fullname" type = "text"
35 size = "40" />
36 </label></p>
37
38 <p><label>Address1:
39 <input name = "address1" type = "text"
40 size = "40" />
41 </label></p>
42
43 <p><label>Address2:
44 <input name = "address2" type = "text"
45 size = "40" />
46 </label></p>
47
48 <p><label>Zip Code:
49 <input name = "zip" type = "text" size = "10" />
50 </label></p>
51
52 <p><label>E-mail:
53 <input name = "email" type = "text" size = "25" />
54 </label></p>
55
56 <strong><em>Check off all of the characteristics that
57 you enjoyed about the college:</em></strong><br />
58
59 <!-- insert checkboxes for the user to check -->
60 <!-- off what he or she likes -->
61 <p>
62 <label>Campus
63 <input name = "likes" type = "checkbox"
118 </label>
119 </p>
120
121 <strong><em>Please give us any additional feedback
122 that you may have</em></strong><br />
123
124 <!-- user can enter in multiple lines of -->
125 <!-- information in a textarea -->
126
127 <p>
128 <label>Comments:
129 <textarea name = "comments" rows = "4"
130 cols = "40"></textarea>
131 </label>
132 </p>
133 <p>
134 <input type = "submit" value = "Submit" />
135 <input type = "reset" value = "Clear" />
136 </p>
137 </form>
138 </body>
139 </html>
27.10 Create an XHTML document titled “How to Get Good Grades.” Use <meta> tags to include
a series of keywords that describe your document.
ANS:
27.11 Create an XHTML document that displays a tic-tac-toe table with player X winning. Use
<h2> to mark up both Xs and Os. Center the letters in each cell horizontally. Title the game using an
<h1> tag. This title should span all three columns. Set the table border to one.
ANS:
28
Cascading Style Sheets™
(CSS)
Objectives
• To take control of the appearance of a Web site by
creating style sheets.
• To use a style sheet to give all the pages of a Web site
the same look and feel.
• To use the class attribute to apply styles.
• To specify the precise font, size, color and other
properties of displayed text.
• To specify element backgrounds and colors.
• To understand the box model and how to control the
margins, borders and padding.
• To use style sheets to separate presentation from
content.
Fashions fade, style is eternal.
Yves Saint Laurent
A style does not go out of style as long as it adapts itself to
its period. When there is an incompatibility between the style
and a certain state of mind, it is never the style that triumphs.
Coco Chanel
How liberating to work in the margins, outside a central
perception.
Don DeLillo
I’ve gradually risen from lower-class background to lower-
class foreground.
Marvin Cohen
Outline
28.1 Introduction
28.2 Inline Styles
28.3 Embedded Style Sheets
28.4 Conflicting Styles
28.5 Linking External Style Sheets
28.6 W3C CSS Validation Service
28.7 Positioning Elements
28.8 Backgrounds
28.9 Element Dimensions
28.10 Text Flow and the Box Model
28.11 User Style Sheets
28.12 Internet and World Wide Web Resources
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises
28.1 Introduction
In Chapters 26 and 27, we introduced the Extensible HyperText Markup Language (XHTML) for
marking up information. In this chapter, we shift our focus from marking up information to
formatting and presenting information using a W3C technology called Cascading Style
Sheets (CSS) that allows document authors to specify the presentation of elements on a Web
page (spacing, margins, etc.) separately from the structure of the document (section head-
ers, body text, links, etc.). This separation of structure from presentation simplifies main-
taining and modifying a document’s layout.
13 <body>
14
15 <p>This text does not have any style applied to it.</p>
16
17 <!-- The style attribute allows you to declare -->
18 <!-- inline styles. Separate multiple styles -->
19 <!-- with a semicolon. -->
20 <p style = "font-size: 20pt">This text has the
21 <em>font-size</em> style applied to it, making it 20pt.
22 </p>
23
24 <p style = "font-size: 20pt; color: #0000ff">
25 This text has the <em>font-size</em> and
26 <em>color</em> styles applied to it, making it
27 20pt. and blue.</p>
28
29 </body>
30 </html>
The first inline style declaration appears in line 20. Attribute style specifies the style
for an element. Each CSS property (the font-size property in this case) is followed by
a colon and a value. On line 20, we declare the p element to have 20-point text size. Line
21 uses element em to “emphasize” text, which most browsers do by making the font italic.
Line 24 specifies the two properties, font-size and color, separated by a semi-
colon. In this line, we set the text’s color to blue, using the hexadecimal code #0000ff.
Color names may be used in place of hexadecimal codes, as we demonstrate in the next
example. [Note: Inline styles override any other styles applied using the techniques we dis-
cuss later in this chapter.]
ment in an XHTML document’s head section. Figure 28.2 creates an embedded style
sheet containing four styles.
50
51 </body>
52 </html>
The style element (lines 13–24) defines the embedded style sheet. Styles placed in
the head apply to matching elements in the entire document, not just to a single element.
The type attribute specifies the Multipurpose Internet Mail Extension (MIME) type that
describes a file’s content. CSS documents use the MIME type text/css. Other MIME
types include image/gif (for GIF images) and text/javascript (for the JavaScript
scripting language).
The body of the style sheet (lines 15–22) declares the CSS rules for the style sheet. We
declare rules for em (lines 15–16), h1 (line 18) and p (line 20) elements. When the browser
renders this document, it applies the properties defined in these rules to each element to
which the rule applies. For example, the rule on lines 15–16 will be applied to all em ele-
ments. The body of each rule is enclosed in curly braces ({ and }). We declare a style class
named special in line 22. Class declarations are preceded by a period and apply only
to elements of that class. We discuss how to apply a style class momentarily.
CSS rules in embedded style sheets use the same syntax as inline styles; the property
name is followed by a colon (:) and the value of that property. Multiple properties are
separated by semicolons (;). In this example, the color property specifies the color of
text in an element, and property background-color specifies the background color of
the element.
The font-family property (line 18) specifies the name of the font to use. In this
case, we use the arial font. The second value, sans-serif, is a generic font family.
Not all users have the same fonts installed on their computers, so Web-page authors often
specify a comma-separated list of fonts to use for a particular style. The browser attempts
to use the fonts in the order they appear in the list. Many Web-page authors end a font list
with a generic font family name in case the other fonts are not installed on the user’s com-
puter. In this example, if the arial font is not found on the system, the browser instead
displays a generic sans-serif font such as helvetica or verdana. Other generic
font families include serif (e.g., times new roman, Georgia), cursive (e.g.,
script), fantasy (e.g., critter) and monospace (e.g., courier, fixedsys).
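The fallback order described above can be sketched in Python, the language used in the rest of this book. This is only an illustration of the selection order, not a description of how any browser is implemented; the function name pick_font, the installed set and the defaults mapping are all assumptions made for the example.

```python
# Hypothetical sketch of font-family fallback; not a real browser algorithm.
GENERIC_FAMILIES = {"serif", "sans-serif", "cursive", "fantasy", "monospace"}

def pick_font(font_list, installed, generic_defaults):
    """Return the first usable font from a comma-separated font-family value."""
    for name in (part.strip().lower() for part in font_list.split(",")):
        if name in GENERIC_FAMILIES:
            # A generic family always resolves to some system default.
            return generic_defaults[name]
        if name in installed:
            return name
    return None  # nothing matched; a browser would fall back to its own default

installed = {"verdana", "courier"}  # assumed fonts on this machine
defaults = {"sans-serif": "verdana", "serif": "times new roman",
            "cursive": "script", "fantasy": "critter", "monospace": "courier"}

print(pick_font("arial, sans-serif", installed, defaults))  # verdana
```

Because arial is not in the assumed installed set, the lookup falls through to the generic sans-serif entry.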
The font-size property (line 20) specifies a 14-point font. Other possible measurements,
in addition to pt (point), are introduced later in the chapter. Relative values (xx-small,
x-small, small, smaller, medium, large, larger, x-large and xx-large) also can be used.
Generally, relative values for font-size are preferred over point sizes because an author
does not know the specific measurements of the display for each client. For example, a user
may wish to view a Web page on a handheld device with a small screen. Specifying an
18-point font size in a style sheet may prevent such a user from seeing more than one or
two characters at a time. However, if a relative font size is specified, such as large or
larger, the actual size is determined by the browser that displays the font.
Line 30 uses attribute class in an h1 element to apply a style class, in this case class
special (declared as .special in the style sheet). When the browser renders the h1
element, the text appears on screen with both the properties of an h1 element (arial or
sans-serif font defined at line 18) and the properties of the .special style class
(the color blue defined on line 22).
The p element and the .special class style are applied to the text in lines 42–49. All
styles applied to an element (the parent, or ancestor, element) also apply to that element’s
nested elements (descendant elements). The em element inherits the style from the p element
(namely, the 14-point font size in line 20), but retains its italic style. However, the em
element’s own color property overrides the color property of the special class. We discuss
the rules for resolving these conflicts in the next section.
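One way to picture this lookup is as a chain of dictionaries in which an element's own declarations are consulted before its parent's. The following Python sketch is a simplification under stated assumptions: it treats every property as inherited (real CSS does not), and the em color value "white" is an assumed placeholder because the listing is not reproduced here.

```python
from collections import ChainMap

# Declarations that reach the parent p element: 14pt from the p rule and
# blue from the .special class, as described in the text.
p_style = {"font-size": "14pt", "color": "blue"}

# Declarations made directly for em. The value "white" is an assumption
# chosen for illustration only.
em_own = {"color": "white", "font-style": "italic"}

# Lookups try the element's own declarations first, then the parent's.
em_computed = ChainMap(em_own, p_style)

print(em_computed["font-size"])  # inherited from p
print(em_computed["color"])      # em's own value wins over the inherited one
```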
child’s styles take precedence. Figure 28.3 illustrates examples of inheritance and speci-
ficity.
Lines 20–21 declare a style for all em elements that are descendants of li elements.
In the screen output of Fig. 28.3, notice that Monday (which line 33 contains in an em ele-
ment) does not appear in bold red, because the em element is not in an li element. How-
ever, the em element containing with mushrooms (line 46) is in an li element;
therefore, it is formatted in bold red.
The syntax for applying rules to multiple elements is similar. For example, to apply the
rule in lines 20–21 to all li and em elements, you would separate the elements with
commas, as follows:
li, em { color: red;
font-weight: bold }
Lines 25–26 specify that all nested lists (ul elements that are descendants of ul ele-
ments) be underlined and have a left-hand margin of 15 pixels. A pixel is a relative-length
measurement—it varies in size, based on screen resolution. Other relative lengths are em
(the so-called “M-height” of the font, which is usually set to the height of an uppercase M),
ex (the so-called “x-height” of the font, which is usually set to the height of a lowercase x)
and percentages (e.g., margin-left: 10%). To set an element to display text at 150%
of its default text size, the author could use the syntax
font-size: 1.5em
Other units of measurement available in CSS are absolute-length measurements—i.e., units
that do not vary in size based on the system. These units are in (inches), cm (centimeters),
mm (millimeters), pt (points; 1 pt=1/72 in) and pc (picas—1 pc = 12 pt).
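The relationships among the absolute units can be checked with a few lines of Python. This is a minimal sketch using only the ratios stated above plus the standard 2.54 cm per inch; the helper names convert and em_to_points are assumptions made for the example.

```python
# Points per absolute CSS unit (1 in = 72 pt, 1 pc = 12 pt, 1 in = 2.54 cm).
POINTS_PER_UNIT = {
    "pt": 1.0,
    "pc": 12.0,
    "in": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
}

def convert(value, from_unit, to_unit):
    """Convert a length between absolute CSS units."""
    points = value * POINTS_PER_UNIT[from_unit]
    return points / POINTS_PER_UNIT[to_unit]

def em_to_points(ems, font_size_pt):
    """Resolve a relative em length against a known font size in points."""
    return ems * font_size_pt

print(convert(24, "pt", "pc"))  # 2.0
print(convert(1, "in", "pc"))   # 6.0
print(em_to_points(1.5, 12))    # 18.0
```

Note that an em length cannot be converted without knowing the font size in effect, which is why it takes a second argument.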
Good Programming Practice 28.1
Whenever possible, use relative-length measurements. If you use absolute-length measurements,
your document may not be readable on some client browsers (e.g., wireless phones).
In Fig. 28.3, the entire list is indented because of the 75-pixel left-hand margin for top-
level ul elements. However, the nested list is indented only 15 pixels more (not another 75
pixels) because the child ul element’s margin-left property overrides the parent ul
element’s margin-left property.
Figure 28.4 presents an external style sheet and Fig. 28.5 contains an XHTML document
that references the style sheet.
Lines 11–12 (Fig. 28.5) show a link element, which uses the rel attribute to specify
a relationship between the current document and another document. In this case, we declare
the linked document to be a stylesheet for this document. The type attribute specifies
the MIME type as text/css. The href attribute provides the URL for the document
containing the style sheet.
Software Engineering Observation 28.1
Style sheets are reusable. Creating them once and reusing them reduces programming effort.
jigsaw.w3.org/css-validator/validator-upload.html
To validate the document, click the Browse button to locate the file on your computer. Af-
ter locating the file, click Submit this CSS file for validation to upload the file for val-
idation. [Note: Like many W3C technologies, CSS is being developed in stages (or
versions). The current version under development is Version 3.]
capability called absolute positioning, which provides authors greater control over how
document elements are displayed. Figure 28.8 demonstrates absolute positioning.
Fig. 28.7 CSS validation results. (Courtesy of World Wide Web Consortium (W3C).)
25 </html>
Lines 15–17 position the first img element (i.gif) on the page. Specifying an ele-
ment’s position as absolute removes the element from the normal flow of elements
on the page, instead positioning the element according to the distance from the top, left,
right or bottom margins of its containing block (i.e., an element such as body or p).
Here, we position the element to be 0 pixels away from both the top and left margins
of the body element.
The z-index property allows you to layer overlapping elements properly. Elements
that have higher z-index values are displayed in front of elements with lower z-index
values. In this example, i.gif has the lowest z-index (1), so it displays in the back-
ground. The img element at lines 20–22 (circle.gif) has a z-index of 2, so it dis-
plays in front of i.gif. The p element at lines 18–19 (Positioned Text) has a z-
index of 3, so it displays in front of the other two. If you do not specify a z-index or if
elements have the same z-index value, the elements are placed from background to fore-
ground in the order they are encountered in the document.
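The layering rule can be modelled as a stable sort on z-index. The following Python sketch lists the three elements in the document order described above; the function name paint_order is an assumption made for the example, and the sketch ignores everything about real rendering except the ordering rule.

```python
# Each tuple is (name, z_index) in document order, following the example
# described above: i.gif, then the paragraph, then circle.gif.
elements = [("i.gif", 1), ("Positioned Text", 3), ("circle.gif", 2)]

def paint_order(items):
    """Return names from background to foreground.

    sorted() is stable, so elements with equal z-index keep document order.
    """
    return [name for name, z in sorted(items, key=lambda item: item[1])]

print(paint_order(elements))
# ['i.gif', 'circle.gif', 'Positioned Text']
```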
Absolute positioning is not the only way to specify page layout. Figure 28.9 demon-
strates relative positioning in which elements are positioned relative to other elements.
Setting the position property to relative, as in class super (lines 21–22), lays
out the element on the page and offsets the element by the specified top, bottom, left or
right values. Unlike absolute positioning, relative positioning keeps elements in the gen-
eral flow of elements on the page, so positioning is relative to other elements in the flow.
7
8 <html xmlns = "http://www.w3.org/1999/xhtml">
9 <head>
10 <title>Relative Positioning</title>
11
12 <style type = "text/css">
13
14 p { font-size: 1.3em;
15 font-family: verdana, arial, sans-serif }
16
17 span { color: red;
18 font-size: .6em;
19 height: 1em }
20
21 .super { position: relative;
22 top: -1ex }
23
24 .sub { position: relative;
25 bottom: -1ex }
26
27 .shiftleft { position: relative;
28 left: -1ex }
29
30 .shiftright { position: relative;
31 right: -1ex }
32
33 </style>
34 </head>
35
36 <body>
37
38 <p>The text at the end of this sentence
39 <span class = "super">is in superscript</span>.</p>
40
41 <p>The text at the end of this sentence
42 <span class = "sub">is in subscript</span>.</p>
43
44 <p>The text at the end of this sentence
45 <span class = "shiftleft">is shifted left</span>.</p>
46
47 <p>The text at the end of this sentence
48 <span class = "shiftright">is shifted right</span>.</p>
49
50 </body>
51 </html>
We introduce the span element in line 39. Element span is a grouping element—it
does not apply any inherent formatting to its contents. Its primary purpose is to apply CSS
rules or id attributes to a block of text. Element span is an inline-level element—it is dis-
played inline with other text and with no line breaks. Lines 17–19 define the CSS rule for
span. A similar element is the div element, which also applies no inherent styles but is
displayed on its own line, with margins above and below (a block-level element).
Common Programming Error 28.1
Because relative positioning keeps elements in the flow of text in your documents, be careful
to avoid unintentionally overlapping text.
28.8 Backgrounds
CSS also provides control over element backgrounds. In previous examples, we introduced
the background-color property. CSS also can add background images to documents.
Figure 28.10 adds a corporate logo to the bottom-right corner of the document.
This logo stays fixed in the corner, even when the user scrolls up or down the screen.
The background-image property (line 14) specifies the image URL for the image
logo.gif in the format url(fileLocation). The Web-page author can set the back-
ground-color in case the image is not found.
The background-position property (line 15) places the image on the page. The
keywords top, bottom, center, left and right are used individually or in combination
for vertical and horizontal positioning. An image also can be positioned using lengths by
specifying the horizontal length followed by the vertical length. For example, to position
the image as horizontally centered (positioned at 50% of the distance across the screen) and
30 pixels from the top, use
background-position: 50% 30px;
The background-repeat property (line 16) controls the tiling of the background
image. Tiling places multiple copies of the image next to each other to fill the background.
Here, we set the tiling to no-repeat to display only one copy of the background image.
The background-repeat property can be set to repeat (the default) to tile the image
vertically and horizontally, repeat-x to tile the image only horizontally or repeat-y
to tile the image only vertically.
The final property setting, background-attachment: fixed (line 17), fixes the
image in the position specified by background-position. Scrolling the browser
window does not move the image from its position. The default value, scroll, moves the
image as the user scrolls through the document.
Line 21 indents the first line of text in the element by the specified amount, in this case
1em. An author might use this property to create a Web page that reads more like a novel,
in which the first line of every paragraph is indented.
Line 24 uses the font-weight property to specify the “boldness” of text. Possible
values are bold, normal (the default), bolder (bolder than bold text) and lighter
(lighter than normal text). Boldness also can be specified with multiples of 100, from 100
to 900 (e.g., 100, 200, …, 900). Text specified as normal is equivalent to 400, and
bold text is equivalent to 700. However, many systems do not have fonts that scale this
finely, so using the values from 100 to 900 might not display the desired effect.
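The keyword and numeric forms can be related with a small lookup. This Python sketch covers only the two fixed keywords and the numeric values mentioned above; bolder and lighter are deliberately left out because they depend on the weight inherited from the parent element. The function name numeric_weight is an assumption made for the example.

```python
KEYWORD_WEIGHTS = {"normal": 400, "bold": 700}

def numeric_weight(value):
    """Map a font-weight value to its numeric equivalent (100 to 900)."""
    if value in KEYWORD_WEIGHTS:
        return KEYWORD_WEIGHTS[value]
    number = int(value)  # raises ValueError for non-numeric keywords
    if number % 100 != 0 or not 100 <= number <= 900:
        raise ValueError("font-weight must be a multiple of 100 from 100 to 900")
    return number

print(numeric_weight("normal"))  # 400
print(numeric_weight("bold"))    # 700
print(numeric_weight("300"))     # 300
```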
Another CSS property that formats text is the font-style property, which allows
the developer to set text to normal, italic or oblique (oblique defaults to italic
if the system does not support oblique text).
Fig. 28.11 Setting box dimensions and aligning text (part 1 of 2).
11
12 <style type = "text/css">
13
14 div { background-color: #ffccff;
15 margin-bottom: .5em }
16 </style>
17
18 </head>
19
20 <body>
21
22 <div style = "width: 20%">Here is some
23 text that goes in a box which is
24 set to stretch across twenty percent
25 of the width of the screen.</div>
26
27 <div style = "width: 80%; text-align: center">
28 Here is some CENTERED text that goes in a box
29 which is set to stretch across eighty percent of
30 the width of the screen.</div>
31
32 <div style = "width: 20%; height: 30%; overflow: scroll">
33 This box is only twenty percent of
34 the width and thirty percent of the height.
35 What do we do if it overflows? Set the
36 overflow property to scroll!</div>
37
38 </body>
39 </html>
The inline style in line 22 illustrates how to set the width of an element on screen;
here, we indicate that the div element should occupy 20% of the screen width. Most ele-
ments are left-aligned by default; however, this alignment can be altered to position the ele-
ment elsewhere. The height of an element can be set similarly, using the height property.
The height and width values also can be assigned relative and absolute lengths. For
example
width: 10em
sets the element’s width to be equal to 10 times the font size. Line 27 sets text in the element
to be center aligned; some other values for the text-align property are left and
right.
One problem with setting both dimensions of an element is that the content inside the
element can exceed the set boundaries, in which case the element is simply made large
enough for all the content to fit. However, in line 32, we set the overflow property to
scroll, a setting that adds scrollbars if the text overflows the boundaries.
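To make the arithmetic behind percentage and em widths concrete, the following Python sketch resolves a width value to pixels. It is a simplified illustration: the function name resolve_width is hypothetical, only three units are handled, and the 800-pixel window and 16-pixel font are assumed values, not figures from the example.

```python
def resolve_width(value, container_px, font_size_px):
    """Resolve a CSS width such as '20%', '10em' or '300px' to pixels."""
    if value.endswith("%"):
        return container_px * float(value[:-1]) / 100
    if value.endswith("em"):
        return font_size_px * float(value[:-2])
    if value.endswith("px"):
        return float(value[:-2])
    raise ValueError("unsupported unit in " + value)

# Assumed environment: an 800-pixel-wide window and a 16-pixel font.
print(resolve_width("20%", 800, 16))   # 160.0
print(resolve_width("80%", 800, 16))   # 640.0
print(resolve_width("10em", 800, 16))  # 160.0
```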
Fig. 28.12 Floating elements, aligning text and setting box dimensions (part 1 of 2).
[Figure: box model diagram with the labels Content, Padding, Border and Margin.]
Another property of every block-level element on screen is the border, which lies
between the padding space and the margin space and has numerous properties for adjusting
its appearance as shown in Fig. 28.14.
1 <?xml version = "1.0"?>
Fig. 28.14 Applying borders to elements (part 1 of 3).
User style sheets are external style sheets. Figure 28.17 shows a user style sheet that
sets the body’s font-size to 20pt, color to yellow and background-color
to #000080.
User style sheets are not linked to a document; rather, they are set in the browser’s
options. To add a user style sheet in Internet Explorer 5.5, select Internet Options...,
located in the Tools menu. In the Internet Options dialog (Fig. 28.18), select
Accessibility..., check the Format documents using my style sheet check box and type the
location of the user style sheet. Internet Explorer 5.5 applies the user style sheet to any
document it loads.
The Web page from Fig. 28.16 is displayed in Fig. 28.19, with the application of the
user style sheet from Fig. 28.17.
In this example, even if users define their own font-size in user style sheets, the author
styles have higher precedence and override the user styles. The 9pt font specified in the
author style sheet overrides the 20pt font specified in the user style sheet. This small font
may make pages difficult to read, especially for individuals with visual impairments. A
developer can avoid this problem by using relative measurements (such as em or ex)
instead of absolute measurements such as pt. Figure 28.20 changes the font-size prop-
erty to use a relative measurement (line 14), which does not override the user style set in
Fig. 28.17. Instead, the font size displayed is relative to that specified in the user style sheet.
In this case, text enclosed in the <p> tag displays as 20pt and <p> tags that have class
note applied to them are displayed in 15pt (.75 times 20pt).
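The difference between the two cases comes down to simple arithmetic, sketched below in Python. This is a deliberately reduced model of the behaviour described above, not the full cascade; the function name resolve_font_size is an assumption, and only pt and em values are handled.

```python
def resolve_font_size(user_pt, author_value):
    """Return the displayed size in points.

    author_value is either an absolute size such as '9pt' or a relative
    size such as '.75em'; relative sizes scale the user's size.
    """
    if author_value is None:
        return float(user_pt)            # no author rule: user size applies
    if author_value.endswith("pt"):
        return float(author_value[:-2])  # absolute author size overrides
    if author_value.endswith("em"):
        return user_pt * float(author_value[:-2])
    raise ValueError("unsupported unit in " + author_value)

print(resolve_font_size(20, "9pt"))    # 9.0
print(resolve_font_size(20, ".75em"))  # 15.0
print(resolve_font_size(20, None))     # 20.0
```

With the 20pt user setting, an author size of 9pt discards the user's preference, whereas .75em preserves it as the base for scaling.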
Figure 28.21 displays the Web page from Fig. 28.20 with the application of the user style
sheet from Fig. 28.17. Notice that the second line of text displayed is larger than the same
line of text in Fig. 28.19.
SUMMARY
• The inline style allows a developer to declare a style for an individual element by using the style
attribute in that element’s opening XHTML tag.
• Each CSS property is followed by a colon and the value of the property.
• The color property sets text color. Color names and hexadecimal codes may be used as the value.
• Styles that are placed in the <style> tag apply to the entire document.
• style element attribute type specifies the MIME type (the specific encoding format) of the
style sheet. Style sheets use text/css.
• Each rule body begins and ends with a curly brace ({ and }).
• Style class declarations are preceded by a period and are applied to elements of that specific class.
• The CSS rules in a style sheet use the same format as inline styles: The property is followed by a
colon (:) and the value of that property. Multiple properties are separated by semicolons (;).
• The background-color property specifies the background color of the element.
• The font-family property names a specific font that should be displayed. Generic font
families allow authors to specify a type of font instead of a specific font, in case a browser
does not support a specific font. The font-size property specifies the size used to render the font.
• The class attribute applies a style class to an element.
• Pseudoclasses provide the author access to content not specifically declared in the document. The
hover pseudoclass is activated when the user moves the mouse cursor over an element.
• The text-decoration property applies decorations to text within an element, such as
underline, overline, line-through and blink.
• To apply rules to multiple elements, separate the elements with commas in the style sheet.
• A pixel is a relative-length measurement: It varies in size based on screen resolution. Other relative
lengths are em, ex and percentages.
• The other units of measurement available in CSS are absolute-length measurements—i.e., units
that do not vary in size. These units can be in (inches), cm (centimeters), mm (millimeters), pt
(points; 1 pt=1/72 in) and pc (picas; 1 pc = 12 pt).
• External linking can create a uniform look for a Web site; separate pages can all use the same
styles. Modifying a single file makes changes to styles across an entire Web site.
• link’s rel attribute specifies a relationship between two documents.
• The CSS position property allows absolute positioning, which provides greater control over
where elements reside. Specifying an element’s position as absolute removes it from the
normal flow of elements on the page and positions it according to distance from the top, left,
right or bottom margins of its containing block.
• The z-index property allows a developer to layer overlapping elements. Elements that have
higher z-index values are displayed in front of elements with lower z-index values.
• Unlike absolute positioning, relative positioning keeps elements in a general flow on the page and
offsets them by the specified top, left, right or bottom values.
• Property background-image specifies the URL of the image, in the format url(fileLoca-
tion). The property background-position places the image on the page using the values
top, bottom, center, left and right individually or in combination for vertical and hori-
zontal positioning. You can also position by using lengths.
• The background-repeat property controls the tiling of the background image. Setting the tiling
to no-repeat displays one copy of the background image on screen. The background-re-
peat property can be set to repeat (the default) to tile the image vertically and horizontally, to
repeat-x to tile the image only horizontally or to repeat-y to tile the image only vertically.
• The property setting background-attachment: fixed fixes the image in the position spec-
ified by background-position. Scrolling the browser window does not move the image
from its set position. The default value, scroll, moves the image as the user scrolls the window.
• The text-indent property indents the first line of text in the element by the specified amount.
• The font-weight property specifies the “boldness” of text. Values besides bold and normal
(the default) are bolder (bolder than bold text) and lighter (lighter than normal text). The
value also may be specified using multiples of 100, from 100 to 900 (i.e., 100, 200, …, 900). Text
specified as normal is equivalent to 400, and bold text is equivalent to 700.
• The font-style property allows the developer to set text to normal, italic or oblique
(oblique will default to italic if the system does not have a separate font file for oblique text,
which is normally the case).
• span is a generic grouping element; it does not apply any inherent formatting to its contents. Its
main use is to apply styles or id attributes to a block of text. Element span is displayed inline
(an inline element) with other text and with no line breaks. A similar element is the div element,
which also applies no inherent styles, but is displayed on a separate line, with margins above and
below (a block-level element).
• The dimensions of elements on a page can be set with CSS by using the height and width prop-
erties.
• Text within an element can be centered using text-align; other values for the text-
align property are left and right.
• One problem with setting both dimensions of an element is that the content inside the element
might sometimes exceed the set boundaries, in which case the element must be made large enough
for all the content to fit. However, a developer can set the overflow property to scroll; this
setting adds scroll bars if the text overflows the boundaries set for it.
• Browsers normally place text and elements on screen in the order in which they appear in the
XHTML file. Elements can be removed from the normal flow of text. Floating allows you to move
an element to one side of the screen; other content in the document will then flow around the float-
ed element.
• Each block-level element is rendered inside a box whose layout is described by the box model.
The properties of this box are easily adjusted.
• The margin property determines the distance between the element’s edge and any outside text.
• CSS uses a box model to render elements on screen. The content of each element is surrounded by
padding, a border and margins.
• Margins for individual sides of an element can be specified by using margin-top, margin-
right, margin-left and margin-bottom.
• The padding is the distance between the content inside an element and the edge of the element.
Padding can be set for each side of the box by using padding-top, padding-right, pad-
ding-left and padding-bottom.
• A developer can interrupt the flow of text around a floated element by setting the clear
property to the same direction in which the element is floated (right or left). Setting the
clear property to both interrupts the flow on both sides of the document.
• A property of every block-level element on screen is its border. The border lies between the pad-
ding space and the margin space and has numerous properties with which to adjust its appearance.
• The border-width property may be set to any of the CSS lengths or to the predefined values
of thin, medium or thick.
• The border-style values available are none, hidden, dotted, dashed, solid, double,
groove, ridge, inset and outset.
• The border-color property sets the color used for the border.
• The class attribute allows more than one class to be assigned to an XHTML element.
TERMINOLOGY
absolute positioning
absolute-length measurement
arial font
background
background-attachment
background-color
background-image
background-position
SELF-REVIEW EXERCISES
28.1 Assume that the size of the base font on a system is 12 points.
a) How big is 36-point font in ems?
b) How big is 8-point font in ems?
c) How big is 24-point font in picas?
d) How big is 12-point font in inches?
e) How big is 1-inch font in picas?
28.2 Fill in the blanks in the following statements:
a) Using the ________ element allows authors to use external style sheets in their pages.
b) To apply a CSS rule to more than one element at a time, separate the element names with
a ________.
c) Pixels are a(n) ________-length measurement unit.
d) The hover ________ is activated when the user moves the mouse cursor
over the specified element.
e) Setting the overflow property to ________ provides a mechanism for containing
inner content without compromising specified box dimensions.
f) While ________ is a generic inline element that applies no inherent formatting,
________ is a generic block-level element that applies no inherent formatting.
g) Setting the background-repeat property to ________ tiles the specified
background-image only vertically.
h) If you float an element, you can stop the flowing text by using property ________.
i) The ________ property allows you to indent the first line of text in an element.
j) Three components of the box model are the ________, ________ and ________.
EXERCISES
28.3 Write a CSS rule that makes all text 1.5 times larger than the base font of the system and col-
ors the text red.
28.4 Write a CSS rule that removes the underline from all links inside list items (li) and shifts
them left by 3 ems.
28.5 Write a CSS rule that places a background image halfway down the page, tiling it horizon-
tally. The image should remain in place when the user scrolls up or down.
28.6 Write a CSS rule that gives all h1 and h2 elements a padding of 0.5 ems, a grooved border
style and a margin of 0.5 ems.
28.7 Write a CSS rule that changes the color of all elements containing attribute class =
"greenMove" to green and shifts them down 25 pixels and right 15 pixels.
28.8 Write an XHTML document that shows the results of a color survey. The document should
contain a form with radio buttons that allows users to vote for their favorite color. One of the colors
should be selected as a default. The document should also contain a table showing various colors and
the corresponding percentage of votes for each color. (Each row should be displayed in the color to
which it is referring.) Use attributes to format width, border and cell spacing for the table.
28.9 Add an embedded style sheet to the XHTML document of Fig. 26.6. This style sheet should
contain a rule that displays h1 elements in blue. In addition, create a rule that displays all links in blue
without underlining them. When the mouse hovers over a link, change the link’s background color to
yellow.
28.10 Modify the style sheet of Fig. 28.4 by changing a:hover to a:hver and margin-left
to margin left. Validate the style sheet using the CSS Validator. What happens?
SELF-REVIEW EXERCISES
28.1 Assume that the size of the base font on a system is 12 points.
a) How big is 36-point font in ems?
ANS: 3 ems.
b) How big is 8-point font in ems?
ANS: Approximately 0.67 ems (8/12).
c) How big is 24-point font in picas?
ANS: 2 picas.
d) How big is 12-point font in inches?
ANS: 1/6 inch.
e) How big is 1-inch font in picas?
ANS: 6 picas.
EXERCISES
28.3 Write a CSS rule that makes all text 1.5 times larger than the base font of the system and col-
ors it red.
ANS:
<head>
   <title>Solution 28.3</title>
   <style type = "text/css">
      body { font-size: 1.5em;
             color: #FF0000 }
   </style>
</head>

<body>
   <p>Testing red text that is 1.5 times the default font
      size</p>
</body>
</html>
28.4 Write a CSS rule that removes the underline from all links inside list items (li) and shifts
them left by 3 ems.
ANS:
         http://www.deitel.com</a></li>
      <li><a href = "http://www.prenhall.com">
         http://www.prenhall.com</a></li>
      <li><a href = "http://www.phptrinteractive.com">
         http://www.phptrinteractive.com</a></li>
   </ol>
</body>
</html>
28.5 Write a CSS rule that places a background image halfway down the page, tiling it horizon-
tally. The image should remain in place when the user scrolls up or down.
ANS:
   <h1>||||||||||||||||||||||||||||||||||</h1>
</body>
</html>
28.6 Write a CSS rule that gives all h1 and h2 elements a padding of 0.5 ems, a grooved border
style and a margin of 0.5 ems.
ANS:
28.7 Write a CSS rule that changes the color of all elements with attribute class = "green-
Move" to green and shifts them down 25 pixels and right 15 pixels.
ANS:
                   position: relative;
                   top: 25px;
                   left: 15px }
   </style>
</head>

<body>
   <h1>Normal text
      <span class = "greenMove">Text with class
         greenMove
      </span>
   </h1>
</body>
</html>
28.8 Write an XHTML document showing the results of a survey of people’s favorite color. The
document should contain a form with radio buttons that allows users to vote for their favorite color.
One of the colors should be selected as a default. The document should also contain a table showing
various colors and the corresponding percentage of votes for each color. (Each row should be dis-
played in the color to which it is referring.) Use attributes to format width, border and cell spacing for
the table. Validate the document against an appropriate XHTML DTD.
ANS:
               "Submit" />
            <input type = "reset" value = "Clear" />
            </p>
         </form>
      </td>
      <td><strong>Color Results</strong></td>
   </tr>
   <tr>
      <td class = "blue" ></td>
      <td><p>30%</p></td>
   </tr>
   <tr>
      <td class = "red">
      </td>
      <td><p>13%</p></td>
   </tr>
   <tr>
      <td class = "yellow"></td>
      <td><p>9%</p></td>
   </tr>
   <tr>
      <td class = "green"></td>
      <td><p>12%</p></td>
   </tr>
   <tr>
      <td class = "orange"></td>
      <td><p>12%</p></td>
   </tr>
   <tr>
      <td class = "purple"></td>
      <td><p>7%</p></td>
   </tr>
   <tr>
      <td class = "pink"></td>
      <td><p>17%</p></td>
   </tr>
   <tr>
      <td></td>
   </tr>
</table>
</body>
</html>
28.9 Add an embedded style sheet to the XHTML document of Fig. 26.4. This style sheet should
contain a rule that displays h1 elements in blue. In addition, create a rule that displays all links in blue
without underlining them. When the mouse hovers over a link, change the link’s background color to
yellow.
ANS:
28.10 Modify the style sheet of Fig. 28.4 by changing a:hover to a:hver and margin-left
to margin left. Validate the style sheet using the CSS Validator. What happens?
ANS:
          background-color: #ffffff }

ul { margin left: 2cm }

ul ul { text-decoration: underline;
        margin left: .5cm }
[***Notes To Reviewers***]
• Please mark your comments in place on a paper copy of the chapter.
• Please return only marked pages to Deitel & Associates, Inc.
• Please do not send us e-mails with detailed, line-by-line comments; mark these directly on the pa-
per pages.
• Please feel free to send any lengthy additional comments by e-mail to cheryl.yaeger@dei-
tel.net.
• Please run all the code examples.
• Please check that we are using the correct programming idioms.
• Please check that there are no inconsistencies, errors or omissions in the chapter discussions.
• The manuscript is being copyedited by a professional copy editor in parallel with your reviews.
That person will probably find most typos, spelling errors, grammatical errors, etc.
• Please do not rewrite the manuscript. We are concerned mostly with technical correctness and cor-
rect use of idiom. We will not make significant adjustments to our writing style on a global scale.
Please send us a short e-mail if you would like to make such a suggestion.
• Please be constructive. This book will be published soon. We all want to publish the best possible
book.
• If you find something that is incorrect, please show us how to correct it.
• Please read all the back matter including the exercises and any solutions we provide.
• Please review the index we provide with each chapter to be sure we have covered the topics you
feel are important.
Index

S
sans-serif font 1262
screen resolution 1265
script font 1262
scroll up or down the screen 1273
scroll value 1275, 1277
scrolling the browser window 1275
semicolon (;) 1259, 1261
separation of structure from content 1258
serif font 1262
Setting box dimensions and aligning text 1275
small relative font size 1262
smallest relative font size 1262
solid value (border-style property) 1281
span as a generic grouping element 1273
span element 1273
specificity 1263
structure of a document 1258
style attribute 1258, 1259
style class 1261, 1262

T
text-decoration property 1264, 1266
text/javascript 1261
text-align 1277
thick border width 1281
thin border width 1281
tile the image only horizontally 1275
tile the image vertically and horizontally 1275
tiling no-repeat 1275
tiling of the background image 1275
Times New Roman font 1262
top 1271
top margin 1271, 1275

U
underline 1265
underline value 1264, 1266
url(fileLocation) 1274
user style 1284, 1286
User style sheet 1284
user style sheet 1282, 1285
Using relative measurements in author styles 1286

V
Validating a CSS document 1269
Various border-styles 1281
Verdana font 1262
vertical and horizontal positioning 1275

W
W3C CSS Recommendation 1268
W3C CSS Validation Service 1268
Web page with user styles enabled 1284
width attribute value (style) 1277

X
x-large relative font size 1262
x-small relative font size 1262
xx-large relative font size 1262
xx-small relative font size 1262

Z
z-index 1271
© Copyright 1992–2002 by Deitel & Associates, Inc. All Rights Reserved. 8/29/01
pythonhtp1_29.fm Page 1394 Friday, September 28, 2001 2:18 PM
29
PHP
Objectives
• To understand PHP data types, operators, arrays and
control structures.
• To understand string processing and regular
expressions in PHP.
• To construct programs that process form data.
• To read and write client data using cookies.
• To construct programs that interact with MySQL
databases.
Conversion for me was not a Damascus Road experience. I
slowly moved into an intellectual acceptance of what my
intuition had always known.
Madeleine L’Engle
Be careful when reading health books; you may die of a
misprint.
Mark Twain
Reckeners without their host must recken twice.
John Heywood
There was a door to which I found no key; There was the veil
through which I might not see.
Omar Khayyam
[***Notes to Reviewers***]
• Our first topic choice for this chapter was PHP. We then decided to change the topic to PSP (since
it is more Python specific); however, due to the fact that PSP is in poor condition, we decided to
revert back to PHP. We still prefer to do PSP, so any information you can provide would be help-
ful. Websites? Documentation? We were unable to get PSP 1.3 to run with Tomcat or JRun. Then,
we were unable to download PSP 1.3.
• Example 29.19 is not working (it is unable to open the Products database). We are currently work-
ing to resolve this issue.
• Please mark your comments in place on a paper copy of the chapter.
• Please return only marked pages to Deitel & Associates, Inc.
• Please do not send e-mails with detailed, line-by-line comments; mark these directly on the paper
pages.
• Please feel free to send any lengthy additional comments by e-mail to
[email protected] and [email protected].
• Please run all the code examples.
• Please check that we are using the correct programming idioms.
• Please check that there are no inconsistencies, errors or omissions in the chapter discussions.
• The manuscript is being copy edited by a professional copy editor in parallel with your reviews.
That person will probably find most typos, spelling errors, grammatical errors, etc.
• Please do not rewrite the manuscript. We are mostly concerned with technical correctness and cor-
rect use of idiom. We will not make significant adjustments to our writing or coding style on a
global scale. Please send us a short e-mail if you would like to make a suggestion.
• If you find something incorrect, please show us how to correct it.
• In the later round(s) of review, please read all the back matter, including the exercises and any so-
lutions we provide.
• Please review the index we provide with each chapter to be sure we have covered the topics you
feel are important.
Outline
29.1 Introduction
29.2 PHP
29.3 String Processing and Regular Expressions
29.4 Viewing Client/Server Environment Variables
29.5 Form Processing and Business Logic
29.6 Verifying a Username and Password
29.7 Connecting to a Database
29.8 Cookies
29.9 Operator Precedence
29.10 Internet and World Wide Web Resources
Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • Exercises •
Works Cited
29.1 Introduction
PHP, or PHP: Hypertext Preprocessor, is quickly becoming one of the most popular server-side
scripting languages for creating dynamic Web pages. PHP was created in 1994 by Rasmus
Lerdorf (who currently works for Linuxcare Inc. as a Senior Open-Source Researcher)
to track users at his Web site.1 In 1995, Lerdorf released it as a package called the “Personal
Home Page Tools.” PHP 2 featured built-in database support and form handling. In 1997,
PHP 3 was released, featuring a rewritten parser, which substantially increased perfor-
mance and led to an explosion in PHP use. It is estimated that over six million domains now
use PHP. The release of PHP 4, which features the new Zend Engine and is much faster and
more powerful than its predecessor, should further increase PHP’s popularity.2 More infor-
mation about the Zend engine can be found at www.zend.com.
PHP is an open-source technology that is supported by a large community of users and
developers. Open source software provides developers with access to the software’s source
code and free redistribution rights. PHP is platform independent; implementations exist for
all major UNIX, Linux and Windows operating systems. PHP also provides support for a
large number of databases, including MySQL.
After introducing the basics of the scripting language, we discuss viewing environment
variables. Knowing information about a client’s execution environment allows dynamic
content to be sent to the client. We then discuss form processing and business logic, which
are vital to e-commerce applications. We provide an example of implementing a private
Web site through username and password verification. Next, we build a three-tier, Web-
based application that queries a MySQL database. Finally, we show how Web sites use
cookies to store information on the client that will be retrieved during a client’s subsequent
visits to a Web site.
29.2 PHP
When the World Wide Web and Web browsers were introduced, the Internet began to
achieve widespread popularity. This greatly increased the volume of requests for informa-
tion from Web servers. The power of the Web resides not only in serving content to users,
but also in responding to requests from users and generating Web pages with dynamic con-
tent. It became evident that the degree of interactivity between the user and the server
would be crucial. While other languages can perform this function as well, PHP was written
specifically for interacting with the Web.
PHP code is embedded directly into XHTML documents. This allows the document
author to write XHTML in a clear, concise manner, without having to use multiple print
statements, as is necessary with other CGI-based languages. Figure 29.1 presents a simple
PHP program that displays a welcome message.
In PHP, code is inserted between the scripting delimiters <?php and ?>. PHP code
can be placed anywhere in XHTML markup, as long as the code is enclosed in these
scripting delimiters. Line 8 declares variable $name and assigns to it the string "Paul".
All variables are preceded by the $ special symbol and are created the first time they are
encountered by the PHP interpreter. PHP statements are terminated with a semicolon (;).
Common Programming Error 29.1
Failing to precede a variable name with a $ is a syntax error.
Line 8 contains a single-line comment, which begins with two forward slashes (//).
Text to the right of the slashes is ignored by the interpreter. Comments can also begin with
the pound sign (#). Multiline comments begin with delimiter /* and end with delimiter */.
Line 21 outputs the value of variable $name by calling function print. The actual
value of $name is printed, instead of "$name". When a variable is encountered inside a
double-quoted ("") string, PHP interpolates the variable. In other words, PHP inserts the
variable’s value where the variable name appears in the string. Thus, variable $name is
replaced by Paul for printing purposes. PHP variables are "multitype", meaning that they
can contain different types of data (e.g., integers, doubles or strings) at different times.
Figure 29.2 introduces these data types.
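The following short sketch, which is not one of the chapter's figures, illustrates interpolation and the "multitype" nature of PHP variables. The variable names other than $name are assumptions made for this illustration. It also contrasts double quotes with single quotes, inside which PHP does not interpolate.

```php
<?php
// $name mirrors the example in the text; the other names are hypothetical.
$name = "Paul";

// Double quotes: the variable's value is inserted into the string.
$interpolated = "Welcome to PHP, $name!";

// Single quotes: the text is taken literally, so no value is inserted.
$literal = 'Welcome to PHP, $name!';

print( $interpolated . "\n" );
print( $literal . "\n" );

// The same variable can hold values of different types at different times.
$value = 42;                       // an integer
$typeBefore = gettype( $value );
$value = "forty-two";              // now a string
$typeAfter = gettype( $value );

print( "$typeBefore, then $typeAfter\n" );
```

Function gettype is used here only to make the change of type visible; a script rarely needs to call it.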
PHP scripts usually end with .php, although a server can be configured to handle
other file extensions. To run a PHP script, PHP must first be installed on your system. Visit
www.deitel.com for PHP installation and configuration instructions. Although PHP
can be used from the command line, a Web server is necessary to take full advantage of the
scripting language. Figure 29.3 demonstrates the PHP data types introduced in Fig. 29.2.
42
43 $value = "98.6 degrees";
44
45 // use type casting to cast variables to a
46 // different type
47 print( "Now using type casting instead: <br />
48 As a string - " . (string) $value .
49 "<br />As a double - " . (double) $value .
50 "<br />As an integer - " . (integer) $value );
51 ?>
52 </body>
53 </html>
Conversion between different data types may be necessary when performing arith-
metic operations with variables. In PHP, data-type conversion can be performed by passing
the data type as an argument to function settype. Lines 17–19 assign a string to variable
$testString, a double to variable $testDouble and an integer to variable
$testInteger. Variables are converted to the data type of the value they are assigned.
For example, variable $testString becomes a string when assigned the value "3.5
seconds". Lines 23–25 print the value of each variable. Notice that the enclosing of
a variable name in double quotes in a print statement is optional. Lines 34–39 call func-
tion settype to modify the data type of each variable. Function settype takes two
arguments: The variable whose data type is to be changed and the variable’s new data type.
Calling function settype can result in loss of data. For example, doubles are truncated
when they are converted to integers. When converting between a string and a number, PHP
uses the value of the number that appears at the beginning of the string. If no number
appears at the beginning of the string, the string evaluates to 0. In line 34, the string "3.5
seconds" is converted to a double, resulting in the value 3.5 being stored in variable
$testString. In line 37, double 3.5 is converted to integer 3. When we convert this
variable to a string (line 39), the variable’s value becomes "3".
Another option for conversion between types is casting (or type casting). Unlike settype,
casting does not change a variable’s content. Rather, type casting creates a temporary
copy of a variable’s value in memory. Lines 47–50 cast variable $value’s value to a
string, a double and an integer. Type casting is useful when a specific data type
is required for an arithmetic operation.
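The difference between casting and settype can be seen in a small sketch. This is an illustration under our own assumptions, not a listing from the chapter: the names $asFloat and $asInteger are hypothetical, and the casts are written as (float) and (int), the spellings current PHP prefers for the double and integer types discussed above.

```php
<?php
// Sample value taken from the figure; the other names are hypothetical.
$value = "98.6 degrees";

// Casting yields a converted copy; $value itself is left untouched.
$asFloat   = (float) $value;   // the leading number 98.6 is used
$asInteger = (int) $value;     // the fractional part is dropped

print( "Cast to float: $asFloat\n" );
print( "Cast to integer: $asInteger\n" );
print( "Original: $value\n" );

// settype converts the variable in place.
$testString = "3.5 seconds";
settype( $testString, "float" );     // now the float 3.5
settype( $testString, "integer" );   // now the integer 3
settype( $testString, "string" );    // now the string "3"

print( "After settype: $testString\n" );
```

After the three settype calls the original text "3.5 seconds" cannot be recovered, whereas $value still holds "98.6 degrees" after being cast.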
The concatenation operator (.) concatenates strings. This combines multiple strings in
the same print statement (lines 47–50). A print statement may be split over multiple
lines; everything that is enclosed in the parentheses, terminated by a semicolon, is sent to the
client. PHP provides a variety of arithmetic operators, which we demonstrate in Fig. 29.4.
40
41 // test if variable $a is between 50 and 100, inclusive
42 elseif ( $a < 101 )
43 print( "Variable a is now between 50 and 100,
44 inclusive<br />" );
45 else
46 print( "Variable a is now greater than 100
47 <br />" );
48
49 // add 10 to constant VALUE
50 $test = 10 + VALUE;
51 print( "A constant plus constant
52 VALUE yields $test <br />" );
53
54 // add a string to an integer
55 $str = "3 dollars";
56 $a += $str;
57 print( "Adding a string to an integer yields $a
58 <br />" );
59 ?>
60 </body>
61 </html>
Line 14 declares variable $a and assigns it the value 5. Line 18 calls function define
to create a named constant. A constant is a value that cannot be modified once it is declared.
Function define takes two arguments: the name and value of the constant. An optional
third argument accepts a boolean value that specifies whether the constant is case insensi-
tive—constants are case sensitive by default.
Common Programming Error 29.4
Assigning a value to a constant after a constant is declared is a syntax error.
Line 21 adds constant VALUE to variable $a, which is a typical use of arithmetic oper-
ators. Line 26 uses the assignment operator *= to yield an expression equivalent to $a =
$a * 2 (thus assigning $a the value 20). These assignment operators (i.e., +=, -=, *= and
/=) are syntactical shortcuts. Line 34 adds 40 to the value of variable $a.
Testing and Debugging Tip 29.1
Always initialize variables before using them. Doing so helps avoid subtle errors.
Strings are converted to integers when they are used in arithmetic operations (lines 54–
56). In line 56, the string value "3 dollars" is converted to the integer 3 before being
added to integer variable $a.
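The arithmetic described above can be condensed into the following sketch. The value 5 for constant VALUE is an assumption (the figure's define call is not reproduced here). Because newer PHP versions report a notice or warning when a string such as "3 dollars" is used directly in arithmetic, the sketch converts it with an explicit cast.

```php
<?php
// The value 5 is an assumption made for this illustration.
define( "VALUE", 5 );

$a = 5;
$a = $a + VALUE;   // 10
$a *= 2;           // same as $a = $a * 2, giving 20
$a += 40;          // 60

// Only the number at the beginning of the string is used.
$str = "3 dollars";
$a += (int) $str;  // 63

print( "Variable a is now $a\n" );
```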
Testing and Debugging Tip 29.2
Function print can be used to display the value of a variable at a particular point during
a program’s execution. This is often helpful in debugging a script.
The words if, elseif and else are PHP keywords (Fig. 29.5), meaning that they are
reserved for implementing language features. PHP provides the capability to store data in
arrays. Arrays are divided into elements that behave as individual variables. Figure 29.6 dem-
onstrates techniques for array initialization and manipulation.
Fig. 29.5 PHP keywords.
Individual array elements are accessed by following the array-variable name with an
index enclosed in brackets ([]). If a value is assigned to an array that does not exist, then the
array is created (line 18). Likewise, assigning a value to an element where the index is
omitted appends a new element to the end of the array (line 21). The for loop (lines 24–
25) prints each element’s value. Function count returns the total number of elements
in the array. Because array indices start at 0, the index of the last element is one less than
the total number of elements. In this example, the for loop terminates once the counter
($i) is equal to the number of elements in the array.
Line 31 demonstrates a second method of initializing arrays. Function array returns
an array that contains the arguments passed to it. The first item in the list is stored as the
first array element, the second item is stored as the second array element, and so on. Lines
32–33 use another for loop to print out each array element’s value.
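As a minimal sketch of these two ways of building an array (the element values are our own and are not taken from Fig. 29.6):

```php
<?php
// Assigning to an array that does not yet exist creates it.
$first[ 0 ] = "zero";
$first[ 1 ] = "one";
$first[ 2 ] = "two";

// Omitting the index appends a new element to the end of the array.
$first[] = "three";

// count gives the number of elements; valid indices run from 0 to count - 1.
for ( $i = 0; $i < count( $first ); $i++ )
   print( "Element $i is $first[$i]\n" );

// Function array builds an equivalent array in a single call.
$second = array( "zero", "one", "two", "three" );
```

Both arrays end up with the indices 0 through 3, so the same for loop could be used to print either one.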
In addition to integer indices, arrays can have nonnumeric indices (lines 39–41). For
example, indices Harvey, Paul and Tem are assigned the values 21, 18 and 23, respec-
tively. PHP provides functions for iterating through the elements of an array (lines 45–46).
Each array has a built-in internal pointer, which points to the array element currently being
referenced. Function reset sets the iterator to the first element of the array. Function key
returns the index of the element to which the iterator points, and function next moves the
iterator to the next element. The for loop continues to execute as long as function key
returns an index. Function next returns false when there are no additional elements in
the array. When this occurs, function key cannot return an index, and the loop terminates.
Line 47 prints the index and value of each element.
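The internal-pointer functions can be sketched as follows. The array name $third and the helper array $lines are assumptions made for this illustration; the names and ages follow the text. The loop condition compares the result of key with null, which is what key returns once the pointer has moved past the last element.

```php
<?php
// Nonnumeric indices; the names and ages follow the text.
$third[ "Harvey" ] = 21;
$third[ "Paul" ] = 18;
$third[ "Tem" ] = 23;

$lines = array();

// reset moves the internal pointer to the first element, key returns the
// current index and next advances the pointer by one element.
for ( reset( $third ); ( $element = key( $third ) ) !== null; next( $third ) )
   $lines[] = "$element is $third[$element]";

print( implode( "\n", $lines ) . "\n" );
```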
Function array can also be used to initialize arrays with string indices. In order to
override the automatic numeric indexing performed by function array, use operator =>
as demonstrated on lines 54–61. The value to the left of the operator is the array index, and
the value to the right is the element’s value.
The foreach loop is a control structure that is specially designed for iterating
through arrays (line 64). The syntax for a foreach loop starts with the array to iterate
through, followed by the keyword as, followed by the variables to receive the index and
the value for each element. We use the foreach loop to print each element and value
of array $fourth.
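A brief sketch of operator => together with a foreach loop follows. The array name $fourth follows the text, but the data and the variable $total are illustrative assumptions.

```php
<?php
// The index appears to the left of =>, the element's value to the right.
$fourth = array(
   "January"  => 31,
   "February" => 28,
   "March"    => 31 );

$total = 0;

// foreach ( array as index => value ) visits every element in order.
foreach ( $fourth as $month => $days ) {
   print( "$month has $days days\n" );
   $total += $days;
}

print( "Total: $total days\n" );
```

Unlike the loop based on reset, key and next, the foreach loop needs no explicit termination test.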
pare each element to the string "banana", printing the elements that are greater than, less
than and equal to the string.
Relational operators (==, !=, <, <=, > and >=) can also be used to compare strings. Lines
33–38 use relational operators to compare each element of the array to the string "apple".
These operators are also used for numerical comparison with integers and doubles.
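The comparisons described above might look like the following sketch. We assume function strcmp for the three-way comparison and use our own sample array; neither is confirmed by the surviving text of the figure.

```php
<?php
// Sample data chosen for this illustration.
$fruits = array( "apple", "orange", "banana" );
$results = array();

foreach ( $fruits as $fruit ) {
   // strcmp returns a negative number, 0 or a positive number when the first
   // string is less than, equal to or greater than the second.
   $order = strcmp( $fruit, "banana" );

   if ( $order < 0 )
      $results[] = "$fruit is less than banana";
   elseif ( $order > 0 )
      $results[] = "$fruit is greater than banana";
   else
      $results[] = "$fruit is equal to banana";
}

print( implode( "\n", $results ) . "\n" );

// For nonnumeric strings such as these, relational operators give the
// same ordering.
$appleFirst = ( "apple" < "banana" );
```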
For more powerful string comparisons, PHP provides functions ereg and
preg_match, which use regular expressions to search a string for a specified pattern.
Function ereg uses Portable Operating System Interface (POSIX) extended regular
expressions, whereas function preg_match provides Perl-compatible regular expres-
sions. POSIX-extended regular expressions are a standard to which PHP regular expres-
sions conform. In this section, we use function ereg. Perl regular expressions are more
widely used than POSIX regular expressions. Support for Perl regular expressions also
eases migration from Perl to PHP. Consult PHP’s documentation for a list of differences
between the Perl and PHP implementations. Figure 29.8 demonstrates some of PHP’s reg-
ular expression capabilities.
27
28 // search for pattern 'Now' at the end of the string
29 if ( ereg( "Now$", $search ) )
30 print( "String 'Now' was found at the end
31 of the line.<br />" );
32
33 // search for any word ending in 'ow'
34 if ( ereg( "[[:<:]]([a-zA-Z]*ow)[[:>:]]", $search,
35 $match ) )
36 print( "Word found ending in 'ow': " .
37 $match[ 1 ] . "<br />" );
38
39 // search for any words beginning with 't'
40 print( "Words beginning with 't' found: ");
41
42 while ( eregi( "[[:<:]](t[[:alpha:]]+)[[:>:]]",
43 $search, $match ) ) {
44 print( $match[ 1 ] . " " );
45
46 // remove the first occurrence of a word beginning
47 // with 't' to find other instances in the string
48 $search = ereg_replace( $match[ 1 ], "", $search );
49 }
50
51 print( "<br />" );
52 ?>
53 </body>
54 </html>
We begin by assigning the string "Now is the time" to variable $search (line
14). Line 19’s condition calls function ereg to search for the literal characters Now inside
variable $search. If the pattern is found, ereg returns true, and line 20 prints a mes-
sage indicating that the pattern was found. We use single quotes ('') inside the print
statement to emphasize the search pattern. When located inside a string, content delimited
by single quotes is interpolated. If a print statement uses only single quotes, the content
inside the single quotes is not interpolated. For example,
print( '$name' );
in a print statement would output $name. Function ereg takes two arguments: a regu-
lar expression pattern to search for (Now) and the string to search. Although case mixture
and whitespace are typically significant in patterns, PHP provides function eregi for
specifying case insensitive pattern matches.
In addition to literal characters, regular expressions can include special characters that
specify patterns. For example, the caret (^) special character matches the beginning of a
string. Line 24 searches the beginning of $search for the pattern Now.
The characters $, ^ and . are part of a special set of characters called metacharacters.
A dollar sign ($) searches for the specified pattern at the end of the string (line 29). Because
the pattern Now is not found at the end of $search, the body of the if statement (lines
30–31) is not executed. Note that Now$ is not a variable, it is a pattern that uses $ to search
for characters Now at the end of a string. Another special character is the period (.), which
matches any single character.
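The ereg family of functions was removed from PHP in version 7.0, so the sketch below expresses the same searches with function preg_match, the Perl-compatible alternative mentioned earlier. The variable names that hold the results are assumptions made for this illustration. Note that preg_match patterns are enclosed in delimiters (here, slashes) and that preg_match returns 1 for a match and 0 for no match.

```php
<?php
$search = "Now is the time";

// Literal match anywhere in the string.
$found = preg_match( "/Now/", $search );

// ^ anchors the pattern to the beginning of the string.
$atStart = preg_match( "/^Now/", $search );

// $ anchors the pattern to the end of the string; single quotes keep PHP
// from treating $ as the start of a variable name.
$atEnd = preg_match( '/Now$/', $search );

// The i modifier requests a case-insensitive match, much as eregi does.
$anyCase = preg_match( "/now/i", $search );

print( "found: $found, at start: $atStart, at end: $atEnd, " .
   "any case: $anyCase\n" );
```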
Lines 34–35 search (from left to right) for the first word ending with the letters ow.
Bracket expressions are lists of characters enclosed in brackets ([ ]), which match a single
character from the list. Ranges can be specified by supplying the beginning and the end of
the range separated by a dash (-). For instance, the bracket expression [a-z] matches any
lowercase letter, and [A-Z] matches any uppercase letter. In this example, we combine
the two to create an expression that matches any letter. The special bracket expressions
[[:<:]] and [[:>:]] match the beginning and end of a word, respectively.
The expression inside the parentheses, [a-zA-Z]*ow, matches any word ending in
ow. It uses the quantifier * to match the preceding pattern 0 or more times. Thus,
[a-zA-Z]*ow matches any number of letters followed by the literal characters ow. Figure 29.9
lists some PHP quantifiers.
Placing a pattern in parentheses stores the matched string in the array that is specified
in the third argument to function ereg. The first parenthetical pattern matched is stored in
the second array element, the second in the third array element, and so on. The first element
(i.e., index 0) stores the string matched for the entire pattern. The parentheses in lines 34–
35 result in Now being stored in variable $match[ 1 ].
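Function preg_match fills its third argument in the same way, as this sketch shows. It is an illustration rather than a listing from the chapter; the Perl-compatible \b (word boundary) is assumed here to play the role of [[:<:]] and [[:>:]].

```php
<?php
$search = "Now is the time";

// The parentheses capture the word ending in ow.
if ( preg_match( '/\b([a-zA-Z]*ow)\b/', $search, $match ) ) {
   print( "Entire pattern: " . $match[ 0 ] . "\n" );       // index 0
   print( "First parenthetical: " . $match[ 1 ] . "\n" );  // index 1
}
```

Because the parentheses enclose everything except the zero-width word boundaries, both elements hold the same text in this case.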
Fig. 29.9 Some PHP quantifiers.
again. Lines 42–49 use a while loop and the ereg_replace function to find all the
words in the string that begin with t. We will say more about this function momentarily.
The pattern used in this example, [[:<:]](t[[:alpha:]]+)[[:>:]], matches any
word beginning with the character t followed by one or more characters. The example uses
the character class [[:alpha:]] to recognize any alphabetic character. This is equiva-
lent to the [a-zA-Z] bracket expression that was used earlier. Figure 29.10 lists some
character classes that can be matched with regular expressions.
The quantifier + matches one or more instances of the preceding expression. The result
of the match is stored in $match[ 1 ]. Once a match is found, we print it on line 44.
We then remove it from the string on line 48, using function ereg_replace. Function
ereg_replace takes three arguments: the pattern to match, a string to replace the
matched string and the string to search. The modified string is returned. Here, we search for
the word that we matched with the regular expression, replace the word with an empty
string then assign the result back to $search. This allows us to match any other words
beginning with the character t in the string.
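The match-then-remove loop can be sketched with the Perl-compatible functions preg_match and preg_replace, which stand in here for eregi and ereg_replace (the ereg family is not available in PHP 7.0 and later). The array $words is an assumption made for this illustration.

```php
<?php
$search = "Now is the time";
$words = array();

// Find a word beginning with t (case insensitive), record it, remove its
// first occurrence from the string and repeat until nothing matches.
while ( preg_match( '/\b(t[[:alpha:]]+)\b/i', $search, $match ) ) {
   $words[] = $match[ 1 ];

   // preg_quote escapes characters that are special in a pattern; the
   // final argument 1 limits the replacement to a single occurrence.
   $search = preg_replace( '/' . preg_quote( $match[ 1 ], '/' ) . '/',
      "", $search, 1 );
}

print( "Words beginning with 't' found: " . implode( " ", $words ) . "\n" );
```

When the words themselves are all that is needed, function preg_match_all can collect every match in one call without modifying the string.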
quest information, send and receive Web-based e-mail, perform online paging and take
advantage of various other online services. Figure 29.13 uses an XHTML form to collect
information about users for the purpose of adding users to mailing lists. The type of regis-
tration form in this example could be used by a software company to acquire profile infor-
mation before allowing users to download software.
47
48 <!-- create drop-down list containing book names -->
49 <select name = "book">
50 <option>Internet and WWW How to Program 2e</option>
51 <option>C++ How to Program 3e</option>
52 <option>Java How to Program 4e</option>
53 <option>XML How to Program 1e</option>
54 </select>
55 <br /><br />
56
57 <img src = "images/os.gif" alt = "Operating System" />
58 <br /><span style = "color: blue">
59 Which operating system are you currently using?
60 <br /></span>
61
62 <!-- create five radio buttons -->
63 <input type = "radio" name = "os" value = "Windows NT"
64 checked = "checked" />
65 Windows NT
66
67 <input type = "radio" name = "os" value =
68 "Windows 2000" />
69 Windows 2000
70
71 <input type = "radio" name = "os" value =
72 "Windows 98" />
73 Windows 98<br />
74
75 <input type = "radio" name = "os" value = "Linux" />
76 Linux
77
78 <input type = "radio" name = "os" value = "Other" />
79 Other<br />
80
81 <!-- create a submit button -->
82 <input type = "submit" value = "Register" />
83 </form>
84
85 </body>
86 </html>
Fig. 29.13 XHTML form for gathering user input (part 2 of 3).
Fig. 29.13 XHTML form for gathering user input (part 3 of 3).
The action attribute of the form element (line 18) indicates that, when the user
clicks Register, the form data will be posted to Fig. 29.14 for processing. Using
method = "post" appends form data to the browser request which contains the protocol
(i.e., HTTP) and the requested resource’s URL. Scripts located on the Web server’s
machine (or on a machine accessible through the network) can access the form data sent as
part of the request.
We assign a unique name (e.g., email) to each of the form’s input fields. When
Register is clicked, each field’s name and value are sent to the Web server.
Figure 29.14 can then access the submitted value for each specific field.
Good Programming Practice 29.2
Use meaningful XHTML object names for input fields. This makes PHP scripts that retrieve
form data easier to understand.
Figure 29.14 processes the data posted by Fig. 29.13 and sends XHTML back to the
client. For each form field posted to a PHP script, PHP creates a global variable with the
same name as the field. For example, in line 32 of Fig. 29.13, an XHTML text box is cre-
ated and given the name email. Later in our PHP script (line 67), we access the field’s
value by using variable $email.
In lines 18–19, we determine whether the phone number entered by the user is valid.
In this case, the phone number must begin with an opening parenthesis, followed by an area
code, a closing parenthesis, an exchange, a hyphen and a line number. It is crucial to validate
information that will be entered into databases or used in mailing lists. For example,
validation can be used to ensure that credit-card numbers contain the proper number of
digits before the numbers are encrypted and sent to a merchant. The rules used to verify
such information are called business logic (or business rules).
The expression \( matches the opening parenthesis of the phone number. Because we
want to match the literal character (, we escape its normal meaning by preceding it with
the \ character. The opening parenthesis must be followed by three digits
([0-9]{3}), a closing parenthesis, three digits, a literal hyphen and four additional digits. Note
that we use the ^ and $ symbols to ensure that no extra characters appear at either end of
the string.
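For comparison with the rest of the book, the same check can be sketched in Python with the re module. This is only an illustrative analogue of the pattern described above, not the code of Fig. 29.14; the names PHONE_PATTERN and is_valid_phone are our own.

```python
import re

# "(", three digits, ")", three digits, a literal hyphen and four digits.
# ^ and $ anchor the pattern at both ends of the string. Note that in
# Python, $ also matches just before a single trailing newline.
PHONE_PATTERN = re.compile(r"^\([0-9]{3}\)[0-9]{3}-[0-9]{4}$")


def is_valid_phone(phone):
    """Return True if phone has the form (555)555-5555."""
    return PHONE_PATTERN.match(phone) is not None


for candidate in ["(555)555-5555", "555-555-5555", "(555)555-55555"]:
    print(candidate, is_valid_phone(candidate))
```

Only the first candidate is accepted. The second lacks the parentheses, and the third has an extra digit that the $ anchor rejects.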
35 <p>Hi
36 <span style = "color: blue">
37 <strong>
38 <?php print( "$fname" ); ?>
39 </strong>
40 </span>.
41 Thank you for completing the survey.<br />
42
43 You have been added to the
44 <span style = "color: blue">
45 <strong>
46 <?php print( "$book " ); ?>
47 </strong>
48 </span>
49 mailing list.
50 </p>
51 <strong>The following information has been saved
52 in our database:</strong><br />
53
54 <table border = "0" cellpadding = "0" cellspacing = "10">
55 <tr>
56 <td bgcolor = "#ffffaa">Name </td>
57 <td bgcolor = "#ffffbb">Email</td>
58 <td bgcolor = "#ffffcc">Phone</td>
59 <td bgcolor = "#ffffdd">OS</td>
60 </tr>
61
62 <tr>
63 <?php
64
65 // print each form field’s value
66 print( "<td>$fname $lname</td>
67 <td>$email</td>
68 <td>$phone</td>
69 <td>$os</td>" );
70 ?>
71 </tr>
72 </table>
73
74 <br /><br /><br />
75 <div style = "font-size: 10pt; text-align: center">
76 This is only a sample form.
77 You have not been added to a mailing list.
78 </div>
79 </body>
80 </html>
Fig. 29.14 Obtaining user input through forms (part 2 of 3).
If the regular expression is matched, then the phone number is determined to be valid,
and an XHTML document is sent to the client, thanking the user for completing the form.
Otherwise, the body of the if statement is executed, and an error message is printed.
Function die (line 31) terminates script execution. In this case, if the user did not enter
a correct telephone number, we do not want to continue executing the rest of the script, so
we call function die.
Software Engineering Observation 29.1
Use business logic to ensure that invalid information is not stored in databases. When possible,
use JavaScript to validate form data on the client, which conserves server resources. However,
some data, such as passwords, must be validated on the server side.
www.php.net/manual/en/ref.mcrypt.php
[Note: These functions are not available for Windows distributions of PHP.]
59 </tr>
60
61 <tr>
62 <td colspan = "1">
63 <input type = "submit" name = "Enter"
64 value = "Enter" style = "height: 23px;
65 width: 47px" />
66 </td>
67 <td colspan = "2">
68 <input type = "submit" name = "NewUser"
69 value = "New User"
70 style = "height: 23px" />
71 </td>
72 </tr>
73 </table>
74 </form>
75 </body>
76 </html>
Fig. 29.15 XHTML form for obtaining a username and password (part 3 of 3).
Figure 29.16 verifies the client’s username and password by querying a database. The
valid user list and each user’s respective password are contained within a simple text file
(Fig. 29.17). Existing users are validated against this text file, and new users are appended
to it.
First, lines 13–16 check whether the user has submitted a form without specifying a
username or password. Variable names, when preceded by the logical negation operator
(!), return true if they are empty or are set to 0. Logical operator OR (||) returns true
if either of the variables is empty or is set to 0. If this is the case, function fieldsBlank
is called (line 144), which notifies the client that all form fields must be completed.
We determine whether we are adding a new user (line 19 in Fig. 29.16) by calling function
isset to test whether variable $NewUser has been set. When a user submits the
XHTML form in password.html, the user clicks either the New User or Enter button.
This sets either variable $NewUser or variable $Enter, respectively. If variable
$NewUser has been set, lines 22–36 are executed. If this variable has not been set, we
assume the user has pressed the Enter button, and lines 42–75 execute.
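The decision logic just described can be sketched in Python as follows. This is a rough analogue under our own assumptions: the form arrives as a dictionary, the function name choose_action and the returned labels are hypothetical, and only the field names USERNAME, PASSWORD and NewUser come from the example.

```python
def choose_action(form):
    """Decide how to handle a submitted login form.

    form is assumed to be a dict mapping field names to submitted values.
    """
    username = form.get("USERNAME", "")
    password = form.get("PASSWORD", "")

    # An empty string is falsy in Python, so "not" plays the role of PHP's !.
    # Unlike PHP, Python treats the string "0" as true.
    if not username or not password:
        return "fields_blank"

    # Only the button that was clicked appears in the submitted data,
    # so a membership test stands in for PHP's isset.
    if "NewUser" in form:
        return "add_user"
    return "verify_user"


print(choose_action({"USERNAME": "account1", "PASSWORD": "", "Enter": "Enter"}))
print(choose_action({"USERNAME": "a", "PASSWORD": "b", "NewUser": "New User"}))
print(choose_action({"USERNAME": "a", "PASSWORD": "b", "Enter": "Enter"}))
```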
50 $userVerified = 0;
51
52 // read each line in file and check username
53 // and password
54 while ( !feof( $file ) && !$userVerified ) {
55
56 // read line from file
57 $line = fgets( $file, 255 );
58
59 // remove newline character from end of line
60 $line = chop( $line );
61
62 // split username and password
63 $field = split( ",", $line, 2 );
64
65 // verify username
66 if ( $USERNAME == $field[ 0 ] ) {
67 $userVerified = 1;
68
69 // call function checkPassword to verify
70 // user’s password
71 if ( checkPassword( $PASSWORD, $field )
72 == true )
73 accessGranted( $USERNAME );
74 else
75 wrongPassword();
76 }
77 }
78
79 // close text file
80 fclose( $file );
81
82 // call function accessDenied if username has
83 // not been verified
84 if ( !$userVerified )
85 accessDenied();
86 }
87
88 // verify user password and return a boolean
89 function checkPassword( $userpassword, $filedata )
90 {
91 if ( $userpassword == $filedata[ 1 ] )
92 return true;
93 else
94 return false;
95 }
96
97 // print a message indicating the user has been added
98 function userAdded( $name )
99 {
100 print( "<title>Thank You</title></head>
101 <body style = \"font-family: arial;
102 font-size: 1em; color: blue\">
103 <strong>You have been added
104 to the user list, $name.
Fig. 29.16 Verifying a username and password (part 2 of 4).
1 account1,password1
2 account2,password2
3 account3,password3
4 account4,password4
5 account5,password5
6 account6,password6
7 account7,password7
8 account8,password8
9 account9,password9
10 account10,password10
To add a new user, we open the file fig29_17.txt by calling function fopen and
assigning the file handle that is returned to variable $file (lines 22–23). A file handle is
a number assigned to the file by the Web server for purposes of identification. Function
fopen takes two arguments: The name of the file and the mode in which to open it. The
possible modes include read, write and append. Here, we open the file in append
mode, which opens it for writing, but does not write over the previous contents of the file.
If an error occurs in opening the file, function fopen does not return a file handle; an
error message is printed (lines 27–29) and script execution is terminated by calling function
die (line 30). If the file opens properly, function fputs (line 35) writes the name and
password to the file. To specify a new line, we use the newline character (\n). This places
each username and password pair on a separate line in the file. On line 36, we pass the variable
$USERNAME to function userAdded (line 98). Function userAdded prints a
message to the client to indicate that the username and password were added to the file.
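A Python sketch of the same append step is shown here for comparison. It assumes a temporary file rather than fig29_17.txt, and the helper name add_user is our own. Python's "a" mode corresponds to the append mode described above.

```python
import os
import tempfile


def add_user(path, username, password):
    """Append one "username,password" line to the file at path."""
    # Mode "a" opens the file for writing without overwriting its contents.
    with open(path, "a") as password_file:
        password_file.write(f"{username},{password}\n")


# Demonstration on a temporary file (a stand-in for the password file).
demo_path = os.path.join(tempfile.mkdtemp(), "users.txt")
add_user(demo_path, "account1", "password1")
add_user(demo_path, "account2", "password2")

with open(demo_path) as password_file:
    print(password_file.read(), end="")
```

If the file cannot be opened, open raises an OSError, which takes the place of testing the return value of fopen.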
If we are not adding a new user, we open the file fig29_17.txt for reading. This
is accomplished by using function fopen and assigning the file handle that is returned to
variable $file (lines 42–43). Lines 44–47 execute if an error occurs in opening the file.
The while loop (line 54) repeatedly executes the code enclosed in its curly braces (lines
57–75) until the test condition in parentheses evaluates to false. Before we enter the
while loop, we set the value of variable $userVerified to 0. In this case, the test condition
(line 54) checks to ensure that the end of the file has not been reached and that the
user has not been found in the password file. Logical operator AND (&&) connects the two
conditions. Function feof, preceded by the logical negation operator (!), returns true
when there are more lines to be read in the specified file. When the logical negation operator
(!) is applied to the $userVerified variable, true is returned if the variable is
empty or is set to 0.
Each line in fig29_17.txt consists of a username and password pair that is separated
by a comma and followed by a newline character. A line from this file is read using function
fgets (line 57) and is assigned to variable $line.
This function takes two arguments: The file handle to read, and the maximum number of
characters to read. The function reads until a newline character is encountered, the end of the
file is encountered or the number of characters read reaches one less than the number specified
in the second argument.
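The three stopping conditions can be observed with Python's readline method, which behaves similarly. In this sketch an in-memory io.StringIO object stands in for the file; note that Python's size argument is the maximum itself, not one more than the maximum as in fgets.

```python
import io

# An in-memory stand-in for a text file whose last line has no newline.
stream = io.StringIO("account1,password1\nlast line without newline")

first = stream.readline(254)   # stops at the newline, which is kept
short = stream.readline(4)     # stops after 4 characters
rest = stream.readline(254)    # stops at the end of the file
at_end = stream.readline(254)  # nothing left: returns an empty string

print(repr(first), repr(short), repr(rest), repr(at_end))
```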
For each line read, function chop is called (line 60) to remove the newline character
from the end of the line. Then, function split is called to divide the string into substrings
at the specified separator, or delimiter (in this case, a comma). For example, function
split returns an array containing "account1" and "password1" from the first
line in fig29_17.txt. This array is assigned to variable $field.
Line 66 determines whether the username entered by the user matches the one returned
from the text file (stored in the variable $field[ 0 ]). If the condition evaluates to true,
then the $userVerified variable is set to 1, and lines 71–75 execute. On line 71, function
checkPassword (line 89) is called to verify the user’s password. Variables
$PASSWORD and $field are passed to the function. Function checkPassword compares
the user’s password to the password in the file. If they match, true is returned (line
92), whereas false is returned if they do not (line 94). If the condition evaluates to true,
then function accessGranted (line 110) is invoked. Variable $USERNAME is passed to
the function, and a message notifies the client that permission has been granted. However,
if the condition evaluates to false, then function wrongPassword is invoked (line
121), which notifies the client that an invalid password was entered.
When the while loop is complete, either as a result of matching a username or of
reaching the end of the file, we are finished reading from fig29_17.txt. We call function
fclose (line 80) to close the file. Line 84 checks whether the $userVerified
variable is empty or has a value of 0, which indicates that the username was not found in
the fig29_17.txt file. If this returns true, function accessDenied is called (line
132). This function notifies the client that access to the server has been denied.
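The whole verification pass can be condensed into a short Python sketch. It is an analogue of the loop in Fig. 29.16, not a translation of it: the function name verify_user and the three result strings are our own, and an io.StringIO object stands in for the password file.

```python
import io


def verify_user(lines, username, password):
    """Return "granted", "wrong_password" or "denied"."""
    for line in lines:
        # Remove the trailing newline, then split at the first comma only,
        # which yields at most two fields: username and password.
        fields = line.rstrip("\n").split(",", 1)
        if fields[0] == username:
            # Username found: stop reading, as the while loop does.
            if len(fields) == 2 and fields[1] == password:
                return "granted"
            return "wrong_password"
    # End of file reached without a matching username.
    return "denied"


sample = "account1,password1\naccount2,password2\n"
print(verify_user(io.StringIO(sample), "account2", "password2"))
print(verify_user(io.StringIO(sample), "account2", "oops"))
print(verify_user(io.StringIO(sample), "nobody", "password1"))
```

Returning from inside the loop replaces the $userVerified flag: the function leaves the loop as soon as the username is matched.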
mation as user accounts, passwords, credit-card numbers, mailing lists and product inventories.
PHP offers built-in support for a wide variety of databases. In this example, we use
MySQL. Visit www.deitel.com to locate information on setting up a MySQL database.
From a Web browser, the client enters a database field name that is sent to the Web server.
The PHP script is then executed; the script builds the select query, queries the database and
sends a record set in the form of XHTML to the client. The rules and syntax for writing
such a query string are discussed in Chapter 17, Python Database Application Programming
Interface (DB-API).
Figure 29.18 is a Web page that posts form data containing a database field to the
server. The PHP script in Fig. 29.19 processes the form data.
Line 17 creates an XHTML form, specifying that the data submitted from the form
will be sent to Fig. 29.19. Lines 22–28 add a select box to the form, set the name of the
select box to select, and set its default selection to *. This value specifies that all records
are to be retrieved from the database. Each database field is set as an option in the select
box.
35 </body>
36 </html>
33 }
34 ?>
35
36 <h3 style = "color: blue">
37 Search Results</h3>
38
39 <table border = "1" cellpadding = "3" cellspacing = "2"
40 style = "background-color: #ADD8E6">
41
42 <?php
43
44 // fetch each record in result set
45 for ( $counter = 0;
46 $row = mysql_fetch_row( $result );
47 $counter++ ){
48
49 // build table to display results
50 print( "<tr>" );
51
52 foreach ( $row as $key => $value )
53 print( "<td>$value</td>" );
54
55 print( "</tr>" );
56 }
57
58 mysql_close( $database );
59 ?>
60
61 </table>
62
63 <br />Your search yielded <strong>
64 <?php print( "$counter" ) ?> results.<br /><br /></strong>
65
66 <h5>Please email comments to
67 <a href = "mailto:[email protected]">
68 Deitel and Associates, Inc.
69 </a>
70 </h5>
71
72 </body>
73 </html>
Fig. 29.19 Querying a database and displaying the results (part 2 of 3).
Fig. 29.19 Querying a database and displaying the results (part 3 of 3).
Figure 29.19 builds an SQL-query string with the specified field name and sends it
to the database-management system. Line 18 concatenates the posted field name to a
SELECT query. Line 21 calls function mysql_connect to connect to the MySQL database.
We pass three arguments to function mysql_connect: The server’s hostname, a
username and a password. This function returns a database handle (a reference to the
object that is used to represent PHP’s connection to the database), which we assign to variable
$database. If the connection to MySQL fails, function die is called, which outputs
an error message and terminates the script. Line 26 calls function mysql_select_db to
specify the database to be queried (in this case, Products). Function die is called if the
database cannot be opened. To query the database, line 30 calls function mysql_query,
specifying the query string and the database to query. Function mysql_query returns an
object containing the result set of the query, which we assign to variable $result. If the
query fails, a message is output to the client indicating that the query failed
to execute. Function die is then called with the return value of function mysql_error as
its argument instead of a literal string message. In the event that the query fails, function
mysql_error returns any error strings from the database. Function mysql_query can also be
used to execute SQL statements, such as INSERT or DELETE, that do not return results.
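The connect, query and fetch sequence can be tried in Python without a database server. The sketch below substitutes the standard sqlite3 module and an in-memory database for MySQL; the table name Books, its columns and the sample rows are hypothetical. Because a field name cannot be supplied as a bound parameter, the posted value is checked against a fixed list before it is concatenated into the query, a precaution this sketch adds.

```python
import sqlite3

# Field names a client may request (hypothetical; "*" selects every field).
ALLOWED_FIELDS = {"*", "ID", "Title", "Category", "ISBN"}


def build_query(field):
    """Build a SELECT statement for one permitted field name."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unknown field: {field!r}")
    return "SELECT " + field + " FROM Books"


# An in-memory SQLite database stands in for the MySQL server.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Books (ID INTEGER, Title TEXT, Category TEXT, ISBN TEXT)"
)
connection.executemany(
    "INSERT INTO Books VALUES (?, ?, ?, ?)",
    [(1, "Sample Title A", "Programming", "0000000001"),
     (2, "Sample Title B", "Programming", "0000000002")],
)

# execute returns a cursor over the result set; fetchall collects the rows.
rows = connection.execute(build_query("Title")).fetchall()
print(rows)
connection.close()
```

Errors raised by the database surface as sqlite3.Error exceptions, which play the role of the mysql_error message.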
Lines 45–56 use a for loop to iterate through each record in the result set while constructing
an XHTML table from the results. The loop condition calls function
mysql_fetch_row to return an array containing the elements of each row in the result
set of our query ($result). The array is then stored in variable $row. Lines 52–53 use a
foreach loop to construct individual cells for each of the elements in the row. The
foreach loop takes the name of the array ($row), iterates through each index value of
the array ($key) and stores the element in variable $value. Each element of the array is
then printed as an individual cell. For each row retrieved, variable $counter is incremented
by one. When the end of the result set has been reached, function
mysql_fetch_row returns false, which terminates the for loop.
After all rows of the result set have been displayed, the database connection is closed (line 58), and
the table’s closing tag is written (line 61). The number of results contained in $counter
is printed in line 64.
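The nested iteration (one table row per record, one cell per field, plus a running count) can be sketched in Python as follows. The function name rows_to_table is hypothetical, the rows are supplied as plain tuples rather than fetched from a database, and the call to html.escape is an addition of this sketch so that field values containing markup characters display correctly.

```python
from html import escape


def rows_to_table(rows):
    """Return (markup for the table rows, number of rows)."""
    parts = []
    counter = 0
    for row in rows:
        # One <td> cell per element of the current row.
        cells = "".join(f"<td>{escape(str(value))}</td>" for value in row)
        parts.append(f"<tr>{cells}</tr>")
        counter += 1
    return "\n".join(parts), counter


markup, counter = rows_to_table([(1, "A & B"), (2, "C")])
print(markup)
print(f"Your search yielded {counter} results.")
```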
29.8 Cookies
A cookie is a text file that a Web site stores on a client’s computer to maintain information
about that client during and between browsing sessions. A Web site can store a cookie on
a client’s computer to record user preferences and other information, which the Web site
can retrieve during that client’s subsequent visits. For example, many Web sites use cookies
to store clients’ zip codes. The Web site can retrieve the zip code from the cookie and
provide weather reports and news updates tailored to the user’s region. Web sites also can
use cookies to track information about client activity. Analysis of information collected via
cookies can reveal the popularity of various Web sites or products. In addition, marketers
can use cookies to determine the effects of particular advertising campaigns.
Web sites store cookies on users’ hard drives, which raises issues regarding security
and privacy. Web sites should not store critical information, such as credit-card numbers or
passwords, in cookies, because cookies are text files that any program can read. Several
cookie features address security and privacy concerns. A particular server can access only
the cookies that it placed on the client. For example, a Web application running on
www.deitel.com cannot access cookies that the Web site www.prenhall.com/deitel
may have placed on the client’s computer. A cookie also has a maximum age,
after which the Web browser deletes that cookie. Users who are concerned about the privacy
and security implications of cookies can disable cookies in their Web browsers. However,
the disabling of cookies can prevent those users from interacting with Web sites that
rely on cookies to function properly.
Microsoft Internet Explorer stores cookies as small text files on the client’s hard drive.
The information stored in the cookie is sent back to the Web server from which it originated
whenever the user requests a Web page from that particular server. The Web server can
send the client XHTML output that reflects the preferences or information that is stored in
the cookie.
Figure 29.20 uses a script to write a cookie to the client’s machine. The script displays
an XHTML form that allows a user to enter a name, height and favorite color. When the
user clicks the Write Cookie button, the script in Fig. 29.21 executes.
Software Engineering Observation 29.2
Some clients do not accept cookies. When a client declines a cookie, the browser application
normally informs the client that the site may not function correctly without cookies enabled.
Figure 29.21 calls function setcookie to set the cookies to the values passed from
cookies.html. Function setcookie sends HTTP header information; therefore, it
must be called before any XHTML (including comments) is printed.
Function setcookie takes the name of the cookie to be set as the first argument, followed
by the value to be stored in the cookie. For example, line 7 sets the name of the
cookie to "Name" and the value to variable $NAME, which is passed to the script from
Fig. 29.20. The optional third argument indicates the expiration date of the cookie. In this
example, we set the cookies to expire in five days by taking the current time, which is
returned by function time, and adding the number of seconds after which the cookie
should expire (60 seconds * 60 minutes * 24 hours * 5 days). If no expiration date is specified,
the cookie lasts only until the end of the current session, that is, until
the user closes the browser. If only the name argument is passed to function setcookie,
the cookie is deleted from the cookie database. Lines 12–37 send a Web page to the client
indicating that the cookie has been written and listing the values that are stored in the
cookie. Lines 34–35 provide a link to Fig. 29.24.
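For readers following along in Python, the headers that carry such cookies can be generated with the standard http.cookies module. This is a sketch rather than an equivalent of setcookie: it uses the relative Max-Age attribute in place of an absolute expiration time, and the names FIVE_DAYS and make_cookie_headers are our own.

```python
from http.cookies import SimpleCookie

# 60 seconds * 60 minutes * 24 hours * 5 days
FIVE_DAYS = 60 * 60 * 24 * 5


def make_cookie_headers(values, max_age=FIVE_DAYS):
    """Return Set-Cookie header lines for a dict of cookie names and values."""
    cookie = SimpleCookie()
    for name, value in values.items():
        cookie[name] = value
        # Lifetime in seconds, counted from the moment the browser receives it.
        cookie[name]["max-age"] = max_age
    return cookie.output()


headers = make_cookie_headers({"Name": "Pat", "Height": "68", "Color": "blue"})
print(headers)
```

As with setcookie, these lines belong to the HTTP headers, so a script would have to emit them before any of the page body.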
33 </html>
1 <?php
2 // Fig. 29.21: fig29_21.php
3 // Program to write a cookie to a client's machine
4
5 // write each form field’s value to a cookie and set the
6 // cookie’s expiration date
7 setcookie( "Name", $NAME, time() + 60 * 60 * 24 * 5 );
8 setcookie( "Height", $HEIGHT, time() + 60 * 60 * 24 * 5 );
9 setcookie( "Color", $COLOR, time() + 60 * 60 * 24 * 5 );
10 ?>
11
12 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
13 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
14
15 <html xmlns = "http://www.w3.org/1999/xhtml">
16 <head>
17 <title>Cookie Saved</title>
18 </head>
19
20 <body style = "font-family: arial, sans-serif">
21 <p>The cookie has been set with the following data:</p>
22
23 <!-- print each form field’s value -->
24 <br /><span style = "color: blue">Name:</span>
25 <?php print( $NAME ) ?><br />
26
27 <span style = "color: blue">Height:</span>
28 <?php print( $HEIGHT ) ?><br />
Fig. 29.21 Writing a cookie to the client (part 1 of 2).
29
30 <span style = "color: blue">Favorite Color:</span>
31
32 <span style = "color: <?php print( "$COLOR\">$COLOR" ) ?>
33 </span><br />
34 <p>Click <a href = "fig29_24.php">here</a>
35 to read the saved cookie.</p>
36 </body>
37 </html>
If the client is Internet Explorer, cookies are stored in the Cookies directory on the
client’s machine. Figure 29.22 shows the contents of this directory prior to the execution of
Fig. 29.21. After the cookie is written, a text file is added to the directory. In Fig. 29.23, the
file petel@localhost appears in the Cookies directory.
30
31 </table>
32 </body>
33 </html>
Lines 24–28 iterate through this array using a foreach loop, printing out the name
and value of each cookie in an XHTML table. The foreach loop takes the name of the
array ($HTTP_COOKIE_VARS) and iterates through each index value of the array
($key). In this case, the index value is the name of each cookie. Each element is then stored
in variable $value, and these values become the individual cells of the table.
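A comparable loop in Python parses the Cookie request header with http.cookies and prints one table row per cookie. The header string here is a made-up sample, the helper name parse_cookie_header is hypothetical, and html.escape is added by this sketch so that cookie values are safe to embed in markup.

```python
from html import escape
from http.cookies import SimpleCookie


def parse_cookie_header(header_value):
    """Return a dict mapping each cookie name to its value."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


# Sample value of a browser's Cookie header (invented for illustration).
cookies = parse_cookie_header("Name=Pat; Height=68; Color=blue")

# The key is the cookie's name; the value is its stored contents.
for key, value in cookies.items():
    print(f"<tr><td>{escape(key)}</td><td>{escape(value)}</td></tr>")
```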
SUMMARY
• PHP is an open-source technology that is supported by a large community of users and developers.
PHP is platform independent; implementations exist for all major UNIX, Linux and Windows
operating systems.
• PHP code is embedded directly into XHTML documents and provides support for a wide variety
of different databases. PHP scripts typically have the file extension .php.
• In PHP, code is inserted in special scripting delimiters that begin with <?php and end with ?>.
• Variables are preceded by the $ special symbol. A variable is created automatically when it is first
encountered by the PHP interpreter.
• PHP statements are terminated with a semicolon (;). Comments begin with two forward slashes
(//). Text to the right of the slashes is ignored by the interpreter.
• When a variable is encountered inside a double-quoted ("") string, PHP uses interpolation to
replace the variable with its associated data.
• PHP variables are multitype, meaning that they can contain different types of data: integers,
floating-point numbers or strings.
• Type casting converts between data types without changing the value of the variable itself.
• The concatenation operator (.) appends the string on the right of the operator to the string on the
left.
• Uninitialized variables have the value undef, which evaluates to different values, depending on
the context. When undef is used in a numeric context, it evaluates to 0. When undef is interpreted
in a string context, it evaluates to an empty string ("").
• Strings are automatically converted to integers when they are used in arithmetic operations.
• PHP provides the capability to store data in arrays. Arrays are divided into elements that behave
as individual variables.
• Individual array elements are accessed by following the array-variable name with the index number
in square brackets ([]). If a value is assigned to an array that does not exist, the array is created. In
addition to integer indices, arrays can also have nonnumeric indices.
• Function count returns the total number of elements in the array. Function array takes a list of
arguments and returns an array. Function array may also be used to initialize arrays with string
indices.
• Function reset sets the iterator to the first element of the array. Function key returns the index
of the current element. Function next moves the iterator to the next element.
• The foreach loop is a control structure that is specifically designed for iterating through arrays.
• Text manipulation in PHP is usually done with regular expressions—a series of characters that
serve as pattern-matching templates (or search criteria) in strings, text files and databases. This
feature allows complex searching and string processing to be performed using relatively simple
expressions.
• Function strcmp compares two strings. If the first string alphabetically precedes the second
string, -1 is returned. If the strings are equal, 0 is returned. If the first string alphabetically follows
the second string, 1 is returned.
• Relational operators (==, !=, <, <=, > and >=) can be used to compare strings. These operators
can also be used for numerical comparison of integers and doubles.
• For more powerful string comparisons, PHP provides functions ereg and preg_match, which
use regular expressions to search a string for a specified pattern.
• Function ereg uses POSIX extended regular expressions, whereas function preg_match provides
Perl-compatible regular expressions.
• The caret (^) matches the beginning of a string. A dollar sign ($) searches for the specified pattern
at the end of the string. The period (.) is a special character that is used to match any single character.
The \ character is an escape character in regular expressions.
• Bracket expressions are lists of characters enclosed in square brackets ([ ]) that match a single
character from the list. Ranges can be specified by supplying the beginning and the end of the
range separated by a dash (-).
• The special bracket expressions [[:<:]] and [[:>:]] match the beginning and end of a word.
• Character class [[:alpha:]] matches any alphabetic character.
• The quantifier + matches one or more instances of the preceding expression.
• Function ereg_replace takes three arguments: The pattern to search for, a string to replace the
matched string and the string to search.
• PHP stores environment variables and their values in the $GLOBALS array. Individual array variables
can be accessed directly by using an element’s key from the $GLOBALS array as a variable.
• For each form field posted to a PHP script, PHP creates a variable with the same name as the field.
• Function die terminates script execution.
• Passing a string argument to the die function prints that string as a message before stopping program
execution.
• Function isset tests whether a variable has been set.
• Function fopen opens a text file.
• A file handle is a number that the server assigns to the file and is used when the server accesses
the file.
• Function fopen takes two arguments: The name of the file and the mode in which to open the
file. The possible modes include read, write and append.
• Function feof, preceded by the logical negation operator (!), returns true when there are more
lines to be read in a specified file.
• A line from a text file is read using function fgets. This function takes two arguments: The file
handle to read and the maximum number of characters to read.
• Function chop removes newline characters from the end of a line. Function split divides a
string into substrings at the specified separator or delimiter. Function fclose closes a file.
• Function mysql_connect connects to a MySQL database. This function returns a database handle
(a reference to the object which is used to represent PHP’s connection to the database). Function
mysql_query returns an object that contains the result set of the query. Function mysql_error
returns any error strings from the database if the query fails. Function mysql_fetch_row returns
an array that contains the elements of each row in the result set of a query.
• Cookies maintain state information for a particular client who uses a Web browser. Cookies are
often used to record user preferences or other information that will be retrieved during a client’s
subsequent visits to a Web site. On the server side, cookies can be used to track information about
client activity.
• The data stored in the cookie is sent back to the Web server from which it originated whenever the
user requests a Web page from that particular server.
• Function setcookie sets a cookie. Function setcookie takes as the first argument the name
of the cookie to be set, followed by the value to be stored in the cookie.
• PHP creates variables containing contents of a cookie, similar to when values are posted via forms.
• PHP creates array $HTTP_COOKIE_VARS, which contains all the cookie values indexed by their
names.
TERMINOLOGY
$ metacharacter
$GLOBALS variable
$HTTP_COOKIE_VARS
append
array function
array_splice function
as
assignment operator
backslash
bracket expression
caret metacharacter (^) in PHP
character class
chomp function
comparison operator
SELF-REVIEW EXERCISES
29.1 State whether the following are true or false. If false, explain why.
a) PHP code is embedded directly into XHTML.
b) PHP function names are case sensitive.
c) The strval function permanently changes the type of a variable into a string.
d) Conversion between data types happens automatically when a variable is used in a context
that requires a different data type.
e) The foreach loop is a control structure that is designed specifically for iterating over
arrays.
f) Relational operators can be used for alphabetic and numeric comparison.
g) The quantifier +, when used in a regular expression, matches any number of the preceding
pattern.
h) Opening a file in append mode causes the file to be overwritten.
i) Cookies are stored on the server computer.
j) The * arithmetic operator has higher precedence than the + operator.
EXERCISES
29.3 Write a PHP program named states.php that creates a scalar value $states with the
value "Mississippi Alabama Texas Massachusetts Kansas". The program should do the
following:
a) Search for a word in scalar $states that ends in xas. Store this word in element 0 of
an array named $statesArray.
b) Search for a word in $states that begins with k and ends in s. Perform a case-insensitive
comparison. Store this word in element 1 of $statesArray.
c) Search for a word in $states that begins with M and ends in s. Store this word in
element 2 of the array.
d) Search for a word in $states that ends in a. Store this word in element 3 of the array.
e) Search for a word in $states at the beginning of the string that starts with M. Store this
word in element 4 of the array.
f) Output the array $statesArray to the screen.
29.4 In the text, we presented environment variables. Develop a program that determines whether
the client is using Internet Explorer. If so, determine the version number and send that information
back to the client.
29.5 Modify the program in Fig. 29.14 to save information sent to the server into a text file. Each
time a user submits a form, open the text file and print the file’s contents.
29.6 Write a PHP program that tests whether an e-mail address is input correctly. Verify that the
input begins with a series of characters, followed by the @ character, another series of characters, a
period (.) and a final series of characters. Test your program, using both valid and invalid e-mail
addresses.
29.7 Using environment variables, write a program that logs the address (obtained with the
REMOTE_ADDR environment variable) requesting information from the Web server.
29.8 Write a PHP program that obtains a URL and a description of that URL from a user and stores
the information in a database using MySQL. The database should be named URLs, and the table
should be named Urltable. The first field of the table, which is named URL, should contain an
actual URL, and the second, which is named Description, should contain a description of that
URL. Use www.deitel.com as the first URL, and input Cool site! as its description. The second
URL should be www.php.net, and the description should be The official PHP site. After
each new URL is submitted, display the complete contents of Urltable in a table.
WORKS CITED
1. S.S. Bakken, et al., “Introduction to PHP,” 17 April 2000 <www.zend.com/zend/hof/rasmus.php>.
2. S.S. Bakken, et al., “A Brief History of PHP,” January 2001 <www.php.net/manual/en/intro-history.php>.
[***SOLUTIONS***]
SELF-REVIEW EXERCISES
29.1 State whether the following are true or false. If false, explain why.
a) PHP code is embedded directly into XHTML.
ANS: True.
b) PHP function names are case sensitive.
ANS: False. Function names are not case sensitive.
c) The strval function permanently changes the type of a variable into a string.
ANS: False. The strval function returns the converted value, but does not affect the
original variable.
d) Conversion between data types happens automatically when a variable is used in a
context that requires a different data type.
ANS: True.
e) The foreach loop is a control structure that is designed specifically for iterating over
arrays.
ANS: True.
f) Relational operators can be used for alphabetic and numeric comparison.
ANS: True.
g) The quantifier +, when used in a regular expression, matches any number of occurrences of
the preceding pattern.
ANS: False. The quantifier + matches one or more occurrences of the preceding pattern.
h) Opening a file in append mode causes the file to be overwritten.
ANS: False. Opening a file in write mode causes the file to be overwritten.
i) Cookies are stored on the server computer.
ANS: False. Cookies are stored on the client’s computer.
j) The * arithmetic operator has higher precedence than the + operator.
ANS: True.
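Several of the answers above can be checked directly. The short script below is a sketch rather than part of the original solutions. It uses preg_match for the quantifier test and a temporary file for the file-mode test. All variable names are chosen for illustration.

```php
<?php
// b) Function names are not case sensitive.
$length = STRLEN( "abc" );                  // same as strlen( "abc" )

// c) strval returns a converted copy; the original keeps its type.
$number = 42;
$asText = strval( $number );
print( gettype( $number ) . " " . gettype( $asText ) . "\n" );

// g) + requires at least one occurrence; * also accepts none.
$plus = preg_match( '/ab+c/', "ac" );       // 0: no b is present
$star = preg_match( '/ab*c/', "ac" );       // 1: zero b's are allowed

// h) Write mode truncates the file; append mode adds to its end.
$path = tempnam( sys_get_temp_dir(), "demo" );
file_put_contents( $path, "first\n" );

$file = fopen( $path, "a" );
fputs( $file, "second\n" );
fclose( $file );
$afterAppend = file_get_contents( $path );  // both lines are present

$file = fopen( $path, "w" );
fputs( $file, "third\n" );
fclose( $file );
$afterWrite = file_get_contents( $path );   // only the new line remains

unlink( $path );

// j) * binds more tightly than +.
$result = 2 + 3 * 4;                        // 14, not 20
```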
EXERCISES
29.3 Write a PHP program named states.php that creates a scalar value $states with the
value "Mississippi Alabama Texas Massachusetts Kansas". The program should do the
following:
a) Search for a word in scalar $states that ends in xas. Store this word in element 0 of
an array named $statesArray.
b) Search for a word in $states that begins with k and ends in s. Perform a case-insensitive
comparison. Store this word in element 1 of $statesArray.
c) Search for a word in $states that begins with M and ends in s. Store this word in
element 2 of the array.
d) Search for a word in $states that ends in a. Store this word in element 3 of the array.
e) Search for a word in $states at the beginning of the string that starts with M. Store this
word in element 4 of the array.
f) Output the array $statesArray to the screen.
ANS:
         $statesArray[ 0 ] = $matches[ 1 ];

      // b) begins with k and ends in s (eregi ignores case)
      if ( eregi( "[[:<:]](k[[:alpha:]]+s)[[:>:]]", $states,
         $matches ) )
         $statesArray[ 1 ] = $matches[ 1 ];

      // c) begins with M and ends in s (ereg is case sensitive)
      if ( ereg( "[[:<:]](M[[:alpha:]]+s)[[:>:]]", $states,
         $matches ) )
         $statesArray[ 2 ] = $matches[ 1 ];

      // d) ends in a
      if ( eregi( "[[:<:]]([[:alpha:]]+a)[[:>:]]", $states,
         $matches ) )
         $statesArray[ 3 ] = $matches[ 1 ];

      // e) starts with M at the beginning of the string
      if ( ereg( "^(M[[:alpha:]]+)[[:>:]]", $states,
         $matches ) )
         $statesArray[ 4 ] = $matches[ 1 ];

      // f) output each stored word on its own line
      foreach ( $statesArray as $key => $value )
         print( "$value <br />" );
   ?>
   </body>
</html>
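The POSIX functions ereg and eregi used in this solution were removed in PHP 7. On a current interpreter the same searches can be written with preg_match. The sketch below is one possible translation, not the authors' listing. The pattern for part (a) is an assumption, because that part of the original code is not shown. Here \b stands in for the POSIX word-boundary markers [[:<:]] and [[:>:]], and the i modifier stands in for eregi.

```php
<?php
$states = "Mississippi Alabama Texas Massachusetts Kansas";
$statesArray = array();

// one Perl-compatible pattern per part of the exercise
$patterns = array(
   0 => '/\b([[:alpha:]]+xas)\b/',   // a) ends in xas (assumed pattern)
   1 => '/\b(k[[:alpha:]]+s)\b/i',   // b) k...s, ignoring case
   2 => '/\b(M[[:alpha:]]+s)\b/',    // c) M...s, case sensitive
   3 => '/\b([[:alpha:]]+a)\b/',     // d) ends in a
   4 => '/^(M[[:alpha:]]+)\b/',      // e) starts the string with M
);

// store the first word that matches each pattern
foreach ( $patterns as $index => $pattern )
   if ( preg_match( $pattern, $states, $matches ) )
      $statesArray[ $index ] = $matches[ 1 ];

// f) output the array
foreach ( $statesArray as $key => $value )
   print( "$value <br />" );
```

With the given string, the array holds Texas, Kansas, Massachusetts, Alabama and Mississippi, in that order.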
29.4 In the text, we presented environment variables. Develop a program that determines whether
the client is using Internet Explorer. If so, determine the version number and send that information
back to the client.
ANS:
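One possible approach is sketched below under stated assumptions. It is not the authors' solution. It assumes that the browser identifies itself with an MSIE token followed by a version number in its User-Agent header, as the Internet Explorer releases of this book's era did. The helper name internetExplorerVersion is hypothetical.

```php
<?php
// Hypothetical helper: return the Internet Explorer version found in
// a User-Agent string, or null when no MSIE token is present.
function internetExplorerVersion( $userAgent )
{
   if ( preg_match( '/MSIE ([0-9]+(?:\.[0-9]+)*)/', $userAgent, $matches ) )
      return $matches[ 1 ];

   return null;
}

// The server exposes the User-Agent header as HTTP_USER_AGENT; the
// entry is missing when the script runs outside a Web server.
$agent = isset( $_SERVER[ "HTTP_USER_AGENT" ] )
   ? $_SERVER[ "HTTP_USER_AGENT" ] : "";
$version = internetExplorerVersion( $agent );

if ( $version !== null )
   print( "You are using Internet Explorer version $version <br />" );
else
   print( "You are not using Internet Explorer <br />" );
```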
29.5 Modify the program in Fig. 29.14 to save information sent to the server into a text file. Each
time a user submits a form, open the text file and print the file’s contents.
ANS:
         }

         // read each line in file
         while ( !feof( $file ) ) {

            // read line from file
            $line = fgets( $file, 255 );

            // remove newline character from end of line
            $line = chop( $line );

            // split each value
            $field = split( ",", $line, 5 );

            // print text file data
            print( "<tr><td>$field[0] $field[1]</td>
               <td>$field[2]</td>
               <td>$field[3]</td>
               <td>$field[4]</td></tr>" );

         }

         // close text file
         fclose( $file );

      ?>
      </table>

      <br /><br /><br />
      <div style = "font-size: 10pt; text-align: center">
         This is only a sample form.
         You have not been added to a mailing list.
      </div>
   </body>
</html>
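Function split, which the listing uses to separate the fields, was removed in PHP 7. For a fixed delimiter such as a comma, explode does the same job. The sketch below shows the row-building step in isolation. The helper lineToRow and the sample record are hypothetical, and the use of htmlspecialchars is an addition that the original listing does not make.

```php
<?php
// Hypothetical helper: turn one comma-separated line into a table row.
function lineToRow( $line )
{
   // rtrim removes the trailing newline (chop is an alias of rtrim);
   // the limit of 5 keeps any extra commas inside the last field
   $field = explode( ",", rtrim( $line ), 5 );

   // pad short lines so that every index used below exists
   $field = array_pad( $field, 5, "" );

   // escape characters that have a special meaning in XHTML
   $field = array_map( "htmlspecialchars", $field );

   return "<tr><td>$field[0] $field[1]</td><td>$field[2]</td>"
      . "<td>$field[3]</td><td>$field[4]</td></tr>";
}

// invented sample record, for illustration only
print( lineToRow( "Jane,Doe,jane@example.com,555-0100,Python\n" ) );
```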
29.6 Write a PHP program that tests whether an e-mail address is input correctly. Verify that the
input begins with a series of characters, followed by the @ character, another series of characters, a
period (.) and a final series of characters. Test your program, using both valid and invalid e-mail
addresses.
ANS:
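A minimal sketch of the test the exercise describes, offered as one possible approach rather than the authors' solution. It checks only the shape given in the exercise (characters, @, characters, a period, characters) and makes no attempt at full address validation. The helper name looksLikeEmail and the sample addresses are hypothetical.

```php
<?php
// Hypothetical helper: true when $address has the shape
// characters @ characters . characters
function looksLikeEmail( $address )
{
   return preg_match( '/^[^@\s]+@[^@\s]+\.[^@\s.]+$/', $address ) === 1;
}

// one valid address and two invalid ones
$samples = array( "user@example.com", "user.example.com", "user@example" );

foreach ( $samples as $sample )
   print( $sample
      . ( looksLikeEmail( $sample ) ? " is valid" : " is invalid" )
      . " <br />" );
```

For stricter checking, current PHP releases also provide filter_var with the FILTER_VALIDATE_EMAIL filter.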
29.7 Using environment variables, write a program that logs the address (obtained with the
REMOTE_ADDR environment variable) requesting information from the Web server.
ANS:
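The logging step can be sketched as follows. This is an illustration under stated assumptions, not the authors' solution. It reads REMOTE_ADDR through the $_SERVER superglobal of current PHP releases and appends one timestamped line per request. The helper logAddress and the log file name visitors.log are hypothetical.

```php
<?php
// Hypothetical helper: append one timestamped entry to a log file.
// Returns true when the entry was written.
function logAddress( $address, $logFile )
{
   $entry = date( "Y-m-d H:i:s" ) . " " . $address . "\n";

   // FILE_APPEND keeps earlier entries; LOCK_EX guards against two
   // requests writing to the file at the same moment
   return file_put_contents( $logFile, $entry,
      FILE_APPEND | LOCK_EX ) !== false;
}

// REMOTE_ADDR is absent when the script runs outside a Web server.
$address = isset( $_SERVER[ "REMOTE_ADDR" ] )
   ? $_SERVER[ "REMOTE_ADDR" ] : "unknown";

logAddress( $address, sys_get_temp_dir() . "/visitors.log" );
```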
29.8 Write a PHP program that obtains a URL and a description of that URL from a user and stores
the information in a database using MySQL. The database should be named URLs, and the table
should be named Urltable. The first field of the table, which is named URL, should contain an
actual URL, and the second, which is named Description, should contain a description of that
URL. Use www.deitel.com as the first URL, and input Cool site! as its description. The second
URL should be www.php.net, and the description should be The official PHP site. After
each new URL is submitted, display the complete contents of Urltable in a table.
ANS:
         {
            print( "Could not execute query! <br />" );
            die( mysql_error() );
         }

         while ( $row = mysql_fetch_row( $result ) )
            print( "<tr><td>" . $row[ 0 ] . "</td><td>"
               . $row[ 1 ] . "</td></tr>" );
      ?>
      </table>
   </body>
</html>
Index
N
newline character (\n) 1424
next function 1403
O
operator precedence chart 1435
operators.php 1400
P
parenthetical memory in PHP 1409
password.html 1418
password.php 1421
Perl (Practical Extraction and Report Language) 1396
Perl-compatible regular expression 1407
PHP comment 1397
.php extension 1398
PHP keyword 1402
PHP quantifier 1409
Portable Operating System Interface (POSIX) 1407
POSIX extended regular expression 1407
post request type 1415
Practical Extraction and Report Language (Perl) 1396
preg_match function 1407
print function 1397
print statement 1396
T
time function 1431
V
validation 1416
W
Web server 1426, 1430
while loop 1425
write 1424