Python Net Binder
David M. Beazley
http://www.dabeaz.com
1. Network Fundamentals
2. Client Programming
3. Internet Data Handling
4. Web Programming Basics
5. Advanced Networks
Threaded Server
Forking Server (Unix)
Asynchronous Server
Utility Functions
Omissions
Discussion
0. Introduction
Introduction
Support Files
Python Networking
This Course
Standard Library
Prerequisites
1. Network Fundamentals
Network Fundamentals
The Problem
Two Main Issues
Network Addressing
Standard Ports
Using netstat
Connections
Client/Server Concept
Request/Response Cycle
Using Telnet
Data Transport
Sockets
Socket Basics
Socket Types
Using a Socket
TCP Client
Exercise 1.1
Server Implementation
TCP Server
Exercise 1.2
Advanced Sockets
Partial Reads/Writes
Sending All Data
End of Data
Data Reassembly
Timeouts
Non-blocking Sockets
Socket Options
Sockets as Files
Exercise 1.3
Odds and Ends
UDP : Datagrams
UDP Server
UDP Client
Unix Domain Sockets
Raw Sockets
Sockets and Concurrency
2. Client Programming
Client Programming
Overview
urllib Module
urllib protocols
HTML Forms
Web Services
Parameter Encoding
Sending Parameters
Response Data
Response Headers
Response Status
Exercise 2.1
urllib Limitations
urllib2 Module
urllib2 Example
urllib2 Requests
Requests with Data
Request Headers
urllib2 Error Handling
urllib2 Openers
urllib2 build_opener()
Example : Login Cookies
Discussion
Exercise 2.2
Limitations
ftplib
Upload to a FTP Server
httplib
smtplib
Exercise 2.3
Exercise 3.1
XML and ElementTree
etree Parsing Basics
Obtaining Elements
Iterating over Elements
Element Attributes
Search Wildcards
cElementTree
Tree Modification
Tree Output
Iterative Parsing
Exercise 3.2
JSON
Sample JSON File
Processing JSON Data
Exercise 3.3
4. Web Programming
Web Programming Basics
Introduction
Overview
Disclaimer
HTTP Explained
HTTP Client Requests
HTTP Responses
HTTP Protocol
Content Encoding
Payload Packaging
Exercise 4.1
Role of Python
Typical Python Tasks
Content Generation
Example : Page Templates
Commentary
Exercise 4.2
HTTP Servers
A Simple Web Server
Exercise 4.3
A Web Server with CGI
CGI Scripting
CGI Example
CGI Mechanics
Classic CGI Interface
CGI Query Variables
cgi Module
CGI Responses
Note on Status Codes
CGI Commentary
Exercise 4.4
WSGI
WSGI Interface
WSGI Example
WSGI Applications
WSGI Environment
Processing WSGI Inputs
WSGI Responses
WSGI Content
WSGI Content Encoding
WSGI Deployment
WSGI and CGI
Exercise 4.5
Customized HTTP
Exercise 4.6
Web Frameworks
Commentary
5. Advanced Networking
Advanced Networking
Overview
Problem with Sockets
SocketServer
SocketServer Example
Execution Model
Exercise 5.1
Big Picture
Concurrent Servers
Server Mixin Classes
Server Subclassing
Exercise 5.2
Distributed Computing
Discussion
XML-RPC
Simple XML-RPC
XML-RPC Commentary
XML-RPC and Binary
Exercise 5.3
Serializing Python Objects
pickle Module
Pickling to Strings
Example
Miscellaneous Comments
Exercise 5.4
multiprocessing
Connections
Connection Use
Example
Commentary
What about...
Network Wrap-up
Exercise 5.5
Section 0
Introduction
Support Files
Course exercises:
http://www.dabeaz.com/python/pythonnetwork.zip
Python Networking
Network programming is a major use of Python
Python standard library has wide support for
network protocols, data encoding/decoding, and
other things you need to make it work
This Course
This course focuses on the essential details of
network programming that all Python
programmers should probably know
Standard Library
We will only cover modules supported by the
Python standard library
Prerequisites
You should already know Python basics
However, you don't need to be an expert on all
Section 1
Network Fundamentals
The Problem
Communication between computers
Network
Network Addressing
Machines have a hostname and IP address
Programs/services have port numbers
foo.bar.com (205.172.13.4), port 4521  <--- Network --->  www.python.org (82.94.237.218), port 80
Standard Ports
Ports for common services are preassigned
21     FTP
22     SSH
23     Telnet
25     SMTP (Mail)
80     HTTP (Web)
110    POP3 (Mail)
119    NNTP (News)
443    HTTPS (Web)
Using netstat
Use 'netstat' to view active network connections
shell % netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address  State
tcp        0      0 *:imaps                 *:*              LISTEN
tcp        0      0 *:pop3s                 *:*              LISTEN
tcp        0      0 localhost:mysql         *:*              LISTEN
tcp        0      0 *:pop3                  *:*              LISTEN
tcp        0      0 *:imap2                 *:*              LISTEN
tcp        0      0 *:8880                  *:*              LISTEN
tcp        0      0 *:www                   *:*              LISTEN
tcp        0      0 192.168.119.139:domain  *:*              LISTEN
tcp        0      0 localhost:domain        *:*              LISTEN
tcp        0      0 *:ssh                   *:*              LISTEN
...
Connections
Each endpoint of a network connection is always
represented by a host and port #
Client/Server Concept
Each endpoint is a running program
Servers wait for incoming connections and
provide a service (e.g., web, mail, etc.)
[Figure: a browser (client) connects to the web server www.bar.com (205.172.13.4) on port 80]
Request/Response Cycle
Most network programs use a request/
response model based on messages
Using Telnet
As a debugging aid, telnet can be used to
Example:
(type this and press return a few times)
Data Transport
There are two basic types of communication
Streams (TCP): Computers establish a
Sockets
Programming abstraction for network code
Socket: A communication endpoint
[Figure: two sockets connected across a network]
Socket Basics
To create a socket
import socket
s = socket.socket(addr_family, type)
Address families
socket.AF_INET        Internet protocol (IPv4)
socket.AF_INET6       Internet protocol (IPv6)
Socket types
socket.SOCK_STREAM    Connection-based stream (TCP)
socket.SOCK_DGRAM     Datagrams (UDP)
Example:
Socket Types
Almost all code will use one of the following
from socket import *
s = socket(AF_INET, SOCK_STREAM)    # TCP
s = socket(AF_INET, SOCK_DGRAM)     # UDP
Using a Socket
Creating a socket is only the first step
s = socket(AF_INET, SOCK_STREAM)
TCP Client
How to make an outgoing connection
from socket import *
s = socket(AF_INET,SOCK_STREAM)
s.connect(("www.python.org",80))               # Connect
s.send("GET /index.html HTTP/1.0\n\n")         # Send request
data = s.recv(10000)                           # Get response
s.close()
Exercise 1.1
Time : 10 Minutes
Server Implementation
Network servers are a bit more tricky
Must listen for incoming connections on a
well-known port number
TCP Server
A simple server
TCP Server
Address binding
s.bind(("",9000))              # "" binds to all available interfaces
s.bind(("localhost",9000))     # binds to localhost
s.bind(("192.168.2.1",9000))   # binds to a specific address
s.bind(("104.21.4.2",9000))
TCP Server
s.listen(backlog)
backlog is # of pending connections to allow
Note: not related to max number of clients
TCP Server
Accepting a new connection
Client address: ("104.23.11.4",27743)
TCP Server
Sending data to the client
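The server steps above (bind, listen, accept, send) can be tried end to end on one machine. What follows is only a minimal sketch, written for Python 3 rather than the Python 2 used on the slides. The loopback address, the OS-chosen port, the greeting text, and the helper name serve_one_client are assumptions made for illustration.

```python
import socket
import threading

def serve_one_client(server_sock):
    """Accept a single connection, send a greeting, then close it."""
    client, addr = server_sock.accept()          # addr is a (host, port) tuple
    with client:
        client.sendall(("Hello %s\n" % addr[0]).encode("ascii"))

# Bind to the loopback interface; port 0 asks the OS for any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(5)
port = server.getsockname()[1]

worker = threading.Thread(target=serve_one_client, args=(server,))
worker.start()

# Client side: connect, then read until the server closes the connection.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
chunks = []
while True:
    chunk = client.recv(1024)
    if not chunk:                                # Empty result means end of data
        break
    chunks.append(chunk)
client.close()
worker.join()
server.close()

greeting = b"".join(chunks).decode("ascii")
print(greeting)
```

The server runs in a thread only so that both ends fit in one script; a real server would loop on accept().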
Exercise 1.2
Time : 20 Minutes
Advanced Sockets
Socket programming is often a mess
Huge number of options
Many corner cases
Many failure modes/reliability issues
Will briefly cover a few critical issues
Partial Reads/Writes
Be aware that reading/writing to a socket
may involve partial data transfer
Partial Reads/Writes
Be aware that for TCP, the data stream is
continuous---no concept of records, etc.
# Client
...
s.send(data)
s.send(moredata)
...
# Server
...
data = s.recv(maxsize)
...
End of Data
How to tell if there is no more data?
recv() will return empty string
>>> s.recv(1000)
''
>>>
Data Reassembly
Receivers often need to reassemble
# List of chunks
# Get a chunk
# EOF. No more data
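The three comments above describe a receive loop whose code did not survive on this slide. One plausible shape for it is sketched below in Python 3, not necessarily the slide's exact code. The function name recv_until_eof and the use of socket.socketpair() for a local demonstration are assumptions.

```python
import socket

def recv_until_eof(sock, maxsize=4096):
    """Collect chunks until recv() returns an empty byte string (EOF)."""
    fragments = []                     # List of chunks
    while True:
        chunk = sock.recv(maxsize)     # Get a chunk (may be shorter than maxsize)
        if not chunk:                  # EOF. No more data
            break
        fragments.append(chunk)
    return b"".join(fragments)         # Reassemble the message

# Demonstration on a connected pair of local sockets.
sender, receiver = socket.socketpair()
sender.sendall(b"first part, ")
sender.sendall(b"second part")
sender.close()                         # Closing signals end of data to the receiver
message = recv_until_eof(receiver, maxsize=5)
receiver.close()
```

Collecting chunks in a list and joining once at the end avoids repeatedly copying a growing byte string.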
Timeouts
>>> s.recv(1000)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
socket.timeout: timed out
>>>
Disabling timeouts
s.settimeout(None)
Non-blocking Sockets
Instead of timeouts, can set non-blocking
>>> s.setblocking(False)
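Timeouts and non-blocking mode fail differently when no data is available. The sketch below contrasts the two in Python 3, assuming a local socket pair on which nothing has been sent; the 0.1 second timeout is an arbitrary choice.

```python
import socket

a, b = socket.socketpair()       # Connected pair; nothing has been sent yet

# With a timeout, recv() waits up to 0.1 seconds and then raises socket.timeout.
a.settimeout(0.1)
try:
    a.recv(1000)
    timed_out = False
except socket.timeout:
    timed_out = True

# In non-blocking mode, recv() raises BlockingIOError at once if no data is ready.
a.setblocking(False)
try:
    a.recv(1000)
    would_block = False
except BlockingIOError:
    would_block = True

a.close()
b.close()
```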
Socket Options
Sockets have a large number of parameters
Can be set using s.setsockopt()
Example: Reusing the port number
>>> s.bind(("",9000))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1, in bind
socket.error: (48, 'Address already in use')
>>> s.setsockopt(socket.SOL_SOCKET,
...              socket.SO_REUSEADDR, 1)
>>> s.bind(("",9000))
>>>
Sockets as Files
Sometimes it is easier to work with sockets
represented as a "file" object
f = s.makefile()
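As a rough illustration of makefile(), the Python 3 sketch below wraps both ends of a local socket pair in text-mode file objects and exchanges two lines. The socket pair, the ASCII encoding, and the sample lines are assumptions.

```python
import socket

a, b = socket.socketpair()

# Wrap each end in a file-like object: text mode, newline-terminated lines.
writer = a.makefile("w", encoding="ascii", newline="\n")
reader = b.makefile("r", encoding="ascii", newline="\n")

writer.write("HELLO\n")
writer.write("WORLD\n")
writer.flush()                   # File objects buffer; flush pushes data to the socket

first = reader.readline()
second = reader.readline()

writer.close()
reader.close()
a.close()
b.close()
```

Forgetting flush() is a common reason a file-wrapped socket appears to hang.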
Sockets as Files
Commentary : From personal experience,
Exercise 1.3
Time : 15 Minutes
UDP : Datagrams
[Figure: three separate DATA packets]
UDP Server
A simple datagram server
from socket import *
s = socket(AF_INET,SOCK_DGRAM)
s.bind(("",10000))
while True:
    data, addr = s.recvfrom(maxsize)
    resp = "Get off my lawn!"
    s.sendto(resp,addr)              # Send response (optional)
No "connection" is established
It just sends and receives packets
UDP Client
Sending a datagram to a server
from socket import *
s = socket(AF_INET,SOCK_DGRAM)
s.sendto() sends a message
s.recvfrom() waits for a response (optional) and gives the returned data and the remote address
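A datagram exchange can be tried in a single script, because the kernel queues the packet until the server reads it. This is a Python 3 sketch only. The loopback address, the OS-chosen port, the 2 second timeouts, and the 8192 byte buffer size are assumptions.

```python
import socket

# Server socket bound to loopback; port 0 lets the OS choose a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2.0)
server_addr = server.getsockname()

# Client: send a message (no connection is established first).
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)                       # Datagrams can be lost; do not wait forever
client.sendto(b"Hello", server_addr)

# Server: receive one datagram and reply to whoever sent it.
data, addr = server.recvfrom(8192)
server.sendto(b"Get off my lawn!", addr)

# Client: wait for the response; recvfrom() returns the data and the remote address.
resp, remote = client.recvfrom(8192)
client.close()
server.close()
```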
Unix Domain Sockets
Creation:
s = socket(AF_UNIX, SOCK_STREAM)
s = socket(AF_UNIX, SOCK_DGRAM)
The address is a filesystem path: the server binds to it and the client connects to it
Raw Sockets
If you have root/admin access, can gain direct
access to raw network packets
# get a packet
Sockets and Concurrency
[Figure: several browsers connected to one web server on port 80]
s = socket(AF_INET, SOCK_STREAM)
...
while True:
    c,a = s.accept()
    ...
The server socket (s) is a connection point for clients
Client data is transmitted on a different socket (c)
[Figure: a browser calls connect(), the server calls accept(), and data then flows over the new socket with send()/recv()]
Threaded Server
Each client is handled by a separate thread
import threading
from socket import *
def handle_client(c):
    ... whatever ...
    c.close()
    return
s = socket(AF_INET,SOCK_STREAM)
s.bind(("",9000))
s.listen(5)
while True:
    c,a = s.accept()
    t = threading.Thread(target=handle_client,
                         args=(c,))
    t.start()        # Start the thread; without this the handler never runs
Forking Server (Unix)
Each client is handled by a subprocess
import os
from socket import *
s = socket(AF_INET,SOCK_STREAM)
s.bind(("",9000))
s.listen(5)
while True:
    c,a = s.accept()
    if os.fork() == 0:
        # Child process. Manage client
        ...
        c.close()
        os._exit(0)
    else:
        # Parent process. Clean up and go
        # back to wait for more connections
        c.close()
Asynchronous Server
Server handles all clients in an event loop
import select
from socket import *
s = socket(AF_INET,SOCK_STREAM)
...
clients = []    # List of all active client sockets
while True:
    # Look for activity on any of my sockets
    input,output,err = select.select([s]+clients,
                                     clients, clients)
    # Process all sockets with input
    for i in input:
        ...
    # Process all sockets ready for output
    for o in output:
        ...
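The event loop depends on select() reporting which sockets are ready. The Python 3 sketch below shows that readiness check in isolation; it is not a full server. The local socket pair and the timeout values are assumptions.

```python
import select
import socket

a, b = socket.socketpair()

# Nothing has been sent, so 'a' is not readable; a zero timeout makes select() poll.
readable, writable, errored = select.select([a], [], [], 0)
ready_before = a in readable

b.sendall(b"data")

# Now 'a' has pending input, so select() reports it (waiting up to 1 second).
readable, writable, errored = select.select([a], [], [], 1.0)
ready_after = a in readable
received = a.recv(100) if ready_after else b""

a.close()
b.close()
```

In a server, the listening socket goes in the same read list; it becomes readable when accept() can be called without blocking.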
Utility Functions
Get the hostname of the local machine
>>> socket.gethostname()
'foo.bar.com'
>>>
Omissions
socket module has hundreds of obscure
socket control options, flags, etc.
Discussion
It is often unnecessary to directly use sockets
Other library modules simplify use
However, those modules assume some
knowledge of the basic concepts (addresses,
ports, TCP, UDP, etc.)
Section 2
Client Programming
Overview
Python has library modules for interacting with
a variety of standard internet services
urllib Module
A high level module that allows clients to
connect to a variety of internet services
HTTP
HTTPS
FTP
Local files
urllib Module
Open a web page: urlopen()
>>> import urllib
>>> u = urllib.urlopen("http://www.python.org/index.html")
>>> data = u.read()
>>> print data
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML ...
...
>>>
urllib protocols
Supported protocols
u = urllib.urlopen("http://www.foo.com")
u = urllib.urlopen("https://www.foo.com/private")
u = urllib.urlopen("ftp://ftp.foo.com/README")
u = urllib.urlopen("file:///Users/beazley/blah.txt")
HTML Forms
One use of urllib is to automate forms
Example HTML source for the form
<FORM ACTION="/subscribe" METHOD="POST">
Your name: <INPUT type="text" name="name" size="30"><br>
Your email: <INPUT type="text" name="email" size="30"><br>
<INPUT type="submit" name="submit-button" value="Subscribe">
HTML Forms
Within the form, you will find an action and
named parameters for the form fields
Action (a URL)
http://somedomain.com/subscribe
Parameters:
name
email
Web Services
Another use of urllib is to access web services
Downloading maps
Stock quotes
Email messages
Most of these are controlled and accessed in
the same manner as a form
Parameter Encoding
urlencode()
Takes a dictionary of fields and creates a
URL-encoded string of parameters
fields = {
    'name' : 'Dave',
    'email' : 'dave@dabeaz.com'
}
parms = urllib.urlencode(fields)
Sample result
>>> parms
'name=Dave&email=dave%40dabeaz.com'
>>>
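The same encoding can be reproduced in Python 3, where urlencode() lives in urllib.parse rather than urllib. This is a sketch under that assumption; the parse_qs() call is added only to show the reverse direction.

```python
from urllib.parse import urlencode, parse_qs

fields = {
    'name': 'Dave',
    'email': 'dave@dabeaz.com',
}
parms = urlencode(fields)      # Special characters such as '@' are percent-encoded
decoded = parse_qs(parms)      # Reverse direction: each value comes back as a list
```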
Sending Parameters
Case 1 : GET Requests
<FORM ACTION="/subscribe" METHOD="GET">
Your name: <INPUT type="text" name="name" size="30"><br>
Your email: <INPUT type="text" name="email" size="30"><br>
<INPUT type="submit" name="submit-button" value="Subscribe">
Example code:
fields = { ... }
parms = urllib.urlencode(fields)
u = urllib.urlopen("http://somedomain.com/subscribe?"+parms)
Sending Parameters
Case 2 : POST Requests
<FORM ACTION="/subscribe" METHOD="POST">
Your name: <INPUT type="text" name="name" size="30"><br>
Your email: <INPUT type="text" name="email" size="30"><br>
<INPUT type="submit" name="submit-button" value="Subscribe">
Example code:
fields = { ... }
parms = urllib.urlencode(fields)
u = urllib.urlopen("http://somedomain.com/subscribe", parms)
Response Data
To read response data, treat the result of
urlopen() as a file object
>>> u = urllib.urlopen("http://www.python.org")
>>> data = u.read()
>>>
Response Headers
HTTP headers are retrieved using .info()
>>> u = urllib.urlopen("http://www.python.org")
>>> headers = u.info()
>>> headers
<httplib.HTTPMessage instance at 0x1118828>
>>> headers.keys()
['content-length', 'accept-ranges', 'server',
'last-modified', 'connection', 'etag', 'date',
'content-type']
>>> headers['content-length']
'13597'
>>> headers['content-type']
'text/html'
>>>
A dictionary-like object
Response Status
urlopen() ignores HTTP status codes (i.e.,
errors are silently ignored)
Exercise 2.1
Time : 15 Minutes
urllib Limitations
urllib only works with simple cases
Does not support cookies
Does not support authentication
Does not report HTTP errors gracefully
Only supports GET/POST requests
urllib2 Module
urllib2 - The sequel to urllib
Builds upon and expands urllib
Can interact with servers that require
cookies, passwords, and other details
urllib2 Example
urllib2 provides urlopen() as before
>>> import urllib2
>>> u = urllib2.urlopen("http://www.python.org/index.html")
>>> data = u.read()
>>>
Requests
Openers
urllib2 Requests
Requests are now objects
>>> r = urllib2.Request("http://www.python.org")
>>> u = urllib2.urlopen(r)
>>> data = u.read()
Request Headers
Adding/Modifying client HTTP headers
headers = {
    'User-Agent' : 'Mozilla/4.0 (compatible; MSIE 7.0; '
                   'Windows NT 5.1; .NET CLR 2.0.50727)'
}
r = urllib2.Request("http://somedomain.com/",
                    headers=headers)
u = urllib2.urlopen(r)
response = u.read()
urllib2 Error Handling
Catching an error
try:
    u = urllib2.urlopen(url)
except urllib2.HTTPError,e:
    code = e.code        # HTTP error code
urllib2 Openers
The function urlopen() is an "opener"
It knows how to open a connection, interact
with the server, and return a response.
urllib2 build_opener()
build_opener() makes a custom opener
# Make a URL opener with cookie support
opener = urllib2.build_opener(
             urllib2.HTTPCookieProcessor()
         )
u = opener.open("http://www.python.org/index.html")
Discussion
urllib2 module has a huge number of options
Different configurations
File formats, policies, authentication, etc.
Will have to consult reference for everything
Exercise 2.2
Time : 15 Minutes
Password: guido456
Limitations
urllib and urllib2 are useful for fetching files
However, neither module provides support for
more advanced operations
Examples:
Uploading to an FTP server
File-upload via HTTP Post
Other HTTP methods (e.g., HEAD, PUT)
ftplib
A module for interacting with FTP servers
Example : Capture a directory listing
>>> import ftplib
>>> f = ftplib.FTP("ftp.gnu.org","anonymous",
...                "[email protected]")
>>> files = []
>>> f.retrlines("LIST",files.append)
'226 Directory send OK.'
>>> len(files)
15
>>> files[0]
'-rw-r--r--    1 0        0            1765 Feb 20 16:47 README'
>>>
Upload to a FTP Server
host     = "ftp.foo.com"
username = "dave"
password = "1235"
filename = "somefile.dat"
import ftplib
ftp_serv = ftplib.FTP(host,username,password)
# Open the file you want to send
f = open(filename,"rb")
# Send it to the FTP server
resp = ftp_serv.storbinary("STOR "+filename, f)
# Close the connection
ftp_serv.close()
httplib
A module for implementing the client side of an
HTTP connection
import httplib
c = httplib.HTTPConnection("www.python.org",80)
c.putrequest("HEAD","/tut/tut.html")
c.putheader("Someheader","Somevalue")
c.endheaders()
r = c.getresponse()
data = r.read()
c.close()
smtplib
A module for sending email messages
import smtplib
serv = smtplib.SMTP()
serv.connect()
msg = """\
From: [email protected]
To: [email protected]
Subject: Get off my lawn!

Blah blah blah"""
serv.sendmail("[email protected]",['[email protected]'],msg)
Exercise 2.3
Time : 15 Minutes
Section 3
Internet Data Handling
Overview
If you write network clients, you will have to
CSV Files
Comma Separated Values
Elwood,Blues,"1060 W Addison,Chicago 60637",110
McGurn,Jack,"4902 N Broadway,Chicago 60640",200
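The two sample records above contain commas inside quoted fields, which is why splitting on "," is not enough. A short Python 3 sketch using the standard csv module follows. Reading from an in-memory string instead of a file is an assumption made to keep the example self-contained.

```python
import csv
import io

text = (
    'Elwood,Blues,"1060 W Addison,Chicago 60637",110\n'
    'McGurn,Jack,"4902 N Broadway,Chicago 60640",200\n'
)

# csv.reader handles the quoting; every field comes back as a string.
rows = list(csv.reader(io.StringIO(text)))
for first, last, address, amount in rows:
    print(last, address, int(amount))
```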
Parsing HTML
Suppose you want to parse HTML (maybe
obtained via urlopen)
Parsing HTML
Define a class that inherits from HTMLParser
and define a set of methods that respond to
different document features
from HTMLParser import HTMLParser
class MyParser(HTMLParser):
    def handle_starttag(self,tag,attrs):    # Called for each start tag
        ...
    def handle_data(self,data):             # Called for text data
        ...
    def handle_endtag(self,tag):            # Called for each end tag
        ...
Running a Parser
To run the parser, you create a parser object
and feed it some data
HTML Example
An example:
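The example code itself is missing from this slide. The following slide uses a class named GatherLinks with a links attribute, so the Python 3 sketch below guesses at one way such a class might look. The method body and the sample HTML are assumptions, and in Python 3 the module is html.parser rather than HTMLParser.

```python
from html.parser import HTMLParser

class GatherLinks(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical reconstruction)."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)

sample = '<html><body><a href="/about">About</a> <a href="/news/">News</a></body></html>'
parser = GatherLinks()
parser.feed(sample)
parser.close()
links = parser.links
```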
HTML Example
Running the parser
>>> parser = GatherLinks()
>>> import urllib
>>> data = urllib.urlopen("http://www.python.org").read()
>>> parser.feed(data)
>>> for x in parser.links:
...     print x
...
/search/
/about
/news/
/doc/
/download/
...
>>>
SAX Parsing
Define a special handler class
import xml.sax
class MyHandler(xml.sax.ContentHandler):
    def startDocument(self):
        print "Document start"
    def startElement(self,name,attrs):
        print "Start:", name
    def characters(self,text):
        print "Characters:", text
    def endElement(self,name):
        print "End:", name
SAX Parsing
To parse a document, you create an instance
of the handler and give it to the parser
# Create the handler object
hand = MyHandler()
# Parse a document using the handler
xml.sax.parse("data.xml",hand)
Exercise 3.1
Time : 15 Minutes
etree Parsing Basics
elem.tag       # Element name
elem.text      # Element text
elem.attrib    # Element attributes
Obtaining Elements
<?xml version="1.0" encoding="iso-8859-1"?>
<recipe>
   <title>Famous Guacamole</title>
   <description>
   A southwest favorite!
   </description>
   <ingredients>
      <item num="2">Large avocados, chopped</item>
      <item num="1">Tomato, chopped</item>
      <item num="1/2" units="C">White onion, chopped</item>
      <item num="1" units="tbl">Fresh squeezed lemon juice</item>
      <item num="1">Jalapeno pepper, diced</item>
      <item num="1" units="tbl">Fresh cilantro, minced</item>
      <item num="3" units="tsp">Sea Salt</item>
      <item num="6" units="bottles">Ice-cold beer</item>
   </ingredients>
   <directions>
   Combine all ingredients and hand whisk to desired consistency.
   Serve and enjoy with ice-cold beers.
   </directions>
</recipe>
doc = parse("recipe.xml")
desc_elem = doc.find("description")
desc_text = desc_elem.text
or
doc = parse("recipe.xml")
desc_text = doc.findtext("description")
Element Attributes
for item in doc.findall("ingredients/item"):
    num   = item.get("num")
    units = item.get("units")
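The element lookups can be checked without a file by parsing the XML from a string. The Python 3 sketch below uses a shortened copy of the recipe. Trimming the document to three ingredients and using fromstring() instead of parse("recipe.xml") are assumptions made to keep it self-contained.

```python
import xml.etree.ElementTree as ET

recipe_xml = """<recipe>
  <title>Famous Guacamole</title>
  <description>A southwest favorite!</description>
  <ingredients>
    <item num="2">Large avocados, chopped</item>
    <item num="1/2" units="C">White onion, chopped</item>
    <item num="3" units="tsp">Sea Salt</item>
  </ingredients>
</recipe>"""

root = ET.fromstring(recipe_xml)

# findtext() combines find() and .text in one step.
desc_text = root.findtext("description")

# get() returns None when an attribute (here: units) is absent.
amounts = [(item.get("num"), item.get("units"), item.text)
           for item in root.findall("ingredients/item")]
```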
Search Wildcards
Specifying a wildcard for an element name
items = doc.findall("*/item")
items = doc.findall("ingredients/*")
c = doc.findall("*/*/c")
c = doc.findall("a/*/c")
c = doc.findall("*/b/c")
Search Wildcards
Wildcard for multiple nesting levels (//)
items = doc.findall("//item")
More examples
<?xml version="1.0"?>
<top>
<a>
<b>
<c>text</c>
</b>
</a>
</top>
c = doc.findall("//c")
c = doc.findall("a//c")
cElementTree
There is a C implementation of the library
that is significantly faster
import xml.etree.cElementTree
doc = xml.etree.cElementTree.parse("data.xml")
Tree Modification
ElementTree allows modifications to be
made to the document structure
Tree Output
If you modify a document, it can be rewritten
There is a method to write XML
doc = xml.etree.ElementTree.parse("input.xml")
# Make modifications to doc
...
# Write modified document back to a file
f = open("output.xml","w")
doc.write(f)
Iterative Parsing
If you combine iterative parsing and tree
modification together, you can process
large XML documents with almost no
memory overhead
Programming pattern
from xml.etree.ElementTree import iterparse
parser = iterparse("file.xml",('start','end'))
for event,elem in parser:
    if event == 'start':
        if elem.tag == 'parenttag':
            parent = elem
    if event == 'end':
        if elem.tag == 'tagname':
            # process element with tag 'tagname'
            ...
            # Discard the element when done
            parent.remove(elem)
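The pattern can be run against a small in-memory document to see that processed elements really are discarded. This Python 3 sketch assumes the tag names ingredients and item in place of 'parenttag' and 'tagname', and uses io.BytesIO in place of "file.xml".

```python
import io
from xml.etree.ElementTree import iterparse

data = io.BytesIO(
    b"<ingredients>"
    b"<item num='2'>Large avocados</item>"
    b"<item num='1'>Tomato</item>"
    b"<item num='3'>Sea Salt</item>"
    b"</ingredients>"
)

names = []
parent = None
for event, elem in iterparse(data, events=('start', 'end')):
    if event == 'start' and elem.tag == 'ingredients':
        parent = elem                  # Remember the enclosing element
    elif event == 'end' and elem.tag == 'item':
        names.append(elem.text)        # Process the completed element
        parent.remove(elem)            # Discard it so the tree stays small

remaining = len(parent)                # Number of children still attached
```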
Exercise 3.2
Time : 15 Minutes
JSON
JavaScript Object Notation
A data encoding commonly used on the
web when interacting with JavaScript
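As a small illustration of the encoding, the Python 3 sketch below converts a dictionary to JSON text and back with the standard json module. The sample record is made up for the example.

```python
import json

record = {"name": "Dave", "ports": [80, 443], "secure": True, "note": None}

# Python values map onto JSON: True -> true, None -> null, list -> array.
text = json.dumps(record, sort_keys=True)
restored = json.loads(text)
```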
Exercise 3.3
Time : 15 Minutes
Section 4
Web Programming Basics
Introduction
The web is (obviously) so pervasive,
Overview
Some basics of Python web programming
HTTP Protocol
CGI scripting
WSGI (Web Server Gateway Interface)
Custom HTTP servers
Disclaimer
Web programming is a huge topic that
could span an entire multi-day class
HTTP Explained
HTTP is the underlying protocol of the web
Consists of requests and responses
[Figure: the browser sends a request (GET /index.html) to the web server, which returns a response (200 OK ... <content>)]
HTTP Responses
Server sends back a response
HTTP/1.1 200 OK
Date: Thu, 26 Apr 2007 19:54:01 GMT
Server: Apache/2.0.54 (Debian GNU/Linux) DAV/2 SVN/1.1.4 mod_python/3.1.3
Last-Modified: Thu, 26 Apr 2007 18:40:24 GMT
Accept-Ranges: bytes
Content-Length: 14315
Connection: close
Content-Type: text/html
<HTML>
...
HTTP Protocol
There are a small number of request types
GET
POST
HEAD
PUT
There are standardized response codes
200 OK
403 Forbidden
404 Not Found
501 Not implemented
Content Encoding
Content is described by these header fields:
Content-type:
Content-length:
Example:
Content-type: image/jpeg
Content-length: 12422
Payload Packaging
Responses must follow this formatting
Headers
...
Content-type: image/jpeg
Content-length: 12422
...
\r\n (Blank Line)
Content (12422 bytes)
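The layout above (headers, a blank line, then the content) can be assembled by hand. The Python 3 sketch below does so for a small HTML body. The helper name build_response, the HTTP/1.0 status line, and the sample body are assumptions for illustration.

```python
def build_response(status, content_type, body):
    """Return raw response bytes: status line, headers, blank line, content."""
    head = [
        "HTTP/1.0 %s" % status,
        "Content-type: %s" % content_type,
        "Content-length: %d" % len(body),    # Length of the content in bytes
    ]
    # Header lines end with \r\n; an extra \r\n forms the blank line.
    return ("\r\n".join(head) + "\r\n\r\n").encode("ascii") + body

body = b"<html><body>Hello</body></html>"
raw = build_response("200 OK", "text/html", body)

# A receiver splits at the first blank line to separate headers from content.
head, sep, payload = raw.partition(b"\r\n\r\n")
```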
Exercise 4.1
Time : 10 Minutes
Role of Python
Most web-related Python programming
pertains to the operation of the server
[Figure: browsers (Firefox, Safari, Internet Explorer, etc.) send requests such as GET /index.html to the web server (Apache, Python, MySQL, etc.)]
Typical Python Tasks
One-time generation of static web pages to be served by a standard web server such as Apache.
Python scripts that produce output in response to requests (e.g., form processing, CGI scripting).
Content Generation
It is often overlooked, but Python is a useful
tool for simply creating static web pages
Commentary
Using page templates to generate static
content is extremely common
Exercise 4.2
Time : 10 Minutes
HTTP Servers
Python comes with libraries that implement
simple self-contained web servers
Exercise 4.3
Time : 10 Minutes
CGI Scripting
Common Gateway Interface
A common protocol used by existing web
servers to run server-side scripts, plugins
CGI Example
A web-page might have a form on it
Here is the underlying HTML code
<FORM ACTION="/cgi-bin/subscribe.py" METHOD="POST">
Your name: <INPUT type="text" name="name" size="30"><br>
Your email: <INPUT type="text" name="email" size="30"><br>
<INPUT type="submit" name="submit-button" value="Subscribe">
CGI Example
Forms have submitted fields or parameters
<FORM ACTION="/cgi-bin/subscribe.py" METHOD="POST">
Your name: <INPUT type="text" name="name" size="30"><br>
Your email: <INPUT type="text" name="email" size="30"><br>
<INPUT type="submit" name="submit-button" value="Subscribe">
CGI Example
Request encoding looks like this:
Request
Query
String
name=David+Beazley&email=dave%40dabeaz.com&submitbutton=Subscribe HTTP/1.1
CGI Mechanics
CGI was originally implemented as a scheme
[Figure: the HTTP server runs subscribe.py (Python) and communicates with it through stdin and stdout]
cgi Module
A utility library for decoding requests
Major feature: Getting the passed parameters
#!/usr/bin/env python
# subscribe.py
import cgi
form = cgi.FieldStorage()     # Parse parameters
CGI Responses
CGI Commentary
There are many more minor details (consult
a reference on CGI programming)
Exercise 4.4
Time : 25 Minutes
WSGI
Web Server Gateway Interface (WSGI)
This is a standardized interface for creating
Python web services
WSGI Interface
WSGI is an application programming interface
loosely based on CGI programming
WSGI Example
With WSGI, you write an "application"
An application is just a function (or callable)
def hello_app(environ, start_response):
    status = "200 OK"
    response_headers = [ ('Content-type','text/plain')]
    response = []
    start_response(status,response_headers)
    response.append("Hello World\n")
    response.append("You requested :"+environ['PATH_INFO'])
    return response
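Because a WSGI application is just a callable, it can be exercised without any web server. The Python 3 sketch below calls a version of hello_app directly. The stand-in fake_start_response and the one-key environ dictionary are assumptions, and the body is returned as bytes because Python 3 WSGI requires byte strings.

```python
def hello_app(environ, start_response):
    status = "200 OK"
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    text = "Hello World\nYou requested :" + environ['PATH_INFO']
    return [text.encode("utf-8")]        # Python 3: an iterable of bytes

# Stand-in for the server: record what the application passes to start_response().
captured = {}
def fake_start_response(status, headers):
    captured['status'] = status
    captured['headers'] = headers

# A real server supplies many more environ keys; one is enough for this application.
body = b"".join(hello_app({'PATH_INFO': '/index.html'}, fake_start_response))
```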
WSGI Applications
Applications always receive just two inputs
def hello_app(environ, start_response):
    status = "200 OK"
    response_headers = [ ('Content-type','text/plain')]
    response = []
    start_response(status,response_headers)
    response.append("Hello World\n")
    response.append("You requested :"+environ['PATH_INFO'])
    return response
WSGI Environment
The environment contains CGI variables
environ['REQUEST_METHOD']
environ['SCRIPT_NAME']
environ['PATH_INFO']
environ['QUERY_STRING']
environ['CONTENT_TYPE']
environ['CONTENT_LENGTH']
environ['SERVER_NAME']
...
WSGI Environment
Environment also contains some WSGI variables
environ['wsgi.input']
environ['wsgi.errors']
environ['wsgi.url_scheme']
environ['wsgi.multithread']
environ['wsgi.multiprocess']
...
WSGI Responses
start_response() is a hook back to the server
Gives the server information for formulating
the response (status, headers, etc.)
WSGI Content
Content is returned as a sequence of byte strings
def hello_app(environ, start_response):
    status = "200 OK"
    response_headers = [ ('Content-type','text/plain')]
    response = []
    start_response(status,response_headers)
    response.append("Hello World\n")
    response.append("You requested :"+environ['PATH_INFO'])
    return response
WSGI Deployment
The main point of WSGI is to simplify
deployment of web applications
WSGI Deployment
Running a simple stand-alone WSGI server
from wsgiref import simple_server
httpd = simple_server.make_server("",8080,hello_app)
httpd.serve_forever()
Exercise 4.5
Time : 20 Minutes
Customized HTTP
Can implement customized HTTP servers
Use BaseHTTPServer module
Define a customized HTTP handler object
Requires some knowledge of the underlying
HTTP protocol
Exercise 4.6
Time : 15 Minutes
Web Frameworks
Web frameworks build upon previous concepts
Provide additional support for
Form processing
Cookies/sessions
Database integration
Content management
Usually require their own training course
Commentary
If you're building small self-contained
Section 5
Advanced Networking
Overview
An assortment of advanced networking topics
The Python network programming stack
Concurrent servers
Distributed computing
Multiprocessing
SocketServer
A module for writing custom servers
Supports TCP and UDP networking
The module aims to simplify some of the
SocketServer Example
To use SocketServer, you define handler
objects using classes
Server is implemented by a handler class
import SocketServer
import time
class TimeHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        self.request.sendall(time.ctime()+"\n")
serv = SocketServer.TCPServer(("",8000),TimeHandler)
serv.serve_forever()
SocketServer Example
import SocketServer
import time
class TimeHandler(SocketServer.BaseRequestHandler):    # Handler class
    def handle(self):                                  # Define handle() to implement the server action
        self.request.sendall(time.ctime())             # self.request is the client socket object
# Creating and running the server
serv = SocketServer.TCPServer(("",8000),TimeHandler)
serv.serve_forever()
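The time server can be run and queried from one script if serve_forever() is moved into a thread. This is a Python 3 sketch, where the module is named socketserver. The loopback address, the OS-chosen port, and the background thread are assumptions added so the example terminates.

```python
import socket
import socketserver
import threading
import time

class TimeHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # self.request is the client socket
        self.request.sendall((time.ctime() + "\n").encode("ascii"))

# Port 0 asks the OS for a free port; the loopback address keeps the demo local.
serv = socketserver.TCPServer(("127.0.0.1", 0), TimeHandler)
host, port = serv.server_address
thread = threading.Thread(target=serv.serve_forever)
thread.start()

# Connect as a client and read the reply until the server closes the connection.
with socket.create_connection((host, port), timeout=5) as c:
    reply = b""
    while True:
        chunk = c.recv(1024)
        if not chunk:
            break
        reply += chunk

serv.shutdown()          # Stop serve_forever()
serv.server_close()
thread.join()
```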
Execution Model
Server runs in a loop waiting for requests
On each connection, the server creates a
new instantiation of the handler class
Exercise 5.1
Time : 15 Minutes
Big Picture
A major goal of SocketServer is to simplify
Concurrent Servers
SocketServer supports different kinds of
concurrency implementations
TCPServer           - Synchronous TCP server (one client)
ForkingTCPServer    - Forking server (multiple clients)
ThreadingTCPServer  - Threaded server (multiple clients)
serv = SocketServer.ForkingTCPServer(("",8000),TimeHandler)
serv.serve_forever()
serv = SocketServer.ThreadingTCPServer(("",8000),TimeHandler)
serv.serve_forever()
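A rough way to see the difference between the server classes, sketched for Python 3 (module name socketserver). MeetHandler and client() are hypothetical names. Each handler waits at a two-party barrier, so a reply is only sent if two handlers are running at the same time. Only the threaded variant is shown because ForkingTCPServer is Unix-only.

```python
import socket
import socketserver          # Python 3 name for the SocketServer module
import threading

barrier = threading.Barrier(2)   # passes only when two handlers are active at once

class MeetHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Under a one-client-at-a-time server this wait would time out
        barrier.wait(timeout=5)
        self.request.sendall(b"both clients served\n")

def client(port, results):
    with socket.create_connection(("127.0.0.1", port), timeout=10) as s:
        results.append(s.recv(100))

serv = socketserver.ThreadingTCPServer(("127.0.0.1", 0), MeetHandler)
serv.daemon_threads = True       # do not wait for handler threads on exit
port = serv.server_address[1]
threading.Thread(target=serv.serve_forever, daemon=True).start()

results = []
clients = [threading.Thread(target=client, args=(port, results)) for _ in range(2)]
for t in clients:
    t.start()
for t in clients:
    t.join()

serv.shutdown()
serv.server_close()
print(results)
```

Swapping in socketserver.TCPServer should make the barrier time out, since the second connection is not accepted until the first handler returns.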
5- 14
99
5- 15
Server Subclassing
SocketServer objects are also subclassed to
provide additional customization
Example: Security/Firewalls
class RestrictedTCPServer(SocketServer.TCPServer):
    # Restrict connections to loopback interface
    def verify_request(self,request,addr):
        host, port = addr
        if host != '127.0.0.1':
            return False
        else:
            return True
serv = RestrictedTCPServer(("",8080),TimeHandler)
serv.serve_forever()
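The effect of verify_request() can be checked with a small experiment. This is a sketch for Python 3 (module name socketserver), not the course's own code. AllowListTCPServer, its allowed_hosts attribute, HelloHandler, and try_connect() are all hypothetical. Since the test client always connects from 127.0.0.1, rejection is simulated by allowing only a different address.

```python
import socket
import socketserver          # Python 3 name for the SocketServer module
import threading

class AllowListTCPServer(socketserver.TCPServer):
    allowed_hosts = {"127.0.0.1"}    # hypothetical attribute, not part of socketserver

    def verify_request(self, request, client_address):
        # Returning False makes the server drop the connection
        # without ever creating a handler
        return client_address[0] in self.allowed_hosts

class HelloHandler(socketserver.BaseRequestHandler):
    def handle(self):
        self.request.sendall(b"hello\n")

def try_connect(allowed):
    serv = AllowListTCPServer(("127.0.0.1", 0), HelloHandler)
    serv.allowed_hosts = allowed
    threading.Thread(target=serv.serve_forever, daemon=True).start()
    try:
        with socket.create_connection(serv.server_address, timeout=5) as s:
            return s.recv(100)       # empty bytes: closed by server, no reply
    finally:
        serv.shutdown()
        serv.server_close()

accepted = try_connect({"127.0.0.1"})
rejected = try_connect({"192.0.2.1"})   # documentation-only address, never matches
print(accepted, rejected)
```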
5- 16
100
Exercise 5.2
Time : 15 Minutes
5- 17
Distributed Computing
It is relatively simple to build Python
5- 18
101
Discussion
Keep in mind: Python is a "slow" interpreted
programming language
5- 19
XML-RPC
Remote Procedure Call
Uses HTTP as a transport protocol
Parameters/Results encoded in XML
Supported by languages other than Python
5- 20
102
Simple XML-RPC
How to create a stand-alone server
5- 21
Simple XML-RPC
Adding multiple functions
from SimpleXMLRPCServer import SimpleXMLRPCServer
s = SimpleXMLRPCServer(("",8080))
s.register_function(add)
s.register_function(foo)
s.register_function(bar)
s.serve_forever()
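A self-contained sketch of the same idea for Python 3, where SimpleXMLRPCServer lives in xmlrpc.server and the client class is xmlrpc.client.ServerProxy. The functions add() and upper() are hypothetical stand-ins for the functions registered above. The server runs in a background thread so that one script can show both sides.

```python
import threading
from xmlrpc.client import ServerProxy          # Python 3 name for xmlrpclib.ServerProxy
from xmlrpc.server import SimpleXMLRPCServer   # Python 3 home of SimpleXMLRPCServer

def add(x, y):
    return x + y

def upper(s):                                  # hypothetical second function
    return s.upper()

# Port 0 asks the operating system for any free port
serv = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
serv.register_function(add)
serv.register_function(upper)
port = serv.server_address[1]
threading.Thread(target=serv.serve_forever, daemon=True).start()

# Each attribute access on the proxy becomes a remote call by name
proxy = ServerProxy("http://127.0.0.1:%d" % port)
total = proxy.add(2, 3)
shout = proxy.upper("hello")

serv.shutdown()
serv.server_close()
print(total, shout)
```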
5- 22
103
XML-RPC Commentary
XML-RPC is extremely easy to use
Almost too easy--you might get the perception
that it's extremely limited or fragile
104
Exercise 5.3
Time : 15 Minutes
5- 25
5- 26
105
pickle Module
A module for serializing objects
Serializing an object onto a "file"
import pickle
...
pickle.dump(someobj,f)
5- 27
Pickling to Strings
Pickle can also turn objects into byte strings
import pickle
# Convert to a string
s = pickle.dumps(someobj, protocol)
...
# Load from a string
someobj = pickle.loads(s)
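A short round trip through both interfaces, as a sketch for Python 3. The record dictionary is a made-up sample object, and io.BytesIO stands in for a real file. Note that pickle should only be used to load data from a trusted source, since unpickling can execute arbitrary code.

```python
import io
import pickle

record = {"name": "example", "values": [1, 2, 3]}   # hypothetical sample object

# dumps()/loads(): object <-> byte string
data = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)
copy = pickle.loads(data)

# dump()/load(): the same thing through a file-like object
f = io.BytesIO()
pickle.dump(record, f)
f.seek(0)
copy2 = pickle.load(f)

print(type(data).__name__, copy == record, copy2 == record)
```

The loaded objects are equal to the original but are independent copies.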
5- 28
106
Example
Using pickle with XML-RPC
# addserv.py
import pickle
def add(px,py):
    x = pickle.loads(px)
    y = pickle.loads(py)
    return pickle.dumps(x+y)
from SimpleXMLRPCServer import SimpleXMLRPCServer
serv = SimpleXMLRPCServer(("",15000))
serv.register_function(add)
serv.serve_forever()
5- 29
Example
Passing Python objects from the client
>>> import pickle
>>> import xmlrpclib
>>> serv = xmlrpclib.ServerProxy("http://localhost:15000")
>>> a = [1,2,3]
>>> b = [4,5]
>>> r = serv.add(pickle.dumps(a),pickle.dumps(b))
>>> c = pickle.loads(r)
>>> c
[1, 2, 3, 4, 5]
>>>
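In Python 3, pickle.dumps() returns bytes rather than a text string, so the technique needs one adjustment: the pickled data has to be wrapped in xmlrpc.client.Binary to travel over XML-RPC. The sketch below is an adaptation under that assumption, not the course's code. It runs the server in a background thread on an arbitrary free port.

```python
import pickle
import threading
from xmlrpc.client import Binary, ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def add(px, py):
    # Arguments arrive as Binary objects; the raw bytes are in .data
    x = pickle.loads(px.data)
    y = pickle.loads(py.data)
    return Binary(pickle.dumps(x + y))

serv = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
serv.register_function(add)
port = serv.server_address[1]
threading.Thread(target=serv.serve_forever, daemon=True).start()

proxy = ServerProxy("http://127.0.0.1:%d" % port)
a = [1, 2, 3]
b = [4, 5]
r = proxy.add(Binary(pickle.dumps(a)), Binary(pickle.dumps(b)))
c = pickle.loads(r.data)

serv.shutdown()
serv.server_close()
print(c)
```

A server like this unpickles whatever a client sends, so it is only appropriate between trusted programs.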
5- 30
107
Miscellaneous Comments
Pickle is really only useful if used in a Python-only environment
Exercise 5.4
Time : 15 Minutes
5- 32
108
multiprocessing
Python 2.6/3.0 include a new library module
(multiprocessing) that can be used for
different forms of distributed computation
Connections
Creating a dedicated connection between
two Python interpreter processes
Client process
from multiprocessing.connection import Client
c = Client(("servername",16000),authkey="12345")
109
Connection Use
Connections allow bidirectional message
passing of arbitrary Python objects
c.send(obj)
obj = c.recv()
5- 35
Example
Example server using multiprocessing
# addserv.py
def add(x,y):
    return x+y
from multiprocessing.connection import Listener
serv = Listener(("",16000),authkey="12345")
c = serv.accept()
while True:
    x,y = c.recv()          # Receive a pair
    c.send(add(x,y))        # Send result of add(x,y)
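A sketch of both ends of such a connection for Python 3, where authkey must be a bytes object. For demonstration only, the listener runs in a background thread of the same script; normally the server and client would be separate interpreter processes. serve_one() and AUTHKEY are hypothetical names.

```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"12345"                   # Python 3 requires a bytes authkey

def add(x, y):
    return x + y

def serve_one(listener):
    # Accept one connection and answer requests until the client hangs up
    with listener.accept() as conn:
        try:
            while True:
                x, y = conn.recv()           # Receive a pair
                conn.send(add(x, y))         # Send result of add(x, y)
        except EOFError:
            pass                             # Client closed the connection

# Port 0 asks the operating system for any free port
listener = Listener(("127.0.0.1", 0), authkey=AUTHKEY)
threading.Thread(target=serve_one, args=(listener,), daemon=True).start()

with Client(listener.address, authkey=AUTHKEY) as c:
    c.send(([1, 2, 3], [4, 5]))      # Any picklable object can be sent
    result = c.recv()

listener.close()
print(result)
```

Unlike the XML-RPC version, no explicit pickling is needed; send() and recv() serialize the objects themselves.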
5- 36
110
Example
Client connection with multiprocessing
[Interactive client session: commands not recoverable; output begins "[1,"]
5- 37
Commentary
Multiprocessing module already does the
111
Commentary
Multiprocessing is a good choice if you're
working strictly in a Python environment
5- 39
What about...
CORBA? SOAP? Others?
There are third party libraries for this
Honestly, most Python programmers aren't
into big heavyweight distributed object
systems like this (too much trauma)
112
Network Wrap-up
Have covered the basics of network support
that's bundled with Python (standard lib)
Exercise 5.5
Time : 15 Minutes
5- 42
113
A
accept() method, of sockets, 1-19, 1-22
Address binding, TCP server, 1-20
Addressing, network, 1-4
Asynchronous network server, 1-52
B
BaseRequestHandler, SocketServer module, 5-5
bind() method, of sockets, 1-19, 1-20, 1-42
Browser, emulating in HTTP requests, 2-21
build_opener() function, urllib2 module, 2-24
C
cElementTree module, 3-22
cgi module, 4-30
CGI scripting, 4-23, 4-24, 4-25, 4-26, 4-27
CGI scripting, and WSGI, 4-48
CGI scripting, creating a response, 4-31, 4-32
CGI scripting, environment variables, 4-28
CGI scripting, I/O model, 4-28
CGI scripting, parsing query variables, 4-30
CGI scripting, query string, 4-26
CGI scripting, query variables, 4-29
CherryPy, 4-54
Client objects, multiprocessing module, 5-34
Client/Server programming, 1-8
close() method, of sockets, 1-16, 1-25
Concurrency, and socket programming, 1-46
connect() method, of sockets, 1-16
Connections, network, 1-7
Content encoding, HTTP responses, 4-9
Cookie handling and HTTP requests, 2-25
Cookies, and urllib2 module, 2-17
CORBA, 5-40
Creating custom openers for HTTP requests, 2-24
csv module, 3-3
D
Datagram, 1-43
Distributed computing, 5-18, 5-19
Django, 4-54
dump() function, pickle module, 5-27
dumps() function, pickle module, 5-28
E
F
FieldStorage object, cgi module, 4-30
File upload, via urllib, 2-28
Files, creating from a socket, 1-37
Forking server, 1-51
ForkingMixIn class, SocketServer module, 5-15
ForkingTCPServer, SocketServer module, 5-14
ForkingUDPServer, SocketServer module, 5-14
Form data, posting in an HTTP request, 2-10, 2-11, 2-20
FTP server, interacting with, 2-29
FTP, uploading files to a server, 2-30
ftplib module, 2-29
G
gethostbyaddr() function, socket module, 1-53
gethostbyname() function, socket module, 1-53
gethostname() function, socket module, 1-53
Google AppEngine, 4-54
H
Hostname, 1-4
Hostname, obtaining, 1-53
HTML, parsing of, 3-4, 3-7
HTMLParser module, 3-5, 3-7
I
Interprocess communication, 1-44
IP address, 1-4
IPC, 1-44
IPv4 socket, 1-13
IPv6 socket, 1-13
J
JSON, 3-29
json module, 3-31
L
Limitations, of urllib module, 2-28
listen() method, of sockets, 1-19, 1-21
Listener objects, multiprocessing module, 5-34
load() function, pickle module, 5-27
loads() function, pickle module, 5-28
M
makefile() method, of sockets, 1-37
multiprocessing module, 5-33
N
netstat, 1-6
Network addresses, 1-4, 1-7
Network programming, client-server concept, 1-8
Network programming, standard port assignments, 1-5
O
Objects, serialization of, 5-26
Opener objects, urllib2 module, 2-23
OpenSSL, 2-5
P
Parsing HTML, 3-7
Parsing, JSON, 3-29
Parsing, of HTML, 3-5
pickle module, 5-27
POST method, of HTTP requests, 2-6, 2-7
Posting form data, HTTP requests, 2-10, 2-11, 2-20
Pylons, 4-54
S
T
TCP, 1-13, 1-14
TCP, accepting new connections, 1-22
TCP, address binding, 1-20
TCP, client example, 1-16
TCP, communication with client, 1-23
TCP, example with SocketServer module, 5-5
TCP, listening for connections, 1-21
TCP, server example, 1-19
TCPServer, SocketServer module, 5-10
Telnet, using with network applications, 1-10
Threaded network server, 1-50
ThreadingMixIn class, SocketServer module, 5-15
ThreadingTCPServer, SocketServer module, 5-14
ThreadingUDPServer, SocketServer module, 5-14
Threads, and network servers, 1-50
Timeout, on sockets, 1-34
Turbogears, 4-54
Twisted framework, 1-52
U
UDP, 1-13, 1-41
UDP, client example, 1-43
UDP, server example, 1-42
V
Viewing open network connections, 1-6
W
Web frameworks, 4-54, 4-55
Web programming, and WSGI, 4-35, 4-36
Web programming, CGI scripting, 4-23, 4-24, 4-25, 4-26, 4-27
Web services, 2-8
Webdav, 2-28
WSGI, 4-36
WSGI (Web Server Gateway Interface), 4-35
WSGI, and CGI environment variables, 4-39
WSGI, and wsgi.* variables, 4-40
WSGI, application inputs, 4-38
WSGI, applications, 4-37
WSGI, parsing query string, 4-41
WSGI, producing content, 4-44
WSGI, response encoding, 4-45
WSGI, responses, 4-42
WSGI, running a stand-alone server, 4-46, 4-47
WSGI, running applications within a CGI script, 4-48
WWW, see HTTP, 4-5
X
XML, element attributes, 3-19
XML, element wildcards, 3-20
XML, ElementTree interface, 3-15, 3-16
XML, ElementTree module, 3-14
XML, finding all matching elements, 3-18
XML, finding matching elements, 3-17
XML, incremental parsing of, 3-25
XML, modifying document structure with ElementTree, 3-23
XML, parsing with SAX, 3-9
XML, writing to files, 3-24
XML-RPC, 5-20
Z
Zope, 4-54