Real World Django
PyCon 2009
http://jacobian.org/speaking/2009/real-world-django/
1
So you’ve written a
web app...
2
Now what?
3
API Metering
Backups & Snapshots
Counters
Cloud/Cluster Management Tools
Instrumentation/Monitoring
Failover
Node addition/removal and hashing
Autoscaling for cloud resources
CSRF/XSS Protection
Data Retention/Archival
Deployment Tools
Multiple Devs, Staging, Prod
Data model upgrades
Rolling deployments
Multiple versions (selective beta)
Bucket Testing
Rollbacks
CDN Management
Distributed File Storage
Distributed Log storage, analysis
Graphing
HTTP Caching
Input/Output Filtering
Memory Caching
Non-relational Key Stores
Rate Limiting
Relational Storage
Queues
Real-time messaging (XMPP)
Search
Ranging
Geo
Sharding
Smart Caching
Dirty-table management
http://randomfoo.net/2009/01/28/infrastructure-for-modern-web-sites
4
What’s on the plate
Structuring for deployment
Testing
Production environments
Deployment
The “rest” of the web stack
Monitoring
Performance & tuning
5
Writing applications
you can deploy, and
deploy, and deploy...
6
The extended extended
remix!
7
The fivefold path
Do one thing, and do it well.
Don’t be afraid of multiple apps.
Write for flexibility.
Build to distribute.
Extend carefully.
8
1
9
Do one thing, and do it
well.
10
Application ==
encapsulation
11
Keep a tight focus
Ask yourself: “What does this application
do?”
Answer should be one or two short
sentences
12
Good focus
“Handle storage of users and
authentication of their identities.”
“Allow content to be tagged, del.icio.us
style, with querying by tags.”
“Handle entries in a weblog.”
13
Bad focus
“Handle entries in a weblog, and users
who post them, and their authentication,
and tagging and categorization, and some
flat pages for static content, and...”
The coding equivalent of a run-on
sentence
14
Warning signs
A lot of very good Django applications are
very small: just a few files
If your app is getting big enough to need
lots of things split up into lots of modules,
it may be time to step back and re-
evaluate
15
Warning signs
Even a lot of “simple” Django sites
commonly have a dozen or more
applications in INSTALLED_APPS
If you’ve got a complex/feature-packed
site and a short application list, it may be
time to think hard about how tightly-
focused those apps are
16
Approach features
skeptically
17
Should I add this feature?
What does the application do?
Does this feature have anything to do with
that?
No? Guess I shouldn’t add it, then.
18
2
19
Don’t be afraid of
multiple apps
20
The monolith mindset
The “application” is the whole site
Re-use is often an afterthought
Tend to develop plugins that hook into the
“main” application
Or make heavy use of middleware-like
concepts
21
The Django mindset
Application == some bit of functionality
Site == several applications
Tend to spin off new applications liberally
22
Django encourages this
Instead of one “application”, a list:
INSTALLED_APPS
Applications live on the Python path, not
inside any specific “apps” or “plugins”
directory
Abstractions like the Site model make
you think about this as you develop
23
Should this be its own application?
24
Unrelated features
Feature creep is tempting: “but wouldn’t
it be cool if...”
But it’s the road to Hell
See also: Part 1 of this talk
25
I’ve learned this the
hard way
26
djangosnippets.org
One application
Includes bookmarking features
Includes tagging features
Includes rating features
27
Should be about four
applications
28
Orthogonality
Means you can change one thing without
affecting others
Almost always indicates the need for a
separate application
Example: changing user profile workflow
doesn’t affect user signup workflow.
Make them two different applications.
29
Reuse
Lots of cool features actually aren’t
specific to one site
See: bookmarking, tagging, rating...
Why bring all this crap about code
snippets along just to get the extra stuff?
30
Advantages
Don’t keep rewriting features
Drop things into other sites easily
31
Need a contact form?
32
urlpatterns += patterns('',
    (r'^contact/', include('contact_form.urls')),
)
33
And you’re done
34
But what about...
35
Site-specific needs
Site A wants a contact form that just
collects a message.
Site B’s marketing department wants a
bunch of info.
Site C wants to use Akismet to filter
automated spam.
36
3
37
Write for flexibility
38
Common sense
Sane defaults
Easy overrides
Don’t set anything in stone
39
Form processing
Supply a form class
But let people specify their own if they
want
40
Templates
Specify a default template
But let people specify their own if they
want
41
Form processing
You want to redirect after successful
submission
Supply a default URL
But let people specify their own if they
want
42
URL best practices
Provide a URLConf in the application
Use named URL patterns
Use reverse lookups: reverse(),
permalink, {% url %}
43
Working with models
Whenever possible, avoid hard-coding a
model class
Use get_model() and take an app label/
model name string instead
Don’t rely on objects; use the default
manager
44
Working with models
Don’t hard-code fields or table names;
introspect the model to get those
Accept lookup arguments you can pass
straight through to the database API
45
Learn to love managers
Managers are easy to reuse.
Managers are easy to subclass and
customize.
Managers let you encapsulate patterns of
behavior behind a nice API.
46
Advanced techniques
Encourage subclassing and use of
subclasses
Provide a standard interface people can
implement in place of your default
implementation
Use a registry (like the admin)
47
The API your
application exposes is
just as important as the
design of the sites
you’ll use it in.
48
In fact, it’s more
important.
49
Good API design
“Pass in a value for this argument to
change the behavior”
“Change the value of this setting”
“Subclass this and override these methods
to customize”
“Implement something with this interface,
and register it with the handler”
50
Bad API design
“API? Let me see if we have one of
those...” (AKA: “we don’t”)
“It’s open source; fork it to do what you
want” (AKA: “we hate you”)
def application(environ,
start_response) (AKA: “we have a
web service”)
51
4
52
Build to distribute
53
So you did the tutorial
from mysite.polls.models import Poll
mysite.polls.views.vote
include('mysite.polls.urls')
mysite.mysite.bork.bork.bork
54
Project coupling kills
re-use
55
Why (some) projects suck
You have to replicate that directory
structure every time you re-use
Or you have to do gymnastics with your
Python path
And you get back into the monolithic
mindset
56
A good “project”
A settings module
A root URLConf module
And that’s it.
57
Advantages
No assumptions about where things live
No tricky bits
Reminds you that it’s just another Python
module
58
It doesn’t even have to
be one module
59
ljworld.com
worldonline.settings.ljworld
worldonline.urls.ljworld
And a whole bunch of reused apps in
sensible locations
60
Configuration is
contextual
61
What reusable apps look like
Single module directly on Python path
(registration, tagging, etc.)
Related modules under a package
(ellington.events,
ellington.podcasts, etc.)
No project cruft whatsoever
62
And now it’s easy
You can build a package with distutils
or setuptools
Put it on the Cheese Shop
People can download and install
63
Make it “packageable”
even if it’s only for your
use
64
General best practices
Be up-front about dependencies
Write for Python 2.3 when possible
Pick a release or pick trunk, and document
that
But if you pick trunk, update frequently
65
I usually don’t do
default templates
66
Be obsessive about documentation
67
5
68
Embracing and
extending
69
Don’t touch!
Good applications are extensible without
hacking them up.
Take advantage of everything an
application gives you.
You may end up doing something that
deserves a new application anyway.
70
But this application
wasn’t meant to be
extended!
71
Use the Python
(and the Django)
72
Want to extend a view?
If possible, wrap the view with your own
code.
Doing this repetitively? Just write a
decorator.
73
Want to extend a model?
You can relate other models to it.
You can write subclasses of it.
You can create proxy subclasses (in
Django 1.1)
74
Model inheritance is
powerful.
With great power
comes great
responsibility.
75
Proxy models
New in Django 1.1.
Lets you add methods, managers, etc.
(you’re extending the Python side, not the
DB side).
Keeps your extensions in your code.
Avoids many problems with normal
inheritance.
76
Extending a form
Just subclass it.
No really, that’s all :)
77
Other tricks
Using signals lets you fire off customized
behavior when particular events happen.
Middleware offers full control over
request/response handling.
Context processors can make additional
information available if a view doesn’t.
78
But if you must make
changes to someone
else’s code...
79
Keep changes to a minimum
If possible, instead of adding a feature, add
extensibility.
Then keep as much changed code as you
can out of the original app.
80
Stay up-to-date
You don’t want to get out of sync with the
original version of the code.
You might miss bugfixes.
You might even miss the feature you
needed.
81
Make sure your VCS is
up to the job of
merging from upstream
82
Be a good citizen
If you change someone else’s code, let
them know.
Maybe they’ll merge your changes in and
you won’t have to fork anymore.
83
What if it’s my own
code?
84
Same principles apply
Maybe the original code wasn’t sufficient.
Or maybe you just need a new application.
Be just as careful about making changes. If
nothing else, this will highlight ways in
which your code wasn’t extensible to
begin with.
85
Further reading
86
Testing
87
“Tests are the Programmer’s stone,
transmuting fear into boredom.”
— Kent Beck
88
Hardcore TDD
89
“I don’t do test driven development. I do
stupidity driven testing… I wait until I do
something stupid, and then write tests
to avoid doing it again.”
— Titus Brown
90
Whatever happens, don’t let
your test suite break thinking,
“I’ll go back and fix this later.”
91
Unit testing: unittest
doctest
Functional/behavior testing: django.test.Client, Twill
92
You need them all.
93
Unit tests
“Whitebox” testing
Verify the small functional units of your
app
Very fine-grained
Familiar to most programmers (JUnit,
NUnit, etc.)
Provided in Python by unittest
94
from django.test import TestCase
from django.http import HttpRequest
from django.middleware.common import CommonMiddleware
from django.conf import settings

class CommonMiddlewareTest(TestCase):
    def setUp(self):
        # Remember the settings this test changes so tearDown can restore them.
        self.slash = settings.APPEND_SLASH
        self.www = settings.PREPEND_WWW

    def tearDown(self):
        settings.APPEND_SLASH = self.slash
        settings.PREPEND_WWW = self.www

    def _get_request(self, path):
        request = HttpRequest()
        request.META = {'SERVER_NAME': 'testserver', 'SERVER_PORT': 80}
        request.path = request.path_info = "/middleware/%s" % path
        return request

    def test_append_slash_redirect(self):
        settings.APPEND_SLASH = True
        request = self._get_request('slash')
        r = CommonMiddleware().process_request(request)
        self.assertEquals(r.status_code, 301)
        self.assertEquals(r['Location'], 'http://testserver/middleware/slash/')
95
django.test.TestCase
Fixtures.
Test client.
Email capture.
Database management.
Slower than unittest.TestCase.
96
Doctests
Easy to write & read.
Produces self-documenting code.
Great for cases that only use assertEquals.
Somewhere between unit tests and
functional tests.
Difficult to debug.
Don’t always provide useful test failures.
97
class Template(object):
    """
    Deal with a URI template as a class::

        >>> t = Template("http://example.com/{p}?{-join|&|a,b,c}")
        >>> t.expand(p="foo", a="1")
        'http://example.com/foo?a=1'
        >>> t.expand(p="bar", b="2", c="3")
        'http://example.com/bar?c=3&b=2'
    """

def parse_expansion(expansion):
    """
    Parse an expansion -- the part inside {curlybraces} -- into its component
    parts. Returns a tuple of (operator, argument, variabledict)::

        >>> parse_expansion("-join|&|a,b,c=1")
        ('join', '&', {'a': None, 'c': '1', 'b': None})
        >>> parse_expansion("c=1")
        (None, None, {'c': '1'})
    """

def percent_encode(values):
    """
    Percent-encode a dictionary of values, handling nested lists correctly::

        >>> percent_encode({'company': 'AT&T'})
        {'company': 'AT%26T'}
        >>> percent_encode({'companies': ['Yahoo!', 'AT&T']})
        {'companies': ['Yahoo%21', 'AT%26T']}
    """
98
****************************************************
File "uri.py", line 150, in __main__.parse_expansion
Failed example:
parse_expansion("c=1")
Expected:
(None, None, {'c': '2'})
Got:
(None, None, {'c': '1'})
****************************************************
99
Functional tests
a.k.a “Behavior Driven Development.”
“Blackbox,” holistic testing.
All the hardcore TDD folks look down on
functional tests.
But it keeps your boss happy.
Easy to find problems, harder to find the
actual bug.
100
Functional testing tools
django.test.Client
webunit
Twill
...
101
django.test.Client
Test the whole request path without
running a web server.
Responses provide extra information
about templates and their contexts.
102
def testBasicAddPost(self):
    """
    A smoke test to ensure POST on add_view works.
    """
    post_data = {
        "name": u"Another Section",
        # inline data
        "article_set-TOTAL_FORMS": u"3",
        "article_set-INITIAL_FORMS": u"0",
    }
    response = self.client.post('/admin/admin_views/section/add/', post_data)
    self.failUnlessEqual(response.status_code, 302)

def testCustomAdminSiteLoginTemplate(self):
    self.client.logout()
    request = self.client.get('/test_admin/admin2/')
    self.assertTemplateUsed(request, 'custom_admin/login.html')
    self.assertEquals(request.context['title'], 'Log in')
103
Web browser testing
The ultimate in functional testing for
web applications.
Run test in a web browser.
Can verify JavaScript, AJAX; even design.
Test your site across supported browsers.
104
Browser testing tools
Selenium
Windmill
105
Exotic testing
Static source analysis.
Smoke testing (crawlers and spiders).
Monkey testing.
Load testing.
...
106
107
Further resources
Talks here at PyCon!
http://bit.ly/pycon2009-testing
108
Deployment
109
Deployment should...
Be automated.
Automatically manage dependencies.
Be isolated.
Be repeatable.
Be identical in staging and in production.
Work the same for everyone.
110
Dependency management, isolation, automation
pip
zc.buildout
Puppet
111
Let the live demo begin
(gulp)
112
Building your stack
113
[Diagram: “LiveJournal Backend: Today (Roughly.)”, showing BIG-IP load balancers, perlbal proxies, mod_perl web servers, memcached nodes, a global database (masters and slaves), user DB clusters, MogileFS trackers and storage nodes, gearmand workers, djabberd, and job queues.]
Brad Fitzpatrick, http://danga.com/words/2007_06_usenix/
114
[Diagram: django, database, and media on a single server.]
115
Application servers
Apache + mod_python
Apache + mod_wsgi
Apache/lighttpd + FastCGI
SCGI, AJP, nginx/mod_wsgi, ...
116
Use mod_wsgi
117
WSGIScriptAlias / /home/mysite/mysite.wsgi
118
import os, sys
# Add to PYTHONPATH whatever you need
sys.path.append('/usr/local/django')
# Set DJANGO_SETTINGS_MODULE
os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'
# Create the application for mod_wsgi
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
119
A brief digression
regarding the question
of scale
120
Does this scale?
[Diagram: django, database, and media on a single server.]
Maybe!
121
122
Real-world example
Database A Database B
123
Real-world example
http://tweakers.net/reviews/657/6
124
[Diagram: django and media on the web server; database on a separate database server.]
125
Why separate hardware?
Resource contention
Separate performance concerns
0 → 1 is much harder than 1 → N
126
DATABASE_HOST = '10.0.0.100'
FAIL
127
Connection middleware
Proxy between web and database layers
Most implement hot failover and
connection pooling
Some also provide replication, load
balancing, parallel queries, connection
limiting, &c
DATABASE_HOST = '127.0.0.1'
128
Connection middleware
PostgreSQL: pgpool
MySQL: MySQL Proxy
Database-agnostic: sqlrelay
Oracle: ?
129
[Diagram: django on the web server; media on a media server; database on the database server.]
130
Media server traits
Fast
Lightweight
Optimized for high concurrency
Low memory overhead
Good HTTP citizen
131
Media servers
Apache?
lighttpd
nginx
132
The absolute minimum
[Diagram: django on the web server; media on a media server; database on the database server.]
133
The absolute minimum
django media
database
web server
134
proxy media
database
database server
135
Why load balancers?
136
Load balancer traits
Low memory overhead
High concurrency
Hot failover
Other nifty features...
137
Load balancers
Apache + mod_proxy
perlbal
nginx
138
CREATE POOL mypool
  POOL mypool ADD 10.0.0.100
  POOL mypool ADD 10.0.0.101

CREATE SERVICE mysite
  SET listen = my.public.ip
  SET role = reverse_proxy
  SET pool = mypool
  SET verify_backend = on
  SET buffer_size = 120k
ENABLE mysite
139
you@yourserver:~$ telnet localhost 60000
pool mypool add 10.0.0.102
OK
nodes 10.0.0.101
10.0.0.101 lastresponse 1237987449
10.0.0.101 requests 97554563
10.0.0.101 connects 129242435
10.0.0.101 lastconnect 1237987449
10.0.0.101 attempts 129244743
10.0.0.101 responsecodes 200 358
10.0.0.101 responsecodes 302 14
10.0.0.101 responsecodes 207 99
10.0.0.101 responsecodes 301 11
10.0.0.101 responsecodes 404 18
10.0.0.101 lastattempt 1237987449
140
proxy proxy proxy media media
141
“Shared nothing”
142
BALANCE = None

def balance_sheet(request):
    global BALANCE
    if not BALANCE:
        bank = Bank.objects.get(...)
        BALANCE = bank.total_balance()
    ...
FAIL
143
Global variables are
right out
144
from django.core.cache import cache

def balance_sheet(request):
    balance = cache.get('bank_balance')
    if not balance:
        bank = Bank.objects.get(...)
        balance = bank.total_balance()
        cache.set('bank_balance', balance)
    ...
WIN
145
def generate_report(request):
    report = get_the_report()
    open('/tmp/report.txt', 'w').write(report)
    return redirect(view_report)

def view_report(request):
    report = open('/tmp/report.txt').read()
    return HttpResponse(report)
FAIL
146
Filesystem?
What filesystem?
147
Further reading
Cal Henderson, Building Scalable Web Sites
(O’Reilly, 2006)
John Allspaw, The Art of Capacity Planning
(O’Reilly, 2008)
http://kitchensoap.com/
http://highscalability.com/
148
Monitoring
149
Goals
When the site goes down, know it immediately.
Automatically handle common sources of
downtime.
Ideally, handle downtime before it even happens.
Monitor hardware usage to identify hotspots and
plan for future growth.
Aid in postmortem analysis.
Generate pretty graphs.
150
Availability monitoring principles
Check services for availability.
More than just “ping yoursite.com.”
Have some understanding of dependencies (if the
db is down, I don’t need to also hear that the web
servers are down.)
Notify the “right” people using the “right”
methods, and don’t stop until it’s fixed.
Minimize false positives.
Automatically take action against common sources
of downtime.
151
Availability monitoring tools
Internal tools
Nagios
Monit
Zenoss
...
External monitoring tools
152
Usage monitoring
Keep track of resource usage over time.
Spot and identify trends.
Aid in capacity planning and management.
Look good in reports to your boss.
153
Usage monitoring tools
RRDTool
Munin
Cacti
Graphite
154
155
156
Logging and log analysis
Record information about what’s
happening right now.
Analyze historical data for trends.
Provide postmortem information after
failures.
157
Logging tools
print
Python’s logging module
syslogd
158
Log analysis
grep | sort | uniq -c | sort -rn
Load log data into relational databases,
then slice & dice.
OLAP/OLTP engines.
Splunk.
Analog, AWStats, ...
Google Analytics, Mint, ...
159
Performance (and
when to care about it)
160
Ignore performance
First, get the application written.
Then, make it work.
Then get it running on a server.
Then, maybe, think about performance.
161
Code isn’t “fast” or
“slow” until it’s been
written.
162
Code isn’t “fast” or
“slow” until it works.
163
Code isn’t “fast” or
“slow” until it’s actually
running on a server.
164
Optimizing code
Most of the time, “bad” code is obvious as
soon as you write it. So don’t write it.
165
Low-hanging fruit
Look for code doing lots of DB queries --
consider caching, or using select_related()
Look for complex DB queries, and see if
they can be simplified.
166
The DB is the bottleneck
And if it’s not the DB, it’s I/O.
Everything else is typically negligible.
167
Find out what “slow” means
Do testing in the browser.
Do testing with command-line tools like
wget.
Compare the results, and you may be
surprised.
168
Sometimes, perceived
“slowness” is actually
on the front end.
169
Read Steve Souders’ book
170
YSlow
http://developer.yahoo.com/yslow/
171
What to do on the server side
First, try caching.
Then try caching some more.
172
The secret weapon
Caching turns less hardware into more.
Caching puts off buying a new DB server.
173
But caching is a trade-off
174
Things to consider
Cache for everybody? Or only for people
who aren’t logged in?
Cache everything? Or only a few complex
views?
Use Django’s cache layer? Or an external
caching system?
175
Not all users are the same
Most visitors to most sites aren’t logged
in.
CACHE_MIDDLEWARE_ANONYMOUS_ONLY
176
Not all views are the same
You probably already know where your
nasty DB queries are.
cache_page on those particular views.
177
Site-wide caches
You can use Django’s cache middleware to
do this...
Or you can use a proper caching proxy
(e.g., Squid, Varnish).
178
External caches
Work fine with Django, because Django
just uses HTTP’s caching headers.
Take the entire load off Django -- requests
never even hit the application.
179
When caching doesn’t cut it
180
Throw money at your DB first
181
Web server improvements
Simple steps first: turn off Keep-Alive,
etc.
Consider switching to a lighter-weight
web server (e.g., nginx) or lighter-weight
system (e.g., from mod_python to
mod_wsgi).
182
Database tuning
Whole books can be written on DB
performance tuning
183
Using MySQL?
184
Using PostgreSQL?
http://www.revsys.com/writings/postgresql-performance.html
185
Learn how to diagnose
If things are slow, the cause may not be
obvious.
Even if you think it’s obvious, that may not
be the cause.
186
Build a toolkit
Python profilers: profile and cProfile
Generic “spy on a process” tools: strace,
SystemTap, and dtrace.
Django debug toolbar
(http://bit.ly/django-debug-toolbar)
187
Shameless plug
http://revsys.com/
188
Fin.
Jacob Kaplan-Moss
189