DjangoCon 2010 Scaling Disqus

Disqus' presentation at DjangoCon 2010. Covers their basic hardware setup, some of their concerns with database usage, and how they manage with a small team of engineers.

Uploaded by

David Cramer
Copyright
© Attribution Non-Commercial (BY-NC)

Scaling the World’s Largest Django App

Jason Yan David Cramer


@jasonyan @zeeg

1
What is DISQUS?

2
What is DISQUS?

dis·cuss • dĭ-skŭs'

We are a comment system with an emphasis on connecting communities

http://disqus.com/about/

3
What is Scale?

[Chart: Number of Visitors, axis from 50M to 300M]

Our traffic at a glance


17,000 requests/second peak
450,000 websites
15 million profiles
75 million comments
250 million visitors (August 2010)

4
Our Challenges

• We can’t predict when things will happen
    • Random celebrity gossip
    • Natural disasters
• Discussions never expire
    • We can’t keep those millions of articles from 2008 in the cache
• You don’t know in advance (generally) where the traffic will be
    • Especially with dynamic paging, realtime, sorting, personal prefs, etc.

5
Our Challenges (cont’d)

• High availability
• Not a destination site
• Difficult to schedule maintenance

6
Server Architecture

7
Server Architecture - Load Balancing
• Load Balancing
    • Software, HAProxy
    • High performance, intelligent server availability checking
    • Bonus: Nice statistics reporting
• High Availability
    • heartbeat

Image Source: http://haproxy.1wt.eu/


8
Server Architecture

• ~100 Servers
    • 30% Web Servers (Apache + mod_wsgi)
    • 10% Databases (PostgreSQL)
    • 25% Cache Servers (memcached)
    • 20% Load Balancing / High Availability (HAProxy + heartbeat)
    • 15% Utility Servers (Python scripts)

9
Server Architecture - Web Servers

• Apache 2.2
• mod_wsgi
    • Using `maximum-requests` to plug memory leaks.

• Performance Monitoring
    • Custom middleware (PerformanceLogMiddleware)
    • Ships performance statistics (DB queries, external calls, template rendering, etc.) through syslog
    • Collected and graphed through Ganglia
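
The deck does not show PerformanceLogMiddleware itself. As a rough sketch of the idea, the following plain WSGI wrapper (so it runs without Django) times each request and writes one log line per request. The class name `TimingMiddleware`, the logger name `perf`, and the log format are assumptions made for illustration; in production a `logging.handlers.SysLogHandler` could be attached to the logger so the lines reach syslog.

```python
import logging
import time

log = logging.getLogger("perf")  # hypothetical logger name


class TimingMiddleware:
    """Wrap a WSGI app and log how long the app takes to return a response."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        started = time.perf_counter()
        try:
            return self.app(environ, start_response)
        finally:
            # Measures the time until the app returns its response iterable.
            elapsed_ms = (time.perf_counter() - started) * 1000.0
            log.info("path=%s total_ms=%.2f",
                     environ.get("PATH_INFO", "-"), elapsed_ms)


def demo_app(environ, start_response):
    """Tiny WSGI app used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]
```

A fuller version would also record per-request counters (queries, external calls, template time) and emit them in the same line, which is what makes the data easy to graph.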

10
Server Architecture - Database

• PostgreSQL
• Slony-I for Replication
    • Trigger-based
    • Read slaves for extra read capacity
    • Failover master database for high availability

11
Server Architecture - Database

• Make sure indexes fit in memory and measure I/O
    • High I/O generally means slow queries due to missing indexes or indexes not in buffer cache
• Log Slow Queries
    • syslog-ng + pgFouine + cron to automate slow query logging

12
Server Architecture - Database

• Use connection pooling
    • Django doesn’t do this for you
    • We use pgbouncer
    • Limits the maximum number of connections your database needs to handle
    • Save on costly opening and tearing down of new database connections

13
Our Data Model

14
Partitioning

• Fairly easy to implement, quick wins


• Done at the application level
• Data is replayed by Slony
• Two methods of data separation

15
Vertical Partitioning
Vertical partitioning involves creating tables with fewer columns
and using additional tables to store the remaining columns.

Forums Posts Users Sentry

http://en.wikipedia.org/wiki/Partition_(database)

16
Pythonic Joins

Allows us to separate datasets

posts = Post.objects.all()[0:25]

# store users in a dictionary based on primary key
users = dict(
    (u.pk, u) for u in
    User.objects.filter(pk__in=set(p.user_id for p in posts))
)

# map users to their posts
for p in posts:
    p._user_cache = users.get(p.user_id)
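
The same pattern can be sketched without the ORM, which makes the two steps (one batched lookup, then an in-memory mapping) easier to see. Everything here is illustrative: `attach_users`, `fetch_users`, and the dict-based rows are hypothetical stand-ins, with `fetch_users` playing the role of `User.objects.filter(pk__in=...)` against a separate database.

```python
def attach_users(posts, fetch_users):
    """Attach a user to each post with one batched lookup instead of a SQL join.

    posts: list of dicts with a "user_id" key.
    fetch_users: callable taking a set of ids and returning user dicts.
    """
    user_ids = set(p["user_id"] for p in posts)
    users = dict((u["id"], u) for u in fetch_users(user_ids))
    for p in posts:
        p["user"] = users.get(p["user_id"])
    return posts


# In-memory stand-in for a users table that lives on another database.
USERS = {1: {"id": 1, "name": "jason"}, 2: {"id": 2, "name": "david"}}


def fetch_users(ids):
    return [USERS[i] for i in ids if i in USERS]


posts = attach_users(
    [{"id": 10, "user_id": 1}, {"id": 11, "user_id": 2}, {"id": 12, "user_id": 1}],
    fetch_users,
)
```

Because the user ids are collected into a set first, a user who wrote several of the posts is fetched only once.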

17
Pythonic Joins (cont’d)

• Slower than at database level
    • But not enough that you should care
    • Trading performance for scale
• Allows us to separate data
    • Easy vertical partitioning
• More efficient caching
    • get_many, object-per-row cache

18
Designating Masters

• Alleviates some of the write load on your primary application master
• Masters exist under specific conditions:
    • application use case
    • partitioned data
• Database routers make this (fairly) easy

19
Routing by Application

class ApplicationRouter(object):
    def db_for_read(self, model, **hints):
        instance = hints.get('instance')
        if not instance:
            return None

        app_label = instance._meta.app_label

        return get_application_alias(app_label)
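
The slide does not show `get_application_alias`. One plausible shape, assuming a static mapping from app label to a database alias defined in `settings.DATABASES`, is sketched below; the mapping name and its entries are hypothetical.

```python
# Hypothetical mapping from Django app label to database alias.
APPLICATION_DATABASES = {
    "sentry": "sentry",
    "forums": "forums",
}


def get_application_alias(app_label):
    """Return the alias for an app, or None when the app has no dedicated database."""
    return APPLICATION_DATABASES.get(app_label)
```

Returning None from a router method tells Django the router has no opinion, so the next router (or the default database) is used.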

20
Horizontal Partitioning
Horizontal partitioning (also known as sharding) involves splitting
one set of data into different tables.

Disqus Your Blog CNN Telegraph

http://en.wikipedia.org/wiki/Partition_(database)

21
Horizontal Partitions

• Some forums have very large datasets


• Partners need high availability
• Helps scale the write load on the master
• We rely more on vertical partitions

22
Routing by Partition

class ForumPartitionRouter(object):
    def db_for_read(self, model, **hints):
        instance = hints.get('instance')
        if not instance:
            return None

        forum_id = getattr(instance, 'forum_id', None)
        if not forum_id:
            return None

        return get_forum_alias(forum_id)

# What we used to do
Post.objects.filter(forum=forum)

# Now, making sure hints are available
forum.post_set.all()
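
To see why the hint matters, the router can be exercised outside Django. In this sketch `FORUM_ALIASES` and the body of `get_forum_alias` are assumptions (the deck does not show them), and a `SimpleNamespace` stands in for whatever instance Django passes as the hint.

```python
from types import SimpleNamespace

# Hypothetical table of forums that live on their own database alias.
FORUM_ALIASES = {42: "partition_cnn"}


def get_forum_alias(forum_id):
    return FORUM_ALIASES.get(forum_id)


class ForumPartitionRouter(object):
    def db_for_read(self, model, **hints):
        instance = hints.get('instance')
        if not instance:
            return None
        forum_id = getattr(instance, 'forum_id', None)
        if not forum_id:
            return None
        return get_forum_alias(forum_id)


router = ForumPartitionRouter()
hinted = SimpleNamespace(forum_id=42)      # stand-in for the hinted instance

with_hint = router.db_for_read(None, instance=hinted)
without_hint = router.db_for_read(None)    # a query that carries no instance hint
```

Without an instance hint the router has nothing to inspect and returns None, so the query falls through to the default database. That is the reason for preferring queries that start from a related object.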

23
Optimizing QuerySets

• We really dislike raw SQL
    • It creates more work when dealing with partitions
• Built-in cache allows sub-slicing
    • But isn’t always needed
    • We removed this cache

24
Removing the Cache

• Django internally caches the results of your QuerySet


• This adds additional memory overhead

# 1 query
qs = Model.objects.all()[0:100]

# 0 queries (we don’t need this behavior)


qs = qs[0:10]

# 1 query
qs = qs.filter(foo=bar)

• Many times you only need to view a result set once


• So we built SkinnyQuerySet

25
Removing the Cache (cont’d)

Optimizing memory usage by removing the cache


class SkinnyQuerySet(QuerySet):
    def __iter__(self):
        if self._result_cache is not None:
            # __len__ must have been run
            return iter(self._result_cache)

        has_run = getattr(self, 'has_run', False)
        if has_run:
            raise QuerySetDoubleIteration("...")
        self.has_run = True
        # We wanted .iterator() as the default
        return self.iterator()

http://gist.github.com/550438
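
The trade-off can be illustrated without Django: results are streamed once and never stored, and a second iteration raises instead of silently re-running the query. The class and exception names below are hypothetical and only mirror the idea of SkinnyQuerySet, not its implementation.

```python
class DoubleIterationError(Exception):
    """Raised when a single-use result set is iterated a second time."""


class SinglePassResults:
    """Stream rows from a producer once, without keeping them in memory."""

    def __init__(self, produce_rows):
        self._produce_rows = produce_rows  # callable returning an iterator
        self._has_run = False

    def __iter__(self):
        if self._has_run:
            raise DoubleIterationError("results were already consumed")
        self._has_run = True
        return iter(self._produce_rows())


results = SinglePassResults(lambda: (n * n for n in range(5)))
first_pass = list(results)
```

Failing loudly on the second pass is a design choice: it surfaces code that depended on the result cache rather than letting it issue a hidden extra query.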

26
Atomic Updates

• Keeps your data consistent
• save() isn’t thread-safe
    • use update() instead
• Great for things like counters
    • But should be considered for all write operations

27
Atomic Updates (cont’d)

Thread safety is impossible with .save()

Request 1

post = Post.objects.get(pk=1)
# a moderator approves
post.approved = True
post.save()

Request 2

post = Post.objects.get(pk=1)
# the author adjusts their message
post.message = 'Hello!'
post.save()

28
Atomic Updates (cont’d)

So we need atomic updates

Request 1

post = Post(pk=1)
# a moderator approves
Post.objects.filter(pk=post.pk)\
    .update(approved=True)

Request 2

post = Post(pk=1)
# the author adjusts their message
Post.objects.filter(pk=post.pk)\
    .update(message='Hello!')
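
The race can be reproduced with nothing but `sqlite3` from the standard library. This is a sketch under stated assumptions: SQLite stands in for PostgreSQL, the helper names are made up, and `save_all_columns` imitates a save that writes every column back, which is how the two requests above overwrite each other.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE post (id INTEGER PRIMARY KEY, approved INTEGER, message TEXT)"
    )
    db.execute("INSERT INTO post VALUES (1, 0, 'Hi')")
    return db


def load(db):
    row = db.execute("SELECT id, approved, message FROM post WHERE id = 1").fetchone()
    return dict(zip(("id", "approved", "message"), row))


def save_all_columns(db, post):
    # Imitates a full-row save: every column is written back.
    db.execute(
        "UPDATE post SET approved = ?, message = ? WHERE id = ?",
        (post["approved"], post["message"], post["id"]),
    )


# Full-row saves: both requests load the row before either one writes.
db = make_db()
moderator_copy, author_copy = load(db), load(db)
moderator_copy["approved"] = 1
save_all_columns(db, moderator_copy)
author_copy["message"] = "Hello!"
save_all_columns(db, author_copy)   # writes back the stale approved = 0
lost = load(db)

# Column-targeted updates: each request touches only its own column.
db = make_db()
db.execute("UPDATE post SET approved = 1 WHERE id = 1")
db.execute("UPDATE post SET message = 'Hello!' WHERE id = 1")
kept = load(db)
```

In the first run the moderator's approval is silently lost; in the second both changes survive because neither statement mentions the other column.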

29
Atomic Updates (cont’d)

A better way to approach updates


from django.db.models.expressions import ExpressionNode

def update(obj, using=None, **kwargs):
    """
    Updates specified attributes on the current instance.
    """
    assert obj.pk, "Instance has not yet been created."
    obj.__class__._base_manager.using(using)\
        .filter(pk=obj.pk)\
        .update(**kwargs)
    for k, v in kwargs.items():
        if isinstance(v, ExpressionNode):
            # NotImplemented: expressions are evaluated by the database,
            # so the local attribute is left unchanged
            continue
        setattr(obj, k, v)

http://github.com/andymccurdy/django-tips-and-tricks/blob/master/model_update.py

30
Delayed Signals

• Queueing low priority tasks


• even if they’re fast
• Asynchronous (Delayed) signals
• very friendly to the developer
• ..but not as friendly as real signals

31
Delayed Signals (cont’d)

We send a specific serialized version of the model for delayed signals

from disqus.common.signals import delayed_save

def my_func(data, sender, created, **kwargs):
    print(data['id'])

delayed_save.connect(my_func, sender=Post)

This is all handled through our Queue
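
The deck does not show how `delayed_save` is implemented. A minimal sketch of the mechanism, assuming an in-process `queue.Queue` in place of a real message queue and JSON as the serialization format, might look like this; all function names here are hypothetical.

```python
import json
import queue

task_queue = queue.Queue()   # stand-in for a real message queue
receivers = []


def connect(func):
    receivers.append(func)


def send_delayed(instance_data, created):
    # Only a serialized snapshot crosses the queue, not the model instance.
    task_queue.put(json.dumps({"data": instance_data, "created": created}))


def run_worker():
    # Normally a separate worker process; drained inline here.
    while not task_queue.empty():
        payload = json.loads(task_queue.get())
        for func in receivers:
            func(data=payload["data"], created=payload["created"])


seen = []
connect(lambda data, created, **kwargs: seen.append(data["id"]))
send_delayed({"id": 7, "message": "Hello!"}, created=True)
run_worker()
```

Because receivers get a plain dict rather than a live model instance, they see the object as it was when the signal was sent, which is one way delayed signals differ from regular ones.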

32
Caching

• Memcached
• Use pylibmc (newer libMemcached-based)
• Ticket #11675 (add pylibmc support)
• Third party applications:
• django-newcache, django-pylibmc

33
Caching (cont’d)

• libMemcached / pylibmc is configurable with “behaviors”.
• Memcached “single point of failure”
    • Distributed system, but we must take precautions.
    • Connection timeout to memcached can stall requests.
    • Use `_auto_eject_hosts` and `_retry_timeout` behaviors to prevent reconnecting to dead caches.

34
Caching (cont’d)

• Default (naive) hashing behavior
    • The hashed cache key, modulo the number of servers, is the index into the server list.
    • Removal of a server causes the majority of cache keys to be remapped to new servers.

CACHE_SERVERS = ['10.0.0.1', '10.0.0.2']

key = 'my_cache_key'
cache_server = CACHE_SERVERS[hash(key) % len(CACHE_SERVERS)]
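
How bad the remapping gets can be measured directly. The sketch below assumes four hypothetical servers and 10,000 synthetic keys, and uses md5 rather than the built-in `hash()` so the numbers are the same on every run.

```python
import hashlib


def stable_hash(key):
    # md5 gives the same value on every run, unlike Python's built-in
    # hash() for strings, which may be randomized per process.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def pick_server(key, servers):
    return servers[stable_hash(key) % len(servers)]


servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
keys = ["key-%d" % i for i in range(10000)]

before = dict((k, pick_server(k, servers)) for k in keys)
after = dict((k, pick_server(k, servers[:-1])) for k in keys)  # drop one server

moved = sum(1 for k in keys if before[k] != after[k])
moved_fraction = moved / float(len(keys))
```

Going from four servers to three, a key keeps its server only when its hash has the same remainder modulo 4 and modulo 3, which holds for 3 of every 12 hash values. Roughly three quarters of the keys therefore move, even though only one quarter lived on the removed server.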

35
Caching (cont’d)

• Better approach: consistent hashing
    • libMemcached (pylibmc) uses libketama (http://tinyurl.com/lastfm-libketama)
    • Addition / removal of a cache server remaps (K/n) cache keys (where K=number of keys and n=number of servers)
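
A small hash ring shows the K/n behaviour. This is a generic sketch of consistent hashing with virtual nodes, not libketama's actual algorithm; the `HashRing` class, the replica count of 100, and the server addresses are assumptions for illustration.

```python
import bisect
import hashlib


def stable_hash(value):
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, servers, replicas=100):
        # Each server is placed at `replicas` pseudo-random points on the ring.
        self._ring = sorted(
            (stable_hash("%s#%d" % (server, i)), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def get(self, key):
        # First ring point after the key's hash, wrapping around at the end.
        index = bisect.bisect(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]


servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
keys = ["key-%d" % i for i in range(10000)]

before = HashRing(servers)
after = HashRing(servers[:-1])  # remove one server

moved = [k for k in keys if before.get(k) != after.get(k)]
moved_fraction = len(moved) / float(len(keys))
```

Only keys that lived on the removed server change owner, because every other key still finds the same ring point. The moved fraction is therefore close to 1/n instead of most of the keyspace.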

Image Source: http://sourceforge.net/apps/mediawiki/kai/index.php?title=Introduction

36
Caching (cont’d)

• Thundering herd (stampede) problem
    • Invalidating a heavily accessed cache key causes many clients to refill the cache at once.
    • But everyone refetching from the data store or reprocessing data to fill the cache can make things even slower.
    • Most times, it’s ideal to return the previously invalidated cache value and let a single client refill the cache.
    • django-newcache or MintCache (http://djangosnippets.org/snippets/793/) will do this for you.
    • Filling the cache on invalidation, instead of deleting from it, also helps to prevent the thundering herd problem.
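
The "serve stale, let one client refill" idea can be sketched with a soft expiry stored next to the value. This is an illustration in the spirit of MintCache, not its code: the dict stands in for memcached, the function names are hypothetical, and against a real cache the read-then-write step is not atomic, so a few clients may still refresh at once.

```python
import time

_store = {}  # stand-in for memcached: key -> (value, soft_expiry)


def cache_set(key, value, ttl, now=None):
    now = time.time() if now is None else now
    _store[key] = (value, now + ttl)


def cache_get(key, grace=30, now=None):
    """Return (value, needs_refresh).

    After the soft expiry, the first caller is told to refresh and the
    entry's expiry is pushed forward by `grace` seconds, so other callers
    keep getting the stale value instead of all hitting the data store.
    """
    now = time.time() if now is None else now
    entry = _store.get(key)
    if entry is None:
        return None, True
    value, soft_expiry = entry
    if now >= soft_expiry:
        _store[key] = (value, now + grace)
        return value, True
    return value, False
```

A caller that receives `needs_refresh=True` recomputes the value and calls `cache_set`; everyone else carries on with the stale copy during the grace window.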

37
Transactions

• TransactionMiddleware got us started, but down the road became a burden
• For postgresql_psycopg2, there’s a database option, OPTIONS['autocommit']
    • Each query is in its own transaction. This means each request won’t start in a transaction.
    • But sometimes we want transactions (e.g., saving multiple objects and rolling back on error)

38
Transactions (cont’d)

• Tips:
    • Use autocommit for read slave databases.
    • Isolate slow functions (e.g., external calls, template rendering) from transactions.
    • Selective autocommit
        • Most read-only views don’t need to be in transactions.
        • Start in autocommit and switch to a transaction on write.
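
"Start in autocommit and switch to a transaction on write" can be demonstrated with the standard library's `sqlite3`, used here as a stand-in for PostgreSQL. The table and function names are made up for the example.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode:
# no transaction is opened unless we issue BEGIN ourselves.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, message TEXT)")


def read_count():
    # Read-only path: runs in autocommit, no transaction to manage.
    return db.execute("SELECT COUNT(*) FROM post").fetchone()[0]


def save_posts(messages):
    # Write path: switch to an explicit transaction so a failure
    # rolls back every insert in the batch.
    db.execute("BEGIN")
    try:
        for message in messages:
            if message is None:
                raise ValueError("message is required")
            db.execute("INSERT INTO post (message) VALUES (?)", (message,))
    except Exception:
        db.execute("ROLLBACK")
        raise
    db.execute("COMMIT")


save_posts(["first", "second"])
try:
    save_posts(["third", None])   # fails midway, so "third" is rolled back
except ValueError:
    pass
count = read_count()
```

Reads never pay for transaction bookkeeping, while the multi-row write still gets all-or-nothing behaviour.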

39
Scaling the Team

• Small team of engineers


• Monthly users / developers = 40m
• Which means writing tests..
• ..and having a dead simple workflow

40
Keeping it Simple

• A developer can be up and running in a few minutes
    • assuming postgres and other server applications are already installed
    • pip, virtualenv
    • settings.py

41
Setting Up Local

1. createdb -E UTF-8 disqus


2. git clone git://repo
3. mkvirtualenv disqus
4. pip install -U -r requirements.txt
5. ./manage.py syncdb && ./manage.py migrate

42
Sane Defaults

settings.py

from disqus.conf.settings.default import *

try:
    from local_settings import *
except ImportError:
    import sys, traceback
    sys.stderr.write("Can't find 'local_settings.py'\n")
    sys.stderr.write("\nThe exception was:\n\n")
    traceback.print_exc()

local_settings.py

from disqus.conf.settings.dev import *

43
Continuous Integration

• Daily deploys with Fabric
    • several times an hour on some days
• Hudson keeps our builds going
    • combined with Selenium
• Post-commit hooks for quick testing
    • like Pyflakes
• Reverting to a previous version is a matter of seconds

44
Continuous Integration (cont’d)

Hudson makes integration easy

45
Testing

• It’s not fun breaking things when you’re the new guy
• Our testing process is fairly heavy
    • 70k (Python) LOC, 73% coverage, 20 min suite
• Custom Test Runner (unittest)
    • We needed XML, Selenium, Query Counts
    • Database proxies (for read-slave testing)
    • Integration with our Queue

46
Testing (cont’d)

Query Counts

# failures yield a dump of queries
def test_read_slave(self):
    Model.objects.using('read_slave').count()
    self.assertQueryCount(1, 'read_slave')

Selenium

def test_button(self):
    self.selenium.click('//a[@class="dsq-button"]')

Queue Integration

class WorkerTest(DisqusTest):
    workers = ['fire_signal']

    def test_delayed_signal(self):
        ...
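
The custom test runner is not shown in the deck. As a rough idea of how a query-count assertion can work, the sketch below records every SQL statement through `sqlite3`'s trace callback and dumps them on failure. SQLite replaces PostgreSQL here, and this `assertQueryCount` takes no database alias, unlike the one on the slide.

```python
import sqlite3
import unittest


class QueryCountTest(unittest.TestCase):
    def setUp(self):
        self.queries = []
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE post (id INTEGER PRIMARY KEY)")
        # Record every statement executed from this point on.
        self.db.set_trace_callback(self.queries.append)

    def tearDown(self):
        self.db.close()

    def assertQueryCount(self, expected):
        # On failure, the message includes a dump of the recorded queries.
        self.assertEqual(
            len(self.queries), expected,
            "queries run:\n" + "\n".join(self.queries),
        )

    def test_count(self):
        self.db.execute("SELECT COUNT(*) FROM post").fetchone()
        self.assertQueryCount(1)
```

Pinning the number of queries in a test catches accidental N+1 patterns before they reach production.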

47
Bug Tracking

• Switched from Trac to Redmine
    • We wanted Subtasks
• Emailing exceptions is a bad idea
    • Even if it’s localhost
• Previously using django-db-log to aggregate errors to a single point
• We’ve overhauled db log and are releasing Sentry

48
django-sentry

Groups messages intelligently

http://github.com/dcramer/django-sentry

49
django-sentry (cont’d)

Similar feel to Django’s debugger

http://github.com/dcramer/django-sentry

50
Feature Switches

• We needed a safety in case a feature wasn’t performing well at peak
    • it had to respond without delay, globally, and without writing to disk
• Allows us to work out of trunk (mostly)
• Easy to release new features to a portion of your audience
• Also nice for “Labs” type projects
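
Releasing a feature to a portion of the audience is commonly done by hashing a stable identifier into a bucket. The sketch below is one such approach, not Disqus's implementation: the `SWITCHES` table, the switch names, and `is_active` are hypothetical, and in practice the table would live in a shared in-memory store so a change takes effect everywhere without a deploy.

```python
import hashlib

# Hypothetical switch table: name -> percentage of users (0-100).
SWITCHES = {"new_comment_box": 25, "realtime": 0, "labs_feature": 100}


def is_active(switch, user_id):
    """Return True if the switch is on for this user."""
    percent = SWITCHES.get(switch, 0)  # unknown switches are off
    digest = hashlib.md5(("%s:%s" % (switch, user_id)).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100     # stable bucket in [0, 100)
    return bucket < percent


enabled = sum(1 for uid in range(10000) if is_active("new_comment_box", uid))
```

Hashing the switch name together with the user id keeps each user's experience stable across requests while giving different features independent audiences. Setting a percentage to 0 acts as the kill switch.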

51
Feature Switches (cont’d)

52
Final Thoughts

• The language (usually) isn’t your problem
• We like Django
    • But we maintain local patches
• Some tickets don’t have enough of a following
    • Patches, like #17, completely change Django..
    • ..arguably in a good way
• Others don’t have champions

Ticket #17 describes making the ORM an identity mapper

53
Housekeeping

Birds of a Feather
Want to learn from others about
performance and scaling problems?
Or play some StarCraft 2?

We’re Hiring!

DISQUS is looking for amazing engineers

54
Questions

55
References

django-sentry
http://github.com/dcramer/django-sentry

Our Feature Switches
http://cl.ly/2FYt

Andy McCurdy’s update()
http://github.com/andymccurdy/django-tips-and-tricks

Our PyFlakes Fork
http://github.com/dcramer/pyflakes

SkinnyQuerySet
http://gist.github.com/550438

django-newcache
http://github.com/ericflo/django-newcache

attach_foreignkey (Pythonic Joins)
http://gist.github.com/567356

56
