DjangoCon 2010 Scaling Disqus

Disqus' presentation at DjangoCon 2010. Covers their basic hardware setup, some of their concerns with database usage, and how they manage with a small team of engineers.

Uploaded by

David Cramer

Scaling the World’s Largest Django App

Jason Yan (@jasonyan) and David Cramer (@zeeg)

1
What is DISQUS?

2
What is DISQUS?

dis·cuss • dĭ-skŭs'

We are a comment system with an emphasis on connecting communities

http://disqus.com/about/

3
What is Scale?

[Chart: Number of Visitors, y-axis from 50M to 300M]

Our traffic at a glance

17,000 requests/second peak
450,000 websites
15 million profiles
75 million comments
250 million visitors (August 2010)

4
Our Challenges

• We can’t predict when things will happen

• Random celebrity gossip
• Natural disasters
• Discussions never expire
• We can’t keep those millions of articles from
2008 in the cache
• You don’t know in advance (generally) where the
traffic will be
• Especially with dynamic paging, realtime, sorting,
personal prefs, etc.

5
Our Challenges (cont’d)

• High availability
• Not a destination site
• Difficult to schedule maintenance

6
Server Architecture

7
Server Architecture - Load Balancing
• Load Balancing
• Software, HAProxy
• High performance, intelligent
server availability checking
• Bonus: Nice statistics reporting
• High Availability
• heartbeat

Image Source: http://haproxy.1wt.eu/

8
Server Architecture

• ~100 Servers
• 30% Web Servers (Apache + mod_wsgi)
• 10% Databases (PostgreSQL)
• 25% Cache Servers (memcached)
• 20% Load Balancing / High Availability
(HAProxy + heartbeat)
• 15% Utility Servers (Python scripts)

9
Server Architecture - Web Servers

• Apache 2.2
• mod_wsgi
• Using `maximum-requests` to
plug memory leaks.

• Performance Monitoring
• Custom middleware
(PerformanceLogMiddleware)
• Ships performance statistics
(DB queries, external calls,
template rendering, etc) through
syslog
• Collected and graphed through
Ganglia

10
Server Architecture - Database

• PostgreSQL
• Slony-I for Replication
• Trigger-based
• Read slaves for extra read capacity
• Failover master database for high
availability

11
Server Architecture - Database

• Make sure indexes fit in memory and measure I/O
• High I/O generally means slow queries
due to missing indexes or indexes not in
buffer cache
• Log Slow Queries
• syslog-ng + pgFouine + cron to automate
slow query logging

12
Server Architecture - Database

• Use connection pooling

• Django doesn’t do this for you
• We use pgbouncer
• Limits the maximum number of
connections your database needs to
handle
• Save on costly opening and tearing down
of new database connections
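
As a rough sketch, pointing Django at a local pgbouncer instead of at PostgreSQL directly could look like the settings fragment below. The host, database name, and user are made-up values for illustration, and port 6432 is pgbouncer's default listen port; none of this is taken from the Disqus configuration.

```python
# settings.py fragment (hypothetical values, not the Disqus configuration)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "disqus",       # assumed database name
        "USER": "disqus",       # assumed role
        "HOST": "127.0.0.1",    # pgbouncer running on the web server itself
        "PORT": "6432",         # pgbouncer's default port, not PostgreSQL's 5432
    }
}
```

The application keeps opening and closing connections as before, but those connections now end at the pooler, which reuses a smaller set of real database connections.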

13
Our Data Model

14
Partitioning

• Fairly easy to implement, quick wins

• Done at the application level
• Data is replayed by Slony
• Two methods of data separation

15
Vertical Partitioning
Vertical partitioning involves creating tables with fewer columns
and using additional tables to store the remaining columns.

[Diagram labels: Forums, Posts, Users, Sentry]

http://en.wikipedia.org/wiki/Partition_(database)

16
Pythonic Joins

Allows us to separate datasets

posts = Post.objects.all()[0:25]

# store users in a dictionary based on primary key
users = dict(
    (u.pk, u) for u in
    User.objects.filter(pk__in=set(p.user_id for p in posts))
)

# map users to their posts
for p in posts:
    p._user_cache = users.get(p.user_id)

17
Pythonic Joins (cont’d)

• Slower than at database level

• But not enough that you should care
• Trading performance for scale
• Allows us to separate data
• Easy vertical partitioning
• More efficient caching
• get_many, object-per-row cache
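
One way to read "get_many, object-per-row cache" is sketched below: look up every row's cache key in a single round trip, then go to the database only for the misses. The helper name `fetch_by_pk`, the key format, and the `DictCache` stand-in are all hypothetical; a real version would pass Django's cache object, which offers `get_many` and `set_many`.

```python
def fetch_by_pk(pks, cache, load_from_db, key_fmt="user:%s"):
    """Return {pk: obj}, reading the cache first and the database only for misses."""
    keys = {key_fmt % pk: pk for pk in pks}
    found = cache.get_many(list(keys))            # one round trip for all keys
    result = {keys[k]: v for k, v in found.items()}
    missing = [pk for pk in pks if pk not in result]
    if missing:
        loaded = load_from_db(missing)            # e.g. Model.objects.in_bulk(missing)
        cache.set_many({key_fmt % pk: obj for pk, obj in loaded.items()})
        result.update(loaded)
    return result


class DictCache(object):
    """In-memory stand-in exposing only get_many/set_many."""

    def __init__(self):
        self.data = {}

    def get_many(self, keys):
        return {k: self.data[k] for k in keys if k in self.data}

    def set_many(self, mapping):
        self.data.update(mapping)
```

Because each row has its own key, invalidating one user touches one cache entry rather than every cached page that joined against it.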

18
Designating Masters

• Alleviates some of the write load on your primary application master
• Masters exist under specific conditions:
• application use case
• partitioned data
• Database routers make this (fairly) easy

19
Routing by Application

class ApplicationRouter(object):
    def db_for_read(self, model, **hints):
        instance = hints.get('instance')
        if not instance:
            return None

        app_label = instance._meta.app_label

        return get_application_alias(app_label)

20
Horizontal Partitioning
Horizontal partitioning (also known as sharding) involves splitting
one set of data into different tables.

[Diagram labels: Disqus, Your Blog, CNN, Telegraph]

http://en.wikipedia.org/wiki/Partition_(database)

21
Horizontal Partitions

• Some forums have very large datasets

• Partners need high availability
• Helps scale the write load on the master
• We rely more on vertical partitions

22
Routing by Partition

class ForumPartitionRouter(object):
    def db_for_read(self, model, **hints):
        instance = hints.get('instance')
        if not instance:
            return None

        forum_id = getattr(instance, 'forum_id', None)
        if not forum_id:
            return None

        return get_forum_alias(forum_id)

# What we used to do
Post.objects.filter(forum=forum)

# Now, making sure hints are available
forum.post_set.all()
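
The slide does not show `get_forum_alias`. A minimal sketch, assuming a plain lookup table of forums that have been moved to their own database, might be the following; the forum ids and alias names are invented for illustration.

```python
# Hypothetical table: forums that live on a dedicated database.
FORUM_PARTITIONS = {
    101: "forum_cnn",
    202: "forum_telegraph",
}
DEFAULT_ALIAS = "default"


def get_forum_alias(forum_id):
    """Return the database alias holding this forum's data."""
    return FORUM_PARTITIONS.get(forum_id, DEFAULT_ALIAS)
```

Each alias would have to match a key in Django's `DATABASES` setting for the router's return value to be usable.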

23
Optimizing QuerySets

• We really dislike raw SQL

• It creates more work when dealing with
partitions
• Built-in cache allows sub-slicing
• But isn’t always needed
• We removed this cache

24
Removing the Cache

• Django internally caches the results of your QuerySet

• This adds additional memory overhead

# 1 query
qs = Model.objects.all()[0:100]

# 0 queries (we don’t need this behavior)

qs = qs[0:10]

# 1 query
qs = qs.filter(foo=bar)

• Many times you only need to view a result set once

• So we built SkinnyQuerySet

25
Removing the Cache (cont’d)

Optimizing memory usage by removing the cache

from django.db.models.query import QuerySet

# assumed definition; the slide does not show this exception class
class QuerySetDoubleIteration(Exception):
    pass

class SkinnyQuerySet(QuerySet):
    def __iter__(self):
        if self._result_cache is not None:
            # __len__ must have been run
            return iter(self._result_cache)

        has_run = getattr(self, 'has_run', False)
        if has_run:
            raise QuerySetDoubleIteration("...")
        self.has_run = True
        # We wanted .iterator() as the default
        return self.iterator()

http://gist.github.com/550438

26
Atomic Updates

• Keeps your data consistent

• save() isn’t thread-safe
• use update() instead
• Great for things like counters
• But should be considered for all write
operations

27
Atomic Updates (cont’d)

Thread safety is impossible with .save()

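Request 1

post = Post.objects.get(pk=1)
# a moderator approves
post.approved = True
post.save()

Request 2

post = Post.objects.get(pk=1)
# the author adjusts their message
post.message = 'Hello!'
# save() writes every field, so whichever request saves last
# overwrites the other request's change
post.save()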

28
Atomic Updates (cont’d)

So we need atomic updates

Request 1

post = Post.objects.get(pk=1)
# a moderator approves
Post.objects.filter(pk=post.pk)\
    .update(approved=True)

Request 2

post = Post.objects.get(pk=1)
# the author adjusts their message
Post.objects.filter(pk=post.pk)\
    .update(message='Hello!')
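
The difference between the two approaches can be simulated without a database. In this sketch a dict stands in for the table, `save` writes the whole row the way `Model.save()` does by default, and `update` writes only the named columns the way `QuerySet.update()` does. The helper names are illustrative only.

```python
# One "row" in a stand-in table, keyed by primary key.
table = {1: {"approved": False, "message": "Hi"}}

def load(pk):
    return dict(table[pk])          # each request gets its own copy of the row

def save(pk, instance):
    table[pk] = dict(instance)      # writes every column

def update(pk, **fields):
    table[pk].update(fields)        # writes only the named columns

# Both requests load the row before either one writes.
moderator, author = load(1), load(1)
moderator["approved"] = True
author["message"] = "Hello!"
save(1, moderator)
save(1, author)                     # overwrites approved with the stale False
lost = dict(table[1])

# Same interleaving with column-level updates.
table[1] = {"approved": False, "message": "Hi"}
update(1, approved=True)
update(1, message="Hello!")
kept = dict(table[1])
```

After the whole-row saves the approval is gone; after the column-level updates both changes survive.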

29
Atomic Updates (cont’d)

A better way to approach updates

# ExpressionNode is the class behind F() expressions in the Django of that era
from django.db.models.expressions import ExpressionNode

def update(obj, using=None, **kwargs):
    """
    Updates specified attributes on the current instance.
    """
    assert obj, "Instance has not yet been created."
    obj.__class__._base_manager.using(using)\
        .filter(pk=obj.pk)\
        .update(**kwargs)
    for k, v in kwargs.items():
        if isinstance(v, ExpressionNode):
            # NotImplemented
            continue
        setattr(obj, k, v)

http://github.com/andymccurdy/django-tips-and-tricks/blob/master/model_update.py

30
Delayed Signals

• Queueing low priority tasks

• even if they’re fast
• Asynchronous (Delayed) signals
• very friendly to the developer
• ..but not as friendly as real signals

31
Delayed Signals (cont’d)

We send a specific serialized version of the model for delayed signals

from disqus.common.signals import delayed_save

def my_func(data, sender, created, **kwargs):
    print(data['id'])

delayed_save.connect(my_func, sender=Post)

This is all handled through our Queue
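
The internals of `delayed_save` are not shown in the talk. The general idea can be sketched as follows, with an in-process `deque` standing in for a real message queue and every function name here being hypothetical: at save time only a serialized snapshot is enqueued, and a worker later delivers it to the receivers.

```python
import json
from collections import deque

queue = deque()        # stand-in for a real message queue
receivers = []         # callables registered for the delayed signal

def connect(func):
    receivers.append(func)

def send_delayed(sender_name, instance_data, created):
    """Called at save time: enqueue a serialized snapshot instead of running receivers."""
    queue.append(json.dumps(
        {"sender": sender_name, "data": instance_data, "created": created}))

def run_worker():
    """Called later by a worker process: deliver queued signals to receivers."""
    while queue:
        msg = json.loads(queue.popleft())
        for func in receivers:
            func(data=msg["data"], sender=msg["sender"], created=msg["created"])
```

Receivers get plain data rather than a live model instance, which is why they are less friendly than real signals: anything not included in the snapshot has to be loaded again by the worker.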

32
Caching

• Memcached
• Use pylibmc (newer libMemcached-based)
• Ticket #11675 (add pylibmc support)
• Third party applications:
• django-newcache, django-pylibmc

33
Caching (cont’d)

• libMemcached / pylibmc is configurable with “behaviors”.
• Memcached “single point of failure”
• Distributed system, but we must take
precautions.
• Connection timeout to memcached can stall
requests.
• Use `_auto_eject_hosts` and
`_retry_timeout` behaviors to prevent
reconnecting to dead caches.

34
Caching (cont’d)

• Default (naive) hashing behavior

• The hash of the cache key, modulo the number of servers, is the index
into the server list.
• Removal of a server causes the majority of
cache keys to be remapped to new
servers.

CACHE_SERVERS = ['10.0.0.1', '10.0.0.2']

key = 'my_cache_key'
cache_server = CACHE_SERVERS[hash(key) % len(CACHE_SERVERS)]
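
To see how much a single removal disturbs modulo hashing, the sketch below counts how many of 10,000 made-up keys land on a different server when a four-server list shrinks to three. The extra IP addresses and the use of md5 (chosen so the result is the same on every run) are assumptions for the example.

```python
import hashlib

def stable_hash(key):
    # md5 gives the same value on every run, unlike Python 3's salted hash() for str
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def server_for(key, servers):
    return servers[stable_hash(key) % len(servers)]

def fraction_remapped(keys, before, after):
    moved = sum(1 for k in keys if server_for(k, before) != server_for(k, after))
    return moved / float(len(keys))

keys = ["key:%d" % i for i in range(10000)]
four = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
three = four[:3]                      # the last server is removed
remapped = fraction_remapped(keys, four, three)
```

A key keeps its server only when its hash gives the same remainder modulo 4 and modulo 3, which happens for 3 of every 12 hash values, so roughly three quarters of the keys should move.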

35
Caching (cont’d)

• Better approach: consistent hashing

• libMemcached (pylibmc) uses libketama
(http://tinyurl.com/lastfm-libketama)
• Addition / removal of a cache server remaps (K/n) cache keys
(where K=number of keys and n=number of servers)

Image Source: http://sourceforge.net/apps/mediawiki/kai/index.php?title=Introduction
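
A simplified hash ring illustrates the K/n property. This is a sketch in the spirit of ketama, not libketama's actual algorithm: the `HashRing` class, the 100 points per server, and the md5-based hash are all choices made for the example.

```python
import bisect
import hashlib

def stable_hash(key):
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class HashRing(object):
    def __init__(self, servers, points_per_server=100):
        # Each server is placed at many points on the ring to even out the load.
        self._ring = sorted(
            (stable_hash("%s#%d" % (server, i)), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, key):
        # Walk clockwise to the first point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
keys = ["key:%d" % i for i in range(10000)]
before = HashRing(servers)
after = HashRing(servers[:3])          # 10.0.0.4 is removed
moved = [k for k in keys if before.server_for(k) != after.server_for(k)]
```

The only keys that move are the ones that used to live on the removed server, since the remaining servers keep their points on the ring. Every other cached value stays where it was.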

36
Caching (cont’d)

• Thundering herd (stampede) problem

• Invalidating a heavily accessed cache key causes many
clients to refill cache.
• But everyone refetching to fill the cache from the data
store or reprocessing data can cause things to get even
slower.
• Most times, it’s ideal to return the previously invalidated
cache value and let a single client refill the cache.
• django-newcache or MintCache (http://djangosnippets.org/snippets/793/) will do this for you.
• Filling the cache on invalidation, instead of deleting the key, also
helps to prevent the thundering herd problem.
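
The "return the stale value and let a single client refill" idea can be sketched with a dict-backed cache. This is an illustration of the pattern only, not the code of django-newcache or MintCache; the class name, the `grace` parameter, and the injectable clock are assumptions, and a memcached-backed version would also need a hard expiry longer than the soft one.

```python
import time

class StaleWhileRefreshCache(object):
    """Dict-backed sketch: serve stale values while one caller refreshes."""

    def __init__(self, grace=30, clock=time.time):
        self._data = {}
        self._grace = grace
        self._clock = clock

    def set(self, key, value, timeout):
        # Store the soft expiry time next to the value.
        self._data[key] = (value, self._clock() + timeout)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, refresh_at = entry
        if self._clock() >= refresh_at:
            # First caller after the soft expiry: push the expiry forward so
            # other callers keep getting the stale value, and report a miss
            # so that this caller recomputes the value and calls set().
            self._data[key] = (value, self._clock() + self._grace)
            return None
        return value
```

Only the caller that sees `None` hits the data store; everyone else is served the old value until the refreshed one is set or the grace period runs out.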

37
Transactions

• TransactionMiddleware got us started, but down the road became a burden
• For postgresql_psycopg2, there’s a database
option, OPTIONS['autocommit']
• Each query is in its own transaction. This
means each request won’t start in a
transaction.
• But sometimes we want transactions
(e.g., saving multiple objects and rolling
back on error)

38
Transactions (cont’d)

• Tips:
• Use autocommit for read slave databases.
• Isolate slow functions (e.g., external calls,
template rendering) from transactions.
• Selective autocommit
• Most read-only views don’t need to be
in transactions.
• Start in autocommit and switch to a
transaction on write.
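
"Start in autocommit and switch to a transaction on write" can be shown with the standard library's sqlite3 module as a stand-in for PostgreSQL. The table and function names are invented for the example, and this is not the Django API the slides refer to.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode:
# every statement takes effect immediately unless we issue BEGIN ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, message TEXT NOT NULL)")

def count_posts():
    # Read-only work: no explicit transaction needed.
    return conn.execute("SELECT COUNT(*) FROM post").fetchone()[0]

def save_posts(messages):
    # Multi-row write: switch to an explicit transaction so it is all-or-nothing.
    conn.execute("BEGIN")
    try:
        for message in messages:
            conn.execute("INSERT INTO post (message) VALUES (?)", (message,))
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")

save_posts(["first", "second"])
try:
    save_posts(["third", None])      # None violates NOT NULL, so "third" is rolled back too
except sqlite3.IntegrityError:
    pass
```

The read path never opens a transaction, while the failed second write leaves no partial rows behind.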

39
Scaling the Team

• Small team of engineers

• Monthly users / developers = 40m
• Which means writing tests..
• ..and having a dead simple workflow

40
Keeping it Simple

• A developer can be up and running in a few minutes
• assuming postgres and other server
applications are already installed
• pip, virtualenv
• settings.py

41
Setting Up Local

1. createdb -E UTF-8 disqus

2. git clone git://repo
3. mkvirtualenv disqus
4. pip install -U -r requirements.txt
5. ./manage.py syncdb && ./manage.py migrate

42
Sane Defaults

settings.py

from disqus.conf.settings.default import *

try:
    from local_settings import *
except ImportError:
    import sys, traceback
    sys.stderr.write("Can't find 'local_settings.py'\n")
    sys.stderr.write("\nThe exception was:\n\n")
    traceback.print_exc()

local_settings.py

from disqus.conf.settings.dev import *

43
Continuous Integration

• Daily deploys with Fabric

• several times an hour on some days
• Hudson keeps our builds going
• combined with Selenium
• Post-commit hooks for quick testing
• like Pyflakes
• Reverting to a previous version is a matter of
seconds

44
Continuous Integration (cont’d)

Hudson makes integration easy

45
Testing

• It’s not fun breaking things when you’re the new guy
• Our testing process is fairly heavy
• 70k (Python) LOC, 73% coverage, 20 min suite
• Custom Test Runner (unittest)
• We needed XML, Selenium, Query Counts
• Database proxies (for read-slave testing)
• Integration with our Queue

46
Testing (cont’d)

Query Counts

# failures yield a dump of queries
def test_read_slave(self):
    Model.objects.using('read_slave').count()
    self.assertQueryCount(1, 'read_slave')

Selenium

def test_button(self):
    self.selenium.click('//a[@class="dsq-button"]')

Queue Integration

class WorkerTest(DisqusTest):
    workers = ['fire_signal']

    def test_delayed_signal(self):
        ...
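
The Disqus test runner itself is not shown. As a sketch of how a query-count assertion could work, the test case below records statements with sqlite3's trace hook from the standard library. The class names are hypothetical, and unlike the slide's `assertQueryCount` this version takes no database alias.

```python
import sqlite3
import unittest

class QueryCountTestCase(unittest.TestCase):
    """Records every SQL statement sent through self.conn during a test."""

    def setUp(self):
        self.conn = sqlite3.connect(":memory:", isolation_level=None)
        self.conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY)")
        self.queries = []
        # sqlite3 calls this hook with the SQL text of each executed statement.
        self.conn.set_trace_callback(self.queries.append)

    def tearDown(self):
        self.conn.close()

    def assertQueryCount(self, expected):
        # On failure, the message includes a dump of the recorded queries.
        self.assertEqual(
            len(self.queries), expected,
            "expected %d queries, got:\n%s" % (expected, "\n".join(self.queries)))

class PostTests(QueryCountTestCase):
    def test_count_is_one_query(self):
        self.conn.execute("SELECT COUNT(*) FROM post").fetchone()
        self.assertQueryCount(1)
```

Pinning the query count in a test turns an accidental extra query into a failing build instead of a production slowdown.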

47
Bug Tracking

• Switched from Trac to Redmine

• We wanted Subtasks
• Emailing exceptions is a bad idea
• Even if it’s localhost
• Previously using django-db-log to aggregate
errors to a single point
• We’ve overhauled db log and are releasing
Sentry

48
django-sentry

Groups messages intelligently

http://github.com/dcramer/django-sentry

49
django-sentry (cont’d)

Similar feel to Django’s debugger

http://github.com/dcramer/django-sentry

50
Feature Switches

• We needed a safety in case a feature wasn’t performing well at peak
• it had to respond without delay, globally,
and without writing to disk
• Allows us to work out of trunk (mostly)
• Easy to release new features to a portion of
your audience
• Also nice for “Labs” type projects
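
Releasing a feature to a portion of the audience can be sketched with a percentage per switch and a stable hash per user. The switch names, the `SWITCHES` table, and `is_active` are invented for the example and are not the Disqus implementation. A real deployment would also need the table in shared storage so that a change takes effect on every server at once.

```python
import hashlib

# Hypothetical switch table: name -> percentage of users who get the feature.
SWITCHES = {"realtime": 100, "new_sorting": 25, "labs_widget": 0}

def is_active(name, user_id):
    """Return True if the switch is on for this user. Unknown switches are off."""
    percent = SWITCHES.get(name, 0)
    # Hash name and user together so each user gets a stable answer per feature.
    digest = hashlib.md5(("%s:%s" % (name, user_id)).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100          # 0..99
    return bucket < percent
```

Setting a percentage to 0 turns the feature off for everyone without a deploy, which is the safety valve described above.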

51
Feature Switches (cont’d)

52
Final Thoughts

• The language (usually) isn’t your problem

• We like Django
• But we maintain local patches
• Some tickets don’t have enough of a following
• Patches, like #17, completely change
Django..
• ..arguably in a good way
• Others don’t have champions
Ticket #17 describes making the ORM an identity mapper

53
Housekeeping

Birds of a Feather
Want to learn from others about
performance and scaling problems?
Or play some StarCraft 2?

We’re Hiring!

DISQUS is looking for amazing engineers

54
Questions

55
References

django-sentry
http://github.com/dcramer/django-sentry

Our Feature Switches
http://cl.ly/2FYt

Andy McCurdy’s update()
http://github.com/andymccurdy/django-tips-and-tricks

Our PyFlakes Fork
http://github.com/dcramer/pyflakes

SkinnyQuerySet
http://gist.github.com/550438

django-newcache
http://github.com/ericflo/django-newcache

attach_foreignkey (Pythonic Joins)
http://gist.github.com/567356
