Scribd Architecture Overview
X1 Tech Talks
Dmytro Shteflyuk
04/09/2009
Who Am I?
Ruby developer at Scribd.com
Lazy blogger at kpumuk.info
Experienced in Ruby & Rails, ASP.NET, MySQL, Sphinx, etc.
Author of several projects
Sphinx Ruby API maintainer
What Is Scribd.com
Social document sharing
The largest Rails site on the Net
Ranked 65th on Quantcast (ahead of Digg)
53.5M visitors, 178M page views
10.5M users, 14M documents, over 1PB
15 app, 17 db, 7 search, 3 web, 4 proxy boxes
Online Viewer
Groups
Partners
Desktop Uploader
The Big Picture
Nginx
Delivers static content
Handles file uploads
Selects app cluster (main, api, etc.)
Forwards doc page requests to Squid
Forwards all other requests to HAProxy
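
The routing decisions listed above can be sketched as a small function. This is only an illustration in Python, not Scribd's actual Nginx configuration: the URL patterns, the `/upload` path, the `api.` host prefix, and the names `route` and `select_cluster` are all assumptions made for the example.

```python
import re

# Hypothetical URL patterns; the slides do not give the real ones.
STATIC_RE = re.compile(r"^/(images|stylesheets|javascripts)/")
DOC_PAGE_RE = re.compile(r"^/doc/\d+")


def select_cluster(host):
    """Pick an app cluster from the Host header (cluster names are illustrative)."""
    return "api" if host.startswith("api.") else "main"


def route(host, path):
    """Return (handler, cluster) for a request, mirroring the Nginx duties above."""
    if STATIC_RE.match(path):
        return ("static", None)      # static content is served by Nginx itself
    if path == "/upload":
        return ("upload", None)      # file uploads are handled by Nginx
    cluster = select_cluster(host)
    if DOC_PAGE_RE.match(path):
        return ("squid", cluster)    # document pages go through the cache
    return ("haproxy", cluster)      # everything else goes to the load balancer
```

For example, `route("www.scribd.com", "/doc/123/title")` selects Squid for the main cluster, while `route("api.scribd.com", "/api")` goes straight to HAProxy for the api cluster.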
HAProxy
Performs load balancing among application servers
That’s all - as easy as pie :-)
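
To make the idea of load balancing concrete, here is a minimal round-robin sketch in Python. The slides do not say which balancing algorithm HAProxy is configured with, so round-robin is an assumption, and the class name, server names, and health-marking methods are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out application servers in turn, skipping ones marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        if not self.healthy:
            raise RuntimeError("no healthy application servers")
        # One full pass over the ring is enough to reach a healthy server.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy application servers")
```

With three servers, successive `pick()` calls return them in order and wrap around; a server marked down is skipped until `mark_up` is called for it.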
Squid
Caches all document pages for bots and anonymous users
Forwards requests to HAProxy
Allows gracefully clearing the whole cache
Clears individual cached pages on request (HTCP)
Handles 90% of Scribd traffic!
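
The caching policy above (cache only for bots and anonymous users, purge single pages on request, clear everything when needed) can be sketched as follows. This is an in-memory Python illustration, not Squid's behaviour: the `PageCache` class and its methods are hypothetical, `clear_all` simply drops every entry and does not model how a graceful clear works, and HTCP itself is not implemented.

```python
class PageCache:
    """Caches rendered document pages for bots and anonymous visitors only."""

    def __init__(self):
        self._pages = {}

    @staticmethod
    def is_cacheable(logged_in, is_bot):
        # Logged-in humans always get a freshly rendered page.
        return is_bot or not logged_in

    def fetch(self, url, logged_in, is_bot, render):
        """Return the page for url, calling render(url) only on a cache miss."""
        if not self.is_cacheable(logged_in, is_bot):
            return render(url)
        if url not in self._pages:
            self._pages[url] = render(url)
        return self._pages[url]

    def purge(self, url):
        """Drop one cached page, e.g. after the document changes."""
        self._pages.pop(url, None)

    def clear_all(self):
        """Drop every cached page."""
        self._pages = {}
```

Here `render` stands in for the request that would otherwise be forwarded to HAProxy and the application servers.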
MySQL
All writes to master
Almost all reads from slaves
Texts are in separate DB (sharded)
All tables are in InnoDB
MySQL 5.0 / 5.1 with Percona patches
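
A read/write split like the one described above can be sketched as a small connection router. This is a Python illustration under stated assumptions, not the code Scribd runs: the `ConnectionRouter` name, the `force_master` flag (standing in for the reads that do not go to slaves), random slave selection, and modulo sharding of texts by document id are all hypothetical.

```python
import random


class ConnectionRouter:
    """Chooses a database for each query: master for writes, a slave for reads."""

    def __init__(self, master, slaves, text_shards, rng=None):
        self.master = master
        self.slaves = list(slaves)
        self.text_shards = list(text_shards)
        self.rng = rng or random.Random()

    def for_query(self, sql, force_master=False):
        # Treat SELECT/SHOW as reads; everything else is a write.
        is_read = sql.lstrip().lower().startswith(("select", "show"))
        if is_read and not force_master and self.slaves:
            return self.rng.choice(self.slaves)
        return self.master

    def text_shard(self, document_id):
        # Assumed sharding scheme: document id modulo the number of shards.
        return self.text_shards[document_id % len(self.text_shards)]
```

A read that must see a just-committed write would pass `force_master=True`, which is one plausible reason why only "almost all" reads go to slaves.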
Application Boxes