Scaling Rails Presentation (From Scribd Launch)

This document summarizes a presentation about scaling Ruby on Rails applications. It discusses how fragment caching can provide a significant performance boost. It also describes how the company Scribd built their own traffic analytics system to gain insights beyond basic analytics tools. The presentation provides code examples for implementing fragment caching, expiring cached fragments, and connecting Rails applications to multiple databases for analytics queries.

Copyright
© Attribution Non-Commercial (BY-NC)

Yet Another Rails Scaling Presentation
Ruby on Rails Meetup
May 10, 2007
Jared Friedman ([email protected]) and Tikhon Bernstam ([email protected])
Should you bother with scaling?
- Well, it depends
- But if you’re launching a startup, probably
  - The best way to launch a startup these days is to get it on TechCrunch, Digg, Reddit, etc.
  - You don’t get as much time to grow organically as you used to
  - You only get one launch; you don’t want your site to fall over
The Predecessors
- Other great places to look for info on this:
  - poocs.net, “The Adventures of Scaling Rails”: http://poocs.net/2006/3/13/the-adventures-of-scaling-stage-1
  - Stefan Kaes, “Performance Rails”: http://railsexpress.de/blog/files/slides/rubyenrails2006.pdf
  - RobotCoop blog and gems: http://www.robotcoop.com/articles/2006/10/10/the-software-and-hardware-that-runs-our-sites
  - O’Reilly book “High Performance MySQL”
    - It’s not Rails, but it’s really useful
Big Picture
- This presentation concentrates on what’s different from previous writings; it is not a comprehensive overview
- Available at http://www.scribd.com/blog
Who we are
- Scribd.com
  - Like “YouTube for documents”
  - Launched in March 2007
  - Handles ~1M requests per day
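For a sense of scale, ~1M requests per day works out to a modest average rate once it is divided by the 86,400 seconds in a day. This is only an average; a launch-day spike from Digg or TechCrunch will push the peak well above it.

```latex
\frac{1{,}000{,}000\ \text{requests}}{86{,}400\ \text{s/day}} \approx 11.6\ \text{requests/s}
```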
Key Points
- General architecture
- Use fragment caching!
- Rolling your own traffic analytics and some SQL tips
Current Scribd architecture
- 1 Web Server
- 3 Database Servers
- 3 Document conversion servers
- Test and backup machines
- Amazon S3
Server Hardware
- Dual-socket, dual-core Woodcrest Xeons at 3 GHz
- 16GB of memory
- Four 15K RPM SCSI hard drives in RAID 10
- We learned: disk speed is important
  - Don't skimp; you’re not Google, and it's easier to scale up than out
- SoftLayer is a great dedicated hosting company
Various software details
- CentOS
- Apache/Mongrel
- Memcached, RobotCoop’s memcache-client
- Stefan Kaes’ SQLSessionStore
  - Best way to store persistent sessions
- Monit, Capistrano
- Postfix
Fragment Caching
- "We don’t use any page or fragment caching." - robotcoop
- "Play with fragment caching ... no improvement, changes were reverted at a later time." - poocs.net
- Well, maybe it's application-specific
- Scribd uses fragment caching extensively, with an enormous performance improvement
(screenshot)
How to Use Fragment Caching
- Ignore all but the most frequently accessed pages
- Look for pieces of the page that don't change on every page view and are expensive to compute
- Just wrap them in a cache block:
    <% cache('keyname') do %>
      ...
    <% end %>
- Do a timing test before and afterwards; back out the change unless you see significant performance gains
- We see gains of more than 10x
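The idea behind a `cache('keyname') do ... end` block is compute once, then reuse: the first request renders the block and stores the resulting HTML under the key, and later requests skip the expensive work entirely. Here is a minimal plain-Ruby sketch of that behavior, assuming a hypothetical `FragmentCache` class with a Hash standing in for the real store. It is not Rails' implementation, which writes to whatever fragment store is configured (e.g., memcached).

```ruby
# Minimal sketch of fragment caching semantics (not Rails' implementation).
# FragmentCache is a hypothetical class; a Hash stands in for memcached.
class FragmentCache
  def initialize
    @store = {}
  end

  # Return the cached fragment for key, rendering and storing it on a miss.
  def cache(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

renders = 0
fragments = FragmentCache.new
2.times do
  html = fragments.cache("popular_documents") do
    renders += 1                      # stands in for an expensive query + render
    "<ul><li>Document A</li></ul>"
  end
  puts html
end
puts "rendered #{renders} time(s)"   # the block ran only once
```

The second iteration returns the stored string without running the block. That saving is what the before/after timing test should confirm on a real page.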
Expiring fragments, 1. Time based
- You should really use memcached for storing fragments
  - Better performance
  - Easier to scale to multiple servers
  - Most important: allows time-based expiration
- Use the plugin http://agilewebdevelopment.com/plugins/memcache_fragments_with_time_expiry
- Dead easy:
    <% cache 'keyname', :expire => 10.minutes do %>
      ...
    <% end %>
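With time-based expiration, each stored fragment carries a deadline, and a read after that deadline counts as a miss that triggers a re-render. memcached supports this natively because its set command takes an expiration time. The sketch below shows the same logic in plain Ruby. It uses a hypothetical `ExpiringFragmentCache` class (not the plugin's code) with an injectable clock, so expiry can be demonstrated without sleeping; 600 seconds corresponds to `10.minutes`.

```ruby
# Sketch of time-based fragment expiry (hypothetical class, not the plugin's code).
# Each entry stores [html, expires_at]; reads at or past expires_at are misses.
class ExpiringFragmentCache
  # clock: any callable returning the current time in seconds.
  def initialize(clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @store = {}
  end

  def cache(key, expire:)
    html, expires_at = @store[key]
    return html if expires_at && @clock.call < expires_at
    html = yield
    @store[key] = [html, @clock.call + expire]
    html
  end
end

now = 0
fragments = ExpiringFragmentCache.new(-> { now })
renders = 0
render = -> { renders += 1; "<p>render ##{renders}</p>" }

fragments.cache("keyname", expire: 600) { render.call }  # miss: renders
now = 599
fragments.cache("keyname", expire: 600) { render.call }  # still fresh: served from cache
now = 600
fragments.cache("keyname", expire: 600) { render.call }  # expired: renders again
puts renders
```

Time-based expiry trades a bounded amount of staleness (here up to 10 minutes) for not having to track every place where the underlying data changes.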
Expiring fragments, 2. Manually
- No need to serve stale data
- Just use:
    Cache.delete("fragment:/partials/whatever")
- Clear fragments whenever data changes
- Again, easier with memcached
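Manual expiry ties invalidation to writes: whatever code changes the underlying data also deletes the fragments built from it, so the next request re-renders fresh HTML. Below is a plain-Ruby sketch that assumes a Hash-backed `CACHE`, a hypothetical `Document` model, and a hypothetical key `fragment:/partials/document_list`. In a Rails app the deletion would more typically sit in a model callback such as `after_save` or in a cache sweeper.

```ruby
# Sketch: delete dependent fragments whenever the data behind them changes.
# CACHE, Document and the key below are hypothetical stand-ins.
CACHE = {}
DOCUMENT_LIST_KEY = "fragment:/partials/document_list"

class Document
  @all = []

  class << self
    attr_reader :all

    # Every write path calls expire_fragments after changing data.
    def create(title)
      doc = new(title)
      @all << doc
      expire_fragments
      doc
    end

    # Same role as Cache.delete("fragment:/partials/whatever").
    def expire_fragments
      CACHE.delete(DOCUMENT_LIST_KEY)
    end
  end

  attr_reader :title

  def initialize(title)
    @title = title
  end
end

# Render the list only on a cache miss.
def document_list_html
  CACHE[DOCUMENT_LIST_KEY] ||=
    "<ul>" + Document.all.map { |d| "<li>#{d.title}</li>" }.join + "</ul>"
end

Document.create("First")
puts document_list_html    # renders and caches the list
Document.create("Second")  # the write clears the stale fragment
puts document_list_html    # re-rendered, now includes "Second"
```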
Traffic Analytics
- Google Analytics is nice, but there are a lot of reasons to roll your own traffic analytics too
  - It can be much more powerful
  - You can write SQL to answer arbitrary questions
  - You can expose the data to users
Scribd’s analytics
(screenshots)
Building traffic analytics, part 1
    # Migration wrapping the page_views table definition
    class CreatePageViews < ActiveRecord::Migration
      def self.up
        create_table "page_views" do |t|
          t.column "user_id", :integer
          t.column "request_url", :string, :limit => 200
          t.column "session", :string, :limit => 32
          t.column "ip_address", :string, :limit => 16
          t.column "referer", :string, :limit => 200
          t.column "user_agent", :string, :limit => 200
          t.column "created_at", :timestamp
        end
      end

      def self.down
        drop_table "page_views"
      end
    end
- Add a whole bunch of indexes, depending on your queries
Building traffic analytics, part 2
- Create a PageView on every request
  - We used a hand-built SQL query to take out the ActiveRecord overhead on this
  - We might try MySQL’s “insert delayed”
- Analytics queries are usually hand-coded SQL
  - Use “explain select” to make sure MySQL is using the indexes you expect
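A hand-built insert skips creating an ActiveRecord object on every request. Here is a sketch of building that statement for the page_views table shown above. It truncates strings to the schema's column limits and can optionally emit MySQL's INSERT DELAYED. `sql_quote` is a hypothetical helper doing only basic MySQL-style escaping; in a Rails app you would use `ActiveRecord::Base.connection.quote` and run the result with `ActiveRecord::Base.connection.execute`, for example from an `after_filter`. Be aware that INSERT DELAYED only applies to MyISAM-style engines and is ignored by MySQL 5.7 and later.

```ruby
# Sketch of a hand-built INSERT for page_views, bypassing ActiveRecord objects.
# Column limits mirror the schema; nil means no length limit.
PAGE_VIEW_COLUMNS = {
  "user_id"     => nil,
  "request_url" => 200,
  "session"     => 32,
  "ip_address"  => 16,
  "referer"     => 200,
  "user_agent"  => 200,
}.freeze

# Hypothetical helper: basic MySQL-style quoting, for illustration only.
# In Rails, prefer ActiveRecord::Base.connection.quote(value).
def sql_quote(value)
  return "NULL" if value.nil?
  return value.to_s if value.is_a?(Integer)
  escaped = value.to_s.gsub("\\") { "\\\\" }.gsub("'") { "\\'" }
  "'#{escaped}'"
end

# Build the INSERT; delayed: true emits MySQL's INSERT DELAYED.
def page_view_insert_sql(attrs, delayed: false)
  values = PAGE_VIEW_COLUMNS.map do |column, limit|
    value = attrs[column.to_sym]
    value = value.to_s[0, limit] if limit && !value.nil?  # truncate to column size
    sql_quote(value)
  end
  verb = delayed ? "INSERT DELAYED" : "INSERT"
  "#{verb} INTO page_views (#{PAGE_VIEW_COLUMNS.keys.join(', ')}, created_at) " \
    "VALUES (#{values.join(', ')}, NOW())"
end

puts page_view_insert_sql({
  user_id: 42, request_url: "/docs/123", session: "abc123",
  ip_address: "10.0.0.1", referer: nil, user_agent: "Mozilla/5.0"
})
```

Prefixing an analytics SELECT with EXPLAIN in the same way lets you check which index MySQL picks before the query goes into production.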
Building Traffic Analytics, part 3
- Scales pretty well
- BUT analytics queries are expensive and can clog up the main DB server
- Our solution:
  - Use two DB servers in a master/slave setup
  - Move all the analytics queries to the slave
Rails with multiple databases, part 1
- "At this point in time there’s no facility in Rails to talk to more than one database at a time." - Alex Payne, Twitter developer
  - Well, that's true
  - But setting things up yourself takes about 10 lines of code
- There are now also two great plugins for doing this:
  - Magic multi-connections: http://magicmodels.rubyforge.org/magic_multi_connections/
  - Acts as read onlyable: http://rubyforge.org/frs/?group_id=3451
Rails with multiple databases, part 2
- At Scribd we use this to send pre-defined expensive queries to a slave
  - This can be very important for dealing with lock contention issues
- You could also do automatic load balancing, but synchronization becomes more complicated (read a SQL book; it's not a Rails issue)
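To send pre-defined expensive queries to a slave, you can keep an explicit list of those queries and route only them to the slave connection. Every other read and all writes stay on the master. The sketch below uses hypothetical query names and a hypothetical `FakeConnection` class so it runs outside Rails; in a real app the two connections would be `ActiveRecord::Base.connection` and the `Slave1.connection` from the setup that follows.

```ruby
# Sketch: route a fixed set of expensive analytics queries to the slave.
# FakeConnection, QueryRouter and the query names are hypothetical stand-ins.
class FakeConnection
  attr_reader :name, :executed

  def initialize(name)
    @name = name
    @executed = []
  end

  def execute(sql)
    @executed << sql
    "#{name}: #{sql}"
  end
end

class QueryRouter
  # Pre-defined queries that tolerate a slightly lagging slave.
  SLAVE_QUERIES = {
    top_referers: "SELECT referer, COUNT(*) AS hits FROM page_views " \
                  "GROUP BY referer ORDER BY hits DESC LIMIT 10",
    daily_views:  "SELECT DATE(created_at) AS day, COUNT(*) FROM page_views " \
                  "GROUP BY DATE(created_at)",
  }.freeze

  def initialize(master:, slave:)
    @master = master
    @slave = slave
  end

  # Run a named analytics query on the slave (KeyError for unknown names).
  def analytics(name)
    @slave.execute(SLAVE_QUERIES.fetch(name))
  end

  # Everything else, including all writes, goes to the master.
  def execute(sql)
    @master.execute(sql)
  end
end

master = FakeConnection.new("master")
slave  = FakeConnection.new("slave")
router = QueryRouter.new(master: master, slave: slave)

router.execute("INSERT INTO page_views (user_id) VALUES (1)")
router.analytics(:top_referers)
puts "master ran #{master.executed.size}, slave ran #{slave.executed.size}"
```

Routing only a known, replication-lag-tolerant set of queries avoids the synchronization problems that automatic load balancing raises, because nothing that must see its own recent writes ever reads from the slave.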
Rails with multiple databases, code
- In database.yml:
    slave1:
      adapter: mysql
      host: 18.48.43.29 # your slave's IP
      database: production
      username: root
      password: pass
- Define a model in slave1.rb:
    class Slave1 < ActiveRecord::Base
      self.abstract_class = true
      establish_connection :slave1
    end
- When you need to run a query on the slave, just call:
    Slave1.connection.execute("select * from some_table")
Shameless Self-Promotion
- Scribd.com: VC-backed and hiring
  - Just 3 people so far! More than 10 by the end of the year.
  - Awesome salary/equity combination
  - If you’re reading this, you’re probably the right kind of person
- Building the world's largest open document library
- Email: [email protected]
