Inside Livejournal Backend
or,
“holy hell that's a lot of hits!”
April 2004
Brad Fitzpatrick
[email protected]
Danga Interactive
danga.com / livejournal.com
● $u = LJ::load_user("brad")
– hits global cluster
– $u object contains its clusterid
● $dbcm = LJ::get_cluster_master($u)
– writes
– definitive reads
● $dbcr = LJ::get_cluster_reader($u)
– reads
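The lookup flow above can be sketched roughly as follows. This is a Python stand-in, not the LJ Perl code: the `GLOBAL_USERS` table and the cluster ids in it are made up for illustration, and the returned strings only borrow the `cluster<n>` / `cluster<n>slave` role naming that appears later in the deck.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the global cluster's "which user lives where" table.
GLOBAL_USERS = {"brad": 3, "whitaker": 7}


@dataclass
class User:
    name: str
    clusterid: int


def load_user(name):
    """Hit the global cluster; the returned object carries its cluster id."""
    return User(name, GLOBAL_USERS[name])


def get_cluster_master(u):
    """Role to use for writes and definitive reads."""
    return f"cluster{u.clusterid}"


def get_cluster_reader(u):
    """Role to use for ordinary reads, which may lag the master."""
    return f"cluster{u.clusterid}slave"


u = load_user("brad")
print(get_cluster_master(u), get_cluster_reader(u))
```

The point of the split is that callers decide per query whether they can tolerate replication lag, rather than the library guessing.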
User Clusters
● per-user numberspaces
– can't use AUTO_INCREMENT
– avoid it also on final column in multi-col index:
(MyISAM-only feature)
● CREATE TABLE foo (userid INT, postid INT
AUTO_INCREMENT, PRIMARY KEY (userid, postid))
● moving users around clusters
– balance disk I/O
– balance disk space
– monitor everything
● cricket
● nagios
● ...whatever works
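One way to get per-user numberspaces without AUTO_INCREMENT is a small counter table keyed by user. The sketch below uses SQLite from Python purely so it runs anywhere; the `counter` and `post` tables and the `new_post` helper are assumptions for illustration, not the LiveJournal schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# One row per user holding the highest id handed out so far (hypothetical table).
db.execute("CREATE TABLE counter (userid INTEGER PRIMARY KEY, maxid INTEGER NOT NULL)")
# Ids are only unique per user, hence the composite primary key.
db.execute(
    "CREATE TABLE post (userid INTEGER, postid INTEGER, body TEXT,"
    " PRIMARY KEY (userid, postid))"
)


def new_post(db, userid, body):
    """Allocate the next postid in this user's numberspace and store the post."""
    with db:  # single transaction: commit on success, roll back on error
        cur = db.execute(
            "UPDATE counter SET maxid = maxid + 1 WHERE userid = ?", (userid,)
        )
        if cur.rowcount == 0:  # first allocation for this user
            db.execute("INSERT INTO counter (userid, maxid) VALUES (?, 1)", (userid,))
        (postid,) = db.execute(
            "SELECT maxid FROM counter WHERE userid = ?", (userid,)
        ).fetchone()
        db.execute(
            "INSERT INTO post (userid, postid, body) VALUES (?, ?, ?)",
            (userid, postid, body),
        )
    return postid


print(new_post(db, 1, "first"), new_post(db, 1, "second"), new_post(db, 2, "hello"))
```

Because the numbering travels with the user's rows, moving a user to another cluster does not collide with ids already allocated there.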
Subclusters
Solution?
Master-Master Clusters!
– two identical machines per cluster
● both “good” machines
– do all reads/writes to one at a time, both
replicate from each other
– intentionally only use half our DB hardware at a
time to be prepared for crashes
– easy maintenance by flipping the active in pair
– no points of failure
Master-Master Prereqs
● failover can't break replication, be it:
– automatic (be prepared for flapping)
– by hand (probably have other problems)
● fun/tricky part is number allocation
– same number allocated on both machines in a pair
– cross-replicate, explode.
● strategies
– odd/even numbering (a=odd, b=even)
● if numbering is public, users suspicious
– where's my missing _______ ?
– solution: prevent enumeration. add gibberish 'anum' = rand(256).
visiblenum = (realid << 8) + anum. verify/store the
anum
– 3rd party arbitrator for synchronization
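The odd/even and 'anum' strategies above can be sketched in a few lines of Python. The function names and the `stored` dict are hypothetical; only the arithmetic (a = odd, b = even, and visiblenum = (realid << 8) + anum) comes from the slide.

```python
import random


def next_id(last_id, side):
    """Next id above last_id with the right parity: side 'a' odd, side 'b' even."""
    want = 1 if side == "a" else 0
    nxt = last_id + 1
    return nxt if nxt % 2 == want else nxt + 1


def new_anum():
    """Random 8-bit value stored alongside the real id."""
    return random.randrange(256)


def visible_num(realid, anum):
    """Public number: real id in the high bits, anum in the low byte."""
    return (realid << 8) + anum


def split_visible(visiblenum):
    """Recover (realid, anum) from a public number."""
    return visiblenum >> 8, visiblenum & 0xFF


def verify(visiblenum, stored):
    """Accept a public number only if its anum matches the one stored for that id."""
    realid, anum = split_visible(visiblenum)
    return stored.get(realid) == anum


stored = {5: 17}  # hypothetical realid -> anum mapping
print(visible_num(5, 17), verify(visible_num(5, 17), stored))
```

Since the low byte has to match what was stored, walking the public numbers sequentially mostly hits invalid values, which hides the gaps that odd/even allocation leaves.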
Cold Co-Master
● inactive pair isn't getting reads
● after switching active machine, caches full,
but not useful (few min to hours)
● switch at night, or
● sniff reads on active pair, replay to inactive
guy
Summary Thus Far
● Dual BIG-IPs (or LVS+heartbeat, or..)
● 30-40 web servers
● 1 “global cluster”:
– non-user/multi-user data
– what user is where?
– master-slave (lame)
● point of failure; only cold spares
● pretty small dataset (<4 GB)
– MySQL cluster looks potentially interesting
– or master-election
● bunch of “user clusters”:
– master-slave (old ones)
– master-master (new ones)
● ...
Static files...
Directory
Dynamic vs. Static Content
● static content
– images, CSS
– TUX, epoll-thttpd, etc. w/ thousands conns
– boring, easy
● dynamic content
– session-aware
● site theme
● browsing language
– security on items
– deal with heavy processes
● CDN (Akamai / Speedera)
– static easier, APIs to invalidate
– security: origin says 403 or 304
Misc MySQL Machines (Mmm...)
Directory
MyISAM vs. InnoDB
● We use both
● This is all nicely documented on mysql.com
● MyISAM
– fast for reading xor writing,
– bad concurrency, compact,
– no foreign keys, constraints, etc
– easy to admin
● InnoDB
– ACID
– good concurrency
● Mix-and-match. Design for both.
Directory & InnoDB
● Directory Search
– multi-second queries
– many at once
– InnoDB!
– replicates subset of tables from global cluster
– some data on both global and user
● write to both
● read from directory for searching
● read from user cluster when loading user data
Postfix & MySQL
● Postfix
– 4 servers: postfix + mysql maps
– replicating one table: email_aliases
● Secondary Mail Queue
– async job system
– random cluster master
– serialize message.
Logging to MySQL
● mod_perl logging handler
● new table per hour
– MyISAM
● Apache access logging off
– diskless web nodes, PXE boot
– apache error logs through syslog-ng
● INSERT DELAYED
– increase your insert buffer if querying
● minimal/no indexes
– table scans are fine
● background job doing log analysis/rotation
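A rough sketch of the table-per-hour idea, in Python rather than the mod_perl handler the slide refers to. The `access` table prefix and the column names are invented for illustration; `INSERT DELAYED` and MyISAM are the MySQL features named above. The code only builds SQL strings and does not talk to a server.

```python
from datetime import datetime, timezone


def log_table_name(ts):
    """One table per hour, e.g. access2004041409 (the prefix is an assumption)."""
    return ts.strftime("access%Y%m%d%H")


def create_table_sql(table):
    """MyISAM table with no indexes; the analysis job simply scans it."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "whn DATETIME, uri VARCHAR(255), status SMALLINT"
        ") ENGINE=MyISAM"
    )


def insert_sql(table):
    """INSERT DELAYED queues the row server-side so the web request does not wait."""
    return f"INSERT DELAYED INTO {table} (whn, uri, status) VALUES (%s, %s, %s)"


now = datetime(2004, 4, 14, 9, 30, tzinfo=timezone.utc)
table = log_table_name(now)
print(create_table_sql(table))
print(insert_sql(table))
```

Rotation then amounts to dropping or archiving whole tables, which is cheap compared with deleting old rows from one large table.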
Load Balancing!
Web Load Balancing
● slow client problem (hogging mod_perl/php)
● BIG-IP [mostly] packet-level
● doesn't buffer HTTP responses
● BIG-IP can't adjust server weighting quickly
enough
– few ms to multiple seconds responses
● mod_perl broadcasting state
– Inline.pm to Apache scoreboard
● mod_proxy+mod_rewrite
– external rewrite map (listening to mod_perl
broadcasts)
– map destination is [P] (mod_proxy)
● Monobal
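The decision logic behind "listen to broadcasts, then proxy to a free backend" might look something like this. It is a sketch only: the `BackendPool` class, the idle-children count as the broadcast payload, and the two-second staleness cutoff are all assumptions, and the wiring into a rewrite map is left out.

```python
import time


class BackendPool:
    """Tracks the last idle-child count each backend broadcast (hypothetical format)."""

    def __init__(self, max_age=2.0):
        self.max_age = max_age  # ignore backends that have gone quiet
        self.state = {}         # "host:port" -> (idle_children, last_seen)

    def on_broadcast(self, backend, idle, now=None):
        """Record a state broadcast from one backend."""
        now = time.time() if now is None else now
        self.state[backend] = (idle, now)

    def pick(self, now=None):
        """Return the freshest backend with the most idle children, or None."""
        now = time.time() if now is None else now
        fresh = [
            (idle, backend)
            for backend, (idle, seen) in self.state.items()
            if idle > 0 and now - seen <= self.max_age
        ]
        if not fresh:
            return None
        idle, backend = max(fresh)
        # Assume the request occupies one child until the next broadcast corrects us.
        self.state[backend] = (idle - 1, self.state[backend][1])
        return backend


pool = BackendPool()
pool.on_broadcast("web1:80", 3)
pool.on_broadcast("web2:80", 5)
print(pool.pick())
```

Choosing on reported free capacity rather than static weights is what lets the proxy react when response times swing from milliseconds to seconds.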
DBI::Role – DB Load Balancing
● Our library on top of DBI
– GPL; not packaged anywhere but our cvs
● Returns handles given a role name
– master (writes), slave (reads)
– directory (innodb), ...
– cluster<n>{,slave,a,b}
– Can cache connections within a request or
forever
● Verifies connections from previous request
● Realtime balancing of DB nodes within a role
– web / CLI interfaces (not part of library)
– dynamic reweighting when node down
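The core of role-based handle selection can be approximated as a weighted random pick among the nodes in a role. This is a Python illustration, not the DBI::Role API: the `ROLES` table, host names, weights, and the `pick_node` function are all hypothetical.

```python
import random

# Hypothetical role table: role name -> {node: weight}.
ROLES = {
    "master": {"db1": 1},
    "slave": {"db2": 10, "db3": 10, "db4": 5},
    "cluster3slave": {"c3b": 1},
}


def pick_node(role, down=(), rng=random):
    """Weighted random choice among the role's nodes, skipping any marked down."""
    candidates = {
        node: weight
        for node, weight in ROLES[role].items()
        if node not in down and weight > 0
    }
    if not candidates:
        raise LookupError(f"no live nodes for role {role!r}")
    names = list(candidates)
    weights = [candidates[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


print(pick_node("slave", down={"db3"}))
```

Dropping a dead node from the candidate set redistributes its share across the survivors in proportion to their weights, which is one simple reading of "dynamic reweighting when node down".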
Caching!
Caching
● caching's key to performance
● can't hit the DB all the time
– MyISAM: r/w concurrency problems
– InnoDB: good concurrency for disk
– MySQL has to parse your query all the time
● better with new MySQL binary protocol
● Where to cache?
– mod_perl caching (address space per apache child)
– shared memory (limited to single machine, same with
Java/C#/Mono)
– MySQL query cache: flushed per update, small max
size
– HEAP tables: fixed length rows, small max size
memcached
http://www.danga.com/memcached/
Questions to...
[email protected]