Inside LiveJournal's Backend

or,
“holy hell that's a lot of hits!”

April 2004

Brad Fitzpatrick
[email protected]

Danga Interactive
danga.com / livejournal.com

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. To
view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/1.0/ or send a letter to
Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.
LiveJournal Overview
● college hobby project, Apr 1999
● blogging, forums
● aggregator, social-networking ('friends')
● 2.8 million accounts; ~half active
● 40-50M dynamic hits/day. 700-800/second
at peak hours
● why it's interesting to you...
– 60+ servers
– lots of MySQL usage
LiveJournal Backend
(as of a few months ago)
Backend Evolution
● From 1 server to 60+....
– where it hurts
– how to fix
● Learn from this!
– don't repeat my mistakes
– can implement our design on a single server
One Server
● shared server
● dedicated server (still rented)
– still hurting, but could tune it
– learn Unix pretty quickly (first root)
– CGI to FastCGI
● Simple
One Server - Problems
● Site gets slow eventually.
– reach point where tuning doesn't help
● Need servers
– start “paid accounts”
Two Servers

● Paid account revenue buys:
– Kenny: 6U Dell web server
– Cartman: 6U Dell database server
● bigger / extra disks
● Network simple
– 2 NICs each
● Cartman runs MySQL on internal network
Two Servers - Problems

● Two points of failure
● No hot or cold spares
● Site gets slow again.
– CPU-bound on web node
– need more web nodes...
Four Servers
● Buy two more web nodes (1U this time)
– Kyle, Stan
● Overview: 3 webs, 1 db
● Now we need to load-balance!
– Kept Kenny as gateway to outside world
– mod_backhand amongst 'em all
mod_backhand
● web nodes broadcasting their state
– free/busy apache children
– system load
– ...
● internally proxying requests around
– network cheap
Four Servers - Problems
● Points of failure:
– database
– Kenny (could have switched to another gateway when
needed, or used heartbeat, but we didn't)
● Site gets slow...
– IO-bound
– need another database server ...
– ... how to use another database?
Five Servers
introducing MySQL replication

● We buy a new database server
● MySQL replication
● Writes to Cartman (master)
● Reads from both
Replication Implementation
● get_db_handle() : $dbh
– existing
● get_db_reader() : $dbr
– transition to this
– weighted selection
● permissions: slaves select-only
– mysql option for this now
● be prepared for replication lag
– easy to detect in MySQL 4.x
– user actions from $dbh, not $dbr
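
A rough sketch of what a weighted get_db_reader() can look like; the hosts, weights, and credentials below are made up, and the real LJ code differs:

```perl
# Hypothetical weighted pick of a read slave; not LJ's actual code.
use DBI;

my %slaves = (                                # DSN => weight (made up)
    "DBI:mysql:livejournal;host=db2" => 3,    # newer, faster box
    "DBI:mysql:livejournal;host=db3" => 1,    # older box
);

sub get_db_reader {
    # expand each DSN by its weight, then pick uniformly from the pool
    my @pool = map { ($_) x $slaves{$_} } keys %slaves;
    my $dsn  = $pool[ int rand @pool ];
    # the read-only account is select-only, so stray writes fail loudly
    return DBI->connect($dsn, "lj_ro", "secret", { RaiseError => 1 });
}
```

On MySQL 4.x a slave's lag is visible via SHOW SLAVE STATUS, so a $dbr can be skipped when it's too far behind; anything the current user just wrote should be read back through $dbh regardless.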
More Servers
● Site's fast for a while,
● Then slow
● More web servers,
● More database slaves,
● ...
● IO vs CPU fight
● BIG-IP load balancers
– cheap from usenet
– two, but not automatic fail-over (no support contract)
– LVS would work too

Chaos!
Where we're at...
Problems with Architecture
or,
“This don't scale...”

● Slaves upon slaves doesn't scale well...
– only spreads reads
– databases eventually consumed by writes
● 1 server: 100 reads, 10 writes (10% writes)
● Traffic doubles: 200 reads, 20 writes (10% writes)
– imagine nearing threshold
● 2 servers: 100 reads, 20 writes each (20% writes)
● Database master is point of failure
● Reparenting slaves on master failure is tricky
Spreading Writes
● Our database machines already did RAID
● We did backups
● So why put user data on 6+ slave machines?
(~12+ disks)
– overkill redundancy
– wasting time writing everywhere
Introducing User Clusters
● Already had get_db_handle() vs
get_db_reader()
● Specialized handles:
● Partition dataset
– can't join. don't care. never join user data w/
other user data
● Each user assigned to a cluster number
● Each cluster has multiple machines
– writes self-contained in cluster (writing to 2-3
machines, not 6)
User Cluster Implementation

● $u = LJ::load_user("brad")
– hits global cluster
– $u object contains its clusterid
● $dbcm = LJ::get_cluster_master($u)
– writes
– definitive reads
● $dbcr = LJ::get_cluster_reader($u)
– reads
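
Pulling the three handles together, a request path looks roughly like this; load_user / get_cluster_master / get_cluster_reader are the API names from the slides, but the table and column names here are illustrative:

```perl
# Illustrative flow; schema details and error handling are made up.
my ($postid, $subject) = (42, "hello world");

my $u = LJ::load_user("brad");            # one global-cluster lookup
# $u carries its clusterid, so everything else stays in one cluster

my $dbcm = LJ::get_cluster_master($u);    # writes / definitive reads
$dbcm->do("INSERT INTO posts (userid, postid, subject) VALUES (?, ?, ?)",
          undef, $u->{userid}, $postid, $subject);

my $dbcr = LJ::get_cluster_reader($u);    # possibly-lagged reads
my $recent = $dbcr->selectall_arrayref(
    "SELECT postid, subject FROM posts WHERE userid = ? " .
    "ORDER BY postid DESC LIMIT 10", undef, $u->{userid});
```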
User Clusters

● almost resembles today's architecture


User Cluster Implementation

● per-user numberspaces
– can't use AUTO_INCREMENT
– also avoid the MyISAM-only trick of AUTO_INCREMENT on
the final column of a multi-column index:
● CREATE TABLE foo (userid INT, postid INT
AUTO_INCREMENT, PRIMARY KEY (userid, postid))
– (one alternative: a per-user counter table; sketch after this list)
● moving users around clusters
– balancing disk IO
– balance disk space
– monitor everything
● cricket
● nagios
● ...whatever works
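
Since AUTO_INCREMENT is out, ids get handed out per user. A common way is a counter table on the user's cluster master; this is a sketch of the idea, with a made-up table name, not LJ's actual allocator:

```perl
# Hypothetical per-user id allocator.  LAST_INSERT_ID(expr) makes the
# increment atomic and readable via mysql_insertid, with no extra SELECT.
sub alloc_post_id {
    my ($dbcm, $userid) = @_;
    my $rows = $dbcm->do(
        "UPDATE usercounter SET max = LAST_INSERT_ID(max + 1) " .
        "WHERE userid = ?", undef, $userid);
    if ($rows == 0) {
        # first allocation for this user; insert-race handling omitted
        $dbcm->do("INSERT INTO usercounter (userid, max) VALUES (?, 1)",
                  undef, $userid);
        return 1;
    }
    return $dbcm->{mysql_insertid};
}
```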
Subclusters

● easy at this point; APIs already exist
● multiple databases per real cluster
– lj_50
– lj_51
– lj_52
– ...
● MyISAM performance hack
● incremental maintenance
Where we're at...
Points of Failure
● 1 x Global master
– lame
● n x User cluster masters
– n x lame.
● Slave reliance
– one dies, others reading too much

Solution?
Master-Master Clusters!
– two identical machines per cluster
● both “good” machines
– do all reads/writes to one at a time, both
replicate from each other
– intentionally only use half our DB hardware at a
time to be prepared for crashes
– easy maintenance by flipping the active side of the pair
– no points of failure
Master-Master Prereqs
● failover can't break replication, be it:
– automatic (be prepared for flapping)
– by hand (probably have other problems)
● fun/tricky part is number allocation
– same number allocated on both pairs
– cross-replicate, explode.
● strategies
– odd/even numbering (a=odd, b=even)
● if numbering is public, users suspicious
– where's my missing _______ ?
– solution: prevent enumeration. add gibberish 'anum' = rand(256).
visiblenum = (realid << 8) + anum. verify/store the anum
(sketch after this list)
– 3rd party arbitrator for synchronization
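
The anum trick above, spelled out; the id value is illustrative:

```perl
# Illustrative anum scheme to stop users enumerating sequential ids.
my $realid = 1234;                        # internal, sequential id
my $anum   = int rand 256;                # stored alongside the row
my $visiblenum = ($realid << 8) + $anum;  # what appears in URLs

# verifying an incoming URL number:
my $claimed_id   = $visiblenum >> 8;
my $claimed_anum = $visiblenum & 0xFF;
# serve the item only if $claimed_anum matches the stored anum
```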
Cold Co-Master
● inactive pair isn't getting reads
● after switching active machine, caches are full, but not
with anything useful (takes a few min to hours to warm)
● switch at night, or
● sniff reads on active pair, replay to inactive
guy
Summary Thus Far
● Dual BIG-IPs (or LVS+heartbeat, or..)
● 30-40 web servers
● 1 “global cluster”:
– non-user/multi-user data
– what user is where?
– master-slave (lame)
● point of failure; only cold spares
● pretty small dataset (<4 GB)
– MySQL Cluster looks potentially interesting
– or master-election
● bunch of “user clusters”:
– master-slave (old ones)
– master-master (new ones)
● ...
Static files...

Dynamic vs. Static Content
● static content
– images, CSS
– TUX, epoll-thttpd, etc. w/ thousands conns
– boring, easy
● dynamic content
– session-aware
● site theme
● browsing language
– security on items
– deal with heavy processes
● CDN (Akamai / Speedera)
– static easier, APIs to invalidate
– security: origin says 403 or 304
Misc MySQL Machines (Mmm...)

MyISAM vs. InnoDB
● We use both
● This is all nicely documented on mysql.com
● MyISAM
– fast for reading xor writing,
– bad concurrency, compact,
– no foreign keys, constraints, etc
– easy to admin
● InnoDB
– ACID
– good concurrency
● Mix-and-match. Design for both.
Directory & InnoDB
● Directory Search
– multi-second queries
– many at once
– InnoDB!
– replicates subset of tables from global cluster
– some data on both global and user
● write to both
● read from directory for searching
● read from user cluster when loading user data
Postfix & MySQL
● Postfix
– 4 servers: postfix + mysql maps
– replicating one table: email_aliases
● Secondary Mail Queue
– async job system
– random cluster master
– serialize message.
Logging to MySQL
● mod_perl logging handler
● new table per hour
– MyISAM
● Apache access logging off
– diskless web nodes, PXE boot
– apache error logs through syslog-ng
● INSERT DELAYED
– increase your insert buffer if querying
● minimal/no indexes
– table scans are fine
● background job doing log analysis/rotation
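
A rough mod_perl 1.x log-handler sketch under that scheme; the package name, DSN, and table layout are made up, not LJ's:

```perl
# Hypothetical PerlLogHandler writing hits to an hourly MyISAM table.
package LJ::LogDB;
use strict;
use DBI;
use Apache::Constants qw(OK);
use POSIX qw(strftime);

my $dbh;  # reused across requests within this apache child

sub handler {
    my $r = shift;
    $dbh ||= DBI->connect("DBI:mysql:logs;host=logdb", "logger", "secret",
                          { RaiseError => 1 });
    # assumes a background job pre-creates each hourly table
    my $table = "access_" . strftime("%Y%m%d%H", localtime);
    # DELAYED: the web node doesn't wait for the row to land
    $dbh->do("INSERT DELAYED INTO $table (uri, status, bytes) " .
             "VALUES (?, ?, ?)", undef,
             $r->uri, $r->status, $r->bytes_sent);
    return OK;
}
1;
```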
Load Balancing!
Web Load Balancing
● slow client problem (hogging mod_perl/php)
● BIG-IP [mostly] packet-level
● doesn't buffer HTTP responses
● BIG-IP can't adjust server weighting quickly enough
– responses range from a few ms to multiple seconds
● mod_perl broadcasting state
– Inline.pm to Apache scoreboard
● mod_proxy+mod_rewrite
– external rewrite map (listening to mod_perl
broadcasts)
– map destination is [P] (mod_proxy)
● Monobal
DBI::Role – DB Load Balancing
● Our library on top of DBI
– GPL; not packaged anywhere but our cvs
● Returns handles given a role name
– master (writes), slave (reads)
– directory (innodb), ...
– cluster<n>{,slave,a,b}
– Can cache connections within a request or
forever
● Verifies connections from previous request
● Realtime balancing of DB nodes within a role
– web / CLI interfaces (not part of library)
– dynamic reweighting when node down
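
DBI::Role's real interface lives in our cvs; as a from-scratch sketch of the idea (not DBI::Role's actual API, and with made-up DSNs), role-based handle selection can be this small:

```perl
# From-scratch sketch of role-named handles; not DBI::Role itself.
use DBI;

my %roles = (                      # role => candidate DSNs (made up)
    master    => ["DBI:mysql:lj;host=db1"],
    slave     => ["DBI:mysql:lj;host=db2", "DBI:mysql:lj;host=db3"],
    directory => ["DBI:mysql:lj;host=db4"],
);
my %cache;                         # role => live handle, reset per request

sub get_dbh_for_role {
    my $role = shift;
    # verify a connection cached from the previous request before reuse
    return $cache{$role} if $cache{$role} && $cache{$role}->ping;
    my @c   = @{ $roles{$role} };
    my $dsn = $c[ int rand @c ];   # real version: weighted, reweights on failure
    return $cache{$role} = DBI->connect($dsn, "lj", "secret",
                                        { RaiseError => 1 });
}
```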
Caching!
Caching
● caching's key to performance
● can't hit the DB all the time
– MyISAM: r/w concurrency problems
– InnoDB: good concurrency for disk
– MySQL has to parse your query all the time
● better with new MySQL binary protocol
● Where to cache?
– mod_perl caching (address space per apache child)
– shared memory (limited to single machine, same with
Java/C#/Mono)
– MySQL query cache: flushed per update, small max
size
– HEAP tables: fixed length rows, small max size
memcached
http://www.danga.com/memcached/

● our Open Source, distributed caching system
● run instances wherever there's free memory
– requests hashed out amongst them all
– choose to rehash or not on failure
● no “master node”
● protocol simple and XML-free; clients for:
– perl, java, php, python, ruby, ...
● In use by:
– LiveJournal, Slashdot, Wikipedia, ...
● People speeding up their:
– websites, mail servers, ...
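
With the Perl client (Cache::Memcached), the usual cache-aside pattern looks like this; the hosts, key scheme, and table are illustrative:

```perl
# Cache-aside read; get/set/new are the real Cache::Memcached API,
# everything else here is made up for illustration.
use Cache::Memcached;
use DBI;

my $memc = Cache::Memcached->new({
    servers => [ "10.0.0.17:11211", "10.0.0.18:11211" ],
});

sub load_user_props {
    my ($dbr, $userid) = @_;
    my $key   = "uprops:$userid";
    my $props = $memc->get($key);
    return $props if $props;                  # hit: no DB touched
    # miss: fall back to a (possibly lagged) slave, then backfill
    $props = $dbr->selectall_hashref(
        "SELECT propname, value FROM userprops WHERE userid = ?",
        "propname", undef, $userid);
    $memc->set($key, $props, 3600);           # expire after an hour
    return $props;
}
```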
memcached – speed
● C
– prototype Perl version proved concept, dog slow
● async IO, event-driven, single-threaded
● libevent (epoll, kqueue, select, poll...)
– run-time mode selection
● lockless, refcounted objects
● slab allocator
– glibc malloc died after 7~8 days
– slabs: no address space fragmentation ever.
● O(1) operations
– hash table inside
– Judy didn't work (malloc problems?)
● multi-server parallel fetch (can't do in DBI?)
LiveJournal and memcached
● 10 unique hosts
– none dedicated
● 28 instances
● 30 GB of cached data
● 90-93% hit rate
– not necessarily 90-93% fewer queries:
● FROM foo WHERE id IN (1, 2, 3)
● would be 3 memcache hits; 1 mysql query
(see the get_multi sketch after this list)
– 90-93% fewer potential disk seeks?
● 12 GB machine w/ five 2GB instances
– left-over 'big' machines from our learn-to-scale-
out days
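
The multi-key case maps to get_multi(): one parallel fetch across servers, then a single query for the stragglers. A sketch, continuing the $memc and $dbh handles from the earlier snippets (keys and table made up):

```perl
# Illustrative get_multi + fill-in query for "WHERE id IN (1, 2, 3)".
my @ids  = (1, 2, 3);
my $hits = $memc->get_multi(map { "foo:$_" } @ids);   # parallel fetch
my @miss = grep { !exists $hits->{"foo:$_"} } @ids;
if (@miss) {
    my $in   = join ",", ("?") x @miss;
    my $rows = $dbh->selectall_arrayref(
        "SELECT id, val FROM foo WHERE id IN ($in)", undef, @miss);
    for my $row (@$rows) {                            # backfill the cache
        $memc->set("foo:$row->[0]", $row->[1]);
    }
}
```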
What to Cache
● Everything?
● Start with stuff that's hot
● Look at your logs
– query log
– update log
– slow log
● Control MySQL logging at runtime
– can't
● help me bug them.
– sniff the queries! Net::Pcap
● tool to be released? bug me.
● canonicalize and count
– name queries: SELECT /* name=foo */
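
Naming a query is just an embedded comment that survives to the wire, so a Net::Pcap sniffer can canonicalize and count by name; for example (query itself is illustrative):

```perl
# The /* name=... */ tag costs nothing and groups identical queries.
my $rows = $dbh->selectall_arrayref(
    "SELECT /* name=recent_posts */ postid, subject " .
    "FROM posts WHERE userid = ? ORDER BY postid DESC LIMIT 10",
    undef, $userid);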
Caching Disadvantages
● updating your cache
– decide what's important
– when to do a clean read (from DB) vs potentially-
dirty read (from memcached)
● more crap to admin
– but memcached is easy
– one real option: memory to use
● disable rehashing, or be prepared
– small, quick objects
● “time user #123 last posted = t”
– heavy objects with unlimited lifetime, containing
small item too
● “last 100 recent post ids for user #123, as of time t”
– application can detect problems
MySQL Persistent Connection Woes
● connections == threads == memory
– (until MySQL 5.x? thanks, Brian!)
● max threads
– limit max memory
● with 10 user clusters:
– Bob is on cluster 5
– Alice on cluster 6
– Do you need Bob's DB handles alive while you
process Alice's request?
● Major wins by disabling persistent conns
– still use persistent memcached conns
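
A sketch of that tradeoff: per-request DB handles, persistent memcached. The helper and hosts are made up:

```perl
# Hypothetical: connect/disconnect MySQL per request so idle per-cluster
# handles don't pin mysqld threads; the memcached object lives forever.
use DBI;
use Cache::Memcached;

my $memc = Cache::Memcached->new({
    servers => ["10.0.0.17:11211"],   # persistent across requests
});

sub with_cluster_dbh {
    my ($dsn, $code) = @_;
    # no Apache::DBI / connect_cached: a fresh, short-lived handle
    my $dbh = DBI->connect($dsn, "lj", "secret", { RaiseError => 1 });
    my @out = $code->($dbh);
    $dbh->disconnect;                 # frees the MySQL thread right away
    return @out;
}
```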
Software Overview
● Linux 2.4
– database servers
● Linux 2.6
– web nodes; memcached (epoll)
– experimenting on dbs w/ master-master
● Debian woody
– moving to sarge
● BIG-IPs
– got new ones, w/ auto fail-over
– management so nice, anti-DoS
● mod_perl
Questions?
Thank you!

Questions to...
[email protected]

Slides linked off:
http://www.danga.com/words/
