
PubSubHubbub for Developers

Brett Slatkin
Software Engineer
Google Inc.
September 28, 2009
Agenda
• Background
• Intro
• Motivation
• Scale
• Progress
Background
Why do real-time messaging?
• Syndication
o Creating a "flow"
o Simultaneous delivery of an event spurs immediate
conversation
o More participation enables more developed
conversations, better exchanging of ideas
o Cross-site allows promotion, linking, swarming around
sources, mash-ups, growth opportunity
Why do real-time messaging?
• Business, politics
o 1 minute of delay could cost a company millions, cause a
political scandal, be harmful to investors, etc
o Concrete example: SEC earnings requirements
Why do real-time messaging?
• Future applications (out of scope, but ...)
o Financial data
o Public scientific measurements (e.g., stream of weather
data, traffic status, polling, votes)
o Sensor networks
o Emergency information distribution
o Anything you can think of that's a stream of information!
Why do decentralized messaging?
• Web was built on decentralized protocols
• No single point of failure
• Interoperability is key to network effects and growth
• One API for application developers
Intro
What is PubSubHubbub?

• A simple publish/subscribe protocol


• Turns Atom and RSS feeds into real-time streams
• Web-scale, low-latency messaging
• Three participants: Publishers, Subscribers, and Hubs

Publisher → Hub → Subscriber

Design goals of PubSubHubbub

• Decentralized: No one company in control


• Scale to the size of the whole web
• Publishing and subscribing as easy as possible
• Complexity in the Hub
• Pragmatic (i.e., not theoretically perfect, but solve huge,
known use cases with minimal effort)
 
 
How-to for Publishers
1. Add a declaration in your feed with your Hub of choice
<link rel="hub"
href="https://pubsubhubbub.appspot.com/"/>
 
2. Add something to your feed!

3. Send a ping to the Hub with the feed URL


POST / HTTP/1.1
Content-Type: application/x-www-form-urlencoded
...

hub.mode=publish&hub.url=<your feed>

4. 204 response = Success, 4xx = Bad request, 5xx = Try again


How-to for Subscribers
1. Detect the Hub declaration in a feed
 
2. Send a subscribe request to the feed's Hub
POST / HTTP/1.1
Content-Type: application/x-www-form-urlencoded
...

hub.mode=subscribe&hub.verify=sync&
hub.topic=<feed URL>&hub.callback=<callback URL>

3. Hub will send a request to verify the subscription


GET /callback?hub.challenge=<random> HTTP/1.1

HTTP/1.1 200
...
<echo random>
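The subscriber's side of this handshake is just an echo. A minimal sketch, with the HTTP serving machinery omitted and `handle_verification` an invented name:

```python
from urllib.parse import parse_qs, urlparse

def handle_verification(path_with_query):
    # e.g. "/callback?hub.mode=subscribe&hub.challenge=abc123"
    params = parse_qs(urlparse(path_with_query).query)
    challenge = params.get("hub.challenge", [None])[0]
    if challenge is None:
        return 404, ""       # not a verification request we recognize
    return 200, challenge    # body must echo the challenge verbatim
```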
How-to for Subscribers
Process new content from the Hub
POST /callback HTTP/1.1
Content-Type: application/atom+xml
...

<?xml version="1.0" encoding="utf-8"?>


<feed xmlns="http://www.w3.org/2005/Atom">
<title>Awesome feed</title>
<link rel="hub"
href="http://pubsubhubbub.appspot.com"/>
...
<entry>
...
</entry>
</feed>
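Extracting the pushed entries is ordinary Atom parsing; a sketch with the standard library, where `new_entries` is an invented helper returning entry titles:

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def new_entries(atom_bytes):
    # Each notification carries only the new/updated <entry> elements.
    feed = ET.fromstring(atom_bytes)
    return [e.findtext(ATOM + "title") for e in feed.iter(ATOM + "entry")]
```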
The role of the Hub
• Logical component
o Publishers may be their own Hub
o Combined Hub/Publisher has p2p speed-up

• Distinct functions
o Accept and verify subscriptions to new topics
o Receive pings from publishers, retrieve content
o Extract new/updated items from feed
o Send all subscribers the new content
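The "extract new/updated items" step is typically a content-hash comparison against what the Hub last saw. A sketch, assuming a plain dict stands in for persistent storage (a real hub would persist these records):

```python
import hashlib

def changed_entries(entries, seen):
    """entries: (entry_id, content) pairs from the freshly fetched feed;
    seen: dict mapping entry_id -> content hash from the previous fetch."""
    out = []
    for entry_id, content in entries:
        digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
        if seen.get(entry_id) != digest:   # new entry, or updated content
            seen[entry_id] = digest
            out.append((entry_id, content))
    return out
```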
The role of the Hub
• Scalability
o # of subscribers & feeds, update frequency
o Delegation of content distribution (= bandwidth)

• Reliability
o Retry fetch, delivery, idempotence
How the hub works

See my talk on building a hub using App Engine


 
http://tinyurl.com/building-a-hub
Security model
• Subscriber verification prevents DoS attacks
 
• Declaration of the Hub is a delegation of trust
o Subscribers may trust the Hub to deliver content on
publisher's behalf
o v0.2 supports shared-secret HMACs for subscribers to
verify that notifications came from the hub
 
• Privacy through HTTPS for hubs, feeds, and callbacks
o URLs and payloads can be sent via encrypted channel
o Subscribed topics are not discoverable
o Unguessable, capability URLs (e.g., from OAuth)
 
• Publishers can run their own hub!
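A subscriber-side sketch of the v0.2 shared-secret check, assuming the hub sends an HMAC-SHA1 of the request body in an `X-Hub-Signature` header of the form `sha1=<hexdigest>` (`signature_valid` is an invented name):

```python
import hashlib
import hmac

def signature_valid(secret, body, header_value):
    # secret and body are bytes; header_value is e.g. "sha1=0a1b2c..."
    expected = "sha1=" + hmac.new(secret, body, hashlib.sha1).hexdigest()
    # Constant-time comparison avoids leaking the digest via timing.
    return hmac.compare_digest(expected, header_value)
```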
Motivation
Push it to the limit
Why push content?

Learn from our forefathers.

TCP
(est. 1974)
Push it to the limit
What is magical about TCP? The Window.
Push it to the limit
Without the window, the tube can't be full.
Push it to the limit
TCP maximizes the throughput of a link
• Dump data in, it will be received
• The window means no waiting for acks!
• When acks are missed, the sender will retransmit
• Receivers reassemble the message in-order, de-dupe
• Good citizenship with congestion control
Push it to the limit
Where is such efficiency for application-level protocols?
• Exists, but often proprietary or an interoperability
nightmare

(cough SOAP cough)
Why another protocol?
• We want interoperable, web-scale messaging

• Almost every company already has an internal system


o TIBCO, WebSphere MQ, ActiveMQ, RabbitMQ, ...
o Proprietary message payloads, topics, networks  
 
• Existing attempts at a standard haven't caught on
o XMPP weirds people out; started in 1999, it still isn't
widely used for interop beyond IM
o These standards are too complex or not pragmatic (XEP-
0060, WS-*, AMQP, RestMS, new REST-*)
 
Why another protocol?
• Build the simplest interoperable messaging protocol that
can scale to the size of the web
• Make the base specification bare-bones, easy-to-use
• Target Atom/RSS initially as a payload format; everyone
uses them for time-based, idempotent streams
• In the future, add extensions for cool stuff
 
Why another protocol?
• Proof of simplicity is in the code
o Bret Taylor added PubSubHubbub subscription to
FriendFeed in a single evening
Scale
Goal
• World-wide RSS publishing currently
o ~X,000 updates per second
• Legitimate email currently
o ~X,000,000 per second
 
• Need to scale by at least 1000x; hopefully more
 
• Trying to enable new use-cases
Light pinging
• Protocols exist for faster Atom/RSS
o Ping-o-Matic, changes.xml, SUP, rssCloud
• All only indicate the feed URL that has changed
o Still need to go and fetch the content
o These protocols are just optimized polling
o Equivalent to killing the TCP window!
 
Light pinging
• Optimized polling is still worse
o Latency is high: 3 round trips
o Thundering herd as subscribers fetch published feeds
 Unpredictable, bursty load pattern
o More bandwidth, CPU, connection star-pattern
 
Light pinging at scale
What if you had to use light pinging at scale?
• Send out pings slowly to reduce the herd
 
• Herd causes all feeds to be fully regenerated
o Invalidates existing caches
 
• Bandwidth increases extremely fast
o (average updates per feed) * (# feeds) * (# subscribers) *
(average feed size)
o Often 99.5%+ more than you needed

• CPU costs increase for subscribers with update frequency
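Plugging hypothetical numbers into the bandwidth formula above shows where the 99.5%+ figure comes from (all quantities here are invented for illustration):

```python
# Hypothetical: 10,000 feeds, 100 subscribers each, a ~200KB full feed
# in which only one ~1KB entry actually changed.
updates_per_feed = 1
num_feeds = 10_000
num_subscribers = 100
feed_size = 200 * 1024        # the whole feed every subscriber re-fetches
entry_size = 1 * 1024         # the one new entry they actually needed

light_bytes = updates_per_feed * num_feeds * num_subscribers * feed_size
needed_bytes = updates_per_feed * num_feeds * num_subscribers * entry_size
waste = 1 - needed_bytes / light_bytes
print(f"{waste:.1%}")  # → 99.5%
```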


Light pinging at scale
Consider a single-master replication scheme
• After each update, wait for copying to all replicas
 
Fat pinging
Compared to light pings
• Latency: 1/3 as much
• Based on reasonable averages
o Bandwidth: ~20x less
o CPU: ~20x less
• Never wait for replication delays
Fat pinging at scale
What if you had to scale fat pinging? 
• Run your own hub
 
• Compute feed deltas at update time; no need to
regenerate a whole feed (or churn your caches)
 
• Send out new content at sustained network rate
 
• Bandwidth is the minimum possible per subscriber
o (update size) * (# feeds) * (# subscribers)

• CPU cost is the minimum possible per subscriber
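Comparing the two bandwidth formulas with hypothetical sizes (a ~5KB pushed update vs. a ~100KB full-feed re-fetch) reproduces the ~20x figure cited earlier:

```python
# Hypothetical sizes over the same 10,000 feeds with 100 subscribers each.
update_size = 5 * 1024
feed_size = 100 * 1024
num_feeds = 10_000
num_subscribers = 100

fat_bytes = update_size * num_feeds * num_subscribers
light_bytes = feed_size * num_feeds * num_subscribers
print(light_bytes // fat_bytes)  # → 20
```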


 
Advanced protocol pieces
• Connection reuse from HTTP/1.1
• Pipeline HTTP requests for feed fetching
• Use aggregated content delivery
o Many Atom feeds in a single <feed> XML doc
o Fewer connections
Progress
PubSubHubbub status
• Over 100 Million feeds are PubSubHubbub-enabled
• Companies: Google, FriendFeed (FB), livedoor, Six Apart,
LiveJournal, LazyFeed, Superfeedr, ...
• Google products: FeedBurner, Blogger, Reader shared
items, Google Alerts, ...
• Cool apps: Socnode, Reader2Twitter, chat gateways, ...
 
• More publishers, subscribers, hubs, apps on the way
• Publisher clients: Perl, PHP, Python, Ruby, Java, Haskell,
C#, MovableType, WordPress, Django, Zend
• Active mailing list with 240+ members
 
 
 
Getting involved
• Review the spec; recommend improvements
o Open process, will be licensed by Open Web Foundation
• Write some sample code for your favorite language or CMS
• Contribute to one of the open source Hub implementations
• Write on your blog about why we need push for the future
o Do it for the children
 
 
 
What Facebook can do right now
• Subscribe to feeds that are PubSubHubbub-enabled
o Put that great UI to work
o Maybe reuse the FriendFeed index pipeline?
o Call Bret and Ben
 
• Enable PubSubHubbub for activity streams
o Provide Facebook app developers with real-time updates
to users' home streams
o Speeds up surfacing Facebook in other apps
o Detecting new events could trigger the app to take
action in real-time (send an email, classify a photo,
initiate an action in a game, etc)
 
What Facebook can do next
• Figure out if private feeds will work with this model
o Run your own hub
o Use capability URLs (OAuth token in the query string)
 
• Give your developers more feeds to consume and syndicate
 
Rehash
• Push for the future! Scale to new use-cases
• Decentralized, open spec: no company owns it
• One API for all stream-based content
 
Rehash
• Project page: http://pubsubhubbub.googlecode.com
o Full Hub source code with tests
o Example publisher and subscriber apps
o Demo hub at http://pubsubhubbub.appspot.com
?
Hub storage space
• How much storage space does a Hub need?
o Manageable costs
 ~10 million feeds
 ~1 million subscribers
o Assume 1 billion events per day (~11,000/second)
 Thar be dragons!
Hub storage space
FeedEntryRecord
• Key name
o "FeedEntryRecord" + entry_id_hash + parent key
o 400 bytes, could be smaller
• Indexed properties
o Entry ID hash (again-- doh!): 160 bytes
o Entry content hash: 160 bytes
o Update time: 8 bytes
• Unindexed properties
o Entry ID: 2048 bytes maximum, 200 on average
 
Result
• ~1KB per entry 
• 27TB per month at ~11,000 req/sec -- no sweat!
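The arithmetic behind that estimate, using the slide's own figures (1KB per entry, ~11,000 entries/second, a 30-day month, binary terabytes):

```python
entry_bytes = 1024            # ~1KB per FeedEntryRecord
rate = 11_000                 # entries stored per second
seconds_per_month = 86_400 * 30
tib = entry_bytes * rate * seconds_per_month / 2**40
print(round(tib))  # → 27
```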
WebFinger
Unified discovery for email addresses
• Transform an email address into XRD
• XRD defines all the services that address has
• Helps provide social networking as a protocol
• E.g., Simple way to discover if an account has a Portable
Contacts interface