Facebook Thrift


A Seminar Report on:

Prepared by : Savara Govind
Roll No.    : U12CO086
Class       : B.Tech IV (Computer Engineering), 7th Semester
Year        : 2015-16
Guided by   : Mrs. Dipti P. Rana (Asst. Prof.)

Department of Computer Engineering
Sardar Vallabhbhai National Institute of Technology,
Surat - 395007 (Gujarat), India


CERTIFICATE

This is to certify that the seminar report entitled Facebook Thrift is prepared and
presented by Mr. Savara Govind, bearing Roll No. U12CO086, 4th Year of B. Tech (Computer
Engineering), and his work is satisfactory.

GUIDE

JURY(s)

HOD

Mrs. Dipti P. Rana

COED

Assistant professor

Dr. Dhiren R. Patel

Table of Contents

1 Introduction
  1.1 Facebook Thrift
  1.2 Organisation of the Report
2 Thrift Design Features
  2.1 Common Data Types
  2.2 Transport Interface
  2.3 Protocols
  2.4 Versioning
  2.5 Processors
3 Thrift Architecture
  3.1 Architecture
  3.2 Supported protocols, transports and servers
  3.3 Advantages of Thrift
4 Facebook Thrift Services
  4.1 Search
  4.2 Logging
5 Thrift vs Other Service Technologies
  5.1 Size comparison
  5.2 Runtime performance
6 Challenges of FBThrift
  6.1 Evolving the architecture
  6.2 Re-open-sourcing Thrift as FBThrift
7 Conclusion
References
Acknowledgement

List of Tables

Table 1  Methods used for size comparison
Table 2  Size comparison of different techniques
Table 3  Server system specifications
Table 4  Client system specifications
Table 5  Methods used for runtime comparison
Table 6  Runtime comparison of different techniques

List of Figures

Fig. 1  Thrift architecture
Fig. 2  Graph showing comparison of different techniques
Fig. 3  Runtime comparison of different techniques
Fig. 4  Graph showing average wall time for different techniques
Fig. 5  Out-of-order chained buffers
Fig. 6  Latency improvements with out-of-order responses

Nomenclature
HHVM   HipHop Virtual Machine
IDL    Interface Definition Language
I/O    Input/Output
JSON   JavaScript Object Notation
LAMP   Linux, Apache, MySQL and PHP
OOP    Object-Oriented Programming
REST   Representational State Transfer
RMI    Remote Method Invocation
RPC    Remote Procedure Call
SOAP   Simple Object Access Protocol
STL    Standard Template Library
TCP    Transmission Control Protocol
XML    Extensible Markup Language

ABSTRACT
Facebook, a popular social networking site, emphasizes choosing the best tools and
implementations for its backend services, irrespective of programming language. Facebook
Thrift is a framework that lets a service written in one language communicate easily and
efficiently with programs written in many other languages, including C++, Java, Python and
PHP. Each programming language has its own strengths, such as performance, ease of
development or availability of libraries. Thrift allows each service to be written in the
language best suited to it, which makes it a strong and reliable basis for developing software
products. Facebook uses Thrift internally for many features of the site, including its search
engine and the News Feed, which provides updates on users' status. This report briefly
discusses the architecture, applications and services of Thrift and the challenges faced by this
technology.
Keywords: Facebook Thrift, Remote Procedure Call, Cross-language Development
Environment, Apache Thrift.


Chapter 1 Introduction

Facebook is a popular social network used throughout the globe. Statistics show that
968 million active users visit the site daily. Presenting and retrieving data efficiently for
each user account is therefore an important task. For backend development, Facebook chooses
the best and most efficient tools and implementations from different programming languages.
Various languages are used to optimize for the right combination of performance, ease and
speed of development, availability of libraries and so on.
When Facebook started, its backend was built on the LAMP framework [1]. LAMP is the
acronym for Linux, Apache, MySQL and PHP. As the number of users increased, network traffic
grew, and many onsite applications such as search, ad selection and delivery, and event logging
had to be scaled. Scaling these operations to match the resource demands was not possible
within the LAMP framework. To handle these demands, a cross-language framework known as
Facebook Thrift was developed at Facebook in 2006. To encourage wider use, Thrift was
open-sourced in 2007 and is now maintained as Apache Thrift; the branch that Facebook
continued to develop internally is known as FBThrift.
1.1 Facebook Thrift
Thrift is a software library and a set of code-generation tools for developing and
implementing scalable and efficient backend services. The primary goal of Thrift is to enable
efficient and reliable communication across programming languages by abstracting the portions
of each language that tend to require the most customization into a common library that is
implemented in each language. Users define data types and service interfaces in a single
language-neutral Interface Definition Language (IDL) file, from which Thrift generates all the
code necessary to build Remote Procedure Call (RPC) clients and servers.
1.2 Organisation of the Report
Chapter 1 introduces the cross-language service technology Facebook Thrift.
Chapter 2 discusses the salient features of Thrift.
Chapter 3 briefly discusses the architecture, the supported protocols, transports and servers,
and the advantages of FBThrift.
Chapter 4 discusses two major services built on FBThrift: Search and Logging.
Chapter 5 shows the usefulness of Thrift relative to other service technologies such as REST,
RMI and Protocol Buffers, using the comparison results obtained by Andrew Prunicki, a Senior
Software Engineer at Object Computing, Inc. (OCI).
Chapter 6 focuses on the challenges faced by FBThrift and the expected feature developments.
Chapter 7 presents concluding notes on FBThrift.

Chapter 2 Thrift Design Features

Thrift combines a language-neutral software stack implemented across various
programming languages with an associated code generation engine that transforms a simple
interface and data definition language into client and server remote procedure call (RPC)
libraries [2]. Choosing static code generation over a dynamic system produces validated code
that can run without any advanced introspective run-time type checking. Thrift is also designed
to be simple for developers, who can define all the necessary data structures and interfaces
for a complex service in a single short file, the Thrift Interface Definition Language (IDL)
file. While evaluating the challenges of cross-language interaction in a networked environment,
the Thrift developers identified some key components. These salient features of Thrift are
discussed briefly below.
2.1 Common Data Types
A common type system must exist across programming languages, without requiring the
application developer to use custom Thrift data types or write their own serialization code.
Serialization is the process of converting an in-memory object into a sequence of bytes that
can be stored or transmitted and later reconstructed, possibly in another language. For
example, a C++ programmer should be able to transparently exchange a strongly typed STL
(Standard Template Library) map for a Python dictionary. A programmer should not be forced to
write any code below the application layer to achieve this. The Thrift IDL file is a logical
place for developers to annotate their data structures with the minimal amount of extra
information necessary to tell the code generator how to safely transport the objects across
languages.
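To make the idea of serialization concrete, the following is a minimal sketch of how a map from strings to 32-bit integers could be turned into language-neutral bytes and back. The wire layout (entry count, then length-prefixed keys and 4-byte values) and the function names are assumptions chosen for illustration; this is not Thrift's actual encoding.

```python
import struct

def encode_string_int_map(mapping):
    """Encode a dict of str -> int as: entry count, then (key length, key bytes, value) per entry."""
    parts = [struct.pack("!i", len(mapping))]
    for key, value in mapping.items():
        raw_key = key.encode("utf-8")
        parts.append(struct.pack("!i", len(raw_key)))
        parts.append(raw_key)
        parts.append(struct.pack("!i", value))
    return b"".join(parts)

def decode_string_int_map(data):
    """Rebuild the dict from the byte layout written by encode_string_int_map."""
    (count,) = struct.unpack_from("!i", data, 0)
    offset = 4
    result = {}
    for _ in range(count):
        (key_length,) = struct.unpack_from("!i", data, offset)
        offset += 4
        key = data[offset:offset + key_length].decode("utf-8")
        offset += key_length
        (value,) = struct.unpack_from("!i", data, offset)
        offset += 4
        result[key] = value
    return result

scores = {"alice": 3, "bob": 7}
wire_bytes = encode_string_int_map(scores)
restored = decode_string_int_map(wire_bytes)
```

Any language that can read big-endian integers and UTF-8 strings could decode `wire_bytes`, which is the property a common type system relies on.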
2.2 Transport Interface
Each language must have a common interface to bidirectional raw data transport.
Consider two servers, one implemented in Java and the other in Python. A service written in
Java should be able to send raw data through a common interface that the Python server
understands, and vice versa. The transport layer should be able to carry the raw data between
the two ends. The specifics of how a given transport is implemented should not matter to the
service developer. The same application code should be able to run against TCP stream sockets,
raw data in memory, or files on disk. The transport interface is designed to support easy
extension using common OOP techniques, such as composition.
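The following sketch illustrates the point that application code need not know which transport it is using. The class and function names are hypothetical and are not the Thrift library's API; only an in-memory and a file-backed transport are shown.

```python
import os
import tempfile

class Transport:
    """Common interface: write raw bytes, read raw bytes."""
    def write(self, data):
        raise NotImplementedError
    def read(self, size):
        raise NotImplementedError

class MemoryTransport(Transport):
    def __init__(self):
        self._data = bytearray()
        self._read_pos = 0
    def write(self, data):
        self._data += data
    def read(self, size):
        chunk = bytes(self._data[self._read_pos:self._read_pos + size])
        self._read_pos += len(chunk)
        return chunk

class FileTransport(Transport):
    def __init__(self, path):
        self._path = path
        self._read_pos = 0
    def write(self, data):
        with open(self._path, "ab") as handle:
            handle.write(data)
    def read(self, size):
        with open(self._path, "rb") as handle:
            handle.seek(self._read_pos)
            chunk = handle.read(size)
        self._read_pos += len(chunk)
        return chunk

def send_greeting(transport):
    """Application code: identical for every transport."""
    transport.write(b"hello")
    return transport.read(5)

memory_result = send_greeting(MemoryTransport())
with tempfile.TemporaryDirectory() as folder:
    file_result = send_greeting(FileTransport(os.path.join(folder, "wire.bin")))
```

A socket-backed transport could be added in the same way without touching `send_greeting`.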
2.3 Protocol
Data types must have some way of using the transport layer to encode and decode
themselves. Again, the application developer need not be concerned with this layer. Whether the
service uses an XML or a binary protocol is immaterial to the application code. All that
matters is that the data can be read and written in a consistent, deterministic manner.
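As a rough illustration of protocol independence, the sketch below encodes the same record with a binary and a JSON-based encoder behind one interface. The two classes and the record layout are assumptions made for this example and do not reproduce any Thrift protocol.

```python
import json
import struct

class BinaryProtocol:
    """Hypothetical binary layout: 4-byte id, 4-byte name length, name bytes."""
    def encode(self, user_id, name):
        raw_name = name.encode("utf-8")
        return struct.pack("!iI", user_id, len(raw_name)) + raw_name
    def decode(self, data):
        user_id, name_length = struct.unpack_from("!iI", data, 0)
        return user_id, data[8:8 + name_length].decode("utf-8")

class JsonProtocol:
    """Hypothetical text layout: a JSON object with 'id' and 'name' keys."""
    def encode(self, user_id, name):
        return json.dumps({"id": user_id, "name": name}, sort_keys=True).encode("utf-8")
    def decode(self, data):
        record = json.loads(data.decode("utf-8"))
        return record["id"], record["name"]

def round_trip(protocol, user_id, name):
    """Application code: works with either protocol unchanged."""
    return protocol.decode(protocol.encode(user_id, name))

binary_result = round_trip(BinaryProtocol(), 42, "govind")
json_result = round_trip(JsonProtocol(), 42, "govind")
```

Both calls return the same record; only the bytes on the wire differ.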
2.4 Versioning
For robust services, the data types involved must be able to evolve from their present
version. More precisely, it should be possible to add or remove fields in an object, or alter
the argument list of a function, without any interruption in service. The system must be able
to read old data from log files, as well as handle requests from out-of-date clients to new
servers and vice versa.
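One common way to achieve this kind of compatibility is to tag every field with a numeric identifier and have readers skip identifiers they do not recognise. The sketch below assumes a simplified layout (2-byte field id, 4-byte length, payload) invented for this example; Thrift's real encodings differ in detail.

```python
import struct

def encode_fields(fields):
    """fields maps field id -> payload bytes; each is written as id, length, payload."""
    return b"".join(
        struct.pack("!hI", field_id, len(payload)) + payload
        for field_id, payload in fields.items()
    )

def decode_known_fields(data, known_ids):
    """Return only the fields this reader knows about; skip everything else."""
    offset = 0
    result = {}
    while offset < len(data):
        field_id, length = struct.unpack_from("!hI", data, offset)
        offset += 6
        if field_id in known_ids:
            result[field_id] = data[offset:offset + length]
        offset += length  # unknown fields are stepped over, not rejected
    return result

# A new writer adds field 2; an old reader that only knows field 1 still works.
new_message = encode_fields({1: b"govind", 2: b"surat"})
old_reader_view = decode_known_fields(new_message, known_ids={1})

# An old writer omits field 2; a new reader simply finds it absent.
old_message = encode_fields({1: b"govind"})
new_reader_view = decode_known_fields(old_message, known_ids={1, 2})
```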
2.5 Processors
Processors are the generated code capable of processing data streams to accomplish
remote procedure calls.
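The role of a processor can be sketched as follows: it reads a request from a byte stream, looks up the named method on a user-written handler, and writes back the result. The JSON request format and the class names here are hypothetical stand-ins for what generated code would do.

```python
import json

class CalculatorHandler:
    """User-written service logic."""
    def add(self, a, b):
        return a + b

class Processor:
    """Decodes a request, dispatches it to the handler and encodes the reply."""
    def __init__(self, handler):
        self._handler = handler

    def process(self, request_bytes):
        request = json.loads(request_bytes.decode("utf-8"))
        name = request["method"]
        method = None if name.startswith("_") else getattr(self._handler, name, None)
        if method is None:
            reply = {"error": "unknown method " + name}
        else:
            reply = {"result": method(*request["args"])}
        return json.dumps(reply).encode("utf-8")

processor = Processor(CalculatorHandler())
ok_reply = json.loads(processor.process(b'{"method": "add", "args": [2, 3]}'))
bad_reply = json.loads(processor.process(b'{"method": "divide", "args": [1, 2]}'))
```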

Chapter 3 Thrift Architecture

This chapter briefly discusses the basic architecture of Thrift, lists the protocols,
transports and servers that Thrift supports, and describes the benefits of Thrift.
3.1 Architecture
Thrift includes a complete stack for creating clients and servers [3]. The top portion of
the stack is code generated from the Thrift definition file. Each Thrift service results in
generated client and processor code, and the data structures that are sent (other than
built-in types) also result in generated code. The protocol and transport are part of the
Thrift runtime library. Therefore, with Thrift it is easy to define a service, and the protocol
and transport can be changed freely without re-generating the code.

Fig. 1 Thrift Architecture

Thrift also includes server infrastructure to tie the protocols and transports together,
such as blocking, non-blocking and multi-threaded servers. The underlying I/O portion of the
stack is implemented differently for different languages. For Java and Python network I/O, the
Thrift library leverages the built-in libraries, while the C++ implementation uses its own
custom implementation.
3.2 Supported protocols, transports and servers
Thrift allows the protocol, transport and server to be chosen independently of one
another. Because Thrift was originally developed in C++, the C++ implementation offers the
greatest variety of these.
Thrift supports both binary and text protocols. The binary protocols outperform the text
protocols, but text protocols are useful at times, for example when debugging. Some of the
protocols Thrift supports are:
1. TBinaryProtocol - A straightforward binary format that encodes numeric values as binary
rather than text. Simple, but not optimized for space efficiency. Faster to process than
the text protocols but more difficult to debug.
2. TCompactProtocol - A more compact binary format; the most efficient of these protocols.
3. TDebugProtocol - A human-readable text format that is easy to debug.
4. TDenseProtocol - Similar to TCompactProtocol, but strips the meta information
from what is transmitted and adds it back at the receiver. TDenseProtocol is still
experimental and not yet available in the Java implementation.
5. TJSONProtocol - Uses JSON for encoding of data.
6. TSimpleJSONProtocol - A write-only protocol using JSON that drops metadata and therefore
cannot be parsed back by Thrift. Suitable for parsing by scripting languages.
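One reason a compact binary format can be smaller than a fixed-width one is variable-length integer encoding. The sketch below shows zigzag encoding followed by a base-128 varint, the general technique used by compact formats such as TCompactProtocol. It illustrates the idea only; the function names are hypothetical and this is not Thrift's implementation.

```python
import struct

def zigzag_encode(number):
    """Map a signed 64-bit integer to an unsigned one so small magnitudes stay small."""
    return (number << 1) ^ (number >> 63)

def varint_encode(unsigned):
    """Write 7 bits per byte, setting the high bit on every byte except the last."""
    out = bytearray()
    while True:
        low_bits = unsigned & 0x7F
        unsigned >>= 7
        if unsigned:
            out.append(low_bits | 0x80)
        else:
            out.append(low_bits)
            return bytes(out)

fixed_width = struct.pack("!i", 1)          # always 4 bytes
compact = varint_encode(zigzag_encode(1))   # 1 byte for small values
compact_negative = varint_encode(zigzag_encode(-1))
```

Small values such as counters and list lengths dominate typical messages, so shrinking them from four bytes to one has a visible effect on payload size.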
The above protocols describe what is transmitted, while Thrift transports describe how to
transmit. Some of Thrift supported transports are:
1. TFileTransport - This transport writes to a file. It is not included with the
Java implementation, but is simple to implement.
2. TFramedTransport - Sends data in frames, where each frame is preceded by a length.
This transport is required when using a non-blocking server.
3. TMemoryTransport - Uses memory for I/O.
4. TSocket - Uses blocking socket I/O for transport.
5. TZlibTransport - Performs compression using zlib. Used in conjunction with another
transport. Not available in the Java implementation.
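Framing, as used by the framed transport above, can be sketched in a few lines. The example assumes a 4-byte big-endian length prefix; the helper names are hypothetical. A length prefix lets a non-blocking server know when a complete message has arrived before it starts decoding.

```python
import struct

def write_frame(payload):
    """Prefix the payload with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(payload)) + payload

def read_frames(stream):
    """Split a byte stream into complete frames; an incomplete trailing frame is left out."""
    frames = []
    offset = 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from("!I", stream, offset)
        if offset + 4 + length > len(stream):
            break  # the rest of this frame has not arrived yet
        frames.append(stream[offset + 4:offset + 4 + length])
        offset += 4 + length
    return frames

stream = write_frame(b"ab") + write_frame(b"cde")
complete = read_frames(stream)
partial = read_frames(stream[:-1])  # last byte still in flight
```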
Thrift also supports a number of servers:
1. TNonblockingServer - A multi-threaded server using non-blocking I/O (Java
implementation uses NIO channels). TFramedTransport must be used with this server.
2. TSimpleServer - A single-threaded server using standard blocking I/O. Useful for
testing.
3. TThreadPoolServer - A multi-threaded server using standard blocking I/O.
Thrift allows only one service per server. Although this is certainly a limitation, it can
be worked around. By defining a composite service that extends all of the other services that
a given server should process, a single server can accommodate multiple services. If this
work-around is insufficient, it is always possible to create multiple servers, although this
consumes more resources (such as ports and sockets) than would otherwise be necessary.
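The composite-service work-around can be pictured with plain inheritance. The service classes below are hypothetical handlers written for this sketch, not generated Thrift code.

```python
class SearchService:
    def search(self, query):
        return ["result for " + query]

class LoggingService:
    def __init__(self):
        self.entries = []
    def log(self, message):
        self.entries.append(message)
        return len(self.entries)

class CompositeService(SearchService, LoggingService):
    """Extends both services, so one server object can answer calls for either."""

composite = CompositeService()
search_reply = composite.search("thrift")
log_reply = composite.log("search executed")
```

A single server bound to `composite` would expose both `search` and `log`, at the cost of coupling the two services into one interface.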
3.3 Advantages of Thrift
Thrift provides clean abstractions for data transport, data serialization and
application-level processing. The Thrift toolset is a collection of software libraries and
code generation tools that can be used to automatically generate the client and server code for
distributed applications. Thrift is useful for cross-language serialization, with lower
overhead than alternatives such as SOAP because it uses a binary format. It uses a clean and
lean library, and there are no XML configuration files or frameworks to code. FBThrift adds a
number of new features aimed at handling larger, more complex collections of services, a new
C++ code generator, and components aimed at creating services that are less memory-intensive
and demand less of the hardware under heavy load. Thrift is useful for remote database handling
and remote computer access. It is used for network communication in Facebook's platform and
other major Web applications, as well as in the backend of many mobile applications.

Chapter 4 Facebook Thrift Services

Thrift has been employed in a large number of services at Facebook, including search,
logging, mobile, ads and the developer platform [2]. This chapter discusses two major
services built on Facebook Thrift.
4.1. Search
Thrift is used as the underlying protocol and transport layer for the Facebook search
service. Multi-language code generation is well suited to search because it allows application
development in an efficient server-side language (C++) while allowing the Facebook PHP-based
web application to call the search service using the Thrift PHP libraries. A large variety of
search stats, deployment and testing functionality is also built on top of the generated PHP
code. Additionally, the Thrift log file format is used as a redo log for providing real-time
search index updates. Thrift has allowed the search team to leverage each language for its
strengths and to develop code at a rapid pace.
4.2. Logging
The Thrift TFileTransport functionality is used for structured logging. Each service
function definition, along with its parameters, can be considered a structured log entry
identified by the function name. This log can then be used for a variety of purposes, including
inline and offline processing, stats aggregation and as a redo log.
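The idea of treating each function call as a structured log entry, and then replaying the log, can be sketched as follows. The JSON entry format, the function names and the toy document index are assumptions for this illustration and do not reflect TFileTransport's on-disk format.

```python
import json

def log_call(log, function_name, **params):
    """Record one call as a structured entry identified by the function name."""
    log.append(json.dumps({"fn": function_name, "params": params}, sort_keys=True))

def replay(log, handlers):
    """Re-apply every logged call in order, as a redo log would."""
    for line in log:
        entry = json.loads(line)
        handlers[entry["fn"]](**entry["params"])

redo_log = []
log_call(redo_log, "add_document", doc_id=1, text="thrift")
log_call(redo_log, "add_document", doc_id=2, text="search")
log_call(redo_log, "remove_document", doc_id=1)

index = {}

def add_document(doc_id, text):
    index[doc_id] = text

def remove_document(doc_id):
    index.pop(doc_id, None)

replay(redo_log, {"add_document": add_document, "remove_document": remove_document})
```

The same log could be scanned offline to count calls per function name, which corresponds to the stats aggregation use mentioned above.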

Chapter 5 Thrift vs Other Service Technologies

This chapter briefly discusses other service technologies and compares them with
Thrift.
REST (Representational State Transfer) is an architectural style for designing networked
applications. It depends on a stateless, client-server, cacheable communication protocol,
usually HTTP. REST is a lightweight alternative to mechanisms like RPC (Remote Procedure
Calls) and Web Services. RESTful applications use HTTP requests to post data (create and/or
update), read data and delete data. Thus REST uses HTTP for all four CRUD
(Create/Read/Update/Delete) operations.
RMI (Remote Method Invocation) is a Java Application Programming Interface (API)
that performs the object-oriented equivalent of remote procedure calls. Protocol Buffers are
Google's language-neutral, platform-neutral, extensible mechanism for serializing structured
data: like XML, but smaller, faster and simpler.
Andrew Prunicki compared Thrift with other service technologies that are also fairly
easy to use in practice, and recorded the results to show the value proposition of Thrift [3].
RESTful web services have become very popular of late, so they are compared with Thrift here.
Protocol Buffers does not include a service infrastructure, but it transports objects in a
similar fashion to Thrift's TCompactProtocol, which makes it a useful comparison. Lastly, RMI
is included because it uses a binary transport and thus serves as a reference implementation of
sorts for Java binary object transport. This chapter compares the payload sizes and runtime
performance of each service technology. For REST, both JSON-based and XML-based services are
considered; for Thrift, the comparison focuses on TCompactProtocol, the most efficient protocol
available for Java.

5.1 Size comparison


Table 1 Methods used for size comparison

Method             Capture Technique
Thrift             Custom client that forked the returning input stream to a file.
Protocol Buffers   Stream to a file. Excludes messaging overhead.
RMI                Object serialization of the response. Excludes messaging overhead.
REST               Use wget from the command line redirecting the response to a file.

The chart and table below show the results. None of the sizes include TCP/IP overhead.
Sizes are in bytes; smaller is better.
Table 2 Size comparison of different techniques

Method                                          Size   % larger than TCompactProtocol
Thrift TCompactProtocol                          278
Thrift TBinaryProtocol                           460    65.47
Protocol Buffers                                 250   -10.07
RMI (using Object Serialization for estimate)    905   225.54
REST JSON                                        559   101.08
REST XML                                         836   200.72

Fig. 2 Graph showing size comparison of different techniques

The comparison shows that Thrift has a clear advantage in payload size, particularly
compared to RMI and REST. Protocol Buffers produces a payload about 10% smaller than
TCompactProtocol, but it does not include a service infrastructure.
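The percentage column of Table 2 can be reproduced directly from the sizes. The short calculation below takes the TCompactProtocol payload as the baseline and computes (size - baseline) / baseline for each method; the variable names are chosen for this example only.

```python
sizes_in_bytes = {
    "Thrift TCompactProtocol": 278,
    "Thrift TBinaryProtocol": 460,
    "Protocol Buffers": 250,
    "RMI": 905,
    "REST JSON": 559,
    "REST XML": 836,
}

baseline = sizes_in_bytes["Thrift TCompactProtocol"]

def percent_larger(size, base):
    """Relative size difference with respect to the baseline, in percent."""
    return round((size - base) / base * 100, 2)

percentages = {
    name: percent_larger(size, baseline) for name, size in sizes_in_bytes.items()
}

for name, value in percentages.items():
    print(f"{name:26s} {value:8.2f}")
```

A negative value, as for Protocol Buffers, means the payload is smaller than the baseline.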


5.2 Runtime performance


Test scenario:
1. Query the list of course numbers.
2. Fetch the course for each course number.

This test scenario was executed 10,000 times. The tests were run on the following systems:
Table 3 Server System Specifications

Operating System   Ubuntu Linux 8.04 (hardy)
CPU                Intel Core 2 T5500 @ 1.66 GHz
Memory             2 GiB
Cores              2
Window System      Shutdown - To avoid any unnecessary spikes from other processes
                   during execution.
Java Version       Sun Java SE Runtime Environment (build 1.6.0_14-b08)

Table 4 Client System Specifications

Operating System   Ubuntu Linux 8.04 (hardy)
CPU                Intel Pentium 4 @ 2.40 GHz
Memory             1 GiB
Cores              1
Window System      Shutdown - To avoid any unnecessary spikes from other processes
                   during execution.
Java Version       Sun Java SE Runtime Environment (build 1.6.0_14-b08)

Table 5 Methods used for runtime comparison

Method             Description
Thrift             Complete Thrift stack
Protocol Buffers   Custom server using normal, blocking socket I/O
RMI                Standard RMI
REST XML & JSON    Jersey running inside a Jetty server


The chart and table below summarize the results. All times are in seconds.

Fig. 3 Runtime comparison of different techniques

Fig. 4 Graph showing average wall time for different techniques

CPU time is the total amount of time the CPU spent running the code or anything
requested by the code, including kernel time. Wall time is the elapsed time required to
complete the given task, as measured by the system clock or a stopwatch.
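The difference between the two measures can be observed with the standard library. In the sketch below, `time.process_time` reports CPU time of the current process and `time.perf_counter` reports elapsed wall time; the `measure` helper is a name invented for this example. Note that this measures a single process, whereas the CPU percentages in this chapter were observed at the system level.

```python
import time

def measure(task):
    """Return (cpu_seconds, wall_seconds) spent running task()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return cpu_seconds, wall_seconds

# Sleeping (like waiting on the network) consumes wall time but almost no CPU time.
cpu_seconds, wall_seconds = measure(lambda: time.sleep(0.1))
```

This is why a client that mostly waits for a slow server can show a long wall time together with a low CPU percentage.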


Table 6 Runtime comparison of different techniques

Method                    Server CPU %   Avg. Client CPU %   Avg. Wall Time (mm:ss)
REST XML                     12.00            80.75              05:27.45
REST JSON                    20.00            75.00              04:44.83
RMI                          16.00            46.50              02:14.54
Protocol Buffers             30.00            37.75              01:19.48
Thrift TBinaryProtocol       33.00            21.00              01:13.65
Thrift TCompactProtocol      30.00            22.50              01:05.12

Some interesting observations can be derived from the comparisons. In terms of wall
time, Thrift clearly outperformed REST and RMI. In fact, TCompactProtocol took less than
20% of the time that REST XML took to transmit the same data.
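The "less than 20%" figure follows from the wall times in Table 6. The calculation below converts the mm:ss values to seconds and takes the ratio; the helper name is chosen for this example.

```python
def to_seconds(minutes_and_seconds):
    """Convert a 'mm:ss.ss' string to seconds."""
    minutes, seconds = minutes_and_seconds.split(":")
    return int(minutes) * 60 + float(seconds)

rest_xml_seconds = to_seconds("05:27.45")
compact_seconds = to_seconds("01:05.12")

ratio = compact_seconds / rest_xml_seconds
print(f"TCompactProtocol took {ratio * 100:.1f}% of the REST XML wall time")
```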
Overall, Thrift is a very good service technology and it is open source. Protocol Buffers
is also very useful for serialization but, as noted earlier, it does not include a service
infrastructure. Thrift is a powerful library for creating high-performance services that can
be called from multiple languages.


Chapter 6 Challenges of FBThrift

Thrift was developed at Facebook in 2006 and released as open source in 2007 [4]. Since
then, Thrift has been used throughout Facebook and has undergone continual development to
improve efficiency, speed, memory utilization and reliability. Today it powers more than 100
services used in production, written in C++, Java, Python and PHP. After running Thrift for
8 years, the developers identified two challenges:
1. Thrift is missing a core set of features
2. Performance
For example, internal service owners were constantly reinventing the same features, such
as transport compression, authentication, and counters to track the health of their servers.
Engineers were also spending large amounts of time improving the performance of their
services. Outside of Facebook, Thrift gained wide use as a serialization and RPC framework,
but ran into similar performance concerns, as well as issues in separating the serialization
and transport logic.
Over time, the developers found that processing requests from the same client in parallel
and sending responses out of order solved many of the performance issues. The benefits of the
former are obvious; the latter helps avoid application-level head-of-line blocking, where a
slow request delays the responses to faster requests queued behind it. Even so, more features
were still needed.
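Out-of-order responses require some way to match each reply to its request. A minimal sketch, assuming every request carries a sequence identifier that the server echoes back, is given below; the class and method names are hypothetical.

```python
class PendingRequests:
    """Tracks in-flight requests by sequence id so replies may arrive in any order."""
    def __init__(self):
        self._next_sequence_id = 0
        self._pending = {}

    def send(self, method):
        sequence_id = self._next_sequence_id
        self._next_sequence_id += 1
        self._pending[sequence_id] = method
        return sequence_id

    def receive(self, sequence_id, result):
        method = self._pending.pop(sequence_id)
        return method, result

    def outstanding(self):
        return len(self._pending)

client = PendingRequests()
slow_id = client.send("rebuild_index")
fast_id = client.send("ping")

# The fast reply arrives first and is delivered without waiting for the slow one.
first_reply = client.receive(fast_id, "pong")
second_reply = client.receive(slow_id, "done")
```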
6.1 Evolving the architecture
When Thrift was originally conceived, most services were relatively straightforward in
design. A web server would make a Thrift request to some backend service, and the service
would respond. But as the number of Facebook users grew, so did the complexity of the services.
Making a Thrift request was no longer so simple. Services were arranged in tiers (services
calling other services), and each service had its own feature demands, such as particular
compression or trace/debug needs. Hence Thrift needed to be upgraded for these specific use
cases.
To make asynchronous request handling work better, Facebook engineers had to
improve the memory handling of the generated C++ code. Thrift's original C++ generated code
reused the same memory space over and over for each request, which made it impossible to
process requests out of order. The developers therefore adopted IOBuf, a buffer abstraction
from the open-source folly library, which requests new buffers for each request, with some
optimizations to reduce the resulting performance hit.
In earlier versions of Thrift, the same memory buffer was reused for all requests, but
memory management quickly became tricky when the buffer had to be updated to send responses
out of order. Instead, Thrift now requests new buffers from the memory allocator on every
request. To reduce the performance impact of allocating new buffers, it allocates
constant-sized buffers from jemalloc so as to hit the thread-local buffer cache as often as
possible. Hitting the thread-local cache was an impressive performance improvement: for the
average Thrift server, it is just as fast as reusing or pooling buffers, without any of the
complicated code. These buffers are then chained together to become as large as needed and
freed when no longer needed, which prevents some memory issues seen in previous Thrift servers,
where memory was pooled indefinitely. To support these chained buffers, all of the existing
Thrift protocols had to be rewritten.
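The chaining of constant-sized buffers can be pictured with the toy class below. It is only a conceptual sketch: the block size is an arbitrary small value chosen for readability, the class is hypothetical, and it has none of the allocator behaviour of folly's IOBuf or jemalloc.

```python
BLOCK_SIZE = 8  # hypothetical, deliberately tiny so the chaining is visible

class ChainedBuffer:
    """Stores data in a chain of fixed-size blocks that grows as needed."""
    def __init__(self):
        self._blocks = []
        self._length = 0

    def append(self, data):
        view = memoryview(data)
        while len(view):
            used = self._length % BLOCK_SIZE
            if used == 0:
                # Either nothing is stored yet or the last block is full.
                self._blocks.append(bytearray(BLOCK_SIZE))
            chunk = view[:BLOCK_SIZE - used]
            self._blocks[-1][used:used + len(chunk)] = chunk
            self._length += len(chunk)
            view = view[len(chunk):]

    def block_count(self):
        return len(self._blocks)

    def to_bytes(self):
        return b"".join(self._blocks)[:self._length]

buffer = ChainedBuffer()
buffer.append(b"hello ")
buffer.append(b"thrift world")
```

Because each request owns its own chain, one request's buffer can be released as soon as its response is sent, independently of the others.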

Fig. 5 Out-of-order chained buffers

To allow for per-request attributes and features, a new THeader protocol and transport
were introduced. Thrift was previously limited in the fields that could be used to add
per-request information, and those fields were hard to access. As Thrift evolved, a new way
was needed for service owners to add features without changing the core Thrift libraries or
breaking backward compatibility. For example, if a service wanted to start compressing some
responses or change timeouts, this should be easy to do without having to completely change
the transport used. The THeader format is very similar to HTTP headers: each request passes
along headers that the server can interpret. With some clever programming, it was possible to
make the THeader format backward compatible with all the previous Thrift transports and
protocols.
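The general idea of per-request headers can be sketched as a small key/value block placed in front of the payload. The layout used here (a 4-byte length followed by JSON-encoded headers) and the header names are assumptions for illustration only; the real THeader wire format is different.

```python
import json
import struct

def wrap(headers, payload):
    """Prefix the payload with a length-delimited block of key/value headers."""
    header_block = json.dumps(headers, sort_keys=True).encode("utf-8")
    return struct.pack("!I", len(header_block)) + header_block + payload

def unwrap(message):
    """Split a message back into its headers and payload."""
    (header_length,) = struct.unpack_from("!I", message, 0)
    headers = json.loads(message[4:4 + header_length].decode("utf-8"))
    return headers, message[4 + header_length:]

message = wrap({"timeout_ms": "250", "trace": "on"}, b"request-body")
headers, payload = unwrap(message)

# A server reads the headers it understands and ignores the rest.
timeout_ms = int(headers.get("timeout_ms", "1000"))
```

New features can then be introduced by adding a header, without changing how the payload itself is encoded.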

Fig. 6 Latency improvements with out-of-order responses [4]

6.2 Re-open-sourcing Thrift as FBThrift


The evolved internal branch of Thrift (FBThrift) has been released on GitHub. This
version adds a new C++ code generator, available as a new target language, cpp2. It also
includes all the header transport and protocol changes for several languages, including C++,
Python and Java. Both the new C++ generated code and the THeader format are being used by a
number of Thrift services at Facebook. Service requests that go between data centers are
dynamically compressed based on the size of the message, while in-rack requests skip
compression (and thus avoid the CPU hit). A number of services that have moved to the new cpp2
generated code have seen up to a 50% decrease in latency and/or large decreases in memory
footprint. Additionally, the new C++ async code is a dependency for newer HHVM releases.
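Size-based compression of the kind described above can be sketched with the standard zlib module. The threshold value, the flag dictionary and the function names are hypothetical; the source does not state how FBThrift makes this decision internally.

```python
import zlib

COMPRESS_THRESHOLD_BYTES = 1024  # hypothetical cut-off

def prepare(payload, crosses_data_centers):
    """Compress only large messages that leave the data center."""
    if crosses_data_centers and len(payload) >= COMPRESS_THRESHOLD_BYTES:
        return {"compressed": True}, zlib.compress(payload)
    return {"compressed": False}, payload

def restore(flags, body):
    """Undo prepare() on the receiving side."""
    return zlib.decompress(body) if flags["compressed"] else body

large_message = b"a" * 4096
remote_flags, remote_body = prepare(large_message, crosses_data_centers=True)
local_flags, local_body = prepare(large_message, crosses_data_centers=False)
```

The in-rack path sends the bytes untouched, trading bandwidth (which is cheap within a rack) for CPU time.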
The use of cpp2 and THeader has considerably improved performance and memory footprint,
but further performance and memory improvements are still needed. FBThrift adds a number of
new features aimed at handling larger, more complex collections of services, a new C++ code
generator, and components aimed at creating services that are less memory-intensive and demand
less of the hardware under heavy load. There remains considerable scope for developing the
technology further.


Chapter 7 Conclusion
Thrift is a powerful library for creating high-performance services that can be called
from multiple languages. Thrift is a good choice for an application in which programs written
in multiple languages need to communicate, speed is a concern, and the clients and servers are
co-located. Thrift may also be a good choice for IPC on a single machine where speed and/or
interoperability are a concern.
Thrift is already used in a wide variety of applications at Facebook. Many developers
are also contributing at Apache to make Thrift a scalable, efficient and reliable technology.
Thrift remains a promising technology for building cross-language services.


References
1. Thrift White Paper, http://thrift.apache.org/static/thrift-20070401.pdf.
2. Mark Slee, Aditya Agarwal and Marc Kwiatkowski, Thrift: Scalable Cross-Language
Services Implementation.
3. Andrew Prunicki, Senior Software Engineer, Apache Thrift, Object Computing, Inc.
(OCI).
4. Dave Watson, Under the Hood: Building and open-sourcing FBThrift.
5. Thor Olavsrud, Facebook Open Sources Thrift Protocol.
6. Sean Gallagher, Facebook open-sources Thrift, again, with FBThrift overhaul.
7. Shane Schick, Facebook shows off Thrift development environment.
8. Michael Cvet, Facebook Thrift Tutorial.


Acknowledgement
I take this opportunity to thank the Facebook Thrift developers at Facebook and Apache
for their tireless work in developing such a versatile technology for cross-language services.
I also convey my sincere gratitude to all those whose vision has guided me in completing this
work. I express my heartfelt gratitude to Mrs. Dipti P. Rana (Asst. Prof.) and the other
faculty members of the Computer Engineering Department, SVNIT, for their valuable guidance,
moral support and belief in me.
