75. Comparison of MySQL and HBase
                  MySQL                        HBase
Parallel          Manual sharding              Automatic load balancing
Fail-over         Manual master/slave switch   Automatic
Read efficiency   High                         Low
Write efficiency  Medium                       High
Columnar support  No                           Yes
103. PQL – Puma Query Language
o CREATE INPUT TABLE t ('time', 'adid', 'userid');
o CREATE VIEW v AS
    SELECT *, udf.age(userid)
    FROM t
    WHERE udf.age(userid) > 21
o CREATE HBASE TABLE h …
o CREATE LOGICAL TABLE l …
o CREATE AGGREGATION 'abc'
    INSERT INTO l (a, b, c)
    SELECT
      udf.hour(time),
      adid,
      age,
      count(1),
      udf.count_distinc(userid)
    FROM v
    GROUP BY
      udf.hour(time),
      adid,
      age;
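To make the semantics of the view `v` and the aggregation `'abc'` concrete, here is a minimal Python sketch that evaluates them over an in-memory list of events. It is not Puma's implementation: `udf_age`, `udf_hour`, the `USER_AGES` lookup table, and the sample events are hypothetical stand-ins for `udf.age`, `udf.hour`, and input table `t`, and `time` is assumed to be a Unix timestamp in seconds.

```python
from collections import defaultdict

# Hypothetical stand-in for udf.age(userid): a static lookup table.
USER_AGES = {"u1": 25, "u2": 19, "u3": 34}


def udf_age(userid):
    return USER_AGES.get(userid, 0)


def udf_hour(ts):
    # Assumes `time` is a Unix timestamp in seconds; truncate it to the hour.
    return ts - ts % 3600


def view_v(rows):
    """CREATE VIEW v AS SELECT *, udf.age(userid) FROM t WHERE udf.age(userid) > 21"""
    for time, adid, userid in rows:
        age = udf_age(userid)
        if age > 21:
            yield time, adid, userid, age


def aggregate(rows):
    """GROUP BY (hour, adid, age) with count(1) and a distinct-user count."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for time, adid, userid, age in view_v(rows):
        key = (udf_hour(time), adid, age)
        counts[key] += 1
        users[key].add(userid)
    return {key: (counts[key], len(users[key])) for key in counts}


events = [
    (7200, "ad1", "u1"),
    (7300, "ad1", "u1"),
    (7400, "ad1", "u3"),
    (7500, "ad1", "u2"),  # filtered out by the view: age 19 is not > 21
]
print(aggregate(events))  # {(7200, 'ad1', 25): (2, 1), (7200, 'ad1', 34): (1, 1)}
```

Puma evaluates this incrementally over a stream and persists the results in HBase; the sketch only shows what each output row means.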
113. Facebook recently deployed Facebook
Messages, its first ever user-facing application
built on the Apache Hadoop platform. Apache
HBase is a database-like layer built on Hadoop
designed to support billions of messages per
day. This paper describes the reasons why
Facebook chose Hadoop and HBase over other
systems such as Apache Cassandra and
Voldemort and discusses the application’s
requirements for consistency, availability,
partition tolerance, data model and scalability.
We explore the enhancements made to
Hadoop to make it a more effective realtime
system, the tradeoffs we made while
configuring the system,
and how this solution has significant
advantages over the sharded MySQL
database scheme used in other
applications at Facebook and many other
web-scale companies.
We discuss the motivations behind our
design choices, the challenges that we face
in day-to-day operations, and future
capabilities and improvements still under
development. We offer these observations
on the deployment as a model for other
companies who are contemplating a
Hadoop-based solution over traditional
sharded RDBMS deployments.
184. Scribe
https://github.com/facebook/scribe/wiki
o Scribe is a server for aggregating log data
that’s streamed in real time from clients. It is
designed to be scalable and reliable.
o There is a scribe server running on every node
in the system, configured to aggregate
messages and send them to a central scribe
server (or servers) in larger groups.
o If the central scribe server isn’t available, the
local scribe server writes the messages to a file
on local disk and sends them when the central
server recovers.
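The store-and-forward behaviour above (batch locally, spool to local disk while the central server is down, resend on recovery) can be sketched in Python as follows. This is an illustration, not Scribe's C++ code: `LocalForwarder`, the batch size, and the JSON-lines spool file are hypothetical, and the central server is modelled as a callable that raises `ConnectionError` when unavailable.

```python
import json
import os
import tempfile


class LocalForwarder:
    """Hypothetical local aggregator: batches messages, spools to disk when central is down."""

    def __init__(self, send_batch, spool_path, batch_size=100):
        self.send_batch = send_batch  # callable(list); raises ConnectionError if central is down
        self.spool_path = spool_path
        self.batch_size = batch_size  # illustrative value
        self.pending = []

    def log(self, category, message):
        self.pending.append((category, message))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        batch, self.pending = self.pending, []
        if not batch:
            return
        try:
            self._drain_spool()  # send older spooled messages first to keep ordering
            self.send_batch(batch)
        except ConnectionError:
            self._spool(batch)

    def _spool(self, batch):
        # Append the batch to a JSON-lines file on local disk.
        with open(self.spool_path, "a", encoding="utf-8") as f:
            for category, message in batch:
                f.write(json.dumps([category, message]) + "\n")

    def _drain_spool(self):
        if not os.path.exists(self.spool_path):
            return
        with open(self.spool_path, encoding="utf-8") as f:
            backlog = [tuple(json.loads(line)) for line in f]
        self.send_batch(backlog)  # if this raises, the spool file is kept for later
        os.remove(self.spool_path)


# Usage: simulate the central server being down, then recovering.
sent = []
central_up = False


def central(batch):
    if not central_up:
        raise ConnectionError("central scribe server unavailable")
    sent.extend(batch)


spool = os.path.join(tempfile.mkdtemp(), "spool.jsonl")
fwd = LocalForwarder(central, spool, batch_size=2)
fwd.log("ads", "m1")
fwd.log("ads", "m2")  # batch full, central down: written to the spool file
central_up = True
fwd.log("ads", "m3")
fwd.log("ads", "m4")  # spooled m1, m2 go first, then m3, m4
print(sent)  # [('ads', 'm1'), ('ads', 'm2'), ('ads', 'm3'), ('ads', 'm4')]
```

Draining the spool before sending a new batch is what keeps messages in their original order after an outage.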
185. Scribe
o The central scribe server(s) can write the
messages to the files that are their final
destination, typically on an NFS filer or a
distributed filesystem, or send them to another
layer of scribe servers.
o Scribe is unique in that clients log entries
consisting of two strings, a category and a
message. The category is a high level
description of the intended destination of the
message and can have a specific configuration
in the scribe server, which allows data stores to
be moved by changing the scribe configuration
instead of client code.
186. Scribe
o The server also allows for configurations based
on category prefix, and a default configuration
that can insert the category name in the file
path.
o Flexibility and extensibility are provided through
the “store” abstraction.
o Stores are loaded dynamically based on a
configuration file, and can be changed at
runtime without stopping the server.
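Routing by category, with prefix matches and a default that inserts the category name into the file path, can be sketched like this. The precedence order (exact match, then longest prefix, then default), the `ROUTES` table, the trailing-`*` prefix convention, and the path template are assumptions for illustration, not Scribe's documented rules.

```python
# Hypothetical routing table: exact categories, "prefix*" entries, and a default
# whose path template receives the category name.
ROUTES = {
    "ads_clicks": "/data/ads/clicks",
    "web*": "/data/web",
    "default": "/data/other/{category}",
}


def resolve_destination(category, routes=ROUTES):
    """Pick a destination: exact match, then longest prefix, then the default."""
    if category != "default" and category in routes:
        return routes[category]
    prefixes = [key[:-1] for key in routes if key.endswith("*")]
    matching = [p for p in prefixes if category.startswith(p)]
    if matching:
        return routes[max(matching, key=len) + "*"]
    return routes["default"].format(category=category)


print(resolve_destination("ads_clicks"))  # /data/ads/clicks
print(resolve_destination("web_search"))  # /data/web
print(resolve_destination("payments"))    # /data/other/payments
```

Moving `ads_clicks` to a new location changes only the table; clients keep logging the same category, which is the point the slides make about changing configuration instead of client code.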
187. Scribe
o Stores are implemented as a class hierarchy,
and stores can contain other stores. This allows
a user to chain features together in different
orders and combinations by changing only the
configuration.
o Scribe is implemented as a Thrift service using
the non-blocking C++ server. The installation
at Facebook runs on thousands of machines
and reliably delivers tens of billions of
messages a day.
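Because stores form a class hierarchy and can contain other stores, a whole store tree can be built from nested configuration alone. The Python sketch below mimics that idea with a registry from type name to class; `MemoryStore` is a hypothetical leaf standing in for a file store, `MultiStore` loosely follows the "multi" type, and the dict-based config is an assumption (Scribe itself is C++ and reads its own config format).

```python
from abc import ABC, abstractmethod


class Store(ABC):
    """Base class: every store accepts a batch of messages and reports success."""

    @abstractmethod
    def handle(self, messages):
        ...


class MemoryStore(Store):
    """Hypothetical leaf store that keeps messages in a list."""

    def __init__(self, config):
        self.name = config.get("name", "")
        self.messages = []

    def handle(self, messages):
        self.messages.extend(messages)
        return True


class MultiStore(Store):
    """Container store: forwards every batch to each child store."""

    def __init__(self, config):
        self.children = [build_store(child) for child in config["stores"]]

    def handle(self, messages):
        # Build the full list first so every child is called (no short-circuit).
        results = [child.handle(messages) for child in self.children]
        return all(results)


# Registry from config "type" to class; new store types plug in here.
STORE_TYPES = {"memory": MemoryStore, "multi": MultiStore}


def build_store(config):
    """Instantiate a (possibly nested) store tree from a config dict."""
    return STORE_TYPES[config["type"]](config)


config = {
    "type": "multi",
    "stores": [
        {"type": "memory", "name": "archive"},
        {"type": "multi", "stores": [{"type": "memory", "name": "realtime"}]},
    ],
}
root = build_store(config)
root.handle(["m1", "m2"])
```

Reordering or nesting the entries in `config` changes the pipeline without touching any class, which is the property the slide describes.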
188. Scribe Overview / Reliability
https://github.com/facebook/scribe/wiki/Scribe-Overview
o The scribe system is designed to be robust to
failure of the network or any specific machine,
but does not provide transactional guarantees. If
a scribe instance on a client machine (we’ll call it
a resender for the moment) is unable to send
messages to the central scribe server, it saves
them on local disk, then sends them when the
central server or network recovers. To avoid
overloading the central server upon a restart, the
resender waits a random time between
reconnect attempts, and if the central server is
near capacity it will return TRY_LATER, which
tells the resender to not attempt another send
for several minutes.
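The resender's retry policy (a random wait between reconnect attempts, and a longer pause when the server answers TRY_LATER) can be sketched as below. TRY_LATER is the code named on the slide; the `ResultCode` enum values, the wait ranges, the five-minute pause, the attempt limit, and the injected `send`/`sleep` callables are assumptions.

```python
import random
from enum import Enum


class ResultCode(Enum):
    OK = 0
    TRY_LATER = 1


def resend(batch, send, sleep, min_wait=5.0, max_wait=15.0,
           try_later_pause=300.0, max_attempts=10):
    """Retry sending `batch` until the server returns OK; True on success.

    `send(batch)` returns a ResultCode or raises ConnectionError when the
    central server is unreachable. All timing values are illustrative.
    """
    for _ in range(max_attempts):
        try:
            result = send(batch)
        except ConnectionError:
            # A random wait spreads reconnects so a restarted server is not swamped.
            sleep(random.uniform(min_wait, max_wait))
            continue
        if result is ResultCode.OK:
            return True
        # TRY_LATER: the server is near capacity, so back off for several minutes.
        sleep(try_later_pause)
    return False


# Usage: one failed connection, one TRY_LATER, then success.
responses = iter([ConnectionError(), ResultCode.TRY_LATER, ResultCode.OK])


def fake_send(batch):
    response = next(responses)
    if isinstance(response, Exception):
        raise response
    return response


waits = []
ok = resend(["m1"], fake_send, waits.append)
print(ok, waits[1])  # True 300.0
```

Injecting `sleep` keeps the sketch testable; a real resender would also keep the batch on local disk while it waits.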
189. Scribe Overview / Reliability
o The central server has similar behavior (the
same code in fact) for handling failure of the
NFS filer or distributed filesystem it’s writing to.
If the filesystem goes down the scribe server
writes to local disk until it recovers, then sends
the data from local disk to the remote
filesystem. The order of the messages is
preserved in both this and the resender case.
190. Scribe Overview / Reliability
o These error cases will lead to loss of data:
  o If a client can’t connect to either the local or
    central scribe server, the message will be lost.
  o If a scribe server crashes, it could lose a small
    amount of data that’s in memory but not on
    disk.
  o Some multiple-component failures, such as
    when a resender can’t connect to any central
    server and its local disk fills up.
o Some rare timeout conditions can also lead to
duplicate messages.
191. Scribe Overview / Configuration
o The scribe server is configured by the file
specified in the -c command line option, or the
file /usr/local/scribe/scribe.conf if none is
specified on the command line.
o The basic idea of the configuration is that a
particular category of messages is sent to one
or more “stores” of various types. Some types
of stores can contain other stores, for example
a bucket store contains many file stores and
distributes messages to them based on a hash.
192. Scribe Overview / Configuration
o The configuration file consists of a global
section and a section for each store. The global
section includes the listening port number and
the maximum number of messages that the
server can handle in a second.
o Each store section must include a category and
a type. There is no restriction on the number of
categories or the number of stores per
category.
193. Scribe Overview / Configuration
o The remaining items in the store configuration
depend on the store type, and include such
things as file location, maximum file size, how
often to rotate files, and where a resender
should send its data.
o A store can also contain another store
configuration, the name of which is specific to
the type of store. For example, a store of type
buffer contains <primary> and <secondary> stores,
and a store of type bucket contains a store called
<bucket>.
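The layout described on these slides (global `key=value` settings, one section per store, and nested sections inside a store) can be read with a small parser. This Python sketch is an illustration, not Scribe's own parser: it assumes `<name>`/`</name>` section markers and `#` comments, collects repeated sections into lists, and the sample keys and values (`port`, `max_msg_per_second`, `remote_host`, `file_path`) are illustrative.

```python
def parse_scribe_conf(text):
    """Parse key=value lines and <name>...</name> sections into nested dicts.

    Every section is stored in a list under its name, so repeated sections
    (e.g. several <store> blocks) are all kept.
    """
    root = {}
    stack = [root]  # dict currently being filled, innermost last
    names = []      # names of the open sections, innermost last
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("</") and line.endswith(">"):
            name = line[2:-1]
            if not names or names[-1] != name:
                raise ValueError(f"unexpected closing tag </{name}>")
            names.pop()
            stack.pop()
        elif line.startswith("<") and line.endswith(">"):
            name = line[1:-1]
            section = {}
            stack[-1].setdefault(name, []).append(section)
            stack.append(section)
            names.append(name)
        else:
            key, _, value = line.partition("=")
            stack[-1][key.strip()] = value.strip()
    if names:
        raise ValueError(f"unclosed section <{names[-1]}>")
    return root


SAMPLE = """
port=1463
max_msg_per_second=2000000

<store>
category=default
type=buffer
<primary>
type=network
remote_host=central.example.com
</primary>
<secondary>
type=file
file_path=/tmp/scribe
</secondary>
</store>
"""
conf = parse_scribe_conf(SAMPLE)
store = conf["store"][0]
print(conf["port"], store["type"], store["primary"][0]["type"])  # 1463 buffer network
```

Values stay strings here; a real server would validate each key against the store type it belongs to.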
194. Scribe Overview / Configuration
o The types of stores currently available are:
o file – writes to a file, either local or NFS.
o network – sends messages to another scribe
server.
o buffer – contains a primary and a secondary
store. Messages are sent to the primary store if
possible, and otherwise the secondary. When the
primary store becomes available the messages
are read from the secondary store and sent to
the primary. Ordering of the messages is
preserved. The secondary store has the
restriction that it must be readable, which at the
moment means it has to be a file store.
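The buffer store's primary/secondary behaviour can be sketched as follows. `BufferStore` accepts any primary object with a `handle(messages) -> bool` method and uses an in-memory list as a stand-in for the readable secondary (which, per the slide, must be a file store in Scribe); the class names and the choice to retry the primary on every incoming batch are assumptions.

```python
class BufferStore:
    """Hypothetical sketch: send to primary, fall back to a readable secondary, replay in order."""

    def __init__(self, primary):
        self.primary = primary
        self.secondary = []  # stand-in for the readable (file) secondary store

    def handle(self, messages):
        if self.secondary:
            self._replay()
        # Only bypass the secondary when no older messages are still waiting.
        if not self.secondary and self.primary.handle(messages):
            return True
        # Primary unavailable or backlog not drained: queue behind older messages.
        self.secondary.extend(messages)
        return True  # the buffer store accepts the messages either way

    def _replay(self):
        """Move buffered messages to the primary, oldest first, if it is back."""
        if self.primary.handle(list(self.secondary)):
            self.secondary.clear()


class FlakyPrimary:
    """Hypothetical primary that can be switched off to simulate an outage."""

    def __init__(self):
        self.up = False
        self.received = []

    def handle(self, messages):
        if not self.up:
            return False
        self.received.extend(messages)
        return True


primary = FlakyPrimary()
buf = BufferStore(primary)
buf.handle(["m1"])
buf.handle(["m2"])       # primary down: both buffered in the secondary
primary.up = True
buf.handle(["m3"])       # backlog replayed first, then m3
print(primary.received)  # ['m1', 'm2', 'm3']
```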
195. Scribe Overview / Configuration
o bucket – contains a large number of other
stores, and decides which messages to send to
which stores based on a hash.
o null – discards all messages.
o thriftfile – similar to a file store but writes
messages into a Thrift TFileTransport file.
o multi – a store that forwards messages to
multiple stores.
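A bucket store chooses one child store per message by hashing, and a null store simply accepts and drops messages. The Python sketch below uses CRC32 of a hypothetical message key (the text before the first `:`) to pick among `num_buckets` in-memory children; Scribe's real key extraction and hash options may differ, so treat both as assumptions.

```python
import zlib


class ListStore:
    """Leaf stand-in for a file store."""

    def __init__(self):
        self.messages = []

    def handle(self, messages):
        self.messages.extend(messages)
        return True


class NullStore:
    """Discards all messages."""

    def handle(self, messages):
        return True


class BucketStore:
    """Routes each message to one of num_buckets children by hashing its key."""

    def __init__(self, num_buckets, delimiter=":"):
        self.buckets = [ListStore() for _ in range(num_buckets)]
        self.delimiter = delimiter

    def bucket_for(self, message):
        # Hypothetical key rule: everything before the first delimiter.
        key = message.split(self.delimiter, 1)[0]
        return zlib.crc32(key.encode("utf-8")) % len(self.buckets)

    def handle(self, messages):
        for message in messages:
            self.buckets[self.bucket_for(message)].handle([message])
        return True


bucket = BucketStore(num_buckets=4)
bucket.handle(["user42:click", "user42:view", "user7:click"])
NullStore().handle(["dropped"])  # accepted and discarded
same = bucket.bucket_for("user42:click") == bucket.bucket_for("user42:view")
print(same)  # True
```

Hashing on a key rather than the whole message keeps all of one user's messages in the same bucket, which is what makes per-bucket processing downstream possible.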
196. The Underlying Technology of Messages
http://www.facebook.com/note.php?note_id=454991608919
November 16, 2010