Facebook Distributed System Case Study For Distributed System Inside Facebook Datacenters
ISSN 2347-4289
1. INTRODUCTION
is based on,
Keywords: Facebook; distributed system; availability; scalability; Hadoop; social cloud; Hive; HDFS
friends, notifications, pages and updates.
3. FACEBOOK DISTRIBUTED SYSTEM DESIGN
3.1 Architecture
Facebook, the online social network (OSN), relies on globally
distributed datacenters that are highly dependent on
centralized U.S. datacenters, in which scalability,
availability, openness, reliability and security are the major
system requirements. When Facebook was founded in 2004,
becoming the largest OSN by 2013 was only a dream; growth
on that scale puts the system at risk unless it is well
designed and protected against failures and attacks [8]. The
system follows a 3-tier (or, by some counts, 4-tier)
architecture, in which the data flow originates from client
requests that are served by the following steps:
1- Initially by dedicated web servers. These web servers
are interconnected in a highly available scheme to
handle billions of requests, and the logs coming from
the different web servers are aggregated.
2- The logs are then forwarded in uncompressed format to
the Scribe-Hadoop clusters dedicated to them.
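The two steps above can be illustrated with a minimal sketch. It is a toy model only: the class names `WebServer` and `LogAggregator`, the method names, and the sample requests are all hypothetical and do not reflect the actual Scribe or Hadoop interfaces. It shows web servers buffering one log line per request and an aggregation tier merging those logs into a single uncompressed batch for a downstream cluster.

```python
from collections import defaultdict


class WebServer:
    """Hypothetical web-tier node that serves requests and buffers log lines."""

    def __init__(self, name):
        self.name = name
        self.log_buffer = []

    def handle(self, request):
        # Serve the request and record one log line for it.
        self.log_buffer.append(f"{self.name} served {request}")
        return f"response to {request}"

    def drain_logs(self):
        # Hand over the buffered log lines and clear the local buffer.
        lines, self.log_buffer = self.log_buffer, []
        return lines


class LogAggregator:
    """Hypothetical stand-in for the log-aggregation tier (not Scribe's API)."""

    def __init__(self):
        self.by_category = defaultdict(list)

    def collect(self, servers, category="web"):
        # Merge the logs of all web servers into one uncompressed stream.
        for server in servers:
            self.by_category[category].extend(server.drain_logs())

    def forward(self, category="web"):
        # Forward the aggregated lines downstream; here they are just returned.
        return self.by_category.pop(category, [])


servers = [WebServer("web1"), WebServer("web2")]
servers[0].handle("GET /profile")
servers[1].handle("GET /page")
servers[0].handle("GET /friends")

aggregator = LogAggregator()
aggregator.collect(servers)
batch = aggregator.forward()
```

In this sketch `batch` holds three log lines, ordered by server and then by arrival. A real deployment would add batching, retries, and persistent storage, none of which are modelled here.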
3.2 Distributed systems components:
Scalability and reliability are mandatory requirements given the
global reach of the system. Facebook is a global OSN that serves
billions of requests and must reply to each of them within a few
seconds. These requirements call for scalability in size and in
geography while preserving the robustness of the system [9].
System design, big data processing and analysis, and huge
storage are examples of the components Facebook relies on,
because of their ability to hold text, multimedia, and many
third-party applications and advertisements and to surface them
to users [8]. Facebook relies on the Hadoop platform, which is
well suited to unstructured text, logs, and event streams as well
as structured data, and to cases where a data discovery process
is needed. It is built
2. Data Node (DN): DNs are responsible for storing data blocks in
HDFS. They keep track of the blocks they store and report them
to the Name Node (NN), which in turn provides client applications
with the names of the DNs that hold the required data.
Worker Nodes: these are the servers responsible for processing
tasks; each worker (slave) hosts a DN and a task tracker. See
figure (4).
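The division of labour between the NN and the DNs can be sketched with a small in-memory model. This is an illustration under simplifying assumptions, not the HDFS implementation: the class and function names, the 4-byte block size, and the round-robin replica placement are all hypothetical choices made for the example, and the replication factor is assumed not to exceed the number of DNs.

```python
class DataNode:
    """Toy DN: stores block contents keyed by block id."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Toy NN: holds only metadata, never the file contents."""

    def __init__(self):
        self.file_blocks = {}      # path -> ordered list of block ids
        self.block_locations = {}  # block_id -> names of DNs holding it

    def register(self, path, block_id, datanode_name):
        blocks = self.file_blocks.setdefault(path, [])
        if block_id not in blocks:
            blocks.append(block_id)
        self.block_locations.setdefault(block_id, []).append(datanode_name)

    def locate(self, path):
        # Tell the client which DNs hold each block of the file.
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


def write_file(nn, datanodes, path, data, block_size=4, replication=2):
    # Split the data into blocks and place each replica round-robin.
    names = sorted(datanodes)
    for offset in range(0, len(data), block_size):
        index = offset // block_size
        block_id = f"{path}#{index}"
        for r in range(replication):
            dn = datanodes[names[(index + r) % len(names)]]
            dn.store(block_id, data[offset:offset + block_size])
            nn.register(path, block_id, dn.name)


def read_file(nn, datanodes, path):
    # Ask the NN for locations, then fetch each block from its first replica.
    chunks = []
    for block_id, locations in nn.locate(path):
        chunks.append(datanodes[locations[0]].read(block_id))
    return b"".join(chunks)


datanodes = {name: DataNode(name) for name in ("dn1", "dn2", "dn3")}
nn = NameNode()
write_file(nn, datanodes, "/logs/a", b"hello world")
content = read_file(nn, datanodes, "/logs/a")
```

The point of the sketch is that the client consults the NN only for metadata and then reads the bytes directly from a DN, so the NN never sits on the data path.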
3.3 Communication
3.3.1 Communication in general system
Users contact Facebook by establishing a TCP connection
(connection oriented, and persistent in the case of polling for
updates), and receive HTML responses that are rendered back to
them by their browsers [9].
Considering these traffic generators, and the fact that Facebook
datacenters are centralized in the U.S. (Santa Clara and Palo
Alto in California, and Ashburn in Virginia), the bandwidth and
latency between users outside the U.S. and these datacenters
become a serious risk. This encouraged the decision makers to
consider multiple solutions to maintain network reliability and
system availability and to protect the system from network
bottlenecks [9]. The solution was to let Content Delivery Network
(CDN) servers, which are geographically well distributed, serve
the objects, as illustrated in figure (8). The CDN servers span
a wide area, including Russia, Egypt, Sweden, and the UK, among
others.
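The effect of serving objects from a nearby CDN server rather than from a U.S. datacenter can be sketched as a simple latency-based selection. The function name `pick_server`, the server names, and the latency figures below are hypothetical values chosen for illustration; they are not measurements, and real CDNs typically route requests through DNS or anycast rather than client-side probing.

```python
def pick_server(latencies_ms, origin="us-origin"):
    """Return the reachable server with the lowest latency.

    latencies_ms maps a server name to a round-trip time in milliseconds,
    or to None when the server did not answer the probe. When no server
    answered, the origin name is returned as the default choice.
    """
    reachable = {name: ms for name, ms in latencies_ms.items() if ms is not None}
    if not reachable:
        return origin
    return min(reachable, key=reachable.get)


# Illustrative probe results for a user in northern Europe (made-up numbers).
probe = {
    "us-origin": 180.0,
    "cdn-sweden": 35.0,
    "cdn-uk": 48.0,
    "cdn-egypt": None,  # no answer to the probe
}
chosen = pick_server(probe)
```

With these assumed numbers the user is served from the Swedish CDN server instead of crossing the Atlantic, which is the bottleneck the CDN is meant to avoid.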
5. REFERENCES
[1]. Thusoo, Ashish, et al. "Hive-a petabyte scale data
warehouse using hadoop." Data Engineering (ICDE),
2010 IEEE 26th International Conference on. IEEE,
2010.
[2]. Borthakur, Dhruba, et al. "Apache Hadoop goes
realtime at Facebook." Proceedings of the 2011 ACM
SIGMOD International Conference on Management of
data. ACM, 2011.
[3]. Shvachko, Konstantin, et al. "The hadoop distributed
file system." Mass Storage Systems and Technologies
(MSST), 2010 IEEE 26th Symposium on. IEEE, 2010.
[4]. Lakshman, Avinash, and Prashant Malik. "Cassandra:
a decentralized structured storage system." ACM
SIGOPS Operating Systems Review 44.2 (2010): 35-40.
[5]. Chard, Kyle, et al. "Social cloud: Cloud computing in
social networks." Cloud Computing (CLOUD), 2010
IEEE 3rd International Conference on. IEEE, 2010.
[6]. Chieu, Trieu C., et al. "Dynamic scaling of web
applications in a virtualized cloud computing
IEEE, 2009.