Facebook Distributed System Case Study For Distributed System Inside Facebook Datacenters


INTERNATIONAL JOURNAL OF TECHNOLOGY ENHANCEMENTS AND EMERGING ENGINEERING RESEARCH, VOL 2, ISSUE 7 152

ISSN 2347-4289

Facebook Distributed System Case Study For


Distributed System Inside Facebook Datacenters
Asma Mohammad Salem

Department of Computer and Networking engineering, JU Amman, Jordan


[email protected]

Abstract: Facebook has been recognized in the last few years as the largest online social network, reaching more than a billion users by 2013. The system is distributed in its design, infrastructure and architecture. The datacenters behind it are large and robust; they keep the system scalable, reliable and secure, and make Facebook accessible from anywhere with high availability.

Keywords: Facebook; distributed system; availability; scalability; Hadoop; social cloud; Hive; HDFS

1. INTRODUCTION

Facebook was founded in 2004 with a mission to give people the power to share and to make the world more open and connected through friendship relationships. People anywhere can use Facebook to stay connected with friends and family, to share data and multimedia such as audio and video, and to express what matters to them through comments and likes [9][10]. Facebook's systems must process large quantities of data, known as Big Data, for purposes ranging from simple reporting and business intelligence to large-scale measurements and reports run from different perspectives [8]. This data resides in geographically distributed datacenters and is processed on highly equipped servers designed with advanced technologies to improve overall system performance. Facebook's data infrastructure builds on the Hadoop and Hive systems [1][2] and their integrated components [3][4].

We go through the system details as follows. Section (2) presents the system features. Section (3) explores the system design and discusses all system components in detail; because the system is under continuous enhancement, some of these enhancements are explored in the same section. Cloud computing is now a major topic for supporting systems and realizing applications, and Facebook, as a geographically distributed system, has recently been integrating its features and services with cloud computing solutions. The discussion of the system design therefore ends with cloud technology solutions: this paradigm shift could serve as an alternative that lets Facebook scale dynamically in the future and sustain rapid growth while keeping performance metrics within bounds and preserving system stability and functionality [5][6][7]. Section (4) gives our conclusions, in which we review the whole system design and offer a critical evaluation of this distributed system's features and design.

2. DISTRIBUTED SYSTEM FEATURES

Facebook is an example of a commercial Online Social Network (OSN): a hosted application that attracts users with a set of features and attracts advertisers, who pay for the privilege of displaying ads targeted to these users. An OSN interconnects users through friendship relations and allows synchronous and asynchronous communication of user-generated content, such as text, multimedia (audio and video) and third-party OSN application updates, over a social graph. These services attract users and are the main source of the heavy traffic that flows through the system.

Facebook is an open website published on the internet as a social network system. A user connects by opening its home page and completing registration in a few screens and steps. The system is accessible from any device with internet access, such as desktops and mobile devices; some of them are listed in figure (1). Registration requires a user name and password and is done once; after that, the user logs in to start using the website, beginning by inviting friends from an email account. The website is based on building friendships in order to share status, media and news with friends. The main features of the website are:

Wall: the original profile space for a user, where content such as photos, videos and files is posted. Users can attach any content to their wall and make it visible to anyone; by choosing a visibility scope they can limit who sees the wall contents, which in early versions of Facebook were text only [9].

News Feed: a home page on which users see a continuously updated list of their friends' activities. It includes profile changes, updates and upcoming events, and users can follow the conversations taking place between the walls of their friends.

Timeline: a space in which all photos, videos, posts and other content are organized by the time at which they were uploaded or created.

Friendship: the feature on which Facebook is based. "Friending" someone is the act of sending another user a friend request on Facebook or accepting a friend request. Users have full control over their friend list.

Likes and tags: likes are positive feedback; users can like updates, comments, photos, statuses and links posted by their friends, and these likes make the content appear in their friends' notifications and updates.

Copyright 2014 IJTEEE.



Notifications: keep track of all the most recent actions or updates. A notification informs the user that an action has been added to the profile page, wall or timeline, such as a comment, a like, or shared media in which the user is tagged [9].

Networks, groups, and pages: Facebook allows users to build networks and groups and to create pages that bring them together around an idea or a specific community. They can be used for posting items or sending messages to the group of users who join these communities.

Messaging and outbox: a service that allows users to send messages to each other. Users can send a message to any number of friends at a time, and message management is also provided. In 2010, Facebook announced a new Facebook Messages service, which gives each user an account under Facebook.com. This system is available to all users and provides text messages, instant messaging, emails and regular messages [2][10].

All these features and more are served on Facebook, along with applications such as events, marketplaces, notes, places, questions, photos, videos and Facebook pages. We are interested mainly in the system features that produce traffic, so we group them below into major categories that help the system investigation. These categories are based on the type of data and the communication mechanisms [9].

Figure (1): Facebook accessibility feature

OSN traffic patterns consist of Facebook built-in interactions:
1- Wall posts, also known as status updates: these allow users to share text and multimedia (audio and video), which are published on the main page of their own profile or on friends' profiles, can easily be seen by the user's friends, and are open to their comments and likes. These updates reach users by pushing or polling Facebook updates [9].
2- Comments and like tags: comments and likes are the second mode of interaction and are applied to existing posts and updates [9].
3- Facebook Messages and chat: a service that allows users to send messages to each other. In 2010, Facebook announced a new Facebook Messages service, which gives each user an account under Facebook.com. This system is available to all users and provides text messages, instant messaging, emails and regular messages. Every user has strong control over his or her mailbox; it was the foundation of a Social Inbox [2].
4- Facebook Applications: the biggest motivation for users, offering convenient integration between many applications and the web interface of the Facebook pages. This keeps many users connected to the Facebook home pages and in touch with many advertisers. Many of the applications are games, and this commercial OSN attracts both users and advertisers. The integration between game components and users, their profiles, images, friend lists and joined groups increases functionality and the level of integration with different components [10].

Most Facebook Applications are simpler than most casual modern games, requiring an average of one or two click actions and supplying a random outcome that is mostly independent of skill, usually in a very short span of time (seconds). Frequently, the actual gameplay is replaced by a text narrating the events and their outcome, as a sort of prize in exchange for the minimal (one click) engagement required. Facebook Applications feature several elements of social play, either by making the participation of the user's friends a requirement for accessing the application, or by proposing primers for confrontation with others [2][10].

3. FACEBOOK - DISTRIBUTED SYSTEM DESIGN

3.1 Architecture

Facebook, the online social network (OSN) system, relies on globally distributed datacenters that are highly dependent on centralized U.S. data centers. Scalability, availability, openness, reliability and security are the major system requirements. When Facebook was founded in 2004, becoming the largest OSN by 2013 was only a dream; growth on this scale puts the system at risk unless it is well designed and protected against failures and attacks [8]. The system follows a three-tier (or four-tier) architecture, in which the data flow originates from client requests that are served in the following steps:
1- Requests are handled initially by dedicated web servers. These web servers are connected in a highly available scheme to handle billions of requests, and the logs coming from the different web servers are aggregated.
2- The logs are then forwarded in uncompressed format to the Scribe-Hadoop clusters, which are dedicated to log aggregation. These clusters in turn feed the Hive-Hadoop clusters, which are divided into two categories, Production and Ad hoc, and are balanced according to job priority. For example, the Production cluster is dedicated to jobs with strict delivery deadlines, while the Ad hoc cluster serves the low-priority


batch jobs as well as any ad hoc analysis that users want to run on historical data sets.
3- Federated MySQL is the database engine that holds the databases underpinning the whole system [8]. These tiers are shown in figure (2).

Figure (2): Facebook system architecture

3.2 Distributed systems components

Scalability and reliability are mandatory requirements given the global reach of the system. Facebook is a global OSN that serves billions of requests and is expected to reply within a few seconds. These requirements call for scalability in size and in geography while preserving the robustness of the system [9]. System design, big data processing and analysis, and large-scale storage are examples of the components Facebook relies on, because they can hold text, multimedia, many third-party applications and advertisements and present them to users [8]. Facebook relies on the Hadoop platform, which is well suited to unstructured text, logs and event streams as well as structured data, and to cases where a data discovery process is needed. It is built to handle large volumes of data, so preparing and processing data should not be cost prohibitive [2][3].

Hadoop has two main components:
1. MapReduce (M-R), which is dedicated to computation.
2. Hadoop Distributed File System (HDFS), which deals with storage. See figure (3).

Figure (3): Hadoop system on Facebook

A typical Hadoop environment consists of a master node and worker nodes with specialized software components. A deployment may use multiple master nodes to avoid a single point of failure. The main elements are:
Job Tracker: interacts with client applications and distributes map and reduce tasks to particular nodes within a cluster.
Task Tracker: a process that receives tasks (map, reduce and shuffle) from the Job Tracker on the master node and runs them on a specific cluster node.
Name Node (NN): keeps track of every file in the Hadoop Distributed File System (HDFS). A client application contacts the NN to locate, delete, copy or add a file.
Data Node (DN): stores the file blocks in HDFS and serves them to client applications. The NN tells clients which DNs hold the required data.
Worker Nodes: the servers responsible for processing tasks; each worker (slave) hosts a DN and a Task Tracker. See figure (4).

Figure (4): Hadoop master/slave architecture

3.2.1 Map Reduce (M-R)

Terabytes and petabytes of data are processed and analyzed daily in Facebook's data centers. MapReduce is used to handle them; it has two major phases, map and reduce, which are carried out in the following steps:
1- MapReduce uses key/value pairs to index the data coming from HDFS, which is divided into blocks and replicated to protect the system in case of failure.
2- The M-R job and its details are submitted to the Job Tracker, which contacts the Task Tracker on each DN to schedule the map and reduce tasks.
3- Each mapper processes its data blocks and generates a list of key/value pairs. The list is sorted and the mapped results are transferred to the reducers in sorted order.
4- The reducers merge the lists of key/value pairs to generate the final results, which are stored in HDFS and replicated. Clients can then read the results from HDFS [3]. The steps are summarized in figure (5).
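
The map, sort and reduce steps of section 3.2.1 can be illustrated with a minimal single-process sketch. It is not Hadoop code: the function names (map_phase, shuffle_and_sort, reduce_phase, run_job) and the word-count task are assumptions chosen for illustration, and the input "blocks" are plain strings standing in for HDFS blocks.

```python
from collections import defaultdict
from itertools import chain


def map_phase(block):
    """Mapper (hypothetical): emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle_and_sort(mapped_pairs):
    """Group the values by key and return the groups in sorted key order."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reducer (hypothetical): merge all values of one key into a final result."""
    return key, sum(values)


def run_job(blocks):
    """Run map over every block, then sort, then reduce each key group."""
    mapped = chain.from_iterable(map_phase(block) for block in blocks)
    return dict(reduce_phase(key, values) for key, values in shuffle_and_sort(mapped))


# Two strings stand in for two data blocks read from HDFS.
blocks = ["like comment like", "share like comment"]
word_counts = run_job(blocks)
```

In a real cluster the mappers and reducers run on different worker nodes and the sorted pairs travel over the network; here everything runs in one process so that the data flow is easy to follow.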


Figure (5): Map/Reduce whole steps

3.2.2 Hadoop distributed file system (HDFS)

The distributed file system serving Facebook is mainly the Hadoop Distributed File System (HDFS), which is designed to run on low-cost hardware and to be highly fault tolerant (it supports block replication). HDFS is designed to store very large data sets reliably and to stream them at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. Distributing storage and computation across many servers lets the system scale dynamically: resources can grow on demand while remaining economical at every size, and the system stays available and reliable. An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system's data. HDFS is designed for batch processing rather than interactive use; the emphasis is on high throughput of data access rather than low latency. A typical file in HDFS is gigabytes to terabytes in size. HDFS applications need a write-once-read-many access model for files; this assumption simplifies data consistency issues and enables high-throughput data access [2]. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks, and these blocks are stored in a set of Data Nodes. The existence of a single Name Node in a cluster greatly simplifies the architecture of the system. The Name Node is the arbitrator and repository for all HDFS metadata, and the system is designed so that user data never flows through the Name Node [3][4]; see figure (6).

Figure (6): HDFS architecture DN and NN

3.2.3 Hadoop and Hive

At Facebook, Hive is a data warehouse infrastructure built on top of Hadoop. It provides tools for easy data summarization, heavy reporting, ad hoc querying and analysis of large data sets stored in HDFS files. It provides a mechanism to put structure on this data and a simple query language called HiveQL, which is based on SQL and enables users familiar with SQL to query the data [1]. Without Hive, the same job would take hours if not days to author as a map-reduce program, whereas with Hive the task can be expressed easily in a matter of minutes. Hive has made it possible to bring the immense scalability of map-reduce to non-engineering users as well, such as business analysts and product managers, who are familiar with SQL but would be in a very unfamiliar environment if they had to write map-reduce programs themselves to query and analyze data without HiveQL [1]. Figure (7) shows the Hive system architecture.

3.2.4 Apache HBase

Facebook's messaging system was recently added to the application with the support of Apache HBase, a database-like layer built on Hadoop and designed to support billions of messages per day. The choice reflects the application's requirements for consistency, availability, partition tolerance, data model and scalability. Enhancements were made to Hadoop to make it a more effective real-time system, and Facebook made many tradeoffs while configuring the system in order to gain significant advantages over the sharded MySQL database scheme used in other applications at Facebook [2]. As Facebook moves from offline to real-time processing, HBase adds the following: capacity for billions of messages that can grow with minimal overhead and no downtime, high write throughput, efficient low-latency strong consistency semantics within a data center, efficient random reads from disk,


and high availability, especially for disaster recovery, together with fault isolation and atomic read-modify-write primitives. It also provides zero downtime if an individual data center fails, through active-active serving capability across different data centers [2].

Figure (7): Hive System Architecture

3.3 Communication

3.3.1 Communication in general system

Facebook users retrieve updates over connection-oriented TCP connections (persistent in the case of polling for updates) and receive HTML responses that are rendered by their browsers [9]. Given this traffic and the fact that Facebook's datacenters are centralized in the US (Santa Clara and Palo Alto in California, and Ashburn in Virginia), the bandwidth and latency experienced by users outside the U.S. become a serious risk. This encouraged decision makers to consider multiple solutions to maintain network reliability and system availability and to protect the system from network bottlenecks [9]. The solution was to let Content Delivery Network (CDN) servers, well placed geographically, handle the objects on behalf of the Facebook servers, as illustrated in figure (8). The CDN servers are widely and geographically distributed, for example across Russia, Egypt, Sweden and the UK.

Figure (8): CDN support Facebook network

Although regional CDN servers are an attractive solution for infrastructure expansion, other solutions could also support the huge growth and datacenter extensions: TCP proxies and regional OSN caching servers would be attractive ways to improve network performance and reduce latency. Unfortunately these solutions are still under study and have not yet been applied, which results in slow performance and long latency in Facebook's overall measurements [9]. Figure (9) shows that a user contacts the web servers in the U.S., and the exchange takes more than four steps before the CDN completes serving the user's requests; figure (10) shows the TCP proxy solution and figure (11) the OSN cache solution.

Figure (9): current state for Facebook communication

With TCP proxies (figure (10)), a user can be served entirely by contacting the regional server; sometimes a connection must still be established from the original servers and completed by their CDN. With regional OSN cache servers (figure (11)), requests are served entirely by the cache servers, and only occasionally is there a need to ask the original servers. These solutions would help Facebook avoid poor performance and increase the system's ability to scale well in the future [9].

3.3.2 Communication within systems processes

Hadoop servers communicate through Remote Procedure Calls (RPC): all incoming requests that are redirected from the application servers to the MySQL-based architecture


servers are served as RPCs. This communication mechanism was improved for real-time workloads after Facebook launched its Messaging service in later years of operating as an online social network, and Hadoop was enhanced slightly to respect time constraints [2]. Hadoop uses TCP connections to send RPCs. When an RPC client detects a TCP socket timeout, it sends a ping to the RPC server instead of declaring an RPC timeout. If the server is still alive and able to communicate, the client continues to wait for a response. The reasoning is that if the RPC server is experiencing a communication burst or a temporary overload, the client should wait and keep its traffic directed to that server; throwing a timeout exception or retrying the RPC request would cause tasks to fail unnecessarily or add load to the RPC server [2]. On the other hand, waiting indefinitely hurts any application with real-time requirements. For example, an HDFS client occasionally makes an RPC to a Data Node (DN); if the Data Node fails to respond in time, the client is stuck in the RPC. A better strategy is to fail fast and try a different Data Node for reading or writing. Hadoop therefore allows an RPC timeout to be specified for each request when an RPC session with a server is started, depending on whether the job is served from the application servers or calls database servers that in turn call HDFS; Hadoop is responsible for this tuning and configuration [2][3][4].

The Facebook Messaging service combines the existing Facebook messages service with e-mail messaging, chat and SMS. Hadoop offers persistent communication between clients. The service also added a new threading model, which requires messages to be stored for each participating user; this gives users the ability to manage their social inbox with high write and read throughput. As part of the application server requirements for this threading model, each user is sticky to a single data center at a time [2].

Figure (12): RPC between Hadoop servers

3.4 System design enhancement

For several years the Facebook distributed system had a traditional design in which Hadoop and Hive worked together to store and analyze large data sets. These analyses fall into two categories: most are offline batch jobs intended to maximize throughput and efficiency, and the rest are online jobs. These workloads read and write large amounts of data from disk sequentially.

3.4.1 Memcached servers

Facebook's recent design supports random-access workloads that need low latency by combining large clusters of MySQL databases with caching tiers built on memcached; results produced by Hadoop are loaded into MySQL or memcached for consumption by the web tier, which improves performance [2]. See figure (13).

Figure (13): memcached servers

A new generation of applications has recently emerged at Facebook that requires very high write throughput and cheap, elastic storage while keeping low latency and disk-efficient sequential and random read performance [1][3][4]. MySQL storage engines are proven and have very good random read performance, but suffer from low random write throughput. Scaling up MySQL clusters rapidly is difficult because load balancing and high uptime must be maintained. Administration of MySQL clusters requires higher management overhead and costly hardware [2][3][8]. The whole system is summarized in figure (14), and the major parts discussed in this paper are listed in table (1).

Table (1): Hadoop project components
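
How a caching tier can sit in front of a database tier, as in section 3.4.1, can be sketched as follows. This is only an illustration under stated assumptions: the class LookAsideCache, its methods and the look-aside read policy are hypothetical and are not described in the sources cited here, and plain Python dictionaries stand in for the memcached and MySQL tiers.

```python
class LookAsideCache:
    """Hypothetical web-tier helper: read from the cache first, then the database."""

    def __init__(self, database):
        self.database = database  # dict standing in for a MySQL tier
        self.cache = {}           # dict standing in for a memcached tier
        self.db_reads = 0         # counts how often the database is actually hit

    def get(self, key):
        """Serve from the cache when possible; fall back to the database on a miss."""
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for later readers
        return value

    def load_batch_results(self, results):
        """Publish the output of an offline job to both tiers for the web tier."""
        self.database.update(results)
        self.cache.update(results)


store = LookAsideCache({"user:1": "Alice"})
first = store.get("user:1")    # cache miss, read from the database
second = store.get("user:1")   # cache hit, no database access
store.load_batch_results({"stats:likes": 42})
third = store.get("stats:likes")
```

Only the first read of a key reaches the database tier, which is the load reduction that a caching tier is meant to provide.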


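The fail-fast behaviour described in section 3.3.2, where a client gives up on a slow Data Node after a per-request timeout and tries another replica, can be sketched as below. This is a simulation under stated assumptions, not Hadoop's RPC API: FakeDataNode, RpcTimeout, read_with_failover and the latency values are all hypothetical, and no real network calls or sleeps take place.

```python
class RpcTimeout(Exception):
    """Raised when a simulated RPC does not complete within its timeout."""


class FakeDataNode:
    """Hypothetical stand-in for a Data Node with a fixed response latency."""

    def __init__(self, name, latency_s):
        self.name = name
        self.latency_s = latency_s

    def read_block(self, block_id, timeout_s):
        # Simulated call: compare the node's latency with the caller's timeout.
        if self.latency_s > timeout_s:
            raise RpcTimeout(f"{self.name} exceeded {timeout_s}s")
        return f"{block_id}@{self.name}"


def read_with_failover(block_id, replicas, timeout_s):
    """Try each replica in turn, failing fast on a timeout instead of waiting."""
    tried = []
    for node in replicas:
        tried.append(node.name)
        try:
            return node.read_block(block_id, timeout_s), tried
        except RpcTimeout:
            continue  # fail fast and move on to the next replica
    raise RpcTimeout(f"no replica answered within {timeout_s}s: {tried}")


replicas = [FakeDataNode("dn1", 5.0), FakeDataNode("dn2", 0.1)]
data, tried = read_with_failover("blk_7", replicas, timeout_s=1.0)
```

The timeout is passed per request, so a latency-sensitive caller can choose a short value while a batch job can choose a long one.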


Figure (14): whole recent system architecture

3.4.2 Cloud computing support

Redundant server clusters host the whole Facebook system. The system consists of physical servers that must be extended day by day, and this scheme of datacenter-hosted servers is exposed to risk and to many problems that limit growth [5][6]. Cloud computing now offers a powerful environment for scaling web applications without difficulty, providing on-demand resources at many scaling points such as web applications, storage and servers. Cloud computing aims to deliver services over the network and provides the ability to add capacity as needed. It relies on virtualization techniques to turn computing resources into virtual guests, depending on the availability of those resources in the hosting environment. Guest machines share the same resources while remaining isolated in their design and configuration, and users can access their published applications from anywhere through their connected devices. Many approaches have appeared to secure the data exchanged between users and applications [6]. This technology shift puts data centers and their administrators at the center of the distributed network, since computational power, web applications, shared resources, bandwidth and storage are all managed remotely. Facebook still hosts all its servers and databases physically in real data centers and does not depend on cloud computing to scale its platform or infrastructure. Even so, cloud offerings such as application as a service are a good example of exploiting the scalability gains of virtualization to meet growing requests and traffic, and the increasing demand to integrate many applications with the Facebook application system [5].

Scalability is a measure of the ability of an application to expand to meet enterprise business needs. The resources in demand are anything that could be required or shared by the system's users, ranging from processors to storage space and network bandwidth. These resources primarily affect system performance and degrade application behavior when they run short; an application that does not scale well suffers in performance and service availability as demand increases [6]. Scaling indicators should be chosen carefully so that applications can be tuned against them, for example the number of concurrent users (those accessing the system at the same time), the number of active connections being served, the number of requests per second, and the average response time per request. These indicators are sampled in real time and evaluated against historical and predicted values, which results in decisions to scale web application instances up or down. The number of web servers and web application components thus grows or shrinks on demand; this is the dynamic scaling feature [5][6]. See figure (15).

Figure (15): Architecture to scale web applications in a Cloud

3.4.3 social network as virtual organization

The structure of a social network is essentially a dynamic virtual organization in which trust is inherent in the friendship relationships, while resources (information, hardware, services) are shared across the social network. A social cloud, which offers low-level abstractions of computation and storage, can easily act as a complementary building block for any social network, because a social cloud is a scalable computing model in which virtualized resources are shared by users and dynamically provisioned among them. A service level agreement (SLA) should exist to manage the sharing of virtual resources. Here the cloud offers the application-as-a-service scheme (APAS) [6]. Cloud platforms are used to host social networks or to create scalable applications (PAS). Facebook applications are one such example and play a significant role in social clouds. These applications use Facebook methods to render friends, events, relationships, groups, profile information and multimedia such as audio and video, together with the Facebook Markup Language (FBML). This range of data enables complete integration between Facebook components and the applications, which are not hosted within the Facebook environment but independently [6]. All communications between a specific user and these applications take place in isolation, without


interrupting Facebook servers, which improves performance: once a user requests the URL of an
application, all later communications are served from the specific application server that hosts that
application. This scheme is a positive point in the design. Facebook JavaScript (FBJS) is often used to
send requests to Facebook servers asynchronously and transparently, without routing through the
application servers [5][6].

Figure (16): Facebook applications hosting environment

The Social Cloud uses web services to create a scalable, distributed and decentralized infrastructure,
with storage as a service completing the scenario. Each storage service relies on a web application to
deliver content to the Facebook application, with no need to route the requests through the Social
Cloud applications; this is done using JavaScript (JS) and dynamic AJAX invocations [6]. Users can
easily create storage by passing an agreement to the storage service. They can then access their
virtual storage and create their own resources, keep track of their storage contents, view storage
limits and used/available space, manage the files and folders that the storage holds, and obtain
agreement outlines and subscription information [5][6]; see Figure (16).

4. CONCLUSION AND CRITICAL EVALUATION
We have explored Facebook as a case study of a distributed system, discussed the system's features, and
provided a detailed account of its design architecture, communications and components. This paper
provides an extensive study of the Facebook distributed system inside its data centers. The system is
built on top of highly equipped data centers that provide it with availability and reliability; the
Hadoop project is an example of the technology that Facebook is built on. The system also relies on
clusters for the database systems, load-balanced web servers and application servers that are
responsible for replying to users' requests, the ability to compress the traffic between servers to
save bandwidth, and the isolation between jobs that are derived from users' requests. Using centralized
data centers located in the US, replicated by a geographically distributed CDN, gives the system an
acceptable level of scalability; with the CDN the system still performs at acceptable levels. TCP
proxies and OSN cache servers would raise the system's scalability to its upper limits, but they are
still under study and research and unfortunately have not been applied yet. The Hadoop project and its
components are an example of a success story: they provided the Facebook system with what it required
to be the most popular social network by 2013, while services were rapidly added and occasionally
updated. Messaging and chat are examples of services that required Hadoop to make some enhancements to
its design so that it could work as a real-time system rather than as an offline processing system, and
to meet the low latency required to access HDFS as fast as possible, with the RPC timeout added as a
final enhancement. Memcached servers are another example of these enhancements; they decrease the load
on the database by avoiding a database access each time data is required. Cloud computing is a model
that Facebook used to integrate with its features and services. This integration is done without any
infrastructure modifications or architectural changes, because cloud computing offers an acceptable
solution for integrating Facebook with such cloud applications. The most interesting example of these
solutions is the Social Cloud, which is built on virtualization provided by organizations and is scaled
dynamically and on demand.

5. REFERENCES

[1]. Thusoo, Ashish, et al. "Hive-a petabyte scale data warehouse using hadoop." Data Engineering (ICDE), 2010 IEEE 26th International Conference on. IEEE, 2010.

[2]. Borthakur, Dhruba, et al. "Apache Hadoop goes realtime at Facebook." Proceedings of the 2011 ACM SIGMOD International Conference on Management of data. ACM, 2011.

[3]. Shvachko, Konstantin, et al. "The hadoop distributed file system." Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on. IEEE, 2010.

[4]. Lakshman, Avinash, and Prashant Malik. "Cassandra: a decentralized structured storage system." ACM SIGOPS Operating Systems Review 44.2 (2010): 35-40.

[5]. Chard, Kyle, et al. "Social cloud: Cloud computing in social networks." Cloud Computing (CLOUD), 2010 IEEE 3rd International Conference on. IEEE, 2010.

[6]. Chieu, Trieu C., et al. "Dynamic scaling of web applications in a virtualized cloud computing environment." e-Business Engineering, 2009. ICEBE'09. IEEE International Conference on. IEEE, 2009.

[7]. Yang, Bo-Wen, et al. "Cloud Computing Architecture for Social Computing-A Comparison Study of Facebook and Google." Advances in Social Networks Analysis and Mining (ASONAM), 2011 International Conference on. IEEE, 2011.

[8]. Thusoo, Ashish, et al. "Data warehousing and analytics infrastructure at facebook." Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. ACM, 2010.

[9]. Wittie, Mike P., et al. "Exploiting locality of interest in online social networks." Proceedings of the 6th International Conference. ACM, 2010.

[10]. Rao, Valentina. "Facebook Applications and playful mood: the construction of Facebook as a third place." Proceedings of the 12th international conference on Entertainment and media in the ubiquitous era. ACM, 2008.

Copyright 2014 IJTEEE.
