A High Availability Scalable Website



Stefan Debattista C3126232


Leeds Metropolitan University, Leeds, United Kingdom

Abstract
Purpose – To recommend a Website hosting solution for RaisingMillions.com and to demonstrate
real-world solutions that provide High Availability (HA) and follow best-practice standards.
Design/Methodology/Approach – This paper reviews the real-world implementation of a
Website with the ambition to grow from small to large.
Findings – The technology is readily available to scale gradually as needed, without the need to
invest up front in a server intended to last for years on end. This does not, however, mean that the
application code can go unplanned: it must be thoroughly designed for scaling from day one.
Colleagues – Throughout this paper I refer to Josh Nesbitt, an Application Designer with whom I
shall be collaborating during implementation to get our application online.
Practical Implications – The solution provides HA for web servers on a Just in Time (JIT) basis.
A systems analysis is recommended for implementing these ideas.

Before commencing this assignment I carried out an analysis of the problem. As this was not a
requirement, I have included it in the Appendices.

1 Introduction
RaisingMillions.com aims to be a world-leading site to help raise money for charities across the
globe. The Application will be designed using an agile Web development framework, Ruby on
Rails, hosted on a Linux server. Our core process involves collecting 1 million images from
donors to charity. When a user uploads an image it will be saved in three image sizes; we will
therefore be dealing with at least 3 million images, adding up to a combined total of
~100 KB per image set. This does not include the additional text/images that people will upload,
nor the size of the Web pages themselves. We will therefore be handling at least 1.5 Terabytes
(TB) of data, and even more bandwidth than that. A fair amount of data, considering a Seagate
Cheetah hard disk takes 2.2 hours to read 1 TB at 125 MB/s. This scale will have an impact
on hardware and software concerns such as bandwidth, storage, and response times. My goal in this
research is to design a scalable plan to host our website that will begin with a handful of users
and eventually support a million.
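The storage figures above can be sanity-checked with a short script. This is a rough sketch using only the estimates already stated (one million uploads, three sizes totalling ~100 KB per set, and a 125 MB/s sustained read rate); the function names are mine, not part of the application.

```ruby
# Rough capacity estimate for RaisingMillions.com image storage.
# Assumptions (from the text): 1,000,000 uploads, ~100 KB combined
# per image set, and a 125 MB/s sustained disk read rate.
def image_storage_gb(uploads, kb_per_set)
  uploads * kb_per_set / (1024.0 * 1024.0)  # KB -> GB
end

def read_time_hours(gigabytes, mb_per_sec)
  (gigabytes * 1024.0 / mb_per_sec) / 3600.0  # GB -> MB -> seconds -> hours
end

gb = image_storage_gb(1_000_000, 100)
puts "Image sets alone: ~#{gb.round} GB"
puts "Time to read 1 TB at 125 MB/s: ~#{read_time_hours(1024, 125).round(1)} hours"
```

The image sets alone come to roughly 100 GB, which suggests the 1.5 TB figure is mostly headroom for the other uploads, pages, and growth; the read-time calculation reproduces the roughly two-hour figure quoted above.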
This paper sets out to identify:
 The requirements of our site (See Appendices)
 Software elements comprising a Web Application
 Strategic deployment methods
 Problems likely to be generated from a hardware perspective as a result of growth
 Solutions for HA
 Solutions for Data Management

2 Review of methods and technologies


2.1 What are the components of a Web Application and specifically of a Ruby on Rails
Web app?
Prior to commencing a review of how to host a HA Web Application it is important to understand
the architecture of the “elements” (Booch, 2001) constituting it. Jim Conallen, a Web Modelling
Evangelist offers a diagram of a canonical Web architecture seen in Figure 1.

Figure 1 - Web Architecture Diagram (Conallen, 2003)


During the development stage RaisingMillions.com runs on a local machine, specifically Josh's
Apple MacBook® Pro. This is only possible because the demands on the machine are light: the
application is not yet serving load from the wider world.

Figure 2 - Website Elements running on a single machine


Mongrel (Web server) executes the application during development. The script/server command
will run the application in development mode on port 3000 (Thomas, Hansson, 2007). For the
deployed application, however, Mongrel is far from ideal: it is single-threaded and can therefore
only process one request at a time. It could not possibly keep up on the Web.
Rails applications are deployed behind a front-end Web server such as Apache, which handles the
incoming requests from clients (Thomas, Hansson, 2007) and can serve hundreds of requests at
the same time.
[Diagram: the Internet connects to a front-end Web server (Apache or similar), which passes requests to multiple Rails (Web App) processes.]
This setup can scale to multiple application servers. Apache will distribute load to any number of
Rails processes running on any number of back-end machines. This basic architecture allows
Rails to scale until the Web or database server falls over; however, as you might expect, these too
can scale up to multiple machines. See where we're going here? We'll see how to do this in just a
bit.
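As a toy illustration of that front-end/back-end split, the sketch below hands each incoming request to the next Rails process in turn. This is not Apache's actual balancing code, just the round-robin idea it can apply; the backend names are invented.

```ruby
# Toy round-robin dispatcher: each request goes to the next backend
# in turn, as a front-end server would do with a pool of Rails processes.
class RoundRobin
  def initialize(backends)
    @backends = backends
    @next = 0
  end

  # Return the backend for the next request and advance the cursor.
  def pick
    backend = @backends[@next]
    @next = (@next + 1) % @backends.size
    backend
  end
end

pool = RoundRobin.new(%w[rails-1:3000 rails-2:3000 rails-3:3000])
6.times { |i| puts "request #{i} -> #{pool.pick}" }
```

Each of the six requests is spread evenly over the three (hypothetical) backends, which is exactly why adding another machine to the pool dilutes the load on every existing one.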

2.2 Deployment
Deployment is the stage where the application is uploaded to a live Web server. It's when the
beer and champagne are supposed to flow; the site will be written about in Wired Magazine and will

become an overnight name on the WWW. Unfortunately (for the Marketers) it may not be that
easy, but fortunately for us (Techs) it gives us time to manage growth.
It is best practice to first deploy onto a development server (Thomas, Hansson, 2007). This
doesn't have to be, and indeed shouldn't be, the final deployment environment, nor does it have
to be on a heavy-duty machine just yet. The sole purpose of this stage is practice and testing. A
code – test – commit – deploy routine (Thomas, Hansson, 2007) is good practice and worth
getting used to. It will highlight teething problems that are likely to emerge during real
deployment. It will also give clients, lecturers, and trusted friends the opportunity to feed back
ideas and issues to us. The skills I will acquire during this stage are:
 Server deployment
 Migration
This will also allow me to begin monitoring and testing the server; at some point the Pentium 4
laptop sitting in the hallway is going to run out of resources! Measuring when this happens
will give an insight into the loads the application places on the server with x number of clients.
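Monitoring when the box runs out of resources can start very simply, for example by watching the load average on a Linux host. A minimal sketch; the threshold and sample line are illustrative, not measured values:

```ruby
# Parse a Linux /proc/loadavg line and flag when the 1-minute load
# exceeds a chosen threshold -- a crude first signal that the
# development server is running out of headroom.
def one_minute_load(loadavg_line)
  loadavg_line.split.first.to_f
end

def overloaded?(loadavg_line, threshold)
  one_minute_load(loadavg_line) > threshold
end

# On a real server: line = File.read("/proc/loadavg")
sample = "0.42 0.30 0.25 1/123 4567"
puts overloaded?(sample, 1.0)  # false at this load
```

Logging this figure alongside the number of connected clients over time is one cheap way to estimate the load each client adds.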
So far the application has been hosted on PCs, the development server during coding, and the
deployment server during the test stage. Now comes the time when we commit and deploy onto a
Web server. A Web server is essentially a more powerful computer that can usually be upgraded
and that is always connected to the Internet with a stable and static Internet connection. There are
three main options here:
 To host the website yourself
 To rent web space - although this gives you no control over the machine which renders
this type of service useless for us
 To outsource it to a third party Web hosting company
Hosting in-house requires a WAN link, Firewall, Router and fast Internet connection (a 6Mb,
T3-equivalent 1:1 connection to a provider like BT or Easynet costs in the region of £5,000 per
annum) as well as the Web/Domain Server itself. Web hosting companies will rent you a
machine for a monthly price and generally provide a host of hardware support. Figure 3
demonstrates a real-life hosting service offered by one of the world's leading hosting companies,
Rackspace, from whom I requested a quote.


Figure 3 - Rackspace Options 1/2, and 3


As the number of users increases, more demand will be put on the Web server, Application Server
and Database server. Each element must be available to perform its functions (Microsoft
MCSE 70-293, 2004), and depending on how critical they are it is important to take steps to
ensure that they are up and running as much of the time as possible. What are the resources that
are going to suffer? What needs to be done?

2.3 Problems likely to be generated from a hardware perspective as a result of growth
“Scalability” refers to the ability of computer systems to handle growing amounts of work in a
graceful manner or to be readily enlarged (A B. Bondi, 2000). In order to do this, a scalable
computer system needs to support three basic areas of growth: processing, storage, and
bandwidth (Randal E. Bryant, 2009). One can achieve scalability by scaling vertically (scale up)
or scaling horizontally (scale out). To scale vertically means to add resources to a single node in
the system, thereby improving its performance. This typically involves the addition of more
resources such as processors, memory or hard disk space. Scaling horizontally, on the other hand,
involves distributing resources among multiple machines: a cluster of servers
interconnected with a high-speed LAN or, in the case of multiple data centres, a high-speed
WAN. Google, for example, has data centres throughout the globe and their algorithms cleverly
work out the centre closest to you, thereby minimising your response time (Cuong Do, 2007).
There is always the need to make constant iterations on bottlenecks (Cuong Do, 2007). Software
iterations deal with making the actual applications more efficient. The main hardware concerns
are bandwidth, processing resources and data space. Gigabit Ethernet and switches between
nodes (servers) solve LAN bandwidth issues with their high data rates.

2.4 Solutions for High Availability


One surprising thing about web hosting is that a machine can handle a large number of visitors as
long as the data is mostly static.

Figure 4 - A simple website works great on a standard PC: the WAN (Internet) connects through a switch/router to a single Web, Database, and Storage Server


A standard machine with a Core 2 Duo Processor running Windows or Linux and Apache,
connected with a 6 Mbps connection, could handle hundreds of thousands of visitors per day. This
configuration will work great unless:
 Traffic increases
 Machine fails
 Pages are large
 Pages are dynamic
 Back-end processing needs to take place
This is because a single processor can only provide a limited amount of processing power. There
are two strategies for handling the increased load:
1. Servers can be upgraded, mainly RAM and CPUs (Vertical Scaling)
2. Clustering a number of machines (Horizontal Scaling)
A server OS can ensure HA by means of clusters: groups of servers that function as a
single entity. The elements that constitute the Web Application in Section 2.1 will be hosted on
each of these clustered servers as and when needed. The general idea is that when the load is
too great and nothing can be done from a software point of view, the next step is to add another
machine to distribute the load. Clients access the server applications using a specially assigned
cluster name and cluster Internet Protocol (IP) address, and one or more of the servers in the
cluster are responsible for responding to each client request (Microsoft MCSE 70-293, 2004).
Should a server in the cluster fail, another server in the cluster will take over the

responsibility of the failed server's processes. This is called a failover; when the
malfunctioning machine comes back online it can restart its processes, which is called a
failback (Microsoft MCSE 70-293, 2004).
Microsoft's solution technologies consist of server clusters and network load balancing. These
Microsoft solutions, part of MS Server 2003, provide excellent support and integration with the
Windows environment.
Linux's HA concept is very similar to Microsoft's. Google amended Red Hat Linux for load
balancing (Harnur, 2000). They used 15,000 PCs to build the world's largest Linux cluster and
achieved nearly 100% uptime while processing over 150 million queries per day with <0.25 second
response times (Holzle, 2002; Barroso et al., 2003). Figure 5 demonstrates the concept behind this
technology, whereby nodes A and B let each other know they are still 'alive' with a heartbeat.
Should a node fail, the functioning node initiates a failover. A module called "rsync" constantly
synchronizes the files of the two nodes, which is necessary because of the frequent changes to web
contents.

Figure 5 - Linux HA cluster normal operation
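The heartbeat idea in Figure 5 can be sketched in a few lines: each node records when it last heard from its peer, and declares the peer dead, triggering a failover, once no heartbeat arrives within a timeout. The class and timings below are illustrative, not Linux-HA's real implementation.

```ruby
# Sketch of heartbeat-based failure detection: a node is presumed dead
# once its last heartbeat is older than the timeout, at which point the
# surviving node would initiate failover (and failback on recovery).
class HeartbeatMonitor
  def initialize(timeout_s)
    @timeout_s = timeout_s
    @last_beat = {}    # node -> time of last heartbeat
  end

  def beat(node, at)
    @last_beat[node] = at
  end

  def alive?(node, now)
    last = @last_beat[node]
    !last.nil? && (now - last) <= @timeout_s
  end
end

monitor = HeartbeatMonitor.new(2.0)
monitor.beat(:node_b, 0.0)
puts monitor.alive?(:node_b, 1.0)   # true: heartbeat within timeout
puts monitor.alive?(:node_b, 5.0)   # false: missed beats, trigger failover
```

The timeout is the crucial tuning knob: too short and transient network hiccups cause spurious failovers, too long and clients see a dead node for longer than necessary.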

2.5 Solutions for data management


A Redundant Array of Independent Disks (RAID) configuration is one solution for storage
management. This is a disk system whereby the disks in the system, or array, work in sync to
improve fault tolerance, performance, or both (Microsoft MCSE 70-293, 2004). RAID can be set
up in various configurations ranging from RAID-0 to RAID-6 (Wikipedia, 2006).
RAID-0 uses two or more disks and writes to all disks at the same rate, and by doing so improves
read/write performance (Microsoft MCSE 70-293, 2004). Up
to 32 disks can participate in this collaboration of hardware. The space used on each disk
will equal the smallest amount of space available on any one disk. This configuration is popular
where large storage is critical; however, RAID-0 offers no form of redundancy.
RAID-1 provides good performance along with excellent fault tolerance (Microsoft MCSE 70-
293, 2004). Two disks operate in sync and all data is written to both volumes. If one fails, the
data on the second disk remains active; the broken disk can be replaced, and once that is done the
data will be written back onto it.
RAID-5 uses three or more physical disks to provide fault tolerance and excellent read
performance while reducing the cost of the fault tolerance in terms of disk capacity: with three
disks, one third of the capacity is used for fault tolerance, as opposed to one half in
RAID-1.
The other variations of RAID offer similar but distinct functionality. MS Windows 2003 supports
RAID-0, 1 and 5, and Linux supports all versions of RAID (Wikipedia, 2006).
Storage Area Networks (SANs) are a solution for centralizing data management that provides:
 Availability
 Scalability
 Manageability


A SAN is a network connecting servers to storage over Fibre Optic Cable to provide HA for data
(Microsoft 2002; Freedman, 1999). SAN devices generally deliver data very quickly (up
to 2 Gbps) and connect up to 256 devices. A SAN is designed to eliminate single points of failure
and is the way forward for large disk capacities, fast delivery and high redundancy, but at a much
greater cost than RAID.

3 Analysis
3.1 Components of a Web Application
Every application has its own unique setup and different requirements for accessing and
processing information. YouTube, for example, had unforeseen problems with their thumbnails.
Cuong Do talked about the realization that even though the team (of 7) were focused on
delivering video, they did not envisage a problem with the little thumbnails. The issue was
caused by immense numbers of requests for data: unlike a video stream, which involves a single
request for the video to be played, each video page brought tens of thumbnail requests from
clients.
The methodology behind the scaling of the elements highlighted in this report, namely that they
can be scaled simply by cloning processes and distributing them across multiple servers, is
remarkable. It is a simple solution to a complex problem.
When it comes to dealing with problems between applications, and also within applications
themselves, the basic principle is to simplify the solution as much as possible and deal with it,
ensuring not to break something else.

3.2 Deployment
In the data-driven world we now live in, deployment solutions are not an issue. In an attempt to
quantify the volume, researchers reckon there was enough digital data in 2006 to theoretically fill
12 separate stacks of novels, each of which would extend the 93 million miles from the Earth to
the sun. By 2010, the accumulation of digital data would further extend these 12 stacks of books
to reach from the sun to Pluto and back! (McLean, 2007)
What is an issue, however, is coping with the vastness of the operating systems that Web
Applications, like all applications, are deployed on. There are simply so many things to consider
in setting up and maintaining a secure Operating System that there are always major threats to
the data.
A fail-proof backup system is imperative: when everything is working, everything is great;
however, should data get corrupted, deleted, or lost, the business implications can quite literally
ruin a company and render it useless.
I believe deployment is a quick and painless process, because someone with something to deploy
will always have one of four options:
 Hosting Providers where all one needs is a credit card
to access one of the countless hosting providers
 A static IP at home and a small computer or set of
computers to experiment on
 A largish system of his/her own at work, or a
company‟s cash to buy one
 A beast of a system at work capable of thousands,
millions, or even trillions of processes every second


Figure 6 - Google's first server



3.3 Problems likely to be generated from a hardware perspective as a result of growth
As with the Application components, every system has its own unique setup and is prone to its
own problems. One major differentiating factor is that dealing with hardware issues is likely to
cost a lot more money.

3.4 Solutions for HA and Data Management


The flexibility of hardware allows us to build computer systems that serve different computing
needs for different organisations.
Solutions for HA systems have the potential to be very expensive, although from a "hobby" or
"make-do" perspective solutions can be cheap but complicated. Forty clustered Pentium 4s may
have the same processing power as a rack-mounted IBM Blade Server but are certainly a lot
messier to deal with; certainly something to be proud of, though.
A differentiating factor between solving hardware and software problems is that there is a lot of
reliance on hardware vendors. These companies sometimes also fall short of industry standards.
Some SAN vendors, for example, are not part of the Storage Networking Industry Association
(SNIA), and this can cause problems because of hardware conflicts. Reviews are excellent
resources for finding out more.

4 Proposed methodology
My recommendations for the assessment and implementation of hosting the
RaisingMillions.com computer system adopt a Systems Analysis process for designing
and planning the system, and a Just In Time (JIT) methodology for implementing new ideas. This
is an established process for implementing better (usually more expensive or more complex) ideas
only when needed, keeping overheads and time to a minimum. The Systems Analysis approach I
recommend consists of four phases:
 Phase 1 - Understanding the business needs
 Phase 2 - Analysing systems requirements
 Phase 3 - Analysing and making-decisions and
 Phase 4 - Implementing the system

 Phase 1
o Answering questions on budget
o Timescales
o Realistic HA needs
o Realistic storage needs
o Speed
o Environmental considerations
o Risk log

 Phase 2
o Specific technical requirements (Use Case Document, Wireframes)
o Research will help determine more precise hard disk requirements, processor
requirements, and backup.
o Operating system will need to be Linux to host Rails, but do we also need a
Windows server?
o Future technical requirements
o A checklist of functional requirements should be compiled.
Page 9
A High Availability Scalable Website Stefan Debattista

 Phase 3
o Matching precise set ups with the constraints and requirements set out in Phase 1
and 2.
o Skills should be considered and technology appropriately chosen
o Testing - site should be loaded onto the deployment server.
o Technologies and their set-up will be detailed.
o Tests should be carried out to make implementation as smooth as possible.

 Phase 4
o Implementation/project plan with all the above details.
What I envisage is demonstrating the site hosted on a small cluster of
laptops or virtual servers and switching one off to demonstrate the
failover and failback features; also implementing some sort of backup and
disaster recovery plan. It would also be interesting to explore S3 or EC2,
depending on constraints.

5 Summary
There is an unlimited amount of money one can spend, and an unlimited number of ways to host a
Web Application, because of the vast array of systems and services available on the market.
A website needs to support three areas of growth: processing, bandwidth, and storage; one can
achieve scalability by distributing these among multiple physical or virtual servers. Usually the
Operating System can take care of these by means of clustering. Storage, however, is nowadays
often managed by means of SANs (Storage Area Networks), although a cheaper alternative is the
older RAID configuration, which can still be effective if centralized storage is not the key
objective, greatly minimizing management time. Database scalability is usually handled
independently of the OS by means of its intrinsic design; this is a DBA's role.
The important thing is to plan for the highest foreseeable simultaneous load on the systems
software and hardware. There is no point in investing in a massive infrastructure for a website
that may or may not become very popular. And becoming popular on the net is not as easy as one
may think.
The infrastructure available to us IT professionals is so vast that we can literally pick and choose
specific areas to invest money in to improve performance, and the flexibility is unparalleled. Site
and server statistics help us to identify when to scale and by how much.
This paper has helped me gain a sound understanding of the systems supporting a website and the
importance of planning the architecture, not least to save the money wasted in buying or renting
the wrong products or software. No engineer constructs a building without a solid plan in place,
and neither should we.


6 Appendices
6.1 Systems Analysis of computing needs

6.1.1 Problems and requirements


This is the most important part of the project. It encompasses understanding the business
requirements and designing a system to realize them. A systems analysis approach was
undertaken; this generally consists of preliminary investigation, problem identification,
requirements analysis, decision analysis, system implementation and operation/support
(Whitten et al., 2001).
The website is what our customers use to interact with our business and must be stable enough to
provide true 24/7 high availability (HA). A measure of a system's availability is its amount of
uptime. Table 1 provides example ranges of uptime; however, there is no hard and fast rule
about the accepted rate: different organizations have different standards. For example, eBay
achieved 99.94% uptime per annum (June 2004), up from 95.2% site availability (June 1999)
(eWeek, 2004).
Software and hardware failure are reported to account for 50% of unplanned downtime. Natural
disasters and human errors are the other causes (Marcus and Stern, 2003). HA aims to minimise
these failures.
These requirements were drawn up for RaisingMillions.com. These requirements are speculative
estimates from the RaisingMillions team:
 RaisingMillions.com must (eventually) support 1,000-1,500 simultaneous users
 RaisingMillions.com must be a true 24/7/365 HA minimizing downtime
 RaisingMillions.com must be backed up
 RaisingMillions.com requires 2TB of storage for the Images and other files
 RaisingMillions.com must have failover redundancy
 RaisingMillions.com must be cost effective with the possibility to expand

Uptime (%)   Downtime per year   Downtime per week (minutes)
98.00        7.3 days            202
99.00        3.65 days           101
99.50        43.8 hours          50
99.80        17.52 hours         20
99.90        8.76 hours          10
99.99        52.6 minutes        1

Table 1 - Measuring Availability Cited from Yan Han – An Integrated High Availability Platform
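The downtime figures in Table 1 follow directly from the uptime percentage; a minimal sketch:

```ruby
# Downtime implied by an uptime percentage, as in Table 1.
MINUTES_PER_YEAR = 365 * 24 * 60.0

def downtime_minutes_per_year(uptime_pct)
  MINUTES_PER_YEAR * (100.0 - uptime_pct) / 100.0
end

[98.0, 99.0, 99.9, 99.99].each do |pct|
  puts format("%.2f%% uptime -> %.1f minutes down per year",
              pct, downtime_minutes_per_year(pct))
end
```

At 99.99% the result is about 52.6 minutes per year, matching the last row of the table; at 98% it is roughly 7.3 days.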

6.1.2 Insight into modern scalability solutions such as Grid/Cloud computing:


I had intended to write about this topic. I looked into it, and very much enjoyed researching and
watching some very interesting videos about things like Google's MapReduce programming
model, Amazon's EC2 and S3, and Yahoo's M45: systems comprising tens of thousands of
processors and petabytes of disk ("petabyte" isn't even in my Microsoft Word!), capable of tens
of millions of calculations per second! Unfortunately I feel that the word count is simply not
large enough to explore and report on this field of study, and it does not really tie into the aim of
this assignment: to give a Web start-up (RaisingMillions.com) a realistic, cost-efficient solution
to scaling an internet site.


7 Bibliography
1. BONDI, A B. 'Characteristics of scalability and their impact on performance',
Proceedings of the 2nd international workshop on Software and performance,
Ottawa, Ontario, Canada, 2000, ISBN 1-58113-195-X, pages 195 – 203

2. CUONG DO (and an elite group of scalability ninjas). 2007. Video: YouTube Scalability.
Google Tech Talk. Google.

3. BRYANT, R E. 2009. Data Intensive Super Scalable Computing, lecture notes


distributed in the topic module code MSc COMPUTING. Carnegie Mellon University,
Oxford LHB.

4. BOOCH, G. 2001. IBM: The architecture of Web applications [online]. [03 March 2009].
Available from World Wide Web:
http://www.ibm.com/developerworks/ibm/library/it-booch_web/

5. D. THOMAS, D H. HANSSON. 2007. Agile Web Development with Rails. Texas:


Pragmatic Bookshelf

6. C. ZACHER. 2004. Windows Server 2003 Network Infrastructure MCSE 70-293.


Washington: Microsoft Press

7. RAID. 2006. Wikipedia: Redundant array of independent disks [online]. [04 March
2009]. Available from World Wide Web:
http://en.wikipedia.org/wiki/Redundant_array_of_independent_disks

8. MCLEAN, D. 2007. There’s how much data? 15/03/2007. Internet News [online]. [06
Mar 2009]. Available from World Wide Web:
http://www.websearchguide.ca/netblog/archives/005907.html

9. WHITTEN, J.L., BENTLEY, L.D., DITTMAN, K.C. 2001. Systems Analysis and Design
Methods, 5th ed. Boston, MA: McGraw-Hill.

10. eWeek. 2004. "Marketplace to the world", eWeek, Vol. 21 No. 35, pp. 22-4.

11. MARCUS, E., STERN, H. 2003. Blueprints for High Availability, 2nd ed. New York,
NY: Wiley.

12. HARNUR, S. 2000. "Google relies exclusively on Linux platform to chug along"
[online]. [Accessed 26 October 2004]. Available from World Wide Web:
www.hpworld.com/hpworldnews/hpw009/02nt.html

13. HOLZLE, U. 2002. The Google Linux Cluster. University of Washington, Seattle, WA
[online]. [Accessed 26 October 2004]. Available from World Wide Web:
www.cs.washington.edu/info/videos/asx/colloq/UHoelzle_2002_11_05.asx

14. Microsoft. 2002. Microsoft Computer Dictionary, 5th ed. Washington, DC: Microsoft
Press.
