High-Availability, Fault Tolerance, and Resource Oriented Computing


Eugene Ciurana

[email protected] - pr3d4t0r ##java, irc.freenode.net

This presentation is available from: http://ciurana.eu/GeeCON-2010
Let's move the Java world!

About Eugene...

15+ years building mission-critical, high-availability systems
14+ years of Java work
Open source evangelist
Official adoption of open source/Linux at Walmart worldwide
State-of-the-art, mainline-of-business systems at the largest companies in the world - not a web guy!


What You'll Learn...

Decoupled, event-driven, resource-oriented systems are more flexible
Avoid tight, point-to-point integration
Enhance JVM-based apps with better domain-specific languages
How to move away from monolithic app servers and architectures
How to implement event-driven systems by leveraging existing infrastructure and SOA investment
Treat computational resources as addressable entities
Balance open source vs. commercial products


Very Important!

Please Ask Questions!
(don't be shy)


What is Scalability?

Scalability is the property of a system to:
  handle bigger amounts of work; or
  be easily expanded in response to increased demand for network, processing, database, or file resources
Types of scalability
  Horizontal (out): add more nodes with the same functionality as the existing ones and redistribute the load
  Vertical (up): expand by adding more cores, main memory, storage, or network interfaces


Horizontal Scalability
Diagram: a load balancer in front of three nodes scales out to a load balancer in front of four nodes.
Clustering!

Vertical Scalability
Diagram: a dual-core, single-processor host with 16 MB RAM running three virtual nodes (0-2) scales up to a dual-core, dual-processor host with 32 MB RAM running four virtual nodes (0-3).


What is Availability?

How well a system provides useful resources over a set period of time
High availability guarantees an absolute degree of functional continuity within a time window
Expressed as a relationship between uptime and unplanned downtime
A = 100 - (100 * D / U); D and U expressed in minutes
Beware: uptime != available
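A minimal Java sketch of the availability formula above. It assumes U is the whole measurement window in minutes, which matches the nines table: 52,560 minutes of downtime out of a 525,600-minute year gives 90%. The class and method names are illustrative.

```java
// Sketch of A = 100 - (100 * D / U), where D is unplanned downtime and U is
// (by assumption) the full measurement window, both in minutes.
final class Availability {
    private Availability() {}

    /** Availability as a percentage for the given downtime and window, in minutes. */
    static double percent(double downtimeMinutes, double windowMinutes) {
        if (windowMinutes <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        if (downtimeMinutes < 0 || downtimeMinutes > windowMinutes) {
            throw new IllegalArgumentException("downtime must fall within the window");
        }
        return 100.0 - (100.0 * downtimeMinutes / windowMinutes);
    }

    public static void main(String[] args) {
        double minutesPerYear = 365 * 24 * 60; // 525,600 minutes; leap years ignored
        // About 8.8 hours of unplanned downtime in a year is "three nines".
        System.out.printf("%.3f%%%n", percent(525.6, minutesPerYear)); // 99.900%
    }
}
```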


The Nines Game


Availability %   Downtime (minutes/year)   Downtime/year   Vendor jargon
90               52560.00                  36.5 days       one nine
99               5256.00                   3.7 days        two nines
99.9             525.60                    8.8 hours       three nines
99.99            52.56                     53 minutes      four nines
99.999           5.26                      5.3 minutes     five nines
99.9999          0.53                      32 seconds      six nines
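The downtime column comes from inverting the same formula over a 365-day year of 525,600 minutes. This sketch reproduces the table under that assumption and ignores leap years. `NinesTable` is a hypothetical name.

```java
// Sketch: allowed downtime per year for a given availability percentage,
// assuming a 365-day year (leap years ignored).
final class NinesTable {
    static final double MINUTES_PER_YEAR = 365 * 24 * 60; // 525,600

    /** Downtime budget, in minutes per year, for an availability percentage. */
    static double downtimeMinutesPerYear(double availabilityPercent) {
        return MINUTES_PER_YEAR * (100.0 - availabilityPercent) / 100.0;
    }

    public static void main(String[] args) {
        double[] levels = {90, 99, 99.9, 99.99, 99.999, 99.9999};
        for (double a : levels) {
            System.out.printf("%-9s %10.2f min/year%n", a + "%", downtimeMinutesPerYear(a));
        }
    }
}
```

Each additional nine divides the downtime budget by ten. At six nines, the whole year allows roughly half a minute of outage.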


Service Level Agreements

SLAs are negotiated terms that outline the obligations of the two parties delivering and using a system
System type - not all systems require the same SLA
Levels of availability
  Minimum
  Target
Uptime
  Network
  Power
  Maintenance windows
Serviceability
Performance and metrics
Billing
SLAs help determine if you scale up or out


Load Balancers

Load balancers spread requests among two or more resources
Implemented in hardware or in software
  Multiple machines
  Multiple processes
  Multiple threads
Resources appear as a single device to consumers
Can be stateless (web services) or stateful (applications that require session management)
Algorithms determine the distribution
  1/n == all systems are equally likely to service a request
  Special requests (e.g. a music store) mean some servers get hit more than others
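One common way to get the 1/n distribution is round-robin selection. The sketch below shows a software balancer over a fixed list of nodes, using the illustrative addresses from these slides. It is a minimal assumption-laden example: it does no health checking and knows nothing about sessions.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of round-robin balancing: nodes are picked in turn, so over time
// each of the n nodes receives about 1/n of the requests.
final class RoundRobinBalancer {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> nodes) {
        if (nodes.isEmpty()) throw new IllegalArgumentException("need at least one node");
        this.nodes = List.copyOf(nodes);
    }

    /** Thread-safe; floorMod keeps the index valid even after the counter overflows. */
    String pick() {
        return nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size()));
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(
                List.of("192.168.202.55", "192.168.202.66", "192.168.202.67"));
        for (int r = 1; r <= 4; r++) {
            System.out.println("R" + r + " -> " + lb.pick()); // R4 wraps back to .55
        }
    }
}
```

Weighted variants repeat heavier nodes in the list. Such variants help when some servers get hit more than others.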


Load Balancers
Diagram: a consumer sends requests Rn (R = request, n = sequence number) to a load balancer at 74.0.125.28. The balancer spreads R1, R2, and R3 across nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69.


Persistent Load Balancers


Diagram: three consumers connect through a sticky load balancer at 74.0.125.28 to nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69. Each consumer keeps returning to the same node.
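A sticky balancer can pin a consumer to one node by hashing a session identifier, and this sketch shows that approach. The session ID format and the modulo hashing are assumptions. Real balancers often use cookies or source-IP affinity instead.

```java
import java.util.List;

// Sketch of sticky (persistent) balancing: the same session ID always maps to
// the same node, so session state can stay on that node.
final class StickyBalancer {
    private final List<String> nodes;

    StickyBalancer(List<String> nodes) {
        if (nodes.isEmpty()) throw new IllegalArgumentException("need at least one node");
        this.nodes = List.copyOf(nodes);
    }

    /** Same session ID -> same node, as long as the node list does not change. */
    String pick(String sessionId) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return nodes.get(Math.floorMod(sessionId.hashCode(), nodes.size()));
    }

    public static void main(String[] args) {
        StickyBalancer lb = new StickyBalancer(List.of(
                "192.168.202.55", "192.168.202.66", "192.168.202.67", "192.168.202.69"));
        System.out.println(lb.pick("consumer-A") + " " + lb.pick("consumer-A"));
    }
}
```

With modulo hashing, adding or removing a node remaps most sessions. Session state is therefore lost unless it is also shared.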


Load Balancing and Databases


Diagram: a consumer reaches nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69 through a load balancer at 74.0.125.28. All nodes share a common session data store.


Caching Strategies

Stateful load balancing requires data sharing
Caching distributes popular, shared read-only data
Think of a cache as a giant hash map
If the data isn't in the cache, fetch it from the database
Write policies:
  write-through: write to the cache AND the database
  write-behind: write to the cache, mark the entry dirty, and update the database later
  no-write allocation: only read requests are cached; assumes data never changes
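The sketch below contrasts write-through with write-behind (deferred database writes). A hypothetical `Db` class stands in for the database and counts the writes that reach it. When flushing happens (timer, eviction, shutdown) is an assumption left to the caller.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch contrasting two cache write policies; not thread-safe.
final class WritePolicies {
    /** Hypothetical database stand-in that counts how many writes reach it. */
    static final class Db {
        final Map<String, String> rows = new HashMap<>();
        int writes = 0;

        void put(String key, String value) {
            rows.put(key, value);
            writes++;
        }
    }

    /** Write-through: every put goes to the cache AND the database. */
    static final class WriteThroughCache {
        private final Map<String, String> cache = new HashMap<>();
        private final Db db;

        WriteThroughCache(Db db) { this.db = db; }

        void put(String key, String value) {
            cache.put(key, value);
            db.put(key, value);
        }

        String get(String key) { return cache.get(key); }
    }

    /** Write-behind: puts touch only the cache and mark the key dirty. */
    static final class WriteBehindCache {
        private final Map<String, String> cache = new HashMap<>();
        private final Set<String> dirty = new LinkedHashSet<>();
        private final Db db;

        WriteBehindCache(Db db) { this.db = db; }

        void put(String key, String value) {
            cache.put(key, value);
            dirty.add(key);
        }

        String get(String key) { return cache.get(key); }

        /** Writes dirty entries out; repeated writes to one key collapse into one. */
        void flush() {
            for (String key : dirty) db.put(key, cache.get(key));
            dirty.clear();
        }
    }

    public static void main(String[] args) {
        Db db = new Db();
        WriteBehindCache wb = new WriteBehindCache(db);
        wb.put("stock", "5");
        wb.put("stock", "4");
        System.out.println("before flush: " + db.writes); // 0
        wb.flush();
        System.out.println("after flush: " + db.writes);  // 1
    }
}
```

Write-behind trades durability for fewer database writes. Any dirty entries that have not been flushed are lost if the cache node dies.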


Caching Usage Pattern

Application caching
  Little or no programmer participation (e.g. Terracotta)
  Explicit API calls (memcached, Coherence, etc.)
Web caching - stores full documents or fragments (particles) on the server or client; invisible to the client
Web accelerators - distribute the load (e.g. CDNs like S3, Akamai, etc.)
Proxy caches - distribute requests to the same resources and may provide filtering/query (e.g. Squid, Apache, ISA servers)


Caching Usage Pattern


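Flowchart:
  Begin: is the request a query or an update?
  Query: fetch the datum from the cache
    If the datum is None, query the datum from the database and add the datum to the cache
    Use the datum in the app
  Update: update the datum in the database, then invalidate the cache (or add/update the datum in the cache)
  End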
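The flowchart describes what is usually called cache-aside. This sketch follows it directly and uses a `Map` as a stand-in for the database. It takes the "invalidate cache" branch on updates. It is single-threaded by assumption, and a shared cache would need a concurrent map or locking.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the flowchart (cache-aside): queries read the cache first and fall
// back to the database on a miss; updates write the database, then invalidate.
final class CacheAside<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Map<K, V> database; // stand-in for the real data store

    CacheAside(Map<K, V> database) { this.database = database; }

    /** Query path: cache hit, or load from the database and add the datum to the cache. */
    V query(K key) {
        V datum = cache.get(key);
        if (datum == null) {                 // "datum is None"
            datum = database.get(key);
            if (datum != null) cache.put(key, datum);
        }
        return datum;                        // "use datum in app"
    }

    /** Update path: update the database first, then invalidate the cached copy. */
    void update(K key, V value) {
        database.put(key, value);
        cache.remove(key);
    }

    boolean isCached(K key) { return cache.containsKey(key); }

    public static void main(String[] args) {
        Map<String, Integer> db = new HashMap<>(Map.of("price", 10));
        CacheAside<String, Integer> prices = new CacheAside<>(db);
        System.out.println(prices.query("price") + " cached=" + prices.isCached("price")); // 10 cached=true
        prices.update("price", 12);
        System.out.println(prices.query("price")); // 12
    }
}
```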


Distributed Caching
Diagram: a consumer reaches nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69 through a load balancer at 74.0.125.28. The nodes share Cache 0 through Cache 3 over a load-balanced configuration or datagram, backed by a database.
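The slide does not say how keys are assigned to Cache 0 through Cache 3. One widely used option, assumed here, is consistent hashing. Each cache server owns many points on a hash ring, and a key goes to the first point clockwise from its own hash. Removing a server then moves only that server's keys. `HashRing` and the point count are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a consistent-hash ring mapping cache keys to cache servers.
final class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int pointsPerNode; // virtual points smooth out the distribution

    HashRing(int pointsPerNode) { this.pointsPerNode = pointsPerNode; }

    void addNode(String node) {
        for (int i = 0; i < pointsPerNode; i++) ring.put(hash(node + "#" + i), node);
    }

    void removeNode(String node) {
        for (int i = 0; i < pointsPerNode; i++) ring.remove(hash(node + "#" + i));
    }

    /** First node clockwise from the key's position, wrapping around the ring. */
    String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no cache nodes");
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    /** First 8 bytes of an MD5 digest as a long (MD5 ships with every Java platform). */
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xffL);
            return h;
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing(100);
        for (String n : new String[] {"cache0", "cache1", "cache2", "cache3"}) ring.addNode(n);
        System.out.println("user:42 -> " + ring.nodeFor("user:42"));
    }
}
```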


Clustering

Cluster - two or more systems that appear to users as a single system
A clustered (horizontally scalable) system is more cost-effective than a monolithic single system (vertically scalable) with the same performance characteristics
Systems in the cluster are connected over high-speed LANs like Gb Ethernet, FDDI, InfiniBand, Myrinet, etc.


A/A Clustering

A/A == Active/Active
Distribute the load evenly among multiple nodes
All nodes offer the same capabilities
All nodes are active at the same time
Diagram: a consumer reaches four active nodes (192.168.202.55, 192.168.202.66, 192.168.202.67, 192.168.202.69) through a load balancer at 74.0.125.28.


High-Availability A/P Cluster

A/P == Active/Passive
Provides uninterrupted service through redundant nodes
Eliminates single-point-of-failure
Two nodes minimum, and heartbeat detection
Automatic traffic switch for fail-over
Diagram: a consumer reaches the active node (192.168.202.55) through a router at 74.0.125.28. A failover node (192.168.202.69) monitors it via heartbeat. Both nodes share a state data cache and a database, protected by replication or a clustered database with a failover database.
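This sketch shows heartbeat detection from the failover node's point of view. If no heartbeat arrives within the timeout, the standby promotes itself. The timeout and the explicit clock arguments are assumptions chosen to keep the logic testable. A real deployment also needs fencing so that both nodes never become active at once (split brain).

```java
// Sketch of heartbeat-based failure detection on the passive (failover) node.
final class HeartbeatMonitor {
    private final long timeoutMillis;
    private long lastHeartbeatMillis;
    private boolean active; // true once this standby node has taken over

    HeartbeatMonitor(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeatMillis = nowMillis;
    }

    /** Called whenever a heartbeat from the active node arrives. */
    void onHeartbeat(long nowMillis) { lastHeartbeatMillis = nowMillis; }

    /** Called periodically; returns true if this node is (now) the active one. */
    boolean check(long nowMillis) {
        if (!active && nowMillis - lastHeartbeatMillis > timeoutMillis) {
            active = true; // in practice: claim the service address and start serving
        }
        return active;
    }

    public static void main(String[] args) {
        HeartbeatMonitor standby = new HeartbeatMonitor(3_000, 0);
        standby.onHeartbeat(1_000);
        System.out.println(standby.check(2_000)); // false: heartbeat is recent
        System.out.println(standby.check(5_000)); // true: 4 s of silence, take over
    }
}
```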

Grid

Diagram: a consumer submits work to a master, which dispatches it through two load balancers, each fronting four nodes.
Process loads as independent jobs
Nodes don't require data sharing
Storage and network may be shared by all nodes
Intermediate results have no bearing on other jobs' progress
Each node is independent
Map/Reduce (Hadoop)
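Hadoop spreads this model across machines. This sketch only mimics its shape inside one JVM with a thread pool. Each job counts words in one document and shares no state with other jobs. Partial results are merged only at the end (map, then reduce). Word counting is an illustrative choice of job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of grid-style processing: independent jobs, merged at the end.
final class WordCountGrid {
    /** Map step: one independent job per document. */
    static Map<String, Integer> countWords(String doc) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : doc.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    static Map<String, Integer> run(List<String> docs, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Map<String, Integer>>> partials = new ArrayList<>();
            for (String doc : docs) partials.add(pool.submit(() -> countWords(doc)));
            Map<String, Integer> total = new HashMap<>();
            for (Future<Map<String, Integer>> f : partials) { // reduce step
                f.get().forEach((w, n) -> total.merge(w, n, Integer::sum));
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a rose is a rose", "is it"), 2));
    }
}
```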


Computational Cluster

Used for operations that require raw computational power
Not good for transactional operations (web, database)
Tightly coupled nodes: homogeneous and in close proximity
Meant to replace supercomputers
Diagram: a consumer submits work to a master that coordinates eight nodes.


Redundancy and Fault Tolerance

Redundancy - the expectation that any system component failure is independent of failure in other components
Fault tolerance - the system continues to operate in the event of component failure
  May have decreased throughput
Fault tolerance results from SLAs


Fault Tolerance SLA Requirements


No single point of failure - redundant components ensure continuous operation
Allow repairs without disruption of service
Fault isolation - problem detection must pinpoint the specific faulty component
Fault propagation containment - problems in one component must not cascade to others
Reversion mode - the system can be set back to a known state on command


A/A Cluster Fault Tolerance


Diagram: a consumer reaches nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69 through a load balancer at 74.0.125.28. A replacement node (192.168.202.53) can join the pool.
Uninterruptible, scalable service (stateless, web services)
Failure transparency - though service may be degraded
Ideal for event-based web services (SOAP, REST, JMS, etc.)
No dependencies between nodes


A/P Cluster Fault Tolerance


Diagram: a consumer reaches a node (192.168.202.55) through a router at 74.0.125.28. A failover node (192.168.202.69) monitors it via heartbeat. Both nodes share a state data cache and a database backed by a failover database.
High availability through redundancy and failure detection
Higher cost - used for stateful systems
May require active system or network administrator participation
More moving parts - more things to coordinate

Putting It All Together


ROC Architecture

ROC = Resource-Oriented Computing
Everything is a resource (computational, data, other)
Diagram: web browsers, GUI apps, web apps, and service providers (UPS, FedEx) connect over the Internet or dedicated APIs (JMS, SOAP, etc.). Transformers on a Mule ESB link them to back-end resources:
  a service object with business logic
  Remedy
  the CRM (SOAP)
  the product catalogue (JDBC)
  product support pages (HTTP, XML, TCP pass-through)
  single sign-on (LDAP, SOAP) over Mainframe/RACF, Active Directory, and legacy auth


SOA and Computational Network


Real-Life Example - LeapFrog


Diagram: on end-user systems (Mac, Windows), LeapFrog devices connect over USB to LeapFrog Connect and a web browser. These reach www.leapfrog.com (connected products, LearningPath), an S3 content repository, and third-party partner sites over the Internet.
Behind the firewall, the Mule ESB backbone provides HTTP, SOAP (CXF), REST, etc. routing, filtering, and dispatching, an ActiveMQ JMS broker, and dedicated LeapFrog services.
The Mule ESB tailbone hosts the connected-products SOAP and REST web services.
The Mule ESB funnybone handles device log upload and processing in a servlet container.
Supporting systems: customer data, game play data, servlets and app logic, device logs, a content management system (REST, JCR), Crowd SSO, content authoring, and user credentials.


Real-Life Example - LeapFrog


Diagram: Internet traffic reaches a load balancer in front of two Tomcat 6 application servers, which call a services proxy. Behind the proxy, separate load balancers front four tiers:
  the backbone: Mule ESB 1.6.2 instances for message filtering, routing, dispatching, queuing, and events
  the tailbone: Mule ESB SOAP and REST instances backed by a database
  the funnybone: Mule ESB servlet and MTOM instances backed by an NFS share
  the message broker: two ActiveMQ instances backed by an NFS share


Mule SOA Applied Clustering


* Two or more Mule instances can provide services, for scalability if there is high demand
* Load balanced configuration has built-in fail-over
* External apps see a single point of entry: the service endpoint name
* Load balancer or proxy sends the request to any available Mule server
* Increased demand - add another Mule server without interrupting the existing ones
* Decreased demand - remove Mule servers without interrupting other servers
* This is an active/active configuration - any server can handle a request at any time
* Assumes that the service application components are stateless
Diagram: external applications call http://server.mycompany.com/service_call. The load balancer forwards each call to http://mule_server_1/service_call or http://mule_server_2/service_call. Mule ESB application containers 1 and 2 each host Services 1, 2, and 3.

Mule SOA - ESB App Failover


* A/A configuration uses the load balancer to dispatch service calls
* The load balancer takes a failing service out of rotation automatically
* Failure reason no. 1: network connectivity
* Failure reason no. 2: Mule container
* Failure reason no. 3: Service application bug

Diagram: external applications call http://server.mycompany.com/service_call. The load balancer forwards each call to http://mule_server_1/service_call or http://mule_server_2/service_call. Mule ESB application containers 1 and 2 each host Services 1, 2, and 3.
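The sketch below shows how a load balancer can take a failing service out of rotation automatically. It is a round-robin picker that skips nodes marked down, so a failed Mule instance stops receiving calls and rejoins once it recovers. The health-check mechanism that calls `markDown`/`markUp` is assumed, not shown (for example, a periodic probe of each service endpoint).

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: round-robin over healthy nodes only; unhealthy nodes drop out of rotation.
final class HealthAwareBalancer {
    private final List<String> nodes;
    private final Set<String> down = ConcurrentHashMap.newKeySet();
    private final AtomicInteger next = new AtomicInteger();

    HealthAwareBalancer(List<String> nodes) { this.nodes = List.copyOf(nodes); }

    /** Fed by a health check (assumed): network, container, or application failure. */
    void markDown(String node) { down.add(node); }

    void markUp(String node) { down.remove(node); }

    /** Tries each node at most once per call before giving up. */
    String pick() {
        for (int tries = 0; tries < nodes.size(); tries++) {
            String n = nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size()));
            if (!down.contains(n)) return n;
        }
        throw new IllegalStateException("no healthy nodes in rotation");
    }

    public static void main(String[] args) {
        HealthAwareBalancer lb = new HealthAwareBalancer(List.of("mule_server_1", "mule_server_2"));
        lb.markDown("mule_server_1");
        System.out.println(lb.pick() + " " + lb.pick()); // mule_server_2 mule_server_2
    }
}
```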

Uninterrupted Application Updates


* Allow stopping and deploying new application functionality without stopping services
* Allow upgrades to a country's configuration without affecting other countries or stopping services
Diagram (over time): behind a load balancer, both Mule ESB application nodes start at version 1.4. One node is upgraded to 2.0 while the other keeps serving 1.4. Then the second node is upgraded, leaving both at 2.0.
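The rolling update can be sketched as a loop over the nodes. The loop takes one node out of the balancer, upgrades it, and puts it back before moving on, so at least one node is always serving. The `Node` type and the simulated deploy step are hypothetical. Real deployments would wait for a health check before returning a node to rotation.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a rolling update: upgrade one node at a time behind the balancer.
final class RollingUpdate {
    /** Hypothetical view of a node as the load balancer sees it. */
    static final class Node {
        final String name;
        String version;
        boolean inRotation = true;

        Node(String name, String version) {
            this.name = name;
            this.version = version;
        }
    }

    /** Returns how many nodes were still serving at each step (never zero for 2+ nodes). */
    static List<Integer> upgrade(List<Node> nodes, String newVersion) {
        List<Integer> servingPerStep = new ArrayList<>();
        for (Node n : nodes) {
            n.inRotation = false; // drain: the balancer stops sending requests
            servingPerStep.add((int) nodes.stream().filter(x -> x.inRotation).count());
            n.version = newVersion; // stop, deploy, restart (simulated)
            n.inRotation = true;    // healthy again: back into rotation
        }
        return servingPerStep;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(new Node("mule-a", "1.4"), new Node("mule-b", "1.4"));
        System.out.println(upgrade(nodes, "2.0")); // [1, 1]
    }
}
```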


Database Replication
Diagram: a primary cluster of Node 0 and Node 1 (ESB as app services provider) stores Partition 0 in DB 0 and Partition 1 in DB 1. Each database has a replica (DB 0b and DB 1b).


Application Deployment
Diagram: load balancers front Mule instances 1 through 4, with JMS queuing active; Mule 5 serves as the failover instance.

Application Deployment
This architecture has a lower cost of operation, reduces power consumption, and simplifies administration.

Diagram: three virtual machines on multi-core Intel or AMD processors, each running Linux and Java 6. They host JBoss (Application 1, Application 2), the Mule ESB container (Web Service 1, Web Service 2), and MQ.

Simplify the architecture by having a common platform for all systems. This platform can be replicated across multiple data centers.
* Virtual Machine: VMware or Xen hosted on Windows; consider Amazon EC2 as a viable, low-cost alternative
* Linux: Ubuntu Server
* PowerBuilder applications (end-user) migrate to JBoss + Wicket or a similar configuration
* All web services are hosted by Mule ESB
* The Mule ESB and JBoss servers are separate from one another
* MQ clusters have a similar architecture; JBoss Messaging and WebSphere MQ
* Java 6 as a minimum


Application Deployment
App and service requests may come from the open Internet.
Each data center will have a cluster of two or more physical systems. Each system will virtually host two or more applications/environments deployed as described in the previous diagram.
The system is designed for horizontal scalability (more traffic, more virtual or physical servers).
The system has inherent fail-over built in.
Use physical load balancers, separate from the cluster; they can be Linux systems or dedicated F5 balancers.
Diagram: Internet traffic reaches an app balancer and a services balancer. Two virtual hosts (Intel, AMD) each run active application and web services instances, with an MQ master and an MQ slave between them. Each host has a distributed cache and local disk, and both use a shared SAN.

Application Deployment
Diagram: data centers in Japan, Europe, and the US each run app clusters connected over the Internet. The diagram also shows Expert and Claims Mgmt applications, an Informix database, and three legacy systems.
Each data center has an application cluster.
The app clusters have identical configurations; only the app itself may vary by locale.
A designated data center also functions as the global services processing hub. All applications talk to this cluster (e.g. Claims Management) regardless of where the calling app is located.
The global services clusters are separate, physically and logically, from the application clusters, which may include locale-specific web services and data stores.


Application Deployment
Diagram: a primary cluster and a secondary cluster, each with Node 0 and Node 1 (ESB as app services provider). Each cluster stores Partition 0 and Partition 1 in DB 0 and DB 1, with replicas DB 0b and DB 1b.

Enterprise Service Bus (routing, queuing, transformation, transactions, dispatching)


Q&A
Comments?
Anything else?
http://ciurana.eu/scalablesystems
Twitter: ciurana
