02 - Build Applications On The Cloud

The document discusses performance factors and best practices for deploying and managing applications on the cloud. It covers topics like resource bandwidth and latency, multi-tenancy, security, deployment processes, managing downtime, redundancy, load balancing, and building fault-tolerant cloud services.

PERFORMANCE FACTORS FOR APPS ON THE CLOUD

-Resource bandwidth and latency
-plan the app with strict latency requirements in mind
-find the optimal set of datacenter locations that can be used to optimize end-user
performance and responsiveness
-bandwidth needs depend on the type of requests (especially multimedia such as video
and audio)

-Multi-tenancy
-apps on public clouds typically run on shared infrastructure (shared hardware). The
performance of a resource at any given time is a function of the total load placed on it by all
tenants sharing the same hardware (interference)
-some cloud providers offer certain types of resources (such as VMs) on dedicated hardware,
delivering consistent performance (at a higher cost)

-Security settings
-public clouds are subject to increased attack vectors
-apps need to follow best practices, protocols, and procedures when deployed and
maintained on the cloud
-any code deployed on a public cloud should go through a strict process of manual and
automated source code reviews and static analysis, as well as dynamic vulnerability analysis
and penetration testing

DEPLOY APPLICATIONS ON THE CLOUD

-Deployment process
-an iterative process that starts at the end of development and continues right through to
the release of the application on the production resources
-it is typical to maintain multiple concurrently running versions of an app in order to pipeline
its deployment through various stages:
-> testing
-> staging
-> production

-Pipeline application changes

-Custom scripts: custom scripts pull the latest version of the code and run specific
commands to build the app and bring it into a production state (a small sketch appears after
this list)

-Pre-baked virtual machine images: provision and configure a VM with all the required
environment and software needed to deploy the app. Once configured, the VM can be
snapshotted and exported to a VM image. This image can be provided to various cloud
orchestration systems to be automatically deployed and configured for a production
deployment

-Continuous integration (CI) systems: automate deployment tasks (retrieving the latest
version from a repo, building app binaries, running test cases) that need to happen on the
various machines that make up the production infrastructure
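As a rough illustration of the custom-script approach, the sketch below pulls the latest code, builds it, and restarts the service; the repo path, build command, and service name are hypothetical placeholders, not part of the original notes.

```python
import subprocess

# Hypothetical repo path, build command, and service name -- adjust for your stack.
REPO_DIR = "/srv/myapp"
BUILD_CMD = ["make", "build"]
SERVICE = "myapp.service"

def run(cmd, cwd=None):
    """Run a command and fail loudly if it returns a non-zero exit code."""
    print(f"+ {' '.join(cmd)}")
    subprocess.run(cmd, cwd=cwd, check=True)

def deploy():
    run(["git", "pull", "--ff-only"], cwd=REPO_DIR)   # pull the latest version of the code
    run(BUILD_CMD, cwd=REPO_DIR)                      # build the application
    run(["systemctl", "restart", SERVICE])            # bring the new build into production

if __name__ == "__main__":
    deploy()
```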

-Manage downtime
-certain changes to the app may require partial or full termination of the app's services to
incorporate a change in the app's back end
-apps that are designed for continuous integration may be able to perform these changes
live on production systems with minimal or no interruption to the app's clients

-Redundancy and fault tolerance
-best practices in app deployment typically assume that cloud infrastructure may be
unavailable or change at any moment
-apps must refrain from hard-coding or assuming static endpoints for various components,
such as databases and storage endpoints
-well-designed apps should ideally use service APIs to query and discover resources and
connect to them in a dynamic fashion (a small discovery sketch appears after this list)
-catastrophic failures in resources / connectivity can happen at a moment's notice. Critical
apps must be designed in anticipation of such failures and for failover redundancy
-devs can configure their apps to use resources in multiple regions / zones in order to
improve availability and tolerate failures that may happen across a zone / region.
They will need to configure systems that can route and balance traffic across regions / zones
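A minimal sketch of runtime endpoint discovery, assuming some service registry or discovery endpoint exists; the URL, response shape, and service name below are hypothetical. The same pattern applies to storage endpoints: resolve them through an API call at startup (and after failures) rather than baking addresses into configuration.

```python
import json
import urllib.request

# Hypothetical discovery endpoint exposed by the platform or a service registry.
DISCOVERY_URL = "http://discovery.internal/v1/services/orders-db"

def resolve_endpoint(url=DISCOVERY_URL):
    """Ask the registry for the current host/port of a dependency instead of hard-coding it."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        record = json.load(resp)          # e.g. {"host": "10.0.3.17", "port": 5432}
    return record["host"], record["port"]

host, port = resolve_endpoint()
print(f"connecting to orders database at {host}:{port}")
```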

-Security and hardening in production

BEST PRACTICES:
-all software should be switched to production mode (most software supports a "debug
mode" for local testing and a "production mode" for actual deployments)
-access to nonpublic services should be restricted to certain internal IP addresses for admin
access. Make sure that admins cannot directly log in to a critical resource from the internet
without visiting an internal launchpad. Configure firewalls with IP address and port-based
rules to allow the minimal set of required accesses
-follow the principle of least privilege. Run all services as the least privileged user that can
perform the required role. Restrict the use of root credentials to specific manual logins by
system administrators who need to debug or configure critical problems in the system. This
also applies to access to databases and administrative panels
-use well-known defensive techniques and tools such as intrusion detection and prevention
systems (IDS/IPS), security information and event management (SIEM), application-layer
firewalls, and anti-malware systems
-set up a patching schedule that coincides with patch releases by the vendors of the systems
that you use

BUILD FAULT-TOLERANT CLOUD SERVICES

-a failure in a system occurs as a result of an invalid state introduced within the system due
to a fault, typically:
-> transient faults: temporary faults in the system that correct themselves with time (a retry
sketch appears after this list)
-> permanent faults: cannot be recovered from and generally require replacement of
resources
-> intermittent faults: occur periodically in a system
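Transient faults are usually handled by retrying the failed operation with exponential backoff; a minimal sketch, assuming the operation raises an exception on failure (the function name and parameters are illustrative, not from the original notes):

```python
import random
import time

def call_with_retries(operation, max_attempts=5, base_delay=0.5):
    """Retry an operation that may hit transient faults, backing off exponentially."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:              # in real code, catch only transient error types
            if attempt == max_attempts:
                raise                         # give up: the fault may not be transient
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.1)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.2f}s")
            time.sleep(delay)
```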

-Proactive measures
-> service providers take several measures in order to design the system in a specific way to
avoid known issues or predictable failures

-Profiling and testing: load and stress testing cloud resources in order to understand possible
causes of failure is essential to ensure the availability of services. Profiling helps in designing
a system that can successfully bear the expected load without any unpredictable behavior

-Over-provisioning: the practice of deploying resources in volumes larger than the general
projected utilization at a given time, so that unexpected spikes in load can be handled. It is
also used as a mitigation against DoS / DDoS attacks

-Replication: critical system components can be duplicated by using additional hardware and
software components to silently handle failures in parts of the system without the entire
system failing
-> Active replication: all replicated resources are alive concurrently and respond to and
process all requests
-> Passive replication: only the primary unit processes requests, and secondary units
merely maintain state and take over once the primary unit fails (disadvantage: requests may
be dropped or QoS degraded while switching from the primary to the secondary instance)
-Reactive measures

-deals with failures as and when they happen

-Checks and monitoring: recovery or reconfiguration strategies are designed in order to
restart resources or bring up new resources. Monitoring can help in the identification of faults
in the system (crash faults cause a service to be unavailable; byzantine faults induce
irregular / incorrect behavior)

-> ping-echo: the monitoring service asks each resource for its state, and the resource is
given a time window to respond
-> heartbeat: each instance sends its status to the monitoring service at regular intervals,
without any trigger (a minimal sketch appears below)
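A minimal sketch of heartbeat-style monitoring, assuming each instance reports in over some channel (an HTTP endpoint or message queue in practice); here a plain in-process dictionary stands in for that channel, and the timeout and instance names are illustrative.

```python
import time

HEARTBEAT_TIMEOUT = 15.0          # seconds without a heartbeat before an instance is suspect
last_seen = {}                    # instance id -> timestamp of last heartbeat

def record_heartbeat(instance_id):
    """Called whenever an instance reports in (e.g., via an HTTP endpoint or message queue)."""
    last_seen[instance_id] = time.time()

def find_unhealthy(now=None):
    """Return instances that have not sent a heartbeat within the timeout window."""
    now = now or time.time()
    return [i for i, ts in last_seen.items() if now - ts > HEARTBEAT_TIMEOUT]

record_heartbeat("web-1")
record_heartbeat("web-2")
time.sleep(1)
print(find_unhealthy())           # [] -- both instances reported recently
```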

-Checkpoint and restart: state is saved at several stages of execution in order to enable
recovery to a last-saved checkpoint
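A minimal checkpoint/restart sketch, assuming the application's state can be serialized with pickle; the file path and the loop it protects are hypothetical.

```python
import os
import pickle

CHECKPOINT_FILE = "/var/lib/myapp/checkpoint.pkl"   # hypothetical path

def save_checkpoint(state):
    """Persist the current state so execution can resume from here after a failure."""
    with open(CHECKPOINT_FILE, "wb") as f:
        pickle.dump(state, f)

def load_checkpoint(default):
    """Resume from the last saved checkpoint, or start fresh if none exists."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE, "rb") as f:
            return pickle.load(f)
    return default

state = load_checkpoint({"processed": 0})
for i in range(state["processed"], 1000):
    state["processed"] = i + 1          # do a unit of work, then record progress
    if state["processed"] % 100 == 0:
        save_checkpoint(state)          # checkpoint at several stages of execution
```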

-Case studies in resiliency testing: testing the ability of the system to handle catastrophic
failures.

SIMIAN ARMY:
-> Chaos monkey: randomly picks a production instance and disables it to make sure the
cloud survives common types of failure without any customer impact
-> Latency monkey: a service that induces delays in the RESTful communication between
different clients and servers, simulating service degradation and downtime
-> Doctor monkey: a service that finds instances exhibiting unhealthy behaviors (e.g., high
CPU load) and removes them from service. It allows the owner some time to figure out the
reason for the problem and eventually terminates the instance
-> Chaos gorilla: service that can simulate the loss of an entire AWS availability zone. Used
to test that the services automatically rebalance the functionality among the remaining zones
without user-visible impact or manual intervention

LOAD BALANCING

-high availability can be improved by replication
-performance can be improved through parallel processing
-load balancing is also a simple way to tackle the latency of long-distance connections

STRATEGIES:
-> Proxying: load balancer receives the response from the back end and relays it back to the
client. Load balancer behaves as a standard web proxy and is involved in both halves of a
network transaction
-> TCP handoff: the TCP connection with the client is handed off to the back-end server.
Therefore the server sends the response directly to the client, without going through the load
balancer

-Impact on availability and performance
-an important strategy to mask failures in a system
-the load balancer is a single point of failure for the service. If it fails, no client request can
be served
-to achieve high availability, load balancers are often implemented in pairs
-Strategies for load balancing
-Strategies for load balancing

-Equitable dispatching: a static approach in which a simple round-robin algorithm divides the
traffic evenly between all nodes. It does not take into consideration the utilization of any
individual resource node in the system or the execution time of any request

-Hash-based distribution: tries to ensure that, at any point, requests made by a client
through the same connection always end up on the same server, while the assignment of
connections across servers is effectively random. It has several advantages over the
round-robin approach: it helps in session-aware apps, where state persistence and caching
strategies can be much simpler, and it is less susceptible to traffic patterns that would clog a
single server, since the distribution is random (a sketch contrasting the two appears after
this list)

-Strategies based on request execution time: priority scheduling algorithm, whereby request
execution times are used in order to judge the most appropriate order of load distribution

-Strategies based on resource utilization: uses the CPU utilization on each resource node to
balance the utilization across each node
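A minimal sketch contrasting the two dispatching approaches above; the server pool and client keys are hypothetical, and a real load balancer would hash connection metadata rather than an arbitrary string.

```python
import hashlib
from itertools import cycle

SERVERS = ["srv-a", "srv-b", "srv-c"]          # hypothetical back-end pool

# Round-robin: each new request goes to the next server in turn, ignoring load and session.
rr = cycle(SERVERS)
def round_robin(_request):
    return next(rr)

# Hash-based: the same client/connection key always maps to the same server.
def hash_based(client_key):
    digest = hashlib.md5(client_key.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

print([round_robin(r) for r in ["r1", "r2", "r3", "r4"]])   # srv-a, srv-b, srv-c, srv-a
print(hash_based("client-42"), hash_based("client-42"))     # same server both times
```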

-Other benefits
-load balancers can also increase the performance of the service

-> SSL offload: the client connection to the load balancer can be made via SSL, while
requests forwarded to each individual server can be made via HTTP. This reduces the load
on the servers

-> TCP Buffering: strategy to offload clients with slow connections to the load balancer in
order to relieve servers that are serving responses to these clients

-> Caching: load balancer can maintain a cache for the most popular requests

-> Traffic shaping: load balancer can be used to delay / reprioritize the flow of packets such
that traffic can be molded to suit the server configuration.

1. Consider the following scenario. You're using Azure Load Balancer with a round-robin
scheduler as a front end to two web servers. One server is a medium instance with two
cores and 8 GB of RAM. The other server is a large instance with four cores and 16 GB of
RAM. Which of the following scenarios is likely?

Answer: Both instances will receive an equal amount of load. The large instance will have
half the utilization (in terms of percentage of CPU and memory) of the medium instance.

SCALE RESOURCES

-the ability to scale resources in a system on demand. Scaling up (provisioning larger
resources) or scaling out (provisioning extra resources) can help in reducing the load
on a single resource by decreasing utilization as a result of increased capacity or
broader distribution of the workload
-scaling can help in improving performance by increasing throughput, since a
larger number of requests can now be served
-it also helps in decreasing latency during peak loads, since fewer requests are
queued on a single resource during peak loads
-it can help in improving the reliability of the system by reducing resource utilization
so as to stay farther away from the breaking point of the resource

-Scaling strategies

-Horizontal scaling (scale out and in): additional resources can be added to the
system, or extraneous resources can be removed from the system. This type of
scaling is beneficial for the server tier when the load on the system is unpredictable
and fluctuates inconsistently (a simple threshold-based scaling sketch appears after this
list)

-Vertical scaling (scale up and down): the service can be moved to a larger instance
that can serve more requests. This is suitable for small applications that experience
a low amount of traffic.
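A minimal sketch of a threshold-based scale-out/scale-in decision, assuming a metric such as average CPU utilization is available from monitoring; the thresholds and pool limits are illustrative.

```python
def desired_instance_count(current_count, avg_cpu, scale_out_at=0.75, scale_in_at=0.30,
                           min_count=2, max_count=20):
    """Decide how many instances the pool should have based on average CPU utilization."""
    if avg_cpu > scale_out_at:
        return min(current_count + 1, max_count)   # add capacity under heavy load
    if avg_cpu < scale_in_at:
        return max(current_count - 1, min_count)   # remove extraneous resources when idle
    return current_count                            # utilization is in the healthy band

print(desired_instance_count(current_count=4, avg_cpu=0.82))  # 5 -> scale out
print(desired_instance_count(current_count=4, avg_cpu=0.20))  # 3 -> scale in
```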

-Considerations for scaling

-Monitoring: one of the most crucial elements for effectively scaling resources, as it
enables you to have metrics that can be used to interpret which parts of the system
need to scale and when they need to scale. Enables the analysis of traffic patterns or
resource utilization in order to make an educated assessment about when and how
much to scale resources in order to maximize QoS along with profit.

-Statelessness: A stateless service design lends itself to a scalable architecture. A
stateless service essentially means that the client request contains all the
information necessary to serve that request. The server does not store
any client-related information on the instance and does not store any session-related
information on the server instance. This helps in switching resources at will, without any
configuration required to maintain the context (state) of the client connection for
subsequent requests (a tiny sketch of a stateless handler appears after this list).

-Decide what to scale: Depending on the nature of the service, different resources
need to be scaled depending on the requirement. For the server tier, as the
workloads increase, depending on the type of application, it may increase the
resource contention for either CPU, memory, network bandwidth, or all of the above.
Monitoring the traffic allows us to identify which resource is getting constrained and
appropriately scale that specific resource. Increasing hardware resources may not
always be the best solution for increasing the performance of a service. Increasing
the efficiency of the algorithms used within the service can also help in reducing
resource contention and improve utilization, removing the need to scale physical
resources.
-Scale the data tier: In data-oriented applications, where there is a high number of
reads and writes (or both) to a database or storage system, the round-trip time for
each request is often limited by the hard disk's I/O read and write times. Larger
instances allow for higher I/O performance for reads and writes, which can improve
seek times on the hard disk, which in turn can result in a large improvement in the
latency of the service.
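Referring back to the statelessness consideration above, a tiny sketch of a handler that derives everything it needs from the request itself (a hypothetical session token and payload) and keeps nothing on the instance between calls:

```python
import base64
import json

def handle_request(request):
    """Serve a request using only what the client sent; no per-client state is kept on the server."""
    # Hypothetical request shape: the client carries its own session context in a token.
    session = json.loads(base64.b64decode(request["session_token"]))
    user_id = session["user_id"]
    return {"message": f"hello {user_id}", "echo": request["payload"]}

token = base64.b64encode(json.dumps({"user_id": "u-123"}).encode()).decode()
print(handle_request({"session_token": token, "payload": "ping"}))
```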

TAIL LATENCY

-tail latency refers to the high percentiles (e.g., the 99th percentile) of the latency
distribution of requests
-apps need to be designed to be tail-tolerant so that these high-percentile latencies remain
low

-Variability in the cloud: Sources and mitigation

-Use of shared resources: Many different VMs (and applications within those VMs)
contend for a shared pool of compute resources. In rare cases, it is possible that this
contention leads to high latency for some requests.

-Background daemons and maintenance: synchronize disruptions due to
maintenance threads to minimize the impact on the flow of traffic. This will cause all
variation to occur in a short, well-known window rather than randomly over the
lifetime of the application.

-Queueing: using FIFO scheduling in the OS reduces tail latency at the cost of
lowering overall throughput of the system.

-All-to-all cast: use custom network drivers to dynamically adjust the TCP receiving
window and the retransmission timer. Routers may also be configured to drop traffic
that exceeds a specific rate and to reduce the size of the sending window.

-Power and temperature management: variability is a byproduct of other cost
reduction techniques like using idle states or CPU frequency downscaling. A
processor may often spend a non-trivial amount of time scaling up from an idle state.
Turning off such cost optimizations leads to higher energy usage and costs, but
lower variability. This is less of a problem in the public cloud, as pricing models rarely
consider internal utilization metrics of the customer's resources.

-Engineering solutions to variability

-"Good enough" results: simply respond to the users with results that arrive within a
particular, short latency window and discard the rest.
-Canaries: test a request on a small subset of leaf nodes in order to test if it causes a
crash or failure that can impact the entire system. The full fan-out query is generated
only if the canary does not cause a failure.

-Latency-induced probation and health checks: periodically monitor the health and
latency of each leaf node and not route requests to nodes that demonstrate low
performance (due to maintenance or failures).

-Differential QoS: Separate service classes can be created for interactive requests,
allowing them to take priority in any queue. Latency-insensitive applications can
tolerate longer waiting times for their operations.

-Request hedging: reduce the impact of variability by forwarding the same request to
multiple replicas and using the response that arrives first. Of course, this can double
or triple the amount of resources required. To reduce the number of hedged
requests, the second request may be sent only if the first response has been
pending for longer than the 95th-percentile expected latency for that request (a
minimal hedging sketch appears after this list).

-Speculative execution and selective replication: Tasks on nodes that are particularly
busy can be speculatively launched on other underutilized leaf nodes. This is
especially effective if a failure in a particular node causes it to be overloaded.

-UX-based solutions: the delay can be intelligently hidden from the user through a
well-designed user interface that reduces the sensation of delay experienced by a
human user. Techniques to do this may include the use of animations, showing early
results, or engaging the user by sending relevant messages.
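Referring back to request hedging above, a minimal sketch that forwards the same request to several replicas and takes whichever response arrives first; `fetch` and the replica names are hypothetical, and for simplicity this version hedges immediately rather than waiting for the 95th-percentile delay mentioned above.

```python
import concurrent.futures
import random
import time

def hedged_request(fetch, replicas):
    """Send the same request to several replicas and return whichever response arrives first."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(fetch, r) for r in replicas]
    done, pending = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED)
    for f in pending:
        f.cancel()                      # best effort: responses that arrive later are discarded
    pool.shutdown(wait=False)           # don't block on the slower replicas
    return next(iter(done)).result()

# Hypothetical replica call: in practice this would be an RPC or HTTP request.
def fetch(replica):
    time.sleep(random.uniform(0.05, 0.5))   # simulate variable back-end latency
    return f"response from {replica}"

print(hedged_request(fetch, ["replica-1", "replica-2"]))
```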

ECONOMICS FOR CLOUD APPS

-Pricing models

-Time-based: Resources are charged based on the amount of time they are
provisioned to the user.

-Capacity-based: Users are charged based on the amount of a particular resource
that is utilized or consumed.

-Performance-based: In many cloud providers, users can select a higher
performance level for resources by paying a higher rate. For virtual machines, larger,
more powerful machines with more CPU, memory, and disk capacity can be
provisioned at a higher hourly rate.
-On-demand and pay-as-you-go pricing: This is generally the most expensive pricing
model for long-term usage. Payments are made for a very short period of usage
(generally metered in minutes or hours). The advantage is that there is no need for a
long-term contract, making it very flexible to scale in and out based on the current
need.

-Reserved instances and subscription-based pricing: Instead of paying an hourly or
per-minute rate, a user can choose to pre-pay and reserve a resource for a fairly
long period of time (weeks or months).

-Spot pricing: a way for CSPs to deal with excess unutilized capacity by offering it for
sale at significantly lower prices than on-demand resources. The prices are
determined by a user auction, where users bid the maximum amount that they are
willing to pay for a resource (a rough cost comparison appears after this list).
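As a rough worked example of how the models compare (the rates below are made up, not any provider's actual prices), the snippet estimates the monthly cost of one VM under on-demand, reserved, and spot pricing:

```python
HOURS_PER_MONTH = 730

# Hypothetical hourly rates for the same VM size -- not actual provider prices.
on_demand_rate = 0.10           # $/hour, no commitment
reserved_monthly = 50.00        # flat monthly fee for a reserved instance
spot_rate = 0.03                # $/hour, but the instance can be reclaimed at any time

on_demand_cost = on_demand_rate * HOURS_PER_MONTH
spot_cost = spot_rate * HOURS_PER_MONTH

print(f"on-demand: ${on_demand_cost:.2f}/month")   # $73.00
print(f"reserved:  ${reserved_monthly:.2f}/month") # $50.00 -- cheaper if used continuously
print(f"spot:      ${spot_cost:.2f}/month")        # $21.90 -- cheapest, but may be interrupted
```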

-Optimize the cost utilization

Before considering cost requirements, an organization must plan the amount of work
that it is capable of completing in a given period based on fixed resources like the
amount of staff, while dealing with physical constraints like inventory management,
overhead due to transportation, material handling, etc. The provisioning of IT
resources must be designed to meet or exceed the physical capacity of the
organization. This is extremely important, because the elasticity provided by the
cloud tempts development teams to simply add resources as needed, without
considering the cost implications of their decisions.

It is important to build a monitoring and visualization system to track the various
resources being used. The monitoring system must be designed to trigger scaling
events in response to observed patterns of overload or idleness.
For instance, any jobs that are run for a short time on a nightly or weekly basis
should not use resources 24/7. Idle resources should also be flagged and terminated
(based on certain rules) by the monitoring system (a small sketch of such a rule follows).
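A minimal sketch of an idle-resource rule, assuming utilization samples are already collected by the monitoring system; the record shape, threshold, and idle window are illustrative.

```python
def flag_idle(resources, cpu_threshold=0.05, min_idle_hours=72):
    """Return resource ids whose average CPU has stayed below the threshold for long enough."""
    idle = []
    for res in resources:
        # Hypothetical record shape: {"id": ..., "avg_cpu": ..., "idle_hours": ...}
        if res["avg_cpu"] < cpu_threshold and res["idle_hours"] >= min_idle_hours:
            idle.append(res["id"])
    return idle

fleet = [
    {"id": "vm-1", "avg_cpu": 0.02, "idle_hours": 96},   # flagged: idle for 4 days
    {"id": "vm-2", "avg_cpu": 0.40, "idle_hours": 0},    # busy, keep
]
print(flag_idle(fleet))   # ['vm-1']
```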

Summary

● Cloud applications must take precautions to ensure that they use resources that
help them meet their bandwidth and latency requirements, as well as follow
security best practices.
● Applications deployed on the cloud are often subject to performance variance
due to the shared nature of the cloud.
● The cloud makes it easy to maintain several different environments apart from
production. Application pipelines are maintained using code repository and
version control systems, and they're automated using continuous integration
tools.
● Planning for failure is crucial. Redundancy is the key technique used to ensure
resilience, often using replicas deployed across availability zones and regions.
● Redundant resources are generally monitored and accessed using a central,
highly available load balancer. High availability is ensured by switching over to a
backup instance when one fails.
● Companies like Netflix and Facebook inject large random (or planned) failures in
their datacenters and cloud operations to test for fault tolerance.
● Load balancing also supports horizontal scaling, whereby more identical
resources can be thrown at a problem. The other type of scaling is vertical,
where the size or capacity of existing resources is increased.
● Horizontal scaling across too many nodes leads to the problem of tail latency,
where the performance of the application is determined by its slowest
component. This is because of variability of performance on the cloud, and also
because applications with a large fan-out trigger bursts of activity at each stage.
● Finally, the lack of standardization and high competitiveness of the cloud market
lead to interesting opportunities and challenges to minimize costs.
