Felix Gessert, Florian Bücklers
{fg,fb}@baqend.com
Building a Global-Scale Multi-Tenant Cloud Platform on AWS and Docker: Lessons Learned
@baqendcom
Part One: Baqend & Our Infrastructure
Part Two: Docker Concepts
Part Three: Clustering: AWS ECS vs. Docker Swarm
Presentation
is loading
The Latency Problem
Average: 9.3 s
100 ms: -1% Revenue
400 ms: -9% Visitors
500 ms: -20% Traffic
If perceived speed is such an important factor...
what causes slow page load times?
State of the Art
Two bottlenecks: latency and processing
High Latency
Processing Time
Problem: Network Latency
2× Bandwidth = Same Load Time
½ Latency ≈ ½ Load Time
I. Grigorik, High Performance Browser Networking. O'Reilly Media, 2013.
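The rule of thumb above can be illustrated with a deliberately simplified model in which load time is the sum of sequential round trips and raw transfer time. This is only a sketch: the function name and all numbers (30 round trips, 1 MB page, 100 ms RTT, 50 Mbit/s) are hypothetical, and real page loads also involve parallel connections, TCP slow start and rendering.

```python
def load_time_s(round_trips: int, rtt_ms: float, page_bytes: int, bandwidth_mbps: float) -> float:
    """Simplified model: sequential round trips plus raw transfer time, in seconds."""
    latency_part = round_trips * rtt_ms / 1000.0
    transfer_part = (page_bytes * 8) / (bandwidth_mbps * 1_000_000)
    return latency_part + transfer_part


# Hypothetical page: 30 sequential round trips, 1 MB payload.
base = load_time_s(30, 100, 1_000_000, 50)
double_bandwidth = load_time_s(30, 100, 1_000_000, 100)
half_latency = load_time_s(30, 50, 1_000_000, 50)

print(f"baseline:         {base:.2f} s")
print(f"2x bandwidth:     {double_bandwidth:.2f} s")
print(f"1/2 latency:      {half_latency:.2f} s")
```

In this model, doubling the bandwidth barely changes the result because the round trips dominate, while halving the latency cuts the load time almost in half.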
Low-Latency
Data is served by ubiquitous web-caches
Low Latency
Less Processing
Scaling
Scalable and highly available
Innovation
Problem: changes cause stale data
5 Years Research & Development
New Algorithms Solve Consistency Problem
Innovation
Solution: Baqend proactively revalidates data
5 Years Research & Development
New Algorithms Solve Consistency Problem
[Figure: an update is recorded in a Bloom filter (1 0 1 1 0 0 1 0 1 1), which is then queried with "Is still fresh?"]
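The Bloom filter idea on this slide can be sketched as follows. This is a minimal illustration, not Baqend's implementation: the class, the key format `/db/Todo/42`, the filter size and the hash construction are all assumptions. The sketch records keys of updated objects in the filter; a cached copy is treated as fresh only if the filter definitely does not contain its key.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; sizes and hashing scheme are illustrative assumptions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str):
        # Derive several bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely not added"; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))


def is_still_fresh(stale_keys: BloomFilter, key: str) -> bool:
    """A cached copy counts as fresh only if its key was definitely not updated."""
    return not stale_keys.might_contain(key)


stale = BloomFilter()
stale.add("/db/Todo/42")  # hypothetical key of an object that was just updated
print(is_still_fresh(stale, "/db/Todo/42"))  # the updated object must be revalidated
```

A false positive in such a filter only causes an unnecessary revalidation; it never lets a stale copy pass as fresh.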
F. Gessert, F. Bücklers, and N. Ritter, "ORESTES: a Scalable Database-as-a-Service Architecture for Low Latency", in CloudDB 2014, 2014.
F. Gessert and F. Bücklers, "ORESTES: ein System für horizontal skalierbaren Zugriff auf Cloud-Datenbanken", in Informatiktage 2013, 2013.
F. Gessert, S. Friedrich, W. Wingerath, M. Schaarschmidt, and N. Ritter, "Towards a Scalable and Unified REST API for Cloud Data Stores", in 44. Jahrestagung der GI, vol. 232, pp. 723–734.
F. Gessert, M. Schaarschmidt, W. Wingerath, S. Friedrich, and N. Ritter, "The Cache Sketch: Revisiting Expiration-based Caching in the Age of Cloud Data Management", in BTW 2015.
F. Gessert and F. Bücklers, Performanz- und Reaktivitätssteigerung von OODBMS vermittels der Web-Caching-Hierarchie. Bachelor's thesis, 2010.
F. Gessert and F. Bücklers, Kohärentes Web-Caching von Datenbankobjekten im Cloud Computing. Master's thesis, 2012.
W. Wingerath, S. Friedrich, and F. Gessert, "Who Watches the Watchmen? On the Lack of Validation in NoSQL Benchmarking", in BTW 2015.
M. Schaarschmidt, F. Gessert, and N. Ritter, "Towards Automated Polyglot Persistence", in BTW 2015.
S. Friedrich, W. Wingerath, F. Gessert, and N. Ritter, "NoSQL OLTP Benchmarking: A Survey", in 44. Jahrestagung der Gesellschaft für Informatik, 2014, vol. 232, pp. 693–704.
F. Gessert, "Skalierbare NoSQL- und Cloud-Datenbanken in Forschung und Praxis", BTW 2015.
Page-Load Times
What impact does caching have in practice?
[Chart: five page-load times per test location, in seconds; the series labels are not preserved in the text]
California: 0.7, 1.8, 2.8, 3.6, 3.4
Frankfurt: 0.5, 1.8, 2.9, 1.5, 1.3
Sydney: 0.6, 3.0, 7.2, 5.0, 5.7
Tokyo: 0.5, 2.4, 4.0, 5.7, 4.7
How is this used from a developer's perspective?
Backend-as-a-Service
DB.Tankstellen.find()
  .near("location", myLoc, 5000)
  .lessThan("closing", time)
  .greaterThan("opening", time)
  .descending("price")
  .resultList();
Baqend Architecture
Our Infrastructure
• Content Delivery Network
• Polyglot Storage
• Database-as-a-Service Middleware: Caching, Transactions, Schemas, Invalidation Detection, …
• Standard HTTP Caching
• Unified REST API
• IaaS cloud on [provider logo not preserved]
• CDN on [provider logo not preserved]
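[Diagram: deployment inside a Virtual Private Cloud]
Hosts: N app servers, 1 management server, 1 logging server, each running a Docker daemon (Docker cluster)
Containers and services: Orestes (n), Node (n), mongos (1), ZooKeeper (1), BBQ Manager (1), Swarm Manager (1), syslog (1)
Connections: Docker link between containers; users connect via HTTP/HTTPS; HTTPS on a random port; SSL; separate metadata and data paths; one path marked "Forbidden"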
AWS Services
Services we use
• Route 53, EC2, ASGs, IAM, etc.
• Elastic Load Balancer: TCP balancing for logging
  ◦ Not suited for multi-tenant SSL termination: ELB cannot dynamically route to an IP:port pair
• Redis ElastiCache: metadata storage
  ◦ Easy to use but very limited: no Redis Cluster support, no append-only files, bad snapshotting
• What we don't use:
  ◦ Beanstalk: supports Docker but needs a dedicated EC2 instance
  ◦ CloudFront: useless invalidations, expensive
  ◦ DynamoDB: difficult to scale, very limited queries
Containerization
Why we need containers & cluster management
• Every tenant needs a private JVM (Baqend Server) and Node.js process (customer's business logic)
• Provisioning new instances needs to be fast & easy:
  Launch App → BBQ Manager → Start; configure databases, CDN, etc.
Problem: Many Technology Choices
Emerging Frameworks and Tools
• Cluster Managers & Orchestration Tools: Google Kubernetes, Apache Mesos, Docker Swarm
• Container Cloud Platforms: Amazon Elastic Container Service, Rancher, Tutum, Google Container Engine
• and many more: Azure Container Service (Microsoft), Nomad (HashiCorp), Diego (Cloud Foundry), Fleet (CoreOS), ContainerShip, YARN (Hadoop), …
Live Demo: Launching a container
Docker Concepts
What is Docker?
• Docker typically isolates a single application
• An application is built into a Docker image (including the OS)
• The Docker image can be hosted and transferred to different hosts (Docker Registry)
• The Docker image can be executed as a new container on any machine that runs a Docker daemon
• Updates are handled by just stopping the old container and starting a new one
Source: https://docs.docker.com/engine/introduction/understanding-docker/
Docker Architecture
How to set up a Docker host
• Docker runs on all common Linux distributions
• Docker can be installed from Docker's own package repository
• The Docker daemon can be configured by editing /etc/default/docker
• The Docker daemon allows many useful configurations:
  ◦ Inter-container communication
  ◦ Docker remote REST API
  ◦ Labeling
  ◦ DNS configuration
  ◦ IP forwarding (turning it off disables internet access for containers)
  ◦ SSL encryption for the Docker daemon
The Dockerfile
How to build a Docker image
FROM ubuntu:latest
ENV DEBIAN_FRONTEND=noninteractive
# Java: refresh the package index, add the PPA, accept the Oracle license non-interactively
RUN apt-get update && \
    apt-get install -y software-properties-common && \
    add-apt-repository -y ppa:webupd8team/java && \
    apt-get update && \
    echo debconf shared/accepted-oracle-license-v1-1 select true \
      | debconf-set-selections && \
    apt-get install -y oracle-java8-installer
# extract and install packages (ADD unpacks local .tgz archives)
ADD baqend-package*.tgz /opt
ADD config.json /opt/baqend/
EXPOSE 8080
WORKDIR /opt/baqend/
# ENTRYPOINT is the fixed command; CMD holds default arguments that "docker run" arguments replace
ENTRYPOINT ["java", "-classpath", "/opt/baqend/lib/*", "info.orestes.Launcher"]
CMD ["--config", "config.json"]
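How the exec-form ENTRYPOINT and CMD of this Dockerfile combine can be sketched in a few lines. The helper `container_argv` and the file name `other.json` are hypothetical; the sketch only models the documented rule that arguments given to `docker run IMAGE ...` replace CMD while ENTRYPOINT stays in place.

```python
from typing import List, Optional

# Values copied from the Dockerfile on this slide.
ENTRYPOINT = ["java", "-classpath", "/opt/baqend/lib/*", "info.orestes.Launcher"]
CMD = ["--config", "config.json"]


def container_argv(run_args: Optional[List[str]] = None) -> List[str]:
    """Return the process argv: ENTRYPOINT plus either the run arguments or the default CMD."""
    return ENTRYPOINT + (run_args if run_args else CMD)


# docker run IMAGE                      -> default configuration
print(container_argv())
# docker run IMAGE --config other.json  -> CMD is replaced, ENTRYPOINT is kept
print(container_argv(["--config", "other.json"]))
```

This split is what allows one image to be started with a different configuration file per container without rebuilding it.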
How a Docker container works
Isolation, performance, light-weight
• Filesystem: by using multiple read-only file systems and mounting a read-write file system on top
• Data volumes: mount additional physical volumes into the container
• CPU: by CPU shares and core limitation
• Memory: by defining memory constraints
• Network: by using virtual networks
• System privileges: such as port binding, execution rights, inter-process communication, etc.
• Logging: by using Docker logging capabilities or external loggers (json, syslog, aws, etc.)
Docker Options
Imposing constraints on containers
• Most constraints are set when the container is started
--add-host=[]              Add a custom host-to-IP mapping (host:ip)
--cpu-shares=0             CPU shares (relative weight)
--cpu-quota=0              Limit CPU CFS (Completely Fair Scheduler) quota
-e, --env=[]               Set environment variables
-l, --label=[]             Set metadata on the container (e.g., --label=key=value)
--link=[]                  Add link to another container
-m, --memory=""            Memory limit
--memory-swap=""           Total memory (memory + swap), '-1' to disable swap
--name=""                  Assign a name to the container
--net="bridge"             Connects a container to a network
                             'bridge': creates a new network stack on the docker bridge
                             'none': no networking for this container
                             'container:<name|id>': reuses another container network stack
                             'host': use the host network stack inside the container
                             'NETWORK': connects the container to user-created network
--oom-kill-disable=false   Whether to disable OOM Killer for the container or not
-p, --publish=[]           Publish a container's port(s) to the host
--read-only=false          Mount the container's root filesystem as read only
--restart="no"             Restart policy (no, on-failure[:max-retry], always)
-v, --volume=[]            Bind mount a volume
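A provisioning tool could assemble these options programmatically. The following sketch only builds the argument list and does not start anything; the function name, the tenant label and the values `512m` and `256` are hypothetical, while the flags themselves are taken from the option list above.

```python
from typing import Dict, List, Optional


def docker_run_argv(image: str, name: str,
                    memory: Optional[str] = None,
                    cpu_shares: Optional[int] = None,
                    labels: Optional[Dict[str, str]] = None,
                    restart: str = "no",
                    read_only: bool = False) -> List[str]:
    """Build a `docker run` argument list from a few per-container constraints."""
    argv = ["docker", "run", "--name", name, "--restart", restart]
    if memory is not None:
        argv += ["--memory", memory]
    if cpu_shares is not None:
        argv += ["--cpu-shares", str(cpu_shares)]
    for key, value in sorted((labels or {}).items()):
        argv += ["--label", f"{key}={value}"]
    if read_only:
        argv.append("--read-only")
    argv.append(image)  # the image name always comes after the options
    return argv


argv = docker_run_argv("orestes", "orestes-tenant-a", memory="512m", cpu_shares=256,
                       labels={"tenant": "a"}, restart="always")
print(" ".join(argv))
```

The resulting list could be handed to `subprocess.run` on a host with Docker installed; that step is left out here so the sketch runs anywhere.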
Docker Networking
Making containers talk to each other
• Docker containers can talk to each other by default
• Communication between containers can be restricted by the daemon option --icc=false
• Docker containers can discover other linked containers by their names
Case 1: port exposed but not published
  EXPOSE 8080
  docker run --name="orestes" orestes
  docker run --link="orestes" node
  The linked node container can access orestes:8080.
  Port 8080 is not published, so it is not reachable through a host port (and, with --icc=false, not from unlinked containers).
Case 2: port published
  EXPOSE 8080
  docker run --name="orestes" -p 0.0.0.0:80:8080 orestes
  docker run --link="orestes" node
  The linked node container can access orestes:8080.
  Port 8080 is published and can be accessed on host port 80.
Elastic Container Service
How Amazon ECS works
• AWS provides ECS-optimized AMIs for simple deployment
• ECS manages EC2 instances by running an ECS Agent on each instance
• ECS can automatically deploy and scale new Docker containers specified by a Task definition across the ECS Cluster
[Diagram: ECS Cluster of three instances, each running a Docker daemon and an ECS Agent]
ECS: Tasks and Services
Defining groups of containers
• ECS groups containers into Tasks and deploys them together
• A Task definition describes:
  ◦ The Docker images
  ◦ Resource requirements
  ◦ Environment variables
  ◦ Network links
  ◦ Data volumes
• ECS Services can be used to keep a specified number of Tasks running
• ECS can autoscale a Service when it is used with an ELB
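The parts of a Task definition listed above map onto a JSON document. The sketch below builds one as a Python dictionary and prints it; the family name, image names, resource numbers, environment variable and paths are hypothetical placeholders, and only a small subset of the available fields is shown.

```python
import json

task_definition = {
    "family": "baqend-tenant-example",  # hypothetical task family name
    "containerDefinitions": [
        {
            "name": "orestes",
            "image": "orestes",                      # Docker image
            "cpu": 256,                              # resource requirements
            "memory": 512,                           # hard memory limit in MiB
            "essential": True,
            "environment": [{"name": "TENANT", "value": "example"}],
            "portMappings": [{"containerPort": 8080}],
            "mountPoints": [{"sourceVolume": "data", "containerPath": "/data"}],
        },
        {
            "name": "node",
            "image": "node",
            "cpu": 128,
            "memory": 256,
            "essential": True,
            "links": ["orestes"],                    # network link to the first container
        },
    ],
    "volumes": [{"name": "data", "host": {"sourcePath": "/ecs/data"}}],
}

print(json.dumps(task_definition, indent=2))
```

Both containers of such a Task are placed on the same instance, which is what makes the link between them possible.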
Limitations that AWS fixed
Old Docker, Parameterization
• ECS used to ship an outdated Docker version; it now runs Docker 1.9
• Tasks can now be parametrized using command-line arguments
• Previously only environment variables could be passed while starting a Task
• Environment variables are exposed to linked containers, which can be a security issue!
  Secured process:   docker run --name="orestes" --env SECRET=7kekfjd9e orestes
  Untrusted process: docker run --link="orestes" node
  The node container can read the secret as ORESTES_ENV_SECRET
Source: https://docs.docker.com/engine/userguide/networking/default_network/dockerlinks/#environment-variables
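Why a secret leaks through a link becomes clearer when the naming scheme of legacy links is written out. The sketch below reproduces a subset of the variables that the linked Docker documentation describes: connection details under `<ALIAS>_PORT_...` and every environment variable of the source container under `<ALIAS>_ENV_<name>`. The function name and the IP address are hypothetical.

```python
from typing import Dict


def link_environment(alias: str, ip: str, exposed_port: int,
                     source_env: Dict[str, str], proto: str = "tcp") -> Dict[str, str]:
    """Return (a subset of) the variables a recipient sees for --link <source>:<alias>."""
    prefix = alias.upper()
    port_prefix = f"{prefix}_PORT_{exposed_port}_{proto.upper()}"
    url = f"{proto}://{ip}:{exposed_port}"
    env = {
        f"{prefix}_PORT": url,
        port_prefix: url,
        f"{port_prefix}_ADDR": ip,
        f"{port_prefix}_PORT": str(exposed_port),
        f"{port_prefix}_PROTO": proto,
    }
    # Every variable set on the source container is forwarded, secrets included.
    for name, value in source_env.items():
        env[f"{prefix}_ENV_{name}"] = value
    return env


env = link_environment("orestes", "172.17.0.2", 8080, {"SECRET": "7kekfjd9e"})
for key in sorted(env):
    print(f"{key}={env[key]}")
```

Any process in the recipient container can read these variables, so secrets passed via `--env` should not be given to containers that untrusted containers link to.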
Current Limitation: Memory Constraints
Restricting RAM consumption
• ECS uses hard memory constraints (docker run -m) for Tasks to schedule container placement
• This reserves a fixed amount of memory on the EC2 instance, which the process cannot exceed
• This is very ugly for shared, multi-tenant applications:
  ◦ Setting the constraint too low causes Docker to kill the process on memory peaks
  ◦ Setting the value too high limits the number of containers that can be launched per EC2 instance
• Neither Docker's memory swapping nor unlimited memory usage is allowed by ECS
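The trade-off can be made concrete with a little arithmetic. In this sketch the instance size (8192 MiB), the memory reserved for the host (512 MiB) and the candidate limits are hypothetical; it only shows how a hard per-container limit caps the number of tenants per instance.

```python
def containers_per_instance(instance_mb: int, reserved_mb: int, limit_mb: int) -> int:
    """Number of containers that fit when each one reserves a hard memory limit."""
    return (instance_mb - reserved_mb) // limit_mb


INSTANCE_MB = 8192   # hypothetical EC2 instance memory
RESERVED_MB = 512    # hypothetical share kept for the OS, Docker daemon and agent

for limit_mb in (256, 512, 1024, 2048):
    count = containers_per_instance(INSTANCE_MB, RESERVED_MB, limit_mb)
    print(f"limit {limit_mb:>4} MiB -> {count:>2} containers per instance")
```

Doubling the limit to survive rare memory peaks halves the tenant density, even if most tenants never come close to the limit.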
Current Limitation: Networking
Docker's new network API not supported
• Docker has introduced a new network API, which allows creating custom virtual networks
• Bridge networks connect groups of containers together and isolate them from other groups on the same host
• Overlay networks use a key-value store to connect containers across different host machines
Source: https://docs.docker.com/engine/userguide/networking/dockernetworks/
Wrap-up: Amazon ECS
Pros:
• Very simple setup, thanks to the optimized ECS AMI
• The Task abstraction makes it really comfortable to start multiple containers together
• Services ensure that the desired number of Tasks is always up and running
• Automatically starts new EC2 instances if no capacity is left for new containers
• Can be combined with an ELB for a high-availability setup
Cons:
• Many Docker options aren't available
• Service Tasks can't be parametrized
• Running the same Services for different tenants on the same EC2 instance is not possible
• Only the legacy networking is supported
• New features will always be delayed since they must first be implemented in ECS
Docker Swarm
A replacement for ECS
• Docker Swarm is Docker's native solution for cluster management
• Docker Swarm uses a discovery service to manage the shared state of the cluster
• The following backends for discovery are supported:
  ◦ Docker Hub (for development only)
  ◦ Static file
  ◦ etcd
  ◦ Consul
  ◦ ZooKeeper
  ◦ IP list or a range pattern of IPs
Swarm Architecture
Cluster management with Docker Swarm
[Diagram: a Docker client talks to the Swarm Manager, which uses ZooKeeper and manages a Docker Swarm cluster of three hosts; each host runs a Docker daemon exposed on port 2375 and a Swarm agent]
Swarm is Docker
Fixing the shortcomings of ECS
• The Swarm manager acts as a proxy of the Docker Remote API
  ◦ All Docker run options are available in Swarm, too
• Docker Swarm can be combined with overlay networks
  ◦ Containers can connect to others by just using the container's name (service discovery)
  ◦ Works across Docker hosts, availability zones and external hosts
• Containers can use any other service without defining them in a group (such as a Task)
Autoscaling in Swarm
Scale-out and scale-in
• Docker hosts can be added to and removed from the Swarm cluster silently
• Swarm provides an API to gather CPU usage and memory consumption of hosts or containers
• Swarm has no built-in concept for scaling the services that run in containers
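Because the scaling decision itself is left to the operator, it has to be implemented on top of the gathered metrics. The following is a minimal threshold-based sketch of such a decision; the function name, the thresholds (75% and 25% average CPU) and the one-host-at-a-time policy are assumptions for illustration, not something Swarm provides.

```python
from typing import List


def desired_host_count(current_hosts: int, cpu_utilization: List[float],
                       scale_out_at: float = 0.75, scale_in_at: float = 0.25,
                       min_hosts: int = 1) -> int:
    """Decide the next cluster size from per-host CPU utilization (values in 0.0-1.0)."""
    if not cpu_utilization:
        return current_hosts  # no metrics gathered, keep the cluster as it is
    average = sum(cpu_utilization) / len(cpu_utilization)
    if average > scale_out_at:
        return current_hosts + 1  # scale out by one host
    if average < scale_in_at and current_hosts > min_hosts:
        return current_hosts - 1  # scale in by one host
    return current_hosts


print(desired_host_count(3, [0.90, 0.80, 0.85]))  # busy cluster
print(desired_host_count(3, [0.10, 0.20, 0.10]))  # idle cluster
```

In practice the utilization values would come from the metrics API mentioned above, and the returned size would drive whatever mechanism adds or removes hosts.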
High Availability in Swarm
Handling failures and outages
• Labeled Docker daemons can be used by the manager to run specific containers only on specific hosts
• Containers can be launched:
  ◦ On the same host where other containers are running
  ◦ In a specific availability zone
  ◦ On hosts with special capabilities (RAM, CPU or SSD)
• The Docker daemon can restart failed containers using a restart policy such as --restart="always"
• Containers will also be restarted if the Docker host restarts
• Failed machines must be handled manually
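Label-based placement boils down to filtering hosts by the labels their daemons were started with. The sketch below models that filter; the host names, label keys and values are hypothetical, and the function is an illustration of the idea rather than Swarm's scheduler. Classic Swarm expresses such requirements as constraint expressions like `constraint:storage==ssd`.

```python
from typing import Dict, List


def eligible_hosts(hosts: Dict[str, Dict[str, str]],
                   constraints: Dict[str, str]) -> List[str]:
    """Return the hosts whose daemon labels satisfy every requested constraint."""
    return sorted(
        name for name, labels in hosts.items()
        if all(labels.get(key) == value for key, value in constraints.items())
    )


# Hypothetical cluster: daemon labels per host.
hosts = {
    "host-a": {"zone": "eu-west-1a", "storage": "ssd"},
    "host-b": {"zone": "eu-west-1b", "storage": "hdd"},
    "host-c": {"zone": "eu-west-1a", "storage": "hdd"},
}

print(eligible_hosts(hosts, {"storage": "ssd"}))
print(eligible_hosts(hosts, {"zone": "eu-west-1a"}))
```

Spreading replicas of a service over hosts with different zone labels is one way to survive the outage of a single availability zone.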
Securing Swarm Hosts
Security pitfalls
• Swarm requires that the Docker daemon is exposed via TCP
• In most setups this is a security issue, since anyone who can reach the API can easily gain root permissions on the Docker host
• Containers can also access the exposed API by default
• Therefore it is recommended to always secure the Docker daemons on each host with SSL
• Docker supports SSL server authentication, client authentication, or both
• SSL server authentication is not very practical since it requires a signed certificate for each host
• Securing a Swarm cluster requires signed SSL certificates on all Docker hosts, on the Swarm manager and on the Docker client
[Diagram: a certificate authority issues server certificates to the Docker daemons (exposed on port 2375), a server/client certificate to the Swarm Manager, and a client certificate to the Docker client]
Wrap-up: Docker Swarm
Pros and Cons
Pros:
• Swarm is Docker: all Docker options are available
• Labeling Docker hosts allows deploying containers on specific hosts
• Overlay networks allow containers to communicate across hosts
• Service discovery across containers is made really simple
Cons:
• Complex setup: many components are required for a complete setup
• No built-in way to autoscale services
• Still many bugs
• The Docker Swarm API integration into Docker is not yet completed
Conclusions
ECS vs. Swarm
Amazon ECS:
• Simple setup
• Task and Service definitions make it easy to deploy and update containers
• Detects failures and restarts failed Tasks within Services
• Integrated into other AWS services such as Elastic Load Balancers and Auto Scaling Groups
Docker Swarm:
• Complex setup
• Many configuration options for deploying containers
• Compatible with the Docker API, so all Docker clients can be used
• Supports Docker's network API
• No vendor lock-in
Goal with InnoRampUp
Want to try Baqend?
• Download Community Edition
• Invited-Beta Cloud Instance: support@baqend.com
• Baqend Cloud launching this February

Building a Global-Scale Multi-Tenant Cloud Platform on AWS and Docker: Lessons Learned

  • 1. Felix Gessert, Florian Bücklers {fg,fb}@baqend.com Building a Global-Scale Multi- Tenant Cloud Platform on AWS and Docker: Lessons Learned @baqendcom
  • 2. Docker Concepts Clustering: AWS ECS vs. Docker Swarm Part One Part Two Part Three Baqend & Our Infrastructure
  • 4. Average: 9,3s The Latency Problem Loading…
  • 5. Average: 9,3s The Latency Problem Loading… -1% Revenue 100 ms
  • 6. Average: 9,3s The Latency Problem Loading… -1% Revenue -9% Visitors 400 ms
  • 7. Average: 9,3s The Latency Problem Loading… -1% Revenue -9% Visitors 500 ms -20% Traffic
  • 8. If perceived speed is such an import factor ...what causes slow page load times?
  • 9. State of the Art Twobottlenecks: latencyund processing
  • 10. State of the Art Twobottlenecks: latencyund processing Processing Time
  • 11. State of the Art Twobottlenecks: latencyund processing High Latency Processing Time
  • 12. Problem: Network Latency I. Grigorik, High performance browser networking. O’Reilly Media, 2013.
  • 13. Problem: Netzwerklatenz I. Grigorik, High performance browser networking. O’Reilly Media, 2013. 2× Bandwidth = Same Load Time ½ Latency ≈ ½ Load Time
  • 14. Low-Latency Data is served by ubiquitous web-caches
  • 15. Low-Latency Data is served by ubiquitous web-caches Low Latency
  • 16. Low-Latency Data is served by ubiquitous web-caches Low Latency Less Processing
  • 18. 5 Years Research & Development New Algorithms Solve Consistency Problem Innovation Problem: changescause stale data
  • 19. 5 Years Research & Development New Algorithms Solve Consistency Problem Innovation Problem: changescause stale data
  • 20. 5 Years Research & Development New Algorithms Solve Consistency Problem Innovation Problem: changescause stale data
  • 21. 5 Years Research & Development New Algorithms Solve Consistency Problem Stale Data Innovation Problem: changescause stale data
  • 22. Innovation Solution: Baqendproactivelyrevalidates data Bloomfilter update1 0 11 0 0 10 1 1 5 Years Research & Development New Algorithms Solve Consistency Problem
  • 23. Innovation Solution: Baqendproactivelyrevalidates data Bloomfilter updateIs still fresh? 1 0 11 0 0 10 1 1 5 Years Research & Development New Algorithms Solve Consistency Problem
  • 24. Innovation Solution: Baqendproactivelyrevalidates data F. Gessert, F. Bücklers, und N. Ritter, „ORESTES:a Scalable Database-as-a-Service Architecture for Low Latency“, in CloudDB 2014, 2014. F. Gessert und F. Bücklers, „ORESTES: ein System für horizontal skalierbaren Zugriff auf Cloud-Datenbanken“, in Informatiktage 2013, 2013. F. Gessert, S. Friedrich, W. Wingerath, M. Schaarschmidt, und N. Ritter, „Towards a Scalable and Unified REST API for Cloud Data Stores“, in 44. Jahrestagung der GI, Bd. 232, S. 723–734. F. Gessert, M. Schaarschmidt, W. Wingerath, S. Friedrich, und N. Ritter, „The Cache Sketch:Revisiting Expiration-based Caching in the Age of Cloud Data Management“, in BTW 2015. F. Gessert und F. Bücklers, Performanz- und Reaktivitätssteigerung von OODBMS vermittels der Web- Caching-Hierarchie. Bachelorarbeit, 2010. F. Gessert und F. Bücklers, KohärentesWeb-Caching von Datenbankobjektenim Cloud Computing. Masterarbeit 2012. W. Wingerath, S. Friedrich, und F. Gessert, „Who Watches the Watchmen? On the Lack of Validation in NoSQL Benchmarking“, in BTW 2015. M. Schaarschmidt, F. Gessert, und N. Ritter, „Towards Automated Polyglot Persistence“, in BTW 2015. S. Friedrich, W. Wingerath, F. Gessert, und N. Ritter, „NoSQL OLTP Benchmarking: A Survey“, in 44. Jahrestagungder Gesellschaftfür Informatik, 2014, Bd. 232, S. 693–704. F. Gessert, „Skalierbare NoSQL- und Cloud-Datenbanken in Forschung und Praxis“, BTW 2015
  • 25. Page-Load Times What impact doescaching have in practice?
  • 26. Page-Load Times What impact doescaching have in practice?
  • 27. Page-Load Times What impact doescaching have in practice?0,7s 1,8s 2,8s 3,6s 3,4s CALIFORNIEN 0,5s 1,8s 2,9s 1,5s 1,3s FRANKFURT 0,6s 3,0s 7,2s 5,0s 5,7s SYDNEY 0,5s 2,4s 4,0s 5,7s 4,7s TOKYO
  • 28. How is this used from a develeoper‘s perspective? Backend-as-a-Service
  • 29. DB.Tankstellen.find() .near("location", myLoc, 5000) .lessThen("closing", time) .greaterThen("opening", time) .descending("price") .resultList();
  • 37. n Orestes 1 mongos 1 1 1 ZooKeeper BBQ Manager Swarm Manager 1 syslog 1 mongos Docker Cluster n Node Docker Link HTTPS on Random Port metadata data Docker Daemon Users HTTP/HTTPS Forbidden Docker DaemonDocker Daemon Virtual Private Cloud 1 Logging Server 1 Management Server N App Servers SSL
  • 38.  Route 53, EC2, ASGs, IAM etc.  Elastic LoadBalancer: TCP Balancing for Logging ◦ Not suited for multi-tenant SSL termination: ELB cannot dynamically route to an IP:port pair  RedisElastiCache: Metadata Storage ◦ Easy to use but very limited: no Redis cluster support, no append-only files, bad snapshotting  What we don‘t use: ◦ Beanstalk: supports Docker but needs a dedicated EC2 instance ◦ Cloudfront: useless invalidations, expensive ◦ DynamoDB: difficult to scale, very limited queries AWS Services Serviceswe use
  • 39.  Every tenant needs a private JVM and Node.JS process  Containerization Why we needcontainers& cluster management Baqend Server Customer‘s Business Logic
  • 40.  Every tenant needs a private JVM and Node.JS process  Provisioning new instances needs to be fast & easy: Containerization Why we needcontainers& cluster management Baqend Server Customer‘s Business Logic Launch App BBQ Manager Start Configure databases, CDN, etc.
  • 41. Problem: Many Technology Choices Emerging Frameworksand Tools  Cluster Managers & Orchestration Tools:  Google Kubernetes Apache Mesos Docker Swarm
  • 42. Problem: Many Technology Choices Emerging Frameworksand Tools  Cluster Managers & Orchestration Tools: Container Cloud Platforms: Google Kubernetes Apache Mesos Docker Swarm Amazon Elastic Container Service RancherTutum Google Container Engine
  • 43. Problem: Many Technology Choices Emerging Frameworksand Tools  Cluster Managers & Orchestration Tools: Container Cloud Platforms: Google Kubernetes Apache Mesos Docker Swarm Amazon Elastic Container Service RancherTutum Google Container Engine and many more: Azure Container Service (Microsoft), Nomad (HashiCorp), Diego (Cloud Foundry), Fleet (CoreOs), ContainerShip, YARN (Hadoop), …
  • 44. Live Demo: Launching a container
  • 45. Docker Concepts (What is Docker?)
      - Docker typically isolates a single application
      - An application is built into a Docker image (including the OS user space)
      - The Docker image can be hosted in a Docker Registry and transferred to different hosts
      - The Docker image can be executed as a new container on any machine that runs a Docker daemon
      - Updates are handled by stopping the old container and starting a new one
      Source: https://docs.docker.com/engine/introduction/understanding-docker/
  • 46. Docker Architecture (How to set up a Docker host)
      - Docker runs on all common Linux distributions
      - Docker can be installed from Docker's own package repository
      - The Docker daemon can be configured by editing /etc/default/docker
      - The Docker daemon offers many useful configuration options:
        ◦ Inter-container communication
        ◦ Docker remote REST API
        ◦ Labeling
        ◦ DNS configuration
        ◦ IP forwarding (disabling it cuts containers off from the internet)
        ◦ SSL encryption for the Docker daemon
  • 47. The Dockerfile (How to build a Docker image)
      FROM ubuntu:latest
      ENV DEBIAN_FRONTEND noninteractive

      # Java: refresh the package index first, then add the PPA and install Oracle Java 8
      RUN apt-get update && \
          apt-get install -y software-properties-common && \
          add-apt-repository -y ppa:webupd8team/java && \
          apt-get update && \
          echo debconf shared/accepted-oracle-license-v1-1 select true | debconf-set-selections && \
          apt-get install -y oracle-java8-installer

      # Extract and install packages (ADD unpacks local tar archives automatically)
      ADD baqend-package*.tgz /opt
      ADD config.json /opt/baqend/

      EXPOSE 8080
      WORKDIR /opt/baqend/
      # CMD supplies the default arguments that are appended to ENTRYPOINT
      ENTRYPOINT ["java", "-classpath", "/opt/baqend/lib/*", "info.orestes.Launcher"]
      CMD ["--config", "config.json"]
  • 48. How a Docker container works (Isolation, performance, light-weight)
      - Filesystem: by using multiple read-only file systems and mounting a read-write file system on top
      - Data volumes: mount additional physical volumes into the container
      - CPU: by CPU shares and core limitation
      - Memory: by defining memory constraints
      - Network: by using virtual networks
      - System privileges: such as port binding, execution rights, inter-process communication, etc.
      - Logging: by using Docker's logging capabilities or external loggers (json, syslog, aws, etc.)
  • 49. Docker Options (Imposing constraints on containers)
      - Most constraints are set when the container is started:
      --add-host=[]              Add a custom host-to-IP mapping (host:ip)
      --cpu-shares=0             CPU shares (relative weight)
      --cpu-quota=0              Limit CPU CFS (Completely Fair Scheduler) quota
      -e, --env=[]               Set environment variables
      -l, --label=[]             Set metadata on the container (e.g., --label=key=value)
      --link=[]                  Add link to another container
      -m, --memory=""            Memory limit
      --memory-swap=""           Total memory (memory + swap), '-1' to disable swap
      --name=""                  Assign a name to the container
      --net="bridge"             Connects a container to a network
                                   'bridge': creates a new network stack on the docker bridge
                                   'none': no networking for this container
                                   'container:<name|id>': reuses another container network stack
                                   'host': use the host network stack inside the container
                                   'NETWORK': connects the container to user-created network
      --oom-kill-disable=false   Whether to disable OOM Killer for the container or not
      -p, --publish=[]           Publish a container's port(s) to the host
      --read-only=false          Mount the container's root filesystem as read only
      --restart="no"             Restart policy (no, on-failure[:max-retry], always)
      -v, --volume=[]            Bind mount a volume
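Because these constraints are fixed at container start, a provisioning tool has to assemble them into one `docker run` call per tenant. The following is a minimal sketch of that step using only flags listed on the slide; the function name `docker_run_args` and the example values (container name, limits, environment variable) are hypothetical.

```python
from typing import Dict, List, Optional


def docker_run_args(
    image: str,
    name: Optional[str] = None,
    memory: Optional[str] = None,
    cpu_shares: Optional[int] = None,
    env: Optional[Dict[str, str]] = None,
    labels: Optional[Dict[str, str]] = None,
    restart: Optional[str] = None,
    read_only: bool = False,
) -> List[str]:
    """Build a `docker run` argument list from per-container constraints."""
    args = ["docker", "run"]
    if name:
        args += ["--name", name]
    if memory:
        args += ["--memory", memory]
    if cpu_shares is not None:
        args += ["--cpu-shares", str(cpu_shares)]
    # Sort keys so the generated command is deterministic.
    for key, value in sorted((env or {}).items()):
        args += ["--env", f"{key}={value}"]
    for key, value in sorted((labels or {}).items()):
        args += ["--label", f"{key}={value}"]
    if restart:
        args += ["--restart", restart]
    if read_only:
        args.append("--read-only")
    args.append(image)  # the image always comes after the options
    return args


cmd = docker_run_args(
    "orestes",
    name="tenant-a",
    memory="512m",
    cpu_shares=512,
    env={"TENANT": "a"},
    restart="on-failure:3",
)
print(" ".join(cmd))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting issues.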
  • 50. Docker Networking (Making containers talk to each other)
      - Docker containers can talk to each other by default
      - Communication between containers can be restricted by the daemon option --icc=false
      - Docker containers can discover other linked containers by their names
      EXPOSE 8080                            (in the orestes Dockerfile)
      docker run --name="orestes" orestes
      docker run --link="orestes" node       (can access orestes:8080)
      - Port 8080 is not published: it cannot be reached through a host port, and with --icc=false unlinked containers cannot reach it either
  • 51. Docker Networking (Making containers talk to each other)
      - Docker containers can talk to each other by default
      - Communication between containers can be restricted by the daemon option --icc=false
      - Docker containers can discover other linked containers by their names
      EXPOSE 8080                                              (in the orestes Dockerfile)
      docker run --name="orestes" -p 0.0.0.0:80:8080 orestes
      docker run --link="orestes" node                         (can access orestes:8080)
      - Port 8080 is published and can be accessed on host port 80
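The publish argument `0.0.0.0:80:8080` above reads as host IP, host port, container port. A minimal sketch of how the common IPv4 forms of `-p` decompose follows; the names `PortBinding` and `parse_publish` are hypothetical, and protocol suffixes such as `/udp`, port ranges, and IPv6 addresses are deliberately not handled.

```python
from typing import NamedTuple, Optional


class PortBinding(NamedTuple):
    host_ip: Optional[str]    # None: bind on all host interfaces
    host_port: Optional[int]  # None: Docker picks a free host port
    container_port: int


def parse_publish(spec: str) -> PortBinding:
    """Parse ip:hostPort:containerPort, ip::containerPort,
    hostPort:containerPort, or containerPort."""
    parts = spec.split(":")
    if len(parts) == 3:
        host_ip, host_port, container_port = parts
    elif len(parts) == 2:
        host_ip = ""
        host_port, container_port = parts
    elif len(parts) == 1:
        host_ip, host_port, container_port = "", "", parts[0]
    else:
        raise ValueError(f"unsupported publish spec: {spec!r}")
    return PortBinding(
        host_ip=host_ip or None,
        host_port=int(host_port) if host_port else None,
        container_port=int(container_port),
    )


print(parse_publish("0.0.0.0:80:8080"))
print(parse_publish("8080"))
```

The last form, with no host port, corresponds to the "HTTPS on Random Port" setup: the container port is published on a host port chosen by Docker.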
  • 52. Elastic Container Service (How Amazon ECS works)
      - AWS provides ECS-optimized AMIs for simple deployment
      - ECS manages EC2 instances by running an ECS Agent on each instance
      - ECS can automatically deploy and scale new Docker containers, specified by a Task definition, across the ECS Cluster
      [Diagram: an ECS Cluster of three instances, each running a Docker daemon and an ECS Agent]
  • 53. ECS: Tasks and Services (Defining groups of containers)
      - ECS groups containers into Tasks and deploys them together
      - A Task definition describes:
        ◦ The Docker images
        ◦ Resource requirements
        ◦ Environment variables
        ◦ Network links
        ◦ Data volumes
      - ECS Services can be used to keep a specified number of Tasks running
      - ECS can autoscale a Service when it is used with an ELB
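As an illustration of the five ingredients listed above, here is a minimal sketch of a Task definition built as a Python dictionary and serialized to the JSON that ECS accepts. The field names follow the ECS Task definition format; the family name, image names, paths, and resource values are hypothetical and not taken from Baqend's setup.

```python
import json

task_definition = {
    "family": "tenant-app",  # hypothetical Task family name
    "containerDefinitions": [
        {
            "name": "orestes",
            "image": "example/orestes:latest",  # the Docker image
            "cpu": 512,                          # resource requirements:
            "memory": 1024,                      # CPU units and MiB
            "essential": True,
            "environment": [{"name": "CONFIG", "value": "config.json"}],
            # hostPort 0 asks for a dynamically assigned host port.
            "portMappings": [{"containerPort": 8080, "hostPort": 0}],
            "mountPoints": [
                {"sourceVolume": "data", "containerPath": "/opt/baqend/data"}
            ],
        },
        {
            "name": "node",
            "image": "example/node:latest",
            "cpu": 256,
            "memory": 512,
            "essential": True,
            "links": ["orestes"],  # network link to the first container
        },
    ],
    # Data volume backed by a directory on the EC2 instance.
    "volumes": [{"name": "data", "host": {"sourcePath": "/var/lib/tenant-data"}}],
}

total_memory = sum(c["memory"] for c in task_definition["containerDefinitions"])
print(json.dumps(task_definition, indent=2))
print("memory reserved per Task (MiB):", total_memory)
```

The summed memory matters for placement: ECS schedules the Task as a unit, so both containers must fit on the same instance.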
  • 54. Limitations that AWS fixed (Old Docker, Parameterization)
      - ECS used to ship an outdated Docker version; it now runs Docker 1.9
      - Tasks can now be parametrized using command-line arguments
      - Previously, only environment variables could be passed when starting a Task
      - Environment variables are exposed to linked containers, which can be a security issue:
      docker run --name="orestes" --env SECRET=7kekfjd9e orestes    (secured process)
      docker run --link="orestes" node                              (untrusted process, can read env ORESTES_ENV_SECRET)
      Source: https://docs.docker.com/engine/userguide/networking/default_network/dockerlinks/#environment-variables
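The leak described on this slide follows from how legacy links name variables: per the Docker links documentation cited above, each environment variable of the source container appears in the linked container as `<ALIAS>_ENV_<name>`. A minimal sketch of that naming rule, with a hypothetical helper `link_env` (the additional `<ALIAS>_PORT_...` variables that links create are omitted):

```python
from typing import Dict


def link_env(alias: str, source_env: Dict[str, str]) -> Dict[str, str]:
    """Variables that a legacy --link injects into the target container
    for the source container's own environment."""
    prefix = alias.upper()
    return {f"{prefix}_ENV_{name}": value for name, value in source_env.items()}


# The secured container was started with --env SECRET=7kekfjd9e.
exposed = link_env("orestes", {"SECRET": "7kekfjd9e"})
print(exposed)
```

Any process in the linked container can read these variables, so secrets passed via `--env` are effectively shared with every container that links to the source.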
  • 55. Current Limitation: Memory Constraints (Restricting RAM consumption)
      - ECS uses hard memory constraints (docker run -m) for Tasks to schedule container placement
      - This allocates a fixed amount of memory on the EC2 instance, which the process cannot exceed
      - This is very ugly for shared, multi-tenant applications:
        ◦ Setting the constraint too low causes Docker to kill the process on memory peaks
        ◦ Setting the value too high limits the number of containers that can be launched per EC2 instance
      - Neither Docker's memory swapping nor unlimited memory usage is allowed by ECS
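The trade-off between the two failure modes can be made concrete with a little arithmetic. This is a minimal sketch under assumed numbers: the 8 GiB instance size and the 700 MiB tenant peak are hypothetical, and the helper names are illustrative.

```python
def containers_per_instance(instance_mib: int, limit_mib: int, reserved_mib: int = 0) -> int:
    """How many containers with a hard memory limit fit on one instance."""
    if limit_mib <= 0:
        raise ValueError("limit_mib must be positive")
    return max(0, (instance_mib - reserved_mib) // limit_mib)


def survives_peak(limit_mib: int, peak_mib: int) -> bool:
    """A process that exceeds its hard limit is killed."""
    return peak_mib <= limit_mib


INSTANCE_MIB = 8192  # hypothetical 8 GiB EC2 instance
PEAK_MIB = 700       # hypothetical peak usage of one tenant

for limit in (512, 1024):
    fits = containers_per_instance(INSTANCE_MIB, limit)
    ok = survives_peak(limit, PEAK_MIB)
    print(f"limit={limit} MiB: {fits} containers per instance, survives peak: {ok}")
```

With the lower limit the instance holds 16 containers but a tenant peaking at 700 MiB is killed; with the higher limit the tenant survives but density halves to 8, even if most tenants idle far below their limit.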
  • 56. Current Limitation: Networking (Docker's new network API not supported)
      - Docker has introduced a new network API that allows creating custom virtual networks
      - Bridge networks connect groups of containers together and isolate them from other groups on the same host
      - Overlay networks use a key-value store to connect containers across different host machines
      Source: https://docs.docker.com/engine/userguide/networking/dockernetworks/
  • 57. Pros:
      - Very simple setup, thanks to the optimized ECS AMI
      - The Task abstraction makes it really comfortable to start multiple containers together
      - Services ensure that the desired number of Tasks is always up and running
      - Automatically starts new EC2 instances if no capacity is left for new containers
      - Can be combined with an ELB for a high-availability setup
      Cons:
      - Many Docker options aren't available
      - Service Tasks can't be parametrized
      - Running the same Services for different tenants on the same EC2 instance is not possible
      - Only the legacy networking is supported
      - New features will always be delayed since they must first be implemented in ECS
  • 58. Docker Swarm (A replacement for ECS)
      - Docker Swarm is Docker's native solution for cluster management
      - Docker Swarm uses a discovery service to manage the shared state of the cluster
      - The following backends for discovery are supported:
        ◦ Docker Hub (for development only)
        ◦ Static file
        ◦ etcd
        ◦ Consul
        ◦ ZooKeeper
        ◦ IP list or a range pattern of IPs
  • 59. Swarm Architecture (Cluster management with Docker Swarm)
      [Diagram: the Docker Client talks to the Swarm Manager, which manages a Docker Swarm cluster of three hosts. Each host runs a Docker daemon (port 2375 exposed) and a Swarm agent. ZooKeeper serves as the discovery backend.]
  • 60. Swarm is Docker (Fixing the shortcomings of ECS)
      - The Swarm manager acts as a proxy for the Docker Remote API
        ◦ All docker run options are available in Swarm, too
      - Docker Swarm can be combined with overlay networks
        ◦ Containers can connect to others by just using the container's name (service discovery)
        ◦ Works across Docker hosts, availability zones, and external hosts
      - Containers can use any other service without being defined together in a group (such as a Task)
  • 61. Autoscaling in Swarm (Scale-out and scale-in)
      - Docker hosts can be added to and removed from the Swarm cluster transparently
      - Swarm provides an API to gather the CPU usage and memory consumption of hosts or containers
      - Swarm provides no built-in concept for scaling the services that run in containers
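Since Swarm only exposes the usage figures and leaves the scaling decision to the operator, that decision has to be implemented externally. The following is a minimal sketch of one possible threshold rule, not something Swarm provides: `HostStats`, `desired_host_count`, and the thresholds of 80% and 30% are hypothetical, and the usage fractions are assumed to have been collected from the API beforehand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostStats:
    cpu: float     # fraction of CPU in use, 0.0 to 1.0
    memory: float  # fraction of memory in use, 0.0 to 1.0


def desired_host_count(stats: List[HostStats], high: float = 0.8,
                       low: float = 0.3, min_hosts: int = 1) -> int:
    """Add one host when the cluster runs hot, remove one when it idles."""
    current = len(stats)
    if current == 0:
        return min_hosts
    # Use the scarcer of the two resources as the cluster load.
    avg_cpu = sum(s.cpu for s in stats) / current
    avg_mem = sum(s.memory for s in stats) / current
    load = max(avg_cpu, avg_mem)
    if load > high:
        return current + 1  # scale out
    if load < low and current > min_hosts:
        return current - 1  # scale in
    return current


busy = [HostStats(cpu=0.9, memory=0.5), HostStats(cpu=0.85, memory=0.4)]
idle = [HostStats(cpu=0.1, memory=0.2)] * 3
print(desired_host_count(busy), desired_host_count(idle))
```

Changing the count by one host per evaluation keeps the rule conservative; the gap between the two thresholds avoids oscillating between scale-out and scale-in.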
  • 62. High Availability in Swarm (Handling failures and outages)
      - Labeled Docker daemons can be used by the manager to run specific containers only on specific hosts
      - Containers can be launched:
        ◦ On the same host where other containers are running
        ◦ In a specific availability zone
        ◦ On hosts with special capabilities (RAM, CPU, or SSD)
      - The Docker daemon can restart failed containers using a restart policy: --restart="always"
      - Containers will also be restarted if the Docker host restarts
      - Failed machines must be handled manually
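Label-based placement boils down to filtering hosts by key/value pairs. In standalone Swarm this is expressed with daemon labels (for example `--label storage=ssd`) and container constraints (for example `-e constraint:storage==ssd`); the sketch below only models the matching logic. The host names, label keys, and the function `matching_hosts` are hypothetical.

```python
from typing import Dict, List


def matching_hosts(hosts: Dict[str, Dict[str, str]],
                   constraints: Dict[str, str]) -> List[str]:
    """Return the hosts whose daemon labels satisfy every key==value constraint."""
    return sorted(
        name
        for name, labels in hosts.items()
        if all(labels.get(key) == value for key, value in constraints.items())
    )


hosts = {
    "node-1": {"zone": "eu-west-1a", "storage": "ssd"},
    "node-2": {"zone": "eu-west-1b", "storage": "ssd"},
    "node-3": {"zone": "eu-west-1a", "storage": "hdd"},
}
print(matching_hosts(hosts, {"storage": "ssd", "zone": "eu-west-1a"}))
```

An empty result means no host qualifies, in which case the scheduler has nothing to place the container on and the launch fails.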
  • 63. Securing Swarm Hosts (Security pitfalls)
      - Swarm requires that the Docker daemon is exposed via TCP
      - In most setups this is a security issue, since anyone who can reach the API can easily get root permissions on the Docker host
      - Containers can also access the exposed API by default
      - Therefore it is recommended to always secure the Docker daemon on each host with SSL
      - Docker supports SSL server authentication, SSL client authentication, or both
      - SSL server authentication is not very practical since it requires a signed certificate for each host
  • 64. Securing Swarm Hosts (Security pitfalls)
      - Securing a Swarm cluster requires signed SSL certificates on all Docker hosts, on the Swarm manager, and on the Docker client
      [Diagram: a Certificate Authority issues server certificates to the three Docker daemons (port 2375 exposed), a server/client certificate to the Swarm Manager, and a client certificate to the Docker Client.]
  • 65. Wrap-up: Docker Swarm (Pros and Cons)
      Pros:
      - Swarm is Docker: all Docker options are available
      - Labeling Docker hosts allows deploying containers on specific hosts
      - Overlay networks allow containers to communicate across hosts
      - Service discovery across containers is made really simple
      Cons:
      - A complete setup is complex and requires many components
      - No built-in way of autoscaling services
      - Still many bugs
      - The integration of the Docker Swarm API into Docker is not yet complete
  • 66. Conclusions (ECS vs Swarm)
      ECS:
      - Simple setup
      - Task and Service definitions make it easy to deploy and update containers
      - Detects failures and restarts failed Tasks within Services
      - Integrated with other AWS services such as Elastic Load Balancers and Auto Scaling Groups
      Swarm:
      - Complex setup
      - Many configuration options for deploying containers
      - Compatible with the Docker API, so all Docker clients can be used
      - Supports Docker's network API
      - No vendor lock-in
  • 67. Goal with InnoRampUp: Want to try Baqend?
      - Download Community Edition
      - Invited-Beta Cloud Instance: [email protected]
      - Baqend Cloud launching this February