Backend Infrastructure Architecture
Technical Overview
June 2020
TABLE OF CONTENTS
Table of Tables
1 Introduction
1.1 Context
1.2 Scope of Document
References
1.1 Context
This document complements the “CWA Solution Architecture” document, which is developed
and published in parallel to this documentation. It is intended as a technical overview of the
Corona Warn App (CWA) and its underlying infrastructure and network. Please refer to:
https://github.com/corona-warn-app/cwa-documentation/blob/master/solution_architecture.md
To reduce the spread of COVID-19, it is necessary to inform people about their close proximity
to positively tested individuals. So far, health departments and affected individuals have
identified possibly infected individuals in personal conversations based on each individual's
memory. This has led to a high number of unknown connections, e.g. when using public
transport.
The Corona-Warn-App, shown centrally in Figure 1, enables individuals to trace their personal
exposure risk via their mobile phones. The Corona-Warn-App uses a new framework provided
by Apple and Google called the Exposure Notification Framework. The framework employs
Bluetooth Low Energy (BLE) mechanics. BLE lets the individual mobile phones act as beacons,
meaning that they constantly broadcast a temporary identifier called a Rolling Proximity Identifier
(RPI) that is remembered by receiving phones and, at the same time, lets the mobile phone
scan for the identifiers of other mobile phones. This is shown on the right side of Figure 1.
Identifiers are ID numbers sent out by the mobile phones.
In addition to the application-level components, the following system- and technology-specific
components must be considered:
The next chapters will detail the infrastructure architecture for these components.
3.2.1 Overview
As shown in Figure 3: CWA high level deployment view, the infrastructure for the production
environment contains not only the logical system components but also a hosting platform
(Open Telekom Cloud (OTC) and AppAgile (OpenShift)). The whole environment is hosted in
the Open Telekom Cloud, with additional components from the Telekom backbone (load
balancer, DDoS protection). All components belong to one tenant, the CWA OTC Tenant.
Tenants within OTC are strictly separated. The hosting and orchestration services within the
project are realized with a dedicated OpenShift stack for each project (product name
AppAgile@OTC). This allows the usage of all features of OpenShift (based on Kubernetes) for
all deployments (containers, networking, firewalls, etc.). For each stage (here the production
stage Prod), two separate, dedicated OpenShift stacks are used (like sub-tenants). The first
stack contains all backend services, and an additional stack contains all PaaS services which
have to be consumed (here OTC RDS, a managed Relational Database Service, in a
PostgreSQL flavour). Every OpenShift stack has its own namespace, which allows a
separation of all duties, including addressing, routing and authorization.
To separate all stages (DEV, INT, WRU, PROD), separate OTC sub-projects are used. This
allows a strict separation of all traffic, deployments and usage. Also, the user and role
definitions can be separated.
The content of all stages is identical, except that DEV and INT contain additional components
for the CI/CD pipeline (Jira, Confluence).
3.2.3.1 Website
This namespace contains a single pod with an nginx container. The nginx terminates requests
to https://coronawarn.app and redirects traffic to https://www.coronawarn.app.
For a complete description and explanation of the OpenShift service, please refer to
https://cloud.telekom.de/en/infrastructure/appagile-paas-big-data.
For a complete and up-to-date description of the Verification Server, please refer to GitHub:
https://github.com/corona-warn-app/cwa-verification-server/blob/master/docs/architecture-overview.md
For a detailed description of the Test Result Server, please refer to GitHub:
https://github.com/corona-warn-app/cwa-testresult-server/blob/master/docs/architecture-overview.md
For the four different stages, separate CDN entries are necessary. Figure 6 (CDN
implementation for different stages) shows the registration of these different URLs within the
Magenta CDN.
3.2.3.9 Vault
This namespace contains a HashiCorp Vault instance to manage the secrets in use. It provides
the TLS certificates for the individual services so that the final TLS termination does not have
to take place at the ingress controllers. It also contains the key for signing the "Diagnosis Keys",
which are sent to the Apple/Google framework. This key must be an ECDSA key on the
prime256v1 curve, used with SHA-256 (ECDSA-SHA256).
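As a minimal illustration of the key and signature format named above (not the actual CWA signing code; in the real setup the private key is kept in the Vault and the signed payload is the diagnosis key export consumed by the Apple/Google framework), the following Java sketch generates an ECDSA key pair on the prime256v1 (secp256r1) curve and creates and verifies a SHA256withECDSA signature:

import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;
import java.security.spec.ECGenParameterSpec;

public class DiagnosisKeySigningSketch {
    public static void main(String[] args) throws Exception {
        // Generate an EC key pair on the prime256v1 / secp256r1 curve.
        KeyPairGenerator generator = KeyPairGenerator.getInstance("EC");
        generator.initialize(new ECGenParameterSpec("secp256r1"));
        KeyPair keyPair = generator.generateKeyPair();

        // Sign an example payload with ECDSA over SHA-256 (SHA256withECDSA).
        // The real payload would be the diagnosis key export, not this placeholder string.
        byte[] exportPayload = "example diagnosis key export".getBytes(StandardCharsets.UTF_8);
        Signature signer = Signature.getInstance("SHA256withECDSA");
        signer.initSign(keyPair.getPrivate());
        signer.update(exportPayload);
        byte[] signature = signer.sign();

        // Verify with the corresponding public key, as a consumer of the export would.
        Signature verifier = Signature.getInstance("SHA256withECDSA");
        verifier.initVerify(keyPair.getPublic());
        verifier.update(exportPayload);
        System.out.println("signature valid: " + verifier.verify(signature));
    }
}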
The Vault is initially sealed and must be unsealed through defined processes. Subsequently,
the components receive access to their certificates, keys, etc. according to the need-to-know
principle.
3.2.3.10 RDS
The consumed Relational Database Services (RDS) of OTC each run in a tenant project and
are thus completely separated from the other components. In the RDS, PostgreSQL version
11.5 is used, and the databases are encrypted at rest, as is the data in transit to them. Every
service that needs a database gets a dedicated RDS instance (which therefore runs in a
separate namespace).
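As an illustrative sketch (not taken from the CWA code base), a service could require an encrypted, certificate-verified connection to its dedicated PostgreSQL RDS instance via JDBC as shown below; the host name, database, user and certificate path are hypothetical, and the PostgreSQL JDBC driver is assumed to be on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class RdsTlsConnectionSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical host and database; each service points at its own dedicated RDS instance.
        String url = "jdbc:postgresql://cwa-server-rds.example.internal:5432/cwa";

        Properties props = new Properties();
        props.setProperty("user", "cwa_server");
        props.setProperty("password", System.getenv("DB_PASSWORD"));
        // Require an encrypted connection and verify the server certificate,
        // so data in transit to the RDS is protected as described above.
        props.setProperty("ssl", "true");
        props.setProperty("sslmode", "verify-full");
        props.setProperty("sslrootcert", "/etc/secrets/rds-ca.pem");

        try (Connection connection = DriverManager.getConnection(url, props);
             Statement statement = connection.createStatement();
             ResultSet result = statement.executeQuery("SELECT version()")) {
            if (result.next()) {
                System.out.println(result.getString(1));
            }
        }
    }
}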
DDoS protection in the Telekom backbone:
• Intercepts the danger before it reaches the customer's access line
• No change in customer infrastructure necessary
There are different concepts for DDoS protection in place, which can be used to protect
against different attacks:
• IP Location Policing
• IP/FCAP Filter Lists
• DNS & HTTP URL Blocking
• DNS Domain Blacklisting
• Comprehensive HTTP Regular Expression Filter
• Traffic Shaping
Connections between servers within the OTC are secured with TLS 1.3 with mutual
authentication (TLS client certificates). The required TLS certificates are generated by a
CA provided by OpenShift for each cluster. The web application provided by the Portal Server
for hotline staff enforces only TLS 1.2, for interoperability reasons.
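The following Java sketch shows one possible way to build such a mutually authenticated TLS 1.3 context: a key store holding the client certificate issued by the per-cluster CA, and a trust store containing only that CA. File paths, store types and passwords are placeholders, not the actual deployment configuration.

import java.io.FileInputStream;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class MutualTlsContextSketch {

    // Builds an SSLContext that presents a client certificate and trusts only the cluster CA.
    public static SSLContext buildContext(char[] keyStorePassword) throws Exception {
        // Client certificate and private key issued by the per-cluster OpenShift CA.
        KeyStore keyStore = KeyStore.getInstance("PKCS12");
        try (FileInputStream in = new FileInputStream("/etc/secrets/client-cert.p12")) {
            keyStore.load(in, keyStorePassword);
        }
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, keyStorePassword);

        // Trust store containing only the cluster CA certificate.
        KeyStore trustStore = KeyStore.getInstance("PKCS12");
        try (FileInputStream in = new FileInputStream("/etc/secrets/cluster-ca.p12")) {
            trustStore.load(in, keyStorePassword);
        }
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);

        // Restrict the context to TLS 1.3 for server-to-server traffic.
        SSLContext context = SSLContext.getInstance("TLSv1.3");
        context.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
        return context;
    }
}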
The public interfaces (App to CWA Server and App to Verification Server, as well as browser
to Verification Portal) each use a server certificate issued by TeleSec, whose root certificate is
generally known. The App uses certificate pinning to an intermediate CA. Since TeleSec
belongs to Deutsche Telekom AG and we trust it, we do not use our own intermediate CA
certificate. Instead, an intermediate CA certificate from TeleSec is used, under which the server
certificates are attached. PROD and WRU each have their own, different intermediate CA
certificates.
3.2.4.2 IP Masking
The communication of endpoints from the Internet (mobile phone, browser of the hotline staff)
is encrypted end-to-end up to the backend systems. For this purpose, IP masking is performed
on the load balancers: the load balancer masks the source IP address so that no IP addresses
reach the backend systems, where they could appear in log files and enable a
re-personalization of the pseudonymized data flows.
3.2.4.3 Namespaces
The CWA_Server namespace is assigned a dedicated pool of VMs (cwa pool). The Portal
Server and the TestResult Server receive a second, dedicated pool of VMs. The remaining
servers/pods share the default pool.
The separation between VMs is more robust than the separation between containers. In
addition, the VM Pools are also assigned their own operating teams.
The existing laboratory gateway developed by Bucher Software, which collects test results
from the laboratories and provides them to the Test Result Server, is also connected via VPC
peering. The laboratory gateway is located in the Telekom Healthcare Cloud Tenant, which is
implemented as a special tenant in the OTC. Since private IP addresses are used within
the OTC, the peering was set up to ensure that the IP address ranges are disjoint.
The following requirements apply to the operation of the platform:
• 24/7 operation
• Seamless deployments, minimizing downtimes
• No central data observation
• Segregation of duties: the following groups of components cannot be operated by the
same persons at any point in time (this includes monitoring with log data access):
o security components (DDoS),
o backend services (Backend Server, Verification Server, Portal Server, Test
Result Server) and
o the other infrastructure components (Load Balancer)
• Certificate Management & Governance
Operation takes place on three levels:
1. Application operation
2. PaaS operation (AppAgile)
3. OTC operation
However, this segregation of duties can only be implemented at the application level. The
platform itself (AppAgile) theoretically has access to all components, as it assigns rights and
roles. This assignment of rights and roles takes place under the principle of dual control and is
additionally logged. A detailed description of how the operation takes place, which roles are
assigned and from where the whole infrastructure and application are operated is recorded in
the operating documentation.
The operation may only take place within the EU, thus following the EU GDPR.
For a list of the security requirements assessed during the project, see GitHub:
https://github.com/corona-warn-app/cwa-documentation/blob/master/overview-security.md
4.4 Monitoring
Monitoring and logging are performed from a separate OTC Tenant. This tenant is used to
provide managed services for the CWA OTC Tenant. The Telekom Cyber Defense Center
(CDC) is connected to this Management Tenant (MCS PaaS).
The service combines automated and manual analyses of security-relevant logs from the IT
systems. In addition, daily updated threat intelligence information from T-Systems is used to
improve the quality of the analyses.
The Managed Cyber Defense Services from T-Systems make it possible to identify attempted
attacks and, if necessary, successful compromises of systems and identities and to initiate
countermeasures.
For real-time alerting and long-term analysis, rule sets are configured in the system. These
alerts are based on threat scenarios that affect the platform and are modeled by experts at the
Telekom Cyber Defense Center. Typical alerting scenarios include attempts to guess
passwords by trial and error, or the comparison of log data against indicators known to signal
malicious activity.
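As a simplified illustration of such a rule (the actual CDC rule sets are not public), the Java sketch below raises an alert when a single source address produces too many failed logins within a short window; the threshold and window are arbitrary example values.

import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class FailedLoginRuleSketch {
    private static final int THRESHOLD = 10;                       // alert after 10 failures
    private static final Duration WINDOW = Duration.ofMinutes(5);  // within a 5 minute window

    private final Map<String, Deque<Instant>> failuresBySource = new HashMap<>();

    // Returns true when a source address exceeds the failed-login threshold inside the window.
    public boolean onFailedLogin(String sourceAddress, Instant timestamp) {
        Deque<Instant> failures = failuresBySource.computeIfAbsent(sourceAddress, k -> new ArrayDeque<>());
        failures.addLast(timestamp);
        // Drop failures that are older than the sliding window.
        while (!failures.isEmpty() && failures.peekFirst().isBefore(timestamp.minus(WINDOW))) {
            failures.removeFirst();
        }
        return failures.size() >= THRESHOLD;
    }
}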
Generated log data is forwarded directly from the log servers in the AppAgile environment to
the SIEM. This is done over a Transport Layer Security (TLS) secured connection using the
HTTP method POST. The TLS connection guarantees the encryption of the channel; in
addition, client and server certificates are used so that sender and receiver are authenticated.
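A minimal sketch of such a forwarding step, assuming a mutually authenticated SSLContext like the one outlined in section 3.2.4 above and a hypothetical SIEM ingestion URL, could look as follows in Java:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import javax.net.ssl.SSLContext;

public class LogForwardingSketch {

    // Sends one batch of log lines to the SIEM ingestion endpoint over a
    // mutually authenticated TLS connection (client and server certificates).
    public static int forward(SSLContext mutualTlsContext, String logBatch) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .sslContext(mutualTlsContext)
                .build();

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://siem.example.internal/ingest"))  // hypothetical endpoint
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofString(logBatch))
                .build();

        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        return response.statusCode();
    }
}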
On the SIEM platform the data is processed and read in directly. In this first step, real-time
alerting is applied using the configured rule sets. The data is stored for seven days in order to
allow retrospective analyses over this period. These analyses can be carried out by means of
automated rules or in relation to an incident, for example if automated investigations give
reason to carry out further analyses.
The detection scenarios (use cases) are defined in the project phase after risk assessment
and availability of corresponding security logs and are regularly optimized. In the event of a
detection, the incident is processed according to the specified runbook.
If the analyses provide indications of security incidents, an in-depth analysis is carried out by
experienced Telekom Security specialists as part of incident response activities.