BlueData EPIC Software Architecture Technical White Paper
www.bluedata.com
Table of Contents
1. BlueData EPIC
   Key Features and Benefits
   App Store
2. BlueData EPIC Architecture
   Software Components
3. Tenants
   Users and Roles
   User Authentication
4. Virtual Compute and Memory
   Virtual Cores and RAM
   Virtual Node Flavors
5. Storage
   About DataTaps
   On-Premises Tenant Storage
   Node Storage
   Application Path Inputs
6. Networking
   Networks and Subnets
   Gateway Hosts
7. Cluster Management
   Cluster Creation
   Isolated Mode
   Host Tags
   High Availability
   Platform High Availability
   Virtual Cluster High Availability
   Gateway Host High Availability
Appendix
   Definitions
   Hadoop and Spark Support
[Figure: BlueData EPIC Platform overview. A Platform Administrator and tenant users (Tenant Administrators, Data Scientists, Developers, Data Engineers, and Data Analysts) work against the BlueData EPIC platform, which runs on compute resources (on-premises servers or EC2) and connects to storage such as NFS, HDFS, and S3.]
Figure 10 lists the specific functions that can be performed within EPIC and the role(s) that can perform each of those actions. In this table:

• Permission stands for the right to perform a given action. Users with specific roles receive specific permissions within EPIC.
• PA stands for the Platform Administrator role.
• TA stands for the Tenant Administrator role.
• M stands for the Member (non-administrative) role.
• An X in a column means that a user with the indicated role can perform the indicated action.
• A blank entry means that a user with the indicated role cannot perform the indicated action.

Permission                                    PA   TA   M
Run ActionScripts in a cluster                     X    X
View detailed DataTap information                  X    X
Add a DataTap                                      X
Edit a DataTap                                     X
Remove a DataTap                                   X
View summary DataTap information                   X    X
View virtual node information                      X    X
Manage EPIC platform configuration            X
Create, edit, delete flavor definitions       X
Install/uninstall/upgrade App Store images    X
Add and manage EPIC Worker hosts              X
View global usage/health/metrics              X
View tenant resource usage                         X    X

Figure 10: EPIC permissions by platform/tenant role

*The ability to access isolated clusters depends on the tenant Cluster Superuser Privilege setting.
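To make the role-to-permission model concrete, the sketch below encodes a subset of this table as a lookup structure; the structure and names are illustrative, not EPIC's internal implementation:

    # Hypothetical encoding of part of Figure 10; names and structure are
    # illustrative, not EPIC's internal implementation.
    PERMISSIONS = {
        "PA": {"Manage EPIC platform configuration",
               "Add and manage EPIC Worker hosts",
               "View global usage/health/metrics"},
        "TA": {"Add a DataTap", "Edit a DataTap", "Remove a DataTap",
               "Run ActionScripts in a cluster", "View tenant resource usage"},
        "M":  {"Run ActionScripts in a cluster", "View tenant resource usage"},
    }

    def can_perform(user_roles, action):
        # An X in the table corresponds to membership in the role's permission set.
        return any(action in PERMISSIONS.get(role, set()) for role in user_roles)

    assert can_perform({"TA"}, "Add a DataTap")
    assert not can_perform({"M"}, "Add a DataTap")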
Each user has a unique username and password that they must provide in order to log in to BlueData EPIC. Authentication is the process by which EPIC matches the user-supplied username and password against the list of authorized users and determines both:

- whether to grant access (stored either in the local user database server or in the remote LDAP/Active Directory server), and
- what exact access to allow, in terms of the specific role(s) granted to that user (stored on the EPIC Controller node).

User authentication information is stored on a secure server. EPIC can authenticate users using any of the following methods:

• Internal user database.
• Your existing LDAP or Active Directory server, which you can connect to using Direct Bind or Search Bind.

Accessing EPIC (Non-SSO)

The non-SSO user authentication process is identical when using either the internal EPIC user database or an external LDAP/AD server:

1. A user accesses the EPIC Login screen using a Web browser pointed to the IP address of the Controller host.
2. The user enters his or her username and password in the appropriate fields and attempts to log in.
3. EPIC passes the user-supplied username and password to the authentication server.
4. The authentication server returns a response that indicates either a valid (allow the user to log in) or invalid (prevent the user from logging in) login attempt.
5. If the login attempt is valid, EPIC matches the user with the role(s) granted to that user and allows the proper access.

Using the internal user database included with EPIC is fast and convenient from an IT perspective. However, it may complicate user administration for various reasons, such as:

• A user may be required to change their password on the rest of the network, but this change will not be reflected in EPIC.
• A user who is removed from the network (such as when they leave the organization) must be removed from EPIC separately.

Connecting EPIC to your existing user authentication server requires you to supply some information about that server when installing EPIC. Contact your user administrator for the following information:

• Active Directory: AD Host, User Attribute, User Subtree DN
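As an illustration of the Direct Bind method, the sketch below authenticates a user with the Python ldap3 library; the server URL and DN template are hypothetical placeholders for site-specific values, and EPIC performs the equivalent check internally:

    # Minimal sketch of LDAP "Direct Bind" authentication (illustrative only;
    # the server URL and DN template are hypothetical placeholders).
    from ldap3 import Server, Connection

    def authenticate(username, password):
        server = Server("ldaps://ldap.example.com")
        user_dn = f"uid={username},ou=people,dc=example,dc=com"
        conn = Connection(server, user=user_dn, password=password)
        ok = conn.bind()  # True for a valid login, False otherwise
        conn.unbind()
        return ok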
Accessing EPIC (SSO)

Single Sign On (SSO) allows users to supply login credentials once, and then gain access to all authorized resources and applications without having to log in every time. When BlueData EPIC is configured for SSO, authorized users will proceed directly to the Dashboard screen without having to log in. From there, users can access cluster services (such as Hue), if needed.

Configuring EPIC for SSO requires both of the following:

• A metadata XML file that is provided by the Identity Provider (IdP)
• An XPATH to a location in the SAML response that will contain the LDAP/AD username of the user, such as //saml:AttributeStatement/saml:Attribute[@Name="PersonImmutableID"]/saml:AttributeValue/text()

You can then use LDAP/AD groups to assign roles.
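To show how such an XPATH expression extracts the username, here is a minimal sketch using Python's lxml with the standard SAML 2.0 assertion namespace and the attribute name from the example above:

    # Sketch: applying the example XPATH to a SAML response (illustrative only).
    from lxml import etree

    SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}

    def extract_username(saml_response_xml):
        root = etree.fromstring(saml_response_xml)
        values = root.xpath(
            '//saml:AttributeStatement/'
            'saml:Attribute[@Name="PersonImmutableID"]/'
            'saml:AttributeValue/text()',
            namespaces=SAML_NS,
        )
        return values[0] if values else None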
If platform HA is not enabled for the EPIC platform, then the hostname of the EPIC Controller host must be mapped to the IP address of the Controller host via a DNS server that can be accessed by the user. This allows a user-initiated browser GET request to correctly resolve to the Controller host. For EPIC platforms with platform HA enabled, this will be a hostname that resolves to the cluster IP address. Figure 2 shows the DNS name resolution process.

[Figure 2: DNS name resolution process. A flowchart with decision points "Platform HA enabled?" and "Cluster hostname defined?", resolving either to the Controller hostname or to the cluster IP address.]
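A small sketch of checking the DNS requirement described above (the hostname and IP are hypothetical):

    # Sketch: verifying the required DNS mapping (hostname and IP are hypothetical).
    import socket

    def resolves_to(hostname, expected_ip):
        """Without platform HA, the Controller hostname must resolve to the
        Controller host IP; with platform HA, the published hostname must
        resolve to the cluster IP address."""
        return socket.gethostbyname(hostname) == expected_ip

    # Example (non-HA): resolves_to("epic-controller.example.com", "10.16.1.2")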
The IdP must be configured with the following information: ...
[Figure 4: BlueData EPIC platform configured to use a private, non-routable virtual node network. A Controller host and Worker hosts (Host IP2-IP4) run tenant containers (BD IP4-BD IP11) for Tenants 1 and 2, connected through an external switch/next hop to the external network and users. Access into and out of the Docker containers for control traffic, such as SSH and web UIs, goes via the Gateway host (HAProxy), and access to services on containers is proxied via Gateway host port numbers. The DataTap access path runs from a container via the Worker host interface to remote storage (e.g. HDFS). Access to remote systems (e.g. AD, CA, SSO) goes via the Worker hosts (using IP masquerading).]
BlueData EPIC software is deployed on a set of hosts. Each host has an IP address and FQDN, shown as Host IP1, Host IP2, etc. Hosts are typically deployed as one or more rack(s) of servers that are connected to an external switch for access to other networks in the organization (e.g. the end-user network). BlueData EPIC provisions clusters of embedded, fully-managed Docker containers. Each cluster spins up within a tenant and receives distinct assigned IP addresses and FQDNs from a private IP range (or optionally from a routable IP range provided by the network teams), which appear in the diagrams above as BD IP1, BD IP2, etc. Containers in one tenant do not have network access to containers in a different tenant, despite potentially being placed on the same host.

End-user access to services in the containers (such as SSH or web applications) is routed through a Gateway host that runs the HAProxy service. This access is purely for control traffic. All other traffic, including access to remote HDFS or other enterprise systems such as Active Directory (AD), MIT KDC (Kerberos provider), SSO (identity providers), and Certificate Authority (CA), is performed via the host network interface, as opposed to the Gateway host.
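As a rough illustration of this proxying model, the sketch below maps user-facing Gateway ports to container services; every address and port here is a hypothetical example rather than EPIC's actual mapping scheme:

    # Illustrative sketch of Gateway port-to-container-service proxying.
    # Every address and port below is a hypothetical example.
    GATEWAY_PORT_MAP = {
        10001: ("BD IP4", 8888),  # e.g. a web UI on a Tenant 1 container
        10002: ("BD IP4", 22),    # SSH into the same container
        10003: ("BD IP8", 7180),  # e.g. a management UI on a Tenant 2 container
    }

    def route(gateway_port):
        """Return the (container address, port) a user-facing Gateway port proxies to."""
        return GATEWAY_PORT_MAP[gateway_port]

    # A user connecting to the Gateway FQDN on port 10001 is forwarded to BD IP4:8888.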
[Figure 5: BlueData EPIC platform configured to use a public, routable virtual node network. The Controller Host (Host IP1) and Worker Hosts (Host IP2-IP4) run tenant containers (BD IP4-BD IP11) connected through an external switch/next hop to the external network and users. Docker containers have network access to and from the external pathway via the host NIC, and access to remote systems (e.g. AD, CA, SSO) goes via the Worker hosts.]
When multiple subnets are used:

• The EPIC hosts may be located on-premises and/or in a public cloud. For example, EPIC hosts can reside on multiple racks and/or can be virtual machines residing on cloud-based services, such as AWS.
• If the EPIC platform includes cloud-based hosts, then the container network must be private and non-routable.
• All of the subnets used by EPIC Worker hosts must share the same path MTU setting (a validation sketch follows this list).
• The subnet(s) used by EPIC Gateway hosts may have different path MTU settings.
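A minimal sketch of validating the path MTU requirement across Worker-host subnets (the subnet data is hypothetical):

    # Sketch: Worker-host subnets must share one path MTU; Gateway subnets may differ.
    worker_subnets = [
        {"cidr": "10.16.1.0/24", "path_mtu": 1500},  # hypothetical values
        {"cidr": "10.16.2.0/24", "path_mtu": 1500},
    ]

    def path_mtus_consistent(subnets):
        return len({s["path_mtu"] for s in subnets}) <= 1

    assert path_mtus_consistent(worker_subnets)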
Gateway Hosts

A Gateway host is an optional first-class role that is managed by EPIC in a manner similar to the Controller and Compute Worker hosts. One or more Gateway host(s) are required when the IP addresses used by the virtual nodes in the EPIC platform are private and non-routable, meaning that the virtual nodes cannot be accessed via the corporate network. Gateway hosts must conform to all applicable requirements listed in the EPIC documentation. Unlike Compute Worker hosts, Gateway hosts do not run containers/virtual nodes. Instead, they enable access to user-facing services such as the Hue console, Cloudera Manager, Ambari, and/or SSH running on containers via High Availability Proxy (HAProxy). You can configure multiple Gateway hosts with a common Fully Qualified Domain Name (FQDN) for round-robin load balancing and High Availability. You may also use a hardware load balancer in front of the multiple Gateway hosts.

The ability to run EPIC virtual nodes in a private, non-routable network can drastically reduce the routable IP address pool requirement. For example, a /16 private network can support thousands of containers (the address arithmetic is sketched below). The corporate network need only manage the physical host addresses (for example, 10.16.1.2-51), while EPIC Gateway hosts provide access to the virtual nodes running on those hosts. All control traffic to the virtual nodes/Docker containers from end-user devices (browsers and command line), such as https, SSH, and/or AD/KDC, goes through the Gateway host(s), while all traffic from the virtual nodes/Docker containers is routed through the EPIC hosts on which they run.

[Figure: architecture of an EPIC platform with Gateway hosts, showing EPIC Worker hosts on a 192.168.x.x/16 container network.]
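The address-pool arithmetic behind the /16 example above can be checked with Python's standard ipaddress module:

    # A /16 network yields 65,536 addresses, which is why only the physical
    # host IPs (e.g. 10.16.1.2-51) need to be routable on the corporate network.
    import ipaddress

    print(ipaddress.ip_network("192.168.0.0/16").num_addresses)  # 65536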
Please see the technical white paper BlueData EPIC Network Design
for additional details about the networking design and network
configurations for the BlueData EPIC software platform.
The cluster creation flow on EC2 proceeds as follows:

Create Cluster
• The EPIC Controller schedules cluster creation.

Controller Schedules EC2 Instance Creation
• The Controller checks the request against tenant quotas.
• EPIC creates the Worker EC2 instance and copies the image from the S3 image bucket/registry to the instance.
• EPIC creates the container using the Host Agent.
• EPIC sets up container storage and networking.

Between EC2 Instance and Docker Container
• EPIC Management copies the Node Agent into the Docker containers.
• EPIC Management SSHes into the containers and runs the Node Agent install.

All Hosts Run
• EPIC Management runs on the Controller node.
• The Host Agent runs on all nodes and manages the container lifecycle.
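The flow above can be condensed into a high-level orchestration sketch; every function below is a hypothetical stub standing in for an EPIC component, not actual EPIC code:

    # High-level sketch of the EC2 cluster-creation flow; every function is a
    # hypothetical stub standing in for an EPIC component.
    def check_tenant_quota(tenant, nodes):
        print(f"Controller: checking {len(nodes)} node(s) against {tenant} quotas")

    def create_ec2_instance(node):
        print(f"Creating Worker EC2 instance for {node}; copying image from S3")
        return f"instance-{node}"

    def create_container(instance):
        print(f"Host Agent: creating container on {instance}; storage + networking")
        return f"container-on-{instance}"

    def install_node_agent(container):
        print(f"EPIC Management: copying Node Agent into {container}; install via SSH")

    def create_cluster(tenant, nodes):
        check_tenant_quota(tenant, nodes)           # Controller schedules creation
        for node in nodes:
            instance = create_ec2_instance(node)
            container = create_container(instance)  # Host Agent manages lifecycle
            install_node_agent(container)

    create_cluster("marketing", ["node-1", "node-2"])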
Host Tags
Tags allow you to control the physical host(s) on which EPIC will place a newly created or edited cluster, by restricting EPIC to only those host(s) that meet the criteria you specify for that cluster (a sketch of the matching logic follows this list). For example, you could use tags to:
• Place GPU-centric clusters such as TensorFlow on only
those hosts that are tagged with a GPU identifier, such as
gpu=yes.
• Place clusters for specific workloads that require SSD-based virtual node storage on hosts with tags such as ssd=true or storage=ssd-only.
• Isolate virtual clusters of a specific tenant onto specific EPIC
hosts for physical isolation in addition to network isolation,
such as tenant=marketing or tenant=finance.
• Differentiate between cloud-based and on-premises EPIC hosts in a single unified EPIC deployment, using tags such as onprem=true or location=onprem.
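A minimal sketch of the tag-matching logic referenced above, with hypothetical hosts and tags:

    # Sketch of tag-based placement: a cluster lands only on hosts whose tags
    # satisfy every constraint. Hosts and tags are hypothetical examples.
    hosts = [
        {"name": "host-1", "tags": {"gpu": "yes", "location": "onprem"}},
        {"name": "host-2", "tags": {"storage": "ssd-only", "location": "onprem"}},
        {"name": "host-3", "tags": {"tenant": "finance"}},
    ]

    def eligible_hosts(hosts, constraints):
        return [h["name"] for h in hosts
                if all(h["tags"].get(k) == v for k, v in constraints.items())]

    print(eligible_hosts(hosts, {"gpu": "yes"}))          # ['host-1']
    print(eligible_hosts(hosts, {"location": "onprem"}))  # ['host-1', 'host-2']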
When a failure of a High Availability host occurs, EPIC takes the following actions:

• If the Controller host has failed, then EPIC fails over to the Standby Controller host and begins running in a degraded state. This process usually takes 2-3 minutes, during which you will not be able to log in to EPIC.
• If the Shadow Controller or Arbiter host fails, then EPIC keeps running on the Controller host in a degraded state.
• A message appears in the upper right corner of the EPIC interface warning you that the system is running in a degraded state. Use the Service Status tab of the Platform Administrator Dashboard to see which host has failed and which services are down.

EPIC does not support enabling platform High Availability if any virtual clusters already exist. If you create one or more virtual clusters before deciding to enable High Availability, then you should delete those clusters before proceeding with the High Availability configuration.

Note: In Figure 12, below, the asterisks (*) denote platform-level High Availability protection that is only available to resources located on-premises. AWS resources only support High Availability for virtual clusters.