12 Requirements For A Modern Data Architecture in A Hybrid Cloud World
Introduction
A Modern Data Architecture:
• Requires data-at-rest and data-in-motion
• Manages the entire lifecycle of data
• Spans on-premises, cloud, and multi-cloud environments
• Processes data and drives insight
• Requires consistent security, governance, and operations

A hybrid architecture is a key requirement of a modern data architecture and is composed of on-premises + multi-cloud + edge.

Businesses today are moving to the cloud at unprecedented speeds. Today, cloud seems to be the quintessential way to do business—and for good reason. It provides access to an unlimited pool of compute and storage resources, agility in provisioning them, ease of operations, cost efficiencies, and more. Nine out of ten companies will have some part of their applications or infrastructure in the cloud by 2019, and the rest expect to follow by 2021, according to IDG's 2018 cloud computing research.1

But moving to the cloud is not an all-or-nothing strategy. In fact, most organizations will settle on a hybrid cloud strategy, deploying applications, data, and infrastructure on a combination of on-premises and cloud resources. Hybrid cloud brings substantial advantages in terms of rapid deployment and reduced infrastructure costs, but it also comes with a new set of IT and data management challenges—namely, managing data-intensive applications as well as storing, processing, securing, and governing data when it's distributed across on-premises, multiple cloud, and edge environments. A modern data architecture designed for the hybrid cloud solves these challenges and helps deliver ongoing business value.

This paper is designed to help you make the right decisions for delivering a modern data architecture for hybrid cloud environments—an architecture that uses open source software to protect against cloud vendor lock-in and that enables you to manage, secure, and govern your data across multi-cloud and on-premises environments.
You are likely already charting the course for how to effectively lead your organization to a
managed, secured, and governed hybrid cloud. But with a massive and complex
ecosystem of disparate legacy systems running on premises, this is no small feat. So we'll
begin by reviewing strategic and business considerations for success in a hybrid cloud
world and follow this up with 12 technical requirements for building an enterprise data
strategy for the hybrid cloud.
Given that the global hybrid cloud market is projected to more than double, from $44.6 billion
worldwide in 2018 to $97.64 billion by 2023,2 your enterprise data strategy must aim to
establish a modern data architecture with hybrid architecture as a key requirement. To
achieve this, you need to deploy an enterprise data platform that complements your
cloud provider’s infrastructure with a layer of consistent data services while giving you the
freedom to move data, metadata, and workloads across hybrid cloud environments.
In summary, your enterprise data strategy must advocate for a modern data architecture
that allows you to:
__ Deploy an enterprise data platform that ingests, stores, processes, and analyzes all data
regardless of volume, velocity, and variety
__ Implement a data fabric that extends across your entire enterprise and provides a single
pane of glass to locate, view, manage, and access disparate enterprise data assets—no
matter where the data resides
__ Apply and enforce a consistent set of security and governance policies across hybrid
cloud environments—including fine-grained access controls, data lineage, and audit logs
__ Continue your ongoing efforts to plan and rationalize which workloads, applications,
and datasets are suitable for the cloud and which need to stay on premises
[Figure: A data fabric spans clusters of structured and unstructured data in the cloud and in on-premises data centers (Dublin, Las Vegas, Bangkok), presenting them as one logical environment.]
Whereas a data management platform is organized around a single data repository and is
deployed per repository, a data fabric combines data management platforms into one
global view of data.
Cloud vendors offer critical infrastructure to power your cloud needs, but they don’t provide
the necessary foundational capabilities required to establish an enterprise-level hybrid data
architecture. Most organizations today are managing a very complex application and data
ecosystem that requires a full suite of enterprise features such as security and governance,
operational controls, application portability, and enterprise support.
Your cloud vendors are simply not focused on enabling a hybrid data architecture because
they don't have any footprint on premises, and they're naturally motivated to get you
locked into their ecosystem of compute, storage, and applications. Therefore, your data
strategy for a hybrid cloud requires additional technology that addresses your global data
management requirements while fully leveraging your investment in cloud infrastructure.
As the new currency of business, data provides value that will transform business,
geopolitical, and human landscapes. Whether it’s mapping cancer genes, reducing
hunger by improving crop yields, understanding customers, or diagnosing medical
conditions before they occur, it all starts with data. In fact, the more historical data you
have at your disposal, the higher the likelihood that you can make predictions and act
upon them in ways that can significantly improve business outcomes.
In the end, it's not so much about the cloud strategy. It's the data strategy that counts—
and it should allow you to answer key questions about your data regardless of its location,
even when it's distributed across hybrid cloud and on-premises data lakes.
Finally, you need a data strategy that allows you to gain flexibility and agility while
remaining cloud-agnostic in order to avoid future vendor lock-in. Therefore, having a
well-defined data strategy is critical to your success delivering a hybrid architecture.
Developing an enterprise data strategy that gives your line-of-business (LOB) practitioners
self-service access to the tools they need without creating shadow IT or duplicating data and analytics
silos is not a trivial task. In today’s world where data leaks and breaches can cripple any
business, enterprise IT must have a long-term vision for the enterprise data strategy, one
that simplifies on-boarding and operations without hindering progress.
In the rush to the cloud and the agility, simplicity, and efficiencies it delivers, it is paramount
that you understand and rationalize your data assets and workloads for cloud readiness based
on security, sensitivity, ownership, and other concerns. This is especially important since
you’ve probably invested years into implementing, optimizing, and operationalizing
systems of record in on-premises-only environments.
Your big data pipeline may incorporate multiple steps, with each step requiring a different
technology. For example, an end-to-end data processing pipeline may require tools such
as Apache Kafka for streaming data, Apache NiFi for managing data flows, Apache Spark
for data science and machine learning (ML), Apache Hadoop for storing the data in the
Hadoop Distributed File System (HDFS), and Apache Hive for running SQL queries to
answer business intelligence questions from your data.
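For illustration, here is a minimal sketch of one stage of such a pipeline: a PySpark Structured Streaming job that reads events from Kafka and lands them in HDFS as Parquet, where Hive or Spark SQL can query them. Broker, topic, and path names are hypothetical, and the job assumes the Spark-Kafka integration package is available on the cluster.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Read a stream of events from Kafka (broker address and topic name are
# hypothetical placeholders).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "sensor-events")
          .load())

# Kafka delivers keys and values as binary; cast the payload to a string.
payloads = events.select(col("value").cast("string").alias("raw_event"))

# Land the stream in HDFS as Parquet so Hive (or Spark SQL) can query it.
query = (payloads.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/lake/sensor_events/")
         .option("checkpointLocation", "hdfs:///checkpoints/sensor_events/")
         .start())
query.awaitTermination()
```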
Implementing this entire pipeline with disparate cloud-native services is challenging from
both operational and budgetary perspectives. While some cloud vendors may offer native
services that provide similar functionality, there are major challenges:
__ Cloud providers are mostly packagers of Open Source Software (OSS) and typically
lack the open source committers needed to support your big data workloads
in production
__ Cloud vendor billing models for each service may be completely different, making the
overall cost extremely difficult to estimate up front
When moving data analytics workloads into the cloud, it’s important to understand the
key resource metrics for running your workload and the associated cloud billing model
and service costs. One key cost control strategy is to keep long-running workloads
on-premises while moving ephemeral workloads that need resource elasticity to the
cloud.
Having the right operational controls in place for optimizing use of cloud resources for
your big data analytics workloads can save you significant expense.
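To illustrate that trade-off, here is a back-of-the-envelope sketch; the rates below are made-up placeholders, not actual prices from any provider.

```python
# Compare a fixed on-premises cluster against an ephemeral cloud cluster
# billed by the hour. All rates are hypothetical placeholders.
ONPREM_MONTHLY_COST = 10_000.0    # amortized hardware + operations per month
CLOUD_RATE_PER_NODE_HOUR = 0.50   # assumed per-node-hour rate

def cloud_cost(nodes: int, hours_per_month: float) -> float:
    """Consumption-based cost: you pay only while the cluster exists."""
    return nodes * hours_per_month * CLOUD_RATE_PER_NODE_HOUR

# A 20-node job that runs 40 hours a month is far cheaper as an ephemeral
# cloud workload; the same cluster running 24x7 approaches on-prem cost.
print(cloud_cost(20, 40))       # 400.0   -> good fit for the cloud
print(cloud_cost(20, 24 * 30))  # 7200.0  -> consider keeping on premises
```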
As more and more sensitive data moves to the cloud, a unified security model becomes
essential. To reduce your risk of data exposure and unauthorized
access, you need to have consistent security and governance controls in the cloud that
allow you to apply fine-grained security policies with full end-to-end data provenance and
lineage as well as an audit trail to track who has accessed the data. After all, in the case of
unauthorized access, you need to be able to immediately identify which data assets were
exposed, what information was potentially leaked, and who the perpetrators were by
analyzing a system-wide access audit log.
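As a minimal illustration of that last point, the sketch below filters a unified audit log for every access to a compromised asset. The record layout is hypothetical; real platforms define their own audit schemas.

```python
from datetime import datetime

# Hypothetical unified audit records gathered across all environments.
audit_log = [
    {"user": "jsmith", "asset": "s3a://bucket/pii/customers", "action": "read",
     "allowed": False, "ts": datetime(2019, 3, 1, 2, 14)},
    {"user": "etl_svc", "asset": "hdfs:///curated/sales", "action": "write",
     "allowed": True, "ts": datetime(2019, 3, 1, 2, 20)},
]

def exposure_report(log, asset_prefix):
    """List every access to assets under a prefix, including denied attempts."""
    return [rec for rec in log if rec["asset"].startswith(asset_prefix)]

for rec in exposure_report(audit_log, "s3a://bucket/pii/"):
    status = "ALLOWED" if rec["allowed"] else "DENIED"
    print(f'{rec["ts"]:%Y-%m-%d %H:%M} {rec["user"]} {rec["action"]} {status}')
```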
Make sure your enterprise support vendor is not only able to respond within specified
SLAs but can also provide security fixes and patches in a timely fashion and contribute them
back into the open source code. Your vendor should provide not just OSS but intelligent
support that proactively analyzes your big data environments, both on premises and in the
cloud, to maximize performance and reduce risk.
Your enterprise support vendor must be a true partner that communicates early and often
and keeps your business interests aligned with the OSS community vision and product
roadmap. Innovating in the open source community is done by consensus, and voting
determines whether to include code modifications. Look for a vendor that has gained
favorable votes from the Apache community and employs those with "committer" status
(meaning they've been given write access to the Apache codebase) or "contributor"
status, voted in by the community and Apache PMC members, who can influence OSS
direction to ensure your enterprise success.
Bottom line, make sure your enterprise support vendor isn't just an OSS packager or
service provider, but one that actually employs committers and contributors who can
influence OSS direction and ensure your enterprise success.
The use of cloud for data storage introduces additional challenges for companies managing
data scattered across multiple public clouds, private clouds, and on-premises environments.
In addition, the rapid expansion and adoption of cloud and edge computing further
necessitates a more modern, comprehensive, scalable, and sustainable architecture.
Key to this modern data architecture in a hybrid cloud environment is a data fabric that
manages the entire lifecycle of data and provides a single pane of glass for storing,
processing, securing, and analyzing your data no matter where it resides. Data fabric is an
essential element of your hybrid data architecture. It provides consistent security,
governance, and data management capabilities across on-premises and multiple cloud
environments. Data fabric is a unified, secure platform that connects diverse data
repositories to help manage, move, protect, govern, and analyze data residing anywhere in
the hybrid cloud.
TECH TIPS
Your data fabric must:
__ Deliver a unified data management environment that integrates disparate data sources into
a single logical view of all global data assets.
__ Enable a consistent data management experience across all data repositories in the hybrid
cloud, regardless of how individual systems view and access data at the edge, on-premises,
and/or in a public cloud.
__ Provide consistent security and governance for all data, no matter where it resides: on
premises, at the edge, or in the cloud.
__ Leverage container-based platforms for application portability across hybrid cloud
environments.
__ Deliver shared services for security, governance, and metadata, which are necessary to
maintain state across ephemeral workloads.
__ Support the entire data lifecycle across edge, on-premises, and multiple cloud platforms.
__ Support a variety of tools and applications that access “data at rest” and “data in motion”
using an array of interfaces that abstract the underlying system complexities.
__ Support all types of workloads for application processing as the data is flowing at the
edge, in the data center, and in the cloud.
Today’s reality is that some data originates at the edge, in the cloud, or on-premises as
your business, customers, and supply chain all interact and exchange information. You
will need to capture, store, and process all of this data in its original format without losing
any contextual information about the data and its source of truth. And you'll need
real-time visibility into this data to respond, remedy, and/or make real-time adjustments. All
this requires your data fabric to seamlessly span hybrid cloud environments and deliver a
single view through which your users can easily store, find, and process data wherever it
resides.
TECH TIPS
__ Make sure you have the flexibility to capture, store, find, and process data across edge,
on-premises, and multiple cloud environments.
__ Make sure your data can be easily ported across cloud boundaries by decoupling
storage from compute resources and leveraging cloud object stores such as Apache
Ozone, Amazon S3, Azure Data Lake Storage (ADLS), or Google Cloud Storage
(GCS); a brief sketch of this decoupling follows this list.
__ Look for an enterprise data platform that offers a consistent API layer with pluggable
storage connectors to your desired storage medium in the cloud or on-premises.
__ Look for a solution that allows you to collect, curate, and store the data in its original
format—whether it’s unstructured, semi-structured, or structured—without losing any
contextual information about the data and source of truth.
__ Make the data available in full fidelity to your analysts and data scientists so they can run
SQL on Hadoop, machine learning, AI, NoSQL, and real-time complex event processing
applications that address your most pressing business needs.
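As a brief illustration of that decoupling, the hedged sketch below points the same PySpark job at different storage backends simply by swapping the path URI; the bucket and container names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("portable-etl").getOrCreate()

# The same job reads from on-premises HDFS or a cloud object store by
# changing only the path URI (all names below are placeholders).
input_path = "s3a://example-bucket/events/2019/"  # Amazon S3
# input_path = "abfs://container@account.dfs.core.windows.net/events/"  # ADLS
# input_path = "gs://example-bucket/events/"      # Google Cloud Storage
# input_path = "hdfs:///data/events/"             # on-premises HDFS

events = spark.read.parquet(input_path)
daily_counts = events.groupBy("event_date").count()
daily_counts.write.mode("overwrite").parquet(
    "s3a://example-bucket/reports/daily_counts/")
```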
Start by establishing strong data stewardship that addresses how to ingest, store,
catalog, secure, discover, and track data lineage as it moves throughout its lifecycle.
Understand which data assets reside where, and create an enterprise data catalog that
contains business glossary tags and metadata describing schemas, location, security
policies, and lineage details.
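To make that concrete, here is a hypothetical sketch of what a single catalog entry might capture. The field names are illustrative only; real catalogs define richer models.

```python
# An illustrative enterprise data catalog entry for one dataset.
catalog_entry = {
    "name": "customer_transactions",
    "location": "s3a://example-bucket/curated/customer_transactions/",
    "schema": {"customer_id": "bigint", "amount": "decimal(10,2)",
               "ts": "timestamp"},
    "business_glossary_tags": ["Finance", "PII"],
    "security_policy": "mask customer_id for non-privileged roles",
    "lineage": {
        "upstream": ["raw.pos_events", "crm.customers"],
        "produced_by": "nightly_curation_pipeline",
    },
    "owner": "data-stewardship-team",
}
```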
In most cases, you will find your data residing in various data management systems that
were not designed to work together, creating information silos. To break down those silos,
bring relevant data together into multiple distributed data lakes that can be purpose-built
for a specific ephemeral data pipeline and/or as a general-purpose shared data lake.
In either case, the goal is to establish a data fabric that provides a single pane of glass
to search, discover, and understand your global data assets while enabling self-service
access to it.
TECH TIPS
__ Look for a platform that lets your data stewards understand and govern data across
enterprise data lakes in the hybrid cloud. They should be able to organize and
curate data globally based on business classifications, purpose, and sensitivity.
__ Make sure you have the flexibility to move data and metadata across the hybrid
cloud without losing sight of and context for your data assets. Your metadata
should contain information about database schemas, security policies, business
catalog, audit logs, data lineage, and provenance that can be viewed and managed
from a single pane of glass.
__ Your enterprise data platform should enable self-service data discovery by allowing:
__ data stewards to curate data assets and group relevant datasets together into
collections representing business entities that business users can search
__ business users to find data of interest by searching the catalog to locate relevant
data assets and collections (e.g., sensitive data, commonly used data, high-risk
data) and to understand data shape and distribution characteristics
__ data auditors to understand how data access is secured, protected, and audited by
seeing who has accessed what data from a forensic audit/compliance perspective
and by visualizing data access patterns to identify anomalies
4. Apply Consistent Security and Governance
As you work to define your hybrid data architecture, data governance and security
requirements can pose a key challenge and must be at the center of your enterprise
data strategy.
Regulations like the General Data Protection Regulation (GDPR) and the 2018 California
Consumer Privacy Act3 will continue to change the way businesses collect,
store, and use personal customer data. By making sure that data protection and
governance are built into your hybrid data architecture, you can leverage the full value
of advanced analytics without exposing your business to new risks, while complying with
mandatory regulations.
Just as you wouldn’t leave your data center exposed to cyber threats, you wouldn’t want
to have your data assets left unprotected. Whether you are dealing with sensitive data
such as PII, PCI, PHI, or other types of company data, it’s imperative to have consistent
security and governance controls to guarantee data protection and compliance no matter
where that data resides.
Consistent security and governance controls allow you to identify and classify sensitive
data, secure it at the appropriate level of granularity, track lineage, authorize access,
and audit who has actually accessed the data throughout its lifecycle in the hybrid
cloud. Having these controls will help ensure your data is protected, trusted, and
compliant at all times.
TECH TIPS
Look for a data management solution that enables you to bring security and
governance together by providing:
__ A single administrative console to manage all security policies across all data
assets and all workloads
__ A catalog of business terms and classifications to help organize and discover data
stored inside your enterprise data platform
__ A detailed audit log that tracks all user access requests in real time
Understanding data lineage and provenance is paramount to making sense of your data
lifecycle and complying with regulatory requirements. It’s important to know where the
data originated, when it changed, and who changed it. This information gives you critical
decision-making power and allows data stewards and data engineers to make your data
known, available, trusted, and compliant.
The challenge is that your enterprise data landscape is likely to span multiple vendor
technologies that aren’t integrated, which makes it difficult to obtain a complete data
lineage and provenance view. Having that view offers important benefits to data stewards
and auditors.
For example, data that starts as event data coming from your edge devices or systems of
engagement and is combined with enterprise data coming from your systems of record
can be tracked and governed at every stage of its life cycle. This allows data stewards,
operations teams, and compliance engineers to visualize a dataset’s lineage and then drill
down into operational, security, and provenance-related details. As this tracking is done
at the platform level, any application that uses these engines will be natively tracked. This
allows for end-to-end visibility of data provenance and lineage across the entire data
lifecycle spanning multiple applications and analytics engines.
TECH TIPS
__ Look for a data platform solution that provides complete data lineage and impact
functionality, telling you where data came from, how it was transformed along the
way, and what assets are affected downstream (a toy impact-walk sketch follows).
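To make impact analysis concrete, the sketch below walks a toy lineage graph to find every asset affected downstream of a change; the graph and dataset names are hypothetical.

```python
from collections import deque

# Hypothetical lineage graph: each dataset maps to assets derived from it.
downstream = {
    "edge.sensor_stream": ["staging.events"],
    "erp.asset_registry": ["staging.events"],
    "staging.events": ["curated.device_alerts"],
    "curated.device_alerts": ["bi.alert_dashboard", "ml.failure_model"],
}

def impact(asset):
    """Return every asset downstream of `asset` via a breadth-first walk."""
    seen, queue = set(), deque([asset])
    while queue:
        for child in downstream.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

print(impact("staging.events"))
# {'curated.device_alerts', 'bi.alert_dashboard', 'ml.failure_model'}
```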
The first step toward delivering self-service analytics and faster business insights is
centralizing your enterprise data assets in an enterprise data lake or multiple distributed data lakes
connected by a data fabric. Once your data is consolidated and silos are broken, you need to
ensure it can be easily discovered and available for self-service provisioning and consumption.
In addition, before making your data available to a broad group of LOB users for self-service
analytics, make sure you curate and organize it. This involves understanding where relevant
data is located, defining your business taxonomy, securing access based on data tags and
attributes, and making it discoverable to business users.
Making curated data available for self-service analytics while maintaining governance of
your data assets is a significant challenge that requires a balance between enterprise IT
stakeholders who want to reduce risk and LOB practitioners who want to accelerate time to
insight. Business users need to be able to run ad-hoc analytics on curated data assets using
tools of their choice in the hybrid cloud by having access to the data they are entitled to
without relying on IT.
A common complaint today is that data scientists and analysts have to spend too much
time looking for, prepping, and cleaning data and not enough time analyzing it. While
some data cleansing and preparation will always be necessary, we can reduce the time
data analysts spend on this task by enabling them to leverage a self-service analytics
solution with a broad set of data discovery and analytics tools. This will allow data
scientists to spend more of their time collaborating, sharing data models, and analyzing data.
Success on the journey of data democratization requires a solid data infrastructure and a
focus on data strategy. After all, it all starts with data.
TECH TIPS
__ Your enterprise data platform must be flexible and adaptable to support data
ingestion from a variety of data sources and formats, with appropriate data
governance controls that allow data tagging, classification, and data lineage as
data moves from ingestion to consumption.
__ Automate data tagging and classification based on dynamic rules and machine
learning algorithms, while allowing data stewards to review and approve tagging
and classification policies before datasets become available downstream (a minimal
rule-based sketch follows this list). This will help ensure high data quality and
establish organizational trust in the data assets under your control.
__ Your enterprise data platform must empower data stewards to understand, secure,
and govern data across enterprise data lakes.
__ Look for a data platform that enables business users to discover, access, and
provision data and workloads for self-service analytics using SQL, Spark, or other
tools for performing advanced analytics on-demand.
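Below is a minimal sketch of the rule-based half of that approach, using simple regular expressions over sampled column values; a production system would add ML-based classifiers and route proposals through a steward approval workflow.

```python
import re

# Illustrative tagging rules: a regex that matches sample values maps to a tag.
RULES = {
    "PII.SSN":   re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "PII.EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

def propose_tags(column_name, sample_values):
    """Propose classification tags for a column from its sampled values.

    Proposals stay 'pending_review' until a data steward approves them,
    per the review-and-approve tip above.
    """
    tags = set()
    for value in sample_values:
        for tag, pattern in RULES.items():
            if pattern.match(str(value)):
                tags.add(tag)
    return {"column": column_name,
            "proposed_tags": sorted(tags),
            "status": "pending_review"}

print(propose_tags("contact", ["jane@example.com", "bob@example.org"]))
# {'column': 'contact', 'proposed_tags': ['PII.EMAIL'], 'status': 'pending_review'}
```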
When charting your cloud strategy, vendor lock-in becomes somewhat inevitable since
you are moving data and using the cloud service provider’s infrastructure to run your
applications. But the key concern is application layer lock-in. The native cloud services that
cloud providers offer are often tightly coupled to the provider’s infrastructure and lack the
full breadth of open source features and ecosystem interoperability. For example, a cloud
vendor’s data analytics service may limit your ability to:
__ Move your data between multiple cloud services and/or on-premises solutions, since it
may have been converted into custom data formats specific to your cloud vendor's analytics services
__ Integrate with other software and services designed to augment existing OSS
capabilities
__ Keep track of data lineage, metadata, and security policies as the data moves between
different hybrid cloud environments
As you move to hybrid cloud, you’ll want to leverage and bring into cloud deployments the
same open source ecosystem of big data technologies running in your data center. That
way, your common data fabric can consistently unify management, analytics, operations,
security, and governance of data across on-premises, cloud, and hybrid cloud
environments.
TECH TIPS
__ Consider an open source enterprise data platform that provides cloud-agnostic
capabilities and allows easy migration of data assets, metadata, and workloads
across all environments.
__ Look for a data platform that enables a consistent data fabric running on top of your
cloud infrastructure, so you can more easily deliver portable business applications.
__ Provision big data workloads on the fly that fully leverage native cloud
resources such as compute and storage while optimizing use of those resources.
Data has gravity: the more data you have, the more it will attract other applications and
services around it. This ecosystem of applications and services that expands around your
data in the cloud can become complex enough that at some point you will likely need to
consider moving your data and applications between public cloud vendors or from cloud to
on-premises and vice versa. Replicating data between on-premises environments and
multiple cloud providers introduces additional challenges for data governance.
Therefore, make sure that your organization's data fabric enables you to easily port
applications, data, and metadata across on-premises and cloud boundaries while keeping
track of their movement and without requiring any changes to application code.
TECH TIPS
__ Implement an enterprise data platform that provides a consistent set of storage
and application APIs that can be used to migrate your workloads across on-
premises and different cloud providers.
__ Make sure that your data access layer, application API, and metadata repositories
provide the necessary abstraction to decouple your application code from the
underlying storage and compute infrastructure.
__ Select an enterprise data platform that offers native data and metadata replication
capabilities across on-premises and hybrid cloud environments while keeping track
of data governance and lineage.
__ Look for a platform that allows you to leverage open source standards, tools, and
technologies across any data no matter where it resides in the hybrid cloud.
To deliver this level of service, your hybrid data architecture needs to support collecting data
from the network’s edge and pushing analytics and decision making closer to the edge,
where IoT devices and sensors reside. To be effective, your data architecture should provide
real-time responses based on real-time events measured and scored against machine
learning models built and trained on historical insights.
The evolution of new cybersecurity threats, together with opportunities to improve process
efficiencies in supply chain, transportation, manufacturing, oil and gas, and other industries,
requires edge connectivity to capture time-series data at its point of origination on edge IoT devices.
In order to mitigate these threats and capture new business opportunities, your hybrid data
architecture needs to support data flow, stream processing, and messaging technologies
to build a true streaming data architecture across the edge, cloud, and on-premises
environments.
In recent research, Bain predicted7 that B2B IoT segments would generate more than
$300 billion annually by 2020, including about $85 billion in the industrial sector. Discrete
manufacturing, transportation and logistics, and utilities will lead all industries in IoT
spending by 2020,8 averaging $40 billion each.
Companies looking at leveraging IoT often encounter serious hurdles along the way. These
include building and deploying new applications quickly, accessing the data from edge
devices, and scaling to support the massive volumes of data generated.
To address these concerns, companies need to look at open source technologies that
provide high-performance analytics over data stores built for event-driven and time-series data.
Think of leveraging data from the edge as a four-step process:
1. Connect the remote sensors and devices that produce the data
2. Collect data from the edge, deliver it to your data lake, and then process and
organize the data
3. Analyze the edge data, looking for nuances and correlations between data events
4. Act: Combine the analytics model with a stream processing tool to apply its predictive
capability to events happening in real time. Then inject this back into the original workflow
between the Connect and Collect stages. Now, as rivers of data flow in, they flow through
these trained models in order to evaluate each event that is happening and allow you to
act when an event is about to occur.
Your stream processing is now constantly on the lookout for events and will alert you at a
moment's notice as to what is taking place, or about to take place.
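A hedged sketch of the Act step appears below: a Spark Structured Streaming job scores live Kafka events against a model trained offline on historical data and publishes predicted failures back to an alert topic. The model path, schema, broker, and topic names are all hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("edge-act-step").getOrCreate()

# Load a model trained offline on historical edge data (hypothetical path).
model = PipelineModel.load("hdfs:///models/failure_predictor")

schema = StructType([
    StructField("device_id", StringType()),
    StructField("temperature", DoubleType()),
    StructField("vibration", DoubleType()),
])

# Consume the live sensor stream (the Connect/Collect stages feed this topic).
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "sensor-events")
       .load())
events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Act: score each event as it arrives and keep only predicted failures.
alerts = model.transform(events).filter(col("prediction") == 1.0)

# Publish alerts back into the workflow via a hypothetical alert topic.
(alerts.selectExpr("device_id AS key", "to_json(struct(*)) AS value")
 .writeStream.format("kafka")
 .option("kafka.bootstrap.servers", "broker1:9092")
 .option("topic", "device-alerts")
 .option("checkpointLocation", "hdfs:///checkpoints/device_alerts/")
 .start()
 .awaitTermination())
```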
10. Deploy Ephemeral Workloads in the Cloud to Optimize Costs
Cloud provides many benefits for optimizing management and cost of your IT infrastructure.
Depending on your cloud provider, resources can be made instantly available with
up-to-the-second billing based on consumption. Cloud provides the fundamental building blocks
for your big data processing needs: data storage at massive scale and access to virtually
unlimited compute resources such as CPU, GPU, and memory.
One of the biggest advantages of using cloud resources for your big data workloads is
separation of compute and storage as part of your cloud infrastructure. This fundamental
design principle allows independent scaling of storage and compute resources depending
on your workload requirements.
When deploying big data applications in the cloud, you need to understand your resource
requirements based on workload type and usage pattern. No matter whether your
application is batch, in-memory, interactive, or real-time, it can be deployed as a long-
running or ephemeral workload depending on your business need. Your ability to deploy
ephemeral workloads that can take advantage of cloud resource elasticity in response
to a business need will allow you to accelerate time to insights while reducing the overall
cost. After all, considering that most cloud billing models are consumption-based,
you'll want to optimize your use of cloud resources based on actual workload utilization
rather than raw resource consumption.
TECH TIPS
__ Your enterprise data platform should simplify provisioning of big data workloads in
the cloud in a consistent and repeatable fashion.
__ Look for a solution that allows IT operators to define workload blueprints for
automated provisioning of cloud resources and workload deployment, which can
be made available to line-of-business users via self-service access (a minimal
blueprint sketch follows this list).
__ Look for automated workload scaling based on business SLAs and actual
resource utilization.
__ Your enterprise data platform should let you optimize cloud resource usage by
seamlessly adjusting the cluster as workload and activity changes.
__ And it should enable you to respond faster to new business requirements and
align costs with utilization and business value rather than paying for provisioned
cloud infrastructure.
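As one way to picture such a blueprint, here is a minimal, hypothetical sketch; the field names and the provisioning stub are illustrative, not any specific product's API.

```python
from dataclasses import dataclass

@dataclass
class WorkloadBlueprint:
    """A hypothetical, reusable recipe for an ephemeral cloud cluster."""
    name: str
    services: tuple     # e.g., ("spark", "hive")
    worker_nodes: int   # initial cluster size
    autoscale_max: int  # upper bound for SLA-driven scaling
    ttl_hours: int      # tear the cluster down after this long

# IT operators define the blueprint once; LOB users self-serve against it.
etl_blueprint = WorkloadBlueprint(
    name="nightly-etl",
    services=("spark", "hive"),
    worker_nodes=10,
    autoscale_max=40,
    ttl_hours=4,
)

def provision(blueprint):
    """Stub: a real platform would call its cloud provisioning API here."""
    return (f"cluster '{blueprint.name}' provisioned "
            f"with {blueprint.worker_nodes} workers")

print(provision(etl_blueprint))
```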
Your data-intensive applications require rich metadata providing technical and business
characteristics as well as stateful information. This is what makes the data usable and
shareable across multiple ephemeral workloads. Business users who want to explore data
on demand with different analytics engines need to have a layer of shared services that
provide all the necessary metadata, context, and state information for a consistent view of
data assets.
TECH TIPS
__ Your modern data architecture should incorporate a layer of shared schema,
security, and governance services that provide a consistent view of metadata
across multiple ephemeral workloads accessing the data in your shared data lake.
__ Once you define your schema, security, and governance policies, the metadata
you defined should be available to any ephemeral workload that can be attached to
the shared data lake.
__ As your data and workloads move across hybrid cloud, the architecture should
automatically carry and propagate all of the associated metadata to ensure a
consistent application experience no matter where the data and applications reside.
Containerization allows you to manage different versions and multiple instances of various
big data technologies, application code, and dependencies without interfering with one
another. Containers provide the needed isolation and dependency packaging to enable
easy deployment of big data applications across hybrid cloud environments. They allow you
to move workloads between different on-premises and cloud container platforms without
making any application code changes.
As hybrid cloud is quickly becoming the new reality, organizations need a unified hybrid
architecture for on-premises, multiple cloud, and edge environments that provides a
consistent fabric for deploying containerized workloads. Your enterprise data platform for
hybrid cloud needs to enable a consistent experience and interaction model built on top of a
container platform. This architecture allows you to seamlessly move data and containerized
workloads across on-premises and multiple cloud environments.
TECH TIPS
__ Look for an enterprise data platform that delivers a consistent hybrid architecture in
the cloud and on premises.
__ Your data platform should provide a consistent user interaction model that allows
you to seamlessly move data and workloads across hybrid cloud environments as
well as deploy containerized workloads regardless of the environment.
__ Look for a data platform that provides DevOps orchestration tools for managing
services and workloads via the “infrastructure as code” paradigm to allow spin-up
and down of ephemeral container workloads.
__ Look for shared services for metadata, governance, and security to support
containerized applications running as ephemeral workloads based on data stored
in the shared data lake.
Next Steps
At Cloudera, we believe the time has come to reimagine the modern data architecture, with hybrid cloud as a key requirement. Enterprises need a consistent and unified hybrid architecture for on-premises, multi-cloud, and edge environments with these four key components:
1. Consistent security and governance across all data, no matter where it resides
2. Separate storage and compute layers
3. Containerized workloads
4. The ability to orchestrate and manage workloads
For more information and a framework for ensuring that happens, download "Power your Business Transformation with an Enterprise Data Cloud."9

About Cloudera
At Cloudera, we believe that data can make what is impossible today, possible tomorrow. We empower people to transform complex data into clear and actionable insights. Cloudera delivers an enterprise data cloud for any data, anywhere, from the Edge to AI. Powered by the relentless innovation of the open source community, Cloudera advances digital transformation for the world's largest enterprises. Learn more at cloudera.com.
Follow us on Twitter:
twitter.com/cloudera
Visit us on Facebook:
facebook.com/cloudera
Sources
1. https://resources.idg.com/download/executive-summary/cloud-computing-2018?utm_campaign=Cloud%20Computing%20Survey%202018&utm_source=Forbes%20-%20Louis
2. https://www.marketwatch.com/press-release/hybrid-cloud-market-worth-9764-billion-by-2023-2018-08-22
3. https://www.caprivacy.org/
4. https://thenewstack.io/survey-open-source-programs-are-a-best-practice-among-large-companies/
5. https://www.crn.com/businesses-moving-from-public-cloud-due-to-security-says-idc-survey
6. https://www.idc.com/getdoc.jsp?containerId=prUS43994118
7. https://www.bain.com/insights/choosing-the-right-platform-for-the-industrial-iot
8. https://www.statista.com/statistics/666864/iot-spending-by-vertical-worldwide/
9. https://www.cloudera.com/content/dam/www/marketing/resources/whitepapers/power-your-business-transformation-with-edc.pdf.landing.html
Cloudera, Inc. 395 Page Mill Road Palo Alto, CA 94306 USA cloudera.com
© 2019 Cloudera, Inc. Cloudera and associated marks and trademarks are registered trademarks of
Cloudera, Inc. All other company and product names may be trademarks of their respective owners.