August 2021
Build a modern, unified analytics data platform with Google Cloud
Firat Tekiner & Susan Pierce
There is no shortage of data being created. IDC research indicates that worldwide data will grow to 175 zettabytes by 2025¹. The volume of data being generated every day is staggering, and it is increasingly difficult for companies to collect, store, and organize it in a way that is accessible and usable. In fact, 90% of data professionals say their work has […]

[…]sions contribute to the gap that companies have between aggregating data and making it work for them. Companies want to move to the Cloud to modernize their data analytics systems, but that alone doesn't solve the underlying issues around siloed data sources and brittle processing pipelines. Strategic decisions around data ownership and technical […] holistic way to make a data platform more successful for your organization.

In this paper, we will discuss the decision points necessary in creating a modern, unified analytics data platform built on Google Cloud Platform.

[…] their data⁴. The first issue is data freshness. The second issue stems from the difficulty in integrating disparate and legacy systems across silos. Organizations are migrating to the Cloud, but that does not solve the real problem of older legacy systems that might have been vertically structured to meet the needs of a single business unit.
1 https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf
2 https://www.zdnet.com/article/data-analysts-stretched-lack-engineering-resource-current-data-says-survey/
3 ibid
4 https://www.accenture.com/gr-en/insights/technology/closing-data-value-gap
In planning out organizational data needs, it's easy to overgeneralize and consider a single, simplified structure where there is one set of consistent data sources, one enterprise data warehouse, one set of semantics, and one tool for business intelligence. That might work for a very small, highly centralized organization, and could even work for a single business unit with its own integrated IT and data engineering team. In practice, though, no organization is that simple and there are always surprise complexities around data ingestion, processing, and/or usage that complicate matters […] architecture or set of software components to purchase; it requires companies to take stock of their overall data maturity and make systemic, organizational changes in addition to technical upgrades.

By the end of 2024, 75% of enterprises will shift from piloting to operationalizing AI, driving a 5X increase in streaming data and analytics infrastructures⁵. It's easy enough to pilot AI with an arms-length data science team, working in a siloed environment. But the fundamental challenge that […]
5 https://emtemp.gcom.cloud/ngw/globalassets/en/doc/documents/721868-100-data-and-analytics-predictions-through-2024.pdf
Data work is rarely done by a single individual; there are many data-related users in an organization who play important roles in the data lifecycle. Each has a different perspective on data governance, freshness, discoverability, metadata, processing timelines, queryability, etc. In most cases, they are all using different systems and software to operate on the same data, at different stages of processing.

Let's look at a machine learning lifecycle, for example. A data engineer may be responsible for ensuring fresh data is available for the data science team, with appropriate security and privacy constraints in place. A data scientist may create training and test datasets based on a golden set of pre-aggregated data sources from the data engineer, build and test models, and make insights available for another team. An ML engineer may be responsible for packaging up the model for deployment into production systems, in a way that is non-disruptive to other data processing pipelines. A product manager or business analyst may be checking in on derived insights, using Data QnA (a natural language interface for analytics on BigQuery data), visualization software, or might be querying the result set directly through an IDE or a command-line interface. There are countless users with different needs, and we have built a comprehensive platform to serve them all. Google Cloud meets customers where they are with tools to meet the needs of the business.
If you know what datasets you need to analyze, have a clear understanding of their structure, and have a known set of questions you need answered, then you are likely looking at a data warehouse.
But there’s more to the decision, so let’s talk through some of the organizational chal-
lenges of each.
Data warehouses are often difficult to manage. The legacy systems that have worked well over the past 40 years have proven to be expensive and pose a lot of challenges around data freshness and scaling. Furthermore, they cannot easily provide
AI or real-time capabilities without bolting that functionality on after the fact. These
issues are not just present in on-premise legacy data warehouses; we even see this with
the newly created cloud-based data warehouses as well. Many do not offer integrated
AI capabilities, despite their claims. These new data warehouses are essentially the same
legacy environments but ported over to the Cloud.
Data warehouse users tend to be analysts, often embedded within a specific business
unit. They may have ideas about additional datasets that would be useful to enrich their
understanding of the business. They may have ideas for improvements in the analysis,
data processing, and requirements for business intelligence functionality.
However, in a traditional organization, they often don't have direct access to the data owners, nor can they easily influence the technical decision makers who decide datasets and tooling. In addition, because they are kept separate from raw data, they are unable to test hypotheses or drive a deeper understanding of the underlying data.

Data lakes have their own challenges. In theory, they are low cost and easy to scale, but many of our customers have seen a different reality in their on-premise data lakes. Planning for and provisioning sufficient storage can be expensive and difficult, especially for organizations that produce highly variable amounts of data. On-premise data lakes can be brittle, and maintenance of existing systems takes time. In many cases, the engineers who would otherwise be developing new features are relegated to the care and feeding of data clusters. Said more bluntly, they are maintaining value as opposed to creating new value. Overall, the total cost of ownership is higher than expected for many companies. Not only that, governance is not easily solved across systems, especially when different parts of the organization use different security models. As a result, the data lakes become siloed and segmented, making it difficult to share data and models across teams.

Data lake users typically are closer to the raw data sources and are equipped with tools and capabilities to explore the data. In traditional organizations, these users tend to focus on the data itself and are frequently held at arm's length from the rest of the business. This disconnect means that business units miss out on the opportunity to find insights that would drive their business objectives forward to higher revenues, lower costs, lower risk, and new opportunities.

Given these tradeoffs, many companies end up with a hybrid approach, where a data lake is set up to graduate some data into a data warehouse, or a data warehouse has a side data lake for additional testing and analysis. But with multiple teams fabricating their own data architectures to suit their individual needs, data sharing and fidelity get even more complicated for a central IT team.

Instead of having separate teams with separate goals, where one explores the business and another understands the business, you can unite these functions and their data systems to create a virtuous cycle where a deeper understanding of the business drives directed exploration, and that exploration drives a better understanding of the business.
Data type and access:
• Data warehouse: structured data; SQL access and manipulation
• Data lake: unstructured (raw) and structured data; code-involved access and exploration
This requires convergence in both the technology and the approach to understanding and
discovering the value in your data.
[Figure: query federation and the BigQuery Storage API connect BigQuery compute (ANSI SQL, BQML, UDFs, BQ GIS) and storage to external engines, languages and formats such as Parquet & ORC in GCS, Python, Pandas, Beam, Go, Scikit, Spark, Java, Keras, MapReduce, Cloud Dataflow, Cloud Dataproc and Spanner.]
BigQuery Storage API provides the capability to use BigQuery storage like Google Cloud Storage (GCS) for a number of other systems such as Dataflow and Dataproc. This breaks down the data warehouse storage wall and enables running high-performance dataframes on BigQuery. In other words, the BigQuery Storage API allows your BigQuery data warehouse to act like a data lake. So what are some of the practical uses for it? For one, we built a series of connectors (MapReduce, Hive and Spark, for example) so that you can run your Hadoop and Spark workloads directly on your data in BigQuery. You no longer need a data lake in addition to your data warehouse! Dataflow is incredibly powerful for batch and stream processing. Today, you can run Dataflow jobs on top of BigQuery data, enriching it with data from Pub/Sub, Spanner or any other data source.

BigQuery can independently scale both storage and compute, and each is serverless, allowing for limitless scaling to meet demand no matter the usage by different teams, tools and access patterns. All of the above applications can run without impacting the performance of any other jobs accessing BigQuery at the same time. In addition, the BigQuery Storage API provides a petabit-level network, moving data between nodes to fulfill a query request and effectively delivering performance similar to an in-memory operation. It also allows federating with popular Hadoop data formats such as Parquet and ORC directly, as well as NoSQL and OLTP databases. You can go a step further with the capabilities provided by Dataflow SQL, which is embedded in BigQuery. This allows you to join streams with BigQuery tables or data residing in files, effectively creating a lambda architecture: you can ingest large amounts of batch and streaming data while also providing a serving layer to respond to queries. BigQuery BI Engine and Materialized Views make it even easier to increase efficiency and performance in this multi-use architecture.
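As a hedged illustration of running Spark directly on warehouse data, the sketch below uses the spark-bigquery connector (which reads through the BigQuery Storage API); the project, dataset, table and bucket names are placeholders rather than part of any reference architecture.

    from pyspark.sql import SparkSession

    # Minimal sketch: assumes a Dataproc cluster with the spark-bigquery connector
    # available; all project/dataset/table/bucket names are placeholders.
    spark = SparkSession.builder.appName("bigquery-as-data-lake").getOrCreate()

    # Read directly from BigQuery storage via the Storage API (no export to GCS).
    orders = (
        spark.read.format("bigquery")
        .option("table", "my-project.sales.orders")  # hypothetical table
        .load()
    )

    # Run an ordinary Spark aggregation on the warehouse data.
    daily_revenue = orders.groupBy("order_date").sum("amount")

    # Write the result back to BigQuery (stages temporarily through a GCS bucket).
    (
        daily_revenue.write.format("bigquery")
        .option("temporaryGcsBucket", "my-staging-bucket")  # hypothetical bucket
        .mode("overwrite")
        .save("my-project.sales.daily_revenue")
    )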
All these services connect transparently to each other due to clear design and clean implementation.
Change management is often one of the hardest aspects of incorporating any new technology into an organization. Google
Cloud seeks to meet our customers where they are by providing familiar tools, platforms and integrations for developers and
business users alike. Our mission is to accelerate your organization’s ability to digitally transform and reimagine your busi-
ness through data-powered innovation, together. Instead of creating vendor lock-in, Google Cloud provides companies with
options for simple, streamlined integrations with on-premise environments, other Cloud offerings and even the Edge to form a
truly hybrid Cloud:
• BigQuery Omni removes the need for data to be ported from one environment to
another and instead takes the analytics to the data regardless of the environment.
• Apache Beam, the SDK leveraged on Cloud Dataflow, provides portability to runners like Apache Spark and Apache Flink (see the sketch after this list).
• For organizations looking to run Apache Spark or Apache Hadoop, Google Cloud
provides Dataproc.
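To make the portability point concrete, here is a minimal Apache Beam sketch in Python; the same pipeline runs locally, on Dataflow, or on Spark or Flink simply by changing the runner option. The project, bucket paths and field names are placeholders.

    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions


    def run(runner: str = "DirectRunner") -> None:
        # Swap "DirectRunner" for "DataflowRunner", "SparkRunner" or "FlinkRunner"
        # without touching the pipeline code itself.
        options = PipelineOptions(
            runner=runner,
            project="my-project",                # placeholder project
            temp_location="gs://my-bucket/tmp",  # placeholder bucket
            region="us-central1",
        )
        with beam.Pipeline(options=options) as p:
            (
                p
                | "ReadEvents" >> beam.io.ReadFromText("gs://my-bucket/events/*.json")
                | "ParseJson" >> beam.Map(json.loads)
                | "KeyByUser" >> beam.Map(lambda e: (e["user_id"], 1))
                | "CountPerUser" >> beam.CombinePerKey(sum)
                | "Format" >> beam.MapTuple(lambda user, n: f"{user},{n}")
                | "WriteCounts" >> beam.io.WriteToText("gs://my-bucket/output/user_counts")
            )


    if __name__ == "__main__":
        run()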
Most data users care about what data they have, not which
system it resides in. Having access to the data they need
when they need it is the most important thing. So for the
most part, the type of platform does not matter for users,
so long as they are able to access fresh, usable data with
familiar tools - whether they are exploring datasets, manag-
ing sources across data stores, running ad hoc queries or
developing internal business intelligence tools for executive
stakeholders.
[Figure: democratized services such as Vertex AI, SQL and BI tools, Data QnA and Connected Sheets sit alongside DLP and shared security controls over internal databases and external/public data sources.]
Emerging Trends
Continuing on this idea of the convergence of a data lake and a data warehouse into a unified analytics data platform, there
are some additional data solutions that are gaining traction. We have been seeing a lot of concepts emerging around Lakehouse and Data Mesh, for example. You may have heard some of these terms before. Some are not new and have been around
in different shapes and formats for years. However, they work very nicely within the Google Cloud environment. Let’s take
a closer look into what a Data Mesh and a Lakehouse would look like in Google Cloud and what they mean for data sharing
within an organization.
Lakehouse and Data Mesh are not mutually exclusive, but they help solve different challenges within an organization: one favors enabling data, while the other enables teams. Data Mesh empowers people to avoid being bottlenecked by one team,
and therefore enables the entire data stack. It breaks silos into smaller organizational units in an architecture that provides
access to data in a federated manner. Lakehouse brings the data warehouse and data lake together, allowing different types
and higher volumes of data. This effectively leads to schema-on-read instead of schema-on-write, a feature of data lakes that
was thought to close some of the performance gaps in enterprise data warehouses. As an added benefit, this architecture
also borrows more rigorous data governance, something that data lakes typically lack.
Lakehouse:
• Removes the overhead of Data Lakes and Data Warehouses
• Data warehouse gets the capabilities of a data lake
• Data Lake gets the capabilities of the Data Warehouse
• Benefits:
  • Multimodal data access with higher volumes of data
  • Schema on read
  • The governance that Data Lakes lack but DWHs provide
  • Enables unified access to batch and real-time data

Data Mesh:
• Removes the organizational barriers becoming the bottleneck
• Federates data ownership
• Focuses on data as a product
• Allows for the creation of agile teams and shorter time to insights
• Teams own their data & technology
• Provides API / access to other teams
• Decentralized raw and processed data
• Benefits:
  • Well defined, governed and secure
  • Ability to leverage several domains with no data movement
  • Leverages DataOps methodologies (builds on lessons learned in DevOps)
Lakehouse
As mentioned above, BigQuery's Storage API lets you treat your data warehouse like a data lake. Spark jobs running on Dataproc or similar Hadoop environments can use the data stored in BigQuery directly, taking storage out of the data warehouse rather than requiring a separate storage medium.
BigQuery's compute power, decoupled from storage, enables SQL-based transformations and the use of views across the different layers of those transformations. This leads to an ELT-type approach and a more agile data processing platform: leveraging ELT over ETL, BigQuery lets SQL-based transformations be stored as logical views.
While dumping all of the raw data into data warehouse storage may be expensive with a traditional data warehouse, there is no
premium charge for BigQuery storage. Its cost is fairly comparable to blob storage in GCS.
When performing ETL, the transformations are taking place outside of BigQuery, potentially in a tool that does not scale as
well. It might end up transforming the data line-by-line rather than parallelizing the queries. There may be instances where
Spark or other ETL processes are already codified and changing them for the sake of new technology might not make sense.
If, however, there are transformations that can be written in SQL, BigQuery is likely a great place to do them.
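As a minimal sketch of this ELT pattern (the project, dataset and table names are hypothetical), a transformation can be stored as a logical view over raw data already loaded into BigQuery, using the google-cloud-bigquery client:

    from google.cloud import bigquery

    client = bigquery.Client()  # uses application default credentials

    # Raw events are already loaded (the "EL" in ELT); the "T" lives in SQL as a
    # logical view rather than in an external ETL tool.
    view = bigquery.Table("my-project.analytics.daily_active_users")  # hypothetical
    view.view_query = """
        SELECT
          DATE(event_timestamp) AS activity_date,
          COUNT(DISTINCT user_id) AS daily_active_users
        FROM `my-project.raw.events`
        GROUP BY activity_date
    """
    client.create_table(view, exists_ok=True)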
In addition, this architecture is supported by all the GCP components like Composer,
Data Catalog or Data Fusion. It provides an end-to-end layer for different user personas.
Another important aspect of reducing operational overhead is leveraging the capabilities of the underlying infrastructure. Consider Dataflow and BigQuery: both run on containers and let us manage the uptime and the mechanics behind the scenes. Once this is extended to third-party and partner tools, and they start exploiting similar capabilities such as Kubernetes, the platform becomes much simpler to manage and more portable. In turn, this reduces resource and operational overheads. Furthermore, this can be complemented with better observability, using monitoring dashboards alongside Cloud Composer to drive operational excellence.
Not only can you build a data lake by bringing together data stored in GCS and BigQuery,
without any data movement or duplication, but we are offering additional administra-
tive functionality to manage your data sources. Dataplex enables a Lakehouse by offer-
ing a centralized management layer to coordinate data in GCS and BigQuery. Doing this
enables you to organize your data based on your business needs, so you are no longer
restricted by how or where that data is stored.
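For example, here is a hedged sketch of registering Parquet files in GCS as a BigQuery external table, so lake data can be queried alongside native warehouse tables without moving it; the bucket, dataset and table names are placeholders.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Describe the Parquet files sitting in the data lake (GCS).
    external_config = bigquery.ExternalConfig("PARQUET")
    external_config.source_uris = ["gs://my-lake-bucket/clickstream/*.parquet"]  # placeholder

    # Register them as an external table; the data itself stays in GCS.
    table = bigquery.Table("my-project.lake.clickstream")  # hypothetical table
    table.external_data_configuration = external_config
    client.create_table(table, exists_ok=True)

    # Lake data can now be joined with native BigQuery tables in a single query.
    query = """
        SELECT c.page, COUNT(*) AS views
        FROM `my-project.lake.clickstream` AS c
        JOIN `my-project.sales.orders` AS o USING (session_id)
        GROUP BY c.page
    """
    for row in client.query(query).result():
        print(row.page, row.views)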
Dataplex is an intelligent data fabric that enables you to keep your data distributed for the right price/performance while making this data securely accessible to all your analytics tools. It provides metadata-led data management with built-in data quality and governance, so you spend less time wrestling with infrastructure boundaries and inefficiencies, can trust the data you have, and spend more time deriving value from it. Additionally, it provides an integrated analytics experience, bringing the best of GCP and open source together, to enable you to rapidly curate, secure, integrate and analyze your data at scale.
Finally, you can build an analytics strategy that augments existing architecture and meets
your financial governance goals.
Data Mesh
Data Mesh is built on a long history of innovation from across data warehouses and data lakes, combined with the unparalleled scalability, performance, pay models, APIs, DevOps and close integration of Google Cloud products. With this approach, you can effectively create an on-demand data solution. A Data Mesh decentralizes data ownership among domain data owners, each of whom is held accountable for providing their data as a product in a standard way. A Data Mesh also facilitates communication between different parts of the organization to distribute datasets across different locations.

In a Data Mesh, the responsibility for generating value from data is federated to the people who understand it best; in other words, the people who created the data or brought it into the organization must also be responsible for creating consumable data assets as products from the data they create.

In many organizations, establishing a "single source of truth" or "authoritative data source" is challenging due to the repeated extraction and transformation of data across the organization without clear ownership responsibilities over the newly-created data. In the Data Mesh, the authoritative data source is the Data Product published by the source domain, with a clearly assigned Data Owner and Steward who is responsible for that data.
[Figure: example Data Mesh in BigQuery, with domain-owned data products such as online orders, customer details and product referential data.]
In summary, the Data Mesh promises domain-oriented, decentralized data ownership and architecture. This is enabled by
having federated computation and access layers just like we provide in GCP. Furthermore, if your organization is looking to get
more functionality, you can use something like Looker, which can provide a unified layer to model and access the data. Look-
er’s platform offers a single pane UI to access the truest, most up-to-date version of your company’s data and business defi-
nitions. With this unified view into the business, you can choose or design data experiences that assure people and systems
get data delivered to them in a way that makes the most sense for their needs. It fits in perfectly as it allows data scientists,
analysts and even business users to access their data with a single semantic model. Data scientists are still accessing the raw
data, but without the data movement and duplication.
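As one hedged illustration of a domain publishing its data product to consumers (a sketch with placeholder dataset and group names, not the Analytics Hub API described below), a domain team could grant a consuming team read access to its curated BigQuery dataset without copying any data:

    from google.cloud import bigquery

    client = bigquery.Client()

    # The "orders" domain owns and curates this dataset as its data product.
    dataset = client.get_dataset("my-project.orders_data_product")  # hypothetical dataset

    # Grant a consuming team read access to the product; no data is moved or duplicated.
    entries = list(dataset.access_entries)
    entries.append(
        bigquery.AccessEntry(
            role="READER",
            entity_type="groupByEmail",
            entity_id="analytics-team@example.com",  # placeholder consumer group
        )
    )
    dataset.access_entries = entries
    client.update_dataset(dataset, ["access_entries"])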
We’re building additional functionality on top of our workhorse products like BigQuery, to make the creation and management
of datasets easier. Analytics Hub provides the ability to create private data exchanges, in which exchange administrators (a.k.a.
data curators) give permissions to publish and subscribe to data in the exchange to specific individuals or groups both inside
the company and externally to business partners or buyers.
[Figure: Analytics Hub exchanges shared assets between publisher and subscriber BigQuery projects.]
Publish, discover and subscribe to shared assets, including open source formats, powered by the scalability of BigQuery.
Publishers can view aggregated usage metrics. Data providers can reach enterprise BigQuery customers with data, insights,
ML models or visualizations and leverage Cloud marketplace to monetize their apps, insights or models. This is also similar to
how BigQuery public datasets are managed through a Google-managed exchange. Drive innovation with access to unique
Google datasets, commercial/industry datasets, public datasets or curated data exchanges from your organization or partner
ecosystem.
There are typically three categories of migration that we see among customers: Lift and Replatform, Lift and Rehome, and full Modernization. For most businesses, we suggest starting with the Lift and Replatform, as it offers a high-impact migration with as little disruption and risk as possible. With this strategy, you migrate your data into BigQuery or […]

The second migration strategy we see most often is a full modernization as the first step. This provides a clean break from the past because you are going all in with a Cloud-native approach. It is built natively on GCP, but because you are changing everything in one go, migration can be slower if you have multiple, large legacy environments. A clean legacy break requires rewriting jobs and changing different applications. However, it provides higher velocity and agility, as well as the lowest total cost of ownership in the long run compared to the other approaches. This is because of two main reasons: your applications are already optimized and don't need to be retrofitted, and once you migrate your data sources, you don't have to manage two environments at the same time. This approach is best suited for digital natives or engineering-driven organizations with few legacy environments.

Lastly, the most conservative approach is a Lift and Rehome, which we recommend as a short-term tactical solution to move your data estate onto the Cloud. You can lift and rehome your existing platforms and carry on using them as before, but in the GCP environment. This is applicable for environments such as Teradata and Databricks, for example, to reduce the initial risk and allow applications to keep running. However, this brings the existing siloed environment to the Cloud rather than transforming it, so you won't benefit from the performance of a platform built natively on GCP. From there, we can help you with a full migration into Google Cloud native products, so you can take advantage of interoperability and create a fully modern analytics data platform on Google Cloud.
Tactical or strategic?
We think the key differentiators of an analytics data platform built on GCP are that it is open, intelligent, flexible and tightly
integrated. There are a lot of solutions in the market that provide tactical solutions that may feel comfortable and familiar.
However, these generally provide a short-term fix and just compound organizational and technical issues over time.
Google Cloud significantly simplifies data analytics. You can unlock the potential hidden in your data with a cloud-native,
serverless approach that decouples storage from compute and lets you analyze gigabytes to petabytes of data in minutes.
This allows you to remove the traditional constraints of scale, performance and cost to ask any question of data and solve
business problems. As a result, it becomes easier to operationalize insights across the enterprise with a single, trusted data
fabric.
• Solves for every stage of the data analytics lifecycle, from ingestion to transformation and analysis, to business intelligence and more
• Enables you to leverage the best open-source technologies for your organization
• Scales to meet the needs of your enterprise, particularly as you increase your use of data in driving your business and through your digital transformation
A modern, unified analytics data platform built on GCP gives you the best capabilities of a data lake and a data warehouse,
but with closer integration into the AI platform. You can automatically process real-time data from billions of streaming events
and serve insights in milliseconds to respond to changing customer needs. Our industry-leading AI services can opti-
mize your organizational decision making and customer experiences, helping you to close the gap between descriptive and
prescriptive analytics without having to staff up a new team. You can augment your existing skills to scale the impact of AI with
automated, built-in intelligence.