05 - Data As A Product - IBM Watsonx - Data and IBM Cloud Pak For Data

Uploaded by

Ming Le
Data as a Product:
IBM watsonx.data & IBM Cloud Pak for Data
Content by:
Danny Arnold
Principal, Learning Content Development | Data & AI

Presenter:
Ahmad Muzaffar Baharudin
Technical Enablement Specialist | Data & AI
01
Objections you might receive
from existing clients of
IBM Cloud Pak for Data (CP4D)

“Will IBM watsonx.data replace
IBM Cloud Pak for Data (CP4D)?”
02
Objections you might receive
from existing clients of
IBM Cloud Pak for Data (CP4D)

“Why do we need IBM watsonx, when IBM Cloud Pak for
Data (CP4D) already provides data source and data
governance capabilities?”
03
Objections you might receive
from existing clients of
IBM Cloud Pak for Data (CP4D)

“We use a CP4D system today, which has Db2
Warehouse as our data repository.
What advantage would watsonx.data provide us today?”
04
Objections you might receive
from existing clients of
IBM Cloud Pak for Data (CP4D)

“How does IBM watsonx.data differ from other data
sources available with IBM Cloud Pak for Data (CP4D)?”
IBM Cloud Pak for Data
IBM Cloud Pak for Data (CP4D) is the data fabric platform that
provides the infrastructure, essential services, management, and
governance tools for your data environment.

Analyze Data & Infuse AI
50+ analytics services, AI apps, and industry solutions. Manage
your favorite open-source capabilities alongside IBM's market-leading
differentiators.

Organize Data
Catalog and govern all enterprise data, models, rules, and
insights through a common experience.

Collect Data
Virtually connect, manage, and query data assets no matter where
they live. Provision databases in minutes.

OpenShift
Run anywhere with OpenShift Container Platform.

Run anywhere
Public clouds, private clouds, on-premises, and
hyperconverged systems.
IBM watsonx
IBM watsonx.data provides the data mesh capability that allows
separate management of each data domain and allows those data
domains to span multiple locations but be viewed as a single data
lakehouse.

• Leverage governed enterprise data in watsonx.data to seamlessly train or fine-tune foundation models.
• Leverage foundation models to automate data search, discovery, and linking in watsonx.data.
• Enable fine-tuned models to be managed through market-leading governance and lifecycle management capabilities.

• watsonx.data: Scale AI workloads, for all your data, anywhere.
• watsonx.ai: Train, validate, tune, and deploy AI models.
• watsonx.governance: Enable responsible, transparent, and explainable AI workflows.

1 Prompting
2 Prompt tuning
3 Fine-tuning
4 Training from scratch
Integrating, Managing and Harnessing Your Data
to Unlock Value

1 Data
2 Data Products
3 Data as a Product
4 Data Fabric
5 Data Mesh

Data

• Raw data, often referred to simply as "data," is the unprocessed and unorganized information collected
from various sources.
• It represents the initial, untouched form of information that hasn't undergone any transformation,
analysis, or interpretation, and is commonly associated with data lakes.
• It forms the basis for data analysis and is typically refined and organized into a more accessible format
within a data warehouse.

Data Products

• A complete data package that contains three main parts:


• Processed Dataset: The organized and analyzed data that provides answers or insights.
• Metadata: Data description that explains where the data comes from, how it's structured, and other
important details.
• Data Access Pattern: Shows how people can use or get the data, like through apps, reports, or other
tools.
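
The three parts listed above can be pictured as a small data structure. The following is only an illustrative sketch: the class names, field names, and the sample `monthly_sales` product are hypothetical and are not part of any IBM product API.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Describes where the data comes from and how it is structured."""
    source: str
    schema: dict[str, str]  # column name -> type name
    description: str = ""


@dataclass
class DataProduct:
    """A processed dataset packaged with its metadata and access patterns."""
    name: str
    rows: list[dict]  # the processed dataset
    metadata: Metadata
    access_patterns: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True only when all three parts of the package are present."""
        return bool(self.rows) and bool(self.metadata.schema) and bool(self.access_patterns)


# Hypothetical example product (all values are made up for illustration).
monthly_sales = DataProduct(
    name="monthly_sales",
    rows=[{"region": "APAC", "revenue": 1200.0}],
    metadata=Metadata(source="sales_db", schema={"region": "str", "revenue": "float"}),
    access_patterns=["sql", "dashboard"],
)
```

In this sketch a dataset without metadata or without a declared access pattern is not yet a data product, which is what `is_complete()` checks.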

Data as a Product

• An operational approach to packaging data in a way that makes it easier to consume.
• Domains should treat analytical data as a first-class product rather than a by-product
of their business operations. They should also apply all aspects of product development to
make it valuable, useful, reliable, and customer-focused.
• Transform raw data into a valuable, accessible resource that provides useful information or insights to
data consumers.

Data Fabric

• An architectural approach to simplifying data access in an organization and facilitating self-service data
consumption. This architecture is agnostic to data environments, processes, utility, and geography, all
while integrating end-to-end data-management capabilities.
• A data fabric automates data discovery, governance, and consumption, enabling enterprises to use data
to maximize their value chain by providing the right data, at the right time, regardless of where it resides.
• Key elements of a generic data fabric:
• Data ingestion
• Data discovery
• Data processing
• Data access
• Data orchestration
• Data management & intelligence

Data Mesh

• An organizational approach to managing and distributing data within a company in which
responsibility for data is decentralized: data is treated as a product, and ownership and accountability
shift to the teams or departments that have the most knowledge and expertise about that data.
• Each domain acts as its own data "product team" responsible for managing and curating the data that is
relevant to its specific operations or functions.
• Promotes data democratization and fosters a more agile and scalable data infrastructure within the
organization.
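
The ownership model described above can be sketched as a toy registry in which each domain publishes the data products it is accountable for. This is a conceptual illustration only: the `DataMesh` class, its methods, and the domain and product names are hypothetical and do not correspond to any watsonx.data or Cloud Pak for Data API.

```python
from __future__ import annotations

from collections import defaultdict


class DataMesh:
    """Toy registry: each domain team owns and publishes its own data products."""

    def __init__(self) -> None:
        self._products_by_domain: dict[str, set[str]] = defaultdict(set)

    def publish(self, domain: str, product: str) -> None:
        """A domain team registers a data product it is accountable for."""
        owner = self.owner_of(product)
        if owner is not None and owner != domain:
            raise ValueError(f"{product!r} is already owned by {owner!r}")
        self._products_by_domain[domain].add(product)

    def owner_of(self, product: str) -> str | None:
        """Return the domain accountable for a product, or None if unpublished."""
        for domain, products in self._products_by_domain.items():
            if product in products:
                return domain
        return None

    def catalog(self) -> list[str]:
        """Organization-wide view: every product, qualified by its owning domain."""
        return sorted(
            f"{domain}.{product}"
            for domain, products in self._products_by_domain.items()
            for product in products
        )


# Hypothetical domains and products, for illustration only.
mesh = DataMesh()
mesh.publish("sales", "monthly_revenue")
mesh.publish("finance", "ledger_summary")
```

The point of the sketch is that accountability is decentralized (one owning domain per product) while the catalog still gives consumers a single, organization-wide view.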

Data Fabric with IBM

• Data fabric is an architectural approach to simplifying access to data.


• IBM Cloud Pak for Data is designed for a data fabric – no assembly required.

[Figure: Components of IBM Cloud Pak for Data]
[Figure: User interface (UI) of IBM Cloud Pak for Data]

Data Mesh with IBM

• Data mesh is an organizational approach to managing and distributing data.
• IBM watsonx.data is designed as a data lakehouse, optimized for all data and AI workloads, to enable
data mesh capability.

[Figure: Components of IBM watsonx.data]
[Figure: User interface (UI) of IBM watsonx.data]

Enabling data mesh architecture through IBM watsonx.data

• IBM watsonx.data is a data lakehouse that realizes data mesh capability by combining
multiple data locations into a single lakehouse platform within your organization.
• You can access and leverage data distributed across departments (Sales, Finance, Marketing,
Customer Care, etc.) or even external data (open data or enterprise data).

[Diagram: five data domains (Domain 1 to Domain 5), each labeled with an ETL link]

Combining IBM CP4D and IBM watsonx.data

IBM watsonx.data is easily combined with IBM Cloud Pak for Data, providing a data fabric and a data mesh
solution for clients.

IBM watsonx.data is a cartridge for IBM Cloud Pak for Data.

Data as a Product with IBM

• IBM Cloud Pak for Data and IBM watsonx combine to
provide Data as a Product capability.
• IBM Cloud Pak for Data provides the data pipeline, data
governance, and data integration for data processing.
• IBM watsonx provides the governance and controls for
the AI models and components to ensure trustworthy
and explainable models.

Data as a Product with IBM

• IBM Cloud Pak for Data (CP4D) is the data fabric
platform that provides the infrastructure, essential
services, management, and governance tools for your
data environment.
• IBM watsonx.data provides the data mesh capability
that allows separate management of each data domain
and allows those data domains to span multiple
locations but be viewed as a single data lakehouse.
• IBM Cloud Pak for Data and watsonx.data combine to
provide Data as a Product capability.

[Diagram: Data Mesh (Domain 1 to Domain 5) and Data Fabric (IBM Cloud Pak for Data)]

IBM CP4D + IBM watsonx.data provide
flexibility of deployment without vendor lock-in.

Answers to clients’ objections…

01
Objections you might receive
from existing clients of
IBM Cloud Pak for Data (CP4D)

“Will IBM watsonx.data replace
IBM Cloud Pak for Data (CP4D)?”

No. IBM watsonx.data is not a replacement for IBM Cloud Pak for
Data. It is another data source that can be part of a client’s data
fabric architecture.

IBM Cloud Pak for Data provides the enterprise-wide data fabric
that all clients need to implement a modern data foundation for
their businesses.

Meanwhile, the IBM watsonx platform was created for AI:
traditional AI as well as generative AI and foundation models.

02
Objections you might receive
from existing clients of
IBM Cloud Pak for Data (CP4D)

“Why do we need IBM watsonx, when IBM Cloud Pak for
Data (CP4D) already provides data source and data
governance capabilities?”

IBM Cloud Pak for Data focuses on delivering a data fabric for
organizations, while IBM watsonx is an AI platform emphasizing
foundation models and generative AI, offering trusted and
explainable AI models.

watsonx.data serves as a cost-efficient data lakehouse, managing
hybrid cloud data sources and using open-source technology for data
access. watsonx.governance ensures model transparency. If AI
isn't your goal, watsonx.ai and watsonx.governance may not be necessary.

watsonx complements IBM CP4D and can be used with it.
watsonx.data is also available as a cartridge to enhance CP4D’s
data sources, enabling data mesh.

03
Objections you might receive
from existing clients of
IBM Cloud Pak for Data (CP4D)

“We use a CP4D system today, which has Db2
Warehouse as our data repository. What advantage
would watsonx.data provide us today?”

If your data analytics currently rely solely on Db2 Warehouse,
watsonx.data won't offer any immediate benefits. However, if you
anticipate expanding your data sources in the future, especially
with a mix of on-premises and cloud data, watsonx.data becomes
valuable.

It enables seamless integration of diverse data sources in the
hybrid cloud, allowing you to add public cloud test data and
optimize query costs efficiently.

Unlike Db2 Warehouse with its single query engine, watsonx.data
lets you match compute resources to each workload cost-effectively,
which is crucial for budget management in a cloud environment where
performance needs vary.

04
Objections you might receive
from existing clients of
IBM Cloud Pak for Data (CP4D)

“How does IBM watsonx.data differ from other data
sources available with IBM Cloud Pak for Data (CP4D)?”

Data sources like Db2 Warehouse and OEM databases (MongoDB,
SingleStore, EDB, etc.) in IBM CP4D use separate query engines
with distinct SQL dialects, requiring users to learn different
query syntax.

Switching between databases is the only way to optimize
compute resources, but it necessitates changing the query syntax.

IBM watsonx.data offers a unified SQL syntax for all queries
accessing Apache Iceberg tables, even if these tables are
distributed across various locations in the hybrid cloud. This
enables data separation for creating a data mesh architecture
alongside IBM CP4D's data fabric.
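
As a rough illustration of what a unified SQL syntax means in practice, the sketch below builds a single SQL statement that joins two tables stored in different locations. It assumes three-part table names (catalog.schema.table), the naming style used by the Presto query engine; the catalog, schema, table, and column names are hypothetical, and the code only constructs the query text rather than connecting to watsonx.data.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Build a three-part table name (catalog.schema.table)."""
    return f"{catalog}.{schema}.{table}"


def revenue_by_customer_sql(orders: str, customers: str) -> str:
    """One SQL statement joining two tables, wherever each one is stored."""
    return (
        "SELECT c.customer_name, SUM(o.amount) AS revenue\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.customer_id\n"
        "GROUP BY c.customer_name"
    )


# Hypothetical catalogs: one backed by on-premises storage, one by public cloud storage.
onprem_orders = qualified("iceberg_onprem", "sales", "orders")
cloud_customers = qualified("iceberg_cloud", "crm", "customers")

query = revenue_by_customer_sql(onprem_orders, cloud_customers)
print(query)
```

Only the catalog prefix changes with the data's location; the query syntax itself stays the same, which is the contrast with switching between databases that each use their own SQL dialect.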

© 2023 International Business Machines Corporation

Thank you
IBM and the IBM logo are trademarks of IBM
Corporation, registered in many jurisdictions
worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list
of IBM trademarks is available on ibm.com/trademark.

THIS DOCUMENT IS DISTRIBUTED “AS IS” WITHOUT
ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN
NO EVENT, SHALL IBM BE LIABLE FOR ANY DAMAGE
ARISING FROM THE USE OF THIS INFORMATION,
INCLUDING BUT NOT LIMITED TO, LOSS OF DATA,
BUSINESS INTERRUPTION, LOSS OF PROFIT OR
LOSS OF OPPORTUNITY.

Client examples are presented as illustrations of how
those clients have used IBM products and the results
they may have achieved. Actual performance, cost,
savings or other results in other operating
environments may vary.

Not all offerings are available in every country in which
IBM operates.

IBM’s statements regarding its plans, directions, and
intent are subject to change or withdrawal without
notice at IBM’s sole discretion. Information regarding
potential future products is intended to outline our
general product direction and it should not be relied
on in making a purchasing decision. The information
mentioned regarding potential future products is not
a commitment, promise, or legal obligation to deliver
any material, code or functionality. Information about
potential future products may not be incorporated into
any contract. The development, release, and timing of
any future features or functionality described for our
products remains at our sole discretion.

Red Hat and OpenShift are registered trademarks of
Red Hat, Inc. or its subsidiaries in the United States
and other countries.
