
Handout Streamline Data and AI Governance With Amazon SageMaker Catalog

Streamline data and AI governance with Amazon SageMaker Catalog

Leonardo Gomez
(he/him)
Principal Data Governance Specialist
AWS

© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda

01 The importance of data governance
02 Business catalog
03 Data lineage
04 Data quality
05 Data products
06 Publish and subscribe

What are enterprise customers looking for?
Democratize data with built-in governance

• Discover, understand, and access data across the organization (AWS and non-AWS)
• Ability for multiple personas to collaborate on the same data problem
• Easy-to-use and easy-to-access analytics and BI tools (AWS and non-AWS)
• Self-service via a single pane of glass across the organization
• Mechanisms to govern data and the tools being used to solve data needs

Amazon SageMaker Unified Studio
ONE DATA AND AI DEVELOPMENT ENVIRONMENT TO COLLABORATE AND BUILD FASTER

SageMaker Unified Studio (Preview)

Data & AI governance
• Data catalog
• Data quality
• Data lineage
• Generative AI business context
• Data products
• Fine-grained access control
• Domain units

Lakehouse

The importance of data governance

Different data governance needs
Needs differ by type of user (technical or business) and by role (producer or consumer).

Technical producer
• File formats
• Schema definition
• Partitioning
• Data ownership
• Data classification

Technical consumer
• Schema metadata
• Data location and format
• Partitioning information
• Versioning
• Size

Business producer
• Field definitions with business context
• Data classification
• Asset documentation
• Data quality indicators
• Access control

Business consumer
• Data lineage
• Data categories
• Data freshness
• Data quality indicators
• Access and usage guidelines

Enterprise-wide data governance

Components of enterprise data governance:
• Catalog
• Metadata curation
• Security
• Data lineage
• Business glossary
• Data quality

Business catalog

Archaeology vs. data science
THE DISCOVERY

• In archaeology, the discovery builds the catalog
• In data science, the catalog enables the discovery

Building an enterprise data catalog
ROLES AND LAYERS

Enterprise data catalog
• Data sharing layer
• Metadata curation layer
• Technical catalog

Roles
• Business producers and business consumers
• Technical producers and technical consumers

Data sources
• Managed cataloging
• Unmanaged

Building blocks of a business data catalog
Assets
Assets in SageMaker Catalog represent the technical metadata and business metadata of a data object, such as a table, dashboard, or view.

Business glossaries
Create a glossary of standard business- and data-related terms with clear definitions that everyone in the organization can understand.

Metadata forms
Create metadata forms and apply them to assets to capture the technical and business metadata of assets in a consistent manner.

Catalog search
Search metadata across all catalog entities.
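
To make the building blocks concrete, here is a minimal in-memory sketch of how assets, glossary terms, metadata forms, and a catalog search might relate to each other. It does not call any SageMaker Catalog API; the class names, field names, and the `search` function are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    glossary: str          # glossary the term belongs to, e.g. "Data classification"
    name: str              # the term itself, e.g. "Confidential"
    definition: str = ""


@dataclass
class Asset:
    technical_name: str    # technical metadata
    business_label: str    # business metadata
    description: str = ""
    glossary_terms: list = field(default_factory=list)
    forms: dict = field(default_factory=dict)  # form name -> {field name: value}


def search(assets, text):
    """Case-insensitive substring match over names, descriptions, terms, and form values."""
    needle = text.lower()
    hits = []
    for asset in assets:
        haystack = [asset.technical_name, asset.business_label, asset.description]
        haystack += [term.name for term in asset.glossary_terms]
        for values in asset.forms.values():
            haystack += [str(value) for value in values.values()]
        if any(needle in item.lower() for item in haystack):
            hits.append(asset)
    return hits


# Hypothetical catalog content.
confidential = GlossaryTerm("Data classification", "Confidential")
us_sales = Asset(
    technical_name="us_sales",
    business_label="U.S. sales data",
    description="Sales in U.S.",
    glossary_terms=[confidential],
    forms={"Sales identifiers": {"Sales region": "U.S.", "Sales year": 2021}},
)
catalog = [us_sales]
```

Searching for `"confidential"` or `"2021"` finds the asset through its glossary term or form value, which is the point of keeping business metadata next to the technical name.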

Adding metadata to the catalog
Business glossaries
• Regions
  • Asia: Singapore, India, Japan
  • Americas: U.S., Canada
• Data classification
  • Public
  • Personal info
  • Confidential
• Product lines
  • Active gear: Sports, Swim, Golf
  • Active wear: Shirts, Shoes

Metadata forms
• Sales identifiers
  • Sales region: Regions (business glossary)
  • Sales year: Integer
• Asset ownership
  • Technical expert: String
  • Asset owners: String

Business glossaries can be used in metadata forms. Metadata forms attached to assets enable data producers to enrich business context.

Data assets (e.g., AWS Glue table, Amazon Redshift table)
Example asset: U.S. sales data
• Business label: U.S. sales data
• Business description: Sales in U.S.
• Metadata forms
  • Sales identifiers: Sales region = U.S.; Sales year = 2021
  • Asset ownership: Technical expert = [email protected]; Asset owner = [email protected]
• Business glossaries: Confidential (Data classification). A business glossary term can be used to enrich the metadata of an asset.
• Technical metadata
  • Technical name: us_sales
  • Source type: AWS Glue
  • AWS Region: us-east-1
  • AWS account ID: 123456789012
  • Amazon S3 location: S3://location
• Schema (a business glossary can be attached to a column of an asset in the catalog). Examples:
  • Column names: [p_id, p_line]
  • Business labels: [product id, product line]
  • Business descriptions
  • Data types: [INT, FLOAT]
  • Business glossary: Product lines
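
The relationship between glossaries and metadata forms can be sketched as a small validation routine: a form field is either typed (Integer, String) or restricted to the terms of a glossary. This is an illustrative model only, not the service's form schema; `validate_form` and the dictionary layout are assumptions made for this example.

```python
# Terms of the "Regions" glossary from the slide.
REGIONS = frozenset({"Singapore", "India", "Japan", "U.S.", "Canada"})

# Hypothetical form definition: field name -> Python type or allowed glossary terms.
SALES_IDENTIFIERS_FORM = {
    "Sales region": REGIONS,   # glossary-backed field
    "Sales year": int,         # typed field
}


def validate_form(form, values):
    """Return a list of problems; an empty list means the values satisfy the form."""
    errors = []
    for name, rule in form.items():
        if name not in values:
            errors.append(f"missing field: {name}")
        elif isinstance(rule, frozenset):
            if values[name] not in rule:
                errors.append(f"{name}: {values[name]!r} is not a glossary term")
        elif not isinstance(values[name], rule):
            errors.append(f"{name}: expected {rule.__name__}")
    return errors


good = validate_form(SALES_IDENTIFIERS_FORM, {"Sales region": "U.S.", "Sales year": 2021})
bad = validate_form(SALES_IDENTIFIERS_FORM, {"Sales region": "Mars", "Sales year": "2021"})
```

Here `good` is empty, while `bad` reports one problem per field. Backing a field with a glossary keeps producers from inventing new spellings of the same region.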

Catalog diverse assets and automate context with generative AI

Reduce the time spent on manual entry of data attributes in the data catalog

• Catalog different types of assets, such as tables, dashboards, ML models, and many more, with business context
• Build an inventory of imported data assets, but selectively publish data in the business data catalog
• Augment business context for assets with automated metadata recommendations
• Directed search results with filters (facets)
• Search recommendations to find results quickly

Asset detail page with technical and business metadata

• Business label
• Asset version
• Description of asset
• Glossary terms attached to the asset
• Technical metadata automatically pulled from the source
• Metadata forms attached to capture additional business metadata

Types of assets

Data assets
• Amazon Redshift
• AWS Glue

SageMaker ML assets
• Models
• Feature stores

Other assets
• Via custom assets

Data lineage

Data lineage in SageMaker Catalog
What’s available
• API-driven data lineage
• API support: OpenLineage-compliant
• Interactive visualization
• Automatically captures lineage
• Programmatically captures lineage of activities outside of SageMaker Catalog
• Asset and column lineage with version support

Use cases in focus:
• Provenance/source of data
• Impact analysis
• Troubleshooting data issues
• Data governance and compliance
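
For lineage captured programmatically from outside the catalog, the payload is an OpenLineage run event. The sketch below only builds such an event as JSON; it does not send it anywhere. The producer URI, namespace, job name, and output dataset name are hypothetical, and the `schemaURL` version is an assumption that should be checked against the OpenLineage spec version you target.

```python
import json
import uuid
from datetime import datetime, timezone

PRODUCER = "https://example.com/my-etl-job"  # hypothetical producer URI
# Assumed spec version; adjust to the OpenLineage release you target.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"


def build_run_event(job_name, inputs, outputs, event_type="COMPLETE",
                    namespace="example-namespace"):
    """Build an OpenLineage-style run event linking input datasets to output datasets."""
    def dataset(name):
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [dataset(name) for name in inputs],
        "outputs": [dataset(name) for name in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


# Hypothetical job that derives a report table from the us_sales table.
event = build_run_event("build_sales_report", ["us_sales"], ["us_sales_report"])
payload = json.dumps(event, indent=2)
```

Each event records one run of one job, so a lineage graph emerges from many such events rather than from a single declaration.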

Technical user: Impact analysis
Julia starts with the data asset she would like to modify. From that asset, she views downstream dependencies to understand the impact.

Julia
Data engineer

• Understand downstream dependencies
• See usage information of the data asset (e.g., used by 3 reports, 1 month ago)
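
Impact analysis of this kind amounts to walking the lineage graph downstream from the asset being changed. A minimal sketch, assuming the lineage is available as a plain adjacency mapping (the asset names and the `downstream` helper are hypothetical, not part of the product):

```python
from collections import deque

# Hypothetical lineage edges: upstream asset -> assets derived directly from it.
LINEAGE = {
    "us_sales": ["sales_clean"],
    "sales_clean": ["revenue_report", "regional_dashboard"],
    "revenue_report": [],
    "regional_dashboard": [],
}


def downstream(graph, start):
    """Breadth-first walk returning every asset that depends on `start`."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


impacted = downstream(LINEAGE, "us_sales")
```

The `seen` set keeps the walk finite even if the graph contains a cycle, and the breadth-first order lists the closest dependents first.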

Business user: Regulatory compliance
Susan starts at the report detail page and selects the report column in question. She looks upstream to see how the column was calculated and from which sources, so she can respond to auditors’ queries.

Susan
Administrator

• View lineage of data assets along with columns
• Traverse the lineage graph to view the upstream/downstream transformations for a column
• View snapshots of an asset to view how columns have changed over time
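
Tracing a report column back to its sources is the upstream counterpart of impact analysis, done at column granularity. The following sketch assumes an acyclic column-level lineage mapping; the report, table, and column names are made up for illustration.

```python
# Hypothetical column lineage: (asset, column) -> columns it is computed from.
COLUMN_LINEAGE = {
    ("revenue_report", "net_amt"): [("sales_clean", "amt"), ("sales_clean", "discount")],
    ("sales_clean", "amt"): [("us_sales", "amt")],
    ("sales_clean", "discount"): [("us_sales", "discount")],
}


def upstream_sources(lineage, column):
    """Return the root columns (no recorded upstream) that feed `column`.

    Assumes the lineage graph has no cycles.
    """
    parents = lineage.get(column, [])
    if not parents:
        return {column}
    sources = set()
    for parent in parents:
        sources |= upstream_sources(lineage, parent)
    return sources


roots = upstream_sources(COLUMN_LINEAGE, ("revenue_report", "net_amt"))
```

For the sample data, `roots` contains the `amt` and `discount` columns of `us_sales`, which is the kind of answer an auditor's provenance question calls for.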

Data quality

Data quality
What we have available
• Integration with AWS Glue Data Quality through a data source
• APIs available for third-party and programmatic entry
• Data quality visualization with key grouping and filtering
• Asset and column views for data quality scores

Use cases in focus
• Visualize DQ scores for your consumers to build more trust in data
• Group DQ scores per your business requirements, such as completeness and timeliness
• Gather quality scores from multiple DQ tools and share them in your Amazon SageMaker Catalog
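
Grouping scores by business dimension across several tools can be pictured as a simple aggregation. This sketch uses invented scores and tool names and a hypothetical `scores_by_dimension` helper; it illustrates the grouping idea, not how the catalog computes or stores scores.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical scores (0-100) reported by different data quality tools.
RESULTS = [
    {"tool": "glue_dq", "column": "email", "dimension": "completeness", "score": 98.0},
    {"tool": "glue_dq", "column": "amt", "dimension": "completeness", "score": 100.0},
    {"tool": "third_party", "column": None, "dimension": "timeliness", "score": 90.0},
]


def scores_by_dimension(results):
    """Average the reported scores per dimension, regardless of which tool produced them."""
    grouped = defaultdict(list)
    for result in results:
        grouped[result["dimension"]].append(result["score"])
    return {dimension: mean(scores) for dimension, scores in grouped.items()}


summary = scores_by_dimension(RESULTS)
```

A plain average is only one possible choice; a weighted score or a minimum per dimension may fit some business requirements better.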

Data products

Data product: Group assets into curated packages
Without data products
Assets (Asset 1, Asset 2, ... Asset N) are ingested into Amazon DataZone and then handled one at a time:
• Publish individually
• Discover individually
• Subscribe individually

With data products
Assets (for example, fact tables and dimension tables) are bundled into a data product and handled together:
• Publish as a group
• Discover as a group
• Subscribe as a group

Data product features
• Purpose-driven: bundles assets, classifies core/dimensional assets, adds business context
• Integrated: bundles various data types from diverse sources as a group
• Accessible: access all assets via one workflow
• Maintainable: supports data product registration, versioning, and access control as a group

Examples
• 'Sales' data product with store information, customer demographics, and related transactions tables
• ML model with supporting artifacts such as tables, files, and images
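
The difference between subscribing per asset and subscribing per data product can be sketched with a small container class. Everything here (the `DataProduct` class, its methods, and the versioning rule) is a hypothetical model for illustration, not the service's data model.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    assets: dict = field(default_factory=dict)      # asset name -> "fact" or "dimension"
    version: int = 1
    subscribers: set = field(default_factory=set)

    def add_asset(self, asset, role):
        """Add an asset to the bundle; each change produces a new version."""
        self.assets[asset] = role
        self.version += 1

    def subscribe(self, project):
        """A single request covers every asset in the bundle."""
        self.subscribers.add(project)
        return sorted(self.assets)


# The 'Sales' example from the slide, with made-up table names.
sales = DataProduct("Sales")
sales.add_asset("transactions", "fact")
sales.add_asset("stores", "dimension")
sales.add_asset("customer_demographics", "dimension")

granted = sales.subscribe("marketing-project")
```

One `subscribe` call returns all three tables, whereas without the bundle the consumer would need one request per asset.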

Security

Without fine-grained access control

Data producers approve access to the entire table.

id  email              ssn        phone       amt   units  discount  region
1   [email protected]  123456789  9652246319  12    1      2         America
2   [email protected]  234567891  7522298205  23    6      4         Europe
3   [email protected]  345678912  5105003729  23.1  3      6         Europe
4   [email protected]  456789123  7167501314  2.3   5      4         Asia Pacific
5   [email protected]  567891234  7905351186  34.3  12     3         America

With fine-grained access control
Data producers approve access with filters.

Column Filter (email values are not shown)

id  ssn        phone       amt   units  discount  region
1   123456789  9652246319  12    1      2         America
2   234567891  7522298205  23    6      4         Europe
3   345678912  5105003729  23.1  3      6         Europe
4   456789123  7167501314  2.3   5      4         Asia Pacific
5   567891234  7905351186  34.3  12     3         America

With fine-grained access control
Data producers approve access with filters.

Column Filter and Row Filter (email values are not shown)

id  ssn        phone       amt   units  discount  region
1   123456789  9652246319  12    1      2         America
2   234567891  7522298205  23    6      4         Europe
3   345678912  5105003729  23.1  3      6         Europe
4   456789123  7167501314  2.3   5      4         Asia Pacific
5   567891234  7905351186  34.3  12     3         America
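
The effect of column and row filters can be sketched on the sample table. This is a plain-Python illustration of the concept, not the service's filter syntax. The email addresses are placeholders because the originals are redacted in the handout, and the row condition (region equals Europe) is an assumption, since the slide does not state which rows the row filter keeps.

```python
COLUMNS = ("id", "email", "ssn", "phone", "amt", "units", "discount", "region")

# Sample rows from the slide; email values are placeholders.
RAW = [
    (1, "user1@example.com", "123456789", "9652246319", 12, 1, 2, "America"),
    (2, "user2@example.com", "234567891", "7522298205", 23, 6, 4, "Europe"),
    (3, "user3@example.com", "345678912", "5105003729", 23.1, 3, 6, "Europe"),
    (4, "user4@example.com", "456789123", "7167501314", 2.3, 5, 4, "Asia Pacific"),
    (5, "user5@example.com", "567891234", "7905351186", 34.3, 12, 3, "America"),
]
ROWS = [dict(zip(COLUMNS, values)) for values in RAW]


def apply_filters(rows, excluded_columns=(), row_predicate=None):
    """Drop excluded columns and keep only rows for which the predicate holds."""
    result = []
    for row in rows:
        if row_predicate is not None and not row_predicate(row):
            continue
        result.append({key: value for key, value in row.items()
                       if key not in excluded_columns})
    return result


# Column filter only: hide email. Column and row filter: hide email, keep Europe.
column_filtered = apply_filters(ROWS, excluded_columns={"email"})
both_filtered = apply_filters(
    ROWS,
    excluded_columns={"email"},
    row_predicate=lambda row: row["region"] == "Europe",  # hypothetical condition
)
```

With both filters approved, the subscriber sees only rows 2 and 3 and never the email column, while the producer's underlying table stays unchanged.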

Publish and subscribe

Reference Architecture
Producer associated account
• Data Producer
• AWS Glue Data Catalog
• Claims dataset
• Amazon S3
• Claims DL project

Central governance account
• Corporate Domain
• SageMaker Catalog
• Data Mesh Admin
• Claims Domain Unit
• Marketing Domain Unit
• Portal
• Business data catalog
• Business Glossaries
• Metadata Forms

Consumer associated account
• Data Consumer
• Marketing DL
• SQL Analytics
• Project
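
The publish and subscribe flow across the three accounts can be outlined as a small state machine: a producer project publishes a listing to the catalog, a consumer project requests access, and the producer approves or rejects the request. The classes and method names below are hypothetical and run entirely in memory; they sketch the workflow, not the service API.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class SubscriptionRequest:
    listing: str
    consumer_project: str
    reason: str
    status: Status = Status.PENDING


class Catalog:
    def __init__(self):
        self.listings = {}    # listing name -> producer project
        self.grants = set()   # (listing, consumer project) pairs with access

    def publish(self, listing, producer_project):
        self.listings[listing] = producer_project

    def request(self, listing, consumer_project, reason):
        if listing not in self.listings:
            raise KeyError(f"{listing} is not published")
        return SubscriptionRequest(listing, consumer_project, reason)

    def decide(self, req, approve):
        """Record the producer's decision; access is granted only on approval."""
        req.status = Status.APPROVED if approve else Status.REJECTED
        if approve:
            self.grants.add((req.listing, req.consumer_project))
        return req


catalog = Catalog()
catalog.publish("Claims dataset", "claims-dl-project")
req = catalog.request("Claims dataset", "marketing-project", "campaign analysis")
catalog.decide(req, approve=True)
```

Keeping the request as its own object gives the governance account a record of who asked for what and why, independent of the grant itself.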

End-to-end demo

Please complete the session survey

Thank you!
Leonardo Gomez
Principal Data Governance Specialist
Data & AI Governance
AWS
