Handout Streamline Data and AI Governance With Amazon SageMaker Catalog
Leonardo Gomez
(he/him)
Principal Data Governance Specialist
AWS
© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What are enterprise customers looking for?
Discover, understand, and access data across the organization (AWS and non-AWS)
Amazon SageMaker Unified Studio
ONE DATA AND AI DEVELOPMENT ENVIRONMENT TO COLLABORATE AND BUILD FASTER
Lakehouse
The importance of data governance
Different data governance needs
Roles: technical producer, technical consumer
Catalog Metadata curation
Business catalog
Archaeology vs. data science
THE DISCOVERY
Building an enterprise data catalog
ROLES AND LAYERS
Roles: technical producers, technical consumers
Layers: technical catalog, data sources
Building blocks of a business data catalog
Assets
Assets in SageMaker Catalog represent the technical metadata and business metadata of a data object, such as a table, dashboard, or view.
Business glossaries
Create a glossary of standard business- and data-related terms with clear definitions that everyone in the organization can understand.
Metadata forms
Create metadata forms and apply them to assets to capture the technical and business metadata of assets in a consistent manner.
Catalog search
Adding metadata to the catalog
Business glossaries
• Regions
  • Asia: Singapore, India, Japan
  • Americas: U.S., Canada
• Data classification
  • Public
  • Personal info
  • Confidential
Business glossaries can be used in metadata forms. A business glossary term can also be used to enrich the metadata of an asset.
Metadata forms
• Sales identifiers
  • Sales region: Regions
  • Sales year: Integer
• Asset ownership
  • Technical expert: String
  • Asset owners: String
Metadata forms attached to assets enable data producers to enrich business context.
Data assets (e.g., AWS Glue table, Amazon Redshift table)
U.S. sales data
• Business label: U.S. sales data
• Business description: Sales in U.S.
• Metadata forms
  • Sales identifiers: Sales region = U.S., Sales year = 2021
  • Asset ownership: Technical expert = [email protected], Asset owner = [email protected]
• Technical metadata
  • Technical name: us_sales
  • Source type: AWS Glue
  • AWS Region: us-east-1
  • AWS account ID: 123456789012
  • Amazon S3 location: S3://location
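The relationship between glossaries, metadata forms, and asset values on this slide can be sketched in a few lines of Python. This is an illustrative in-memory model only, not the SageMaker Catalog API: the names FormField, MetadataForm, GLOSSARIES, and validate are hypothetical, and the Regions glossary is flattened to its leaf terms.

```python
from dataclasses import dataclass

@dataclass
class FormField:
    name: str
    type_name: str  # "Integer", "String", or the name of a glossary

@dataclass
class MetadataForm:
    name: str
    fields: list

# Glossary terms taken from the slide (hierarchy flattened for brevity).
GLOSSARIES = {"Regions": {"Singapore", "India", "Japan", "U.S.", "Canada"}}

SALES_IDENTIFIERS = MetadataForm(
    name="Sales identifiers",
    fields=[FormField("Sales region", "Regions"), FormField("Sales year", "Integer")],
)

def validate(form, values, glossaries):
    """Return a list of problems found when applying `values` to `form`."""
    errors = []
    for f in form.fields:
        if f.name not in values:
            errors.append(f"{f.name}: missing")
            continue
        v = values[f.name]
        if f.type_name == "Integer":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(v, int) or isinstance(v, bool):
                errors.append(f"{f.name}: expected Integer")
        elif f.type_name == "String":
            if not isinstance(v, str):
                errors.append(f"{f.name}: expected String")
        elif f.type_name in glossaries:
            # Glossary-typed fields only accept terms from that glossary.
            if v not in glossaries[f.type_name]:
                errors.append(f"{f.name}: not a term in glossary {f.type_name}")
        else:
            errors.append(f"{f.name}: unknown type {f.type_name}")
    return errors

# Values from the "U.S. sales data" asset on the slide.
us_sales_values = {"Sales region": "U.S.", "Sales year": 2021}
print(validate(SALES_IDENTIFIERS, us_sales_values, GLOSSARIES))
```

The point of the sketch is that a glossary-typed field constrains producers to a shared vocabulary, which is what keeps business context consistent across assets.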
Catalog diverse assets and automate context
Reduce the time spent on manual entry of data attributes in the data catalog
• Catalog different types of assets such as tables, dashboards, ML models, and many more with business context
• Directed search results with filters (facets)
• Search recommendations to find results quickly
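To make "filters (facets)" concrete, here is a minimal sketch of faceted search over an in-memory list of assets. The asset records, the field names, and the functions search and facet_counts are all hypothetical; the catalog's actual search behavior is not described in this handout.

```python
from collections import Counter

# Hypothetical catalog entries; only us_sales comes from the handout.
ASSETS = [
    {"name": "us_sales", "type": "table", "source": "AWS Glue"},
    {"name": "sales_dashboard", "type": "dashboard", "source": "BI tool"},
    {"name": "churn_model", "type": "ML model", "source": "Model registry"},
    {"name": "eu_sales", "type": "table", "source": "Amazon Redshift"},
]

def search(assets, text="", **facets):
    """Keep assets whose name contains `text` and whose fields match every facet."""
    text = text.lower()
    hits = []
    for asset in assets:
        if text and text not in asset["name"].lower():
            continue
        if all(asset.get(key) == value for key, value in facets.items()):
            hits.append(asset)
    return hits

def facet_counts(assets, facet):
    """Count how many assets fall under each value of a facet."""
    return Counter(asset[facet] for asset in assets if facet in asset)

print([a["name"] for a in search(ASSETS, "sales", type="table")])
print(facet_counts(search(ASSETS, "sales"), "type"))
```

Facet counts are what a search page typically shows beside each filter, so a user can narrow results without guessing which values exist.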
Asset detail page with technical and business metadata
Business label
Asset version
Description of asset
Type of assets
Data lineage
Data lineage in SageMaker Catalog
What’s available
• API-driven data lineage
• API support: OpenLineage-compliant
• Interactive visualization
• Automatically captures lineage
• Programmatically captures lineage of activities outside of SageMaker Catalog
• Asset and column lineage with version support
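Because the lineage API is described as OpenLineage-compliant, lineage for activities outside the catalog would be expressed as OpenLineage run events. The sketch below builds one such event as a plain dictionary using the core OpenLineage fields (eventType, eventTime, run, job, inputs, outputs, producer, schemaURL). The function name, the namespaces, the job and dataset names, the producer URL, and the spec version in schemaURL are assumptions for illustration; how the event is submitted to SageMaker Catalog is not covered in this handout and is left out.

```python
import json
import uuid
from datetime import datetime, timezone

def build_run_event(job_namespace, job_name, inputs, outputs, event_type="COMPLETE"):
    """Build a minimal OpenLineage run event as a JSON-serializable dict.

    `inputs` and `outputs` are lists of (namespace, name) pairs.
    """
    def dataset(namespace, name):
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [dataset(ns, n) for ns, n in inputs],
        "outputs": [dataset(ns, n) for ns, n in outputs],
        # Assumed values: identify your own emitter and the spec version you target.
        "producer": "https://example.com/hypothetical-etl",
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
    }

event = build_run_event(
    job_namespace="example-etl",          # hypothetical
    job_name="aggregate_us_sales",        # hypothetical
    inputs=[("example-glue", "us_sales")],
    outputs=[("example-glue", "sales_by_region")],
)
print(json.dumps(event, indent=2))
```

Each event links input datasets to output datasets through a job run, which is the information a lineage graph needs to draw an edge.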
Technical user: Impact analysis
Julia starts with the data asset she would like to modify. From that asset she views downstream dependencies to understand impact.
Julia
Data engineer
• Understand downstream dependencies
• See usage information of the data asset (e.g., used by 3 reports, 1 month ago)
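Julia's impact analysis amounts to a downstream traversal of the lineage graph. A minimal sketch follows, assuming the lineage is available as an adjacency mapping from each asset to the assets derived from it. The graph contents and the function name downstream_impact are hypothetical.

```python
from collections import deque

# Hypothetical lineage: asset -> assets that read from it.
DOWNSTREAM = {
    "us_sales": ["sales_by_region", "sales_forecast_features"],
    "sales_by_region": ["regional_sales_report"],
    "sales_forecast_features": ["forecast_model"],
    "regional_sales_report": [],
    "forecast_model": [],
}

def downstream_impact(graph, start):
    """Breadth-first walk returning {asset: hops from start} for every dependent."""
    distance = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in graph.get(node, []):
            # Skip assets already visited, and the start node itself if a cycle exists.
            if child not in distance and child != start:
                distance[child] = depth + 1
                queue.append((child, depth + 1))
    return distance

for asset, hops in downstream_impact(DOWNSTREAM, "us_sales").items():
    print(f"{asset} ({hops} hop(s) downstream)")
```

The hop count gives a rough ordering of who is affected first when the starting asset changes.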
Business user: Regulatory compliance
Susan starts at the report detail page and selects the report column in question. She looks upstream to see how the column was calculated and from which sources, so that she can respond to auditors’ queries.
Susan
Administrator
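Susan's question runs the other way: given one report column, which source columns feed it? The sketch below assumes column-level lineage is available as a mapping from each (asset, column) pair to its direct parents, and that the mapping has no cycles. All asset and column names other than us_sales, and both function names, are hypothetical.

```python
# Hypothetical column lineage: (asset, column) -> direct upstream columns.
COLUMN_SOURCES = {
    ("revenue_report", "net_revenue"): [
        ("sales_summary", "gross_revenue"),
        ("sales_summary", "refunds"),
    ],
    ("sales_summary", "gross_revenue"): [("us_sales", "amount")],
    ("sales_summary", "refunds"): [("us_returns", "refund_amount")],
}

def trace_upstream(column, sources, depth=0):
    """Return indented lines showing how a column is derived, parents below children."""
    lines = ["  " * depth + f"{column[0]}.{column[1]}"]
    for parent in sources.get(column, []):
        lines.extend(trace_upstream(parent, sources, depth + 1))
    return lines

def root_sources(column, sources):
    """Return the set of columns with no recorded parents that feed `column`."""
    parents = sources.get(column, [])
    if not parents:
        return {column}
    roots = set()
    for parent in parents:
        roots |= root_sources(parent, sources)
    return roots

print("\n".join(trace_upstream(("revenue_report", "net_revenue"), COLUMN_SOURCES)))
```

The indented trace answers "how was this calculated", while root_sources answers "from which sources" in a form that can be handed to an auditor.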
Data quality
Data quality
What we have available
• Integration with AWS Glue Data Quality through the data source
• APIs available for third-party and programmatic entry
• Data quality visualization with key grouping and filtering
• Asset and column view for data-quality scores
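The asset and column views of data-quality scores can be pictured as two aggregations over the same rule results. The sketch below assumes a score is simply the percentage of rules that passed; the handout does not state how scores are computed, and the rule names, result records, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rule results; column is None for asset-level rules.
RULE_RESULTS = [
    {"column": "sales_year", "rule": "not_null", "passed": True},
    {"column": "sales_year", "rule": "in_range", "passed": True},
    {"column": "sales_region", "rule": "not_null", "passed": True},
    {"column": "sales_region", "rule": "in_glossary", "passed": False},
    {"column": None, "rule": "row_count_positive", "passed": True},
]

def score(results):
    """Percentage of passed rules, or None when there are no rules."""
    if not results:
        return None
    return round(100 * sum(r["passed"] for r in results) / len(results), 1)

def column_scores(results):
    """Group rule results by column and score each group."""
    by_column = defaultdict(list)
    for r in results:
        if r["column"] is not None:
            by_column[r["column"]].append(r)
    return {column: score(rs) for column, rs in by_column.items()}

print("asset score:", score(RULE_RESULTS))
print("column scores:", column_scores(RULE_RESULTS))
```

Keeping the raw rule results and deriving both views from them means a low asset score can always be traced to the column and rule that caused it.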
Data products
Data product: Group assets into curated packages
Data product features
• Purpose-driven: bundles assets, classifies
Without data products
Asset 1 Asset 1 Asset 1
Security
Without fine-grained access control
With fine-grained access control
Column Filter
With fine-grained access control
Column Filter
Row Filter
2 234567891 7522298205 23 6 4 Europe
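The combined effect of a column filter and a row filter can be illustrated on a small in-memory table. This only demonstrates the semantics (which columns and which rows a subscriber ends up seeing); it is not how the service enforces access. The table contents, column names, and the function apply_filters are hypothetical.

```python
# Hypothetical table; the column names are assumptions, not taken from the slide.
ROWS = [
    {"order_id": 1, "customer_phone": "555-0100", "amount": 120, "region": "Americas"},
    {"order_id": 2, "customer_phone": "555-0101", "amount": 75, "region": "Europe"},
    {"order_id": 3, "customer_phone": "555-0102", "amount": 40, "region": "Asia"},
    {"order_id": 4, "customer_phone": "555-0103", "amount": 310, "region": "Europe"},
]

def apply_filters(rows, allowed_columns, row_predicate):
    """Keep rows matching the row filter, then project onto the allowed columns."""
    return [
        {column: row[column] for column in allowed_columns if column in row}
        for row in rows
        if row_predicate(row)
    ]

# Column filter: hide customer_phone. Row filter: only the Europe region.
visible = apply_filters(
    ROWS,
    allowed_columns=["order_id", "amount", "region"],
    row_predicate=lambda row: row["region"] == "Europe",
)
print(visible)
```

The row filter is evaluated on the full row before projection, so a filter can depend on a column that the subscriber is not allowed to see.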
Publish and subscribe
Reference Architecture
Accounts: producer associated account, central governance account, consumer associated account
Corporate Domain
Domain units: Claims Domain Unit, Marketing Domain Unit
AWS Glue Data Catalog
End-to-end demo
Please complete the session survey
Thank you!
Leonardo Gomez
Principal Data Governance Specialist
Data & AI Governance
AWS