
Building Medallion Architectures

The document discusses Medallion Architecture, a three-layered data management framework that enhances data structure and quality through Bronze, Silver, and Gold layers. It addresses implementation challenges, best practices, and the evolving nature of data management, particularly with the rise of large language models. The author emphasizes the importance of governance, flexibility, and consensus in scaling and adapting the architecture to meet enterprise needs.

Uploaded by

Marcio Franca
Building Medallion Architectures

Piethein Strengholt
Piethein Strengholt
Introducing myself

Enterprise Architect for Data & AI

O’Reilly Author - Data Management at Scale & Building Medallion Architectures

Blogger:
https://www.linkedin.com/in/pietheinstrengholt
https://piethein.medium.com
Why did I write a book about this?
Motivations for Writing a Book to Demystify Implementation Challenges and Best Practices

• Ambiguity surrounding what Medallion Architecture is and isn’t.
• Endless conversations about the objectives of each layer.
• Practitioners find it difficult to implement this architecture correctly.
• Limited resources available for best practices, examples, code snippets, etc.
• Evolving concept with large language models (LLMs) and the need for managing unstructured data.
• Complexity increases when implementing this architecture at scale.
What is a Medallion Architecture?
Medallion architecture arranges data into three layers, enhancing the data’s structure
and quality as it progresses through the layers.

[Figure: Sources feed data ingestion & orchestration services into a data platform. The platform has a compute layer (Spark) on top of data layering (Bronze, Silver, Gold). Consumers are apps, users, and operational systems.]
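As a rough illustration of data gaining structure and quality as it moves through Bronze, Silver, and Gold, here is a minimal sketch in plain Python rather than Spark. The order records, field names, and the three helper functions are hypothetical and chosen only for this example; a real platform would typically run comparable steps as Spark jobs over tables.

```python
from collections import defaultdict

# Hypothetical raw records, as they might arrive from a source system.
raw_orders = [
    {"order_id": "1", "customer": " alice ", "amount": "10.50"},
    {"order_id": "2", "customer": "Bob", "amount": "not-a-number"},
    {"order_id": "3", "customer": "alice", "amount": "4.50"},
]


def to_bronze(records):
    """Bronze: keep the records exactly as delivered (copied, never edited)."""
    return [dict(r) for r in records]


def to_silver(bronze_rows):
    """Silver: clean and type the data; rows with an unparsable amount are dropped."""
    cleaned = []
    for row in bronze_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # assumption: invalid rows are filtered out, not repaired
        cleaned.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().lower(),
            "amount": amount,
        })
    return cleaned


def to_gold(silver_rows):
    """Gold: a business-level view, here the total amount per customer."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


bronze = to_bronze(raw_orders)
silver = to_silver(bronze)
gold = to_gold(silver)
```

Each function reads only from the layer before it, so a Silver or Gold result can be rebuilt from Bronze at any time.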
The three-layered design isn’t new
Separating ingestion, transformation and consumption has been a best practice for many years
[Figure: Sources feed a staging area (ingest), then an ODS and data warehouse (transform), then data marts (consume), each of which serves consuming services.]
More detailed examination of the Medallion Architecture
A typical Medallion architecture is usually extended with extra components

[Figure: Sources deliver into landing zones and are ingested into Bronze, Silver, and Gold layers stored in Delta Lake format, with a compute layer (Spark) on top. Supporting components are a metastore, a DQ store, a catalog, and orchestration and monitoring services. Consumers are apps, users, and operational systems.]
Detailed examination of the three layers
Although the three-layered design is common and well-known, there are many discussions on
the scope, purpose, and best practices for each of these layers.

Bronze layer: typically raw, "as-is"
• Maintains the raw state in the structure “as-is”
• Mostly different formats
• Data is immutable (read-only)
• Delivery-based partitioned tables, i.e., YYYYMMDD
• Used for debugging, testing

Silver layer: cleaned, filtered
• Usually only functional data
• Filtered and cleaned
• Historization is merged (SCD2)
• Delta format
• Usually enriched with reference data
• Typically source-oriented
• Usually used by operational analytical teams

Gold layer: refined business-level
• What enterprises call data products
• Data is highly governed and well-documented
• Historization differs per use case
• Contains complex business rules, such as calculations and enrichments
• Delta format
• Might contain additional sub layers for sharing or distributing data
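Two of the layer characteristics above, delivery-based partitions named YYYYMMDD in Bronze and merged SCD2 historization in Silver, can be sketched in plain Python. This is a simplified illustration under stated assumptions, not how Delta Lake implements a merge: the function names, the `id` key, and the `valid_from` / `valid_to` columns are hypothetical, and keys missing from a delivery are simply left open.

```python
from datetime import date


def bronze_partition(delivery: date) -> str:
    """Name a Bronze partition after its delivery date, in YYYYMMDD form."""
    return delivery.strftime("%Y%m%d")


def scd2_merge(history, snapshot, load_date, key="id"):
    """Merge one delivery into an SCD2 history and return a new list.

    A row with valid_to=None is the current version of its key.
    A changed key gets its current row closed and a new row opened;
    an unchanged key produces no new row.
    """
    merged = [dict(r) for r in history]  # copy, so the input is not mutated
    current = {r[key]: r for r in merged if r["valid_to"] is None}
    for rec in snapshot:
        old = current.get(rec[key])
        if old is not None:
            old_attrs = {k: v for k, v in old.items()
                         if k not in ("valid_from", "valid_to")}
            if old_attrs == rec:
                continue  # nothing changed for this key
            old["valid_to"] = load_date  # close the previous version
        merged.append({**rec, "valid_from": load_date, "valid_to": None})
    return merged


# Hypothetical deliveries for a single customer record.
history = scd2_merge([], [{"id": 1, "city": "Utrecht"}], date(2024, 1, 1))
history = scd2_merge(history, [{"id": 1, "city": "Amsterdam"}], date(2024, 2, 1))
```

After the second delivery the history holds two rows for key 1: the Utrecht row closed on 2024-02-01 and the Amsterdam row still open.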
How the Medallion architecture could look in practice
The key is understanding the strengths and limitations of each layer, which can be
adapted to better align with operational realities and strategic goals

[Figure: For each source, data flows from a landing zone through ingest into a sequence of stages: raw export, validated Delta tables, cleaned tables, historized tables, curated data, and data marts. The stages are grouped under Bronze, Silver, and Gold, stored in Delta Lake format, with a compute layer (Spark) on top and a governance boundary marked across the flow.]
Now we know what Medallion Architecture is.

Let's answer the question: Can you have multiple Medallion architectures? And if so, what happens then?
Let’s start with the consuming side

[Figure: Three separate platforms, each with a compute layer and data layering, each serving its own apps, users, and operational systems.]
Bring in the consumers (that usually only need data)
[Figure: The same three platforms, now joined by additional apps, users, and operational systems. Annotation: consumers that are not producing the data themselves.]
Some logical grouping

[Figure: Several platforms (each a compute layer with data layering) arranged into logical groups, serving apps, users, and operational systems.]
You can add nuances to the three-layered design

[Figure: The grouped platforms again, with some groups containing a different number of compute layer and data layering blocks, serving apps, users, and operational systems.]
Another best practice: separating use cases from consumption
[Figure: Platforms (each a compute layer with data layering) split so that use cases sit apart from consumption, with apps, users, and operational systems attached to the consuming side.]
Scaling problem…
[Figure: A growing number of platforms (each a compute layer with data layering) connected to one another and to apps, users, and operational systems.]
Adding extra data domains for addressing complexity

[Figure: Additional platforms (each a compute layer with data layering) placed between the existing ones and their apps, users, and operational systems.]
These are your data domains!
Solid data governance is the way to go!
[Figure: A governance layer spanning all data domains, with capabilities including Marketplace, Reference & master data, Data contracts, Traceability (Lineage), Observability, Modelling, … Below it sit the domains (each a compute layer with data layering) and their apps, users, and operational systems.]
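Data contracts are one of the governance capabilities named on this slide. As a minimal sketch of the idea, assuming a contract is nothing more than an agreed set of column names and types, a consuming domain could check incoming rows as shown below. The `CONTRACT` mapping, its columns, and `contract_violations` are hypothetical names for illustration; real data contracts usually cover more, such as ownership, quality rules, and service levels.

```python
# Hypothetical contract between a producing and a consuming domain:
# column name -> expected Python type.
CONTRACT = {"customer_id": int, "country": str, "revenue": float}


def contract_violations(row, contract=CONTRACT):
    """Return a list of human-readable problems; an empty list means the row conforms."""
    problems = []
    for column, expected in contract.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected):
            actual = type(row[column]).__name__
            problems.append(f"{column}: expected {expected.__name__}, got {actual}")
    for column in row:
        if column not in contract:
            problems.append(f"unexpected column: {column}")
    return problems


good_row = {"customer_id": 1, "country": "NL", "revenue": 9.5}
bad_row = {"customer_id": "1", "country": "NL"}

good_result = contract_violations(good_row)
bad_result = contract_violations(bad_row)
```

Checking at the boundary between domains keeps a schema change in one domain from silently breaking the consumers of another.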
Conclusion

• Medallion architecture provides a starting point, but enterprise guidance is crucial for governance, interoperability, and scalability.
• Avoid framing the three-layered design as a physical design; nuances must be made based on specific requirements.
• Medallion architecture is a modular and flexible framework.
• Numerous standards are necessary for cataloging data products, data ownership, data sharing, lineage collection, data interoperability, etc.
• Consensus is required on the number of domains, the size of domains, and other related aspects.
• A large-scale architecture necessitates additional data domains to facilitate repetitive integration efforts.
