Building Medallion Architectures
Piethein Strengholt
Introducing myself
Enterprise Architect for Data & AI
O’Reilly author: Data Management at Scale and Building Medallion Architectures
Blogger:
https://www.linkedin.com/in/pietheinstrengholt
https://piethein.medium.com
Why did I write a book about this?
Motivations for Writing a Book to Demystify Implementation Challenges and Best Practices
• Ambiguity surrounding what Medallion Architecture is and isn’t.
• Endless conversations about the objectives of each layer.
• Practitioners find it difficult to implement this architecture correctly.
• Limited resources available for best practices, examples, code snippets, etc.
• Evolving concept with large language models (LLMs) and the need for managing
unstructured data.
• Complexity increases when implementing this architecture at scale.
What is a Medallion Architecture?
Medallion architecture arranges data into three layers, enhancing the data’s structure
and quality as it progresses through the layers.
[Diagram: sources (operational systems) feed data ingestion and orchestration services into a data platform, where a compute layer (Spark) processes the data through the Bronze, Silver and Gold layers for consumers (apps, users).]
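As a rough illustration of the layering idea, the sketch below moves a few records through Bronze, Silver and Gold using plain Python structures instead of Spark. The record fields (`customer`, `amount`), the function names and the cleaning rules are assumptions made for this example and do not come from the talk.

```python
from collections import defaultdict

def to_bronze(raw_rows, delivery_date):
    """Keep rows untouched, tagged with a delivery partition (YYYYMMDD)."""
    return [{"delivery": delivery_date, "payload": dict(row)} for row in raw_rows]

def to_silver(bronze_rows):
    """Clean and filter: drop incomplete rows, normalize names and types."""
    silver = []
    for entry in bronze_rows:
        row = entry["payload"]
        if row.get("amount") in (None, ""):
            continue  # filtered out: no amount delivered
        silver.append({
            "customer": row["customer"].strip().lower(),
            "amount": float(row["amount"]),
            "delivery": entry["delivery"],
        })
    return silver

def to_gold(silver_rows):
    """Business-level view: total amount per customer."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)

# Hypothetical sample delivery
raw = [
    {"customer": " Alice ", "amount": "10.5"},
    {"customer": "bob", "amount": ""},
    {"customer": "alice", "amount": "4.5"},
]
bronze = to_bronze(raw, "20240101")
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Each function only reads the output of the previous layer, which mirrors the idea that structure and quality improve step by step while the raw delivery stays available in Bronze.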
The three-layered design isn’t new
Separating ingestion, transformation and consumption has been a best practice for
many years.
[Diagram: traditional layered designs. Sources are ingested into a staging area or ODS, transformed in a data warehouse, and consumed through data marts by consuming services.]
More detailed examination of the Medallion Architecture
A Medallion architecture is usually extended with extra components
[Diagram: sources deliver to landing zones and are ingested into the Bronze, Silver and Gold layers (Delta Lake format) on a compute layer (Spark). Supporting components are a metastore, a DQ store, a catalog, and orchestration and monitoring services. Consumers are apps, users and operational systems.]
Detailed examination of the three layers
Although the three-layered design is common and well known, there is much debate about
the scope, purpose, and best practices of each layer.
Bronze layer: typically raw, "as-is"
• Maintains the raw state in the structure “as-is”
• Mostly different formats
• Data is immutable (read-only)
• Delivery-based partitioned tables, i.e., YYYYMMDD
• Used for debugging, testing

Silver layer: cleaned, filtered
• Usually only functional data
• Filtered and cleaned
• Historization is merged (SCD2)
• Delta format
• Usually enriched with reference data
• Typically source-oriented
• Usually used by operational analytical teams

Gold layer: refined business-level
• What enterprises call data products
• Data is highly governed and well-documented
• Historization differs per use case
• Contains complex business rules, such as calculations and enrichments
• Delta format
• Might contain additional sub layers for sharing or distributing data
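The Silver layer is described as the place where historization is merged (SCD2). A minimal sketch of that merge in plain Python follows. The column names `valid_from` and `valid_to`, the function name `scd2_merge` and the sample data are assumptions for illustration; in a Delta Lake setup this logic would typically be expressed as a merge on the table rather than as a Python loop.

```python
from datetime import date

OPEN_END = None  # marks the version that is still current

def scd2_merge(history, snapshot, key, load_date):
    """Return a new history after applying one snapshot, SCD2 style.

    history:  rows with the key, attributes, valid_from and valid_to
    snapshot: rows with the key and the current attribute values
    """
    result = [dict(row) for row in history]  # do not mutate the input
    current = {row[key]: row for row in result if row["valid_to"] is OPEN_END}
    for row in snapshot:
        existing = current.get(row[key])
        if existing is not None:
            unchanged = all(existing.get(col) == val for col, val in row.items())
            if unchanged:
                continue  # nothing new to record
            existing["valid_to"] = load_date  # close the old version
        result.append({**row, "valid_from": load_date, "valid_to": OPEN_END})
    return result

# Hypothetical example: one changed record and one new record
history = [
    {"id": 1, "city": "Utrecht", "valid_from": date(2024, 1, 1), "valid_to": None},
]
snapshot = [
    {"id": 1, "city": "Amsterdam"},
    {"id": 2, "city": "Rotterdam"},
]
history = scd2_merge(history, snapshot, key="id", load_date=date(2024, 2, 1))
for row in history:
    print(row)
```

This sketch ignores deletions in the source; whether a missing key should close the current version is a design decision that depends on how deliveries are defined.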
How the Medallion architecture could look in practice
The key is understanding the strengths and limitations of each layer, which can be
adapted to better align with operational realities and strategic goals
[Diagram: inside a governance boundary, each source is ingested (in most cases via a landing zone) and passes through the stages raw export, validated Delta, cleaned tables, historized tables, curated data and data mart. The stages span the Bronze, Silver and Gold layers, all in Delta Lake format on a compute layer (Spark).]
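The stage names in the diagram suggest a chain of small, single-purpose steps. As one possible reading of the step from raw export to validated data, the sketch below splits a delivery into validated rows and rejected rows with a reason, which is the kind of result a DQ store could hold. The function name, the rule (required fields must be present and non-empty) and the sample rows are assumptions, not something the slides specify.

```python
def validate(rows, required_fields):
    """Split rows into validated rows and rejected rows with a reason."""
    validated, rejected = [], []
    for row in rows:
        missing = [f for f in required_fields if row.get(f) in (None, "")]
        if missing:
            rejected.append({"row": row, "reason": "missing: " + ", ".join(missing)})
        else:
            validated.append(row)
    return validated, rejected

# Hypothetical raw export
raw_export = [
    {"order_id": "1", "customer": "alice"},
    {"order_id": "", "customer": "bob"},
    {"customer": "carol"},
]
validated, rejected = validate(raw_export, required_fields=["order_id", "customer"])
print(len(validated), "validated,", len(rejected), "rejected")
```

Keeping rejected rows together with a reason, instead of silently dropping them, makes it possible to trace quality problems back to a specific delivery.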
Now we know what Medallion Architecture is.
Let's answer the question: Can you have
multiple Medallion architectures? And if so,
what happens then?
Let’s start with the consuming side
[Diagram: three separate Medallion architectures, each with its own compute layer and data layering, each serving apps, users and operational systems.]
Bring in the consumers (that usually only need data)
Consumers that are not producing the data themselves
[Diagram: the three Medallion architectures (compute layer plus data layering), now surrounded by additional apps, users and operational systems that only consume data.]
Some logical grouping
[Diagram: the Medallion architectures (compute layer plus data layering) and their consuming apps, users and operational systems arranged into logical groups.]
You can add nuance to the three-layered design
[Diagram: the grouped landscape of compute layer and data layering blocks serving apps, users and operational systems, with variations in how the layers are applied.]
Another best practice: separating use cases from consumption
[Diagram: additional compute layer and data layering blocks dedicated to use cases, kept separate from the apps, users and operational systems that consume the data.]
Scaling problem…
[Diagram: a growing number of interconnected compute layer and data layering blocks feeding apps, users and operational systems.]
Adding extra data domains for addressing complexity
[Diagram: further compute layer and data layering blocks added as extra data domains alongside the existing blocks, serving apps, users and operational systems.]
These are your data domains!
Solid data governance is the way to go!
Governance capabilities: marketplace, reference & master data, data contracts, traceability (lineage), observability, modelling, …
[Diagram: these capabilities span all data domains (compute layer plus data layering) and their consumers: apps, users and operational systems.]
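Data contracts are one of the governance capabilities named on this slide, without further detail. As a hedged illustration of what a contract check between two domains could look like, the sketch below treats a contract as a mapping from column name to expected Python type. The contract format, the name `check_contract` and the sample rows are hypothetical and do not represent a standard or the author's method.

```python
# Hypothetical contract agreed between a producing and a consuming domain
CONTRACT = {"order_id": int, "customer": str, "amount": float}

def check_contract(rows, contract):
    """Return a list of (row index, column, problem) for every violation."""
    violations = []
    for index, row in enumerate(rows):
        for column, expected in contract.items():
            if column not in row:
                violations.append((index, column, "missing"))
            elif not isinstance(row[column], expected):
                violations.append((index, column, f"expected {expected.__name__}"))
    return violations

# Hypothetical rows offered by the producing domain
offered = [
    {"order_id": 1, "customer": "alice", "amount": 10.0},
    {"order_id": "2", "customer": "bob"},
]
problems = check_contract(offered, CONTRACT)
for problem in problems:
    print(problem)
```

Running such a check at the boundary of a domain, before data is shared, keeps consumers from depending on undocumented changes in the producer's tables.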
Conclusion
• Medallion architecture provides a starting point, but enterprise guidance is crucial for governance,
interoperability, and scalability.
• Avoid treating the three-layered design as a physical design; adapt it to specific
requirements.
• Medallion architecture is a modular and flexible framework.
• Numerous standards are necessary for cataloging data products, data ownership, data sharing, lineage
collection, data interoperability, etc.
• Consensus is required on the number of domains, the size of domains, and other related aspects.
• A large-scale architecture necessitates additional data domains to facilitate repetitive integration efforts.