The medallion architecture is a simple idea: organize your lakehouse into progressively refined layers. Raw data lands in bronze, gets cleaned and conformed in silver, and is shaped for consumption in gold. The simplicity is the point, but the details decide whether it scales.
What each layer is really for
- Bronze: an immutable record of source data, as-is. Cheap to store, easy to reprocess.
- Silver: cleaned, de-duplicated, conformed. The layer most pipelines actually build on.
- Gold: business-level aggregates and models, shaped for specific consumers.
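To make the three layers concrete, here is a minimal sketch in plain Python. It is an illustration rather than a real lakehouse pipeline: the order records, the field names, and the `to_silver` and `to_gold` helpers are all hypothetical, and in-memory lists stand in for tables.

```python
from collections import defaultdict
from decimal import Decimal

# Bronze: hypothetical raw records, kept exactly as the source delivered them.
bronze = [
    {"order_id": "1", "customer": " Alice ", "amount": "10.50"},
    {"order_id": "1", "customer": " Alice ", "amount": "10.50"},  # duplicate delivery
    {"order_id": "2", "customer": "bob", "amount": "4.00"},
    {"order_id": "3", "customer": "ALICE", "amount": "2.25"},
]


def to_silver(rows):
    """Clean, conform types, and de-duplicate on order_id (first record wins)."""
    seen = {}
    for row in rows:
        order_id = int(row["order_id"])
        if order_id in seen:
            continue
        seen[order_id] = {
            "order_id": order_id,
            "customer": row["customer"].strip().lower(),
            "amount_cents": int(Decimal(row["amount"]) * 100),
        }
    return list(seen.values())


def to_gold(rows):
    """Aggregate silver rows into revenue per customer for one consumer."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["customer"]] += row["amount_cents"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Note that `bronze` is never modified. Each downstream layer is derived from the one before it, so any layer can be rebuilt from its input.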
Where teams go wrong
The most common mistake is putting business logic in bronze, which couples ingestion to interpretation and makes reprocessing painful. Keep bronze dumb. Push transformation downstream where it can evolve.
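One way to picture "dumb" bronze is the sketch below, which assumes JSON payloads from a hypothetical `orders_api` source. The function names and the cancelled-order rule are invented for illustration. Ingestion stores the payload untouched plus metadata, and every interpretation lives in a silver function that can be replaced later.

```python
import json
from datetime import datetime, timezone


def ingest_to_bronze(raw_payload: str, source: str) -> dict:
    """Store the payload exactly as received, plus ingestion metadata only."""
    return {
        "source": source,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "payload": raw_payload,  # untouched: no parsing, filtering, or renaming
    }


def silver_v1(bronze_rows):
    """First interpretation (hypothetical): every order counts."""
    return [json.loads(r["payload"]) for r in bronze_rows]


def silver_v2(bronze_rows):
    """Revised interpretation (hypothetical): cancelled orders are excluded."""
    parsed = (json.loads(r["payload"]) for r in bronze_rows)
    return [p for p in parsed if p.get("status") != "cancelled"]


bronze_rows = [
    ingest_to_bronze('{"order_id": 1, "status": "paid"}', "orders_api"),
    ingest_to_bronze('{"order_id": 2, "status": "cancelled"}', "orders_api"),
]

# The rule changed, so silver is rebuilt from the same bronze rows.
print(len(silver_v1(bronze_rows)), len(silver_v2(bronze_rows)))
```

Because the cancelled order is still present in bronze, switching from `silver_v1` to `silver_v2` is a reprocessing job, not a re-ingestion. Had the filter lived in the ingestion step, reversing it would mean going back to the source system.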
The second mistake is treating gold as a dumping ground. Gold tables should be intentional and owned, with clear contracts for the teams that consume them.
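What a contract looks like varies by team and tooling. As one possible shape, the sketch below models a contract as a small Python object that names an owner, the consumers, and the expected columns, and checks rows against it. `GoldContract`, `violations`, and the table and team names are hypothetical, not part of any particular platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldContract:
    table: str
    owner: str                 # team accountable for the table
    consumers: tuple[str, ...]  # teams that depend on it
    columns: dict[str, type]    # column name -> expected Python type


def violations(contract: GoldContract, rows: list[dict]) -> list[str]:
    """Return human-readable contract violations; an empty list means OK."""
    problems = []
    for i, row in enumerate(rows):
        missing = contract.columns.keys() - row.keys()
        extra = row.keys() - contract.columns.keys()
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            problems.append(f"row {i}: unexpected columns {sorted(extra)}")
        for name, expected in contract.columns.items():
            if name in row and not isinstance(row[name], expected):
                problems.append(f"row {i}: {name} is not {expected.__name__}")
    return problems


contract = GoldContract(
    table="gold.revenue_by_customer",
    owner="finance-data",
    consumers=("billing", "exec-dashboard"),
    columns={"customer": str, "revenue_cents": int},
)

good = [{"customer": "alice", "revenue_cents": 1275}]
bad = [{"customer": "bob", "revenue_cents": "400", "notes": "temp"}]

print(violations(contract, good))
print(violations(contract, bad))
```

A check like this could run before a gold table is published, so that an unplanned column or a type change surfaces with the owning team instead of in a consumer's dashboard.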