Lakehouse migration: from Databricks to Snowflake for a European media company

The situation

A European media company had built its analytics platform on Databricks. It had worked during a period of rapid growth, but by the time I came in, monthly infrastructure cost was running at €5K, the data was disorganized, and the platform was ungoverned. The cost was high because clusters ran around the clock instead of spinning up on demand. Many data scientists worked off 600+ tables spread across Delta Lake, Parquet, and JSON formats, with no central catalog and no clear lineage.

The main problem was trust. Quarterly reports took two hours and the numbers sometimes differed depending on who ran them, so teams had started routing around the platform instead of using it. That is usually the point at which management notices a problem.

What I did

Assessment

Profiled all 200+ notebooks to understand compute vs. storage usage patterns. Built a cost model comparing Databricks and Snowflake under realistic usage assumptions. Designed a hybrid architecture: Snowflake as the compute layer over the existing storage, which avoided duplicating data and reduced migration risk.
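
As an illustration of the kind of comparison a cost model like this makes, here is a minimal Python sketch. It is not the model from the project: the `Workload` class, the function names, the hourly rates, and the active hours are all hypothetical placeholders chosen to show the shape of the calculation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    name: str
    active_hours_per_day: float  # hours per day the workload actually needs compute


def always_on_monthly_cost(hourly_rate: float, days: int = 30) -> float:
    """Cost of one cluster that runs 24 hours a day, whether or not it is used."""
    return hourly_rate * 24 * days


def on_demand_monthly_cost(workloads: List[Workload], hourly_rate: float, days: int = 30) -> float:
    """Cost when compute is billed only for the hours each workload is active."""
    return sum(w.active_hours_per_day * hourly_rate * days for w in workloads)


# Hypothetical inputs, not figures from the project.
workloads = [Workload("etl", 4.0), Workload("analytics", 6.0), Workload("reporting", 1.0)]
always_on = always_on_monthly_cost(hourly_rate=4.0)
on_demand = on_demand_monthly_cost(workloads, hourly_rate=6.0)
saving = 1 - on_demand / always_on
print(f"always-on: {always_on:.0f}, on-demand: {on_demand:.0f}, saving: {saving:.0%}")
```

The point of the sketch is that a higher hourly rate can still come out cheaper if compute is only billed while workloads are active. A realistic model would also need storage, data transfer, and idle time before auto-suspend.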

Foundation and tooling

Set up multi-cluster Snowflake warehouses with separate resource pools for ETL, analytics, and reporting workloads. Built migration pipelines with dbt and Airflow. Added data validation and reconciliation at each step. This is the step where migrations most often go wrong: teams assume the data arrived correctly without checking.
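
A reconciliation check of this kind can be sketched in plain Python. This is a simplified illustration, not the project's tooling: `table_fingerprint` and `reconcile` are hypothetical helpers, and the rows are assumed to have already been fetched from each platform and normalized to the same Python types.

```python
import hashlib
from typing import Dict, Iterable, Sequence, Tuple


def table_fingerprint(rows: Iterable[Sequence]) -> Tuple[int, str]:
    """Return (row_count, checksum) for a table, independent of row order.

    Each row is hashed on its own and the hashes are summed modulo 2**256,
    so the same rows in a different order give the same checksum, while
    duplicated or missing rows change it.
    """
    count = 0
    total = 0
    for row in rows:
        row_hash = hashlib.sha256(repr(tuple(row)).encode("utf-8")).digest()
        total = (total + int.from_bytes(row_hash, "big")) % (2 ** 256)
        count += 1
    return count, f"{total:064x}"


def reconcile(source_rows: Iterable[Sequence], target_rows: Iterable[Sequence]) -> Dict[str, bool]:
    """Compare source and target tables by row count and checksum."""
    src_count, src_sum = table_fingerprint(source_rows)
    tgt_count, tgt_sum = table_fingerprint(target_rows)
    return {
        "counts_match": src_count == tgt_count,
        "checksums_match": src_sum == tgt_sum,
    }


# Made-up example rows: same content, different order.
source = [(1, "article", 120), (2, "video", 45)]
target = [(2, "video", 45), (1, "article", 120)]
print(reconcile(source, target))
```

In practice, values usually need normalizing before hashing (for example timestamps, decimals, and nulls), because the two platforms can represent the same value differently.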

Incremental migration

Started with high-traffic datasets: user behavior and content metadata. Both platforms ran in parallel through the transition. I decommissioned Databricks notebooks gradually as Snowflake equivalents proved out, rather than doing a hard cutover.

Optimization

Right-sized compute based on actual usage patterns. Added materialized views for common reporting aggregations and set up cost monitoring and alerts.
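
As a rough illustration of a cost alert, the sketch below flags days whose spend runs well above the daily share of a monthly budget. The `spend_alerts` function, the budget, the tolerance, and the daily figures are all hypothetical and not taken from the project. A real setup would read spend from the platform's usage data instead of a hard-coded dictionary.

```python
from typing import Dict, List


def spend_alerts(
    daily_spend: Dict[str, float],
    monthly_budget: float,
    days_in_month: int = 30,
    tolerance: float = 1.2,
) -> List[str]:
    """Return the dates whose spend exceeds tolerance times the daily budget share."""
    threshold = tolerance * monthly_budget / days_in_month
    return sorted(day for day, spend in daily_spend.items() if spend > threshold)


# Hypothetical numbers for illustration only.
daily_spend = {"2024-01-01": 90.0, "2024-01-02": 150.0, "2024-01-03": 110.0}
flagged = spend_alerts(daily_spend, monthly_budget=3000.0)
print(flagged)
```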

Results

Monthly infrastructure cost dropped from €5K to around €3K (roughly a 40% reduction). Most of the savings came from moving away from always-on clusters to Snowflake’s per-second billing.

Query performance roughly tripled on typical analytical workloads. Quarterly reports went from two hours to fifteen minutes. Concurrent analyst capacity doubled.

Data lineage is now fully tracked and auditable. There’s a central catalog, and the self-service analytics that had been the original promise of the Databricks setup started getting used once people could find data and trust it.

What I’d do differently

I spent more time on the cost model at the start than I needed to. The architecture decision was right, but I could have reached it faster and spent that time on better tooling for schema evolution edge cases. There were Delta Lake schema changes mid-migration that required manual intervention I hadn’t fully planned for.
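
The kind of tooling I mean can be as simple as a schema diff run before each migration batch. The sketch below is an illustration under assumptions, not something used on the project: `schema_diff` is a hypothetical helper, and each schema is assumed to be available as a plain mapping of column name to type name.

```python
from typing import Dict, List


def schema_diff(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Report columns added, removed, or retyped between two schema snapshots."""
    old_cols, new_cols = set(old), set(new)
    return {
        "added": sorted(new_cols - old_cols),
        "removed": sorted(old_cols - new_cols),
        "changed": sorted(c for c in old_cols & new_cols if old[c] != new[c]),
    }


# Made-up snapshots of one table before and after a mid-migration change.
before = {"user_id": "bigint", "event_ts": "timestamp", "duration": "int"}
after = {"user_id": "bigint", "event_ts": "timestamp", "duration": "double", "device": "string"}

drift = schema_diff(before, after)
if any(drift.values()):
    print("schema drift detected:", drift)
```

Failing the pipeline run when the diff is non-empty turns a silent mismatch into an explicit decision about how to handle the change.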

Client details anonymized. Metrics are from the actual project.

Weighing a Databricks-to-Snowflake move and want the trade-offs mapped honestly before you commit? That is the kind of review I run. Get in touch.