Jonan Harrewijn

The Problem With Traditional ETL

Every new data source means new custom code. Every schema change breaks something. Every deployment is a manual ritual. Sound familiar?

Copy-Paste Engineering

Each new source gets a near-identical notebook. Multiply that by 10 clients and maintenance becomes a nightmare. One schema change means editing every file by hand.

Silent Failures

Data quality errors get buried. A null value slips through, a format changes upstream, and nobody knows until the report is wrong. By then the damage is done.

Risky Deployments

Promoting from DEV to production is a manual checklist. Changes are undocumented. Rollbacks are painful. Every release is a gamble.

The Framework

Four layers. One consistent flow. Configured in code — not wired by hand every time.

Layer 1: The Registry

The single source of truth for your entire data platform. Define each source table once — schema, column mappings, data quality rules, merge strategy, Gold transformations, dependencies — all in one Python object.

✓ Add a new source with a config entry, not a new notebook
✓ Schema changes propagate automatically
✓ Every pipeline behaviour version-controlled and auditable
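To make "one Python object per source" concrete, here is a minimal sketch of what a Registry entry could look like. The class names, field names and the `validate` helper are hypothetical illustrations, not the framework's actual API. The point is that configuration mistakes can be caught before any pipeline runs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical shapes; the framework's real Registry schema is not published here.
@dataclass(frozen=True)
class DQRule:
    column: str
    rule_type: str                      # e.g. "regex", "range", "unique", "bsn_checksum"
    params: Dict[str, object] = field(default_factory=dict)
    blocking: bool = True               # blocking rules stop the load; others only warn

@dataclass(frozen=True)
class SourceTable:
    name: str
    schema: Dict[str, str]              # Silver column -> target type
    column_mappings: Dict[str, str]     # source column -> Silver column
    merge_strategy: str = "MERGE"       # OVERWRITE | MERGE | MERGE_HASH | SCD2
    business_keys: List[str] = field(default_factory=list)
    dq_rules: List[DQRule] = field(default_factory=list)
    depends_on: List[str] = field(default_factory=list)
    gold_transform: Optional[str] = None  # name of a Gold transformation, if any

VALID_STRATEGIES = {"OVERWRITE", "MERGE", "MERGE_HASH", "SCD2"}

def validate(entry: SourceTable) -> List[str]:
    """Return configuration problems for one entry (empty list = valid)."""
    problems = []
    if entry.merge_strategy not in VALID_STRATEGIES:
        problems.append(f"unknown merge strategy {entry.merge_strategy!r}")
    if entry.merge_strategy != "OVERWRITE" and not entry.business_keys:
        problems.append("merge strategies need business_keys")
    unmapped = set(entry.column_mappings.values()) - set(entry.schema)
    if unmapped:
        problems.append(f"mapped columns missing from schema: {sorted(unmapped)}")
    for rule in entry.dq_rules:
        if rule.column not in entry.schema:
            problems.append(f"DQ rule on unknown column {rule.column!r}")
    return problems

# Adding a source is one more dictionary entry (example values are invented).
REGISTRY: Dict[str, SourceTable] = {
    "patients": SourceTable(
        name="patients",
        schema={"patient_id": "string", "bsn": "string", "birth_date": "date"},
        column_mappings={"PatientNr": "patient_id", "BSN": "bsn", "GebDat": "birth_date"},
        merge_strategy="SCD2",
        business_keys=["patient_id"],
        dq_rules=[DQRule("bsn", "bsn_checksum")],
    ),
}
```

Running `validate` over every entry in CI is one way the "version-controlled and auditable" claim can be enforced: a bad entry fails the build instead of a production run.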

Layers 2 & 3: Silver & Gold Lakehouses

Raw data lands in Bronze as-is. The Silver layer standardises, validates, and deduplicates it. The Gold layer transforms Silver into Power BI-ready dimensional models — star schemas with aggregations and business logic already applied.

✓ Bronze: raw, immutable source records
✓ Silver: cleaned, typed, deduplicated
✓ Gold: dimensional models optimised for fast reporting
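The Bronze-to-Silver step (standardise, type, deduplicate) can be sketched in plain Python. This is an illustration of the idea under stated assumptions, not the framework's PySpark code: records are dicts, the load timestamp is ISO 8601, and "deduplicate" means keeping the latest version per business key.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List

def standardise(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Trim strings and turn empty strings into None; returns a new dict (Bronze stays immutable)."""
    return {k: (v.strip() or None) if isinstance(v, str) else v for k, v in raw.items()}

def to_silver(rows: Iterable[Dict[str, Any]], key: str, ts_col: str) -> List[Dict[str, Any]]:
    """Standardise Bronze rows and keep only the latest version of each business key."""
    latest: Dict[Any, Dict[str, Any]] = {}
    for raw in rows:
        row = standardise(raw)
        if row.get(key) is None or row.get(ts_col) is None:
            # In the framework described here a DQ rule would log this; the sketch just skips it.
            continue
        row[ts_col] = datetime.fromisoformat(row[ts_col])  # typing step (ISO 8601 assumed)
        current = latest.get(row[key])
        if current is None or row[ts_col] > current[ts_col]:
            latest[row[key]] = row
    return sorted(latest.values(), key=lambda r: r[key])

bronze = [
    {"id": " 1 ", "name": "Ann ", "loaded_at": "2024-01-01T10:00:00"},
    {"id": "1", "name": "Anne", "loaded_at": "2024-01-02T10:00:00"},
    {"id": "2", "name": "", "loaded_at": "2024-01-01T09:00:00"},
]
silver = to_silver(bronze, key="id", ts_col="loaded_at")
```

Here `silver` holds two rows: the newer "Anne" record for key `"1"` and key `"2"` with `name` set to `None`. In Spark the same logic would typically be a window function ranking rows per key by timestamp.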

Built-in: Enterprise Data Quality

Six rule types enforced automatically on every run: type casting, regex patterns, value ranges, uniqueness checks, row count thresholds, and Dutch BSN checksum validation. Blocking rules stop bad data; non-blocking rules log warnings with a full audit trail.

✓ Every DQ violation logged with context
✓ PII masked at column level (hash or partial)
✓ Compliance-grade for healthcare & finance
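Two of these ideas are easy to show in isolation: the BSN checksum (the Dutch "elfproef", or 11-test) and the split between blocking and non-blocking rules. The sketch below assumes a hypothetical `Rule` shape and an in-memory audit log. The checksum itself follows the public 11-test definition: weights 9 down to 2 on the first eight digits and -1 on the last, with the weighted sum divisible by 11.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

def bsn_is_valid(bsn: str) -> bool:
    """Dutch BSN 11-test: 9 digits, weights 9..2 and -1, weighted sum divisible by 11."""
    if not re.fullmatch(r"\d{9}", bsn):
        return False  # 8-digit BSNs must be left-padded with a zero first
    weights = [9, 8, 7, 6, 5, 4, 3, 2, -1]
    total = sum(int(d) * w for d, w in zip(bsn, weights))
    return total != 0 and total % 11 == 0

@dataclass
class Rule:  # hypothetical rule shape, for illustration only
    name: str
    column: str
    check: Callable[[Any], bool]
    blocking: bool

class BlockingDQError(Exception):
    """Raised when at least one blocking rule fails; the load should stop."""

def run_rules(rows: List[Dict[str, Any]], rules: List[Rule], audit_log: List[Dict[str, Any]]) -> None:
    blocking_failures = 0
    for i, row in enumerate(rows):
        for rule in rules:
            if not rule.check(row.get(rule.column)):
                # Log context, not the raw value, so PII never lands in the audit trail.
                audit_log.append({"row": i, "rule": rule.name,
                                  "column": rule.column, "blocking": rule.blocking})
                if rule.blocking:
                    blocking_failures += 1
    if blocking_failures:
        raise BlockingDQError(f"{blocking_failures} blocking DQ violation(s)")

RULES = [
    Rule("bsn_checksum", "bsn", lambda v: isinstance(v, str) and bsn_is_valid(v), blocking=True),
    # Dutch postcode pattern such as "1234 AB"; a warning only.
    Rule("postcode_format", "postcode",
         lambda v: isinstance(v, str) and re.fullmatch(r"[1-9]\d{3} ?[A-Z]{2}", v) is not None,
         blocking=False),
]
```

Every violation is logged before the decision to stop is made. That ordering is what lets a blocking failure still leave a complete audit trail.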

Layer 4: Automatic Orchestration

The orchestration engine reads the Registry, builds a dependency graph automatically, and executes parallel batches of up to 8 jobs. Tables run in the right order, every time, without manual coordination. SCD Type 2 history tracking is a config option, not a custom build.

✓ DAG-based dependency resolution
✓ OVERWRITE, MERGE, MERGE_HASH, SCD Type 2
✓ Automated DEV → TST, manual-gated PRD promotion
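One way to turn Registry dependencies into ordered parallel batches is a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+) and caps each batch at 8 jobs, matching the number above. The table names and the `job` callable are hypothetical. This is not presented as the framework's own scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List

MAX_PARALLEL = 8  # batch size taken from the description above

def plan_batches(deps: Dict[str, Iterable[str]], max_parallel: int = MAX_PARALLEL) -> List[List[str]]:
    """Group tables into batches so every table runs after all of its dependencies."""
    ts = TopologicalSorter(deps)  # maps table -> tables it depends on
    ts.prepare()                  # raises graphlib.CycleError on circular dependencies
    batches: List[List[str]] = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        for start in range(0, len(ready), max_parallel):
            batches.append(ready[start:start + max_parallel])
        ts.done(*ready)
    return batches

def run(batches: List[List[str]], job: Callable[[str], None]) -> None:
    """Run each batch in parallel; wait for the whole batch before starting the next."""
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        for batch in batches:
            list(pool.map(job, batch))  # list() re-raises the first job exception

deps = {
    "bronze_patients": [], "bronze_visits": [],
    "silver_patients": ["bronze_patients"], "silver_visits": ["bronze_visits"],
    "gold_fact_visits": ["silver_patients", "silver_visits"],
}
batches = plan_batches(deps)
```

For this graph, `batches` is `[["bronze_patients", "bronze_visits"], ["silver_patients", "silver_visits"], ["gold_fact_visits"]]`. Level-by-level batching is simple and predictable. A more aggressive scheduler could start each table the moment its own dependencies finish, at the cost of harder-to-read run logs.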

How It Works

From raw source data to a Power BI dashboard — without writing a single one-off pipeline.

1. Register

Define your source system in the Registry — name, schema, load type, quality rules, transformations. One config object, everything declared.

2. Orchestrate

The framework reads the Registry, resolves dependencies, and runs Bronze → Silver → Gold automatically in parallel batches. No manual wiring.

3. Report

Power BI connects directly to Gold layer semantic models. Dashboards refresh on schedule. Stakeholders get accurate, up-to-date data every time.

What This Delivers For Your Organisation

The framework was designed with enterprise outcomes in mind — not just developer convenience.

Hours to onboard a new data source

Add a Registry entry, run the pipeline. No new notebooks, no new deployments.

Zero silent data quality failures

Every run logged. Every DQ violation captured. Your audit trail is always complete.

Full history retention with SCD Type 2

Track how every record changed over time. Retroactive analysis and regulatory compliance built in.
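SCD Type 2 keeps every version of a record with validity dates instead of overwriting it. The minimal sketch below also shows hash-based change detection: a row is only versioned when the hash of its tracked columns changes, which is presumably the idea behind a strategy like MERGE_HASH. The column names (`valid_from`, `valid_to`, `is_current`, `row_hash`) are illustrative assumptions, and deletes are not handled.

```python
import hashlib
import json
from datetime import date
from typing import Any, Dict, List

HIGH_DATE = date(9999, 12, 31)  # conventional "still valid" end date

def row_hash(row: Dict[str, Any], tracked: List[str]) -> str:
    """Stable SHA-256 over the tracked columns only."""
    payload = json.dumps([row.get(c) for c in tracked], default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def scd2_apply(history: List[Dict[str, Any]], incoming: List[Dict[str, Any]],
               key: str, tracked: List[str], as_of: date) -> List[Dict[str, Any]]:
    """Close changed current versions and append new ones; unchanged rows are left alone."""
    result = [dict(r) for r in history]  # don't mutate the caller's history
    current = {r[key]: r for r in result if r["is_current"]}
    for new in incoming:
        h = row_hash(new, tracked)
        old = current.get(new[key])
        if old is not None and old["row_hash"] == h:
            continue  # no change: nothing to write
        if old is not None:
            old["valid_to"] = as_of  # close the previous version
            old["is_current"] = False
        version = {**new, "row_hash": h, "valid_from": as_of,
                   "valid_to": HIGH_DATE, "is_current": True}
        result.append(version)
        current[new[key]] = version
    return result

h1 = scd2_apply([], [{"id": 1, "city": "Utrecht"}], "id", ["city"], date(2024, 1, 1))
h2 = scd2_apply(h1, [{"id": 1, "city": "Utrecht"}], "id", ["city"], date(2024, 2, 1))
h3 = scd2_apply(h2, [{"id": 1, "city": "Leiden"}], "id", ["city"], date(2024, 3, 1))
```

After the third load, `h3` holds two versions of record 1. The Utrecht row is valid until 2024-03-01, and the Leiden row is current. Point-in-time questions then become a filter on `valid_from <= d < valid_to`.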

Automatic schema evolution

Upstream added a column? The framework detects and applies it automatically. No emergency fixes.
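A plausible policy for "detect and apply" is to apply additive changes (new columns) automatically and escalate anything riskier, such as a type change. Delta Lake's `mergeSchema` write option covers the additive case natively. The sketch below only illustrates the detection logic, with schemas as plain `column -> type` dicts. The policy itself is an assumption, not a documented behaviour of the framework.

```python
from typing import Dict, List, NamedTuple, Tuple

class SchemaDiff(NamedTuple):
    added: Dict[str, str]
    removed: List[str]
    type_changed: Dict[str, Tuple[str, str]]  # column -> (old type, new type)

def diff_schema(target: Dict[str, str], incoming: Dict[str, str]) -> SchemaDiff:
    added = {c: t for c, t in incoming.items() if c not in target}
    removed = sorted(c for c in target if c not in incoming)
    type_changed = {c: (target[c], t) for c, t in incoming.items()
                    if c in target and target[c] != t}
    return SchemaDiff(added, removed, type_changed)

def evolve(target: Dict[str, str], incoming: Dict[str, str]) -> Tuple[Dict[str, str], SchemaDiff]:
    """Apply additive changes automatically; refuse type changes (assumed policy)."""
    diff = diff_schema(target, incoming)
    if diff.type_changed:
        raise ValueError(f"type change needs review: {diff.type_changed}")
    merged = dict(target)
    merged.update(diff.added)  # new columns appended; existing rows read them as null
    # Removed columns stay in the target so history is preserved; new rows get null.
    return merged, diff
```

Keeping dropped columns rather than deleting them is the conservative choice here. It means an upstream removal never destroys data that existing reports may still read.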

Built-in PII masking & privacy compliance

Sensitive columns hashed or partially masked at the framework level. AVG/GDPR-ready by design.
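The two masking modes named above (hash and partial) can be sketched with the standard library. This sketch assumes that "hash" means a keyed hash (HMAC-SHA256), so equal inputs map to equal tokens and joins on the masked column still work. The key must live outside the code; the stack below lists Azure Key Vault, which would be a natural home for it. The policy format is hypothetical.

```python
import hashlib
import hmac
from typing import Any, Dict, Optional

def hash_mask(value: Optional[str], secret: bytes) -> Optional[str]:
    """Keyed hash: deterministic for a given key, not reversible without brute force."""
    if value is None:
        return None
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()

def partial_mask(value: Optional[str], keep_last: int = 3, mask_char: str = "*") -> Optional[str]:
    """Hide all but the last few characters, e.g. for display in reports."""
    if value is None:
        return None
    if len(value) <= keep_last:
        return mask_char * len(value)
    return mask_char * (len(value) - keep_last) + value[-keep_last:]

def mask_row(row: Dict[str, Any], policy: Dict[str, str], secret: bytes) -> Dict[str, Any]:
    """Apply a per-column policy such as {"bsn": "hash"} (hypothetical format)."""
    out = dict(row)
    for col, method in policy.items():
        if method == "hash":
            out[col] = hash_mask(row.get(col), secret)
        elif method == "partial":
            out[col] = partial_mask(row.get(col))
        else:
            raise ValueError(f"unknown masking method {method!r} for column {col!r}")
    return out
```

A plain unkeyed SHA-256 of a BSN would be easy to reverse by hashing all possible 9-digit numbers. That is why the sketch uses a secret key.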

Gated production deployments

DEV → TST automated via Azure Pipelines. ACC → PRD requires a manual approval gate. No surprises in production.

Production-Proven

Dutch Healthcare — MedMij Integration Platform

The framework powers a live healthcare data integration platform in the Netherlands, ingesting multi-format source data from provider systems, enforcing healthcare-grade data quality rules (including BSN checksum validation), and serving dimensional models for downstream analytics — all running automatically on Microsoft Fabric.

Healthcare · Multi-format ingestion · DQ enforcement · Dimensional modelling · Microsoft Fabric

Built on a Certified Stack

The framework is built on Microsoft Fabric and the broader Azure data platform — and Jonan is certified across the full stack.

Microsoft Fabric · PySpark · Delta Lake · Azure Pipelines · Azure Key Vault · Azure DevOps · Python · Power BI

DP-600 · DP-700 · DP-203 · PL-300 · AZ-900

Define it once.
Run it everywhere.
