ETL vs. ELT for Logistics Pipelines: The Answer Is Usually "It Depends on Your WMS"

The ELT-first movement has real merits — but logistics source data is often dirty enough that landing it raw creates more cleanup work than transforming it first.

[Figure: ETL vs. ELT comparison diagram]

The data engineering community has largely shifted toward ELT (Extract, Load, Transform) as the preferred pattern for modern data pipelines. The argument is compelling: land raw data in a cloud warehouse with essentially unlimited compute, transform it there using SQL with version-controlled dbt models, keep the raw data for reprocessing, and let the warehouse handle the computational heavy lifting instead of a transformation server. For many data domains, this is genuinely the better approach. For logistics data — specifically for warehouse management system (WMS) data — it's often the wrong choice, and the teams that discover this tend to discover it the hard way.

The ELT Case: Where It Works in Logistics

ELT works well for logistics data sources that produce clean, well-structured output with predictable schemas:

  • Carrier EDI feeds after a normalization layer: once X12 or EDIFACT transactions have been parsed and their content normalized to a common schema, the resulting structured data is well-suited for ELT. Land it in raw tables, transform it in the warehouse with SQL models.
  • TMS data with documented schemas: modern cloud TMS platforms (project44, FourKites, Flexport) expose clean API data with consistent types, enumerations, and timestamp formats. These are ideal ELT candidates.
  • ERP financial data: AP invoice data, general ledger allocations, and vendor master data from ERP systems like SAP S/4HANA via its OData APIs are typically clean enough to land raw and transform in the warehouse.

In all these cases, the source data is clean at the field level. Warehouse transformation is then about business logic — joining, aggregating, applying business rules — not about scrubbing dirty values. ELT is the appropriate pattern here.

Where ELT Creates More Work Than It Saves in Logistics

WMS platforms — particularly legacy systems and multi-client configurations — produce data that is structurally sound but semantically problematic. The issues described throughout this blog apply here: enumeration expansions, precision truncations, timestamp inconsistencies, LPN/HU (license plate number / handling unit) naming conflicts, and company code filtering anomalies. When you land this data raw via ELT, you land all of those problems in your warehouse, where they become the responsibility of your SQL transformation models.
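These field-level issues are exactly what a pre-load cleaning step targets. A minimal sketch in Python — the status map, field names, and timestamp formats are illustrative assumptions, not any real WMS schema:

```python
from datetime import datetime, timezone
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical enumeration map: legacy WMS status codes expand over time,
# so unknown codes are rejected rather than passed through silently.
STATUS_MAP = {"01": "RECEIVED", "02": "PUTAWAY", "03": "PICKED", "04": "SHIPPED"}

# Timestamp formats observed in the (assumed) source extracts.
TS_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y%m%d%H%M%S", "%m/%d/%Y %H:%M")

def clean_wms_row(row: dict) -> dict:
    """Field-level cleaning applied before load (the 'T' before the 'L')."""
    out = dict(row)

    # Enumeration normalization: map known codes, quarantine unknown ones.
    code = row.get("status_code")
    if code not in STATUS_MAP:
        raise ValueError(f"unknown status code {code!r}; quarantine row")
    out["status"] = STATUS_MAP[code]

    # Precision standardization: quantities to a fixed 3 decimal places,
    # avoiding float truncation inherited from the source extract.
    out["qty"] = Decimal(str(row["qty"])).quantize(
        Decimal("0.001"), rounding=ROUND_HALF_UP
    )

    # Timestamp cleaning: try each known source format, emit UTC ISO-8601.
    raw_ts = row["updated_at"]
    for fmt in TS_FORMATS:
        try:
            ts = datetime.strptime(raw_ts, fmt).replace(tzinfo=timezone.utc)
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"unparseable timestamp {raw_ts!r}")
    out["updated_at"] = ts.isoformat()
    return out
```

Rows that raise are routed to a quarantine table instead of the warehouse, which is the point: the dirty values never reach the raw layer.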

The Raw Layer Debugging Problem

In a pure ELT architecture, when a data quality issue surfaces in a BI report, you trace it back through the transformation layers to the raw table. If the problem originated in the source system (a semantic change, a batch update that bypassed timestamps, a precision reduction), the raw table has the wrong data. You now need to re-extract from the source and re-land in the raw layer before your transformation models can produce correct output.

In an ETL architecture with transformation before loading, the same issue is caught at the transformation layer before it reaches the destination table. The source extract is re-run, the transformation is reapplied, and the destination is updated — without requiring a raw layer re-land followed by a transformation re-run.

For WMS sources with known data quality patterns, the ETL approach reduces the blast radius of source quality issues from "warehouse raw tables are dirty, all downstream models are wrong" to "transformation job failed, destination tables not updated." The latter is a significantly easier operational situation.
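The "transformation job failed, destination tables not updated" failure mode can be made explicit with an all-or-nothing gate. A minimal sketch, where the transform and load callables are assumptions rather than a specific framework's API:

```python
def etl_load(rows, transform, load):
    """Transform the full batch first; load only if every row passes.

    A source-quality issue fails the job before the destination is
    touched, so the blast radius is "destination not updated" rather
    than "raw tables dirty, downstream models wrong".
    """
    transformed, errors = [], []
    for i, row in enumerate(rows):
        try:
            transformed.append(transform(row))
        except ValueError as exc:
            errors.append((i, str(exc)))
    if errors:
        # Surface every failing row, not just the first, for triage.
        raise RuntimeError(f"{len(errors)} rows failed transformation; load aborted")
    load(transformed)
```

The batch either lands fully cleaned or not at all; partial loads of half-transformed data are never possible.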

The Cost of Dirty Raw Tables

ELT proponents often emphasize the value of keeping raw data for reprocessing. This is genuinely useful when the raw data is a faithful representation of the source — when you want to reprocess history because your business logic changed, not because your source data was wrong.

For WMS raw data, the "faithful representation of the source" property is less certain. If a WMS batch update bypassed timestamp tracking and you've been missing those records in your incremental loads for three weeks, your raw table doesn't have those records. The raw table isn't a faithful source-of-truth — it's a partial extract. Reprocessing it doesn't recover the missing data; it just reapplies your transformation logic to an incomplete dataset.

The practical implication: for WMS sources with batch timestamp bypass patterns, a periodic full-refresh to reconcile the raw layer is necessary regardless of whether you're using ELT or ETL. The full-refresh frequency and the mechanism for detecting gaps differ, but neither architecture eliminates the need for explicit reconciliation.
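One gap-detection mechanism is a periodic key-set comparison between the source and the raw layer. A sketch under stated assumptions: the caller supplies the primary-key sets for the same window, and the 1% threshold is an illustrative tuning knob, not a recommendation:

```python
def reconcile(source_keys, raw_keys, full_refresh_threshold=0.01):
    """Classify the gap between the source and the warehouse raw layer.

    Keys present in the source but absent from the raw layer are rows an
    incremental (timestamp-based) extract silently skipped, e.g. because
    a batch update bypassed the change timestamp. Small gaps get a
    targeted keyed re-extract; a gap above the threshold suggests the
    incremental strategy itself is unreliable for this source, and a
    full refresh is cheaper than chasing individual keys.
    """
    src, raw = set(source_keys), set(raw_keys)
    missing = src - raw
    if not missing:
        return "in_sync", missing
    if len(missing) / max(len(src), 1) > full_refresh_threshold:
        return "full_refresh", missing
    return "keyed_reextract", missing
```

The same check works for either architecture; what differs is whether the re-extracted rows pass through a pre-transformation step (ETL) or land raw (ELT).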

A Hybrid Pattern That Works Well in Practice

The most effective pattern for logistics data pipelines isn't a binary choice between ETL and ELT. It's a layered approach that applies transformations at the right stage for the right source type:

[Figure: Hybrid ETL-ELT pipeline diagram]

  • WMS data: ETL with pre-transformation. Apply schema validation, enumeration normalization, precision standardization, and timestamp cleaning before loading to the warehouse. Load clean, validated data to a staging layer. This keeps WMS data quality issues from propagating into the warehouse raw layer.
  • Carrier EDI data: Post-normalization ELT. After the EDI parsing and carrier-specific normalization layer (which is itself a transformation), the normalized output is clean enough for ELT. Land it in raw form, transform with warehouse SQL.
  • TMS and ERP API data: Pure ELT. Modern API sources with clean schemas benefit from the full ELT approach — raw retention, warehouse transformation, dbt-style version control.
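The per-source decision can live in plain configuration rather than code paths scattered across pipelines. A minimal sketch (source names and the helper callables are illustrative, not a real API):

```python
# Per-source routing: the pattern choice is data, not architecture dogma.
PIPELINE_CONFIG = {
    "wms_legacy":  {"pattern": "etl", "pre_transform": True},
    "carrier_edi": {"pattern": "elt", "pre_transform": False},  # post-normalization
    "tms_api":     {"pattern": "elt", "pre_transform": False},
    "erp_odata":   {"pattern": "elt", "pre_transform": False},
}

def route(source, rows, transform, load_staging, load_raw):
    """Apply transformation before load for ETL sources, after for ELT."""
    cfg = PIPELINE_CONFIG[source]
    if cfg["pre_transform"]:
        # ETL: clean and validate before the warehouse sees the data.
        load_staging(source, [transform(r) for r in rows])
    else:
        # ELT: land raw; warehouse SQL models own the transformation.
        load_raw(source, rows)
    return cfg["pattern"]
```

Adding a new source is then a config entry plus, where needed, a source-specific cleaning function — the routing logic never changes.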

This hybrid approach means you're not dogmatically committed to either pattern. Where the transformation stage sits is decided per source, based on the data quality characteristics of that specific source — not on architectural preference.

The dbt Compatibility Question

One practical consideration for teams committed to dbt for transformation management: pre-transformation before loading means the transformation step isn't in dbt, and therefore isn't in the version control and lineage tracking that dbt provides. For teams where dbt represents the entire transformation layer, ETL with pre-transformation creates a split: some transformations are in the dbt project, some are in the extraction pipeline. This requires maintaining lineage documentation outside dbt for the pre-transformation steps.

This is a real operational cost. Whether it outweighs the benefit of keeping dirty WMS data out of the warehouse raw layer depends on the size of the team and the severity of the WMS quality issues. For large teams with dedicated data platform engineers, the split is manageable. For smaller teams where the dbt project is the single source of transformation truth, the complexity of maintaining external lineage documentation may tip the decision toward ELT with more aggressive raw-layer quality checks.

Conclusion

The ETL vs. ELT decision for logistics pipelines is genuinely source-dependent. The modern data engineering default (ELT for everything) produces good results for clean API and EDI sources. It produces more operational complexity than it saves for WMS sources with known data quality patterns. Making this call correctly per source type — rather than applying a blanket architecture — is the difference between a data platform that runs smoothly and one that requires regular manual reconciliation to keep the numbers right.

The goal isn't to be architecturally consistent. The goal is to have accurate logistics data with predictable pipeline behavior. Those goals sometimes point toward different patterns for different sources.

MLPipeLab applies per-source extraction strategies with pre-transformation for WMS sources and configurable raw-load for clean API sources. Request a demo to see how the hybrid pattern works in practice.
