Watermark-Based Incremental Loads in WMS Data Pipelines: When They Work and When They Don't

WMS timestamp fields are notoriously unreliable. The cases where full-refresh is actually the safer choice.


Watermark-based incremental loading is the standard recommendation for large-table ETL. The pattern is simple: store the maximum value of a timestamp or sequential ID from the last run, and on the next run, only extract records newer than that value. For most source systems, this works reliably. For WMS systems specifically, it works less reliably than most data engineers assume when they first set it up — and the failures are silent enough to persist for weeks before they're detected.
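The core pattern can be sketched in a few lines of Python. This is a minimal illustration, not a production extractor: the `MODIFIED_DATE` column name and the dict-based rows are stand-ins for whatever your WMS schema and database client actually provide.

```python
from datetime import datetime

def extract_incremental(rows, watermark):
    """Return rows modified after the stored watermark, plus the new watermark."""
    new_rows = [r for r in rows if r["MODIFIED_DATE"] > watermark]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max((r["MODIFIED_DATE"] for r in new_rows), default=watermark)
    return new_rows, new_watermark

source = [
    {"id": 1, "MODIFIED_DATE": datetime(2024, 6, 1, 8, 0)},
    {"id": 2, "MODIFIED_DATE": datetime(2024, 6, 1, 9, 30)},
]
# Only row 2 is newer than the stored watermark; the watermark advances to 9:30.
rows, wm = extract_incremental(source, datetime(2024, 6, 1, 8, 30))
```

In a real pipeline the watermark would be persisted (state table, file, orchestrator variable) between runs; every failure mode discussed below is a way this simple comparison goes wrong.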

Why Watermark Loading Is the Default Recommendation

The business case for incremental loading in WMS pipelines is real. A mid-sized distribution center running Manhattan Associates or Oracle WMS might have 10-15 million rows in its primary transaction tables over 12 months of operation. A full-refresh extraction on these tables during operational hours puts measurable load on the WMS database — a system that warehouse floor operations depend on in near-real-time. WMS vendors typically have contractual or SLA-based restrictions on external database queries that can affect operational performance.

Incremental loads using a watermark limit the query window to recent records, reducing extraction time from potentially hours to minutes and keeping database load within acceptable bounds. For warehouses running 24/7 operations, this is often the only way to run frequent pipeline refreshes without operational risk.

The Four Ways Watermark Loading Fails in WMS Systems

1. Batch Process Timestamp Bypass

The most common failure mode. Many WMS operations — cycle count posting, bulk inventory adjustments, load confirmation, physical inventory finalization — are processed by background batch jobs that update records in bulk. These batch processes frequently bypass the normal application logic that updates MODIFIED_DATE or LAST_UPDATED_DTTM, writing directly to the underlying tables via SQL UPDATE statements that don't trigger the standard audit timestamp mechanism.

The result: records that were functionally changed by a batch process show old timestamp values. Your watermark query says "nothing changed since last run" — but 2,000 inventory records were actually updated by a cycle count posting that ran overnight.

Detection: compare the row count for the trailing 24 hours in your destination table against the row count for the same window in the source table, using the same timestamp filter. Consistent discrepancies that can't be explained by pipeline timing are a sign of batch timestamp bypass.
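A hedged sketch of that reconciliation check, with rows as dicts for illustration. In practice the two counts would come from `COUNT(*)` queries against source and destination with identical timestamp filters, and the tolerance would be tuned to your pipeline's normal lag.

```python
from datetime import datetime, timedelta

def count_recent(rows, now, window=timedelta(hours=24)):
    """Count rows whose MODIFIED_DATE falls in the trailing window."""
    cutoff = now - window
    return sum(1 for r in rows if r["MODIFIED_DATE"] >= cutoff)

def bypass_suspected(source_rows, dest_rows, now, tolerance=0):
    """Flag discrepancies beyond what pipeline timing alone can explain."""
    diff = abs(count_recent(source_rows, now) - count_recent(dest_rows, now))
    return diff > tolerance
```

Run this as a post-load assertion rather than a one-off query; a single mismatch may be timing, but a mismatch that persists across runs is the signal described above.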

2. Clock Skew Between WMS Nodes

Modern WMS deployments often run on clustered database configurations — primary and replica nodes, or multiple application servers writing to the same database. If these nodes have clock skew of more than a few seconds, records written by different nodes will have timestamp values that don't reflect true chronological order.

For watermark-based extraction, this means the maximum timestamp from the last run may be greater than the timestamps of legitimate records written by a node with a slower clock. Those records fall below the watermark and are skipped on the next incremental load, permanently missed unless a reconciliation mechanism exists.

The standard mitigation — use the watermark value minus a safety buffer (e.g., 5-minute lookback) — helps with minor skew but introduces duplicate processing that requires deduplication logic downstream. For large tables, this adds meaningful processing overhead.
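The lookback-plus-dedup combination can be sketched as follows. The five-minute buffer and the `id` merge key are illustrative; the important property is that re-reading a window behind the watermark is safe only because the downstream merge is idempotent.

```python
from datetime import datetime, timedelta

LOOKBACK = timedelta(minutes=5)  # safety buffer for clock skew; tune per site

def extract_with_lookback(rows, watermark):
    """Re-read a window behind the watermark to catch skewed-clock writes."""
    return [r for r in rows if r["MODIFIED_DATE"] > watermark - LOOKBACK]

def upsert(dest, incoming, key="id"):
    """Idempotent merge: re-extracted rows overwrite rather than duplicate."""
    merged = {r[key]: r for r in dest}
    for r in incoming:
        merged[r[key]] = r
    return list(merged.values())
```

The overhead the article mentions lives in `upsert`: on a warehouse-scale table this becomes a MERGE or delete-insert over the lookback window on every run, which is why the buffer should be as small as observed skew allows.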

3. Backdated Corrections

WMS systems support backdated data corrections for various operational reasons: adjusting an inventory count to reflect what was actually in stock on a prior date, correcting a shipment weight that was incorrectly recorded, applying a retroactive freight rate change. These corrections deliberately set a transaction's effective date to a historical value — but the MODIFIED_DATE timestamp reflects when the correction was made, not the backdated effective date.

For a watermark-based incremental load, a backdated correction appears as a normal incremental record — it has a recent MODIFIED_DATE and will be extracted correctly. However, downstream aggregations built on the effective date (daily inventory totals, monthly shipment summaries) will now have changed for a historical period, and any pre-aggregated data in your destination won't reflect the correction.

This requires a correction-detection pattern: after each incremental load, check whether any records have effective dates significantly older than their MODIFIED_DATE. If so, trigger a selective recomputation of the affected historical periods rather than assuming only current-period data was changed.
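One way to sketch that correction-detection step, assuming `EFFECTIVE_DATE` and `MODIFIED_DATE` columns and a monthly aggregation grain (both are assumptions; the staleness threshold depends on how far back corrections are normally allowed):

```python
from datetime import datetime, timedelta

STALENESS = timedelta(days=1)  # effective date this much older than the edit = backdated

def backdated_periods(batch):
    """Return the (year, month) periods touched by backdated corrections."""
    periods = set()
    for r in batch:
        if r["MODIFIED_DATE"] - r["EFFECTIVE_DATE"] > STALENESS:
            periods.add((r["EFFECTIVE_DATE"].year, r["EFFECTIVE_DATE"].month))
    return periods
```

After each incremental load, any periods this returns would be queued for selective recomputation of the affected aggregates, rather than rebuilding all history or assuming only the current period changed.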

4. DST Transition Gaps and Doubles

Daylight saving time transitions create two distinct problems for watermark-based WMS extraction. During the spring-forward transition, a one-hour window of clock time doesn't exist: records created during that hour by a WMS running in server local time get timestamps an hour ahead of their standard-time equivalents, creating an apparent gap in the incremental load. During the fall-back transition, a one-hour window of clock time occurs twice: records from the repeated hour may appear as duplicates, or sort earlier than their actual creation order, depending on how the WMS handles the repeated window.

WMS systems are notoriously inconsistent on DST handling. Some store all times in UTC internally and convert on display; some store local time with no timezone annotation; some handle the transitions correctly some of the time. The safest approach is to treat the 2-hour window around each DST transition as a full-refresh candidate rather than trusting the watermark for that window.
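A sketch of how a pipeline might decide whether an extraction window overlaps a DST transition, using Python's standard `zoneinfo` module. The half-hour sampling step and the one-hour buffer are illustrative choices; the timezone name would come from your WMS server's configuration.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def crosses_dst_transition(start, end, tz_name, buffer=timedelta(hours=1)):
    """True if [start, end] (naive local times) overlaps a UTC-offset change."""
    tz = ZoneInfo(tz_name)
    t = start - buffer
    offset = t.replace(tzinfo=tz).utcoffset()
    while t <= end + buffer:
        t += timedelta(minutes=30)  # sample the window; offsets change at most once
        if t.replace(tzinfo=tz).utcoffset() != offset:
            return True
    return False
```

When this returns `True`, the run would fall back to a full-refresh (or a widened lookback) for the affected tables instead of trusting the watermark.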

Tables Where Full-Refresh Is Generally the Safer Choice

Based on the failure patterns above, certain WMS table categories are better served by scheduled full-refreshes than by watermark-based incrementals — particularly when those tables have complex update patterns or are subject to batch processing.


Good candidates for full-refresh:

  • Inventory summary tables — subject to bulk batch updates from cycle counts, adjustment postings, and load confirmations. Full-refresh every 2-4 hours during operational hours is safer than watermark incremental.
  • Item master and item facility tables — update frequency is low, but when they do update (new season SKU additions, configuration changes), batch-mode updates that bypass timestamps are common. Full-refresh daily is the safer default.
  • Rate and billing tables — subject to backdated corrections from carrier invoice reconciliation. Full-refresh for these tables prevents historical aggregation errors.

Good candidates for watermark incremental:

  • Pick task and receipt detail tables — records are generally insert-only once completed, with timestamps that reflect actual creation time. Watermark incrementals work reliably here.
  • Shipment header tables — for outbound shipments past the load confirmation stage, records are typically stable. Watermark incrementals capture new shipments efficiently.
  • Carrier EDI transaction records — EDI feeds are insertion-dominant, with timestamps generated by the EDI processing layer rather than WMS batch jobs. Watermark incrementals are reliable here.
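The per-table split above lends itself to a simple configuration map that the extraction job reads at runtime. The table and column names here are hypothetical; real names vary by WMS vendor.

```python
# Per-table extraction strategy; mode and watermark columns are illustrative.
EXTRACTION_STRATEGY = {
    "INVENTORY_SUMMARY": {"mode": "full_refresh", "schedule": "every_4h"},
    "ITEM_MASTER":       {"mode": "full_refresh", "schedule": "daily"},
    "CARRIER_RATES":     {"mode": "full_refresh", "schedule": "daily"},
    "PICK_TASK_DETAIL":  {"mode": "incremental", "watermark_col": "CREATE_DTTM"},
    "SHIPMENT_HEADER":   {"mode": "incremental", "watermark_col": "MODIFIED_DATE"},
    "EDI_TRANSACTIONS":  {"mode": "incremental", "watermark_col": "PROCESSED_DTTM"},
}
```

Keeping the strategy in data rather than code makes the per-table decision explicit and easy to revise when a table's update pattern turns out to differ from the assumption.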

A Practical Decision Framework

Before choosing incremental vs. full-refresh for any WMS table, answer four questions:

  1. Does this table receive batch updates from overnight or background processes? If yes, watermark incrementals risk missing batch-updated records.
  2. Is this table subject to backdated corrections for any business reason? If yes, downstream aggregations need correction-detection logic regardless of extraction strategy.
  3. What is the table's row count and growth rate? If it's under 5M rows and growing slowly, full-refresh is operationally viable and simpler to reason about.
  4. What is the pipeline refresh frequency requirement? If you need data every 15 minutes, full-refresh on large tables isn't feasible and watermark incrementals (with their tradeoffs) may be the only option.
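The first, third, and fourth questions can be encoded as a toy decision function; the 5M-row and 15-minute thresholds come from the framework above, and question 2 is deliberately left out because correction detection is needed whichever mode you choose.

```python
def choose_strategy(batch_updated, row_count, refresh_minutes):
    """Toy encoding of the decision framework; thresholds are illustrative."""
    if refresh_minutes <= 15 and row_count > 5_000_000:
        return "incremental"   # full-refresh not feasible at this cadence and size
    if batch_updated:
        return "full_refresh"  # watermark would silently miss batch-updated rows
    if row_count < 5_000_000:
        return "full_refresh"  # operationally viable and simpler to reason about
    return "incremental"
```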

Conclusion

Watermark-based incremental loading is an appropriate strategy for WMS pipelines — but not a universal one. The default assumption that any table with a MODIFIED_DATE column is safely incrementally loadable is incorrect in WMS environments where batch processing, clustered databases, and backdated corrections are common. Treating the extraction strategy decision as a per-table analysis rather than a blanket architecture choice leads to pipelines that are both more reliable and easier to debug when issues arise.

The practical outcome of getting this right: fewer pipeline investigations that trace back to missing records, and less accumulation of silent inaccuracies in downstream inventory and shipment reports.

MLPipeLab's connector configuration layer lets you specify per-table extraction strategies — incremental, full-refresh, or hybrid — with automated batch-update detection for compatible WMS systems. Request a demo to see how it works.
