WMS timestamp fields are notoriously unreliable. These are the cases where full-refresh is actually the safer choice.
Watermark-based incremental loading is the standard recommendation for large-table ETL. The pattern is simple: store the maximum value of a timestamp or sequential ID from the last run, and on the next run, only extract records newer than that value. For most source systems, this works reliably. For WMS systems specifically, it works less reliably than most data engineers assume when they first set it up — and the failures are silent enough to persist for weeks before they're detected.
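The pattern can be sketched in a few lines. This is a minimal illustration using sqlite3 as a stand-in for both the source and the pipeline's state store; the table and column names (`WMS_ORDER_TXN`, `LAST_UPDATED_DTTM`, `etl_state`) are placeholders, not any vendor's actual schema:

```python
import sqlite3

def incremental_extract(source, state, table, ts_col):
    """Extract rows newer than the stored watermark, then advance it.

    `source` and `state` are DB-API connections. ISO-8601 timestamp
    strings compare correctly as text, which sqlite relies on here.
    """
    row = state.execute(
        "SELECT watermark FROM etl_state WHERE table_name = ?", (table,)
    ).fetchone()
    # No stored watermark yet -> effectively a full initial extract.
    wm = row[0] if row else "1970-01-01 00:00:00"

    rows = source.execute(
        f"SELECT * FROM {table} WHERE {ts_col} > ?", (wm,)
    ).fetchall()

    # Advance the watermark to the current max timestamp in the source.
    new_wm = source.execute(f"SELECT MAX({ts_col}) FROM {table}").fetchone()[0]
    if new_wm is not None:
        state.execute(
            "INSERT OR REPLACE INTO etl_state (table_name, watermark) VALUES (?, ?)",
            (table, new_wm),
        )
    return rows
```

Each run extracts only rows past the stored high-water mark and then moves the mark forward — which is exactly why every failure mode below comes down to the timestamp column not meaning what the pipeline assumes it means.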
The business case for incremental loading in WMS pipelines is real. A mid-sized distribution center running Manhattan Associates or Oracle WMS might have 10-15 million rows in its primary transaction tables over 12 months of operation. A full-refresh extraction on these tables during operational hours puts measurable load on the WMS database — a system that warehouse floor operations depend on in near-real-time. WMS vendors typically have contractual or SLA-based restrictions on external database queries that can affect operational performance.
Incremental loads using a watermark limit the query window to recent records, reducing extraction time from potentially hours to minutes and keeping database load within acceptable bounds. For warehouses running 24/7 operations, this is often the only way to run frequent pipeline refreshes without operational risk.
The most common failure mode: batch updates that bypass the timestamp. Many WMS operations — cycle count posting, bulk inventory adjustments, load confirmation, physical inventory finalization — are processed by background batch jobs that update records in bulk. These batch processes frequently bypass the normal application logic that updates MODIFIED_DATE or LAST_UPDATED_DTTM, writing directly to the underlying tables via SQL UPDATE statements that don't trigger the standard audit timestamp mechanism.
The result: records that were functionally changed by a batch process show old timestamp values. Your watermark query says "nothing changed since last run" — but 2,000 inventory records were actually updated by a cycle count posting that ran overnight.
Detection: compare row count of last 24 hours in your destination table against row count of last 24 hours in the source table using the same filter. Consistent discrepancies that can't be explained by pipeline timing are a sign of batch timestamp bypass.
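This check is simple enough to automate. A sketch of the comparison, assuming DB-API connections to both databases (sqlite3 stands in here; the table and column names are illustrative):

```python
def recent_count_gap(source, dest, table, ts_col, cutoff):
    """Row counts since `cutoff` in source vs. destination, using the
    identical filter on both sides. A persistent positive gap that
    pipeline timing cannot explain is the signature of batch updates
    bypassing the audit timestamp column."""
    query = f"SELECT COUNT(*) FROM {table} WHERE {ts_col} >= ?"
    src_count = source.execute(query, (cutoff,)).fetchone()[0]
    dst_count = dest.execute(query, (cutoff,)).fetchone()[0]
    return src_count - dst_count
```

In practice this would run on a schedule with `cutoff` set to 24 hours ago, alerting only when the gap persists across several consecutive checks.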
Modern WMS deployments often run on clustered database configurations — primary and replica nodes, or multiple application servers writing to the same database. If these nodes have clock skew of more than a few seconds, records written by different nodes will have timestamp values that don't reflect true chronological order.
For watermark-based extraction, this means the maximum timestamp from the last run may be greater than some legitimate records written by a slower-clock node. Those records fall before the watermark and are skipped on the next incremental load, permanently missed unless there's a reconciliation mechanism.
The standard mitigation — use the watermark value minus a safety buffer (e.g., 5-minute lookback) — helps with minor skew but introduces duplicate processing that requires deduplication logic downstream. For large tables, this adds meaningful processing overhead.
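Both halves of that mitigation are small. A sketch of the lookback window and the downstream deduplication, with rows modeled as dicts and the 5-minute buffer and key field names chosen for illustration:

```python
from datetime import datetime, timedelta

LOOKBACK = timedelta(minutes=5)  # safety buffer; tune to observed node clock skew

def extraction_start(stored_watermark):
    """Open the next extraction window LOOKBACK before the stored
    watermark so rows written by a slow-clock node are not skipped."""
    return stored_watermark - LOOKBACK

def deduplicate(rows, key):
    """Collapse the duplicates the lookback reintroduces, keeping the
    last-seen copy per business key. Assumes `rows` arrive in extraction
    order, so later copies overwrite earlier ones."""
    latest = {}
    for row in rows:
        latest[row[key]] = row
    return list(latest.values())
```

The trade-off described above is visible here: a wider `LOOKBACK` tolerates more skew but re-extracts and re-deduplicates more rows on every run.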
WMS systems support backdated data corrections for various operational reasons: adjusting an inventory count to reflect what was actually in stock on a prior date, correcting a shipment weight that was incorrectly recorded, applying a retroactive freight rate change. These corrections deliberately set a transaction's effective date to a historical value — but the MODIFIED_DATE timestamp reflects when the correction was made, not the backdated effective date.
For a watermark-based incremental load, a backdated correction appears as a normal incremental record — it has a recent MODIFIED_DATE and will be extracted correctly. However, downstream aggregations built on the effective date (daily inventory totals, monthly shipment summaries) will now have changed for a historical period, and any pre-aggregated data in your destination won't reflect the correction.
This requires a correction-detection pattern: after each incremental load, check whether any records have effective dates significantly older than their MODIFIED_DATE. If so, trigger a selective recomputation of the affected historical periods rather than assuming only current-period data was changed.
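The detection step can be expressed as a post-load scan. A sketch, assuming rows arrive as dicts with datetime values; the one-day threshold and column names are illustrative choices, not fixed rules:

```python
from datetime import timedelta

# What counts as "significantly older"; tune per table and reporting grain.
BACKDATE_THRESHOLD = timedelta(days=1)

def backdated_periods(batch, effective_col, modified_col):
    """Scan an incremental batch for rows whose effective date trails the
    modification timestamp by more than the threshold, and return the set
    of historical dates whose aggregates need recomputation."""
    periods = set()
    for row in batch:
        if row[modified_col] - row[effective_col] > BACKDATE_THRESHOLD:
            periods.add(row[effective_col].date())
    return periods
```

The returned dates feed the selective recomputation: only those historical periods are rebuilt, instead of either ignoring the corrections or re-aggregating all history.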
Daylight saving time transitions create two distinct problems for watermark-based WMS extraction. In the spring-forward transition, a one-hour window of local time doesn't exist — records created during that hour by a WMS running in server local time get timestamps one hour ahead of what they would be in standard time, creating an apparent gap in the incremental load. In the fall-back transition, a one-hour window of local time exists twice — records from the repeated hour may appear as duplicates, or sort earlier than their actual creation order, depending on how the WMS handles the repeated window.
WMS systems are notoriously inconsistent on DST handling. Some store all times in UTC internally and convert on display; some store local time with no timezone annotation; some handle the transitions correctly some of the time. The safest approach is to treat the 2-hour window around each DST transition as a full-refresh candidate rather than trusting the watermark for that window.
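Flagging that window can be automated with the standard library. A sketch using `zoneinfo` to detect whether a naive local timestamp falls near a UTC-offset change; the one-hour padding and timezone name are assumptions to adjust per deployment:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def near_dst_transition(ts, tz_name, pad_hours=1):
    """True if naive local timestamp `ts` falls within pad_hours of a DST
    offset change for `tz_name` — i.e., the UTC offset differs on the two
    sides of the padded window. Such runs are full-refresh candidates
    rather than watermark runs."""
    tz = ZoneInfo(tz_name)
    before = (ts - timedelta(hours=pad_hours)).replace(tzinfo=tz)
    after = (ts + timedelta(hours=pad_hours)).replace(tzinfo=tz)
    return before.utcoffset() != after.utcoffset()
```

A scheduler can call this on each run's window start and, when it returns true, swap that run to a bounded full refresh of the affected tables.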
Based on the failure patterns above, certain WMS table categories are better served by scheduled full-refreshes than by watermark-based incrementals — particularly when those tables have complex update patterns or are subject to batch processing.
Good candidates for full-refresh: tables updated in bulk by background batch jobs (inventory balances, cycle count results), snapshot-style tables whose rows are routinely updated in place, and smaller reference tables where a full extract is cheap anyway.
Good candidates for watermark incremental: append-only event and transaction-history tables whose rows are never updated after insert, and large tables whose writes all pass through application logic that reliably maintains MODIFIED_DATE — ideally backstopped by a periodic reconciliation check.
Before choosing incremental vs. full-refresh for any WMS table, answer four questions: Do any batch processes update this table directly, bypassing the audit timestamp columns? Can clock skew across database or application nodes put this table's timestamps out of chronological order? Does the table accept backdated corrections whose effective date differs from MODIFIED_DATE? And does the source store timestamps in UTC, or in unannotated local time that shifts at DST transitions?
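The answers to those questions naturally become per-table configuration. A hypothetical sketch of such a strategy map — every table name, key, and value here is illustrative, not a real product's config format:

```python
# Per-table extraction strategy, encoding the per-table analysis above.
EXTRACTION_STRATEGIES = {
    "WMS_INVENTORY_BALANCE": {      # batch cycle-count postings bypass timestamps
        "strategy": "full_refresh",
        "schedule": "nightly",
    },
    "WMS_ORDER_EVENT_LOG": {        # append-only, rows immutable after insert
        "strategy": "incremental",
        "watermark_col": "CREATED_DTTM",
        "lookback_minutes": 5,      # clock-skew safety buffer
    },
}

def strategy_for(table):
    """Unknown tables default to full refresh — the safe failure mode."""
    return EXTRACTION_STRATEGIES.get(table, {"strategy": "full_refresh"})
```

Defaulting unknown tables to full refresh mirrors the article's point: incremental loading is an optimization you opt into per table after analysis, not a blanket architecture.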
Watermark-based incremental loading is an appropriate strategy for WMS pipelines — but not a universal one. The default assumption that any table with a MODIFIED_DATE column is safely incrementally loadable is incorrect in WMS environments where batch processing, clustered databases, and backdated corrections are common. Treating the extraction strategy decision as a per-table analysis rather than a blanket architecture choice leads to pipelines that are both more reliable and easier to debug when issues arise.
The practical outcome of getting this right: fewer pipeline investigations that trace back to missing records, and less accumulation of silent inaccuracies in downstream inventory and shipment reports.
MLPipeLab's connector configuration layer lets you specify per-table extraction strategies — incremental, full-refresh, or hybrid — with automated batch-update detection for compatible WMS systems. Request a demo to see how it works.