From raw WMS exports
to query-ready tables.

MLPipeLab automates the full data integration lifecycle — schema discovery, field mapping, transformation, and delivery — for logistics-specific data sources. No custom ETL code. No six-month integration projects.

How the platform works

MLPipeLab handles each stage of the data integration lifecycle with purpose-built tooling for logistics schemas.

01

Source Discovery

MLPipeLab connects to your source systems — WMS, TMS, ERP, or EDI gateway — and crawls the schema automatically. It identifies table structures, primary keys, foreign key candidates, and field value distributions without requiring your team to document the schema first.

Supported connection methods: JDBC/ODBC, REST API with OpenAPI spec, flat-file EDI with X12/EDIFACT parsing, and direct Snowflake or Redshift shares.
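As an illustration of what automated schema discovery produces, here is a minimal sketch using SQLite's catalog in place of a JDBC metadata call. The table and column names are hypothetical, and a real crawl would also profile value distributions and foreign-key candidates.

```python
import sqlite3

def crawl_schema(conn: sqlite3.Connection) -> dict:
    """Walk every table, recording columns, primary-key flags, and row counts."""
    schema = {}
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        schema[table] = {
            # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
            "columns": [{"name": c[1], "type": c[2], "pk": bool(c[5])} for c in cols],
            "row_count": conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0],
        }
    return schema

# Tiny in-memory table standing in for a WMS export
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipments (ship_id INTEGER PRIMARY KEY, scac TEXT, weight_lbs REAL)")
conn.execute("INSERT INTO shipments VALUES (1, 'UPSN', 42.0)")
print(crawl_schema(conn))
```

The same idea applies over JDBC metadata or a REST source described by an OpenAPI spec; only the catalog query changes.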

02

ML-Assisted Field Mapping

The mapping engine uses a logistics-domain-tuned model trained on schemas from 40+ WMS and TMS vendors. It proposes field alignments with confidence scores. Fields below the confidence threshold are flagged for human review.

Typical result: 83% of fields mapped automatically on the first pass. The remaining 17% are edge cases — custom fields, non-standard unit enumerations — that a logistics data specialist reviews in a focused 45-minute session.
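The confidence-threshold routing described above can be sketched as follows. The field names, model scores, and the 0.80 cutoff are all illustrative, not the platform's actual values.

```python
# Hypothetical output of the mapping model: (source field, target field, confidence)
proposals = [
    ("SHIP_WT_LB", "shipment.weight_lbs", 0.97),
    ("CARR_CD",    "shipment.scac",       0.91),
    ("UDF_07",     "shipment.dock_door",  0.54),  # custom field, low confidence
]

THRESHOLD = 0.80  # proposals below this go to human review

auto_mapped = [p for p in proposals if p[2] >= THRESHOLD]
needs_review = [p for p in proposals if p[2] < THRESHOLD]

print(f"auto: {len(auto_mapped)}, review: {len(needs_review)}")
```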

03

Transformation & Validation

Accepted mappings generate a transformation DAG that runs on each pipeline execution. Transformations include: unit normalization (lbs/kg, pallets/cartons), timezone standardization, carrier SCAC code resolution, and null-fill strategies for optional fields.
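A unit-normalization step like the lbs/kg case above amounts to a small pure function in the DAG. This is a simplified sketch, not the platform's implementation:

```python
LBS_PER_KG = 2.20462

def normalize_weight_kg(value: float, unit: str) -> float:
    """Normalize a weight field to kilograms, tolerating common unit spellings."""
    unit = unit.strip().lower()
    if unit in ("kg", "kgs", "kilogram"):
        return value
    if unit in ("lb", "lbs", "pound", "pounds"):
        return value / LBS_PER_KG
    raise ValueError(f"unrecognized weight unit: {unit!r}")

print(round(normalize_weight_kg(220.462, "lbs"), 3))  # 100.0
```

Raising on an unrecognized unit, rather than silently passing the value through, is what lets the validation report surface bad enumerations instead of corrupting downstream tables.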

Each run produces a validation report: row counts per source table, schema drift diffs, referential integrity checks between shipment and order records, and an anomaly score for statistical outliers.
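The referential-integrity check between shipment and order records reduces to an anti-join. A minimal sketch with hypothetical record shapes:

```python
# Illustrative post-transformation rows
shipments = [{"ship_id": 1, "order_id": 100}, {"ship_id": 2, "order_id": 999}]
orders = [{"order_id": 100}]

# Flag shipments whose order_id has no matching order record
order_ids = {o["order_id"] for o in orders}
orphans = [s for s in shipments if s["order_id"] not in order_ids]

report = {"shipment_rows": len(shipments), "orphaned_shipments": len(orphans)}
print(report)
```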

04

Delivery to Destination

Normalized data lands in your configured destination on schedule. Supported destinations: Snowflake, BigQuery, Amazon Redshift, Azure Synapse, and self-hosted Postgres. MLPipeLab handles incremental loads using watermark-based change detection — no full-table refreshes unless required. Continuous drift detection monitors for schema changes after vendor upgrades and alerts your team before pipelines break.
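Watermark-based change detection can be pictured as a query built from the last-seen watermark. A deliberately simplified sketch (a production extractor would bind the watermark as a parameter rather than interpolate it, and would persist the new high-water mark after each run):

```python
def incremental_query(table: str, watermark_col: str, last_watermark: str) -> str:
    """Build the extraction query for a watermark-based incremental load."""
    return (f"SELECT * FROM {table} "
            f"WHERE {watermark_col} > '{last_watermark}' "
            f"ORDER BY {watermark_col}")

q = incremental_query("shipments", "updated_at", "2025-01-15T00:00:00Z")
print(q)
```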

Platform capabilities in depth

WMS Connector Library

MLPipeLab ships with certified connectors for the most common WMS platforms in North American logistics, including:

  • Manhattan Associates Active WMS
  • SAP Extended Warehouse Management (EWM)
  • Oracle Warehouse Management Cloud
  • Blue Yonder (formerly JDA) Warehouse Management
  • HighJump / Körber WMS
  • Infor CloudSuite WMS
  • 3PL Central (Extensiv)
  • Fishbowl Warehouse

Connector certification means schema coverage is documented, field mappings are pre-seeded, and known quirks (non-standard date formats, vendor-specific status codes) are handled without custom config.

TMS & Carrier EDI

Transportation data comes from TMS platforms and direct carrier EDI feeds. MLPipeLab supports both:

  • Blue Yonder TMS (load tender, tracking, invoicing)
  • MercuryGate TMS
  • McLeod Software (carrier TMS)
  • X12 EDI: 204 (Load Tender), 214 (Shipment Status), 210 (Freight Invoice), 856 (Ship Notice)
  • EDIFACT IFTMIN / IFTSTA equivalents

Carrier-side EDI is normalized into a consistent schema regardless of whether the carrier sends 214s in X12 4010, 5010, or a proprietary flat-file variant.
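At the lowest level, X12 normalization starts by splitting the interchange into segments and elements. This sketch hard-codes the common "~" and "*" delimiters for brevity; real parsers read them from the ISA header, and the 214 fragment below uses illustrative values.

```python
def parse_x12(raw: str, seg_term: str = "~", elem_sep: str = "*") -> list[list[str]]:
    """Split an X12 interchange into segments and elements (simplified:
    fixed delimiters rather than reading them from the ISA envelope)."""
    return [seg.split(elem_sep) for seg in raw.strip().split(seg_term) if seg]

# Fragment of a 214 Shipment Status (illustrative values)
raw = "ST*214*0001~B10*INV123*SHIP456*UPSN~AT7*X6*NS***20250115*1200~SE*4*0001~"
segments = parse_x12(raw)
statuses = [s for s in segments if s[0] == "AT7"]  # AT7 = shipment status detail
print(statuses[0][1])  # X6
```

From here, normalization is a mapping from version-specific segment layouts (4010 vs. 5010) into one canonical status schema.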

Pipeline Observability

Every pipeline run is instrumented. The observability layer tracks:

  • Row counts per source table per run
  • Extraction latency (time from source query to destination write)
  • Schema drift diffs (column additions, type changes, renames)
  • Field-level null rates and statistical distribution shifts
  • Referential integrity failures (orphaned shipment records, missing SKU references)

Alerts can be routed to Slack, email, or PagerDuty. Alert thresholds are configurable per pipeline.
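Schema drift diffing, as listed above, is essentially a set comparison of column snapshots. A minimal sketch with hypothetical column names:

```python
def drift_diff(before: dict[str, str], after: dict[str, str]) -> dict:
    """Diff two column-name -> type snapshots of a source table."""
    return {
        "added":   sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "retyped": sorted(c for c in set(before) & set(after)
                          if before[c] != after[c]),
    }

# Snapshots taken before and after a hypothetical vendor upgrade
before = {"ship_id": "INTEGER", "weight": "REAL"}
after  = {"ship_id": "INTEGER", "weight": "TEXT", "dock_door": "TEXT"}
print(drift_diff(before, after))
```

Renames are harder than additions or type changes; detecting them typically needs value-distribution comparison on top of this name-level diff.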

Security & Access

Logistics data includes PII (shipper names, delivery addresses) and commercially sensitive inventory levels. MLPipeLab addresses this with:

  • AES-256 encryption at rest for all credentials and staging data
  • TLS 1.3 for all data in transit
  • Role-based access control at the pipeline and destination level
  • Audit logs retained for 12 months (read-only, tamper-evident)
  • SOC 2 Type II audit in progress (estimated completion Q3 2025)

Data is processed in-region. No cross-region transfers without explicit customer configuration.

Technical specifications

Pipeline Execution

  • Scheduling: Cron-based with configurable granularity (minimum 5-minute intervals on Growth plan)
  • Throughput: Handles up to 500,000 rows per pipeline run on standard infrastructure. Partitioned pipelines for larger volumes.
  • Transformation engine: Apache Spark-based for batch. Flink connector available for near-real-time WMS event streams.
  • Incremental load: Watermark-based (timestamp or sequence ID). Full-refresh mode available per table.

Connectivity & Deployment

  • Deployment: SaaS (fully managed) or self-hosted agent mode (for on-premises WMS behind firewall)
  • Agent: Lightweight Go binary, 48MB, runs on any Linux host with outbound HTTPS access
  • API: REST API for pipeline management, run history, and schema metadata. OpenAPI 3.0 spec published.
  • Destinations: Snowflake, BigQuery, Redshift, Azure Synapse, Postgres (self-hosted or RDS)

The MLPipeLab pipeline configuration interface

Configure connectors, review field mappings, and monitor pipeline health from a single dashboard.


See it running on your data.

We'll connect to your WMS or TMS, run a live schema discovery, and show you what the normalized output looks like — before you sign anything.

Request a Demo