Data Orchestrator
Limestone Data Orchestrator helps AI infrastructure teams move datasets, model weights, containers, artifacts, and logs across fragmented storage and GPU compute environments with high throughput, reliable execution, and clear monitoring.
Data Orchestrator jobs (control plane view): 12.8 PB found, 9.4 PB copied, 37 errors
Problem
AI workloads run across hyperscalers, private clusters, regional capacity, and specialized GPU environments. The data they need is spread across object stores, file systems, databases, data lakes, warehouses, model registries, and local cluster storage.
Teams stitch this together with scripts, cloud-specific tools, and fragile pipelines. When transfers are slow or fail silently, accelerators sit idle and platform engineers end up debugging movement instead of improving infrastructure.
Solution
Limestone provides Data Orchestrator as an orchestration layer for defining, tracking, and operating data movement jobs. The control plane manages state, progress, and errors while deployable workers execute near storage and compute.
Data Orchestrator makes data movement fast, scalable, monitored, reliable, secure, and convenient across file and object storage, private clusters, and public cloud compute. Customers keep control over data residency and execution while reducing the idle time and operational friction around expensive compute.
Product
Data Orchestrator makes data movement fast, reliable, observable, and easier to operate as AI infrastructure becomes more distributed.
It is designed for the workflows between storage and compute: hydrating clusters, returning workload outputs, placing model artifacts near inference capacity, and cleaning up ephemeral environments.
High throughput
Targets 95%+ network saturation through control and data plane decoupling, distributed parallel execution, and efficient data handling.
Durable
Accepted jobs are persisted, decomposed into retry-safe work, and tracked through monotonic state transitions.
Visible
Progress counters, job state, manifests, and structured errors make long-running movement easy to monitor and reason about.
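As a rough illustration of what "monotonic state transitions" could mean in practice, the sketch below models a job state that only moves forward and ignores stale or replayed updates, which is one way to keep retries safe. The state names other than RUNNING, the `advance` function, and the ordering are assumptions for illustration, not Limestone's actual implementation or API.

```python
from enum import IntEnum


class JobState(IntEnum):
    # Hypothetical states: only RUNNING appears in the page's sample output.
    ACCEPTED = 0
    RUNNING = 1
    SUCCEEDED = 2
    FAILED = 3


# Once a job reaches one of these, later updates are ignored.
TERMINAL = {JobState.SUCCEEDED, JobState.FAILED}


def advance(current: JobState, proposed: JobState) -> JobState:
    """Return the next state, never moving backwards or out of a terminal state."""
    if current in TERMINAL:
        return current  # terminal states absorb late or duplicate updates
    if proposed <= current:
        return current  # stale or replayed update: applying it is a no-op
    return proposed


if __name__ == "__main__":
    state = JobState.ACCEPTED
    # A retried worker may replay ACCEPTED after RUNNING; the state does not regress.
    for update in (JobState.RUNNING, JobState.ACCEPTED, JobState.SUCCEEDED, JobState.FAILED):
        state = advance(state, update)
    print(state.name)
```

Because applying the same update twice leaves the state unchanged, a worker that retries after a timeout cannot push a job backwards.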
Copy, scan, and delete workflows are submitted as durable jobs with explicit state, counters, manifests, and structured errors. CLI, Python SDK, and web console surfaces can share one operating model.
do job copy \
  --origin s3://training-data/frontier-v4 \
  --destination cluster://h100-pool-a/datasets \
  --follow-symlinks=false

job_id: job_01HX9B7M6R
state: RUNNING
throughput: 14.2 GB/s
files_copied: 18,402,117
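To show how progress counters might be turned into a single figure, here is a minimal sketch that computes the copied fraction from found and copied totals. The `JobCounters` class and its field names are hypothetical and not part of any documented Limestone SDK; the sample values are the totals shown in the control plane view on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobCounters:
    """Hypothetical snapshot of a job's counters (sizes in petabytes)."""
    pb_found: float
    pb_copied: float
    errors: int

    @property
    def fraction_copied(self) -> float:
        # Guard against division by zero before the scan has found anything.
        if self.pb_found == 0:
            return 0.0
        return self.pb_copied / self.pb_found


if __name__ == "__main__":
    snapshot = JobCounters(pb_found=12.8, pb_copied=9.4, errors=37)
    print(f"{snapshot.fraction_copied:.0%} copied, {snapshot.errors} errors")
```

With the sample totals this works out to roughly 73% copied.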
Use cases
01. Move datasets, model weights, containers, and artifacts into compute before training or inference, then return generated artifacts, logs, model outputs, and completed run data to durable storage.
02. Place model weights and supporting artifacts near newly available GPU capacity when serving demand increases.
03. Clean up staged data after ephemeral cluster use with durable job state, structured errors, and auditable outcomes.
04. Stream or bulk-move files created by agentic workflows from local or ephemeral storage to durable storage.
FAQ
Limestone is designed around a control plane for defining and observing jobs, with data plane workers that execute near the relevant storage and compute environments.
The intended architecture keeps customer data movement close to customer infrastructure. Data planes access storage and credentials directly so customers retain control over execution and security boundaries.
The product direction includes object storage such as S3, GCS, and Azure Blob; file systems such as NFS, Lustre, Weka, Vast, CephFS, and related backends; and future support for data lakes and data warehouses.
Managed service pricing is listed at $0.20 per GB transferred plus $0.50 per job. Enterprise pricing is available for support, SLAs, private deployment, enhanced observability, and on-premise or self-managed data plane needs.
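As a quick worked example of the listed managed rates, the sketch below estimates a bill from the two stated components. It assumes decimal gigabytes and no minimums, tiers, discounts, or taxes, none of which are specified on this page; the function name `managed_cost` is illustrative only.

```python
from decimal import Decimal

# Listed managed service rates from this page.
PER_GB = Decimal("0.20")
PER_JOB = Decimal("0.50")


def managed_cost(gb_transferred, jobs: int = 1) -> Decimal:
    """Estimate cost in dollars: per-GB transfer charge plus a flat per-job charge."""
    # Convert via str so float inputs such as 1.5 do not carry binary rounding noise.
    return PER_GB * Decimal(str(gb_transferred)) + PER_JOB * jobs


if __name__ == "__main__":
    # 1,000 GB moved in a single job: 0.20 * 1000 + 0.50 * 1
    print(managed_cost(1000, jobs=1))  # 200.50
```

Under these assumptions, the per-job fee matters mainly for many small transfers, while the per-GB rate dominates large ones.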