
Deploy NVIDIA Cosmos World Foundation Models on GPU Cloud: Synthetic Data Generation for Robotics and Physical AI (2026 Guide)

Written by Mitrasish, Co-founder · Apr 12, 2026
NVIDIA Cosmos · Synthetic Data · Physical AI · Robotics · GPU Cloud · H100 · H200

Robotics teams spend more time collecting training data than training models. Real-world data collection for manipulation tasks or autonomous navigation costs hundreds of dollars per clip once you factor in operators, environments, and labeling. NVIDIA Cosmos changes the math by generating photorealistic synthetic video of physical environments at a fraction of that cost. On Spheron's on-demand H100 instances, you can run a full Cosmos inference pipeline without hyperscaler pricing. For a broader look at GPU selection for these workloads, see the GPU requirements cheat sheet for 2026.

What Is NVIDIA Cosmos: World Foundation Models for Physical AI

Cosmos is a family of world foundation models (WFMs) designed to generate physically plausible video of real-world environments. Unlike general-purpose video generation models, Cosmos is trained specifically on physical scenes: factories, warehouses, roads, outdoor environments, and manipulation workspaces. The outputs are used as synthetic training data for robots and autonomous vehicles.

Three model families ship in the current release. Cosmos-Predict handles video generation using both diffusion-based and autoregressive architectures. It takes text prompts or video conditioning inputs and produces photorealistic clips of physical environments. Cosmos-Transfer handles style and domain transfer, adapting existing video to target visual domains. Cosmos-Reason is a transformer-based video understanding model used for annotation and scene analysis. All families are distributed under the NVIDIA Open Model License (source code is Apache 2.0), which permits commercial use with attribution but is not a fully open license. You must accept the license on Hugging Face before downloading weights.

Models are available via Hugging Face (under the nvidia/ organization) and through NVIDIA's NGC registry. The NVIDIA Open Model License terms are shown during the HF access request flow.

GPU Requirements for Cosmos Model Variants: VRAM, Compute, and Storage

Cosmos-Predict models are VRAM-hungry. The 7B variant fits on a single 80GB GPU at full precision. The 14B variant typically needs either two 80GB GPUs running tensor parallelism or a single H200 with 141GB HBM3e memory at full precision. With aggressive model offloading you can reduce VRAM requirements to around 39GB, though inference speed drops significantly. Cosmos-Reason1 is less demanding and can run on 40GB A100s (note: architecture support may vary by version, with newer Reason releases targeting Hopper and Blackwell).
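As a rough sanity check before provisioning, you can approximate VRAM needs from parameter count. This is a back-of-the-envelope sketch, not a measurement: the bf16 assumption and the 4x activation/overhead multiplier are assumptions chosen to roughly reproduce the figures above, and real diffusion-video inference varies with resolution and frame count.

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float = 2.0,
                     activation_overhead: float = 4.0) -> float:
    """Rough VRAM estimate: weight memory plus a multiplier covering
    activations, latent buffers, and framework overhead. The 4x overhead
    factor is an assumption, not a measured constant."""
    weights_gb = params_billion * bytes_per_param  # bf16 = 2 bytes/param
    return weights_gb * (1 + activation_overhead)

estimate_vram_gb(7)   # ~70 GB: consistent with a single 80GB H100
estimate_vram_gb(14)  # ~140 GB: consistent with 2x H100 or one 141GB H200
```

With offloading enabled, most of the activation term moves to host memory, which is why the ~39GB floor cited above is possible at the cost of speed.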

| Model Variant | VRAM Required | Recommended GPU | Minimum GPU | Storage |
|---|---|---|---|---|
| Cosmos-Predict1-7B-Text2World | 80GB | 1x H100 SXM5 | 1x H100 PCIe | ~50GB weights |
| Cosmos-Predict1-14B-Text2World | 80GB | 2x H100 SXM5 | 1x H100 80GB | ~100GB weights |
| Cosmos-Predict1-7B-Video2World | 80GB | 1x H100 SXM5 | 1x H100 PCIe | ~50GB weights |
| Cosmos-Predict1-14B-Video2World | 80GB | 2x H100 SXM5 | 1x H100 80GB | ~100GB weights |
| Cosmos-Reason1-7B | 40GB | 1x A100 80GB | 1x A100 40GB | ~15GB weights |

_The 14B variant fits on a single 80GB GPU only with model offloading; multi-GPU is recommended for production throughput._

Multi-GPU 14B deployments use tensor parallelism. Inter-GPU bandwidth matters here. For multi-node setups, see the multi-node GPU training guide for networking considerations when NVLink is not available.

Storage is often overlooked. Model weights alone consume 50-100GB. Add output buffers for video frames and the temporary tensors during generation, and you need at least 200GB NVMe SSD per instance. Spin up H100 PCIe or H200 instances from Spheron's H100 GPU rental page with attached NVMe included.

Self-Hosting Cosmos vs Using NVIDIA's API: Cost and Control Tradeoffs

NVIDIA offers Cosmos inference as a managed API, which is the fastest way to run your first generation. You submit a prompt via a REST call, get back a video, and pay per generation. No infrastructure to manage. As of April 2026, NVIDIA has not published per-call pricing for Cosmos on their API; it is available through NVIDIA Cloud Functions and early enterprise agreements. The tradeoff: prompts and output data leave your perimeter, environment customization is limited, and at scale the per-call cost will exceed self-hosted GPU-hours.

Self-hosting on GPU cloud gives you the opposite profile. Upfront setup takes a few hours. After that, every GPU-hour is fully utilized by your pipeline with no per-token or per-generation markup. Your prompts and outputs stay on your infrastructure. You can run custom environments, modify inference parameters, and batch at any scale. The only constraint is GPU availability, which Spheron handles with on-demand provisioning.

The cost case for self-hosting becomes clear quickly. At $2.01/hr for an H100 PCIe on Spheron, generating 1,000 clips with an average 30-minute generation time per clip uses 500 GPU-hours, costing roughly $1,005 total. For teams generating more than a few hundred clips per month, self-hosted on-demand GPU will consistently beat managed API pricing once NVIDIA publishes Cosmos API rates.
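The arithmetic above generalizes into a small budgeting helper. A minimal sketch, using the figures quoted in this article:

```python
def generation_cost(num_clips: int, minutes_per_clip: float,
                    hourly_rate: float) -> tuple[float, float]:
    """Total GPU-hours and dollar cost for a batch of clips
    on a single on-demand instance."""
    gpu_hours = num_clips * minutes_per_clip / 60
    return gpu_hours, gpu_hours * hourly_rate

# The example above: 1,000 clips at 30 min/clip on a $2.01/hr H100 PCIe
hours, cost = generation_cost(1000, 30, 2.01)  # 500 GPU-hours, ~$1,005
```

Swap in your own clip count and generation time to compare against whatever per-call API pricing NVIDIA eventually publishes.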

Step-by-Step: Deploy Cosmos on GPU Cloud with Docker and NVIDIA Container Toolkit

Prerequisites

Before starting, you need:

  • GPU instance with NVIDIA driver 550+ and CUDA 12.4+ installed
  • Docker 24.0+
  • NVIDIA Container Toolkit
  • NGC account with API key (ngc.nvidia.com)
  • Hugging Face account with Cosmos model access approved (accept the NVIDIA Open Model License on the model page)
  • 200GB+ NVMe storage for weights and outputs

Step 1: Provision a GPU Instance

Rent via the Spheron dashboard. For the 7B models, a single H100 PCIe 80GB is the minimum and works well. For the 14B models or faster generation, use 2x H100 SXM5 or a single H200 SXM5.

```bash
# After SSH-ing into your instance, verify the GPU is visible
nvidia-smi

# Check CUDA version
nvcc --version

# Confirm available NVMe storage
df -h /mnt
```

H200 SXM5 on-demand instances are subject to availability. Check current GPU pricing and availability before provisioning.

Step 2: Install NVIDIA Container Toolkit

```bash
# On Ubuntu 22.04
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor \
  -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify
nvidia-ctk --version
```

Step 3: Authenticate with NGC and Pull the Cosmos Container

```bash
# Log in to NVIDIA's container registry
# Username: $oauthtoken (literal string)
# Password: your NGC API key
docker login nvcr.io

docker pull nvcr.io/nim/nvidia/cosmos-predict1-7b-text2world:1.0.0
```

This uses the NIM (NVIDIA Inference Microservices) container path with the /nim/ prefix and a model-specific name and version tag. Verify the latest tag against NVIDIA's NGC catalog at ngc.nvidia.com before pulling, as newer model variants may have different image names.

Step 4: Download Cosmos Model Weights

```bash
pip install huggingface_hub
huggingface-cli login  # enter your HF token when prompted

# 7B text-to-world model (~50GB)
huggingface-cli download nvidia/Cosmos-Predict1-7B-Text2World \
  --local-dir /mnt/weights/cosmos-7b

# Optional: 14B model (~100GB, requires 2x H100 SXM5 or 1x H200)
huggingface-cli download nvidia/Cosmos-Predict1-14B-Text2World \
  --local-dir /mnt/weights/cosmos-14b
```

The NVIDIA Open Model License terms are enforced at download time. If your HF account has not accepted the license, the download will fail with a 403 error. Accept the license at the model page on Hugging Face first.
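If you prefer scripted downloads (for example, in an automated instance bootstrap), the same pulls can be done through the `huggingface_hub` Python API. The repo IDs and local paths match the commands above; the error-handling message is an assumption about how a gated-repo 403 typically surfaces.

```python
# Repo IDs and local paths from the CLI commands above.
COSMOS_REPOS = {
    "7b":  ("nvidia/Cosmos-Predict1-7B-Text2World",  "/mnt/weights/cosmos-7b"),
    "14b": ("nvidia/Cosmos-Predict1-14B-Text2World", "/mnt/weights/cosmos-14b"),
}

def fetch(variant: str) -> str:
    """Download one model snapshot; snapshot_download resumes partial pulls."""
    from huggingface_hub import snapshot_download  # pip install huggingface_hub
    repo_id, local_dir = COSMOS_REPOS[variant]
    try:
        return snapshot_download(repo_id=repo_id, local_dir=local_dir)
    except Exception as err:
        # A 403 here usually means the NVIDIA Open Model License has not
        # been accepted on the Hugging Face model page for this account.
        raise SystemExit(f"Download of {repo_id} failed: {err}")

if __name__ == "__main__":
    fetch("7b")
```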

Step 5: Run Cosmos Inference

NIM containers are microservices. You start the container, wait for it to become ready, then send HTTP requests to its REST API. Do not override the entrypoint.

Start the 7B container and expose its API on port 8000:

```bash
docker run --rm --gpus all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -v /mnt/weights:/opt/nim/.cache \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/cosmos-predict1-7b-text2world:1.0.0
```

In a separate terminal, wait for the service to be ready, then send a generation request:

```bash
# Poll until the service is ready (may take a few minutes on first run)
curl --retry 20 --retry-delay 10 --retry-connrefused --fail-with-body --retry-all-errors \
  http://localhost:8000/v1/health/ready

# Generate a video via the REST API
curl -X POST http://localhost:8000/v1/infer \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A warehouse floor with pallets moving on autonomous forklifts, overhead lighting, concrete floor",
    "resolution": "1280x720",
    "num_frames": 120
  }'
```

For the 14B model on a 2-GPU setup, use --gpus '"device=0,1"' to assign both GPUs explicitly and reference the 14B image:

```bash
docker run --rm --gpus '"device=0,1"' \
  -e NGC_API_KEY=$NGC_API_KEY \
  -v /mnt/weights:/opt/nim/.cache \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/cosmos-predict1-14b-text2world:1.0.0
```

Then query the same API endpoint on port 8000 as shown above. The --gpus all flag works for single-GPU deployments. Use "device=0" for explicit single-GPU control.
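For batch generation you will want a small client rather than hand-typed curl. The sketch below uses only the Python standard library and the endpoints shown above. One assumption to verify against your container's docs: it treats the `/v1/infer` response body as raw video bytes, but some NIM versions return a JSON wrapper instead.

```python
import json
import time
import urllib.request

BASE = "http://localhost:8000"  # the NIM container started above

def infer_payload(prompt: str, resolution: str = "1280x720",
                  num_frames: int = 120) -> bytes:
    """JSON body matching the /v1/infer request shown above."""
    return json.dumps({"prompt": prompt, "resolution": resolution,
                       "num_frames": num_frames}).encode()

def wait_ready(timeout_s: int = 600) -> None:
    """Poll the readiness endpoint until the model has loaded."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(f"{BASE}/v1/health/ready", timeout=5) as r:
                if r.status == 200:
                    return
        except OSError:
            pass  # connection refused while the container warms up
        time.sleep(10)
    raise TimeoutError("Cosmos NIM service never became ready")

def generate(prompts: list[str]) -> None:
    wait_ready()
    for i, prompt in enumerate(prompts):
        req = urllib.request.Request(
            f"{BASE}/v1/infer", data=infer_payload(prompt),
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            with open(f"clip_{i:04d}.mp4", "wb") as f:
                f.write(resp.read())  # assumption: body is the video itself

if __name__ == "__main__":
    generate(["A warehouse floor with pallets moving on autonomous forklifts"])
```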

Generating Synthetic Training Data: Warehouse, Factory, and Driving Environments

Prompt engineering for Cosmos is different from general video generation. Physical accuracy matters more than aesthetic appeal. Describe the environment precisely: floor materials, lighting type, fixture positions, background objects, and any moving elements. Here are three prompt templates that work well:

Warehouse:

A high-bay warehouse with shelving racks, a mobile robot base navigating narrow aisles,
fluorescent overhead lighting, concrete floor with painted safety lanes, depth visible
in background shelving, ambient dust particles in light beams

Factory:

An automotive assembly line with robotic arm welders, overhead conveyors, parts bins,
bright industrial lighting, metal surfaces with reflections, sparks from welding,
yellow safety barriers visible at frame edges

Driving (suburban):

A suburban intersection at dawn, lane markings, traffic signs, parked vehicles on both
sides, wet road surface after rain, street lights still on, early morning light
from the east casting long shadows

The Video2World variant lets you condition generation on a real base clip. You provide a short real-world video and Cosmos generates photorealistic variants of the same scene. This is useful for sim-to-real transfer: anchor synthetic data to your actual deployment environment to reduce the domain gap.

Most robotics teams need between 10,000 and 100,000 clips per task type to see meaningful policy improvement. For general video AI GPU context, the AI video generation GPU guide covers VRAM and cost tradeoffs across generation models.
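To get from a handful of templates to thousands of distinct prompts, one common approach is to cross a base scene with axes of variation. The axes below are illustrative examples, not figures from NVIDIA's documentation:

```python
from itertools import product

# Base scene from the warehouse template above; variation axes are
# hypothetical examples -- extend them with your environment's real variables.
BASE_SCENE = ("A high-bay warehouse with shelving racks, a mobile robot base "
              "navigating narrow aisles, concrete floor with painted safety lanes")
LIGHTING = ["fluorescent overhead lighting", "skylights with natural daylight",
            "dim emergency lighting"]
CLUTTER = ["ambient dust particles in light beams", "stacked pallets in aisles",
           "shrink-wrapped goods on shelves"]

def prompt_variants() -> list[str]:
    """Cross the variation axes to multiply one scene into many prompts."""
    return [f"{BASE_SCENE}, {light}, {detail}"
            for light, detail in product(LIGHTING, CLUTTER)]

variants = prompt_variants()  # 3 lighting x 3 clutter = 9 distinct prompts
```

Add two or three more axes (camera height, time of day, floor wear) and a single template fans out into hundreds of prompts, which is how template libraries reach dataset scale.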

The Physical AI Data Factory Blueprint: End-to-End Pipeline Architecture

NVIDIA announced the Physical AI Data Factory Blueprint at GTC 2026. At its core, it connects Cosmos components (Curator for data curation, Cosmos-Transfer for domain adaptation, and Cosmos-Reason/Evaluator for quality assessment) with NVIDIA OSMO as the orchestration layer. Omniverse and Isaac Sim are part of NVIDIA's broader Physical AI ecosystem and can integrate with the pipeline, but the blueprint's primary components are the Cosmos modules and OSMO.

```
[Text / Scene Spec]
        |
        v
  Cosmos Curator         <-- data ingestion, filtering, processing
  (data pipeline)
        |
        v
  Cosmos-Transfer        <-- domain and style adaptation of video
  (video adaptation)
        |
        v
  Cosmos-Reason /        <-- quality evaluation, scene analysis,
  Evaluator                  annotation for downstream training
```

Orchestration: NVIDIA OSMO manages compute orchestration across the pipeline stages, coordinating job scheduling and scaling on GPU clusters.

Integration with simulation: For teams using NVIDIA Omniverse and Isaac Sim, Cosmos outputs can feed into those tools for scene authoring, physics simulation, and reinforcement learning training loops. This broader integration is documented in NVIDIA's Physical AI Data Factory Blueprint documentation at developer.nvidia.com.

The entire pipeline runs on GPU cloud. No on-premise cluster is required. Each stage can scale independently: run more Cosmos generation instances when building a new dataset, then scale down while training runs on the same GPU budget.
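The independent-scaling idea can be made concrete with a small capacity planner. This is an illustrative sketch, not the OSMO API, and the per-stage throughput figures are placeholders rather than benchmarks:

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    minutes_per_clip: float  # placeholder throughput, not a benchmark

    def instances_needed(self, clips_per_day: int) -> int:
        """Instances required for this stage to keep up with a daily target."""
        capacity = int(24 * 60 / self.minutes_per_clip)  # clips/day/instance
        return max(1, -(-clips_per_day // capacity))     # ceiling division

# Hypothetical pipeline mirroring the blueprint stages above.
pipeline = [Stage("curator", 0.5), Stage("transfer", 8.0),
            Stage("reason-eval", 2.0)]
plan = {s.name: s.instances_needed(1000) for s in pipeline}
# With these placeholder numbers, only the transfer stage needs scaling out.
```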

Integrating Cosmos with Omniverse and Isaac Sim for Robotics Workflows

Cosmos and Omniverse operate at different points in the pipeline. Cosmos generates appearance: photorealistic video that looks like the real world. Omniverse handles physics and ground truth: it takes that appearance data and embeds it in physically simulated scenes where robot actions can be evaluated.

The handoff works in two directions. First, you use Cosmos to generate large volumes of photorealistic reference clips for an environment class (e.g., "warehouse with diverse lighting conditions"). These clips are imported into Omniverse as texture and appearance references, helping the simulated environment look realistic. Second, you can author a scene in Omniverse with precise asset placement and export that scene geometry as a video conditioning input to Cosmos-Predict's Video2World variant, generating appearance-varied versions of your specific Omniverse scene.

Isaac Sim connects after Omniverse: it runs physics simulation and domain randomization on top of the Omniverse scenes, generating the actual robot training trajectories. Isaac Lab provides the reinforcement learning scaffolding. Full documentation for this integration is available in NVIDIA's Physical AI Data Factory Blueprint documentation at developer.nvidia.com.

Cost Analysis: Synthetic Data Generation vs Real-World Data Collection on GPU Cloud

| Data Source | Cost per 10-sec clip | Clips per day (1x GPU) | Notes |
|---|---|---|---|
| Cosmos on H100 PCIe (Spheron, on-demand) | ~$1.00-$1.50 | ~32-48 | $2.01/hr, ~30-45 min/clip |
| Cosmos on H100 SXM5 (Spheron, spot) | ~$0.40-$0.60 | ~32-48 | $0.80/hr spot |
| Cosmos on H200 SXM5 (Spheron, on-demand) | ~$0.38-$0.76 | ~144-288 | $4.54/hr, ~5-10 min/clip |
| AWS p4d.24xlarge (8x A100) | ~$32/hr total | Faster, but 16-32x cost | On-demand only |
| Real-world robot data collection | $50-$500 per clip | N/A | Includes operators, environments, labeling |

Pricing fluctuates based on GPU availability. The prices above are as of 12 Apr 2026 and may have changed. Check current GPU pricing for live rates.

The H200 justifies its higher hourly rate through faster generation. At ~5-10 minutes per clip versus 30-45 minutes on an H100 PCIe, you get roughly 4-6x more clips per hour. For teams generating 5,000+ clips per month, the H200 on-demand can cost less total despite the higher rate. For lighter pipelines generating under 500 clips per month, H100 PCIe on-demand is the simpler and more cost-effective option. Spot instances on H100 SXM5 at $0.80/hr are attractive for workloads that can tolerate preemption.
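Using the midpoints of the ranges in the table above, a quick per-clip comparison shows how the H200's throughput offsets its higher hourly rate:

```python
def cost_per_clip(hourly_rate: float, minutes_per_clip: float) -> float:
    """Dollar cost of one clip at a given generation time."""
    return hourly_rate * minutes_per_clip / 60

# Midpoints of the article's ranges: 37.5 min on H100 PCIe, 7.5 min on H200
h100 = cost_per_clip(2.01, 37.5)  # ~$1.26 per clip
h200 = cost_per_clip(4.54, 7.5)   # ~$0.57 per clip
```

At these midpoint assumptions the H200 wins on per-clip cost; the case for H100 PCIe at low volume rests on availability and simpler single-GPU provisioning rather than raw economics.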

Compared to AWS p4d.24xlarge at ~$32/hr for 8x A100s, running Cosmos on Spheron H100 instances costs 16-32x less for equivalent throughput. For a full breakdown of hyperscaler GPU pricing versus Spheron, see the AWS, GCP, and Azure GPU alternative guide. For strategies to further reduce GPU spend across your broader AI infrastructure, the GPU cost optimization playbook covers spot instance usage, checkpoint strategies, and right-sizing decisions that apply directly to synthetic data pipelines.

H200 on-demand availability varies. Check live inventory at Spheron's pricing page before planning a production pipeline around it.


Cosmos synthetic data pipelines run for GPU-hours at a time. Spheron's on-demand H100 and H200 instances let robotics teams spin up generation capacity when they need it and shut it down when they don't - no reserved capacity required.
