
Deploy Genesis Physics Engine on GPU Cloud: Embodied AI Simulation and Robot Policy Training at 100M FPS (2026 Guide)

Written by Mitrasish, Co-founder
May 30, 2026

Genesis is a physics simulator whose kernels compile to CUDA (and other backends) via Taichi. On a single high-end GPU it produces tens of millions of simulation frames per second for rigid-body robotics tasks: Genesis's published 43M FPS benchmark runs on an RTX 4090 with a single-plane Franka arm manipulation scene. One Python-native API covers rigid bodies, soft bodies, fluids, and MPM. For robotics teams doing RL policy search on GPU cloud, that throughput turns multi-day compute jobs into a few GPU-hours. If your pipeline involves GR00T N1 fine-tuning after policy search, Deploy NVIDIA Isaac GR00T N1 on GPU Cloud covers the Isaac Lab end of that stack. For synthetic data augmentation of Genesis trajectories, Deploy NVIDIA Cosmos World Foundation Models on GPU Cloud documents the Cosmos pipeline.

What Is Genesis: Unified Physics Engine for Embodied AI

Genesis is an open-source universal physics simulator developed by the Genesis-Embodied-AI group (CMU, Stanford, and collaborators). It is written in Python, with simulation kernels JIT-compiled through Taichi to GPU backends such as CUDA. Because every parallel environment runs inside the same batched GPU kernels, it reaches throughput that USD-based pipelines do not match.

The technical difference from Isaac Sim is foundational. Isaac Sim runs on Omniverse's USD scene graph with a full PhysX + OptiX rendering pipeline. Each simulated environment is a USD stage with asset loading, shader compilation, and rendering overhead per instance. Genesis bypasses all of that. Environments are slices along the batch dimension of shared state tensors, not separate scene graphs. Going to 1024 parallel environments grows those per-environment state tensors 1024x; it does not spawn 1024 separate simulators.

Genesis supports multiple physics backends in the same scene:

  • Rigid body dynamics - the fastest path, up to 43M FPS on an RTX 4090 for the published Franka manipulation benchmark; scene complexity and contact richness reduce this significantly
  • Soft body (FEM/PBD) - deformable objects, cloth, elastic manipulation targets
  • Fluid (SPH) - particle-based fluid simulation for food handling, liquid manipulation
  • MPM (Material Point Method) - granular materials, plasticine, snow, soft contacts
  • PBD (Position Based Dynamics) - fast approximate soft body for large scene batches

The generative agent layer is an underrated feature for robotics. Instead of authoring each scene in USD by hand, Genesis lets you describe scene layouts programmatically and generate thousands of distinct environments for domain randomization without manual asset work.
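The idea can be sketched in plain Python: each environment gets a layout generated from its own seed, so layout i is reproducible across runs. The layout schema here (a handful of boxes on a tabletop) is a hypothetical example. In a Genesis scene each spec could be passed to something like `gs.morphs.Box(pos=..., size=...)`, but that wiring is not shown and should be checked against the Genesis docs for your version.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxSpec:
    """One obstacle in a hypothetical per-environment layout schema."""
    pos: tuple[float, float, float]
    size: tuple[float, float, float]


def generate_layout(seed: int, n_boxes: int = 3, table_half_extent: float = 0.3) -> list[BoxSpec]:
    """Generate one reproducible tabletop layout from a seed."""
    rng = random.Random(seed)  # per-layout RNG, so layout i is always identical
    boxes = []
    for _ in range(n_boxes):
        edge = rng.uniform(0.03, 0.08)  # cube edge length in metres
        x = rng.uniform(-table_half_extent, table_half_extent)
        y = rng.uniform(-table_half_extent, table_half_extent)
        # z = edge / 2 places the cube resting on the table surface at z = 0
        boxes.append(BoxSpec(pos=(x, y, edge / 2), size=(edge, edge, edge)))
    return boxes


def generate_layouts(n_layouts: int, base_seed: int = 0) -> list[list[BoxSpec]]:
    """Generate many distinct layouts for domain randomization."""
    return [generate_layout(base_seed + i) for i in range(n_layouts)]


layouts = generate_layouts(1000)
print(len(layouts), layouts[0][0])
```

Seeding per layout rather than drawing from one shared stream means you can regenerate any single environment's layout (for example, to replay a failure) without regenerating the other 999.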

Genesis is Apache 2.0 licensed. The simulation code, training wrappers, and export utilities are all open-source under permissive terms.

Genesis vs Isaac Sim vs MuJoCo: Where the Speed Comes From

The headline 43M FPS figure (RTX 4090, single-plane Franka arm with self-collisions enabled and mostly idle actions) is real but needs context. With random actions, the Genesis team reports the same scene drops to ~27M FPS. The figure counts frames summed over a batch of parallel environments running the same rigid-body scene. A single isolated environment stepped serially achieves a few thousand FPS, similar to MuJoCo. The throughput advantage comes entirely from parallelism.
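The arithmetic behind that caveat: an aggregate FPS number counts one frame per environment per step, so dividing by the batch size gives the rate at which each individual environment advances. The batch size of 4096 below is a hypothetical value for illustration, not the setting behind Genesis's published benchmark.

```python
def per_env_steps_per_sec(aggregate_fps: float, n_envs: int) -> float:
    """Aggregate FPS counts one frame per environment per step, so divide by the batch size."""
    return aggregate_fps / n_envs


def wall_clock_seconds(total_frames: float, aggregate_fps: float) -> float:
    """Time to collect a given number of frames across the whole batch."""
    return total_frames / aggregate_fps


# 4096 is a hypothetical batch size, not the published benchmark configuration.
print(per_env_steps_per_sec(43e6, 4096))  # each environment advances ~10.5K steps/s
print(wall_clock_seconds(1e9, 27e6))      # 1B frames at the random-action rate: ~37 s
```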

MuJoCo's architecture is CPU-first with a Python wrapper. It runs fast on a single environment, but scaling to 1024 environments means running many CPU processes or threads, so throughput is bounded by CPU core count rather than the GPU. MuJoCo-MJX (JAX port) and Brax address this with JAX-native batching, but Genesis handles the full physics feature set in one engine.

| Simulator | FPS (peak published, rigid) | API | Multi-GPU | License |
|---|---|---|---|---|
| Genesis | ~43M (RTX 4090, Franka manip, rigid, batched)¹ | Python/CUDA | Yes (batched tensor) | Apache 2.0 |
| Isaac Sim | ~50K-500K (H100) | Python/USD | Yes | NVIDIA (NC) |
| MuJoCo | ~1-5M (single env) | C/Python | Limited | Apache 2.0 |
| Brax | ~1B (TPU pod, rigid only) | JAX | Yes | Apache 2.0 |

¹ Genesis 43M FPS is measured on an RTX 4090 with a single-plane Franka arm manipulation scene, self-collisions enabled, arm largely idle. The Genesis team reports the same scene drops to ~27M FPS with random actions enabled. Isaac Sim figures are H100-based.

Brax vs Genesis: Brax (Google) is JAX-native and achieves higher theoretical throughput for pure rigid-body locomotion because JAX compiles the full sim-to-policy-gradient path into a single XLA computation. If your workload is purely rigid locomotion on JAX, Brax is likely the faster choice. Genesis wins on physics breadth: it supports MPM, fluid, and soft body physics that Brax does not, and its Python API is closer to the Isaac Lab and ROS 2 ecosystem most robotics teams already use. Do not confuse Genesis with Brax when searching for benchmark comparisons.

When Isaac Sim still wins: photorealistic rendering for Cosmos synthetic data pipelines, full Omniverse asset library, GR00T N1 official support, and ground-truth segmentation masks for perception-heavy policies. Genesis does not render Cosmos-quality frames. It is the right tool for RL policy search, not visual appearance generation.

GPU Sizing for Genesis Workloads

Environment batch size drives VRAM requirements more than any other factor. A single locomotion environment (52-DoF bimanual robot, 1K timestep rollout) fits in under 100MB of VRAM. Scaling to 4096 environments takes ~6-8GB. MPM and fluid scenes are denser because each particle is a separate simulation entity, not a rigid body.

| Workload | GPU | Reasoning |
|---|---|---|
| Single-robot PPO, up to 512 envs | L40S 48GB | Fits full env batch, cost-effective for long sweeps |
| Multi-robot parallel RL, 1K-4K envs | H100 PCIe 80GB | 80GB handles large env batches, sufficient bandwidth |
| Large-scale policy search, 8K+ envs | H100 SXM5 (NVLink) | NVLink reduces gradient sync overhead across GPUs |
| MPM/fluid/soft body scenes | H200 141GB | HBM3e bandwidth critical for SPH/MPM tensor operations |
| 100M FPS benchmark workloads | 8x H100 SXM5 or B200 | Full NVLink bandwidth, maximum parallel throughput |

A note on n_envs: the right count is not fixed. For a simple 6-DoF arm (12 state dims), n_envs=4096 on an H100 PCIe fits easily. For a full 52-DoF bimanual robot (104 state dims plus contact buffers), n_envs=1024 is a safer starting point. Start with n_envs=256 and double until you hit OOM or throughput stops scaling, rather than targeting a specific count.
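The doubling procedure can be written as a small search loop. In this sketch `try_build` is a hypothetical callback you supply: it builds a scene with `n` environments, runs a short benchmark, and returns aggregate steps/s, mapping whatever your stack raises on OOM (for example `torch.cuda.OutOfMemoryError`) to `MemoryError`. Running each attempt in a fresh subprocess is the simplest way to guarantee GPU memory is released between attempts. The 10% "still scaling" threshold is an assumption to tune.

```python
from typing import Callable


def find_max_envs(try_build: Callable[[int], float], start: int = 256, limit: int = 65536,
                  min_gain: float = 1.1) -> int:
    """Double n_envs until OOM or until throughput stops scaling.

    try_build(n) returns aggregate steps/s for n envs and raises MemoryError on OOM.
    Returns the largest size worth using, or 0 if even `start` does not fit.
    """
    best_n, best_sps = 0, 0.0
    n = start
    while n <= limit:
        try:
            sps = try_build(n)
        except MemoryError:
            break  # the last size that fit is the answer
        if best_n and sps < best_sps * min_gain:
            # Doubling bought less than min_gain: keep the smaller, cheaper batch
            break
        best_n, best_sps = n, sps
        n *= 2
    return best_n
```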

Pricing snapshot (30 May 2026):

| GPU | On-Demand $/hr | Spot $/hr |
|---|---|---|
| H100 PCIe | $2.01 | N/A |
| H100 SXM5 | $3.90 | $1.73 |
| H200 SXM5 | $4.62 | $1.40 |
| B200 SXM6 | $6.73 | $2.14 |
| L40S | $0.91 | N/A |

Pricing fluctuates based on GPU availability. The prices above are as of 30 May 2026 and may have changed; check current GPU pricing for live rates.

Step-by-Step: Deploy Genesis on GPU Cloud

Step 1: Provision the Instance

Rent an H100 SXM5, H200, or L40S instance via the Spheron dashboard. For multi-GPU parallel scene training, use an SXM5 node (NVLink). For PPO training on single-robot environments, a single H100 PCIe or L40S is enough. Getting started docs are at docs.spheron.ai/quick-guides/.

After SSH-ing in, confirm CUDA 12.x:

bash
nvidia-smi
nvcc --version
# Confirm driver >= 525
nvidia-smi --query-gpu=driver_version --format=csv,noheader

For multi-GPU nodes, also check:

bash
nvidia-smi topo -m
# Confirms NVLink connectivity between GPUs (SXM5 nodes only)
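If you provision instances from scripts, the same driver check can run in Python before training starts. A minimal standard-library sketch; it assumes every GPU on the node reports the same driver version, which is normally the case.

```python
import subprocess


def parse_driver_major(smi_output: str) -> int:
    """Parse output of `nvidia-smi --query-gpu=driver_version --format=csv,noheader`."""
    first_line = smi_output.strip().splitlines()[0]  # one line per GPU
    return int(first_line.split(".")[0])  # e.g. "535.183.01" -> 535


def check_driver(min_major: int = 525) -> int:
    """Return the driver major version, raising if it is older than min_major."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    major = parse_driver_major(out)
    if major < min_major:
        raise RuntimeError(f"Driver {major} is older than required {min_major}")
    return major
```

Call `check_driver(525)` for the baseline requirement, or `check_driver(535)` if you plan to enable OptiX rendering.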

Step 2: Install Genesis

The quick path via PyPI:

bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install genesis-world

For the full feature set including MPM, fluid, and soft body physics (required for non-rigid workloads), install from source:

bash
git clone https://github.com/Genesis-Embodied-AI/Genesis.git
cd Genesis
pip install -e ".[all]"

The [all] extra pulls Taichi, Open3D, trimesh, and the full physics backend dependencies. Installation takes 5-10 minutes. Genesis requires Python 3.10+, PyTorch 2.1+, and CUDA 12.x.

Pin the version in your training scripts. Genesis is under active development and the API surface changes between minor releases. Check Genesis GitHub releases for the latest stable tag and pin with pip install genesis-world==0.x.x after confirming your target version.
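The pin can also be enforced at runtime, so a training job fails fast instead of running against a different API surface. A small sketch using `importlib.metadata` from the standard library; `genesis-world` is the PyPI distribution name used above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def assert_pinned(package: str, expected: str) -> None:
    """Fail fast if the installed version differs from the one the script targets."""
    found = installed_version(package)
    if found != expected:
        raise RuntimeError(f"{package}: expected {expected}, found {found}")
```

Call `assert_pinned("genesis-world", ...)` at the top of the training script, passing the tag you confirmed on the releases page.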

Step 3: Configure Headless Rendering

Cloud GPU instances have no physical display. Genesis's rasterized renderer requires a virtual framebuffer or EGL offscreen mode.

Option A: Xvfb virtual display (simplest)

bash
sudo apt-get install -y xvfb
Xvfb :0 -screen 0 1024x768x24 &
export DISPLAY=:0

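Start Xvfb once per session (for example from your job's launch script) and add the export DISPLAY=:0 line to ~/.bashrc. Putting the Xvfb command itself in ~/.bashrc would try to start a new server on :0 with every new shell.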
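If you would rather manage the virtual display from Python, a launcher can start Xvfb only when no display is configured. A minimal sketch using the same display and screen settings as the shell snippet above; the one-second startup wait is an assumption, not a measured requirement.

```python
import os
import shutil
import subprocess
import time
from typing import Optional


def xvfb_command(display: str = ":0", screen: str = "1024x768x24") -> list[str]:
    """Build the Xvfb command line used in the shell snippet."""
    return ["Xvfb", display, "-screen", "0", screen]


def ensure_display(display: str = ":0", startup_wait: float = 1.0) -> Optional[subprocess.Popen]:
    """Start Xvfb only if DISPLAY is unset; return the process if one was started."""
    if os.environ.get("DISPLAY"):
        return None  # a real or virtual display is already configured
    if shutil.which("Xvfb") is None:
        raise RuntimeError("Xvfb not found; install it with apt-get install -y xvfb")
    proc = subprocess.Popen(xvfb_command(display))
    os.environ["DISPLAY"] = display  # inherited by Genesis and any child processes
    time.sleep(startup_wait)  # give the server a moment to accept connections
    return proc
```

Call `ensure_display()` before initializing Genesis, and call `terminate()` on the returned process (if any) when the job finishes.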

Option B: EGL offscreen rendering (no Xvfb)

python
import genesis as gs

gs.init(
    backend=gs.cuda,
    renderer=gs.renderers.RastRenderer(offscreen=True)
)

This is the cleaner path for headless training. No virtual display process to manage.

Option C: OptiX ray-traced rendering (optional)

OptiX requires driver 535+. Check with nvidia-smi --query-gpu=driver_version --format=csv,noheader. If your instance has driver 535+, enable it:

python
gs.init(
    backend=gs.cuda,
    renderer=gs.renderers.OptixRenderer()
)

OptiX adds photorealistic rendering, useful for generating visual observations for VLA training. It is not required for pure RL policy training where visual fidelity does not matter.

Step 4: Define a Robot Environment

python
import genesis as gs
import torch

gs.init(backend=gs.cuda, renderer=gs.renderers.RastRenderer(offscreen=True))

# Create the scene
scene = gs.Scene(
    sim_options=gs.options.SimOptions(
        dt=0.02,  # 50 Hz physics
        gravity=(0, 0, -9.8),
    ),
)

# Add ground plane
scene.add_entity(gs.morphs.Plane())

# Load robot from URDF
robot = scene.add_entity(
    gs.morphs.URDF(
        file="path/to/your/robot.urdf",
        pos=(0.0, 0.0, 0.5),
    )
)

# Build with parallel environments
# Genesis tiles n_envs across available GPU VRAM automatically
scene.build(n_envs=1024)

# After build: scene.envs_offset contains per-env offsets for observations

Start with n_envs=256 and increase until OOM. A 52-DoF bimanual robot with contact forces in the observation will hit ~6-8GB on H100 at n_envs=1024. A simpler 6-DoF arm fits n_envs=4096 in the same VRAM.

Step 5: Verify Multi-GPU Distribution

For multi-GPU nodes, confirm how your Genesis version handles multiple devices before relying on automatic distribution. The pattern below assumes it spreads environment batches across all visible GPUs when you initialize with:

python
import genesis as gs
import torch

gs.init(
    backend=gs.cuda,
    renderer=gs.renderers.RastRenderer(offscreen=True)
)

print(f"Available GPUs: {torch.cuda.device_count()}")
# Assumed behavior: n_envs is split across all visible CUDA devices

scene.build(n_envs=4096)  # scene created as in Step 4
# With 4x H100 SXM5, an even split is ~1024 envs per GPU

Confirm GPU utilization during training:

bash
watch -n 1 nvidia-smi

All GPUs should show high memory utilization and >80% GPU-util during training steps.
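If your Genesis version keeps each scene on a single device, a pattern that does not rely on automatic distribution is one worker process per GPU, each pinned with `CUDA_VISIBLE_DEVICES` and building its own slice of the environments. A minimal launcher sketch: `train_worker.py` and its `--n-envs`/`--rank` flags are hypothetical placeholders for your own script, and gradient synchronization between workers (for example with torch.distributed) is out of scope here.

```python
import os
import subprocess
import sys


def split_envs(total_envs: int, n_gpus: int) -> list[int]:
    """Split total_envs across GPUs as evenly as possible."""
    base, extra = divmod(total_envs, n_gpus)
    return [base + (1 if i < extra else 0) for i in range(n_gpus)]


def worker_specs(total_envs: int, n_gpus: int,
                 script: str = "train_worker.py") -> list[tuple[list[str], dict]]:
    """Build one (command, environment) pair per GPU."""
    specs = []
    for rank, n_envs in enumerate(split_envs(total_envs, n_gpus)):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(rank))  # worker sees exactly one GPU
        cmd = [sys.executable, script, "--n-envs", str(n_envs), "--rank", str(rank)]
        specs.append((cmd, env))
    return specs


def launch(total_envs: int, n_gpus: int) -> int:
    """Start all workers, wait for them, and return the worst exit code."""
    procs = [subprocess.Popen(cmd, env=env) for cmd, env in worker_specs(total_envs, n_gpus)]
    return max(p.wait() for p in procs)
```

With `launch(4096, 4)` each of the four workers builds 1024 environments on its own GPU, and `nvidia-smi` should show one process per device.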

Robot Policy Training Pipeline: Genesis to PPO/GRPO

Genesis's batched tensor API maps directly onto vectorized policy gradient methods. Each scene.step() call returns observations and rewards for all n_envs environments as a single GPU tensor, so the policy update uses the full batch without a gather step across processes.

Gym Wrapper

python
import gymnasium as gym
import numpy as np
import torch
import genesis as gs
from stable_baselines3.common.vec_env import VecEnv

class GenesisVecEnv(VecEnv):
    def __init__(self, n_envs=1024, robot_urdf="robot.urdf"):
        gs.init(backend=gs.cuda, renderer=gs.renderers.RastRenderer(offscreen=True))

        self.scene = gs.Scene(
            sim_options=gs.options.SimOptions(dt=0.02, gravity=(0, 0, -9.8)),
        )
        self.scene.add_entity(gs.morphs.Plane())
        self.robot = self.scene.add_entity(
            gs.morphs.URDF(file=robot_urdf, pos=(0.0, 0.0, 0.5))
        )
        self.scene.build(n_envs=n_envs)

        n_dofs = self.robot.n_dofs
        observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf,
            shape=(n_dofs * 2 + 3,),
            dtype=np.float32
        )
        action_space = gym.spaces.Box(
            low=-1.0, high=1.0,
            shape=(n_dofs,),
            dtype=np.float32
        )
        # VecEnv.__init__ sets self.num_envs = n_envs so PPO sees all parallel streams
        super().__init__(n_envs, observation_space, action_space)
        self._pending_actions = None

    def reset(self):
        self.scene.reset()
        return self._get_obs().cpu().numpy()  # (n_envs, obs_dim)

    def step_async(self, actions):
        # actions: (n_envs, n_dofs) — one action per environment from PPO
        self._pending_actions = torch.tensor(actions, dtype=torch.float32, device="cuda")

    def step_wait(self):
        self.robot.set_dofs_control_force(forces=self._pending_actions)
        self.scene.step()

        rewards = self._compute_reward().cpu().numpy()  # (n_envs,)

        ee_pos = self.robot.get_link("ee_link").get_pos()
        target = torch.zeros(self.num_envs, 3, device="cuda")
        dones = (torch.norm(ee_pos - target, dim=-1) < 0.05).cpu().numpy()
        done_indices = np.where(dones)[0].tolist()

        infos = [{} for _ in range(self.num_envs)]
        if done_indices:
            terminal_obs = self._get_obs().cpu().numpy()
            for i in done_indices:
                infos[i]["terminal_observation"] = terminal_obs[i]
            self.scene.reset(envs_idx=done_indices)

        obs = self._get_obs().cpu().numpy()  # post-reset obs for done envs; satisfies SB3 VecEnv auto-reset contract
        return obs, rewards, dones, infos

    def _get_obs(self):
        pos = self.robot.get_dofs_position()
        vel = self.robot.get_dofs_velocity()
        target = torch.zeros(self.num_envs, 3, device="cuda")
        return torch.cat([pos, vel, target], dim=-1)    # (n_envs, obs_dim)

    def _compute_reward(self):
        ee_pos = self.robot.get_link("ee_link").get_pos()
        target = torch.zeros(self.num_envs, 3, device="cuda")
        dist = torch.norm(ee_pos - target, dim=-1)
        return -dist  # (n_envs,)

    def close(self):
        pass

    def env_is_wrapped(self, wrapper_class, indices=None):
        return [False] * self.num_envs

    def env_method(self, method_name, *method_args, indices=None, **method_kwargs):
        return [None] * self.num_envs

    def get_attr(self, attr_name, indices=None):
        return [None] * self.num_envs

    def set_attr(self, attr_name, value, indices=None):
        pass

    def seed(self, seed=None):
        return [None] * self.num_envs

PPO Training Loop

python
from stable_baselines3 import PPO

# GenesisVecEnv exposes all 1024 Genesis envs as SB3 VecEnv slots.
# PPO collects n_envs * n_steps = 1024 * 512 = 524,288 transitions per update.
env = GenesisVecEnv(n_envs=1024, robot_urdf="path/to/robot.urdf")

model = PPO(
    "MlpPolicy",
    env,
    n_steps=512,
    batch_size=4096,
    n_epochs=10,
    learning_rate=3e-4,
    gamma=0.99,
    verbose=1,
    tensorboard_log="./logs/"
)

model.learn(total_timesteps=10_000_000)
model.save("genesis_ppo_checkpoint")

For GRPO (Group Relative Policy Optimization, best known from reasoning-model training but also applicable to robotics), the advantage computation changes but the Genesis environment integration stays the same. GRPO can suit manipulation tasks with a binary success signal at episode end, because it only compares returns within a group of rollouts and does not need a learned value function.
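The piece that changes is the advantage estimate: instead of a learned value baseline, each rollout's return is normalized against the other rollouts in its group. A minimal plain-Python sketch of that normalization; how you form groups (here, consecutive blocks of `group_size` environments reset to the same initial state) is an assumption about your rollout layout, and this uses the population standard deviation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], group_size: int,
                              eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: normalize each return against its own group.

    Each consecutive block of group_size entries holds rollouts that started
    from the same initial state (e.g. same object pose, different seeds).
    """
    if len(rewards) % group_size != 0:
        raise ValueError("len(rewards) must be a multiple of group_size")
    advantages = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mu, sigma = mean(group), pstdev(group)
        advantages.extend((r - mu) / (sigma + eps) for r in group)
    return advantages


# Binary success at episode end: two groups of four rollouts each
adv = group_relative_advantages([1, 0, 0, 1, 0, 0, 0, 0], group_size=4)
# -> approximately [1, -1, -1, 1, 0, 0, 0, 0]
```

A group where every rollout fails (or every rollout succeeds) produces all-zero advantages and contributes no gradient, so with binary rewards the group size and task difficulty need to leave a mix of outcomes in most groups.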

Reward Shaping

Locomotion: reward forward velocity, penalize torque use and joint limit violations. Genesis returns contact forces as part of the observation, which lets you add a stability term without extra sensors.

Manipulation: shaped reward with distance-to-target, grasp success binary, and time penalty. Add an exploration bonus (curiosity) for long-horizon tasks where sparse rewards slow convergence.
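Per-environment versions of both reward recipes, sketched in plain Python so each term is easy to read. The weights are hypothetical starting values to tune, and the curiosity bonus is left out. In the batched GenesisVecEnv wrapper above, the same terms would be computed as tensors over all environments at once.

```python
import math
from typing import Sequence


def locomotion_reward(forward_vel: float, torques: Sequence[float], joint_pos: Sequence[float],
                      joint_low: Sequence[float], joint_high: Sequence[float],
                      w_vel: float = 1.0, w_torque: float = 1e-4, w_limit: float = 1.0) -> float:
    """Reward forward velocity; penalize squared torque and joint-limit violations."""
    torque_cost = sum(t * t for t in torques)
    # Distance past each limit (zero while the joint stays inside its range)
    limit_violation = sum(max(0.0, low - q) + max(0.0, q - high)
                          for q, low, high in zip(joint_pos, joint_low, joint_high))
    return w_vel * forward_vel - w_torque * torque_cost - w_limit * limit_violation


def manipulation_reward(ee_pos: Sequence[float], target: Sequence[float], grasped: bool,
                        w_dist: float = 1.0, success_bonus: float = 10.0,
                        time_penalty: float = 0.01) -> float:
    """Shaped reward: distance-to-target term, binary grasp bonus, per-step time penalty."""
    dist = math.dist(ee_pos, target)
    return -w_dist * dist + (success_bonus if grasped else 0.0) - time_penalty
```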

Domain randomization for sim-to-real: randomize mass (±20%), friction (0.5-1.5x), damping (0.8-1.2x), and action noise. Genesis's programmatic scene API makes this easier to script than USD-based Isaac Sim scenes.
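Sampling those ranges per environment takes a few lines of standard-library Python. In this sketch, applying each sample to the simulator (scaling masses, friction, and damping on the robot) goes through Genesis's entity setters and is not shown. The action-noise scale is a hypothetical default.

```python
import random
from dataclasses import dataclass


@dataclass
class PhysicsSample:
    mass_scale: float      # multiplies the nominal (not the current) link mass
    friction_scale: float  # multiplies the nominal friction
    damping_scale: float   # multiplies the nominal joint damping


def sample_physics(rng: random.Random) -> PhysicsSample:
    """Draw one environment's randomization using the ranges above."""
    return PhysicsSample(
        mass_scale=rng.uniform(0.8, 1.2),      # mass ±20%
        friction_scale=rng.uniform(0.5, 1.5),  # friction 0.5-1.5x
        damping_scale=rng.uniform(0.8, 1.2),   # damping 0.8-1.2x
    )


def noisy_action(action: list[float], rng: random.Random, sigma: float = 0.05) -> list[float]:
    """Add zero-mean Gaussian action noise (sigma is a hypothetical value)."""
    return [a + rng.gauss(0.0, sigma) for a in action]


rng = random.Random(0)
samples = [sample_physics(rng) for _ in range(1024)]  # one draw per environment
```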

Sim-to-Real Transfer with Genesis

Genesis-trained policies transfer via the same mechanisms as Isaac Lab: domain randomization, action noise injection, and observation noise. The key advantage is that Genesis's programmatic scene API makes randomization easier to script.

python
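import random

# Assumes `robot` and `n_envs` from the environment setup above.
# Read the nominal mass once: scaling the current mass on every reset would
# compound earlier randomizations and drift the mass over training.
nominal_mass = robot.get_mass()

# Randomize physics per environment reset
for env_i in range(n_envs):
    mass_scale = random.uniform(0.8, 1.2)      # mass ±20%
    friction_scale = random.uniform(0.5, 1.5)  # friction 0.5-1.5x
    robot.set_mass(nominal_mass * mass_scale, envs_idx=[env_i])
    robot.set_friction(friction_scale, envs_idx=[env_i])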

For the visual domain gap, an optional Cosmos step closes it: export Genesis sim trajectories, run through Cosmos-Transfer to add photorealistic domain variation, then use the augmented dataset for VLA fine-tuning. The full pipeline is documented in Deploy NVIDIA Cosmos World Foundation Models on GPU Cloud.

Calibration loop for sim-to-real transfer:

  1. Train policy in Genesis with domain randomization
  2. Deploy to real robot, collect failure episodes
  3. Identify failure modes (e.g., grasp fails on polished surfaces)
  4. Add corresponding randomization in Genesis (increase friction variance)
  5. Retrain and repeat

Three to five calibration rounds are typical for a manipulation task. Genesis's speed makes each retrain a matter of GPU-hours rather than a multi-day job.

Integrating Genesis with Isaac GR00T N1, OpenVLA, and Cosmos

Genesis + Isaac GR00T N1

Genesis runs fast RL policy search to find a good base policy. That policy or its trajectory data feeds into Isaac Lab for GR00T N1 fine-tuning. Genesis is not a replacement for Isaac Lab in the GR00T N1 stack. It is a pre-training accelerator.

The handoff: Genesis rollouts, converted to LeRobot v2 parquet, supply 10K-50K demonstration trajectories, and the GR00T N1 fine-tuning script consumes that format directly. The typical workflow:

  1. Search reward landscape in Genesis (fast, broad search)
  2. Find a high-reward policy in Genesis
  3. Run rollouts, save trajectory tensors, and convert to LeRobot v2 parquet using LeRobot's conversion scripts
  4. Fine-tune GR00T N1 LoRA adapter on the Genesis-generated dataset
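Step 3 is mostly reshaping: LeRobot datasets store one row per frame rather than one record per episode. A sketch of that flattening in plain Python, assuming episodes are dicts with `obs` and `actions` lists (one entry per step). The column names mirror the LeRobot v2 per-frame layout as commonly documented (`observation.state`, `action`, `timestamp`, `frame_index`, `episode_index`, `index`, `task_index`); verify them against the LeRobot version you target. Writing the parquet files and the dataset metadata is left to LeRobot's conversion scripts.

```python
from typing import Sequence


def episodes_to_rows(episodes: Sequence[dict], fps: float, task_index: int = 0) -> list[dict]:
    """Flatten per-episode trajectories into one row per frame."""
    rows, global_index = [], 0
    for episode_index, ep in enumerate(episodes):
        if len(ep["obs"]) != len(ep["actions"]):
            raise ValueError(f"episode {episode_index}: obs/actions length mismatch")
        for frame_index, (obs, act) in enumerate(zip(ep["obs"], ep["actions"])):
            rows.append({
                "observation.state": list(obs),
                "action": list(act),
                "timestamp": frame_index / fps,  # seconds since episode start
                "frame_index": frame_index,
                "episode_index": episode_index,
                "index": global_index,  # running index across the whole dataset
                "task_index": task_index,
            })
            global_index += 1
    return rows
```

Rollouts collected with `n_envs > 1` need to be split into one episode per environment before flattening. From there, `pandas.DataFrame(rows).to_parquet(path)` (with pyarrow installed) is one way to inspect the result before handing it to the official tooling.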

GR00T N1 weights are under NVIDIA's non-commercial research license. Genesis itself is Apache 2.0. Those licenses are independent. See Deploy NVIDIA Isaac GR00T N1 on GPU Cloud for the GR00T fine-tuning setup.

Genesis + OpenVLA

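Genesis rollouts can be exported as RLDS (the format OpenVLA's fine-tuning scripts read) or LeRobot v2 trajectory data, and OpenVLA's action tokenizer then discretizes the recorded actions. Genesis's batched throughput means you can generate 10,000 demonstration trajectories in minutes instead of hours (the 43M FPS peak is a ceiling; policy inference during rollouts lowers it), which is enough to bootstrap a usable OpenVLA adapter before collecting real data.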

The export flow:

python
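import pickle

import numpy as np
from stable_baselines3 import PPO

# Reuses the GenesisVecEnv wrapper and PPO checkpoint from the training section.
env = GenesisVecEnv(n_envs=1024, robot_urdf="path/to/robot.urdf")
model = PPO.load("genesis_ppo_checkpoint", env=env)

num_episodes, episode_length = 10, 200  # set these for your task

trajectories = []
for episode in range(num_episodes):
    obs_list, action_list = [], []
    obs = env.reset()  # (n_envs, obs_dim)
    for step in range(episode_length):
        action, _ = model.predict(obs, deterministic=True)
        obs_list.append(obs)  # record the observation the action was chosen from
        action_list.append(action)
        # Envs that reach the target auto-reset inside the wrapper, so one stream
        # can contain several attempts; split on `dones` if you need clean episodes.
        obs, rewards, dones, infos = env.step(action)
    # Arrays shaped (episode_length, n_envs, dim)
    trajectories.append({"obs": np.stack(obs_list), "actions": np.stack(action_list)})

with open("genesis_trajectories.pkl", "wb") as f:
    pickle.dump(trajectories, f)

# Next: convert to the format your OpenVLA fine-tuning setup expects
# (RLDS, or LeRobot v2 parquet via LeRobot's conversion scripts)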

See Deploy OpenVLA on GPU Cloud for the full OpenVLA fine-tuning pipeline once you have the trajectory dataset.

Genesis + Cosmos

Genesis provides physics-accurate trajectories. Cosmos-Transfer adds photorealistic visual variation. The combined dataset covers both dynamics accuracy (Genesis) and visual distribution (Cosmos), which is the strongest combination for policies that need to generalize across visual domains.

The workflow:

  1. Run Genesis policy rollouts and save video frames + trajectory data
  2. Pass video frames through Cosmos-Transfer to add realistic visual variation
  3. The combined dataset (Genesis physics, Cosmos appearance) trains a VLA that transfers better to real hardware than either source alone

See Deploy NVIDIA Cosmos World Foundation Models on GPU Cloud for the Cosmos pipeline setup.

Cost Comparison: Genesis on Spheron vs Isaac Lab on Managed Robotics Cloud

The table below compares the daily cost of advancing every environment by 1M steps across three configurations. "Env steps" counts steps of each individual environment, not the total across all parallel envs: all environments advance together, so GPU-hours depend on the per-env step rate. Genesis's throughput figures are illustrative estimates based on published benchmarks; actual numbers depend on robot complexity and scene configuration.

| Setup | GPU | Rate | Steps/env/sec | GPU-hrs/day | Cost/day |
|---|---|---|---|---|---|
| Genesis, H100 SXM5 (spot) | H100 SXM5 | $1.73/hr | ~1,172 | ~0.24 | ~$0.42 |
| Genesis, H200 SXM5 (spot) | H200 SXM5 | $1.40/hr | ~1,400 (est.) | ~0.20 | ~$0.28 |
| Isaac Lab, AWS RoboMaker | p4d.24xlarge | ~$10-14/hr | ~98 | ~2.84 | ~$28-40 |

The Genesis throughput figure (~1,172 steps/env/sec for H100 SXM5) is a conservative per-env throughput estimate for a contact-rich manipulation scene; actual numbers depend on robot complexity. Isaac Lab throughput on AWS RoboMaker uses representative figures for a comparable manipulation scene. Steps per second vary significantly with scene complexity.
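The table's arithmetic as a small sketch, so you can plug in your own measured per-env rate and current prices. The only assumption is the one stated above: all environments advance in parallel, so wall-clock time is set by the per-env step rate.

```python
def gpu_hours_per_day(steps_per_env_per_day: float, steps_per_env_per_sec: float) -> float:
    """Wall-clock GPU-hours needed to advance every environment by the target step count."""
    return steps_per_env_per_day / steps_per_env_per_sec / 3600


def cost_per_day(steps_per_env_per_day: float, steps_per_env_per_sec: float,
                 rate_per_hr: float) -> float:
    """Daily cost at a given hourly instance rate."""
    return gpu_hours_per_day(steps_per_env_per_day, steps_per_env_per_sec) * rate_per_hr


# H100 SXM5 spot row from the table above
print(round(gpu_hours_per_day(1e6, 1172), 3), round(cost_per_day(1e6, 1172, 1.73), 2))  # -> 0.237 0.41
```

The unrounded H100 cost comes out at about $0.41; the table's ~$0.42 comes from multiplying the rounded 0.24 GPU-hours.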

Pricing fluctuates based on GPU availability. The prices above are as of 30 May 2026 and may have changed; check current GPU pricing for live rates.

The Genesis-on-Spheron advantage compounds at scale. Hyperscaler RoboMaker pricing includes per-simulation-unit charges on top of EC2 instance costs. Spheron charges only for the GPU instance with no platform overhead.

Troubleshooting: CUDA Driver Matrix, OptiX, and Multi-Node Scaling

CUDA Driver Matrix

Genesis requires CUDA 12.x (driver 525+). As the table shows, Genesis 0.3.x and later additionally need CUDA 12.3+ for the Taichi kernel compilation path.

| Genesis Version | Minimum Driver | CUDA Version |
|---|---|---|
| 0.2.x | 525 | 12.0 |
| 0.3.x+ | 530 | 12.3 |

Check your driver version:

bash
nvidia-smi --query-gpu=driver_version --format=csv,noheader

If you see the error "No CUDA runtime is found" on import:

bash
# Reinstall PyTorch with the correct CUDA index for your driver
pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# or cu124 for CUDA 12.4 instances

Most Spheron instances ship with driver 535+ and CUDA 12.4. Confirm with nvcc --version before debugging Python imports.

OptiX Rendering Issues

OptiX requires driver 535+. On headless instances, OptiX also needs an X display session.

bash
# Verify driver version supports OptiX
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# If < 535, fall back to RastRenderer

# If driver >= 535 but OptiX still fails, start Xvfb first
Xvfb :1 -screen 0 1920x1080x24 &
export DISPLAY=:1
python your_genesis_script.py

OptiX is not required for policy training. Use gs.renderers.RastRenderer(offscreen=True) as the default on all cloud instances and switch to OptiX only when visual quality matters (e.g., generating observations for VLA training).

Multi-Node Genesis Scaling

Genesis's multi-GPU support runs on NCCL within a single node. Multi-node Genesis requires a Ray or torchrun wrapper that handles cross-node environment distribution.

Common NCCL init timeout when InfiniBand is unavailable:

bash
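# For RoCE or pure Ethernet setups: skip the InfiniBand transport and pin NCCL
# to the right network interface (eth0 is a placeholder; check `ip addr`)
export NCCL_IB_DISABLE=1
export NCCL_SOCKET_IFNAME=eth0
# Avoid NCCL_P2P_DISABLE=1 here: it turns off NVLink/PCIe peer-to-peer inside the node

# Then launch your distributed Genesis script on each node
# (use --node_rank=1 on the second node; MASTER_IP is a placeholder)
torchrun --nnodes=2 --node_rank=0 --nproc_per_node=8 \
  --master_addr=MASTER_IP --master_port=29500 your_training_script.py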

For networking context on multi-node setups without InfiniBand, see Multi-Node GPU Training Without InfiniBand.

Out-of-Memory in Large Environment Batches

If you hit OOM during scene.build(n_envs=N):

  1. Reduce n_envs by half
  2. For contact-rich tasks, reduce the contact buffer size in gs.options.SimOptions
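  3. Enable expandable memory segments to reduce fragmentation in PyTorch's CUDA allocator (this applies to memory PyTorch allocates, such as the policy network and rollout tensors):
bash
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
  4. If training on multi-GPU, budget n_envs per GPU and derive the total from the GPU count:
python
import torch

n_gpus = torch.cuda.device_count()
n_envs_per_gpu = 1024  # start here, scale up
total_envs = n_envs_per_gpu * n_gpus
scene.build(n_envs=total_envs)  # scene created as in Step 4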

Genesis's simulation throughput turns a multi-day policy search into a few GPU-hours. Spheron's on-demand and spot H100/H200 instances give robotics teams bare-metal compute without managed-platform overhead.


Quick Setup Guide

  1. Provision a GPU instance on Spheron

    Rent an H100 SXM5, H200, or L40S instance at app.spheron.ai. SSH in and verify CUDA 12.x with nvidia-smi. For multi-GPU parallel scene training, use a node with NVLink (SXM5 form factor). L40S single-GPU is sufficient for PPO training on single-robot environments. For the 100M FPS benchmark workloads, use an 8xH100 SXM5 node.

  2. Install Genesis physics engine

    Install from PyPI with: pip install genesis-world. For the full feature set including MPM, fluid, and soft body physics, install from source: git clone https://github.com/Genesis-Embodied-AI/Genesis.git && cd Genesis && pip install -e '.[all]'. Genesis requires Python 3.10+, PyTorch 2.1+, and CUDA 12.x.

  3. Configure headless rendering

    On a headless GPU instance, start a virtual display: Xvfb :0 -screen 0 1024x768x24 & and export DISPLAY=:0. For EGL-based offscreen rendering (no Xvfb needed): set gs.init(backend=gs.cuda, renderer=gs.renderers.RastRenderer(offscreen=True)). NVIDIA OptiX ray-traced rendering requires driver 535+ and is enabled with gs.renderers.OptixRenderer.

  4. Define a robot simulation environment

    Create a Genesis scene with gs.Scene(), add a robot URDF (gs.morphs.URDF(file='robot.urdf')), set physics properties, and instantiate parallel environments with scene.build(n_envs=1024). Genesis batches the environments in GPU memory; check how your Genesis version handles multiple GPUs before relying on automatic distribution across devices.

  5. Train a robot policy with PPO

    Wrap the Genesis environment in a gym-compatible interface, such as a stable-baselines3 VecEnv subclass like the GenesisVecEnv wrapper shown earlier. Connect to stable-baselines3 PPO or a custom GRPO trainer. Genesis returns batched observations and rewards across all parallel environments in a single GPU tensor, making vectorized policy gradient updates efficient. Log episode returns and success rates with wandb or tensorboard.

  6. Export trajectories for VLA fine-tuning

    After training, run rollouts in Genesis, save trajectory tensors (observations, actions, rewards), and convert to LeRobot v2 parquet format using LeRobot's conversion scripts. This dataset feeds directly into NVIDIA Isaac GR00T N1 LoRA fine-tuning or OpenVLA's fine-tuning pipeline.


Frequently Asked Questions

Genesis runs on any CUDA-capable GPU with CUDA 12.x and driver 525+. For single-robot PPO training, an L40S 48GB or A100 40GB is sufficient. For multi-GPU parallel scene training (1K+ environments), use an H100 SXM5 or H200. The 100M FPS workloads in this guide's title call for an 8x H100 SXM5 node or a B200; Genesis's own published peak is 43M FPS on a single RTX 4090. L40S handles most robotics sim sweeps well at a lower price.

Genesis reports 10-80x faster simulation than Isaac Sim and MuJoCo on equivalent workloads in its benchmark paper. The speedup varies by workload: rigid body scenes show the highest gains (Genesis reports 43M FPS on an RTX 4090 for a single-plane Franka arm manipulation scene), while complex fluid/MPM scenes show more modest speedups. Isaac Sim runs on USD-based scene graphs with full photorealistic rendering, making it slower per sim step but richer in visual fidelity for Cosmos-style synthetic data pipelines.

Yes. Genesis supports headless rendering via Xvfb or EGL/offscreen mode. On a cloud instance with no physical display, set DISPLAY=:0 and start Xvfb before launching Genesis, or use the gs.renderers.RastRenderer with offscreen mode. NVIDIA OptiX ray-traced rendering requires driver 535+ and is optional - basic rasterized rendering works on any CUDA 12.x driver.

Genesis generates robot trajectory data (joint positions, forces, contact states) that can be exported to LeRobot v2 parquet or RLDS format. This data feeds directly into the GR00T N1 LoRA fine-tuning pipeline or OpenVLA's action tokenizer. The workflow is: Genesis sim -> trajectory export -> Cosmos photorealistic augmentation (optional) -> VLA fine-tuning on H100/B200. Genesis complements Isaac Lab (used for GR00T N1) by providing faster parallel simulation for RL policy search before transferring the best policy to Isaac Lab's full pipeline.

Running Genesis on a Spheron H100 at bare-metal on-demand pricing typically costs 40-60% less than the equivalent Isaac Lab setup on AWS RoboMaker or NVIDIA's managed robotics cloud. RoboMaker charges per simulation unit hour on top of EC2 costs; Spheron charges only for the GPU instance with no platform overhead. For 1M sim steps per day, the Genesis-on-Spheron path using spot H100 SXM5 instances typically runs under $1/day versus $28-40/day for managed robotics cloud equivalents.

Genesis supports multi-GPU on a single node through NCCL-backed data parallelism, running independent environment batches on each GPU and gathering gradients for policy updates. True multi-node Genesis requires wrapping the simulator in a Ray or NCCL-based distributed training harness. For multi-node setups, see the Spheron multi-node GPU training guide. Most robotics sim-to-real workflows fit within a single 8xH100 or 8xB200 node.
