Genesis is a physics simulator that compiles to CUDA (and other backends) via Taichi and exposes the same Python-native API for rigid bodies, soft bodies, fluids, and MPM. For rigid-body robotics tasks it produces tens of millions of simulation frames per second on a single high-end GPU: Genesis's published 43M FPS benchmark runs a single-plane Franka arm manipulation scene on an RTX 4090. For robotics teams doing RL policy search on GPU cloud, it turns multi-day compute jobs into GPU-hours. If your pipeline involves GR00T N1 fine-tuning after policy search, Deploy NVIDIA Isaac GR00T N1 on GPU Cloud covers the Isaac Lab end of that stack. For synthetic data augmentation of Genesis trajectories, Deploy NVIDIA Cosmos World Foundation Models on GPU Cloud documents the Cosmos pipeline.
What Is Genesis: Unified Physics Engine for Embodied AI
Genesis is an open-source universal physics simulator developed by the Genesis-Embodied-AI group (CMU, Stanford, and collaborators). It is written natively in Python with Taichi and CUDA backends, which is why it achieves throughput that USD-based pipelines cannot match.
The technical difference from Isaac Sim is foundational. Isaac Sim runs on Omniverse's USD scene graph with a full PhysX + OptiX rendering pipeline. Each simulated environment is a USD stage with asset loading, shader compilation, and rendering overhead per instance. Genesis bypasses all of that. Environments are tensor slices, not separate scene graphs. Adding 1024 parallel environments means allocating a 1024-wide batch dimension in the state tensors, not spawning 1024 separate simulators.
Genesis supports multiple physics solvers in the same scene:
- Rigid body dynamics - the fastest path, up to 43M FPS on an RTX 4090 for the published Franka manipulation benchmark; scene complexity and contact richness reduce this significantly
- Soft body (FEM/PBD) - deformable objects, cloth, elastic manipulation targets
- Fluid (SPH) - particle-based fluid simulation for food handling, liquid manipulation
- MPM (Material Point Method) - granular materials, plasticine, snow, soft contacts
- PBD (Position Based Dynamics) - fast approximate soft body for large scene batches
The generative agent layer is an underrated feature for robotics. Instead of authoring each scene in USD by hand, Genesis lets you describe scene layouts programmatically and generate thousands of distinct environments for domain randomization without manual asset work.
Genesis is Apache 2.0 licensed. The simulation code, training wrappers, and export utilities are all open-source under permissive terms.
Genesis vs Isaac Sim vs MuJoCo: Where the Speed Comes From
The headline 43M FPS figure (RTX 4090, single-plane Franka arm with self-collisions enabled and mostly-idle actions) is real but requires context. It reflects a batch of parallel environments running the same rigid-body scene, and the Genesis team reports that the same scene drops to ~27M FPS with random actions enabled. A single isolated environment running serially achieves a few thousand FPS, similar to MuJoCo. The throughput advantage comes entirely from parallelism.
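To make the relationship between per-environment and aggregate throughput concrete, here is a minimal sketch. The 43M and 27M figures are the published numbers quoted above; the batch size and per-environment rate in the second example are hypothetical, since the article does not state the batch size used in the benchmark.

```python
def aggregate_fps(n_envs: int, per_env_steps_per_sec: float) -> float:
    """Total frames per second across a batch of identical parallel environments."""
    return n_envs * per_env_steps_per_sec


def relative_drop(baseline_fps: float, measured_fps: float) -> float:
    """Fraction of throughput lost relative to the baseline."""
    return (baseline_fps - measured_fps) / baseline_fps


# Published figures quoted above: ~43M FPS idle, ~27M FPS with random actions.
drop = relative_drop(43e6, 27e6)
print(f"Random actions cost about {drop:.0%} of peak throughput")

# Hypothetical batch (illustrative numbers only): 4096 envs at 1,000 steps/s each.
print(f"{aggregate_fps(4096, 1_000):,.0f} aggregate FPS")
```

The point of the second call is that a headline FPS number says little about any single environment: the same aggregate can come from many slow environments or a few fast ones.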
MuJoCo's architecture is CPU-first with a Python wrapper. It runs fast on a single environment but does not scale to 1024 environments without spawning 1024 processes. That process-per-environment model does not map onto a single GPU's VRAM budget. MuJoCo-MJX (JAX port) and Brax address this with JAX-native batching, but Genesis handles the full physics feature set in one engine.
| Simulator | FPS (peak published, rigid) | API | Multi-GPU | License |
|---|---|---|---|---|
| Genesis | ~43M (RTX 4090, Franka manip, rigid, batched)¹ | Python/CUDA | Yes (batched tensor) | Apache 2.0 |
| Isaac Sim | ~50K-500K (H100) | Python/USD | Yes | NVIDIA (NC) |
| MuJoCo | ~1-5M (single env) | C/Python | Limited | Apache 2.0 |
| Brax | ~1B (TPU pod, rigid only) | JAX | Yes | Apache 2.0 |
¹ Genesis 43M FPS is measured on an RTX 4090 with a single-plane Franka arm manipulation scene, self-collisions enabled, arm largely idle. The Genesis team reports the same scene drops to ~27M FPS with random actions enabled. Isaac Sim figures are H100-based.
Brax vs Genesis: Brax (Google) is JAX-native and achieves higher theoretical throughput for pure rigid-body locomotion because JAX compiles the full sim-to-policy-gradient path into a single XLA computation. If your workload is purely rigid locomotion on JAX, Brax is faster. Genesis wins on physics fidelity: it supports MPM, fluid, and soft body physics that Brax does not, and its Python API is closer to the Isaac Lab and ROS 2 ecosystem most robotics teams already use. Do not confuse Genesis with Brax when searching for benchmark comparisons.
When Isaac Sim still wins: photorealistic rendering for Cosmos synthetic data pipelines, full Omniverse asset library, GR00T N1 official support, and ground-truth segmentation masks for perception-heavy policies. Genesis does not render Cosmos-quality frames. It is the right tool for RL policy search, not visual appearance generation.
GPU Sizing for Genesis Workloads
Environment batch size drives VRAM requirements more than any other factor. A single locomotion environment (52-DoF bimanual robot, 1K timestep rollout) fits in under 100MB of VRAM. Scaling to 4096 environments takes ~6-8GB. MPM and fluid scenes are denser because each particle is a separate simulation entity, not a rigid body.
| Workload | GPU | Reasoning |
|---|---|---|
| Single-robot PPO, up to 512 envs | L40S 48GB | Fits full env batch, cost-effective for long sweeps |
| Multi-robot parallel RL, 1K-4K envs | H100 PCIe 80GB | 80GB handles large env batches, sufficient bandwidth |
| Large-scale policy search, 8K+ envs | H100 SXM5 (NVLink) | NVLink reduces gradient sync overhead across multi-GPU |
| MPM/fluid/soft body scenes | H200 141GB | HBM3e bandwidth critical for SPH/MPM tensor operations |
| 100M FPS benchmark workloads | 8x H100 SXM5 or B200 | Full NVLink bandwidth, maximum parallel throughput |
A note on n_envs: the right count is not fixed. For a simple 6-DoF arm (12 state dims), n_envs=4096 on an H100 PCIe fits easily. For a full 52-DoF bimanual robot (104 state dims plus contact buffers), n_envs=1024 is a safer starting point. Start with n_envs=256 and double until you hit OOM or throughput stops scaling, rather than targeting a specific count.
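The doubling procedure described above can be sketched as a small search loop. This is an illustration under stated assumptions, not a Genesis utility: `measure` is a hypothetical caller-supplied function that builds a scene with `n` environments, runs a short rollout, and returns total steps per second, and it is expected to translate whatever out-of-memory error your stack raises into `MemoryError`. The `min_gain` threshold is also an arbitrary choice.

```python
from typing import Callable


def find_n_envs(measure: Callable[[int], float], start: int = 256,
                max_envs: int = 65536, min_gain: float = 1.2) -> int:
    """Double n_envs until OOM or until throughput gains fall below min_gain."""
    best_n, best_fps = start, measure(start)
    n = start * 2
    while n <= max_envs:
        try:
            fps = measure(n)
        except MemoryError:
            break  # the previous batch size was the largest that fit
        if fps < best_fps * min_gain:
            break  # doubling no longer buys meaningful throughput
        best_n, best_fps = n, fps
        n *= 2
    return best_n


def fake_measure(n: int) -> float:
    """Stand-in for a real benchmark: saturates at 2048 envs, OOM above 8192."""
    if n > 8192:
        raise MemoryError
    return min(n, 2048) * 1000.0


print(find_n_envs(fake_measure))
```

With a real `measure`, each probe pays the scene build cost, so keep the rollout short and run the search once per robot and GPU type rather than on every training launch.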
Live pricing (30 May 2026):
| GPU | On-Demand $/hr | Spot $/hr |
|---|---|---|
| H100 PCIe | $2.01 | N/A |
| H100 SXM5 | $3.90 | $1.73 |
| H200 SXM5 | $4.62 | $1.40 |
| B200 SXM6 | $6.73 | $2.14 |
| L40S | $0.91 | N/A |
Pricing fluctuates based on GPU availability. The prices above are based on 30 May 2026 and may have changed. Check current GPU pricing for live rates.
Step-by-Step: Deploy Genesis on GPU Cloud
Step 1: Provision the Instance
Rent an H100 SXM5, H200, or L40S instance via the Spheron dashboard. For multi-GPU parallel scene training, use an SXM5 node (NVLink). For PPO training on single-robot environments, a single H100 PCIe or L40S is enough. Getting started docs are at docs.spheron.ai/quick-guides/.
After SSH-ing in, confirm CUDA 12.x:
nvidia-smi
nvcc --version
# Confirm driver >= 525
nvidia-smi --query-gpu=driver_version --format=csv,noheader
For multi-GPU nodes, also check:
nvidia-smi topo -m
# Confirms NVLink connectivity between GPUs (SXM5 nodes only)
Step 2: Install Genesis
The quick path via PyPI:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install genesis-world
For the full feature set including MPM, fluid, and soft body physics (required for non-rigid workloads), install from source:
git clone https://github.com/Genesis-Embodied-AI/Genesis.git
cd Genesis
pip install -e ".[all]"
The [all] extra pulls Taichi, Open3D, trimesh, and the full physics backend dependencies. Installation takes 5-10 minutes. Genesis requires Python 3.10+, PyTorch 2.1+, and CUDA 12.x.
Pin the version in your training scripts. Genesis is under active development and the API surface changes between minor releases. Check Genesis GitHub releases for the latest stable tag and pin with pip install genesis-world==0.x.x after confirming your target version.
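One way to enforce the pin at runtime is to compare the installed distribution version against the version you validated before any training starts. The sketch below uses only the standard library; the `PINNED` value is a placeholder that mirrors the `0.x.x` above and must be replaced with your own tag.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pin(package: str, pinned: str) -> bool:
    """True only when the installed version matches the pinned string exactly."""
    return installed_version(package) == pinned


PINNED = "0.x.x"  # placeholder: substitute the release tag you validated
found = installed_version("genesis-world")
if found != PINNED:
    print(f"genesis-world version is {found}, expected {PINNED}")
```

Calling `check_pin` at the top of a training script turns a silent API drift into an immediate, readable failure.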
Step 3: Configure Headless Rendering
Cloud GPU instances have no physical display. Genesis's rasterized renderer requires a virtual framebuffer or EGL offscreen mode.
Option A: Xvfb virtual display (simplest)
sudo apt-get install -y xvfb
Xvfb :0 -screen 0 1024x768x24 &
export DISPLAY=:0
Add both lines to your training script preamble or ~/.bashrc.
Option B: EGL offscreen rendering (no Xvfb)
import genesis as gs
gs.init(
    backend=gs.cuda,
    renderer=gs.renderers.RastRenderer(offscreen=True)
)
This is the cleaner path for headless training. No virtual display process to manage.
Option C: OptiX ray-traced rendering (optional)
OptiX requires driver 535+. Check with nvidia-smi --query-gpu=driver_version --format=csv,noheader. If your instance has driver 535+, enable it:
gs.init(
    backend=gs.cuda,
    renderer=gs.renderers.OptixRenderer()
)
OptiX adds photorealistic rendering, useful for generating visual observations for VLA training. It is not required for pure RL policy training where visual fidelity does not matter.
Step 4: Define a Robot Environment
import genesis as gs
import torch
gs.init(backend=gs.cuda, renderer=gs.renderers.RastRenderer(offscreen=True))
# Create the scene
scene = gs.Scene(
    sim_options=gs.options.SimOptions(
        dt=0.02,  # 50 Hz physics
        gravity=(0, 0, -9.8),
    ),
)
# Add ground plane
scene.add_entity(gs.morphs.Plane())
# Load robot from URDF
robot = scene.add_entity(
    gs.morphs.URDF(
        file="path/to/your/robot.urdf",
        pos=(0.0, 0.0, 0.5),
    )
)
# Build with parallel environments
# Genesis tiles n_envs across available GPU VRAM automatically
scene.build(n_envs=1024)
# After build: scene.envs_offset contains per-env offsets for observations
Start with n_envs=256 and increase until OOM. A 52-DoF bimanual robot with contact forces in the observation will hit ~6-8GB on H100 at n_envs=1024. A simpler 6-DoF arm fits n_envs=4096 in the same VRAM.
Step 5: Verify Multi-GPU Distribution
For multi-GPU nodes, Genesis distributes environment batches across available GPUs automatically when you initialize with:
import torch
gs.init(
    backend=gs.cuda,
    renderer=gs.renderers.RastRenderer(offscreen=True)
)
print(f"Available GPUs: {torch.cuda.device_count()}")
# Genesis will distribute n_envs across all visible CUDA devices
scene.build(n_envs=4096)
# With 4x H100 SXM5, Genesis assigns ~1024 envs per GPU
Confirm GPU utilization during training:
watch -n 1 nvidia-smi
All GPUs should show high memory utilization and >80% GPU-util during training steps.
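If you would rather check the 80% utilization threshold from a script than watch the terminal, the CSV output of `nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits` is easy to parse. The sketch below works on captured text so it can be tested without a GPU; the sample values are made up for illustration.

```python
from typing import Dict, List


def parse_gpu_stats(csv_text: str) -> List[Dict[str, float]]:
    """Parse 'index, util%, mem_used_MiB, mem_total_MiB' rows into dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, util, mem_used, mem_total = [field.strip() for field in line.split(",")]
        rows.append({
            "index": int(index),
            "util": int(util),
            "mem_frac": int(mem_used) / int(mem_total),
        })
    return rows


def underused_gpus(rows: List[Dict[str, float]], min_util: int = 80) -> List[int]:
    """Return indices of GPUs whose utilization is below the threshold."""
    return [int(row["index"]) for row in rows if row["util"] < min_util]


# Made-up sample: GPU 1 is nearly idle, which suggests it received no env batch.
sample = "0, 95, 40000, 81559\n1, 12, 900, 81559\n"
print(underused_gpus(parse_gpu_stats(sample)))
```

Sampling this a few times during a rollout phase is more informative than a single reading, because utilization dips during policy updates and checkpointing.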
Robot Policy Training Pipeline: Genesis to PPO/GRPO
Genesis's batched tensor API maps directly onto vectorized policy gradient methods. Each scene.step() call returns observations and rewards for all n_envs environments as a single GPU tensor, so the policy update uses the full batch without a gather step across processes.
Gym Wrapper
import gymnasium as gym
import numpy as np
import torch
import genesis as gs
from stable_baselines3.common.vec_env import VecEnv
class GenesisVecEnv(VecEnv):
    def __init__(self, n_envs=1024, robot_urdf="robot.urdf"):
        gs.init(backend=gs.cuda, renderer=gs.renderers.RastRenderer(offscreen=True))
        self.scene = gs.Scene(
            sim_options=gs.options.SimOptions(dt=0.02, gravity=(0, 0, -9.8)),
        )
        self.scene.add_entity(gs.morphs.Plane())
        self.robot = self.scene.add_entity(
            gs.morphs.URDF(file=robot_urdf, pos=(0.0, 0.0, 0.5))
        )
        self.scene.build(n_envs=n_envs)
        n_dofs = self.robot.n_dofs
        observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf,
            shape=(n_dofs * 2 + 3,),
            dtype=np.float32
        )
        action_space = gym.spaces.Box(
            low=-1.0, high=1.0,
            shape=(n_dofs,),
            dtype=np.float32
        )
        # VecEnv.__init__ sets self.num_envs = n_envs so PPO sees all parallel streams
        super().__init__(n_envs, observation_space, action_space)
        self._pending_actions = None
    def reset(self):
        self.scene.reset()
        return self._get_obs().cpu().numpy()  # (n_envs, obs_dim)
    def step_async(self, actions):
        # actions: (n_envs, n_dofs), one action per environment from PPO
        self._pending_actions = torch.tensor(actions, dtype=torch.float32, device="cuda")
    def step_wait(self):
        self.robot.set_dofs_control_force(forces=self._pending_actions)
        self.scene.step()
        rewards = self._compute_reward().cpu().numpy()  # (n_envs,)
        ee_pos = self.robot.get_link("ee_link").get_pos()
        target = torch.zeros(self.num_envs, 3, device="cuda")
        dones = (torch.norm(ee_pos - target, dim=-1) < 0.05).cpu().numpy()
        done_indices = np.where(dones)[0].tolist()
        infos = [{} for _ in range(self.num_envs)]
        if done_indices:
            terminal_obs = self._get_obs().cpu().numpy()
            for i in done_indices:
                infos[i]["terminal_observation"] = terminal_obs[i]
            self.scene.reset(envs_idx=done_indices)
        obs = self._get_obs().cpu().numpy()  # post-reset obs for done envs; satisfies SB3 VecEnv auto-reset contract
        return obs, rewards, dones, infos
    def _get_obs(self):
        pos = self.robot.get_dofs_position()
        vel = self.robot.get_dofs_velocity()
        target = torch.zeros(self.num_envs, 3, device="cuda")
        return torch.cat([pos, vel, target], dim=-1)  # (n_envs, obs_dim)
    def _compute_reward(self):
        ee_pos = self.robot.get_link("ee_link").get_pos()
        target = torch.zeros(self.num_envs, 3, device="cuda")
        dist = torch.norm(ee_pos - target, dim=-1)
        return -dist  # (n_envs,)
    def close(self):
        pass
    def env_is_wrapped(self, wrapper_class, indices=None):
        return [False] * self.num_envs
    def env_method(self, method_name, *method_args, indices=None, **method_kwargs):
        return [None] * self.num_envs
    def get_attr(self, attr_name, indices=None):
        return [None] * self.num_envs
    def set_attr(self, attr_name, value, indices=None):
        pass
    def seed(self, seed=None):
        return [None] * self.num_envs
PPO Training Loop
from stable_baselines3 import PPO
# GenesisVecEnv exposes all 1024 Genesis envs as SB3 VecEnv slots.
# PPO collects n_envs * n_steps = 1024 * 512 = 524,288 transitions per update.
env = GenesisVecEnv(n_envs=1024, robot_urdf="path/to/robot.urdf")
model = PPO(
    "MlpPolicy",
    env,
    n_steps=512,
    batch_size=4096,
    n_epochs=10,
    learning_rate=3e-4,
    gamma=0.99,
    verbose=1,
    tensorboard_log="./logs/"
)
model.learn(total_timesteps=10_000_000)
model.save("genesis_ppo_checkpoint")
For GRPO (Group Relative Policy Optimization, commonly used in reasoning models but also applicable to robotics), the reward structure changes but the Genesis environment integration stays the same. GRPO works especially well for manipulation tasks where a binary success signal is available at episode end.
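To illustrate how a binary success signal becomes a learning signal under GRPO, the sketch below computes group-relative advantages: each rollout's reward is normalized by the mean and standard deviation of its group. This is a simplified illustration of the normalization step only, not a full GRPO trainer; the grouping of rollouts (here, eight episodes from the same start state) and the use of population standard deviation are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its own group's mean and spread."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: 8 rollouts from one start state, 1.0 = task succeeded.
successes = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
advantages = group_relative_advantages(successes)
for reward, adv in zip(successes, advantages):
    print(f"reward={reward:.0f} advantage={adv:+.3f}")
```

Successful rollouts receive positive advantages and failures negative ones, and a group where every rollout succeeds or every rollout fails contributes no gradient, which is why mixed-outcome groups matter for this method.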
Reward Shaping
Locomotion: reward forward velocity, penalize torque use and joint limit violations. Genesis returns contact forces as part of the observation, which lets you add a stability term without extra sensors.
Manipulation: shaped reward with distance-to-target, grasp success binary, and time penalty. Add an exploration bonus (curiosity) for long-horizon tasks where sparse rewards slow convergence.
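A shaped manipulation reward along these lines might look like the following. It is a minimal per-step sketch for a single environment using plain Python; the weights `grasp_bonus` and `time_penalty` are hypothetical values chosen for illustration, and a batched version would apply the same arithmetic to tensors.

```python
import math
from typing import Sequence


def manipulation_reward(ee_pos: Sequence[float], target: Sequence[float],
                        grasped: bool, grasp_bonus: float = 1.0,
                        time_penalty: float = 0.001) -> float:
    """Per-step reward: closer is better, grasping pays a bonus, time costs."""
    dist = math.dist(ee_pos, target)         # distance-to-target term
    bonus = grasp_bonus if grasped else 0.0  # binary grasp success term
    return -dist + bonus - time_penalty      # constant per-step time penalty


# End effector 10 cm above the target, before and after a successful grasp.
print(manipulation_reward((0.0, 0.0, 0.1), (0.0, 0.0, 0.0), grasped=False))
print(manipulation_reward((0.0, 0.0, 0.1), (0.0, 0.0, 0.0), grasped=True))
```

Keeping the time penalty small relative to the distance term avoids policies that rush toward the target and knock the object away.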
Domain randomization for sim-to-real: randomize mass (±20%), friction (0.5-1.5x), damping (0.8-1.2x), and action noise. Genesis's programmatic scene API makes this easier to script than USD-based Isaac Sim scenes.
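The ranges above can be sampled independently of the simulator and then applied through whatever setters your Genesis version exposes. The sketch below draws one parameter set per environment with a seeded generator so a sweep is reproducible. The `PhysicsSample` class and the 0.05 upper bound on action noise are illustrative assumptions; the article gives no numeric range for action noise.

```python
import random
from dataclasses import dataclass


@dataclass
class PhysicsSample:
    mass_scale: float        # multiplier on nominal mass, +/-20%
    friction_scale: float    # multiplier on nominal friction, 0.5-1.5x
    damping_scale: float     # multiplier on nominal damping, 0.8-1.2x
    action_noise_std: float  # std of Gaussian noise added to actions (assumed range)


def sample_physics(rng: random.Random, max_action_noise: float = 0.05) -> PhysicsSample:
    """Draw one randomized parameter set using the ranges quoted above."""
    return PhysicsSample(
        mass_scale=rng.uniform(0.8, 1.2),
        friction_scale=rng.uniform(0.5, 1.5),
        damping_scale=rng.uniform(0.8, 1.2),
        action_noise_std=rng.uniform(0.0, max_action_noise),
    )


rng = random.Random(0)  # fixed seed so the same sweep can be regenerated
samples = [sample_physics(rng) for _ in range(1024)]
print(samples[0])
```

Logging the seed alongside each training run makes it possible to replay the exact environments in which a policy failed.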
Sim-to-Real Transfer with Genesis
Genesis-trained policies transfer via the same mechanisms as Isaac Lab: domain randomization, action noise injection, and observation noise. The key advantage is that Genesis's programmatic scene API makes randomization easier to script.
import random
# Randomize physics per environment reset
for env_i in range(n_envs):
    mass_scale = random.uniform(0.8, 1.2)
    friction_scale = random.uniform(0.5, 1.5)
    robot.set_mass(robot.get_mass() * mass_scale, envs_idx=[env_i])
    robot.set_friction(friction_scale, envs_idx=[env_i])
For the visual domain gap, an optional Cosmos step closes it: export Genesis sim trajectories, run through Cosmos-Transfer to add photorealistic domain variation, then use the augmented dataset for VLA fine-tuning. The full pipeline is documented in Deploy NVIDIA Cosmos World Foundation Models on GPU Cloud.
Calibration loop for sim-to-real transfer:
- Train policy in Genesis with domain randomization
- Deploy to real robot, collect failure episodes
- Identify failure modes (e.g., grasp fails on polished surfaces)
- Add corresponding randomization in Genesis (increase friction variance)
- Retrain and repeat
Three to five calibration rounds is typical for a manipulation task. Genesis's speed makes each retrain a GPU-hour rather than a multi-day job.
Integrating Genesis with Isaac GR00T N1, OpenVLA, and Cosmos
Genesis + Isaac GR00T N1
Genesis runs fast RL policy search to find a good base policy. That policy or its trajectory data feeds into Isaac Lab for GR00T N1 fine-tuning. Genesis is not a replacement for Isaac Lab in the GR00T N1 stack. It is a pre-training accelerator.
The handoff: Genesis generates 10K-50K demonstration trajectories in LeRobot v2 parquet format. The GR00T N1 fine-tuning script in Isaac Lab consumes that format directly. The typical workflow:
- Search reward landscape in Genesis (fast, broad search)
- Find a high-reward policy in Genesis
- Run rollouts, save trajectory tensors, and convert to LeRobot v2 parquet using LeRobot's conversion scripts
- Fine-tune GR00T N1 LoRA adapter on the Genesis-generated dataset
GR00T N1 weights are under NVIDIA's non-commercial research license. Genesis itself is Apache 2.0. Those licenses are independent. See Deploy NVIDIA Isaac GR00T N1 on GPU Cloud for the GR00T fine-tuning setup.
Genesis + OpenVLA
Genesis generates RLDS or LeRobot v2 trajectory data. OpenVLA's action tokenizer consumes that data directly. Genesis's 43M FPS means you can generate 10,000 demonstration trajectories in minutes instead of hours, which is enough to bootstrap a usable OpenVLA adapter before collecting real data.
The export flow:
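import pickle
import genesis as gs
# Assumes scene, robot, policy, num_episodes, and episode_length are defined as in the earlier steps
trajectories = []
for episode in range(num_episodes):
    obs_list, action_list = [], []
    scene.reset()
    for step in range(episode_length):
        obs = scene.get_obs()
        action = policy.act(obs)
        # Apply the action before stepping; otherwise the rollout ignores the policy
        robot.set_dofs_control_force(forces=action)
        scene.step()
        obs_list.append(obs)
        action_list.append(action)
    trajectories.append({"obs": obs_list, "actions": action_list})
# Save the raw rollouts so conversion can run as a separate step (file name is a placeholder)
with open("trajectories.pkl", "wb") as f:
    pickle.dump(trajectories, f)
# Convert to LeRobot v2 parquet format for OpenVLA fine-tuning
# (use Genesis's built-in export utilities or LeRobot's conversion scripts)
See Deploy OpenVLA on GPU Cloud for the full OpenVLA fine-tuning pipeline once you have the trajectory dataset.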
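Before any parquet conversion, the per-episode lists collected in the export flow have to be flattened into one row per frame. The sketch below shows that reshaping with plain Python containers. The column names are modeled on LeRobot-style datasets and are an assumption here; check them against the schema your conversion script expects. The default `fps=50` follows from the `dt=0.02` used earlier.

```python
from typing import Any, Dict, List


def flatten_trajectories(trajectories: List[Dict[str, list]],
                         fps: int = 50) -> List[Dict[str, Any]]:
    """Turn per-episode obs/action lists into one row per frame."""
    rows = []
    global_index = 0
    for episode_index, traj in enumerate(trajectories):
        for frame_index, (obs, action) in enumerate(zip(traj["obs"], traj["actions"])):
            rows.append({
                "episode_index": episode_index,
                "frame_index": frame_index,      # position within the episode
                "index": global_index,           # position within the whole dataset
                "timestamp": frame_index / fps,  # seconds since episode start
                "observation.state": list(obs),
                "action": list(action),
            })
            global_index += 1
    return rows


# Tiny made-up dataset: two episodes with 2 and 3 frames.
demo = [
    {"obs": [[0.0, 0.1], [0.2, 0.3]], "actions": [[1.0], [0.5]]},
    {"obs": [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]], "actions": [[0.0], [0.1], [0.2]]},
]
rows = flatten_trajectories(demo)
print(len(rows), rows[2]["episode_index"], rows[2]["frame_index"])
```

If observations are GPU tensors, move them to the CPU and convert them to lists before flattening, since row-oriented writers generally expect plain Python or NumPy values.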
Genesis + Cosmos
Genesis provides physics-accurate trajectories. Cosmos-Transfer adds photorealistic visual variation. The combined dataset covers both dynamics accuracy (Genesis) and visual distribution (Cosmos), which is the strongest combination for policies that need to generalize across visual domains.
The workflow:
- Run Genesis policy rollouts and save video frames + trajectory data
- Pass video frames through Cosmos-Transfer to add realistic visual variation
- The combined dataset (Genesis physics, Cosmos appearance) trains a VLA that transfers better to real hardware than either source alone
See Deploy NVIDIA Cosmos World Foundation Models on GPU Cloud for the Cosmos pipeline setup.
Cost Comparison: Genesis on Spheron vs Isaac Lab on Managed Robotics Cloud
The table below compares cost for collecting 1M env steps per day across three configurations. "Env steps" means simulation steps per individual environment, not total across all parallel envs. Genesis's throughput figures are illustrative estimates based on published benchmarks; actual numbers depend on robot complexity and scene configuration.
| Setup | GPU | Rate | Steps/env/sec | GPU-hrs/day | Cost/day |
|---|---|---|---|---|---|
| Genesis, H100 SXM5 (spot) | H100 SXM5 | $1.73/hr | ~1,172 | ~0.24 | ~$0.42 |
| Genesis, H200 SXM5 (spot) | H200 SXM5 | $1.40/hr | ~1,400 (est.) | ~0.20 | ~$0.28 |
| Isaac Lab, AWS RoboMaker | p4d.24xlarge | ~$10-14/hr | ~98 | ~2.84 | ~$28-40 |
The Genesis throughput figure (~1,172 steps/env/sec for H100 SXM5) is a conservative per-env throughput estimate for a contact-rich manipulation scene; actual numbers depend on robot complexity. Isaac Lab throughput on AWS RoboMaker uses representative figures for a comparable manipulation scene. Steps per second vary significantly with scene complexity.
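The GPU-hours and cost columns follow directly from the per-environment step rate and the hourly price. The sketch below reproduces that arithmetic with the table's own inputs; the function name is illustrative, and results can differ from the table by a cent because the table rounds GPU-hours before multiplying.

```python
from typing import Tuple


def daily_cost(steps_per_env_per_day: float, steps_per_env_per_sec: float,
               rate_per_hour: float) -> Tuple[float, float]:
    """Return (GPU-hours per day, cost per day) for a target step count."""
    gpu_hours = steps_per_env_per_day / steps_per_env_per_sec / 3600
    return gpu_hours, gpu_hours * rate_per_hour


TARGET_STEPS = 1_000_000  # env steps per day, as in the table
configs = {
    "Genesis, H100 SXM5 (spot)": (1172, 1.73),
    "Genesis, H200 SXM5 (spot)": (1400, 1.40),
    "Isaac Lab, RoboMaker (low rate)": (98, 10.0),
    "Isaac Lab, RoboMaker (high rate)": (98, 14.0),
}
for name, (steps_per_sec, rate) in configs.items():
    hours, cost = daily_cost(TARGET_STEPS, steps_per_sec, rate)
    print(f"{name}: {hours:.2f} GPU-hrs/day, ${cost:.2f}/day")
```

Swapping in your own measured steps per second is the quickest way to see whether the gap holds for your robot and scene.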
The Genesis-on-Spheron advantage compounds at scale. Hyperscaler RoboMaker pricing includes per-simulation-unit charges on top of EC2 instance costs. Spheron charges only for the GPU instance with no platform overhead.
Troubleshooting: CUDA Driver Matrix, OptiX, and Multi-Node Scaling
CUDA Driver Matrix
Genesis requires CUDA 12.x (driver 525+). The Taichi backend additionally requires CUDA 12.3+ for its kernel compilation path.
| Genesis Version | Minimum Driver | CUDA Version |
|---|---|---|
| 0.2.x | 525 | 12.0 |
| 0.3.x+ | 530 | 12.3 |
Check your driver version:
nvidia-smi --query-gpu=driver_version --format=csv,noheader
If you get No CUDA runtime is found on import:
# Reinstall PyTorch with the correct CUDA index for your driver
pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# or cu124 for CUDA 12.4 instances
Most Spheron instances ship with driver 535+ and CUDA 12.4. Confirm with nvcc --version before debugging Python imports.
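The driver thresholds used throughout this guide (525 for CUDA 12.x, 535 for OptiX) can be checked in a training script's preamble. The sketch below parses a driver version string and reports which features the article's thresholds allow. `query_driver_version` shells out to the same `nvidia-smi` query shown above and is not called here, so the example runs without a GPU; the function names are illustrative.

```python
import subprocess
from typing import Dict


def driver_major(version: str) -> int:
    """Extract the major number from a driver string such as '535.129.03'."""
    return int(version.strip().split(".")[0])


def capabilities(version: str) -> Dict[str, bool]:
    """Apply the thresholds quoted in this guide to a driver version."""
    major = driver_major(version)
    return {"cuda_12": major >= 525, "optix": major >= 535}


def query_driver_version() -> str:
    """Ask nvidia-smi for the driver version of the first GPU (needs a GPU host)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()[0]


for sample in ("535.129.03", "525.60.13", "470.82.01"):
    print(sample, capabilities(sample))
```

On a GPU host, `capabilities(query_driver_version())` gives a single place to decide between the rasterized and OptiX renderers.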
OptiX Rendering Issues
OptiX requires driver 535+. On headless instances, OptiX also needs an X display session.
# Verify driver version supports OptiX
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# If < 535, fall back to RastRenderer
# If driver >= 535 but OptiX still fails, start Xvfb first
Xvfb :1 -screen 0 1920x1080x24 &
export DISPLAY=:1
python your_genesis_script.py
OptiX is not required for policy training. Use gs.renderers.RastRenderer(offscreen=True) as the default on all cloud instances and switch to OptiX only when visual quality matters (e.g., generating observations for VLA training).
Multi-Node Genesis Scaling
Genesis's multi-GPU support runs on NCCL within a single node. Multi-node Genesis requires a Ray or torchrun wrapper that handles cross-node environment distribution.
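A common failure is an NCCL init timeout when InfiniBand is unavailable:
# For pure Ethernet setups: NCCL_IB_DISABLE=1 turns off the InfiniBand/RoCE transport so NCCL falls back to TCP sockets
# NCCL_P2P_DISABLE=1 additionally turns off direct GPU peer-to-peer transfers
export NCCL_P2P_DISABLE=1
export NCCL_IB_DISABLE=1
# Then launch your distributed Genesis script
torchrun --nproc_per_node=8 your_training_script.py
For networking context on multi-node setups without InfiniBand, see Multi-Node GPU Training Without InfiniBand.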
Out-of-Memory in Large Environment Batches
If you hit OOM during scene.build(n_envs=N):
- Reduce n_envs by half
- For contact-rich tasks, reduce the contact buffer size in gs.options.SimOptions
- Enable expandable memory segments to reduce VRAM fragmentation:
  export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
- If training on multi-GPU, reduce n_envs per GPU proportionally:
  n_gpus = torch.cuda.device_count()
  n_envs_per_gpu = 1024  # start here, scale up
  total_envs = n_envs_per_gpu * n_gpus
  scene.build(n_envs=total_envs)
Genesis's simulation throughput turns a multi-day policy search into a few GPU-hours. Spheron's on-demand and spot H100/H200 instances give robotics teams bare-metal compute without managed-platform overhead.
Quick Setup Guide
Rent an H100 SXM5, H200, or L40S instance at app.spheron.ai. SSH in and verify CUDA 12.x with nvidia-smi. For multi-GPU parallel scene training, use a node with NVLink (SXM5 form factor). L40S single-GPU is sufficient for PPO training on single-robot environments. For the 100M FPS benchmark workloads, use an 8xH100 SXM5 node.
Install from PyPI with: pip install genesis-world. For the full feature set including MPM, fluid, and soft body physics, install from source: git clone https://github.com/Genesis-Embodied-AI/Genesis.git && cd Genesis && pip install -e '.[all]'. Genesis requires Python 3.10+, PyTorch 2.1+, and CUDA 12.x.
On a headless GPU instance, start a virtual display: Xvfb :0 -screen 0 1024x768x24 & and export DISPLAY=:0. For EGL-based offscreen rendering (no Xvfb needed): set gs.init(backend=gs.cuda, renderer=gs.renderers.RastRenderer(offscreen=True)). NVIDIA OptiX ray-traced rendering requires driver 535+ and is enabled with gs.renderers.OptixRenderer.
Create a Genesis scene with gs.Scene(), add a robot URDF (gs.morphs.URDF(file='robot.urdf')), set physics properties, and instantiate parallel environments with scene.build(n_envs=1024). Genesis automatically tiles environments across available GPU VRAM and distributes computation across multiple GPUs if present.
Wrap the Genesis environment in a gym-compatible interface using Genesis's built-in GymWrapper. Connect to stable-baselines3 PPO or a custom GRPO trainer. Genesis returns batched observations and rewards across all parallel environments in a single GPU tensor, making vectorized policy gradient updates efficient. Log episode returns and success rates with wandb or tensorboard.
After training, run rollouts in Genesis, save trajectory tensors (observations, actions, rewards), and convert to LeRobot v2 parquet format using LeRobot's conversion scripts. This dataset feeds directly into NVIDIA Isaac GR00T N1 LoRA fine-tuning or OpenVLA's fine-tuning pipeline.
Frequently Asked Questions
Genesis runs on any CUDA-capable GPU with CUDA 12.x and driver 525+. For single-robot PPO training, an L40S 48GB or A100 40GB is sufficient. For multi-GPU parallel scene training (1K+ environments), use an H100 SXM5 or H200. The 100M FPS figure Genesis quotes requires an 8-GPU H100 SXM5 node or a B200. L40S handles most robotics sim sweeps well at a lower price.
Genesis reports 10-80x faster simulation than Isaac Sim and MuJoCo on equivalent workloads in its benchmark paper. The speedup varies by workload: rigid body scenes show the highest gains (Genesis reports 43M FPS on an RTX 4090 for a single-plane Franka arm manipulation scene), while complex fluid/MPM scenes show more modest speedups. Isaac Sim runs on USD-based scene graphs with full photorealistic rendering, making it slower per sim step but richer in visual fidelity for Cosmos-style synthetic data pipelines.
Yes. Genesis supports headless rendering via Xvfb or EGL/offscreen mode. On a cloud instance with no physical display, set DISPLAY=:0 and start Xvfb before launching Genesis, or use the gs.renderers.RastRenderer with offscreen mode. NVIDIA OptiX ray-traced rendering requires driver 535+ and is optional - basic rasterized rendering works on any CUDA 12.x driver.
Genesis generates robot trajectory data (joint positions, forces, contact states) that can be exported to LeRobot v2 parquet or RLDS format. This data feeds directly into the GR00T N1 LoRA fine-tuning pipeline or OpenVLA's action tokenizer. The workflow is: Genesis sim -> trajectory export -> Cosmos photorealistic augmentation (optional) -> VLA fine-tuning on H100/B200. Genesis complements Isaac Lab (used for GR00T N1) by providing faster parallel simulation for RL policy search before transferring the best policy to Isaac Lab's full pipeline.
Running Genesis on a Spheron H100 at bare-metal on-demand pricing typically costs 40-60% less than the equivalent Isaac Lab setup on AWS RoboMaker or NVIDIA's managed robotics cloud. RoboMaker charges per simulation unit hour on top of EC2 costs; Spheron charges only for the GPU instance with no platform overhead. For 1M sim steps per day, the Genesis-on-Spheron path using spot H100 SXM5 instances typically runs under $1/day versus $28-40/day for managed robotics cloud equivalents.
Genesis supports multi-GPU on a single node through NCCL-backed data parallelism, running independent environment batches on each GPU and gathering gradients for policy updates. True multi-node Genesis requires wrapping the simulator in a Ray or NCCL-based distributed training harness. For multi-node setups, see the Spheron multi-node GPU training guide. Most robotics sim-to-real workflows fit within a single 8xH100 or 8xB200 node.
