NVIDIA Vera Rubin NVL72 GPU Cloud: Availability, Cost Per Token, and Planning Your Rubin Rental in H2 2026

CoreWeave powered on and ran compute jobs on a Vera Rubin NVL72 rack on June 1, 2026. That is the first validated Rubin bring-up in the industry, shifting the architecture from announced to working hardware. For full specs, see the Vera Rubin NVL72 architecture guide. This post covers what comes next: who has access, when, what the cost-per-token economics actually mean, and how to plan capacity now.

What CoreWeave Just Validated

"First industry bring-up" means CoreWeave installed a Vera Rubin NVL72 rack, powered it on, ran compute jobs, and confirmed the system works at production scale. This is not a benchmark preview or a conference demo. It is a real system that ran real jobs.

The date matters. June 1, 2026 confirms that H2 2026 delivery to first-cohort providers is realistic, not aspirational. Hyperscalers and large neo-clouds that have NVL72 contracts with NVIDIA can start scheduling production deployments.

One caveat: one validated system at one provider does not mean broad availability. CoreWeave is the first mover. Most teams will not be able to rent a Rubin instance for months. For many teams, the practical window is 2027.

What the Vera Rubin NVL72 Delivers

The full rack combines 72 R100 GPUs with 36 Vera ARM CPUs, connected by NVLink 6. Key specs at a glance:

Component	Spec
GPUs per rack	72 R100 (also branded H300 by some providers)
CPUs per rack	36 Vera ARM
NVLink generation	NVLink 6 at 3.6 TB/s per GPU (about 260 TB/s aggregate across the rack)
Memory per GPU	288 GB HBM4
Memory bandwidth per GPU	22 TB/s
Aggregate rack memory	20.7 TB
Inference throughput per GPU	50 PFLOPS NVFP4

Some cloud providers brand R100 as "H300" for naming continuity. AWS and Google are both doing this. The hardware is identical regardless of catalog branding. For the full architecture breakdown and a direct Blackwell comparison, see the Rubin vs Blackwell vs Hopper generation comparison.
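The rack-level figures in the table follow from the per-GPU numbers. A quick check, assuming decimal units (1 TB = 1000 GB) and the 3.6 TB/s per-GPU NVLink 6 figure quoted in the workload checklist later in this post:

```python
# Per-GPU figures as quoted in this post; the variable names are illustrative.
GPUS_PER_RACK = 72
HBM4_PER_GPU_GB = 288
NVLINK6_PER_GPU_TBPS = 3.6  # per-GPU NVLink 6 bandwidth

# Aggregate memory, in decimal terabytes (1 TB = 1000 GB).
rack_memory_tb = GPUS_PER_RACK * HBM4_PER_GPU_GB / 1000

# Aggregate NVLink bandwidth if every GPU drives its full link rate.
rack_nvlink_tbps = GPUS_PER_RACK * NVLINK6_PER_GPU_TBPS

print(f"Aggregate HBM4: {rack_memory_tb:.1f} TB")
print(f"Aggregate NVLink 6 bandwidth: {rack_nvlink_tbps:.1f} TB/s")
```

The memory total rounds to the 20.7 TB in the table, and the NVLink total lands just under the rounded 260 TB/s figure.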

Cost Per Token: What the 10x Efficiency Number Actually Means

NVIDIA claims a 10x inference-per-watt improvement for R100 over Blackwell. The headline specs behind the claim: R100 delivers 50 PFLOPS of FP4 compute per GPU versus approximately 9 PFLOPS on B200, and memory bandwidth rises from 8 TB/s to 22 TB/s. The result is more compute and more bandwidth for each watt consumed.
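To put the per-GPU figures side by side, here is a small sketch that computes the raw ratios. The dictionary and key names are illustrative; the numbers are the ones quoted above. Neither ratio reaches 10x on its own, so the headline figure is best read as NVIDIA's per-watt claim rather than something derivable from these two specs alone.

```python
# Per-GPU figures as quoted in this post (B200 compute is approximate).
r100 = {"fp4_pflops": 50, "mem_bw_tbps": 22}
b200 = {"fp4_pflops": 9, "mem_bw_tbps": 8}

# Raw generation-over-generation ratios, ignoring power draw.
compute_ratio = r100["fp4_pflops"] / b200["fp4_pflops"]
bandwidth_ratio = r100["mem_bw_tbps"] / b200["mem_bw_tbps"]

print(f"FP4 compute ratio: {compute_ratio:.2f}x")
print(f"Memory bandwidth ratio: {bandwidth_ratio:.2f}x")
```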

The translation to token costs is not 1:1. Cloud pricing includes memory, networking, and margins alongside raw compute. A 10x chip-level efficiency advantage compresses to a smaller cost advantage at the invoice level.

The gain also depends heavily on what is bottlenecking your workload:

Model size	Primary bottleneck	Estimated Rubin cost advantage vs B200
7B - 13B	Memory bandwidth	Moderate (~2-3x)
70B	Compute + memory	Significant (~5-7x)
405B+	Compute-bound	Maximum (~8-10x)

For 7B-13B models, the workload is memory-bandwidth-bound. R100's 22 TB/s is 2.75x B200's 8 TB/s, so you get roughly 2-3x the throughput per dollar at equivalent cloud pricing. For 405B+ models running FP4, the workload is compute-bound and R100's 50 PFLOPS drives near-10x throughput gains.

No official Rubin cloud pricing has been published as of June 2026. Based on historical patterns, expect Rubin NVL72 access to carry a 30-50% premium over Blackwell NVL72 at initial availability. That premium typically compresses over 12-18 months as supply scales. For current Blackwell and Hopper pricing, see the pricing table under "Run on Available Blackwell and Hopper Inventory" below.
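A rough way to combine the two estimates above is to divide the throughput gain by the price premium. The sketch below assumes the table's ranges describe throughput per dollar at equal pricing, and that the 30-50% premium applies uniformly; both are simplifications, and `cost_advantage` is a hypothetical helper, not a published formula.

```python
def cost_advantage(throughput_gain: float, price_premium: float) -> float:
    """Cost-per-token advantage after a rental price premium.

    throughput_gain: Rubin throughput divided by B200 throughput.
    price_premium: fractional premium, e.g. 0.30 for 30%.
    """
    return throughput_gain / (1.0 + price_premium)


# Gain ranges from the table above, keyed by model size.
scenarios = {"7B-13B": (2, 3), "70B": (5, 7), "405B+": (8, 10)}

for model_size, (low_gain, high_gain) in scenarios.items():
    # Pessimistic case: low gain with the highest premium.
    worst = cost_advantage(low_gain, 0.50)
    # Optimistic case: high gain with the lowest premium.
    best = cost_advantage(high_gain, 0.30)
    print(f"{model_size}: {worst:.1f}x to {best:.1f}x after premium")
```

Under these assumptions, a launch premium narrows the advantage at every model size, and the smallest models see the least benefit.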

Availability Map: Who Has Rubin and When

Cloud / Provider	Status	Expected Access
CoreWeave	First bring-up confirmed (June 1, 2026)	Enterprise contracts, H2 2026
AWS, Google Cloud, Azure	First cohort	H2 2026, enterprise-first
Lambda, Nebius, Nscale	First cohort	H2 2026
Spheron + other neo-clouds	Pre-order open	2027

The access pattern mirrors what happened with GB200 NVL72: tight supply went to large enterprise contracts first. Hyperscalers announced H2 2026 availability at CES 2026 and GTC 2026, and that timeline is now confirmed realistic given CoreWeave's bring-up.

Smaller GPU cloud providers, including Spheron, are in the 2027 window. The gap between first-cohort access and neo-cloud access has consistently run 6-12 months across recent GPU generations. Supply needs to ramp before it filters further down the chain.

To get first-access notification when Spheron receives Rubin inventory, register on the R100 pre-order page.

Should You Wait for Rubin or Deploy on Blackwell Now?

Deploy on Blackwell or Hopper now if:

Your workload is live or launching before Q3 2027.
Your models are 7B-70B and fit in B200's 192 GB VRAM or B300's 288 GB VRAM.
You need predictable access today without waitlist or allocation risk.
Per-token cost on B200 spot already meets your budget.
You are running batch inference where checkpoint-and-resume makes spot pricing viable.

Wait for Rubin if:

You are planning a greenfield large-scale inference service launching in 2027 or later.
Your workload is 405B+ parameters and compute-bound.
Your token economics require sub-$0.50/million tokens at current prompt lengths.
You have confirmed first-cohort access through a hyperscaler or CoreWeave agreement.

For most teams, the answer is deploy now. B200 on Spheron delivers roughly 4x H100 throughput on 70B models with native FP4 support. Waiting idle for 12+ months to access Rubin is rarely cost-effective unless you have a specific technical requirement that only R100 satisfies.

Need more than 192 GB VRAM? You can rent B300 for 288 GB HBM3e per GPU. For smaller workloads, H200 offers 141 GB HBM3e at a lower hourly cost and beats H100 on VRAM-constrained 70B inference.
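A back-of-the-envelope check for which single GPU a model's weights fit on is sketched below. The VRAM figures come from this post; the 20% headroom for KV cache and activations is an assumed placeholder, and real deployments often shard across several GPUs, which this sketch ignores.

```python
# VRAM per GPU in GB, as listed in this post.
GPU_VRAM_GB = {"H100": 80, "H200": 141, "B200": 192, "B300": 288}


def weight_footprint_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight size in GB (1e9 parameters at N bits each)."""
    return params_billion * bits_per_param / 8


def smallest_fit(params_billion: float, bits_per_param: int, headroom: float = 0.2):
    """Return the smallest listed GPU that holds the weights plus headroom.

    headroom is an assumed allowance for KV cache and activations.
    Returns None if no single listed GPU is large enough.
    """
    needed = weight_footprint_gb(params_billion, bits_per_param) * (1 + headroom)
    for name, vram in sorted(GPU_VRAM_GB.items(), key=lambda item: item[1]):
        if needed <= vram:
            return name
    return None


print(smallest_fit(70, 8))   # 70B at FP8
print(smallest_fit(405, 4))  # 405B at FP4
```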

How to Plan Your Migration Path on Spheron

Here is what to do now to stay cost-efficient while Rubin availability ramps.

1. Run on Available Blackwell and Hopper Inventory

Current GPU pricing on Spheron (as of June 12, 2026):

GPU	On-Demand	Spot	VRAM
H100 SXM5	$3.92/hr	$1.43/hr	80 GB
H100 PCIe	$2.01/hr	N/A	80 GB
H200 SXM5	$4.84/hr	$1.82/hr	141 GB
B200 SXM6	$7.41/hr	$5.34/hr	192 GB
B300 SXM6	$9.02/hr	$3.29/hr	288 GB

Pricing fluctuates with GPU availability. The prices above were captured on June 12, 2026 and may have changed; check current GPU pricing for live rates.
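To turn these hourly rates into per-token numbers, divide the hourly price by the tokens generated in an hour. The sketch below uses the prices from the table; the 5,000 tokens/sec throughput is a made-up placeholder, so substitute a measured figure for your own model and batch size.

```python
# (on-demand, spot) hourly prices in USD from the table above.
PRICES = {
    "H100 SXM5": (3.92, 1.43),
    "H200 SXM5": (4.84, 1.82),
    "B200 SXM6": (7.41, 5.34),
    "B300 SXM6": (9.02, 3.29),
}


def spot_discount(on_demand: float, spot: float) -> float:
    """Fraction saved by running on spot instead of on-demand."""
    return 1 - spot / on_demand


def cost_per_million_tokens(hourly_usd: float, tokens_per_second: float) -> float:
    """USD per one million tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_usd / tokens_per_hour * 1_000_000


ASSUMED_TOKENS_PER_SECOND = 5000  # hypothetical; measure your own workload

for gpu, (on_demand, spot) in PRICES.items():
    discount = spot_discount(on_demand, spot)
    per_million = cost_per_million_tokens(spot, ASSUMED_TOKENS_PER_SECOND)
    print(f"{gpu}: spot saves {discount:.0%}, ${per_million:.3f} per 1M tokens on spot")
```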

2. Use Per-Minute Billing to Avoid Lock-In

Spheron bills per minute, no long-term contracts required. This means you can move from H100 to B200 to R100 as each generation becomes accessible without eating a 12-month reservation penalty. A team deploying on B200 today can switch to R100 the day it appears on Spheron.
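As an illustration of why billing granularity matters for short or bursty jobs, the sketch below compares per-minute billing with a hypothetical scheme that rounds up to whole hours. The hourly-rounding comparison is an assumption for contrast, not a description of any specific provider.

```python
import math


def per_minute_cost(hourly_rate: float, minutes: int) -> float:
    """Cost when billed for exactly the minutes used."""
    return hourly_rate / 60 * minutes


def hourly_rounded_cost(hourly_rate: float, minutes: int) -> float:
    """Cost under a hypothetical scheme that rounds usage up to full hours."""
    return hourly_rate * math.ceil(minutes / 60)


B200_ON_DEMAND = 7.41  # USD/hr, from the pricing table in this post
job_minutes = 95

print(f"Per-minute billing: ${per_minute_cost(B200_ON_DEMAND, job_minutes):.2f}")
print(f"Hourly rounding:    ${hourly_rounded_cost(B200_ON_DEMAND, job_minutes):.2f}")
```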

3. Prepare Your Workload for Rubin

A few things to check before R100 inventory arrives:

FP4 quantization support: R100's biggest efficiency gains require FP4. Confirm your inference framework (TensorRT-LLM 0.17+, vLLM with FP4 enabled) supports it. FP8 works as a fallback.
NVLink topology requirements: If your multi-node jobs are tuned for NVLink 5 at 1.8 TB/s per GPU, verify they can take advantage of NVLink 6 at 3.6 TB/s per GPU without mesh topology changes.
Register for first-access: Use the R100 pre-order page to get notified when Spheron has confirmed Rubin capacity.
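The framework version check in the list above can be scripted. This sketch uses only the Python standard library; the distribution name `tensorrt_llm` is an assumption about how the package is installed in your environment, and passing the version floor says nothing about whether FP4 kernels are available on your current GPU.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_TRTLLM = (0, 17)  # minimum TensorRT-LLM version named in this post


def major_minor(version_string: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like '0.17.0' or '1.0.0rc2'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(installed: str, minimum: tuple[int, int] = MIN_TRTLLM) -> bool:
    """True if the installed version is at or above the minimum."""
    return major_minor(installed) >= minimum


def check_installed(dist_name: str = "tensorrt_llm"):
    """Return True/False for the installed package, or None if it is absent."""
    try:
        return meets_minimum(version(dist_name))
    except PackageNotFoundError:
        return None


print(check_installed())
```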

GPU Price Compression Into 2027

Each GPU generation launch has compressed the previous generation's on-demand price by 30-50% within 12 months. H100 launched at $8-10/hr on-demand in 2023. It is now available from $2.01/hr (H100 PCIe) on Spheron. B200 on-demand should follow the same curve as Rubin ramps and supply normalizes through 2027.

The practical implication: a team deploying on B200 now at $7.41/hr on-demand will likely see those rates drop as Rubin scales. That is a better outcome than sitting idle while waiting for Rubin access. You build and ship on B200, then migrate when R100 becomes available.
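Applying the 30-50% historical compression range to today's B200 on-demand rate gives a rough planning band. This is a projection under the stated pattern, not a forecast of actual Spheron pricing; `projected_range` is a hypothetical helper.

```python
def projected_range(current_rate: float, low_drop: float = 0.30, high_drop: float = 0.50):
    """Return (lowest, highest) projected hourly rate after compression."""
    lowest = current_rate * (1 - high_drop)
    highest = current_rate * (1 - low_drop)
    return lowest, highest


B200_ON_DEMAND = 7.41  # USD/hr as of June 12, 2026

lowest, highest = projected_range(B200_ON_DEMAND)
print(f"Projected B200 on-demand: ${lowest:.2f} to ${highest:.2f} per hour")
```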

For the broader pricing context across GPU cloud providers, the GPU cloud pricing comparison for 2026 covers rates at Spheron, Runpod, Lambda, Nebius, and AWS side by side.

Most teams cannot get Rubin access in H2 2026. The right move now is running on available Blackwell and Hopper inventory at spot pricing, then migrating when R100 capacity opens up. Spheron offers per-minute billing with no commitment. Check Blackwell GPU pricing on Spheron, or register for R100 Rubin first-access.

Frequently Asked Questions

When will Vera Rubin NVL72 be available to rent on cloud platforms?

CoreWeave completed the first Vera Rubin NVL72 system bring-up on June 1, 2026. NVIDIA has announced H2 2026 broad cloud availability targeting AWS, Google Cloud, Azure, CoreWeave, Lambda, Nebius, and Nscale. Smaller neo-cloud providers and GPU marketplaces like Spheron are expected to receive Rubin capacity in 2027. You can register for first-access via the Spheron R100 pre-order page.

What does 10x inference-per-watt mean for my cost per million tokens?

NVIDIA's claimed 10x inference-per-watt gain over Blackwell translates to roughly one-tenth the compute cost per million tokens only in the best case: a compute-bound model at the same scale, before memory, networking, and margin are priced in. For models under 70B parameters, the practical gain is smaller because the workload is bound by memory bandwidth rather than compute. For 70B-plus inference at high concurrency, Rubin should cut token costs by 70-90% compared to B200, though official cloud pricing has not been published.

Should I wait for Vera Rubin or deploy on Blackwell now?

For most teams, deploying on B200 or B300 now makes more sense than waiting. Rubin is not broadly accessible until 2027, and B200 spot pricing already makes inference cost-competitive. The exception is if you are planning a new large-scale inference service launching in late 2027 where Rubin economics apply from day one. For any workload live today, run it on available Blackwell or Hopper capacity and migrate when Rubin is accessible.

How much will Vera Rubin NVL72 cloud access cost?

No official cloud pricing has been published. Based on historical launch premiums, Rubin NVL72 rack access is expected to carry a 30-50% premium over Blackwell NVL72 at initial availability in H2 2026. As supply increases through 2027, that premium should compress. Full rack pricing for GB200 NVL72 is a useful baseline for estimating Rubin NVL72 costs.

Which GPU should I use for LLM inference while waiting for Rubin?

B200 and B300 are the best choices for production inference in 2026. B200 delivers roughly 4x the throughput of H100 on 70B models and offers native FP4 support. H100 and H200 remain cost-effective for smaller models (7B-13B) at lower concurrency. B300 is the top option for models requiring over 192 GB VRAM. All of these GPUs are available on Spheron with per-minute billing and no long-term commitment.