Spheron GPU Catalog
Coming H2 2026
Pipeline: Pre-Order Open

NVIDIA Rubin R100 (H300) GPU

288 GB HBM4 · 22 TB/s · 50 PFLOPS FP4

The Rubin R100 is the generational successor to B300, built for trillion-parameter inference at FP4 precision and multi-node training runs where Blackwell memory bandwidth becomes the bottleneck.

No pricing published yet. Cloud providers start shipping R100 (also called H300) in H2 2026. Register your interest to be notified first when Spheron capacity opens.

VRAM: 288 GB HBM4
Bandwidth: 22 TB/s
FP4 Compute: 50 PFLOPS

Register Interest

Join the pipeline for R100 access. We'll reach out with pricing and availability as soon as capacity goes live.

At a glance

The NVIDIA Rubin R100 (also branded H300 by cloud providers) is the generational successor to the B300 Blackwell Ultra. It ships with 288 GB HBM4 at up to 22 TB/s bandwidth (2.75x B300), 50 PFLOPS FP4 compute (3.33x B300), NVLink 6 at 3.6 TB/s per GPU, and ConnectX-9 networking. First cloud availability is H2 2026 across AWS, Google Cloud, Azure, and specialist providers. Spheron is onboarding R100 capacity and will contact registered teams with pricing as soon as it is confirmed. For workloads running today, B300 and B200 are available now.

NVIDIA GPU generation roadmap

Where R100 sits in the NVIDIA GPU generation stack. All generations prior to Rubin are available on Spheron today.

R100 GPU specifications

Architecture: NVIDIA Rubin
VRAM: 288 GB HBM4
Memory Bandwidth: Up to 22 TB/s
FP4 Throughput: 50 PFLOPS
FP8 Throughput: ~16,000 TFLOPS (est.)
Transistors: 336 billion
Interconnect: NVLink 6 @ 3.6 TB/s
Networking: ConnectX-9 (1.6T)
TDP: ~2,300 W
Memory Type: HBM4

R100 specs sourced from NVIDIA GTC 2025 roadmap and confirmed at CES 2026 and GTC 2026. Memory bandwidth (up to 22 TB/s), NVLink 6, and ConnectX-9 are officially confirmed. FP8 throughput (~16,000 TFLOPS) is derived from NVIDIA's NVL72 system-level spec. Final per-GPU specs will be available when cloud providers ship production systems in H2 2026.

R100 vs B300 vs B200 vs H100

Spec | R100 | B300 | B200 | H100
Architecture | Rubin | Blackwell Ultra | Blackwell | Hopper
VRAM | 288 GB HBM4 | 288 GB HBM3e | 192 GB HBM3e | 80 GB HBM3
Memory Bandwidth | Up to 22 TB/s | 8 TB/s | 8 TB/s | 3.35 TB/s
FP4 Compute | 50 PFLOPS | 15 PFLOPS | 9 PFLOPS | N/A
FP8 Throughput | ~16,000 TFLOPS | 7,000 TFLOPS | 4,500 TFLOPS | ~2,000 TFLOPS
Interconnect | NVLink 6 (3.6 TB/s) | NVLink 5 (1.8 TB/s) | NVLink 5 (1.8 TB/s) | NVLink 4 (900 GB/s)
Transistors | 336 billion | 208 billion | 208 billion | 80 billion
Cloud Availability | H2 2026 (first cohort) | Available now | Available now | Available now

R100 FP8 throughput (~16,000 TFLOPS) derived from NVIDIA's NVL72 system-level spec. All other R100 specs confirmed at CES 2026 and GTC 2026. Availability reflects first cohort; broader availability from additional providers follows.

Workloads built for R100

Use case / 01

Trillion-Parameter FP4 Inference

At 50 PFLOPS FP4 and 288 GB HBM4, a single R100 can hold and serve a 200B-parameter model in FP4 with roughly 188 GB of headroom for KV cache. Multi-GPU setups handle 400B+ models that currently require 4x B300 at FP8.

- Frontier MoE models (400B+ at FP4) on fewer GPUs
- Long-context RAG at 1M+ tokens with HBM4 bandwidth
- Disaggregated prefill/decode with NVLink 6
- Real-time agentic inference for 70B–400B models
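A back-of-the-envelope sizing sketch, assuming FP4 weights occupy 0.5 bytes per parameter and ignoring activation and runtime overhead, shows where that headroom comes from:

```python
# Rough single-GPU memory sizing for FP4 inference on a 288 GB R100.
# Assumption: 4-bit weights = 0.5 bytes per parameter; KV cache, activations,
# and framework overhead all have to fit in whatever is left over.

def fp4_headroom_gb(params_billion: float, hbm_gb: float = 288.0) -> float:
    """Return the HBM left over after loading FP4 weights."""
    weight_gb = params_billion * 0.5  # 0.5 GB per billion parameters at FP4
    return hbm_gb - weight_gb

print(fp4_headroom_gb(200))  # -> 188.0 GB free for KV cache on one R100
print(fp4_headroom_gb(400))  # -> 88.0 GB; long contexts or big batches push this multi-GPU
```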
Use case / 02

Frontier Model Pre-Training

R100 offers 3.33x the FP4 compute of B300 and 2.75x the memory bandwidth. Training runs that require 8 B300 nodes may fit on fewer R100 nodes, reducing inter-node communication overhead and wall-clock time.

- Trillion-parameter dense and MoE pre-training
- Multi-modal foundation models (text, image, video, audio)
- Reinforcement learning from human feedback at scale
- Long-sequence transformer training (1M+ token context)
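As a rough illustration of the node-count math, the sketch below assumes 8-GPU nodes, perfect scaling, and a hypothetical sustained FP4 compute target; real runs are also gated by interconnect, parallelism strategy, and data pipeline efficiency:

```python
import math

# Spec-table FP4 throughput per GPU (PFLOPS); the node size is an assumption.
B300_FP4_PFLOPS = 15
R100_FP4_PFLOPS = 50
GPUS_PER_NODE = 8

def nodes_for(target_pflops: float, per_gpu_pflops: float) -> int:
    """Nodes needed to hit a sustained FP4 target, assuming perfect scaling."""
    return math.ceil(target_pflops / (per_gpu_pflops * GPUS_PER_NODE))

budget = 960  # hypothetical sustained PFLOPS target for a training run
print(nodes_for(budget, B300_FP4_PFLOPS))  # 8 B300 nodes
print(nodes_for(budget, R100_FP4_PFLOPS))  # 3 R100 nodes
```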
Use case / 03

Rack-Scale NVL72 Workloads

In the Vera Rubin NVL72 configuration, 72 R100 GPUs share a 260 TB/s NVLink fabric and 20.7 TB of aggregate HBM4. Models that require multi-node sharding on Blackwell may fit in a single NVL72 rack.

- 10T+ parameter models in a single NVLink domain
- Mixture-of-Experts with billions of total parameters
- Scientific simulation: molecular dynamics, climate, physics
- Autonomous AI agent orchestration at scale
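The rack-level numbers follow directly from the per-GPU specs. A minimal sketch, assuming FP4 weights at 0.5 bytes per parameter and ignoring the KV-cache and framework overhead a real deployment must budget for:

```python
# Vera Rubin NVL72 rack capacity derived from per-GPU figures.
GPUS_PER_RACK = 72
HBM_PER_GPU_GB = 288

aggregate_hbm_tb = GPUS_PER_RACK * HBM_PER_GPU_GB / 1000  # ~20.7 TB of HBM4
weights_10t_fp4_tb = 10_000e9 * 0.5 / 1e12                # 10T params at FP4 = 5 TB

print(f"aggregate HBM4: {aggregate_hbm_tb:.1f} TB")
print(f"10T-param FP4 weights: {weights_10t_fp4_tb:.0f} TB "
      f"({weights_10t_fp4_tb / aggregate_hbm_tb:.0%} of rack memory)")
```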
Use case / 04

High-Bandwidth Inference Serving

22 TB/s memory bandwidth is 2.75x B300 on a single GPU. Decode-phase throughput scales nearly linearly with bandwidth for memory-bound LLM serving, so at equivalent batch sizes R100 can serve roughly 2.5–3x as many tokens per second as B300.

- High-concurrency API serving for frontier-scale LLMs
- Real-time video and image generation at 4K/8K
- Code generation agents with 200K+ context windows
- Batch inference pipelines for large-scale data processing
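A first-order estimate of that scaling, assuming a memory-bound decode in which each generated token streams all FP4 weights from HBM once (KV-cache traffic, kernel overhead, and batching effects ignored), for a hypothetical 400B-parameter model:

```python
# Bandwidth-bound decode ceiling: tokens/s ~= HBM bandwidth / bytes read per token.

def decode_tokens_per_s(bandwidth_tb_s: float, params_billion: float) -> float:
    """Upper bound on decode rate when every token reads all FP4 weights once."""
    bytes_per_token = params_billion * 1e9 * 0.5  # FP4 weights only
    return bandwidth_tb_s * 1e12 / bytes_per_token

model_b = 400  # hypothetical 400B-parameter FP4 model
print(decode_tokens_per_s(22, model_b))  # R100: ~110 tokens/s per sequence
print(decode_tokens_per_s(8, model_b))   # B300: ~40 tokens/s -- the same 2.75x gap
```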

When to pick the R100

Scenario 01

Pick R100 if

You're training or serving 400B+ parameter models and B300's 8 TB/s memory bandwidth is the bottleneck. R100's 22 TB/s HBM4 delivers 2.75x the bandwidth, and its 50 PFLOPS of FP4 compute is 3.33x B300. If your workload is memory-bandwidth-bound, R100 is the first GPU where bandwidth stops being the ceiling.
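One way to sanity-check whether a workload sits in that memory-bandwidth-bound regime is a quick roofline calculation: peak FLOPS divided by peak bandwidth gives the arithmetic intensity (FLOPs per byte of HBM traffic) below which bandwidth, not compute, sets throughput. The sketch below uses the spec-table figures and is a heuristic, not a performance model:

```python
# Roofline ridge point: kernels with lower FLOPs/byte than this are bandwidth-bound.

def ridge_point(peak_pflops: float, bandwidth_tb_s: float) -> float:
    return (peak_pflops * 1e15) / (bandwidth_tb_s * 1e12)

print(ridge_point(15, 8))   # B300 FP4: ~1875 FLOPs per byte
print(ridge_point(50, 22))  # R100 FP4: ~2273 FLOPs per byte
```

Decode-phase GEMV work typically lands far below either ridge point, which is why the bandwidth jump tends to matter more for serving than the raw FLOPS jump.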

Recommended fit
Scenario 02

Pick B300 instead if

Your timeline is 2025 or early 2026. B300 ships now with 288 GB HBM3e and 15 PFLOPS FP4. For most 200B–400B workloads, B300 is the practical choice today. R100 is worth waiting for if you have flexible timelines and need the bandwidth or compute ceiling.

Recommended fit
Scenario 03

Pick B200 instead if

Your model fits in 192 GB and you want the most widely available Blackwell option at lower cost. B200 handles most 70B–200B workloads, has better spot pricing, and is available on Spheron today.

Recommended fit
Scenario 04

Pick R100 for NVL72 if

You're running trillion-parameter workloads that need rack-scale memory. 72 R100 GPUs in an NVL72 configuration share 20.7 TB of unified HBM4. No other architecture today keeps a 10T+ parameter model inside a single NVLink domain without multi-node sharding.

Recommended fit

Available now on Spheron

R100 ships H2 2026. For workloads that need to run now, Blackwell and Hopper GPUs are available on Spheron with per-minute billing and no commitments.

Related resources

FAQ / 05

R100 Pre-Order FAQ

Also consider