Spheron GPU Catalog

Rent NVIDIA B300 GPUs on Demand from $3.50/hr

288GB HBM3e Blackwell Ultra with 15 PFLOPS dense FP4, built for trillion-parameter training.

At a glance

You can rent an NVIDIA B300 Blackwell Ultra GPU on Spheron starting at $3.50 per GPU per hour on dedicated instances (99.99% SLA, non-interruptible), with spot pricing cheaper still. Per-minute billing, no long-term contracts, and B300 instances deploy as part of GB300 NVL72 rack systems or HGX B300 8-way nodes. Each GPU ships with 288GB HBM3e (50% more than B200), NVLink 5 @ 1.8 TB/s, 5th gen Tensor Cores with an enhanced FP4 Transformer Engine, and dramatically higher throughput than B200 across every precision format. Built for 200B+ parameter training, ultra-long-context inference (1M+ tokens), MoE models at trillion-parameter scale, and multi-modal foundation models. B300 is the pick when B200's 192GB isn't enough.

GPU Architecture: NVIDIA Blackwell Ultra
VRAM: 288 GB HBM3e
Memory Bandwidth: 8.0 TB/s

Technical specifications

GPU Architecture: NVIDIA Blackwell Ultra
VRAM: 288 GB HBM3e
Memory Bandwidth: 8.0 TB/s
Tensor Cores: 5th Generation (Ultra)
CUDA Cores: 20,480+
FP64 Performance: 60 TFLOPS
FP32 Performance: 120 TFLOPS
TF32 Performance: 3,000 TFLOPS
FP8 Tensor (dense): 7,500 TFLOPS
FP4 Tensor (dense): 15,000 TFLOPS
System RAM: 184 GB DDR5
vCPUs: 32
Storage: 250 GB NVMe Gen5
Network: NVLink 1.8 TB/s
TDP: 1,200W

Pricing comparison

Provider pricing per hour (savings vs Spheron):

Spheron (your price): $3.50/hr
Nebius: $6.10/hr (1.7x more expensive)
CoreWeave: Contact sales
AWS (p6-b300): $17.80/hr (5.1x more expensive)
Custom & Reserved

Need More B300 Than What's Listed?

Reserved Capacity

Commit to a duration, lock in availability and better rates

Custom Clusters

8 to 512+ GPUs, specific hardware, InfiniBand configs on request

Supplier Matchmaking

Spheron sources from its certified data center network, negotiates pricing, handles setup

Need more B300 capacity? Tell us your requirements and we'll source it from our certified data center network.

Typical turnaround: 24–48 hours

When to pick the B300

Scenario 01

Pick B300 if

You're training or serving 200B+ parameter models and B200's 192GB HBM3e isn't enough. 288GB lets you fit larger dense models on a single GPU, keep longer context windows (1M+ tokens), or reduce tensor-parallel splits on fixed model sizes. Also the pick for GB300 NVL72 rack-scale deployments where all 72 GPUs address unified memory.

Recommended fit
Scenario 02

Pick B200 instead if

Your model fits comfortably in 192GB and you want the cheapest Blackwell rate. B200 is widely available, cheaper per hour, and matches B300 on FP4 Transformer Engine capability. Best for most 70B-200B workloads.

Recommended fit
Scenario 03

Pick H200 instead if

You don't need Blackwell FP4 and want proven Hopper with 141GB HBM3e. H200 is significantly cheaper per hour and has been production-hardened for over a year, a safer pick when Blackwell software tuning isn't worth the premium.

Recommended fit
Scenario 04

Pick GB300 NVL72 instead if

You need rack-scale training for trillion-parameter frontier models. GB300 NVL72 connects 72 B300 GPUs over NVLink into a unified 20+ TB memory domain — the only architecture that handles models too large for any single 8-way node.

Recommended fit

Ideal use cases

Use case / 01
🌐

Frontier Model Training

Train the most advanced frontier AI models at scale with 288GB memory per GPU and class-leading memory bandwidth. Handle the largest MoE and dense transformer architectures without memory constraints.

Frontier-scale MoE models with 10T+ parameters
Multi-modal foundation models (text, image, video, audio, 3D)
Scientific AI for drug discovery and protein folding
Sparse-attention and long-context transformers (1M+ tokens)
Use case / 02
💬

Ultra-High-Throughput LLM

Serve the world's largest language models at production scale with massive memory capacity and superior compute density, minimizing cost per token across all precision formats.

Real-time inference for 200B+ parameter LLMs
Ultra-long context RAG pipelines (1M+ token windows)
Multi-turn agentic AI with reasoning and tool use
Speculative decoding pipelines at scale
Use case / 03

Generative AI & Creative Workloads

Power next-generation generative AI with massive VRAM headroom for high-resolution video, 3D, and complex multi-modal generation pipelines all within a single GPU.

Cinematic 4K/8K video generation at real-time speeds
High-fidelity 3D world and asset generation
Full-context multi-modal document understanding
Enterprise-grade code generation and agentic programming
Use case / 04
🔬

AI Research & Architecture Exploration

Give researchers the memory and compute needed to explore novel architectures, scaling laws, and experimental approaches without hardware bottlenecks.

Novel neural architecture search at scale
Multi-agent and emergent-behavior RL research
In-context learning (ICL) at 1M+ token lengths
Brain-scale and physics simulation workloads

Performance benchmarks

LLM Pre-training (100B): 3.3x faster vs H100 SXM5
LLM Inference Throughput: 24,000 tokens/s (Llama-3 70B, FP8)
MoE Training Efficiency: 4.1x faster vs H100 SXM5
Multi-Modal Training: 3.5x faster vs H100 SXM5
Stable Diffusion XL: 5.2x faster (1024×1024 generation)
Memory Capacity: 3.6x larger vs H100 80GB

Train a 400B+ MoE model on 8x B300 HGX

288GB per GPU on an 8-way HGX B300 node gives you 2.3TB of HBM3e across NVLink, enough to train a 400B+ MoE or pre-train a large dense model with aggressive batch sizes.

bash
# SSH into your HGX B300 node
ssh ubuntu@<instance-ip>

# NVIDIA NeMo Framework ships Blackwell-optimized containers
docker run --gpus all --rm -it \
  nvcr.io/nvidia/nemo:25.04 bash

# Inside the container, launch FP8 pre-training with tensor and pipeline parallelism
torchrun --nproc_per_node=8 \
  examples/nlp/language_modeling/megatron_gpt_pretraining.py \
  model.mcore_gpt=True \
  model.transformer_engine=True \
  model.fp8=hybrid \
  model.tensor_model_parallel_size=4 \
  model.pipeline_model_parallel_size=2 \
  trainer.devices=8

For FP4 pre-training, pass model.fp4=True (requires Transformer Engine 2.0+ and Blackwell kernels). FP4 roughly doubles effective throughput vs FP8 on compatible layers.
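The "roughly doubles" figure follows directly from the dense peak rates in the spec table above; here is a trivial sanity check (peak numbers only — realized speedup depends on which layers can actually run in FP4):

```shell
# Peak dense Tensor Core throughput from the spec table (TFLOPS)
fp8_tflops=7500
fp4_tflops=15000

# Integer ratio of peak rates
speedup=$(( fp4_tflops / fp8_tflops ))
echo "FP4 peak is ${speedup}x FP8 peak"
```

In practice, layers that stay in FP8 or BF16 (embeddings, norms, some attention paths) pull the end-to-end gain below this 2x ceiling.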

Interconnect fabric

NVLink Ultra Configuration

B300 GPUs are built on NVLink Ultra technology, delivering 1.8 TB/s bidirectional bandwidth per GPU. Combined with 288GB of HBM3e memory per card, B300 clusters enable near-linear scaling for the most data-intensive distributed training workloads, including trillion-parameter models with long-context requirements.

01. NVLink 5.0 Ultra with 1.8 TB/s per-GPU bandwidth
02. 14x bandwidth improvement over PCIe Gen5
03. Full NVSwitch connectivity across 8-GPU systems
04. Unified memory addressing across all GPUs in a node
05. Direct GPU-to-GPU communication bypassing the CPU
06. NVIDIA SHARP support for in-network computing
07. Optimized for DeepSpeed ZeRO-3, FSDP, and Megatron
08. Sub-100ns GPU-to-GPU latency within a node
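To put 1.8 TB/s in perspective, here is a rough estimate of the time to move a full 288GB of HBM contents over each fabric, assuming peak rates throughout and ~128 GB/s bidirectional for PCIe Gen5 x16 (an assumed typical figure, not from this page):

```shell
# Time to transfer 288 GB at peak bandwidth (seconds)
# NVLink 5: 1.8 TB/s = 1800 GB/s bidirectional per GPU
# PCIe Gen5 x16: ~128 GB/s bidirectional (assumption)
nvlink_s=$(awk 'BEGIN { printf "%.2f", 288 / 1800 }')
pcie_s=$(awk 'BEGIN { printf "%.2f", 288 / 128 }')
echo "NVLink: ${nvlink_s}s   PCIe Gen5: ${pcie_s}s"
```

Real collectives never hit peak, but the order-of-magnitude gap is why gradient all-reduce over NVLink scales near-linearly while PCIe becomes the bottleneck.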
Scale

Need a custom multi-node cluster or reserved capacity?

B300 vs alternatives

Related resources

Frequently asked questions

What is the NVIDIA B300 and how does it differ from the B200?

The B300 is NVIDIA's Blackwell Ultra generation GPU, the successor to the B200. Key improvements include 288GB of HBM3e memory (50% more than the B200's 192GB), 8 TB/s of memory bandwidth, higher dense FP4 Tensor Core throughput, and a higher TDP for sustained peak performance. It is purpose-built for frontier-scale AI training and ultra-large-scale inference.

Is the B300 available now on Spheron?

B300 is in early rollout across the industry. CoreWeave was first to GA on GB300 NVL72 in August 2025, with Nebius, AWS (p6-b300), Azure (ND GB300 v6), and Google Cloud (A4X Max) following. Spheron is onboarding B300 capacity with data center partners, with priority given to sustained training commitments. Contact our team to reserve capacity.

Book a call with our team

When does 288GB of VRAM matter vs a B200?

288GB per GPU matters when fitting the full model or optimizer state in GPU memory is a constraint at B200's 192GB. Prime examples: trillion-parameter dense transformer training without model parallelism, inference serving of 200B+ parameter models on a single GPU, very long context windows (500K–1M tokens), and large-scale reinforcement learning with huge replay buffers.
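As a back-of-envelope illustration (a sketch, not a sizing tool): FP8 weights cost roughly 1 byte per parameter, so a hypothetical 235B-parameter dense model needs ~235GB for weights alone — over a B200's 192GB but inside a B300's 288GB. This ignores activations, KV cache, and optimizer state, which push real requirements higher:

```shell
# Back-of-envelope: do FP8 weights (~1 byte/param) for an N-billion-parameter
# model fit in a GPU's HBM? Overhead (KV cache, activations) is ignored.
fits_in() {  # usage: fits_in <params_in_billions> <vram_gb>
  [ "$1" -le "$2" ] && echo "fits" || echo "needs sharding"
}
echo "235B on B200 (192GB): $(fits_in 235 192)"
echo "235B on B300 (288GB): $(fits_in 235 288)"
```

The same check at FP4 (~0.5 bytes/param) halves the weight footprint, which is why FP4 quantization and 288GB together unlock single-GPU serving of models that previously required tensor parallelism.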

Can I use B300 for inference-only workloads?

Yes. For inference, B300 excels at models that don't fit on B200 (200B+ parameters) and high-throughput serving where memory bandwidth is the bottleneck. For models under 100B parameters, B200 or H100 may offer better cost efficiency. The B300's FP4 support (15,000 TFLOPS dense) is exceptional for quantized inference of very large models.

What frameworks are supported on B300?

All major frameworks are supported: PyTorch 2.3+, TensorFlow 2.16+, JAX 0.4.25+. NVIDIA provides Blackwell Ultra-optimized containers with CUDA 12.8+, cuDNN 9.1+, and TensorRT 10.1+. Framework-level support for FP4 precision, enhanced Transformer Engine, and improved NCCL collective operations is available out-of-the-box.

How does B300 compare to renting multiple H100s?

A single B300 delivers approximately 3.3x H100 training throughput and 3.6x the memory. For workloads that fit on B200/H100, multiple H100s may be more cost-effective. But for workloads requiring >192GB VRAM or extreme bandwidth (8 TB/s), B300 eliminates inter-node communication overhead and simplifies deployment significantly.

What is the cost to buy a B300 vs renting on Spheron?

B300 GPUs list in the $40,000-$50,000 range per card, and an 8-way HGX B300 node with networking, cooling, and chassis runs $400K-$600K fully provisioned. At Spheron's on-demand rate, you'd need well over a year of 24/7 utilization to break even on hardware acquisition alone, before counting power, rack space, or depreciation. For all but the largest continuous training commitments, on-demand rental wins on total cost of ownership.
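The break-even arithmetic can be reproduced with shell math, assuming a $45,000 card price (midpoint of the quoted range) and Spheron's $3.50/hr dedicated rate — and note this still ignores power, rack space, networking, and depreciation:

```shell
# Hardware break-even vs on-demand rental (hardware cost only)
card_usd=45000       # assumed midpoint of the $40K-$50K list range
rate_cents_hr=350    # $3.50/hr in cents, to keep integer math exact
hours=$(( card_usd * 100 / rate_cents_hr ))
days=$(( hours / 24 ))
echo "Break-even: ${hours} hours (~${days} days of 24/7 utilization)"
```

At ~535 days of continuous use before the card alone pays for itself, rental wins for anything short of a multi-year, always-on training commitment.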

Do you offer reserved or dedicated B300 capacity?

Yes. For enterprise customers and research labs requiring sustained access, we offer reserved B300 capacity and dedicated clusters (8–256 GPUs) with custom networking and volume pricing. Contact our enterprise team for more details.

Book a call with our team

What makes Spheron's B300 offering different from public clouds?

Spheron provides bare-metal B300 access from Tier 3/4 data centers, meaning no hypervisor overhead, direct NVLink configuration, and significantly lower pricing (often 2–6x cheaper than AWS/Azure/GCP). Deployment is faster, billing is per-minute, and there are no long-term contracts. You get the full GPU, not a virtualized slice.

What's the difference between dedicated and spot B300 instances?

Dedicated B300 instances are non-interruptible, run on a 99.99% SLA, and bill per-minute at the on-demand rate. Spot instances run on spare capacity at meaningfully lower rates but can be preempted when dedicated demand rises. Given B300's role in critical frontier training runs, dedicated is the default pick. Spot makes sense for fault-tolerant workloads: batch inference, hyperparameter sweeps, or ablation studies with frequent checkpointing (every 15-30 minutes). For a 70B+ pre-training run where a preemption would cost days of wall time, dedicated is almost always worth the premium.

Also consider