Rent NVIDIA B300 GPUs on Demand from $3.50/hr
288GB HBM3e Blackwell Ultra with 15 PFLOPS dense FP4, built for trillion-parameter training.
You can rent an NVIDIA B300 Blackwell Ultra GPU on Spheron starting at $3.50 per GPU per hour on dedicated instances (99.99% SLA, non-interruptible), with spot pricing cheaper still. Per-minute billing, no long-term contracts, and B300 instances deploy as part of GB300 NVL72 rack systems or HGX B300 8-way nodes. Each GPU ships with 288GB HBM3e (50% more than B200), NVLink 5 @ 1.8 TB/s, 5th gen Tensor Cores with an enhanced FP4 Transformer Engine, and dramatically higher throughput than B200 across every precision format. Built for 200B+ parameter training, ultra-long-context inference (1M+ tokens), MoE models at trillion-parameter scale, and multi-modal foundation models. B300 is the pick when B200's 192GB isn't enough.
Technical specifications

| Specification | NVIDIA B300 (Blackwell Ultra) |
|---|---|
| GPU memory | 288GB HBM3e |
| Memory bandwidth | 8 TB/s |
| Interconnect | NVLink 5, 1.8 TB/s bidirectional per GPU |
| Tensor Cores | 5th gen, enhanced FP4 Transformer Engine |
| Dense FP4 compute | 15 PFLOPS |
| Form factors | HGX B300 (8-way node), GB300 NVL72 (rack-scale) |
Pricing comparison
| Provider | Price/hr | vs. Spheron |
|---|---|---|
| Spheron (your price) | $3.50/hr | - |
| Nebius | $6.10/hr | 1.7x more expensive |
| CoreWeave | Contact sales | - |
| AWS (p6-b300) | $17.80/hr | 5.1x more expensive |
Need More B300 Than What's Listed?
Reserved Capacity
Commit to a duration, lock in availability and better rates
Custom Clusters
8 to 512+ GPUs, specific hardware, InfiniBand configs on request
Supplier Matchmaking
Spheron sources from its certified data center network, negotiates pricing, handles setup
Need more B300 capacity? Tell us your requirements and we'll source it from our certified data center network.
Typical turnaround: 24–48 hours
When to pick the B300
Pick B300 if
You're training or serving 200B+ parameter models and B200's 192GB HBM3e isn't enough. 288GB lets you fit larger dense models on a single GPU, keep longer context windows (1M+ tokens), or reduce tensor-parallel splits on fixed model sizes (see the sizing sketch after this list). Also the pick for GB300 NVL72 rack-scale deployments where all 72 GPUs address unified memory.
Pick B200 instead if
Your model fits comfortably in 192GB and you want the cheapest Blackwell rate. B200 is widely available, cheaper per hour, and matches B300 on FP4 Transformer Engine capability. Best for most 70B-200B workloads.
Pick H200 instead if
You don't need Blackwell FP4 and want proven Hopper with 141GB HBM3e. H200 is significantly cheaper per hour and has been production-hardened for over a year, a safer pick when Blackwell software tuning isn't worth the premium.
Pick GB300 NVL72 instead if
You need rack-scale training for trillion-parameter frontier models. GB300 NVL72 connects 72 B300 GPUs over NVLink into a unified 20+ TB memory domain — the only architecture that handles models too large for any single 8-way node.
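A quick way to reason about the 192GB vs 288GB decision is weights-only arithmetic. The sketch below is a back-of-envelope check, not a capacity planner: it assumes FP8 weights (1 byte per parameter) and ignores KV cache, activations, and optimizer state, all of which add real overhead.

```bash
#!/usr/bin/env bash
# Weights-only GPU count estimate. Assumes FP8 weights (1 byte/param);
# ignores KV cache, activations, and optimizer state, so treat the
# result as a floor, not a capacity plan.
PARAMS_B=${1:-400}   # model size in billions of parameters
WEIGHTS_GB=$PARAMS_B # 1 byte/param => N billion params ~ N GB

for MEM_GB in 192 288; do   # B200 vs B300 per-GPU HBM3e
  GPUS=$(( (WEIGHTS_GB + MEM_GB - 1) / MEM_GB ))  # ceiling division
  echo "${MEM_GB}GB GPU: ${WEIGHTS_GB}GB of weights needs at least ${GPUS} GPU(s)"
done
```

At 400B parameters that's three B200s versus two B300s for the weights alone, and the gap widens once KV cache and activations enter the budget.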
Ideal use cases
Frontier Model Training
Train the most advanced frontier AI models at scale with 288GB memory per GPU and class-leading memory bandwidth. Handle the largest MoE and dense transformer architectures without memory constraints.
Ultra-High-Throughput LLM Serving
Serve the world's largest language models at production scale with massive memory capacity and superior compute density, minimizing cost per token across all precision formats.
Generative AI & Creative Workloads
Power next-generation generative AI with massive VRAM headroom for high-resolution video, 3D, and complex multi-modal generation pipelines, all within a single GPU.
AI Research & Architecture Exploration
Give researchers the memory and compute needed to explore novel architectures, scaling laws, and experimental approaches without hardware bottlenecks.
Performance benchmarks
Train a 400B+ MoE model on 8x B300 HGX
288GB per GPU on an 8-way HGX B300 node gives you 2.3TB of HBM3e across NVLink, enough to train a 400B+ MoE or pre-train a large dense model with aggressive batch sizes.
```bash
# SSH into your HGX B300 node
ssh ubuntu@<instance-ip>

# NVIDIA NeMo Framework ships Blackwell-optimized containers
docker run --gpus all --rm -it \
  nvcr.io/nvidia/nemo:25.04 bash

# Inside the container, launch FP8 pre-training with tensor + pipeline parallelism
torchrun --nproc_per_node=8 \
  examples/nlp/language_modeling/megatron_gpt_pretraining.py \
  model.mcore_gpt=True \
  model.transformer_engine=True \
  model.fp8=hybrid \
  model.tensor_model_parallel_size=4 \
  model.pipeline_model_parallel_size=2 \
  trainer.devices=8
```

For FP4 pre-training, pass `model.fp4=True` (requires Transformer Engine 2.0+ and Blackwell kernels). FP4 roughly doubles effective throughput vs FP8 on compatible layers.
NVLink Ultra Configuration
B300 GPUs are built on NVLink Ultra technology, delivering 1.8 TB/s bidirectional bandwidth per GPU. Combined with 288GB of HBM3e memory per card, B300 clusters enable near-linear scaling for the most data-intensive distributed training workloads, including trillion-parameter models with long-context requirements.
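Before committing a long run to a node, it's worth verifying that NVLink delivers. NVIDIA's nccl-tests suite is the standard check; a minimal sketch, assuming an NGC container with CUDA and NCCL present (nccl-tests itself usually isn't preinstalled, so build it from source):

```bash
# Build the NCCL benchmark suite (CUDA and NCCL ship in NGC containers)
git clone https://github.com/NVIDIA/nccl-tests.git
cd nccl-tests && make -j

# Sweep all-reduce message sizes from 8B to 8GB, doubling each step, on 8 GPUs.
# Bus bandwidth ("busbw") should approach the NVLink ceiling at large sizes.
./build/all_reduce_perf -b 8 -e 8G -f 2 -g 8
```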
Need a custom multi-node cluster or reserved capacity? Talk to us about topology, regions, and committed pricing.
B300 vs alternatives
CDNA 5 vs Blackwell Ultra architecture, LLM inference projections, ROCm vs CUDA maturity, and GPU cloud pricing for teams weighing AMD's MI400 series as an alternative to B300.
Where B300 fits in NVIDIA's generational stack, how Blackwell Ultra compares to Hopper, and what changes with Rubin on the horizon. Useful context before committing to multi-year infrastructure.
Related resources
NVIDIA B300 (Blackwell Ultra): Complete Guide to Specs and Pricing
Everything you need to know about B300 specs, pricing, architecture, and when the upgrade from B200 is worth it.
GPU Requirements Cheat Sheet 2026
Find the right GPU for every major open-source AI model, including B300-class workload recommendations.
GPU Cloud Benchmarks 2026
Real performance and pricing data across every major GPU cloud provider, including next-gen Blackwell GPUs.
Frequently asked questions
What is the NVIDIA B300 and how does it differ from the B200?
The B300 is NVIDIA's Blackwell Ultra generation GPU, the successor to the B200. Key improvements include: 288GB HBM3e memory (50% more than B200's 192GB), 8 TB/s memory bandwidth (25% faster), enhanced Tensor Core throughput (~33% uplift across precision formats), and higher TDP for sustained peak performance. It is purpose-built for frontier-scale AI training and ultra-large-scale inference.
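Once you have a node, a one-line check confirms you landed on B300-class hardware (the exact name string reported can vary by driver version):

```bash
# Query GPU model and total memory; expect roughly 288GB HBM3e per B300
nvidia-smi --query-gpu=name,memory.total --format=csv
```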
Is the B300 available now on Spheron?
B300 is in early rollout across the industry. CoreWeave was first to GA on GB300 NVL72 in August 2025, with Nebius, AWS (p6-b300), Azure (ND GB300 v6), and Google Cloud (A4X Max) following. Spheron is onboarding B300 capacity with data center partners; priority goes to sustained training commitments. Contact our team to reserve capacity.
Book a call with our team →
When does 288GB of VRAM matter vs a B200?
288GB per GPU matters when fitting the full model or optimizer state in GPU memory is a constraint at B200's 192GB. Prime examples: trillion-parameter dense transformer training without model parallelism, inference serving of 200B+ parameter models on a single GPU, very long context windows (500K–1M tokens), and large-scale reinforcement learning with huge replay buffers.
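For the long-context case specifically, KV cache is usually what breaks the memory budget. A rough sketch of the arithmetic; the layer, head, and dimension values below are illustrative placeholders, not any specific model:

```bash
# Rough KV-cache size: 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes.
# Model dimensions here are illustrative placeholders.
LAYERS=96; KV_HEADS=8; HEAD_DIM=128; SEQ_LEN=1000000; BYTES=2  # BF16 cache
awk -v l=$LAYERS -v h=$KV_HEADS -v d=$HEAD_DIM -v s=$SEQ_LEN -v b=$BYTES \
  'BEGIN { printf "KV cache per sequence: %.0f GB\n", 2*l*h*d*s*b / 1e9 }'
```

Even with grouped-query attention (8 KV heads here), a single 1M-token sequence runs to roughly 400GB of cache: more than any single GPU holds, but comfortable across an 8-way B300 node's 2.3TB.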
Can I use B300 for inference-only workloads?
Yes. For inference, B300 excels at models that don't fit on B200 (200B+ parameters) and high-throughput serving where memory bandwidth is the bottleneck. For models under 100B parameters, B200 or H100 may offer better cost efficiency. The B300's FP4 support (15 PFLOPS dense) is exceptional for quantized inference of very large models.
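As a concrete starting point for inference-only deployments, here's a minimal serving sketch using vLLM's OpenAI-compatible server. Assumptions: the vllm/vllm-openai image supports your Blackwell driver stack, and `<your-model>` is a placeholder for whichever 200B+ checkpoint you're serving:

```bash
# Minimal vLLM serving sketch on an 8-way B300 node.
# <your-model> is a placeholder; --tensor-parallel-size 8 shards it across GPUs.
docker run --gpus all --rm --ipc=host -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model <your-model> \
  --tensor-parallel-size 8 \
  --max-model-len 131072
```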
What frameworks are supported on B300?
All major frameworks are supported: PyTorch 2.3+, TensorFlow 2.16+, JAX 0.4.25+. NVIDIA provides Blackwell Ultra-optimized containers with CUDA 12.5+, cuDNN 9.1+, and TensorRT 10.1+. Framework-level support for FP4 precision, enhanced Transformer Engine, and improved NCCL collective operations is available out-of-the-box.
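Inside any of those containers, a quick sanity check confirms PyTorch sees the GPUs and reports the CUDA build it was compiled against:

```bash
# Verify PyTorch detects the B300s and print its CUDA version and GPU count
python -c "import torch; print(torch.cuda.get_device_name(0), torch.version.cuda, torch.cuda.device_count())"
```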
How does B300 compare to renting multiple H100s?
A single B300 delivers approximately 3.3x H100 training throughput and 3.6x the memory. For workloads that fit on B200/H100, multiple H100s may be more cost-effective. But for workloads requiring >192GB VRAM or extreme bandwidth (8 TB/s), B300 eliminates inter-node communication overhead and simplifies deployment significantly.
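To compare on throughput rather than sticker price, normalize the hourly rate by the speedup. Using only the figures above ($3.50/hr and ~3.3x H100 training throughput), a rough sketch:

```bash
# Effective cost per "H100-equivalent hour" of training throughput.
# 3.3x is the throughput figure quoted above; treat the output as approximate.
awk 'BEGIN { printf "B300 at $3.50/hr / 3.3x H100 throughput = $%.2f per H100-equivalent hour\n", 3.50/3.3 }'
```

By that measure, H100 rentals below roughly $1.06/hr win on raw cost; above it, B300 wins even before counting the memory and interconnect advantages.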
What is the cost to buy a B300 vs renting on Spheron?
B300 GPUs list in the $40,000-$50,000 range per card, and an 8-way HGX B300 node with networking, cooling, and chassis runs $400K-$600K fully provisioned. At Spheron's on-demand rate, you'd need well over a year of 24/7 utilization to break even on hardware acquisition alone, before counting power, rack space, or depreciation. For all but the largest continuous training commitments, on-demand rental wins on total cost of ownership.
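The break-even arithmetic is straightforward. Taking the midpoint card price above ($45,000) against the $3.50/hr on-demand rate, and ignoring power, rack space, and depreciation:

```bash
# Hours of 24/7 rental equal to buying one card outright (hardware cost only)
awk 'BEGIN {
  hours = 45000 / 3.50
  printf "Break-even: %.0f hours, about %.1f years of 24/7 utilization\n", hours, hours / 8760
}'
```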
Do you offer reserved or dedicated B300 capacity?
Yes. For enterprise customers and research labs requiring sustained access, we offer reserved B300 capacity and dedicated clusters (8–256 GPUs) with custom networking and volume pricing. Contact our enterprise team for more details.
Book a call with our team →
What makes Spheron's B300 offering different from public clouds?
Spheron provides bare-metal B300 access from Tier 3/4 data centers, meaning no hypervisor overhead, direct NVLink configuration, and significantly lower pricing (often 2–6x cheaper than AWS/Azure/GCP). Deployment is faster, billing is per-minute, and there are no long-term contracts. You get the full GPU, not a virtualized slice.
What's the difference between dedicated and spot B300 instances?
Dedicated B300 instances are non-interruptible, run on a 99.99% SLA, and bill per-minute at the on-demand rate. Spot instances run on spare capacity at meaningfully lower rates but can be preempted when dedicated demand rises. Given B300's role in critical frontier training runs, dedicated is the default pick. Spot makes sense for fault-tolerant workloads: batch inference, hyperparameter sweeps, or ablation studies with frequent checkpointing (every 15-30 minutes). For a 70B+ pre-training run where a preemption would cost days of wall time, dedicated is almost always worth the premium.
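If you do run training on spot, the standard pattern is a relaunch loop around a script that resumes from its latest checkpoint. A minimal sketch; `train.py` and its `--resume-from-latest` flag are placeholders for whatever your framework provides (NeMo and most trainers can auto-resume from the newest checkpoint):

```bash
# Spot-resilience wrapper: relaunch after preemption, resuming from the
# latest checkpoint. train.py and --resume-from-latest are placeholders.
until torchrun --nproc_per_node=8 train.py --resume-from-latest; do
  echo "$(date): run interrupted; restarting from last checkpoint" >&2
  sleep 30
done
```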