NVIDIA B300 GPU: 288GB Blackwell Ultra Specs, Pricing & Rental. Rent B300 GPU from $3.35/hr
288GB HBM3e Blackwell Ultra with 15 PFLOPS dense FP4. B300 GPU rentals built for trillion-parameter training.
You can rent an NVIDIA B300 Blackwell Ultra GPU on Spheron starting at $3.35 per GPU per hour, the lowest live marketplace rate. Billing is per-minute with no long-term contracts, and B300 instances deploy as part of GB300 NVL72 rack systems or HGX B300 8-way nodes. Each GPU ships with 288GB HBM3e (50% more than B200), NVLink 5 @ 1.8 TB/s, 5th gen Tensor Cores with an enhanced FP4 Transformer Engine, and dramatically higher throughput than B200 across every precision format. Built for 200B+ parameter training, ultra-long-context inference (1M+ tokens), MoE models at trillion-parameter scale, and multi-modal foundation models. B300 is the pick when B200's 192GB isn't enough.
NVIDIA B300 specifications
NVIDIA B300 pricing
| Provider | Price/hr | vs. Spheron |
|---|---|---|
| Spheron (your price) | $3.35/hr | - |
| Nebius | $6.10/hr | 1.8x the price |
| CoreWeave | Contact sales | - |
| AWS (p6-b300) | $17.80/hr | 5.3x the price |
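The multiples in the pricing table can be reproduced from the listed hourly rates. The snippet below is a minimal sketch that assumes only the three published prices above; the `price_multiple` helper is a hypothetical name introduced for illustration, and CoreWeave is omitted because no public rate is listed.

```python
# Hourly B300 rates taken from the pricing table above (USD per GPU-hour).
rates = {"Spheron": 3.35, "Nebius": 6.10, "AWS (p6-b300)": 17.80}


def price_multiple(rate: float, baseline: float) -> float:
    """Return rate / baseline, rounded to one decimal place."""
    return round(rate / baseline, 1)


# Express every provider's rate as a multiple of the Spheron rate.
multiples = {name: price_multiple(r, rates["Spheron"]) for name, r in rates.items()}

for name, m in multiples.items():
    print(f"{name}: {m}x the Spheron rate")
```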
Need More B300 Than What's Listed?
Reserved Capacity
Commit to a duration, lock in availability and better rates
Custom Clusters
8 to 512+ GPUs, specific hardware, InfiniBand configs on request
Supplier Matchmaking
Spheron sources from its certified data center network, negotiates pricing, handles setup
Need more B300 capacity? Tell us your requirements and we'll source it from our certified data center network.
Typical turnaround: 24–48 hours
When to pick the B300
Pick B300 if
You're training or serving 200B+ parameter models and B200's 192GB HBM3e isn't enough. 288GB lets you fit larger dense models on a single GPU, keep longer context windows (1M+ tokens), or reduce tensor-parallel splits on fixed model sizes. Also the pick for GB300 NVL72 rack-scale deployments where all 72 GPUs address unified memory.
Pick B200 instead if
Your model fits comfortably in 192GB and you want the cheapest Blackwell rate. B200 is widely available, cheaper per hour, and matches B300 on FP4 Transformer Engine capability. Best for most 70B-200B workloads.
Pick H200 instead if
You don't need Blackwell FP4 and want proven Hopper with 141GB HBM3e. H200 is significantly cheaper per hour and has been production-hardened for over a year, a safer pick when Blackwell software tuning isn't worth the premium.
Pick GB300 NVL72 instead if
You need rack-scale training for trillion-parameter frontier models. GB300 NVL72 connects 72 B300 GPUs over NVLink into a unified 20+ TB memory domain, the only architecture that handles models too large for any single 8-way node.
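The pooled-memory figures behind these recommendations follow directly from the 288GB per-GPU capacity. As a rough sketch (assuming decimal units, 1 TB = 1000 GB, and counting HBM only; `pooled_hbm_tb` is a hypothetical helper name):

```python
HBM_PER_GPU_GB = 288  # per-GPU HBM3e capacity stated above


def pooled_hbm_tb(num_gpus: int, per_gpu_gb: int = HBM_PER_GPU_GB) -> float:
    """Total HBM across num_gpus GPUs, in decimal terabytes (1 TB = 1000 GB)."""
    return num_gpus * per_gpu_gb / 1000


hgx_node_tb = pooled_hbm_tb(8)   # one HGX B300 8-way node
nvl72_tb = pooled_hbm_tb(72)     # one GB300 NVL72 rack

print(f"HGX B300 (8 GPUs): {hgx_node_tb:.1f} TB")
print(f"GB300 NVL72 (72 GPUs): {nvl72_tb:.1f} TB")
```

This is where the roughly 2.3TB per 8-way node and the 20+ TB per NVL72 rack come from.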
NVIDIA B300 use cases
Frontier Model Training
Train the most advanced frontier AI models at scale with 288GB memory per GPU and class-leading memory bandwidth. Handle the largest MoE and dense transformer architectures without memory constraints.
Ultra-High-Throughput LLM Inference
Serve the world's largest language models at production scale with massive memory capacity and superior compute density, minimizing cost per token across all precision formats.
Generative AI & Creative Workloads
Power next-generation generative AI with massive VRAM headroom for high-resolution video, 3D, and complex multi-modal generation pipelines, all within a single GPU.
AI Research & Architecture Exploration
Give researchers the memory and compute needed to explore novel architectures, scaling laws, and experimental approaches without hardware bottlenecks.
NVIDIA B300 benchmarks
Train a 400B+ MoE model on 8x B300 HGX
288GB per GPU on an 8-way HGX B300 node gives you 2.3TB of HBM3e across NVLink, enough to train a 400B+ MoE or pre-train a large dense model with aggressive batch sizes.
```shell
# SSH into your HGX B300 node
ssh ubuntu@<instance-ip>

# NVIDIA NeMo Framework ships Blackwell-optimized containers
docker run --gpus all --rm -it \
  nvcr.io/nvidia/nemo:25.04 bash

# Inside the container, launch FP8 pre-training
# (tensor parallel 4 x pipeline parallel 2 across the 8 GPUs)
torchrun --nproc_per_node=8 \
  examples/nlp/language_modeling/megatron_gpt_pretraining.py \
  model.mcore_gpt=True \
  model.transformer_engine=True \
  model.fp8=hybrid \
  model.tensor_model_parallel_size=4 \
  model.pipeline_model_parallel_size=2 \
  trainer.devices=8
```

For FP4 pre-training, pass model.fp4=True (requires Transformer Engine 2.0+ and Blackwell kernels). FP4 roughly doubles effective throughput vs FP8 on compatible layers.
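Before launching, it can help to sanity-check that the parallelism settings multiply out to the GPU count. The sketch below assumes the usual Megatron-style rule that the world size must be divisible by tensor-parallel size times pipeline-parallel size, with the quotient being the data-parallel size; `data_parallel_size` is a hypothetical helper, not part of NeMo.

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return the data-parallel replica count implied by a TP x PP layout.

    Raises ValueError when the GPUs cannot be split evenly into model-parallel groups.
    """
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by TP*PP={model_parallel}"
        )
    return world_size // model_parallel


# Values from the launch command above: 8 GPUs, TP=4, PP=2.
dp = data_parallel_size(world_size=8, tensor_parallel=4, pipeline_parallel=2)
print(f"data-parallel replicas: {dp}")
```

With the flags shown, all 8 GPUs form a single model replica, so scaling out data parallelism would require additional nodes or a smaller TP x PP product.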
NVLink 5 Configuration
B300 GPUs connect over fifth-generation NVLink (NVLink 5), delivering 1.8 TB/s bidirectional bandwidth per GPU. Combined with 288GB of HBM3e memory per card, this helps B300 clusters scale close to linearly on the most data-intensive distributed training workloads, including trillion-parameter models with long-context requirements.
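To get a feel for what 1.8 TB/s per GPU means for gradient synchronization, the sketch below applies the textbook ring all-reduce cost model, in which each GPU sends 2(N-1)/N times the payload. It is an idealized estimate under stated assumptions: usable bandwidth per direction is taken as half the bidirectional figure (0.9 TB/s), latency and protocol overhead are ignored, and the 10 GB payload is a hypothetical example. The collective algorithm actually used on a given system may differ.

```python
def ring_allreduce_seconds(payload_bytes: float, num_gpus: int,
                           per_direction_bytes_per_s: float) -> float:
    """Idealized ring all-reduce time: 2 * (N - 1) / N * payload / bandwidth."""
    if num_gpus < 2:
        return 0.0  # nothing to synchronize on a single GPU
    traffic_factor = 2 * (num_gpus - 1) / num_gpus
    return traffic_factor * payload_bytes / per_direction_bytes_per_s


# Assumption: 1.8 TB/s bidirectional is treated as 0.9 TB/s in each direction.
NVLINK_PER_DIRECTION = 0.9e12  # bytes per second

# Hypothetical 10 GB gradient payload across one 8-way HGX B300 node.
t = ring_allreduce_seconds(10e9, num_gpus=8,
                           per_direction_bytes_per_s=NVLINK_PER_DIRECTION)
print(f"estimated all-reduce time: {t * 1000:.1f} ms")
```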
Need a custom multi-node cluster or reserved capacity? Talk to us about topology, regions, and committed pricing.
B300 vs alternatives
CDNA 5 vs Blackwell Ultra architecture, LLM inference projections, ROCm vs CUDA maturity, and GPU cloud pricing for teams weighing AMD's MI400 series as an alternative to B300.
Where B300 fits in NVIDIA's generational stack, how Blackwell Ultra compares to Hopper, and what changes with Rubin on the horizon. Useful context before committing to multi-year infrastructure.
NVIDIA B300 guides and resources
NVIDIA B300 (Blackwell Ultra): Complete Guide to Specs and Pricing
Everything you need to know about B300 specs, pricing, architecture, and when the upgrade from B200 is worth it.
NVIDIA Vera Rubin NVL72: Rack-Scale H300 System Specs and Cloud Timing
The next-gen successor to GB200 NVL72. 72 R100 (H300) GPUs, 260 TB/s NVLink fabric. Plan your B300 to Rubin upgrade path.
GPU Requirements Cheat Sheet 2026
Find the right GPU for every major open-source AI model, including B300-class workload recommendations.
GPU Cloud Benchmarks 2026
Real performance and pricing data across every major GPU cloud provider, including next-gen Blackwell GPUs.
NVIDIA B300 Release Date and Cloud Availability
The NVIDIA B300 Blackwell Ultra GPU was announced at GTC 2025 as the memory-upgraded refresh of the B200. Production shipments began in mid-2025 inside GB300 NVL72 rack-scale systems. CoreWeave was first to general availability on GB300 NVL72 in August 2025; Nebius, AWS p6-b300, Microsoft Azure ND GB300 v6, and Google Cloud A4X Max followed through Q4 2025 and H1 2026. The B300 is still in early-availability rollout across most providers as of mid-2026.
On Spheron, B300 capacity is sourced from data center partners with priority given to sustained training commitments. Live availability and reservation pricing are on the pricing page or available via the contact form. The B300's successor is the NVIDIA Rubin R100 (288GB HBM4 at up to 22 TB/s), which is expected to start shipping in H2 2026 and reach broad cloud availability in 2027.
B300 VRAM and Memory Bandwidth: 288GB HBM3e at 8 TB/s
The B300 ships with 288GB of HBM3e memory at 8 TB/s of bandwidth, making it the highest-VRAM single GPU available outside the upcoming Rubin generation. That is 50% more VRAM than the B200 (192GB HBM3e), roughly 2x the VRAM of the H200 (141GB HBM3e), and 3.6x the VRAM of the H100 (80GB HBM3). Memory bandwidth matches the B200 at 8 TB/s, so on the memory side the B300's advantage over the B200 comes from fitting larger models or longer contexts on a single GPU, not from streaming each token's weights faster.
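The capacity ratios and the bandwidth point can both be checked with simple arithmetic. The sketch below uses only the capacities and the 8 TB/s figure quoted above; `full_sweep_ms` is a hypothetical helper that estimates how long one full read of resident memory takes at peak bandwidth, which is a rough lower bound rather than a measured latency.

```python
B300_GB = 288
BANDWIDTH_TBPS = 8.0  # same peak HBM bandwidth quoted for B300 and B200

# Capacities of the GPUs compared above, in GB.
others = {"B200": 192, "H200": 141, "H100": 80}

# How many times more VRAM the B300 has than each alternative.
ratios = {name: round(B300_GB / gb, 2) for name, gb in others.items()}


def full_sweep_ms(resident_gb: float, bandwidth_tbps: float) -> float:
    """Milliseconds to read resident_gb once at peak bandwidth (GB / (TB/s) = ms)."""
    return resident_gb / bandwidth_tbps


b300_sweep = full_sweep_ms(B300_GB, BANDWIDTH_TBPS)
b200_sweep = full_sweep_ms(others["B200"], BANDWIDTH_TBPS)

print(ratios)
print(f"full-memory sweep: B300 {b300_sweep:.0f} ms, B200 {b200_sweep:.0f} ms")
```

Because bandwidth is identical, reading the same number of bytes takes the same time on both GPUs; the B300 simply lets more bytes stay resident.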
Where the 288GB VRAM matters: 200B+ parameter dense models fit in FP8 on a single GPU without tensor parallelism overhead, trillion-parameter MoE models fit across a single 8-GPU HGX B300 node with ~2.3TB of pooled HBM3e, and 1M+ token context windows fit without KV cache eviction. The B300 includes the second-generation Transformer Engine with native FP4 support, delivering roughly 15 PFLOPS of FP4 dense compute. For workloads that fit comfortably in 192GB, the B200 at a lower hourly rate is the better economic fit; B300 is the right pick when single-GPU VRAM is the binding constraint.
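A quick way to test whether a model's weights fit is to multiply parameter count by bytes per parameter. The sketch below is a weights-only estimate under simplifying assumptions: 2, 1, and 0.5 bytes per parameter for FP16, FP8, and FP4, with no allowance for KV cache, activations, optimizer state, or quantization scale factors, all of which reduce real headroom. The helper names are hypothetical.

```python
# Assumed storage cost per parameter, in bytes (weights only).
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weight_gb(params_billions: float, precision: str) -> float:
    """Weight footprint in GB: billions of parameters x bytes per parameter."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits(params_billions: float, precision: str, capacity_gb: float) -> bool:
    """True when the weights alone fit within capacity_gb."""
    return weight_gb(params_billions, precision) <= capacity_gb


B300_GB, B200_GB = 288, 192
HGX_B300_NODE_GB = 8 * B300_GB  # 2304 GB pooled across one 8-way node

print("200B FP8 on one B300:", fits(200, "fp8", B300_GB))
print("200B FP8 on one B200:", fits(200, "fp8", B200_GB))
print("1T FP8 on one HGX B300 node:", fits(1000, "fp8", HGX_B300_NODE_GB))
```

Under these assumptions a 200B model in FP8 needs about 200GB for weights, which fits in 288GB but not in 192GB, and a trillion-parameter model in FP8 needs about 1TB, which fits within one node's pooled 2.3TB.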