NVIDIA B200 GPU: 192GB Blackwell Specs, Pricing & Rental
Rent a B200 GPU from $2.68/hr
192GB HBM3e Blackwell with FP4 Transformer Engine. B200 GPU rentals built for trillion-parameter training and 100B+ LLM inference.
You can rent an NVIDIA B200 on Spheron starting at $2.68 per GPU per hour, the lowest live marketplace rate. Billing is per minute with no contracts, and 8-GPU HGX B200 nodes come with NVLink 5.0, giving each GPU 1.8 TB/s of GPU-to-GPU bandwidth. Each B200 ships with 192GB of HBM3e, 8 TB/s of memory bandwidth, and a 2nd-gen Transformer Engine with native FP4 support. Compared with H100, that means roughly 2x faster LLM training in MLPerf results and, by NVIDIA's own figures, up to 15x faster inference (B200 at FP4; H100 has no native FP4). The B200 is designed for frontier-scale workloads: 1T+ parameter training, 100B+ parameter inference serving, and multi-modal foundation models where HBM capacity and NVLink bandwidth are the bottleneck.
NVIDIA B200 specifications
NVIDIA B200 pricing
| Provider | Price/hr | vs Spheron |
|---|---|---|
| Spheron (your price) | $2.68/hr | - |
| RunPod | $5.89/hr | 2.2x more expensive |
| Lambda Labs | $6.08/hr | 2.3x more expensive |
| Nebius | $5.50/hr | 2.1x more expensive |
| CoreWeave (SXM) | $8.60/hr | 3.2x more expensive |
| CoreWeave (NVL) | $10.50/hr | 3.9x more expensive |
| AWS (p6-b200) | est. $12.00/hr | 4.5x more expensive |
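The multiples in the last column are simply each provider's rate divided by Spheron's $2.68/hr. The minimal Python sketch below reproduces them and shows what per-minute billing means for a short run. The 8-GPU, 90-minute job is a hypothetical example, and the sketch assumes there is no minimum billing increment.

```python
# B200 rates from the pricing table, in USD per hour.
# The AWS figure is an estimate in the table itself.
RATES = {
    "Spheron": 2.68,
    "RunPod": 5.89,
    "Lambda Labs": 6.08,
    "Nebius": 5.50,
    "CoreWeave (SXM)": 8.60,
    "CoreWeave (NVL)": 10.50,
    "AWS (p6-b200)": 12.00,
}


def multiple_vs(base_rate: float, other_rate: float) -> float:
    """How many times the base rate another rate is, rounded to 0.1x."""
    return round(other_rate / base_rate, 1)


def job_cost(rate_per_gpu_hr: float, gpus: int, minutes: int) -> float:
    """Cost under per-minute billing; assumes no minimum charge (hypothetical)."""
    return round(rate_per_gpu_hr / 60 * gpus * minutes, 2)


if __name__ == "__main__":
    base = RATES["Spheron"]
    for name, rate in RATES.items():
        print(f"{name:16} ${rate:5.2f}/hr  {multiple_vs(base, rate):.1f}x")
    # Hypothetical job: one 8x B200 node for 90 minutes at the Spheron rate
    print(job_cost(base, gpus=8, minutes=90))  # 32.16
```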
Need More B200 Than What's Listed?
Reserved Capacity
Commit to a duration, lock in availability and better rates
Custom Clusters
8 to 512+ GPUs, specific hardware, InfiniBand configs on request
Supplier Matchmaking
Spheron sources from its certified data center network, negotiates pricing, handles setup
Need more B200 capacity? Tell us your requirements and we'll source it from our certified data center network.
Typical turnaround: 24–48 hours
When to pick the B200
Pick B200 if
You're training frontier models (1T+ parameters), serving 100B+ parameter LLMs in production, or running MoE architectures that need the extra HBM capacity and NVLink bandwidth. FP4 support cuts inference cost per token roughly in half vs H100 FP8. If your model already maxes out 80GB on H100, B200 is the direct step up.
Pick H100 instead if
Your model fits in 80GB and you want the best price per hour for 70B-class training or inference. H100 is mature, has broad framework support, and costs significantly less per GPU-hour. B200 is overkill for anything under ~100B parameters.
Pick H200 instead if
You need 141GB HBM3e to fit larger contexts or KV cache without the full Blackwell price bump. H200 is a drop-in upgrade from H100 and a popular middle ground for serving 70-180B parameter models.
Pick B300 or GB200 instead if
You want Blackwell Ultra (B300) with 288GB HBM3e per GPU, or the GB200 Grace-Blackwell Superchip pairing two B200s with a Grace CPU over a 900 GB/s NVLink-C2C link. Both target the largest possible training runs and enterprise-scale reasoning models.
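Cost per token is hourly price divided by sustained throughput. So a claim like "FP4 roughly halves cost per token" only holds if tokens/s roughly doubles at a comparable hourly price. The minimal sketch below shows the arithmetic. The throughput values are placeholders, not B200 measurements, so plug in numbers from your own benchmark.

```python
def cost_per_million_tokens(price_per_hr: float, tokens_per_sec: float) -> float:
    """USD per 1M generated tokens for an instance billed by the hour.

    price_per_hr: what you pay for the whole instance (e.g. 8 GPUs x rate).
    tokens_per_sec: sustained aggregate output throughput you measured.
    """
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    tokens_per_hr = tokens_per_sec * 3600
    return price_per_hr / tokens_per_hr * 1_000_000


if __name__ == "__main__":
    node_price = 8 * 2.68  # one 8x B200 node at the listed per-GPU rate
    # Placeholder throughputs only: doubling tokens/s halves the cost per token
    for tps in (5_000, 10_000):
        print(tps, round(cost_per_million_tokens(node_price, tps), 3))
```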
NVIDIA B200 use cases
Trillion-Parameter Model Training
Train the next generation of foundation models at scale, using 192GB of HBM3e per GPU and the 2nd-gen Transformer Engine.
Advanced LLM Inference
Deploy ultra-large language models for production inference with industry-leading throughput and lowest cost per token.
Generative AI at Scale
Power next-generation generative AI applications with support for advanced diffusion models and multi-modal generation.
AI Research & Innovation
Push the boundaries of AI research with frontier-class hardware built for experimental architectures and novel approaches.
NVIDIA B200 benchmarks
Serve Llama 3.1 405B on 8x B200 with vLLM + FP4
An 8-GPU HGX B200 node has about 1.5TB of combined HBM3e (8 × 192GB), enough to serve Llama 3.1 405B in FP4 with a 32K+ context window. vLLM shards the model across all eight GPUs with tensor parallelism over NVLink for low-latency inference.
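# SSH into your 8x B200 HGX node
ssh root@<instance-ip>

# Blackwell (sm_100) needs CUDA 12.8+; NGC PyTorch 25.01 and later are built on it
docker run --gpus all -it --ipc=host --ulimit memlock=-1 \
  -p 8000:8000 -v $HOME/.cache:/root/.cache \
  nvcr.io/nvidia/pytorch:25.01-py3 bash

# Inside the container. Quote the spec so the shell doesn't treat ">" as a redirect.
# Assumption: vLLM 0.9.0+ ships CUDA 12.8 (Blackwell) builds; pip pulls vLLM's own PyTorch
pip install "vllm>=0.9.0"

# vLLM has no generic "fp4" quantization flag: FP4 serving uses a pre-quantized NVFP4
# checkpoint, and vLLM reads its quantization config automatically. Hypothetical path:
MODEL=/root/.cache/llama-3.1-405b-instruct-nvfp4
vllm serve "$MODEL" \
  --served-model-name llama-3.1-405b \
  --tensor-parallel-size 8 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.95

# Test the endpoint from another shell on the node
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3.1-405b","messages":[{"role":"user","content":"Hello"}]}'

On an 8x B200 node, expect roughly 5-8x higher tokens/sec than an 8x H100 node serving the same model at FP8 (H100 has no native FP4), thanks to the 2nd-gen Transformer Engine and NVLink 5.0. The exact gain depends on batch size and context length.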
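Whether a 32K context window leaves room for real concurrency comes down to KV cache size. The rough sketch below assumes Llama 3.1 405B's published attention shape (126 layers, 8 KV heads, head dimension 128), a BF16 KV cache, and about 202.5GB of FP4 weights (405B × 0.5 bytes, ignoring scale factors). It gives an upper bound, because activations and runtime overhead are ignored.

```python
# Llama 3.1 405B attention shape from its published config.json
# (assumption: check these against the checkpoint you actually serve).
NUM_LAYERS = 126
NUM_KV_HEADS = 8   # grouped-query attention: 8 KV heads shared by 128 query heads
HEAD_DIM = 128     # hidden_size 16384 / 128 attention heads


def kv_bytes_per_token(bytes_per_elem: int = 2) -> int:
    """Bytes of K and V cached per token across all layers (2 bytes = BF16/FP16)."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_elem


def max_concurrent_seqs(context_len: int, total_gb: float, weights_gb: float,
                        bytes_per_elem: int = 2) -> int:
    """Full-length sequences whose KV cache fits in the memory left after weights.

    Ignores activations, CUDA graphs, and allocator overhead, so treat the
    result as an upper bound.
    """
    free_bytes = (total_gb - weights_gb) * 1e9
    per_seq = kv_bytes_per_token(bytes_per_elem) * context_len
    return int(free_bytes // per_seq)


if __name__ == "__main__":
    ctx = 32_768
    print(round(kv_bytes_per_token() * ctx / 1e9, 1))  # 16.9 (GB per 32K sequence)
    # 8 GPUs x 192GB x 0.95 (--gpu-memory-utilization) minus ~202.5GB of FP4 weights
    print(max_concurrent_seqs(ctx, 8 * 192 * 0.95, 202.5))  # 74
```

With an FP8 KV cache, passing `bytes_per_elem=1` halves the per-token cost and roughly doubles the bound.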
NVLink Switch Configuration
B200 GPUs use fifth-generation NVLink with NVLink Switch, providing 1.8 TB/s of bidirectional bandwidth per GPU. That keeps communication overhead low enough for near-linear scaling when training trillion-parameter models across many GPUs.
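The near-linear scaling point follows from how collective communication grows with GPU count. The minimal sketch below models the bandwidth term of a plain ring all-reduce, where each GPU sends 2(N-1)/N of the gradient buffer. Per-step communication time therefore approaches a constant 2S/B instead of growing with N. This is an assumption-level model: NCCL may pick other algorithms, and NVLink Switch systems can reduce in-switch. The 10B-parameter gradient size is hypothetical.

```python
def ring_allreduce_seconds(size_bytes: float, n_gpus: int, link_gb_s: float) -> float:
    """Bandwidth term of a ring all-reduce: each GPU sends 2(N-1)/N of the buffer.

    link_gb_s is per-direction bandwidth per GPU; NVLink 5.0's 1.8 TB/s is
    bidirectional, i.e. about 900 GB/s each way. Latency, protocol overhead,
    and in-switch (NVLink SHARP) reduction are ignored.
    """
    bytes_sent = 2 * (n_gpus - 1) / n_gpus * size_bytes
    return bytes_sent / (link_gb_s * 1e9)


if __name__ == "__main__":
    grads = 10e9 * 2  # hypothetical: 10B parameters of BF16 gradients
    for n in (2, 8):
        print(n, round(ring_allreduce_seconds(grads, n, 900), 4))
```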
Need a custom multi-node cluster or reserved capacity? Talk to us about topology, regions, and committed pricing.
B200 vs alternatives
Three-way breakdown of consumer Blackwell (RTX 5090), the H100 workhorse, and B200. Covers real benchmark data, cost per million tokens, and which one actually makes sense for your workload.
Side-by-side specs, real LLM benchmark data, cost-per-token analysis, and a clear decision framework across H200, B200, and GB200 Superchip for 2026.
CDNA 4 vs Blackwell architecture, LLM inference benchmarks, ROCm vs CUDA support, and GPU cloud pricing for AI teams weighing AMD as an alternative.
GB200 NVL72 packs 72 Blackwell GPUs (with 36 Grace CPUs) into a single rack-scale NVLink domain for frontier training. A standalone B200 is simpler to rent per GPU when you don't need full-rack scale.
NVIDIA B200 guides and resources
NVIDIA B200 Complete Guide: Specs, Benchmarks, and Pricing
Deep dive into Blackwell architecture, real-world benchmarks, FP4 performance, and how B200 compares to H100/H200.
RTX 5090 vs H100 vs B200: Which GPU for AI Workloads?
Head-to-head benchmarks on Llama 3.1, Stable Diffusion, and training throughput across three generations.
NVIDIA B300 Blackwell Ultra: Complete Guide
Detailed comparison of B200 vs B300 specs, pricing, and when the upgrade is worth it.
Production-Ready GPU Cloud Architecture
Design patterns for building reliable AI infrastructure on bare-metal B200 GPUs.
NVIDIA Vera Rubin NVL72: Rack-Scale H300 System Specs and Cloud Timing
Plan your infrastructure roadmap. Rubin NVL72 (H300-class) is the next-generation successor to GB200 NVL72, expected in H2 2026.
NVIDIA B200 Release Date and Cloud Availability
The NVIDIA B200 GPU was announced at GTC March 2024 as the flagship of the new Blackwell architecture. The first production HGX B200 systems shipped in late 2024, with broad GPU cloud availability rolling out through H1 2025. CoreWeave was first to GA on HGX B200 in October 2024; Nebius, Lambda Labs, and RunPod followed in Q1 2025; AWS p6-b200 reached general availability in mid-2025, with Azure ND B200 v6 and Google Cloud A4 close behind.
On Spheron, the B200 SXM6 is available with per-minute billing and no contract. Spot pricing is meaningfully below the dedicated rate for fault-tolerant workloads. Live capacity and current pricing are on the pricing page. The B300 Blackwell Ultra (288GB HBM3e at 10 TB/s) is the next step up for workloads that exceed the B200's 192GB VRAM ceiling. The GB200 NVL72 rack-scale system connects 72 Blackwell GPUs in a single NVLink domain for frontier pre-training.
B200 VRAM and Memory Bandwidth: 192GB HBM3e at 8 TB/s
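The B200 ships with 192GB of HBM3e memory at 8 TB/s of bandwidth. That is 1.36x the VRAM and roughly 1.67x the bandwidth of the H200 (141GB HBM3e at 4.8 TB/s), and 2.4x the VRAM at 2.39x the bandwidth of the H100 (80GB HBM3 at 3.35 TB/s). In autoregressive LLM decode at batch size 1, every generated token has to stream all of the model's weights from HBM. A 70B FP16 model (140GB of weights) therefore tops out at roughly 8 TB/s ÷ 140GB ≈ 57 tokens per second on a B200. That is more than double the H100's bandwidth ceiling of about 24 tokens per second, and a single H100 can't hold those 140GB of weights in the first place.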
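The 57 tokens/s figure is a bandwidth ceiling. At batch size 1, each token reads every weight once, so tokens/s ≤ bandwidth ÷ weight bytes. The minimal sketch below uses the bandwidth figures quoted above and ignores KV-cache reads and kernel overhead, so measured speeds land below these bounds. The H100 value is a per-GPU bandwidth bound only, since 140GB of weights won't fit in its 80GB.

```python
# Memory bandwidth per GPU in TB/s, as quoted in the text.
BANDWIDTH_TB_S = {"H100": 3.35, "H200": 4.8, "B200": 8.0}


def decode_ceiling_tok_s(params_b: float, bytes_per_param: float, bw_tb_s: float) -> float:
    """Upper bound on batch-1 decode speed: every token streams all weights once."""
    weight_bytes = params_b * 1e9 * bytes_per_param
    return bw_tb_s * 1e12 / weight_bytes


if __name__ == "__main__":
    for gpu, bw in BANDWIDTH_TB_S.items():
        # 70B parameters in FP16 (2 bytes each) = 140GB of weights
        print(gpu, round(decode_ceiling_tok_s(70, 2, bw), 1))
```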
Where the 192GB of VRAM matters: a 100B-class model fits on a single B200 in FP8 (about 100GB of weights) with roughly 90GB left for KV cache. Llama 3.1 405B in FP4 needs about 203GB for weights alone (405B × 0.5 bytes), so it spans two B200s (or fits on one B300). Trillion-parameter MoE models fit at FP8 or below across an 8-GPU HGX B200 node with about 1.5TB of combined HBM. The second-generation Transformer Engine adds native FP4 support, roughly doubling effective throughput over FP8 on inference workloads that tolerate the precision reduction. For workloads that exceed 192GB of VRAM per GPU, the B300 (288GB HBM3e) is the next step.
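Whether a model fits is mostly weight arithmetic: parameters × bits ÷ 8. The minimal sketch below runs that check against a 192GB B200. It counts weights only, so anything it reports as a fit still needs headroom for KV cache and activations. FP4 formats also add a little extra for scale factors, which the sketch ignores.

```python
import math

B200_GB = 192


def weight_gb(params_b: float, bits: int) -> float:
    """Weights only, in GB: billions of parameters x bits / 8.

    Excludes KV cache, activations, and quantization scale factors.
    """
    return params_b * bits / 8


def min_gpus(params_b: float, bits: int, gpu_gb: float = B200_GB) -> int:
    """Smallest GPU count whose combined HBM holds the weights (no headroom)."""
    return math.ceil(weight_gb(params_b, bits) / gpu_gb)


if __name__ == "__main__":
    for name, params_b, bits in [
        ("100B @ FP8", 100, 8),
        ("Llama 3.1 405B @ FP4", 405, 4),
        ("1T MoE @ FP8", 1000, 8),
    ]:
        print(f"{name:22} {weight_gb(params_b, bits):7.1f} GB -> {min_gpus(params_b, bits)} x B200")
```

Passing `gpu_gb=288` models a B300, where the same 405B FP4 weights fit on a single GPU.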