B200 SXM6 on-demand rates range from $3.70/hr on neo-clouds to $14.24/hr on AWS in June 2026. The 3.8x price spread across providers reflects structural differences in provider margin, quota friction, and billing model, not hardware differences. The GPU is the same NVIDIA B200 SXM6 regardless of which cloud you pick. For a broader view, the B200 SXM6 specs and architecture guide covers memory bandwidth, FP4 throughput, and workload selection criteria in detail.
This post breaks down what you actually pay across providers, the math on on-demand versus spot versus reserved, cost-per-token compared to H100 and H200, and three representative monthly cost scenarios.
TL;DR: B200 Cloud Pricing Across Providers (June 2026)
| Provider | On-Demand $/hr | Spot $/hr | Notes |
|---|---|---|---|
| Spheron B200 SXM6 | $3.70 | $2.74 | Live from API; per-minute billing |
| RunPod (Secure Cloud) | $5.89 | N/A | Per-minute |
| Lambda Labs | $4.99-$5.29 | N/A | 1x-8x configs |
| Nebius | $5.50 | N/A | On-demand |
| CoreWeave | ~$6.50 | N/A | Estimated; contract-oriented |
| AWS p6-b200 | ~$14.24 | ~$2.70 | $113.93/hr for 8-GPU node |
| Azure | TBA | N/A | Not in standard catalog as of June 2026 |
Pricing fluctuates based on GPU availability. The prices above are as of 20 Jun 2026 and may have changed. Check current GPU pricing for live rates.
What a B200 Costs to Rent in 2026
The $3.70 to $14.24/hr on-demand spread comes down to four factors.
Provider tier. Neo-cloud providers (Spheron, RunPod, Lambda, Nebius) operate with lower overhead and often aggregate third-party data center supply. Hyperscalers (AWS, Azure, GCP) carry substantially higher margins and infrastructure costs. The same B200 SXM6 hardware sits at $3.70/hr on Spheron and $14.24/hr on AWS. The full GPU cloud pricing comparison across providers and GPU models covers the systematic hyperscaler premium in detail.
Availability. Spot instances use unused capacity. Spheron spot starts at $2.74/hr, a 26% discount from its $3.70/hr on-demand rate. AWS p6 spot runs approximately $2.70/hr, a similar absolute rate (about 81% below AWS on-demand), but AWS spot availability for P6 instances is inconsistent.
Instance configuration. Providers that sell B200 only as 8-GPU nodes (common on AWS, some CoreWeave configs) effectively require you to buy 8 GPUs whether you need one or eight. Providers with single-GPU access (Spheron, RunPod) let you match capacity to actual workload size.
Contract versus per-minute billing. CoreWeave's competitive rate (~$6.50/hr) typically requires a direct contract and commitment. Spheron and RunPod offer per-minute billing with no commitment.
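The spread and discount figures quoted above can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the rates listed in this post; the function names are illustrative and not part of any provider API.

```python
# Rates quoted in this post (USD per GPU-hour, 20 Jun 2026 snapshot).
SPHERON_ON_DEMAND = 3.70
SPHERON_SPOT = 2.74
AWS_ON_DEMAND = 14.24
AWS_SPOT = 2.70


def price_ratio(high: float, low: float) -> float:
    """How many times more expensive `high` is than `low`."""
    return high / low


def discount_pct(on_demand: float, spot: float) -> float:
    """Spot discount relative to the on-demand rate, in percent."""
    return (on_demand - spot) / on_demand * 100


if __name__ == "__main__":
    print(f"AWS vs Spheron on-demand: {price_ratio(AWS_ON_DEMAND, SPHERON_ON_DEMAND):.1f}x")
    print(f"Spheron spot discount: {discount_pct(SPHERON_ON_DEMAND, SPHERON_SPOT):.0f}%")
    print(f"AWS spot discount: {discount_pct(AWS_ON_DEMAND, AWS_SPOT):.0f}%")
```

Note that the two spot rates are nearly identical in absolute terms, while the percentage discounts differ sharply because the on-demand baselines differ.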
On-Demand vs Spot vs Reserved B200 Pricing
Single GPU Rates
| Provider | On-Demand $/hr | Spot $/hr | Monthly (720 hrs, on-demand) |
|---|---|---|---|
| Spheron | $3.70 | $2.74 | $2,664 |
| RunPod | $5.89 | N/A | $4,241 |
| Lambda Labs | $4.99-$5.29 | N/A | $3,593-$3,809 |
| Nebius | $5.50 | N/A | $3,960 |
| CoreWeave | ~$6.50 | N/A | ~$4,680 |
| AWS p6 | ~$14.24 | ~$2.70 | ~$10,253 |
8-GPU Node Math
| Provider | Per-GPU Rate | 8x Node $/hr | Monthly (720 hrs) |
|---|---|---|---|
| Spheron on-demand | $3.70 | $29.60 | $21,312 |
| Spheron spot | $2.74 | $21.92 | $15,782 |
| Lambda Labs | $5.29 | $42.32 | $30,470 |
| AWS p6.48xlarge | $14.24 | $113.93 | $82,030 |
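The node-level figures in the table follow from the per-GPU rate, the GPU count, and a 720-hour month. A minimal sketch of that arithmetic, assuming the rates listed above (the helper names are illustrative); for AWS the table starts from the quoted node price of $113.93/hr rather than from the rounded per-GPU figure.

```python
HOURS_PER_MONTH = 720  # 30 days x 24 hours, the convention used in the tables


def node_hourly(per_gpu_rate: float, gpus: int = 8) -> float:
    """Hourly price of a node built from `gpus` GPUs at `per_gpu_rate`."""
    return round(per_gpu_rate * gpus, 2)


def monthly(hourly_rate: float, hours: int = HOURS_PER_MONTH) -> int:
    """Cost of running at `hourly_rate` for `hours`, rounded to whole dollars."""
    return round(hourly_rate * hours)


if __name__ == "__main__":
    for name, per_gpu in [("Spheron on-demand", 3.70),
                          ("Spheron spot", 2.74),
                          ("Lambda Labs", 5.29)]:
        hourly = node_hourly(per_gpu)
        print(f"{name}: ${hourly:.2f}/hr, ${monthly(hourly):,}/month")
    # AWS publishes a node price, so start from that directly.
    print(f"AWS p6.48xlarge: $113.93/hr, ${monthly(113.93):,}/month")
```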
Reserved and Contract Pricing
Most neo-cloud providers do not publish formal reserved pricing for B200. CoreWeave's typical contract arrangement for multi-GPU B200 clusters can reach 15-25% below on-demand rates for 6-month or longer commitments, but requires a direct conversation. AWS Savings Plans and reserved instances for p6-b200 instances are not yet widely published as of June 2026. Contact AWS account teams for Capacity Block pricing on p6.
Provider-by-Provider B200 Pricing Breakdown
RunPod B200 (Secure Cloud)
RunPod lists B200 SXM6 in Secure Cloud at $5.89/hr per GPU with per-minute billing. No spot or preemptible option is available on RunPod for B200. The Secure Cloud tier provides dedicated hardware with consistent performance. Community Cloud (shared/marketplace) may list cheaper rates but with variable reliability.
| Config | On-Demand $/hr | Spot $/hr | Billing |
|---|---|---|---|
| B200 SXM6 (Secure Cloud) | $5.89 | N/A | Per minute |
Lambda Labs B200
Lambda Labs offers B200 on-demand in two configurations: single-GPU at $4.99/hr and 8-GPU nodes at $5.29/hr per GPU. Per-hour billing applies, meaning a 47-minute job rounds up to a full hour. No spot or preemptible option is available.
| Config | On-Demand $/hr | Reserved | Billing |
|---|---|---|---|
| B200 (1x) | $4.99 | Contact Lambda | Per hour |
| B200 (8x node, per GPU) | $5.29 | Contact Lambda | Per hour |
The 1x configuration at $4.99/hr makes Lambda competitive for single-GPU workloads. The 8x node at $5.29/hr is 43% above Spheron's $3.70/hr on-demand rate.
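Billing granularity changes the effective price of short jobs. The sketch below models it as rounding runtime up to the next billing increment, which is a simplifying assumption; actual provider invoicing rules may differ, and `billed_cost` is a hypothetical helper.

```python
import math


def billed_cost(runtime_minutes: float, hourly_rate: float, increment_minutes: int) -> float:
    """Cost when usage is rounded up to the next billing increment."""
    billed_minutes = math.ceil(runtime_minutes / increment_minutes) * increment_minutes
    return round(billed_minutes / 60 * hourly_rate, 2)


if __name__ == "__main__":
    job_minutes = 47
    # Per-hour billing at Lambda's 1x rate: the job is charged as a full hour.
    print(billed_cost(job_minutes, 4.99, increment_minutes=60))
    # Per-minute billing at Spheron's on-demand rate: charged for 47 minutes.
    print(billed_cost(job_minutes, 3.70, increment_minutes=1))
```

The gap widens for workloads made up of many short runs, since each run pays the rounding penalty separately.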
Nebius B200
Nebius lists B200 SXM6 at $5.50/hr per GPU on-demand, with per-hour billing. Nebius is a European cloud provider with data centers in Europe and the US. No spot tier is offered.
| Config | On-Demand $/hr | Spot $/hr | Billing |
|---|---|---|---|
| B200 SXM6 | $5.50 | N/A | Per hour |
CoreWeave B200
CoreWeave does not publish standard B200 on-demand rates on a public pricing page. Their model is contract-oriented for large B200 allocations. Based on publicly reported rates and market context, CoreWeave B200 on-demand is estimated at approximately $6.50/hr per GPU. Reserved multi-GPU cluster contracts may run lower.
If you need 16+ B200s with dedicated networking (InfiniBand clusters), CoreWeave's contract model may be the right fit. For individual GPU access or per-minute billing without a contract, neo-clouds like Spheron or RunPod are more accessible.
AWS p6-b200 Pricing
AWS introduced p6-b200 instances for B200 SXM6 workloads. The primary SKU is p6.48xlarge, which packs 8 B200 GPUs in a single instance at approximately $113.93/hr on-demand.
| Instance | Config | On-Demand $/hr | Per-GPU | Spot $/hr (per node) | Per-GPU Spot |
|---|---|---|---|---|---|
| p6.48xlarge | 8x B200 SXM6 | $113.93 | ~$14.24 | ~$21.61 | ~$2.70 |
When available, AWS spot pricing for p6 instances runs roughly 81% below on-demand, bringing the per-GPU spot rate to approximately $2.70/hr. Like all AWS P-instance spot, p6 spot capacity is inconsistent and in practice often limited.
Beyond the hourly rate, AWS p6 pricing carries additional cost layers that the table above does not capture. Data egress from AWS to the internet runs $0.09/GB (first 10 TB/month). EBS root volumes, CloudWatch logging, and EFA networking within the same placement group add further overhead. A realistic total cost for an AWS p6.48xlarge running 24/7 with moderate checkpointing and egress is closer to $120-130/hr including storage and networking, not the headline $113.93/hr.
AWS p6 instances require a service quota increase before you can launch them. New accounts default to 0 vCPUs for P instances. Each p6.48xlarge consumes 192 vCPUs, and quota approval takes 3-7 business days with a written business justification.
Why B200 Prices Are Falling in 2026
B200 on-demand rates are lower in mid-2026 than they were at the start of the year, and the trend continues downward. Several supply-side factors are driving this.
HBM3e supply improved. The B200 uses 192 GB of HBM3e memory. Early 2026 saw HBM3e supply constrained partly by SK Hynix and Micron production ramp. By Q2 2026, supply has loosened enough that more neo-cloud providers can stock B200 inventory without paying spot premiums on the hardware itself.
TSMC 4NP yields improved. The B200 GPU die is manufactured on TSMC's 4NP process. Yield improvements mean more functional dice per wafer, which directly reduces hardware cost and enables lower rental rates.
More providers have B200 inventory. Early B200 availability was concentrated at a few providers. By mid-2026, RunPod, Lambda, Nebius, and Spheron all carry B200 stock. More competitive supply tends to compress prices.
Q4 2026 spot-price outlook. As B300 (Blackwell Ultra) availability grows in H2 2026, some B200 capacity will shift from on-demand to spot pools, further reducing spot prices. For a detailed B200 vs B300 comparison including when B300's higher specs justify the premium, see the B300 vs B200 cost-per-token breakdown linked at the end of this post.
Cost Per Token: When B200 Beats H100 and H200 Despite the Higher Rate
The per-GPU hourly rate is the wrong number to optimize on for inference. What matters is cost per output token, which combines throughput and price.
Throughput Baselines
| GPU | FP8 Throughput (Llama 2 70B) | FP4 Throughput (Llama 2 70B) | Source |
|---|---|---|---|
| H100 SXM5 | ~3,000 tok/s | N/A (no native FP4) | MLPerf Inference v6.0 baseline |
| H200 SXM | ~4,000 tok/s | N/A (no native FP4) | Estimated from memory bandwidth ratio |
| B200 SXM6 (FP8) | ~6,000 tok/s | - | Estimated from TFLOPS ratio vs H100 |
| B200 SXM6 (FP4) | - | ~12,305 tok/s | MLPerf Inference v5.0 server mode |
Note: H100 throughput is from MLPerf Inference v6.0; B200 FP4 throughput is from v5.0. Cross-version numbers are directional, not directly comparable. A higher B200 FP4 figure (roughly 17,500 tok/s) appears in some benchmarks under different server configurations. See the B200 complete guide for details. H200 and B200 FP8 rows are estimates from bandwidth and TFLOPS ratios.
Cost-Per-Million-Tokens Math (at Spheron On-Demand Rates, June 2026)
1 million tokens at each throughput rate:
| GPU | Throughput | Price/hr | Time for 1M tokens | Cost per 1M tokens |
|---|---|---|---|---|
| H100 SXM5 | 3,000 tok/s | $2.54 | 333 sec = 0.093 hr | $0.24 |
| H200 SXM | 4,000 tok/s | $4.54 | 250 sec = 0.0694 hr | $0.32 |
| B200 SXM6 (FP8) | 6,000 tok/s | $3.70 | 167 sec = 0.046 hr | $0.17 |
| B200 SXM6 (FP4) | 12,305 tok/s | $3.70 | 81 sec = 0.0226 hr | $0.08 |
B200 FP8 at $3.70/hr is already ~27% cheaper per token than H100 SXM5 at $2.54/hr, because the 2x throughput gain outpaces the roughly 1.46x price premium. At FP4, the cost-per-token advantage widens to ~65% cheaper than H100.
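The cost-per-token table can be recomputed from throughput and hourly price alone. This is a minimal sketch under the table's assumptions (steady-state throughput, a single GPU, Spheron on-demand rates); the function and dictionary names are illustrative.

```python
def cost_per_million_tokens(tokens_per_sec: float, price_per_hour: float) -> float:
    """USD to generate one million output tokens at a steady throughput."""
    hours = 1_000_000 / tokens_per_sec / 3600
    return hours * price_per_hour


def savings_pct(baseline_cost: float, candidate_cost: float) -> float:
    """How much cheaper the candidate is than the baseline, in percent."""
    return (1 - candidate_cost / baseline_cost) * 100


# (throughput in tok/s, on-demand $/hr), copied from the tables in this post.
GPUS = {
    "H100 SXM5": (3_000, 2.54),
    "H200 SXM": (4_000, 4.54),
    "B200 SXM6 (FP8)": (6_000, 3.70),
    "B200 SXM6 (FP4)": (12_305, 3.70),
}

if __name__ == "__main__":
    h100 = cost_per_million_tokens(*GPUS["H100 SXM5"])
    for name, (tps, price) in GPUS.items():
        cost = cost_per_million_tokens(tps, price)
        print(f"{name}: ${cost:.2f}/M tokens, {savings_pct(h100, cost):.1f}% cheaper than H100")
```

Because the throughput inputs mix measured and estimated values, the outputs are directional in the same way the table is.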
Where B200 Wins
Large-batch FP4 inference (70B+ models). If your workload runs Llama 3 70B, Mistral 7B at large batch, or any model where FP4 quality is acceptable, the B200 is the clear cost-per-token winner. Among the GPUs compared here, native FP4 is available only on Blackwell.
Workloads exceeding 80 GB VRAM. H100 SXM5 has 80 GB. A 70B model in FP16 needs roughly 140 GB. On H100, you need two GPUs. On B200 (192 GB), one GPU handles it. The effective cost comparison flips because you're comparing 2x H100 against 1x B200.
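The memory argument can be sketched as a GPU-count calculation. The version below counts model weights only, which is a deliberate simplification: KV cache, activations, and framework overhead are ignored, so real deployments need extra headroom. The helper names are hypothetical.

```python
import math


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (1B params at 1 byte is about 1 GB)."""
    return params_billion * bytes_per_param


def gpus_needed(model_gb: float, vram_gb: float) -> int:
    """Minimum GPU count whose combined VRAM holds the weights."""
    return math.ceil(model_gb / vram_gb)


if __name__ == "__main__":
    model = weights_gb(70, 2)  # 70B parameters at FP16 (2 bytes each)
    h100_count = gpus_needed(model, 80)
    b200_count = gpus_needed(model, 192)
    # Effective hourly cost at the Spheron on-demand rates quoted in this post.
    print(f"H100: {h100_count} GPUs, ${h100_count * 2.54:.2f}/hr")
    print(f"B200: {b200_count} GPU, ${b200_count * 3.70:.2f}/hr")
```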
High-traffic inference APIs. The cost advantage compounds at scale. A service generating 100 million tokens/day saves approximately $455/month using B200 FP4 instead of H100 on-demand, based on savings of about $0.15 per million tokens over a 30-day month (computed from 3,000 tok/s at $2.54/hr versus 12,305 tok/s at $3.70/hr).
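The monthly savings figure scales linearly with token volume. A minimal sketch, assuming a 30-day month and the throughput and price figures from the tables in this post; `monthly_savings` is an illustrative helper, not a provider tool.

```python
def cost_per_million_tokens(tokens_per_sec: float, price_per_hour: float) -> float:
    """USD to generate one million output tokens at a steady throughput."""
    return 1_000_000 / tokens_per_sec / 3600 * price_per_hour


def monthly_savings(tokens_per_day: float, baseline_cost: float,
                    candidate_cost: float, days: int = 30) -> float:
    """USD saved per month by serving the same volume on the cheaper option."""
    millions_per_month = tokens_per_day / 1_000_000 * days
    return millions_per_month * (baseline_cost - candidate_cost)


if __name__ == "__main__":
    h100 = cost_per_million_tokens(3_000, 2.54)
    b200_fp4 = cost_per_million_tokens(12_305, 3.70)
    for volume in (10_000_000, 100_000_000, 1_000_000_000):
        saved = monthly_savings(volume, h100, b200_fp4)
        print(f"{volume:,} tokens/day: about ${saved:,.0f}/month saved")
```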
Where H100 Still Wins
Models under 34B at FP16. If your model fits comfortably in 80 GB, H100 PCIe on Spheron at $2.01/hr is hard to beat. You don't need B200 VRAM for smaller models.
Training jobs where FP4 isn't applicable. FP4 is inference-only. Training runs that don't fill the B200's memory will pay the B200 premium without capturing the throughput advantage.
For a full cost-per-token framework comparing B300 and B200 at Llama 70B and 405B scale, see the B300 vs B200 inference cost-per-token guide.
Spheron B200 Pricing: Live Rates (June 2026)
Current Spheron B200 SXM6 rates fetched from the Spheron pricing API on 20 Jun 2026:
| GPU | On-Demand $/hr | Spot $/hr | Notes |
|---|---|---|---|
| B200 SXM6 | $3.70 | $2.74 | Per-minute billing; SSH root access |
Spheron sources B200 supply from vetted data center partners worldwide. This aggregated supply model keeps on-demand rates competitive without requiring reserved contracts or quota approval processes.
Key billing differences vs hyperscalers:
- Per-minute billing vs per-hour minimums on AWS and Azure. A 47-minute job costs 47 minutes, not 60.
- No egress fees. AWS charges $0.09/GB for data leaving the platform. Spheron does not charge egress. For teams pulling 100 GB model checkpoints regularly, this is real savings.
- No quota friction. B200 SXM6 instances are available in under 2 minutes without a service ticket or quota increase request.
- Spot pricing is a real billing tier. Unlike AWS P6 spot, which is structurally scarce, Spheron spot is a consistent billing option with reliable availability.
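The egress point is easy to quantify. The sketch below applies the $0.09/GB first-tier rate quoted in this post and ignores AWS's volume tiers above 10 TB/month; the figure of 20 checkpoint pulls per month is a hypothetical workload chosen for illustration.

```python
AWS_EGRESS_PER_GB = 0.09  # first-tier internet egress rate quoted in this post


def egress_cost(gb: float, rate_per_gb: float = AWS_EGRESS_PER_GB) -> float:
    """Flat-rate egress cost in USD; tiered discounts are not modeled."""
    return round(gb * rate_per_gb, 2)


if __name__ == "__main__":
    checkpoint_gb = 100
    pulls_per_month = 20  # hypothetical: roughly one pull per working day
    print(f"One pull: ${egress_cost(checkpoint_gb):.2f}")
    print(f"Monthly: ${egress_cost(checkpoint_gb * pulls_per_month):.2f}")
```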
For deployment guides, SSH setup, and distributed training configuration docs, see docs.spheron.ai.
Monthly Cost Examples
Scenario 1: Solo Researcher (1x B200 Spot, 40 hrs/week)
A researcher running weekly training experiments on a 70B model:
- 40 hours/week x 4 weeks = 160 GPU-hours/month
- Rate: $2.74/hr (Spheron spot)
- Monthly cost: $438
Assumes checkpointing is in place so spot interruptions don't lose work. This is the lowest-cost path to B200 access for infrequent use.
Scenario 2: Small Team (2x B200 On-Demand, 24/7 Inference)
A small team running two B200 instances continuously for production inference:
- 2 GPUs x 720 hrs/month x $3.70/hr
- Monthly cost: $5,328
Running the same workload on AWS p6 means renting a full p6.48xlarge, because the 2 GPUs are billed as part of the 8-GPU node at $113.93/hr. AWS would charge $82,030/month for the whole node even if you use only 2 GPUs.
Scenario 3: 8-GPU Training Cluster (Spot, 2-Week Run)
A team running a 2-week distributed training run on an 8-GPU B200 cluster:
- 8 GPUs x 336 hrs (2 weeks x 168 hrs/week) x $2.74/hr
- Total cost: $7,365
Same job on AWS p6.48xlarge on-demand: $113.93/hr x 336 hours = $38,280.
The spot-vs-AWS-on-demand gap for an 8-GPU 2-week run is approximately $30,900.
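All three scenarios reduce to GPUs x hours x hourly rate. A minimal sketch that recomputes them from the rates quoted in this post; `run_cost` is an illustrative helper, and the AWS comparison passes the node as a single billing unit at its node price.

```python
def run_cost(gpus: int, hours: float, rate_per_gpu_hour: float) -> int:
    """Total cost in whole dollars for `gpus` GPUs over `hours`."""
    return round(gpus * hours * rate_per_gpu_hour)


if __name__ == "__main__":
    # Scenario 1: one spot GPU, 40 hrs/week for 4 weeks.
    solo = run_cost(1, 40 * 4, 2.74)
    # Scenario 2: two on-demand GPUs running a 720-hour month.
    team = run_cost(2, 720, 3.70)
    # Scenario 3: eight spot GPUs for two weeks (2 x 168 hours).
    cluster = run_cost(8, 2 * 168, 2.74)
    # AWS bills the 8-GPU node as one unit, so treat it as 1 x node rate.
    aws_node = run_cost(1, 2 * 168, 113.93)

    print(f"Solo researcher: ${solo:,}")
    print(f"Small team: ${team:,}")
    print(f"Training cluster (spot): ${cluster:,}")
    print(f"Same run on AWS on-demand: ${aws_node:,} (gap ${aws_node - cluster:,})")
```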
B200 SXM6 is available on Spheron at on-demand and spot rates, with per-minute billing and no egress fees. On-demand, Spheron is roughly 3.8x cheaper than AWS p6. At spot, rates are comparable, but Spheron spot is a consistent billing tier while AWS p6 spot availability is limited. No quota process to navigate.
Quick Setup Guide
Visit spheron.network/gpu-rental/b200/ to see current on-demand and spot rates for B200 SXM6, updated live from the Spheron pricing API. You can also check spheron.network/pricing/ for a full GPU catalog comparison.
Sign in at app.spheron.ai, select B200 SXM6 from the GPU catalog, choose the spot billing tier, and deploy. You get SSH root access in under 2 minutes. Spheron bills per minute with no minimum commitment, so short jobs don't round up to a full hour.
To estimate your monthly cost, multiply your expected hours of use per day by 30 (days) and by the per-GPU rate. For a single B200 at Spheron on-demand ($3.70/hr) running 8 hours/day for 30 days: 8 x 30 x $3.70 = $888/month. For 24/7 inference serving (720 hrs/month): 720 x $3.70 = $2,664/month per GPU.
Frequently Asked Questions
How much does a B200 GPU cost per hour in the cloud?
As of June 2026, B200 SXM6 on-demand pricing ranges from $3.70/hr (Spheron) to $14.24/hr (AWS p6-b200). Lambda Labs lists B200 at $4.99-$5.29/hr, RunPod Secure Cloud at $5.89/hr, and Nebius at $5.50/hr. Spot pricing is available on Spheron from $2.74/hr/GPU and on AWS p6 spot at approximately $2.70/hr. Prices vary by provider tier, instance configuration, and availability.
Which cloud provider has the cheapest B200 pricing?
As of June 2026, Spheron offers the lowest B200 SXM6 on-demand rate at $3.70/hr per GPU, with spot pricing from $2.74/hr. Lambda Labs is next at $4.99-$5.29/hr. Neo-cloud providers are significantly cheaper than AWS p6-b200 on-demand at approximately $14.24/hr per GPU. AWS p6 spot is available at approximately $2.70/hr but with inconsistent and often limited availability. Spheron spot at $2.74/hr offers the lowest reliably available spot cost for fault-tolerant workloads.
What is the difference between B200 on-demand and spot pricing?
B200 on-demand instances are guaranteed capacity that runs until you stop them. Spot instances use spare capacity sold at a discount but can be reclaimed by the provider with short notice. On Spheron, B200 spot starts at $2.74/hr versus $3.70/hr on-demand, saving about 26%. Spot is suitable for training jobs with checkpointing, batch inference, and workloads that can tolerate interruptions. Avoid spot for production inference APIs serving live traffic.
How much does an 8-GPU B200 node cost?
An 8-GPU B200 node at Spheron's on-demand rate of $3.70/hr per GPU costs approximately $29.60/hr total, or about $21,312/month at full utilization. AWS p6.48xlarge (8x B200) costs approximately $113.93/hr on-demand, or about $82,030/month. Lambda Labs 8x B200 at $5.29/hr per GPU works out to $42.32/hr total. The neo-cloud vs hyperscaler spread on 8-GPU nodes is substantial: Spheron's node costs about 26% of AWS on-demand.
Is the B200 cheaper per token than the H100?
Yes, when using FP4 inference. The B200 SXM6 delivers approximately 12,305 tokens/sec on Llama 2 70B in FP4 server mode (MLPerf v5.0), versus roughly 3,000 tokens/sec for H100 SXM5. At Spheron on-demand rates ($3.70/hr for B200 vs $2.54/hr for H100), FP4 B200 is approximately 65% cheaper per output token than H100 FP8. Even at FP8 (roughly 6,000 tok/s), the B200 at $3.70/hr is cheaper per token than H100. The comparison tips fully toward B200 for any model where FP4 quality is acceptable, particularly 70B+ models at batch sizes above 8.
