Hyperscalers charge 3-6x more than neo-cloud alternatives for the same GPU hardware. AWS H100 on-demand runs ~$6.88/hr. Azure charges ~$12.29/hr per GPU on their ND H100 v5 instances. On Spheron, the same H100 SXM5 is $2.01/hr on-demand and $0.99/hr on spot. That gap is not a temporary anomaly. It reflects structural differences in overhead, margin, and business model.
This post covers 7 GPU models across 15+ providers, with on-demand, spot, and reserved pricing for each. You can check Spheron's current GPU pricing for live rates. For throughput data behind these prices, see our GPU cloud benchmarks.
The GPU Models Covered
| GPU Model | VRAM | Primary Use Case | Tier |
|---|---|---|---|
| RTX 4090 | 24 GB GDDR6X | Hobbyist inference, fine-tuning | Consumer |
| A100 80GB | 80 GB HBM2e | Training, inference | Data center |
| L40S | 48 GB GDDR6 | Inference, rendering | Data center |
| H100 SXM5 | 80 GB HBM3 | Production training | Data center |
| H200 SXM | 141 GB HBM3e | Large model inference | Data center |
| B200 | 192 GB HBM3e | Frontier inference | Blackwell |
| RTX 5090 | 32 GB GDDR7 | Consumer inference | Consumer |
GPU Cloud Pricing by Model (March 2026)
All prices as of March 19, 2026, based on publicly available on-demand rates. Prices fluctuate based on GPU availability and provider policies. Check current Spheron GPU pricing for live rates.
H100 SXM5 Pricing
| Provider | On-Demand $/hr | Spot $/hr | Notes |
|---|---|---|---|
| Spheron | $2.01 | $0.99 | Lowest spot rate |
| Lambda Labs | $2.49–$3.44 | N/A | H100 SXM; on-demand only (8x–1x configs) |
| RunPod | $2.69 | Available | PCIe Community Cloud |
| Vast.ai | ~$1.53–$2.27 | Available | Marketplace rates |
| CoreWeave | ~$6.16 | N/A | H100 HGX SXM; normalized per GPU |
| Nebius | $2.95 | N/A | On-demand |
| FluidStack | $2.10 | N/A | |
| Paperspace | $5.95 | N/A | |
| AWS (p5) | ~$6.88 | ~$3.83 | Spot ~44% off OD |
| GCP (A3) | ~$10.98 | ~$3.69 | Estimated; varies by region |
| Azure (ND H100 v5) | ~$12.29 | N/A | Per GPU on ND96isr H100 v5 ($98.32/hr, 8 GPUs) |
H200 SXM Pricing
| Provider | On-Demand $/hr | Spot $/hr | Notes |
|---|---|---|---|
| Spheron | $4.54 | $1.78 | |
| GMI Cloud | $2.60 | N/A | On-demand; from $2.60/hr |
| Nebius | $3.50 | N/A | On-demand |
| RunPod | $3.59 | N/A | Secure Cloud |
| Jarvislabs | $3.80 | N/A | On-demand |
| AWS (p5e) | ~$4.98 | N/A | Estimated; spot not widely available |
| GCP | TBA | Spot only | Limited on-demand availability |
| Azure | ~$13.78 | N/A | Estimated |
B200 Pricing
| Provider | On-Demand $/hr | Spot $/hr | Notes |
|---|---|---|---|
| Spheron | $6.03 | $2.18 | Lowest spot rate |
| RunPod | $4.99 | N/A | Secure Cloud |
| Nebius | $5.50 | N/A | On-demand |
| Lambda Labs | $4.99–$5.29 | N/A | On-demand; varies by configuration (8x–1x configs) |
| AWS (p6-b200) | ~$14.24 | ~$3.24 | Estimated; $113.93/hr for 8-GPU node |
| Azure | TBA | N/A | Not yet in standard catalog |
A100 80GB Pricing
| Provider | On-Demand $/hr | Spot $/hr | Notes |
|---|---|---|---|
| Spheron | $1.07 | $0.61 | |
| Thunder Compute | $0.78 | N/A | |
| Market range | $0.78–$2.06 | Varies | Neo-cloud range |
| AWS (p4de) | ~$3.43 | ~$3.07 | 80GB; Estimated |
| GCP (A2) | ~$5.78 | ~$2.51 | 80GB (a2-ultragpu); us-central1; Estimated |
| Azure (NC A100 v4) | ~$3.67 | ~$0.74 | NC24ads A100 v4; spot available |
L40S Pricing
| Provider | On-Demand $/hr | Spot $/hr | Notes |
|---|---|---|---|
| Spheron | $0.91 | $0.41 | |
| RunPod | $0.79 | Available | |
| AWS reserved | ~$1.17 | N/A | 1-year reserved (g6e.xlarge) |
| Marketplace | ~$0.40 | Available | Varies |
RTX 4090 Pricing
| Provider | On-Demand $/hr | Spot $/hr | Notes |
|---|---|---|---|
| Spheron | $0.58 | N/A | |
| RunPod | $0.34 | Available | Community |
| Vast.ai | $0.35–$0.55 | Available | Marketplace, varies |
| Local marketplace | ~$0.20 | N/A | Variable reliability |
RTX 5090 Pricing
| Provider | On-Demand $/hr | Notes |
|---|---|---|
| Spheron | $0.68 | Limited inventory |
| RunPod | $0.69 | Community Cloud; limited inventory |
| Vast.ai | $0.51–$0.89 | Marketplace rates; limited availability |
RTX 5090 cloud availability is limited to a small number of providers as of March 2026. Inventory is constrained and prices can shift quickly.
On-Demand vs Spot vs Reserved: Which Pricing Tier to Choose
On-Demand Pricing
On-demand gives you full flexibility with no commitment. You pay the listed hourly rate, start when you want, and stop when you're done. It is the most expensive tier but the right choice for:
- Short experiments and one-off jobs where total cost is low anyway
- Workloads with unpredictable runtimes or sharp deadlines
- Debugging and development where interruption is intolerable
- Production inference APIs where availability guarantees matter
Most neo-cloud providers (Spheron, Lambda, RunPod) do not require contracts for on-demand instances, and several bill per-minute or per-second.
Spot / Preemptible Pricing
Spot instances use idle capacity that providers offer at steep discounts. They can be reclaimed with short notice, typically 30 seconds to 2 minutes. Savings over on-demand range from 40-65%.
| GPU | On-Demand | Spot | Savings % |
|---|---|---|---|
| H100 SXM5 (Spheron) | $2.01/hr | $0.99/hr | ~51% |
| B200 (Spheron) | $6.03/hr | $2.18/hr | ~64% |
| A100 80GB (Spheron) | $1.07/hr | $0.61/hr | ~43% |
| H100 SXM5 (AWS) | ~$6.88/hr | ~$3.83/hr | ~44% |
Spot pricing is the right call for: batch training jobs with checkpoint/resume, offline inference pipelines, hyperparameter sweeps, and data preprocessing. It is the wrong call for: production serving, real-time inference APIs, or any job that cannot tolerate interruption.
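The savings column above is just the ratio of the two rates. A minimal sketch, using the Spheron and AWS rates from the table:

```python
def spot_savings(on_demand: float, spot: float) -> float:
    """Percent saved by running on spot instead of on-demand."""
    return (1 - spot / on_demand) * 100

rates = {  # ($/hr on-demand, $/hr spot) pairs from the table above
    "H100 SXM5 (Spheron)": (2.01, 0.99),
    "B200 (Spheron)":      (6.03, 2.18),
    "A100 80GB (Spheron)": (1.07, 0.61),
    "H100 SXM5 (AWS)":     (6.88, 3.83),
}

for gpu, (od, spot) in rates.items():
    print(f"{gpu}: {spot_savings(od, spot):.0f}% off on-demand")
```

Run against the table's rates, this reproduces the ~51%, ~64%, ~43%, and ~44% figures above.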
Reserved / Committed Pricing
Reserved pricing requires a commitment, typically 1 to 12 months, in exchange for 20-40% discounts vs on-demand. AWS EC2 reserved instances, Azure reserved VMs, and GCP committed-use contracts all follow this model.
Neo-cloud providers like Lambda Labs and CoreWeave offer reserved clusters at negotiated rates. Spheron offers volume pricing via direct contact for teams with predictable long-term compute needs.
Reserved pricing is right for: production inference running 24/7, large-scale training programs with predictable GPU-hour requirements, and teams that have validated their workload and want to lock in cost predictability.
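Whether a commitment pays off depends on how much of the committed time you actually use. A simplified break-even sketch (it assumes a flat discount and ignores spot as a fallback, both simplifications of ours):

```python
def breakeven_utilization(discount_pct: float) -> float:
    """Utilization above which a reserved commitment beats paying
    on-demand only for the hours actually used.

    With a d% discount you pay (1 - d) x the on-demand rate for every
    committed hour, used or idle, so reserved wins once utilization
    exceeds (1 - d).
    """
    return (1 - discount_pct / 100) * 100

# The 20-40% discounts cited above imply break-even at 60-80% utilization
for d in (20, 30, 40):
    print(f"{d}% discount -> reserved wins above "
          f"{breakeven_utilization(d):.0f}% utilization")
```

This is why reserved pricing suits 24/7 production inference: utilization near 100% clears any break-even threshold.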
Hidden Costs: What the Hourly Rate Doesn't Include
Egress and Bandwidth Fees
Hyperscalers charge $0.08-$0.12/GB for outbound data transfers. Most neo-clouds (Spheron, RunPod, Lambda) include bandwidth in the instance rate or charge flat rates well below hyperscaler egress fees.
In practice: transferring a 100 GB model checkpoint out of AWS costs $8-12 in egress fees on top of whatever you paid for the GPU hour. At scale, if you are syncing checkpoints to external storage or serving model weights across regions, egress can easily match or exceed your GPU compute bill.
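The checkpoint example above is straightforward to model. A sketch, assuming the $0.08-$0.12/GB hyperscaler range quoted above and a hypothetical sync cadence:

```python
def egress_cost(gb: float, rate_per_gb: float) -> float:
    """Outbound transfer cost in dollars at a flat per-GB rate."""
    return gb * rate_per_gb

# One 100 GB checkpoint out of a hyperscaler
low, high = egress_cost(100, 0.08), egress_cost(100, 0.12)
print(f"${low:.0f}-${high:.0f} per checkpoint")

# Hypothetical cadence: syncing 4 checkpoints/day over a 30-day month,
# priced at an assumed mid-range $0.09/GB
monthly = egress_cost(100 * 4 * 30, 0.09)
print(f"~${monthly:,.0f}/month in egress alone")
```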
Storage Costs
Persistent volume storage typically runs $0.08-$0.15/GB/month. Ephemeral storage is included with GPU instances but does not persist across restarts. If your workflow needs data to survive between sessions, factor persistent storage into the comparison.
For large models, even modest storage needs add up. A 70B parameter model in FP16 requires around 140 GB of storage. At $0.10/GB/month, that is $14/month in storage alone before any compute.
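The arithmetic behind the 70B example generalizes to any model size. A sketch (2 bytes per FP16 parameter; the $0.10/GB/month rate is the midpoint quoted above):

```python
def fp16_weights_gb(params_billion: float) -> float:
    """Approximate weight size: 2 bytes per FP16 parameter,
    i.e. 2 GB per billion parameters."""
    return params_billion * 2

def monthly_storage_cost(gb: float, rate_per_gb_month: float = 0.10) -> float:
    """Persistent volume cost per month at a flat per-GB rate."""
    return gb * rate_per_gb_month

size = fp16_weights_gb(70)                         # 140 GB for a 70B model
print(f"${monthly_storage_cost(size):.0f}/month")  # $14 at $0.10/GB/month
```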
Networking and IP Fees
Static IP addresses, load balancers, and VPC peering add cost on hyperscalers. Most neo-clouds include a public IP in the instance rate. If your application requires custom networking topology, AWS and GCP give you more tools but charge for the privilege.
Minimum Commitments
Some providers require a 1-hour minimum billing period (Paperspace). Others bill per-minute (Spheron) or per-second (RunPod). For short experimental runs, the minimum dominates the bill: a 10-minute run billed at a 1-hour minimum costs 6x the metered price. Check this before choosing a provider for iterative development work.
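The effect of a billing minimum on short runs can be modeled directly. A sketch, assuming billing rounds up to the provider's minimum increment (real billing policies vary) and using the Spheron H100 rate from the tables above:

```python
import math

def billed_cost(runtime_min: float, rate_per_hr: float,
                increment_min: float) -> float:
    """Cost of one run when billing rounds up to a minimum increment."""
    increments = math.ceil(max(runtime_min, increment_min) / increment_min)
    return increments * increment_min / 60 * rate_per_hr

h100 = 2.01  # Spheron H100 on-demand, $/hr

per_minute = billed_cost(10, h100, increment_min=1)   # ~$0.34 metered
hourly_min = billed_cost(10, h100, increment_min=60)  # $2.01, 6x the metered price
print(per_minute, hourly_min)
```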
Price-Performance: Cost Per Token and Cost Per TFLOP
The cheapest hourly rate rarely delivers the best cost-per-token. A B200 at $6.03/hr on Spheron costs less per output token than an H100 at $2.01/hr, because the B200 delivers roughly 3-4x the inference throughput.
LLM Inference: Cost Per Million Tokens (Llama 3 70B)
The RTX 4090 is excluded from this table. Llama 3 70B in FP16 requires approximately 140 GB of VRAM, far beyond the 24 GB on a single RTX 4090. If you need 70B inference on consumer GPUs, use INT4 quantization (e.g., GGUF Q4_K_M), which reduces the memory requirement to roughly 40 GB; that still exceeds a single RTX 4090, so split the model across two or more cards or run it on a single data center GPU.
| GPU | Provider | $/hr | Est. tokens/sec | $/M tokens |
|---|---|---|---|---|
| A100 80GB | Spheron | $1.07 | ~520 | $0.57 |
| L40S | Spheron | $0.91 | ~450 | $0.56 |
| H100 SXM5 | Spheron | $2.01 | ~1,200 | $0.47 |
| H100 SXM5 | AWS | ~$6.88 | ~1,200 | $1.59 |
| H200 SXM | Spheron | $4.54 | ~1,800 | $0.70 |
| B200 | Spheron | $6.03 | ~4,000 | $0.42 |
| B200 | AWS | ~$14.24 | ~4,000 | $0.99 |
Throughput figures are per-GPU estimates for comparison purposes. GPUs with less than 141 GB of VRAM require multi-GPU tensor parallelism or quantization for Llama 3 70B inference. See our GPU cloud benchmarks for full multi-GPU data.
The B200 on Spheron ($0.42/M tokens) leads on cost-per-token efficiency across the lineup, combining high throughput with competitive pricing. The H100 ($0.47/M tokens) ranks second, followed by the L40S ($0.56/M tokens) and A100 ($0.57/M tokens). The H200 ($0.70/M tokens) is better suited for very large model batches than for single-instance 70B inference. For detailed total cost of ownership analysis, see the GPU cost optimization playbook.
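Every $/M-token figure above comes from one formula: the hourly rate divided by tokens generated per hour. A minimal sketch using the table's Spheron rates and throughput estimates:

```python
def cost_per_million_tokens(rate_per_hr: float, tokens_per_sec: float) -> float:
    """Dollars per million output tokens at sustained throughput."""
    return rate_per_hr / (tokens_per_sec * 3600) * 1_000_000

# ($/hr, est. tokens/sec) from the table above
for gpu, rate, tps in [
    ("H100 SXM5", 2.01, 1200),
    ("H200 SXM",  4.54, 1800),
    ("B200",      6.03, 4000),
]:
    print(f"{gpu}: ${cost_per_million_tokens(rate, tps):.2f}/M tokens")
```

Plugging in the AWS rates instead ($6.88 for the H100, $14.24 for the B200) reproduces the $1.59 and $0.99 rows.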
Spheron vs Every Major Competitor
Spheron vs AWS/GCP/Azure: The cost gap is 40-85% across major GPU models when comparing on-demand rates. Beyond the hourly rate, Spheron does not charge egress fees, does not require minimum commitments, and bills per-minute. AWS, GCP, and Azure add egress, storage, networking, and reserved capacity overhead that compound the gap substantially on real workloads.
Spheron vs RunPod: Spheron undercuts RunPod on H100 pricing ($2.01 vs $2.69 on-demand). RunPod offers a lower H200 on-demand rate ($3.59 vs $4.54), but Spheron's B200 spot pricing ($2.18/hr) has no RunPod equivalent, giving Spheron a significant edge for spot-eligible B200 workloads. Both platforms offer per-minute billing, spot instances, and multi-GPU configurations. RunPod has a larger community marketplace; Spheron aggregates from enterprise-grade data center partners with SLA guarantees.
Spheron vs Lambda Labs: Lambda is on-demand only for most GPU models. If your workload benefits from spot pricing, Spheron delivers 40-60% cost reductions that Lambda cannot match. Lambda's GPU inventory is strong for H100 and A100; Spheron adds B200 spot availability.
Spheron vs Vast.ai: Vast.ai's marketplace model can produce lower prices on commodity GPUs (A100, RTX 4090) because individual providers compete, but reliability and SLA coverage are variable. Spheron offers guaranteed SLA-backed capacity with consistent performance. For cost-first commodity workloads where reliability tolerance is high, Vast.ai is worth evaluating.
Spheron vs CoreWeave: CoreWeave is enterprise-focused with contract pricing and strong multi-node cluster support. For startups and teams that need on-demand access without a sales cycle, Spheron is more accessible. CoreWeave makes sense for large organizations with predictable multi-month compute requirements and existing enterprise procurement workflows.
For head-to-head comparisons, see Spheron vs RunPod, Spheron vs Vast.ai, Spheron vs CoreWeave, RunPod alternatives, and Lambda Labs alternatives.
How to Choose the Right GPU and Provider
| Workload | Recommended GPU | Recommended Provider Tier | Why |
|---|---|---|---|
| Hobbyist inference (7B-13B) | RTX 4090 | Vast.ai / Spheron spot | Lowest cost, sufficient VRAM |
| Fine-tuning 7B-70B | A100 80GB | Spheron / Lambda on-demand | Mature stack, good price |
| Production inference (70B) | H100 / H200 | Spheron spot or on-demand | Balance of cost and throughput |
| Large model training | H200 / B200 | Spheron / CoreWeave | VRAM headroom |
| Frontier inference (100B+) | B200 / B300 | Spheron | Best cost-per-token at scale |
For hyperscaler integration requirements (IAM, VPC, compliance certifications), AWS/GCP/Azure may be justified despite significantly higher GPU costs. If your workload is tightly integrated with S3, BigQuery, or Azure Active Directory, the switching cost of migrating to a neo-cloud can outweigh the per-GPU savings in the short term.
Final Verdict
For most AI teams, neo-cloud providers deliver 40-85% lower GPU compute costs than hyperscalers with comparable or better GPU availability in 2026. The pricing gap has widened, not narrowed, as hyperscaler overhead and margin have increased faster than neo-cloud cost reductions.
The cheapest hourly rate is not always the best value. Calculate cost per token or cost per training step before committing to a platform. The H100 on Spheron at $2.01/hr delivers better cost-per-token than either the A100 at $1.07/hr or the L40S at $0.91/hr for most inference workloads. The B200 at $6.03/hr beats all of them at scale, delivering the lowest cost per output token among Spheron's GPU lineup.
Spot pricing is worth using for batch workloads. The 40-65% savings over on-demand are real and reproducible for any workload that implements checkpoint/resume. On-demand is right for production serving and latency-sensitive workloads where interruption is unacceptable.
All pricing in this post is based on publicly available on-demand rates as of March 19, 2026. GPU cloud prices fluctuate over time based on availability, provider changes, and market conditions. Check Spheron's GPU pricing page for the most current rates.
Compare current rates on Spheron's GPU pricing page and rent a GPU now to start running your workloads at lower cost.
Spheron gives you on-demand access to H100, H200, B200, A100, L40S, and RTX 4090 GPUs with per-minute billing, no egress fees, and spot pricing that cuts costs by up to 64% compared to on-demand. No contracts, no minimums, no hidden fees.
