H100 cloud pricing has dropped significantly this June: Spheron's H100 SXM5 spot reaches $1.43/hr per GPU and on-demand starts at $2.53/hr, while H100 PCIe starts at $2.01/hr on-demand, all well below hyperscaler rates. This post tracks H100 price changes month by month, explains what is driving prices down, and covers what comes next: H200 availability, Blackwell supply dynamics, and the Vera Rubin timeline. If you need H100 instances right now, Spheron's H100 GPU rental has on-demand availability with per-minute billing and no contracts.
H100 Price Tracker (Updated Monthly)
The table below compares H100 on-demand and spot rates across major GPU cloud providers. Neo-cloud prices reflect current marketplace rates. Hyperscaler rates reflect published list prices.
| Provider | On-Demand $/hr | Spot $/hr | Notes |
|---|---|---|---|
| Spheron (H100 SXM5) | $2.53 | $1.43 | Lowest spot rate; per-minute billing |
| Spheron (H100 PCIe) | $2.01 | N/A | Single-GPU configurations available |
| RunPod (Secure Cloud) | $2.89-$3.29 | N/A | SXM vs PCIe pricing varies; see RunPod H100 pricing 2026 |
| Lambda Cloud | $3.29-$3.99 | N/A | PCIe and SXM configs; see Lambda Cloud H100 pricing 2026 |
| AWS (p5.48xlarge) | ~$6.88 | ~$3.83 | Per GPU on 8-GPU node; see AWS H100 pricing 2026 |
| GCP (A3 High) | ~$10.98 | ~$3.69 | a3-highgpu-8g machine type; see GCP A3 H100 pricing vs Spheron |
| Azure (ND H100 v5) | ~$12.29 | N/A | Per GPU on ND96isr; see Azure H100 pricing guide |
Pricing fluctuates based on GPU availability. Rates above are based on 14 Jun 2026 and may have changed. Check current GPU pricing for live rates.
For the full cross-provider comparison including A100, B200, L40S, and consumer GPUs, see our GPU cloud pricing comparison 2026.
Why H100 Prices Keep Falling
H100 on-demand pricing on neo-cloud platforms has dropped from roughly $8/hr in late 2024 to around $2.53/hr in mid-2026 for SXM5 on-demand, with spot rates as low as $1.43/hr. The driver is Blackwell.
B200 and B300 supply has ramped throughout 2026. As more Blackwell capacity comes online, H100 shifts from top-tier to mid-tier status. Providers who previously competed aggressively for limited H100 inventory can now offer H100 at lower margins, or divert new demand to B200. The result is a steady compression of H100 on-demand rates on marketplaces like Spheron.
The dynamic works differently at hyperscalers. AWS, Azure, and GCP tend to set pricing through multi-year commercial agreements and are slower to pass through hardware economics to list prices. AWS cut P5 H100 prices by 44% in June 2025, which was significant, but the $6.88/hr per GPU rate is still roughly 2.7x Spheron's current SXM5 on-demand rate and nearly 5x the SXM5 spot rate. Hyperscaler price reductions typically lag neo-cloud reductions by 6-12 months.
For a deeper look at how Blackwell supply competes with Hopper, see the NVIDIA B300 Blackwell Ultra guide and the Rubin vs Blackwell vs Hopper comparison.
Successor Watch: H200, B200/B300, and Vera Rubin
H200: Mature, Widely Available, Competitively Priced
The H200 is the direct H100 successor in the Hopper line. It uses the same GH100 compute die but upgrades the memory subsystem from 80 GB HBM3 to 141 GB HBM3e at 4.8 TB/s bandwidth. That memory upgrade delivers 37-90% faster LLM inference on 70B+ parameter models where memory bandwidth is the bottleneck.
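A back-of-envelope way to see why the bandwidth upgrade matters: in single-stream decoding, each generated token requires reading roughly all model weights from HBM, so tokens per second is bounded above by bandwidth divided by weight size. The sketch below applies that bound using the bandwidth figures quoted in this post (3.35 TB/s for H100 SXM5, 4.8 TB/s for H200). The 70 GB weight size (a 70B model at one byte per parameter) and the function name are illustrative assumptions, and real throughput also depends on batch size, KV cache traffic, and kernel efficiency.

```python
def decode_tokens_per_sec_upper_bound(bandwidth_tb_s: float, weight_gb: float) -> float:
    """Upper bound on single-stream decode speed when memory bandwidth is the limit."""
    bandwidth_gb_s = bandwidth_tb_s * 1000.0
    return bandwidth_gb_s / weight_gb


WEIGHT_GB = 70.0  # assumption: 70B parameters stored at 1 byte each

h100 = decode_tokens_per_sec_upper_bound(3.35, WEIGHT_GB)
h200 = decode_tokens_per_sec_upper_bound(4.8, WEIGHT_GB)
speedup_pct = (h200 / h100 - 1.0) * 100.0

print(f"H100 bound: {h100:.1f} tok/s")
print(f"H200 bound: {h200:.1f} tok/s")
print(f"Bandwidth-only speedup: {speedup_pct:.0f}%")
```

Under this simplified model the bandwidth ratio alone gives about a 43% speedup, which falls inside the 37-90% range quoted above.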
H200 on-demand on Spheron starts at $4.84/hr per GPU (dedicated), with spot pricing available from $1.77/hr. For workloads that need more than 80 GB of VRAM but do not require the full Blackwell feature set (native FP4, 192 GB HBM3e), the H200 is currently one of the best cost-per-token options available. For a side-by-side spec and benchmark comparison, see H100 vs H200. For H200-specific configuration guides, see our NVIDIA H200 specs reference.
You can deploy H200 instances directly from Spheron H200 GPU rental.
B200 and B300: The New Production Standard
Blackwell GPUs are now available on Spheron and several other neo-cloud platforms. The B200 brings 192 GB HBM3e at 8 TB/s bandwidth, native FP4 Tensor Cores, and approximately 4x the LLM inference throughput of H100 on Llama 70B. The B300 (Blackwell Ultra) pushes memory to 288 GB while keeping bandwidth at 8 TB/s, making it the right choice when model size rather than bandwidth is the constraint.
For teams moving beyond 70B models or building high-throughput production inference pipelines, B200 is now the default recommendation. H100 remains competitive for 7B-30B models at low batch sizes where its lower price per GPU offsets its lower throughput. For a direct comparison between B300 and B200 inference economics, see B300 vs B200 inference cost per token.
B200 rentals are available at Spheron B200 GPU rental.
Vera Rubin (R100): H2 2026 for the First Cohort
NVIDIA announced the Rubin architecture at GTC 2026. The R100 GPU delivers HBM4 memory at up to 22 TB/s bandwidth, 50 PFLOPS of FP4 compute, and NVLink 6 at 3.6 TB/s. That is approximately 18-22x the LLM inference throughput of H100 on Llama 70B, and roughly 4-5x the throughput of B200.
The first cloud cohort, confirmed by NVIDIA, includes AWS, Google Cloud, Azure, OCI, CoreWeave, Lambda, Nebius, and Nscale. Deployment is expected in H2 2026. Broader marketplace availability, including Spheron, is expected in 2027. For the full spec comparison and workload guidance across Hopper, Blackwell, and Rubin, see our Rubin vs Blackwell vs Hopper comparison and the NVIDIA Rubin R100 guide.
H100 Availability and Supply in 2026
H100 supply on neo-cloud platforms has loosened considerably compared to 2025. Spheron and similar marketplaces carry meaningful spot and on-demand inventory, with single-GPU and 8-GPU configurations available. Lead times for cloud-based H100 are effectively zero: on-demand instances provision in under 2 minutes.
The picture is different for direct hardware purchase. Secondary-market H100 lead times from resellers are running 36-52 weeks, driven by CoWoS packaging constraints at TSMC and HBM supply limitations. Hyperscalers also maintain tighter on-demand H100 availability than neo-clouds because they prioritize their own workloads and reserved capacity commitments.
For teams that could not lock in H100 reserved capacity in 2025, neo-cloud on-demand is now the most accessible path. Spot capacity on Spheron aggregates inventory from 5+ providers, which reduces the preemption risk compared to relying on a single provider's spot pool. For more detail on the structural GPU supply dynamics at play, see our analysis of H100 supply and lead times.
H100 Form Factors in 2026: SXM5 vs PCIe vs NVL
H100 comes in three physical configurations: SXM5 (high-bandwidth, high-TDP, HGX server), PCIe (lower TDP, broader data center compatibility, HBM2e), and NVL (94 GB HBM3, 2-GPU NVLink bridge pair).
The short decision matrix:
- H100 PCIe ($2.01/hr on Spheron): best for models under 40 GB, development workloads, cost-first deployments without NVLink requirements.
- H100 SXM5 ($2.53/hr on-demand, $1.43/hr spot on Spheron): best for training, multi-GPU tensor-parallel inference, models up to 80 GB. Higher bandwidth (3.35 TB/s vs 2 TB/s PCIe).
- H100 NVL ($2.06/hr on Spheron, 8-GPU configs): suited for models between 80 and 188 GB that benefit from the NVLink bridge without full tensor-parallel overhead.
For a detailed workload-mapping analysis across all three variants including MIG partitioning and NVLink topology, see the H100 NVL vs SXM5 vs PCIe form factor guide.
Buy vs Rent H100 in 2026
Secondary-market H100 SXM5 GPUs currently trade at $25,000-$35,000 per unit, down from $40,000+ in early 2025 as Blackwell supply has grown. The price trajectory is still downward.
At current rental prices, the break-even math is unfavorable for purchasing. Take the $35,000 upper end of the secondary-market range. Against the $2.01/hr H100 PCIe on-demand rate, recovering that cost requires about 17,413 hours of continuous operation: roughly 2 years at 24/7 utilization, or 6 years at 8 hours per day. Against the $1.43/hr H100 SXM5 spot rate, the break-even stretches further: $35,000 / $1.43 is about 24,476 hours of continuous operation, or about 8 years at 8 hours per day.
For most AI teams, those utilization profiles are unrealistic. A typical workload runs at 40-70% GPU utilization, not 100% 24/7. Factor in idle time and the break-even extends well beyond hardware refresh cycles.
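The break-even arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that only divides purchase price by rental rate; it ignores power, hosting, financing, and resale value. The function names and the 55% utilization profile (a midpoint of the 40-70% range) are illustrative assumptions.

```python
def break_even_hours(purchase_usd: float, rental_usd_per_hr: float) -> float:
    """GPU-hours of rental spend that equal the purchase price."""
    return purchase_usd / rental_usd_per_hr


def break_even_years(purchase_usd: float, rental_usd_per_hr: float,
                     hours_per_day: float) -> float:
    """Calendar years to reach break-even at a given daily usage."""
    return break_even_hours(purchase_usd, rental_usd_per_hr) / (hours_per_day * 365)


PURCHASE_USD = 35_000  # upper end of the secondary-market range quoted above
RATES = {"H100 PCIe on-demand": 2.01, "H100 SXM5 spot": 1.43}
# Usage profiles in busy hours per day. The 55% figure is an assumed midpoint,
# not a measured utilization.
PROFILES = {"24/7": 24.0, "8 h/day": 8.0, "24/7 at 55% util": 24.0 * 0.55}

for name, rate in RATES.items():
    print(f"{name}: {break_even_hours(PURCHASE_USD, rate):,.0f} GPU-hours")
    for profile, hours_per_day in PROFILES.items():
        years = break_even_years(PURCHASE_USD, rate, hours_per_day)
        print(f"  {profile}: {years:.1f} years")
```

Swapping in a different purchase price or rental rate shows how sensitive the result is to both inputs.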
The depreciation risk compounds this. H100 resale values will continue to fall as B200 and B300 become the production standard. A GPU purchased today at $30,000 may trade at $10,000-$15,000 in 18 months. That depreciation is a real cost that the buy-vs-rent math must account for, and it tilts the outcome further toward renting.
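To put a number on the depreciation point, the sketch below spreads the loss in resale value over the hours a GPU is actually busy. It uses the $30,000 purchase price and the $10,000-$15,000 resale range from the paragraph above. The function name and the 55% utilization figure are assumptions for illustration, and the calculation deliberately leaves out power, hosting, and financing.

```python
def depreciation_per_used_hour(purchase_usd: float, resale_usd: float,
                               months_held: float, utilization: float) -> float:
    """Depreciation per GPU-hour actually used (excludes power, hosting, financing)."""
    hours_held = months_held / 12 * 24 * 365
    used_hours = hours_held * utilization
    return (purchase_usd - resale_usd) / used_hours


PURCHASE_USD = 30_000
MONTHS_HELD = 18

for resale in (15_000, 10_000):
    for utilization in (1.0, 0.55):  # 0.55 is an assumed, illustrative value
        cost = depreciation_per_used_hour(PURCHASE_USD, resale, MONTHS_HELD, utilization)
        print(f"resale ${resale:,}, utilization {utilization:.0%}: ${cost:.2f}/hr")
```

Under these assumptions, depreciation alone at 55% utilization works out to roughly $2.08-$2.77 per used hour, which is in the same range as the $2.53/hr SXM5 on-demand rate before any operating cost is added.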
For workloads that are genuinely stable and predictable over 12+ months, reserved commitments on cloud offer a middle path: lower rates than on-demand without the capital expenditure of hardware ownership. For reserved pricing options, see Spheron reserved commitments and our guide to serverless GPU vs on-demand vs reserved.
Spheron H100 Live Pricing
These rates are pulled from the Spheron GPU marketplace as of 14 Jun 2026:
| Model | On-Demand $/hr | Spot $/hr | Min Config |
|---|---|---|---|
| H100 SXM5 | $2.53 | $1.43 | 1 GPU |
| H100 PCIe | $2.01 | N/A | 1 GPU |
| H100 NVL | $2.06 | N/A | 8 GPUs |
| H200 SXM5 | $4.84 | $1.77 | 1 GPU |
All instances are billed per minute with no minimum commitment. Deployment takes under 2 minutes on-demand. Spot instances are preemptible but typically stable for hours-to-days on Spheron's multi-provider pool.
For comparison, the equivalent AWS P5 H100 SXM5 rate is ~$6.88/hr on-demand, and Azure ND H100 v5 runs ~$12.29/hr per GPU. The Spheron SXM5 on-demand rate is approximately 63% below AWS list price, with spot pricing reaching 79% below.
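The percentages in this comparison follow directly from the listed rates. A minimal sketch of the arithmetic, with hypothetical helper names, is:

```python
def pct_below(reference_usd_per_hr: float, price_usd_per_hr: float) -> float:
    """How far a price sits below a reference price, in percent."""
    return (1.0 - price_usd_per_hr / reference_usd_per_hr) * 100.0


def price_ratio(reference_usd_per_hr: float, price_usd_per_hr: float) -> float:
    """Reference price expressed as a multiple of the compared price."""
    return reference_usd_per_hr / price_usd_per_hr


AWS_P5_PER_GPU = 6.88        # on-demand, per GPU, from the tracker table
SPHERON_SXM5_ON_DEMAND = 2.53
SPHERON_SXM5_SPOT = 1.43

for label, price in [("on-demand", SPHERON_SXM5_ON_DEMAND), ("spot", SPHERON_SXM5_SPOT)]:
    print(f"Spheron SXM5 {label}: {pct_below(AWS_P5_PER_GPU, price):.0f}% below AWS, "
          f"AWS is {price_ratio(AWS_P5_PER_GPU, price):.1f}x")
```

Because the rates change with availability, rerunning this with current prices is more reliable than reusing the percentages.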
To deploy, go to app.spheron.ai and select H100 from the GPU catalog.
How to Lock Low H100 Rates
1. Use Spot Instances for Fault-Tolerant Workloads
Spot instances on Spheron offer H100 SXM5 capacity at $1.43/hr, compared to $2.53/hr on-demand, with the trade-off that instances can be preempted. For training jobs, configure checkpoint saves every 15-30 minutes so a preemption does not cost more than 30 minutes of compute. For fine-tuning runs under 8 hours, spot pricing is typically the lowest-cost option with minimal operational risk.
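One way to implement time-based checkpointing is sketched below using only the Python standard library. It is a framework-agnostic illustration, not Spheron's tooling: `train`, `save_checkpoint`, `load_checkpoint`, and the JSON state layout are all hypothetical names, and `step_fn` stands in for one real training step. The state is written to a temporary file and then renamed, so a preemption in the middle of a save should not leave a corrupted checkpoint.

```python
import json
import os
import tempfile
import time


def save_checkpoint(path: str, state: dict) -> None:
    """Write state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"step": 0}
    with open(path) as f:
        return json.load(f)


def train(path: str, total_steps: int, step_fn,
          interval_s: float = 15 * 60, clock=time.monotonic) -> dict:
    """Run step_fn until total_steps, saving at most once per interval_s seconds."""
    state = load_checkpoint(path)  # resumes after a preemption
    last_save = clock()
    while state["step"] < total_steps:
        step_fn(state)  # placeholder for one training step
        state["step"] += 1
        if clock() - last_save >= interval_s:
            save_checkpoint(path, state)
            last_save = clock()
    save_checkpoint(path, state)  # final save on completion
    return state
```

Calling `train("ckpt.json", 1000, my_step)` again after a preemption picks up from the last saved step. In a PyTorch job, the same pattern would typically store the model and optimizer `state_dict()` with `torch.save` instead of JSON, and the checkpoint path should point at storage that survives the instance.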
See our guides on spot GPU training with checkpointing and GPU spot instance arbitrage strategies in 2026 for configuration details.
2. Use Reserved Commitments for Steady-State Workloads
If your H100 usage is predictable across 3-12 months, a reserved commitment locks in a lower rate than on-demand and eliminates the preemption risk of spot. This suits production inference endpoints and long training runs with firm completion timelines. See Spheron reserved commitments for current reserved pricing.
3. Use Spheron's Multi-Provider Spot Pool
Spheron aggregates H100 spot capacity from 5+ providers into a single marketplace. When one provider's spot pool reclaims capacity, Spheron can route to another provider's pool. This means lower effective preemption rates compared to a single-provider spot deployment, where a capacity wave hits the entire pool at once.
H100 cloud pricing has dropped to its lowest point yet, with spot from $1.43/hr and on-demand from $2.53/hr on Spheron. Spheron aggregates H100 capacity from 5+ providers into a single marketplace with per-minute billing and no contracts.
Quick Setup Guide
Go to spheron.network/pricing/ and filter by H100. Both on-demand and spot rates are shown live. On-demand instances provision in under 2 minutes with no commitment. Spot instances offer a lower rate in exchange for possible preemption, which suits fault-tolerant workloads.
If your model weights are under 40 GB, H100 PCIe is the cheapest option. For 40-80 GB models, H100 SXM5 offers full capacity. For 80-141 GB models that must fit on a single GPU, H200 SXM5 is needed. For models above 141 GB on a single GPU, B200 or B300 is required.
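The sizing rule above can be written as a small lookup. This sketch is only a restatement of those thresholds: `pick_gpu` is a hypothetical helper, the handling of the exact boundary values (40, 80, 141 GB) is an assumption, and it considers single-GPU fit by weight size alone, without headroom for KV cache or activations and without multi-GPU sharding.

```python
def pick_gpu(weights_gb: float) -> str:
    """Map model weight size to the GPU tier suggested in this guide."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    if weights_gb < 40:
        return "H100 PCIe"
    if weights_gb <= 80:
        return "H100 SXM5"
    if weights_gb <= 141:
        return "H200 SXM5"
    return "B200 or B300"


for size_gb in (14, 60, 120, 180):
    print(f"{size_gb} GB of weights -> {pick_gpu(size_gb)}")
```

In practice it is safer to size against weights plus expected KV cache and runtime overhead rather than weights alone.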
On Spheron, select Spot pricing when deploying. Spot instances are preemptible, so configure your workload to checkpoint every 15-30 minutes. For fine-tuning jobs under 8 hours, spot pricing typically saves 40-60% vs on-demand.
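Preemptions eat into the headline discount because work done since the last checkpoint has to be repeated. The sketch below estimates the effective saving for a job, assuming each preemption loses half a checkpoint interval on average and ignoring restart and re-provisioning time. The function name, the preemption counts, and the half-interval assumption are illustrative, not measured Spheron behavior.

```python
def spot_savings_pct(job_hours: float, on_demand_rate: float, spot_rate: float,
                     preemptions: int, checkpoint_interval_min: float) -> float:
    """Percent saved by running on spot, including recomputed work after preemptions."""
    # Assumption: a preemption lands uniformly inside a checkpoint interval,
    # so the expected lost work is half the interval.
    lost_hours = preemptions * (checkpoint_interval_min / 60.0) / 2.0
    spot_cost = spot_rate * (job_hours + lost_hours)
    on_demand_cost = on_demand_rate * job_hours
    return (1.0 - spot_cost / on_demand_cost) * 100.0


# 8-hour fine-tuning job on H100 SXM5 at the rates listed in this post.
for preemptions in (0, 2):
    saving = spot_savings_pct(8.0, 2.53, 1.43, preemptions, 30.0)
    print(f"{preemptions} preemptions: {saving:.1f}% saved vs on-demand")
```

Shorter checkpoint intervals reduce the recomputed work per preemption, at the cost of more time spent writing checkpoints.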
Frequently Asked Questions
As of June 2026, H100 SXM5 on-demand starts at $2.53/hr on Spheron and H100 PCIe from $2.01/hr. Spot pricing on H100 SXM5 reaches as low as $1.43/hr. AWS P5 on-demand runs ~$6.88/hr per GPU. Prices are falling due to Blackwell (B200/B300) supply pressure displacing H100 to mid-tier status.
Blackwell GPU supply (B200 and B300) has ramped significantly in 2026, increasing the overall supply of high-end GPU compute. As B200 becomes the default for frontier inference, H100 shifts into a mid-tier role, pushing on-demand prices down on neo-cloud marketplaces. Hyperscaler prices (AWS, GCP, Azure) tend to lag neo-cloud reductions by 6-12 months.
The H100's direct successor in the Hopper line was the H200 (141 GB HBM3e, same compute die). The broader successor generation is Blackwell: B200 (192 GB HBM3e, 5th-gen Tensor Cores, native FP4) and B300 (288 GB HBM3e). The next generation, Rubin (R100), is expected in H2 2026 for the first cloud cohort.
At secondary-market H100 purchase prices of $25,000-$35,000 per GPU, the break-even point depends on the rental tier. At $1.43/hr SXM5 spot, that purchase range breaks even at roughly 17,500-24,500 hours of continuous use. At $2.01/hr PCIe on-demand, the same purchase range breaks even at roughly 12,400-17,400 hours. For a team using a GPU 8 hours per day, that is 4-8 years of operation before a purchase pays off. Hardware also depreciates as B200/B300 displace H100, so the financial case for renting is strong.
NVIDIA's Rubin R100 is targeted at H2 2026 for the first cloud cohort: AWS, Google Cloud, Azure, OCI, CoreWeave, Lambda, Nebius, and Nscale. Broader availability on marketplaces like Spheron is expected in 2027.
