RunPod lists H100 GPUs across three distinct pricing tiers: Community Cloud (community-hosted, variable), Secure Cloud (RunPod-operated, SLA-backed), and Serverless (scale-to-zero, per-second billing). Each tier targets a different tradeoff between cost, reliability, and latency. This post covers per-hour rates for each tier, the per-second serverless math, hidden storage fees, and a direct comparison against Spheron, Lambda Labs, and CoreWeave.
RunPod H100 Pricing: Three Tiers Explained
RunPod's tiered model puts different GPU inventory under different guarantees. Understanding which tier fits your workload matters before comparing raw $/hr figures.
Community Cloud connects you to third-party hosts who rent their hardware through RunPod's marketplace. These are the cheapest listings, often $1.80-$2.40/hr for H100 80GB, but availability is host-dependent and hardware condition varies. If a host goes offline, your pod can be interrupted with limited recourse.
Secure Cloud is RunPod's own data center capacity. Hardware is RunPod-owned, availability is more predictable, and uptime guarantees apply. Prices are higher: H100 80GB (PCIe) runs around $2.79/hr and SXM5 variants are approximately $2.69/hr. Most production inference deployments land here.
Serverless is not an always-on instance. You deploy a container template and RunPod spins up H100 pods on demand, billing per second while a worker is running, including its startup time. When traffic drops to zero, you pay zero. This is ideal for APIs with unpredictable or bursty traffic patterns.
| Tier | H100 Type | Price Range | Billing | Reliability |
|---|---|---|---|---|
| Community Cloud | H100 80GB (PCIe) | $1.80-$2.40/hr | Per minute | Variable (host-dependent) |
| Secure Cloud | H100 80GB (PCIe) | ~$2.79/hr | Per minute | SLA-backed |
| Secure Cloud | H100 SXM5 | ~$2.69/hr | Per minute | SLA-backed |
| Serverless | H100 (configured) | ~$0.00053/sec | Per second | Managed, cold-start applies |
RunPod H100 SXM5 vs PCIe Per-Hour Rates
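Most H100 listings on RunPod show "H100 80GB" without specifying the interconnect, and the majority are PCIe models. SXM5 variants with NVLink are listed separately; they offer higher memory bandwidth (3.35 TB/s vs 2 TB/s) and better multi-GPU scaling. SXM5 usually commands a premium for that, but in the Secure Cloud rates quoted here the SXM5 listing (~$2.69/hr) is actually slightly cheaper than PCIe (~$2.79/hr).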
| Variant | VRAM | Tier | Price/hr | Notes |
|---|---|---|---|---|
| H100 80GB PCIe | 80 GB HBM2e | Community Cloud | $1.80-$2.40 | Host-variable, may be interrupted |
| H100 80GB PCIe | 80 GB HBM2e | Secure Cloud | ~$2.79 | RunPod-operated, predictable |
| H100 SXM5 | 80 GB HBM3 | Secure Cloud | ~$2.69 | Higher bandwidth, better for multi-GPU |
| H100 80GB (Serverless) | 80 GB | Serverless | $0.00053/sec (~$1.91/hr at full use) | Scale-to-zero, cold-start billed |
For single-GPU inference where bandwidth headroom is not critical, the PCIe variant is fine. For distributed training or large model inference requiring multi-GPU NVLink bandwidth, the SXM5 variant is the right call.
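Because single-stream LLM decoding is typically memory-bandwidth-bound, one rough way to weigh the two variants is bandwidth bought per dollar. The sketch below uses only the spec bandwidths and Secure Cloud prices quoted above; `bandwidth_per_dollar` is a hypothetical helper, and the ratio is a heuristic, not a throughput benchmark.

```python
# Rough bandwidth-per-dollar heuristic for the two RunPod Secure Cloud H100 variants.
# Figures are the spec bandwidths and snapshot prices quoted in this post.
VARIANTS = {
    # name: (HBM bandwidth in TB/s, price in $/hr)
    "H100 SXM5 (Secure Cloud)": (3.35, 2.69),
    "H100 PCIe (Secure Cloud)": (2.00, 2.79),
}


def bandwidth_per_dollar(bandwidth_tbs: float, price_per_hr: float) -> float:
    """Return TB/s of memory bandwidth per $1/hr of rental cost."""
    return bandwidth_tbs / price_per_hr


for name, (bw, price) in VARIANTS.items():
    print(f"{name}: {bandwidth_per_dollar(bw, price):.3f} TB/s per $/hr")
```

At these snapshot prices, SXM5 buys roughly 1.7x the memory bandwidth per dollar of PCIe, which is why it is hard to justify PCIe on Secure Cloud unless SXM5 capacity is unavailable.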
RunPod Serverless GPU Pricing: Per-Second Billing and Cold-Start Math
RunPod Serverless charges $0.00053/sec for H100 access when a pod is running. At 100% utilization, this works out to $1.91/hr, which is cheaper than the PCIe Secure Cloud on-demand rate ($2.79/hr). The scale-to-zero mechanism means you pay nothing during idle periods. That sounds straightforward, but cold-start overhead changes the math significantly.
Every time a pod initializes (a cold start), RunPod bills the full per-second rate for the initialization period. H100 pods typically take 20-60 seconds to become ready, depending on image size and container startup time. That adds $0.011-$0.032 per cold start, which is negligible individually but accumulates at scale.
For inference APIs with highly variable traffic:
| Utilization Rate | Effective Hourly Rate | vs PCIe Secure Cloud ($2.79/hr) |
|---|---|---|
| 100% (continuous) | $1.91/hr | 32% cheaper |
| 80% | $1.53/hr | 45% cheaper |
| 50% | $0.95/hr | 66% cheaper |
| 20% | $0.38/hr | 86% cheaper |
| <10% with frequent cold-starts | Under $0.19/hr + cold-start cost | Still cheaper |
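The table follows directly from the per-second rate. A minimal sketch, assuming cold-start time is billed at the same rate and simply added on top of busy time; `effective_hourly` and `savings_vs_on_demand` are hypothetical helpers, and values are computed from the per-second rate, so they can differ from rounded figures by a cent.

```python
# Effective RunPod Serverless cost per wall-clock hour, using the rates quoted above.
RATE_PER_SEC = 0.00053        # Serverless H100 rate ($/sec)
SECURE_PCIE_PER_HR = 2.79     # Secure Cloud H100 PCIe on-demand ($/hr)


def effective_hourly(utilization: float, cold_starts_per_hr: float = 0.0,
                     cold_start_sec: float = 0.0) -> float:
    """Serverless cost for one wall-clock hour.

    utilization: fraction of the hour spent processing requests (0.0-1.0).
    Cold-start time is billed at the same per-second rate and, as a
    simplifying assumption, added on top of the busy time.
    """
    busy_sec = utilization * 3600
    init_sec = cold_starts_per_hr * cold_start_sec
    return (busy_sec + init_sec) * RATE_PER_SEC


def savings_vs_on_demand(hourly: float, on_demand: float = SECURE_PCIE_PER_HR) -> float:
    """Fractional saving versus an always-on on-demand pod."""
    return 1 - hourly / on_demand


for u in (1.0, 0.8, 0.5, 0.2):
    h = effective_hourly(u)
    print(f"{u:.0%} utilization: ${h:.2f}/hr, {savings_vs_on_demand(h):.0%} cheaper")

# Overhead of a single cold start at 20 s and 60 s
print(f"one cold start: ${20 * RATE_PER_SEC:.3f} to ${60 * RATE_PER_SEC:.3f}")
```

For example, `effective_hourly(0.05, cold_starts_per_hr=10, cold_start_sec=40)` models a mostly idle endpoint that cold-starts ten times an hour: the 400 seconds of initialization cost more than the 180 seconds of actual work, for about $0.31/hr in total.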
Serverless becomes less attractive when:
- Your API needs sub-100ms first-token latency (cold-start adds 20-60s of delay).
- You are running training jobs (continuous, so scale-to-zero saves nothing, and long stateful runs fit poorly with stateless, restartable pods).
- Your workload already runs at near-100% utilization around the clock (scale-to-zero has nothing left to save, and an always-on pod avoids cold-start risk).
For the inference throughput math behind token costs, see Ollama vs vLLM and KV cache optimization.
Hidden Costs on RunPod H100
The per-hour compute rate is not the total cost. RunPod charges separately for storage and network volumes:
- Network volumes: $0.10/GB/month for persistent NFS-backed storage accessible across pods.
- Container volumes: $0.10/GB/month for pod-local storage.
- Idle and stopped pods: A pod that is running but idle still bills at its full compute rate. A stopped (paused) pod may continue to bill at a reduced, storage-only rate for its volumes, depending on pod status.
- Cold-start billing: Serverless pods bill the full $0.00053/sec during initialization, before any work is processed.
- Community Cloud egress: Some hosts apply network transfer fees. The RunPod platform does not explicitly itemize this; check with individual hosts.
A realistic monthly bill for a team running 2x H100 SXM5 instances:
| Item | Rate | Monthly (2x H100 SXM5, 720 hrs) |
|---|---|---|
| Compute (2x @ $2.69/hr) | $2.69/hr/GPU | $3,873.60 |
| Network volumes (100 GB) | $0.10/GB/month | $10.00 |
| Container volumes (200 GB) | $0.10/GB/month | $20.00 |
| Total | | $3,903.60 |
Looking only at the $2.69/hr headline rate gives $3,873.60. Storage adds only ~0.8% in this example, but for storage-intensive pipelines (checkpointing, large datasets) it grows much faster.
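The bill above generalizes to a small calculator. A minimal sketch, assuming both volume types are billed for the full month at the $0.10/GB/month rates listed earlier; `monthly_bill` is a hypothetical helper, not a RunPod API.

```python
# Monthly RunPod bill: compute plus storage, using the rates quoted above.
HOURS_PER_MONTH = 720


def monthly_bill(gpus: int, rate_per_gpu_hr: float, network_gb: float,
                 container_gb: float, network_rate: float = 0.10,
                 container_rate: float = 0.10,
                 hours: float = HOURS_PER_MONTH) -> dict:
    """Return compute, storage and total cost ($) for one month."""
    compute = gpus * rate_per_gpu_hr * hours
    storage = network_gb * network_rate + container_gb * container_rate
    total = compute + storage
    return {
        "compute": compute,
        "storage": storage,
        "total": total,
        "storage_share": storage / total,  # fraction of the bill that is storage
    }


bill = monthly_bill(gpus=2, rate_per_gpu_hr=2.69, network_gb=100, container_gb=200)
print(f"compute ${bill['compute']:,.2f} + storage ${bill['storage']:,.2f} "
      f"= ${bill['total']:,.2f} ({bill['storage_share']:.1%} storage)")
```

Raising `network_gb` or `container_gb` to model a checkpoint-heavy pipeline shows how quickly the storage share climbs once volumes reach terabyte scale.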
RunPod H100 vs Spheron H100: Per-Hour and Per-Million-Token Cost
Spheron's live pricing for H100 as of 22 May 2026, fetched from the Spheron marketplace:
| Metric | RunPod Secure Cloud | Spheron On-Demand | Spheron Spot |
|---|---|---|---|
| H100 SXM5 $/hr | ~$2.69 | $3.90 | $1.66 |
| H100 PCIe $/hr | ~$2.79 | $2.09 | N/A |
| Billing granularity | Per minute | Per minute | Per minute |
| Storage fees | $0.10/GB/month | None (compute only) | None (compute only) |
| Min commitment | None | None | None |
For inference workloads, per-million-token cost matters more than raw $/hr. Using vLLM serving Llama 3.1 70B at approximately 1,200 tokens/sec of aggregate batched throughput on an H100 SXM5 (fitting 70B weights in 80 GB requires FP8 or lower precision):
Cost per million tokens = (1,000,000 tokens / 1,200 tokens/sec) / 3,600 sec/hr × price ($/hr)
| Provider | H100 SXM5 $/hr | Cost per Million Tokens |
|---|---|---|
| RunPod Secure Cloud | $2.69 | $0.623 |
| Spheron On-Demand | $3.90 | $0.903 |
| Spheron Spot | $1.66 | $0.384 |
| RunPod Serverless (100% util) | $1.91 | $0.442 |
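The formula is easy to rerun with your own throughput numbers. A minimal sketch, assuming a single sustained aggregate throughput figure; `cost_per_million_tokens` is a hypothetical helper, and the 1,200 tokens/sec value is the assumption stated above, not a measurement.

```python
# Cost per million generated tokens from an hourly price and measured throughput.
def cost_per_million_tokens(price_per_hr: float, tokens_per_sec: float) -> float:
    seconds = 1_000_000 / tokens_per_sec   # time to generate 1M tokens
    return seconds / 3600 * price_per_hr   # convert to hours, multiply by $/hr


THROUGHPUT = 1_200  # tokens/sec: the vLLM + Llama 3.1 70B figure assumed above

PRICES = {
    "RunPod Secure Cloud": 2.69,
    "Spheron On-Demand": 3.90,
    "Spheron Spot": 1.66,
    "RunPod Serverless (100% util)": 1.91,
}

for name, price in PRICES.items():
    print(f"{name}: ${cost_per_million_tokens(price, THROUGHPUT):.3f} per 1M tokens")
```

Cost scales inversely with throughput, so benchmarking your own model, batch size, and quantization matters as much as the hourly rate: at 800 tokens/sec, every figure rises by 50%.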
Spheron's on-demand H100 SXM5 at $3.90/hr is higher than RunPod Secure Cloud's $2.69/hr for the same variant. The comparison flips on PCIe: Spheron PCIe on-demand at $2.09/hr beats RunPod PCIe at $2.79/hr. On spot, Spheron SXM5 at $1.66/hr is the cheapest option in this comparison, dropping the per-million-token cost to $0.384 vs $0.623 at RunPod Secure Cloud. Check H100 on Spheron for current SXM5 and PCIe availability.
Pricing fluctuates based on GPU availability. The prices above are based on 22 May 2026 and may have changed. Check current GPU pricing for live rates.
RunPod vs Lambda vs CoreWeave H100 Pricing Side-by-Side
| Provider | H100 SXM On-Demand | H100 PCIe On-Demand | Spot/Interruptible | Billing | Notes |
|---|---|---|---|---|---|
| RunPod | ~$2.69/hr | ~$2.79/hr | Community Cloud ~$1.80-$2.40/hr | Per minute | Serverless also available at $0.00053/sec |
| Spheron | $3.90/hr | $2.09/hr | $1.66/hr (SXM5) | Per minute | Aggregates from 5+ providers, no storage markup |
| Lambda Labs | ~$2.49/hr | ~$2.49/hr | Not available | Per hour | Reservations often needed to guarantee capacity |
| CoreWeave | ~$2.09/hr | ~$2.09/hr | Limited | Per second | Contract pricing standard for large clusters |
Lambda Labs typically has strong availability but bills hourly, and guaranteed capacity often requires a reservation. CoreWeave's public rates are competitive, but sustained access for teams without contracts can be unpredictable. Spheron aggregates supply from 5+ providers, which generally keeps spot and on-demand availability higher than single-datacenter platforms.
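Billing granularity matters most for short or irregular jobs. A minimal sketch, assuming every started billing increment is charged in full (check each provider's actual rounding rules); the 2 h 10 min job, the shared $2.49/hr rate, and `billed_cost` are hypothetical, chosen only to isolate the granularity effect.

```python
import math


# Effect of billing granularity on a job's cost, holding the hourly rate fixed.
def billed_cost(runtime_sec: float, rate_per_hr: float, increment_sec: int) -> float:
    """Cost if every started billing increment is charged in full (assumption)."""
    increments = math.ceil(runtime_sec / increment_sec)
    return increments * increment_sec / 3600 * rate_per_hr


RUNTIME_SEC = 2 * 3600 + 10 * 60   # hypothetical 2 h 10 min job
RATE = 2.49                        # same $/hr for every case to isolate granularity

for label, increment in (("per second", 1), ("per minute", 60), ("per hour", 3600)):
    print(f"{label}: ${billed_cost(RUNTIME_SEC, RATE, increment):.2f}")
```

Under hourly billing this job is charged for three full hours ($7.47), about 38% more than the roughly $5.40 it costs with per-second or per-minute billing.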
For a wider market survey covering AWS, Azure, Lambda, CoreWeave, and other providers, see GPU cloud pricing 2026.
When RunPod Serverless Beats On-Demand (and When It Doesn't)
Serverless wins for workloads where idle time is significant:
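- Bursty inference APIs: If your API sits idle 30-70% of the time between bursts, scale-to-zero billing cuts the serverless bill by 30-70% compared with keeping the same capacity running continuously. Against the $2.79/hr PCIe on-demand rate, the effective saving is roughly 52-79% before cold-start overhead.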
- Prototyping and low-volume endpoints: Running an H100 inference endpoint for occasional queries at $2.79/hr is expensive. Serverless lets you pay only for the seconds you actually use.
- Event-driven pipelines: Batch jobs that run overnight or on a schedule benefit from not paying for idle daytime hours.
Serverless loses for:
- Sustained training runs: A 72-hour fine-tuning job runs continuously with no idle time. At 100% utilization, Serverless ($1.91/hr) is cheaper than Secure Cloud ($2.69/hr), but training frameworks often need persistent state that conflicts with stateless pod restarts.
- Low-latency production inference: If first-token latency must be under 500ms, a 20-60s cold start is a non-starter. On-demand with a warm pod is the only option.
- High-throughput continuous inference: At 85%+ sustained utilization, the difference between $1.91/hr and $2.69/hr matters less than infrastructure predictability. On-demand with reserved capacity scales more reliably.
| Workload | Best Pricing Model | Provider Pick |
|---|---|---|
| Bursty inference API (<50% utilization) | RunPod Serverless | RunPod |
| Production inference (low latency required) | On-demand PCIe | Spheron ($2.09/hr) |
| Training runs (24h+) | On-demand SXM5 or Spot | RunPod Secure (SXM5 on-demand) or Spheron Spot |
| Prototyping / dev endpoints | RunPod Serverless | RunPod |
| Batch inference (scheduled overnight) | Spot or Serverless | Spheron Spot |
| Multi-GPU distributed training | Spot SXM5 | Spheron Spot ($1.66/hr) |
Spheron H100 Spot Availability and Marketplace Pricing
Spheron's marketplace model aggregates H100 capacity from data center partners globally, which creates more spot inventory than a single-provider platform can typically offer. Spot pricing for H100 SXM5 sits at $1.66/hr, roughly 57% below the $3.90/hr on-demand rate for the same SKU. That gap can narrow when supply tightens.
Spot instances suit workloads that checkpoint regularly. Training jobs using tools like Determined AI or custom PyTorch checkpoint loops can tolerate interruption every few hours without losing significant progress. For these workloads, running spot and re-queuing on interruption is cheaper than paying on-demand rates continuously.
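The spot-vs-on-demand tradeoff can be estimated before launching. A rough sketch, assuming interruptions arrive at random so each one loses half a checkpoint interval on average, plus a fixed restart overhead; the interruption count, checkpoint interval, overhead, and both helper functions are hypothetical inputs, not Spheron figures. Only the two hourly rates come from this post.

```python
# Rough expected cost of a checkpointed training job on spot vs on-demand.
def expected_spot_cost(job_hours: float, spot_rate: float, interruptions: int,
                       checkpoint_interval_hr: float,
                       restart_overhead_hr: float) -> float:
    """Spot cost including work redone after interruptions.

    Assumes each interruption loses half a checkpoint interval of progress
    on average, plus a fixed re-queue/restart overhead, all billed at spot.
    """
    lost_per_interruption = checkpoint_interval_hr / 2 + restart_overhead_hr
    return (job_hours + interruptions * lost_per_interruption) * spot_rate


def break_even_interruptions(job_hours: float, spot_rate: float,
                             on_demand_rate: float,
                             lost_per_interruption_hr: float) -> float:
    """Interruptions at which spot stops being cheaper than on-demand."""
    return job_hours * (on_demand_rate / spot_rate - 1) / lost_per_interruption_hr


JOB_HOURS = 72.0                 # hypothetical job: 72 h of useful training
SPOT, ON_DEMAND = 1.66, 3.90     # Spheron H100 SXM5 rates quoted above ($/hr)

spot = expected_spot_cost(JOB_HOURS, SPOT, interruptions=6,
                          checkpoint_interval_hr=1.0, restart_overhead_hr=0.25)
print(f"on-demand ${JOB_HOURS * ON_DEMAND:.2f} vs spot ${spot:.2f}")
print(f"break-even: {break_even_interruptions(JOB_HOURS, SPOT, ON_DEMAND, 0.75):.0f} interruptions")
```

With hourly checkpoints, six interruptions add only 4.5 billed hours, and the job would need roughly 130 interruptions before spot cost matched on-demand. Longer checkpoint intervals raise the loss per interruption and lower that threshold.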
Spheron's marketplace also differs structurally from RunPod's Community Cloud: Spheron vets providers for Tier 2/3/4 data center compliance before listing their hardware. This is a different reliability tradeoff than RunPod Community Cloud, where hosts range from enterprise to individual operators.
Check Spheron spot GPU instances for current availability and pricing. For a detailed comparison including architecture differences, see Spheron vs RunPod.
RunPod H100 Secure Cloud starts at $2.69/hr. Spheron H100 PCIe starts at $2.09/hr on-demand, with SXM5 spot from $1.66/hr for fault-tolerant workloads. No storage markups, no cold-start billing, per-minute granularity.
Frequently Asked Questions
How much does an H100 cost on RunPod?
RunPod Secure Cloud charges approximately $2.69-$2.79/hr for H100 80GB instances, depending on SXM vs PCIe variant and availability. Community Cloud listings are cheaper (sometimes $1.80-$2.40/hr) but vary by host. Serverless H100 pricing is billed per second at approximately $0.00053/sec ($1.91/hr equivalent) plus cold-start overhead.
How does RunPod Serverless billing work for H100?
RunPod Serverless bills in per-second increments with a minimum execution floor. H100 Serverless runs around $0.00053/sec (roughly $1.91/hr equivalent when fully utilized). The catch is cold-start time: a typical H100 pod takes 20-60 seconds to initialize, billed at the full per-second rate. For workloads with bursty, unpredictable traffic and extended idle periods, Serverless can save a lot of money. For continuous or near-continuous inference loads, the savings narrow, and an always-on instance avoids cold starts and gives more predictable capacity.
Does RunPod charge extra for storage?
Yes. RunPod charges for network volumes ($0.10/GB/month), container volumes ($0.10/GB/month), and other persistent storage separately from compute. RunPod itself does not charge for data egress, but some Community Cloud hosts may apply network transfer fees. Always check the total bill, including storage, when comparing against other providers.
Is RunPod H100 capacity available in the EU?
RunPod has EU-based Secure Cloud regions including Norway, France, and the Netherlands. EU H100 availability is generally lower than in US regions, and prices may be 5-15% higher depending on supply. Community Cloud hosts in Europe are available but vary more in reliability and pricing.
Does RunPod offer volume discounts on H100?
RunPod does not publish a standard volume discount tier; enterprise pricing is available by contacting their sales team. Other platforms handle sustained multi-GPU commitments in a similar way: Spheron offers committed-use arrangements via direct contact, and CoreWeave prices reserved clusters through contracts.
