RunPod H100 Pricing 2026: Per-Hour and Serverless Cost vs Spheron

Written by Mitrasish, Co-founder, May 22, 2026

RunPod lists H100 GPUs across three distinct pricing tiers: Community Cloud (community-hosted, variable), Secure Cloud (RunPod-operated, SLA-backed), and Serverless (scale-to-zero, per-second billing). Each tier targets a different tradeoff between cost, reliability, and latency. This post covers exact per-hour rates for each tier, the per-second serverless math, hidden storage fees, and a direct comparison against Spheron, Lambda Labs, and CoreWeave.

RunPod H100 Pricing: Three Tiers Explained

RunPod's tiered model puts different GPU inventory under different guarantees. Understanding which tier fits your workload matters before comparing raw $/hr figures.

Community Cloud connects you to third-party hosts who rent their hardware through RunPod's marketplace. These are the cheapest listings, often $1.80-$2.40/hr for H100 80GB, but availability is host-dependent and hardware condition varies. If a host goes offline, your pod can be interrupted with limited recourse.

Secure Cloud is RunPod's own data center capacity. Hardware is RunPod-owned, availability is more predictable, and uptime guarantees apply. Prices are higher: H100 80GB (PCIe) runs around $2.79/hr and SXM5 variants are approximately $2.69/hr. Most production inference deployments land here.

Serverless is not an always-on instance. You deploy a container template and RunPod spins up H100 pods on demand, billing per second of active execution. When traffic drops to zero, you pay zero. This is ideal for APIs with unpredictable or bursty traffic patterns.

| Tier | H100 Type | Price Range | Billing | Reliability |
| --- | --- | --- | --- | --- |
| Community Cloud | H100 80GB (PCIe) | $1.80-$2.40/hr | Per minute | Variable (host-dependent) |
| Secure Cloud | H100 80GB (PCIe) | ~$2.79/hr | Per minute | SLA-backed |
| Secure Cloud | H100 SXM5 | ~$2.69/hr | Per minute | SLA-backed |
| Serverless | H100 (configured) | ~$0.00053/sec | Per second | Managed, cold-start applies |

RunPod H100 SXM5 vs PCIe Per-Hour Rates

Most H100 listings on RunPod show "H100 80GB" without specifying interconnect. The majority are PCIe models. SXM5 variants with NVLink are listed separately. They offer higher memory bandwidth (3.35 TB/s vs 2 TB/s) and better multi-GPU scaling, which would normally command a premium, although at the Secure Cloud rates quoted here the SXM5 (~$2.69/hr) is listed slightly below the PCIe card (~$2.79/hr).

| Variant | VRAM | Tier | Price/hr | Notes |
| --- | --- | --- | --- | --- |
| H100 80GB PCIe | 80 GB HBM2e | Community Cloud | $1.80-$2.40 | Host-variable, may be interrupted |
| H100 80GB PCIe | 80 GB HBM2e | Secure Cloud | ~$2.79 | RunPod-operated, predictable |
| H100 SXM5 | 80 GB HBM3 | Secure Cloud | ~$2.69 | Higher bandwidth, better for multi-GPU |
| H100 80GB (Serverless) | 80 GB | Serverless | $0.00053/sec | Scale-to-zero, cold-start billed |

For single-GPU inference where bandwidth headroom is not critical, the PCIe variant is fine. For distributed training or large model inference requiring multi-GPU NVLink bandwidth, the SXM5 variant is the right call.

RunPod Serverless GPU Pricing: Per-Second Billing and Cold-Start Math

RunPod Serverless charges $0.00053/sec for H100 access when a pod is running. At 100% utilization, this works out to $1.91/hr, which is cheaper than the PCIe Secure Cloud on-demand rate ($2.79/hr). The scale-to-zero mechanism means you pay nothing during idle periods. That sounds straightforward, but cold-start overhead changes the math significantly.

Every time a pod initializes (a cold start), RunPod bills the full per-second rate for the initialization period. H100 pods typically take 20-60 seconds to become ready, depending on image size and container startup time. That adds $0.011-$0.032 per cold start, which is negligible individually but accumulates at scale.
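The cold-start arithmetic above can be sketched in a few lines. This is a rough illustration, not RunPod's billing logic: it assumes the $0.00053/sec rate quoted in this post applies for the whole initialization window, and the function names and the 500-cold-starts-per-day figure are hypothetical.

```python
# Rate quoted in this post for H100 Serverless; treat it as an assumption.
SERVERLESS_RATE_PER_SEC = 0.00053


def cold_start_cost(init_seconds: float,
                    rate_per_sec: float = SERVERLESS_RATE_PER_SEC) -> float:
    """Dollars billed for a single cold start lasting init_seconds."""
    return init_seconds * rate_per_sec


def monthly_cold_start_cost(cold_starts_per_day: int,
                            init_seconds: float,
                            days: int = 30) -> float:
    """Dollars billed per month for initialization time alone."""
    return cold_starts_per_day * days * cold_start_cost(init_seconds)


# The 20-60 second range from the text.
for secs in (20, 60):
    print(f"{secs}s cold start: ${cold_start_cost(secs):.4f}")

# Hypothetical endpoint: 500 cold starts per day at 40 seconds each.
print(f"monthly overhead: ${monthly_cold_start_cost(500, 40):.2f}")
```

At a few cold starts per day the overhead is noise. At hundreds per day it becomes a line item worth tracking next to compute.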

For inference APIs with highly variable traffic:

| Utilization Rate | Effective Hourly Rate | vs PCIe Secure Cloud ($2.79/hr) |
| --- | --- | --- |
| 100% (continuous) | $1.91/hr | 32% cheaper |
| 80% | $1.53/hr | 45% cheaper |
| 50% | $0.96/hr | 66% cheaper |
| 20% | $0.38/hr | 86% cheaper |
| <10% with frequent cold-starts | $0.10-$0.25/hr + cold-start cost | Still cheaper |
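The utilization figures in the table can be reproduced with the sketch below. It assumes the serverless rate and the PCIe Secure Cloud rate quoted in this post, ignores cold-start overhead, and uses hypothetical function names.

```python
SERVERLESS_RATE_PER_SEC = 0.00053   # quoted serverless H100 rate (assumption)
SECURE_PCIE_PER_HR = 2.79           # quoted PCIe Secure Cloud rate (assumption)


def effective_hourly_rate(utilization: float,
                          rate_per_sec: float = SERVERLESS_RATE_PER_SEC) -> float:
    """Average $/hr when the pod is active for `utilization` of each hour."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return utilization * rate_per_sec * 3600


def savings_vs_on_demand(utilization: float,
                         on_demand_per_hr: float = SECURE_PCIE_PER_HR) -> float:
    """Fraction saved relative to an always-on instance at on_demand_per_hr."""
    return 1 - effective_hourly_rate(utilization) / on_demand_per_hr


for u in (1.0, 0.8, 0.5, 0.2):
    rate = effective_hourly_rate(u)
    saved = savings_vs_on_demand(u) * 100
    print(f"{u:.0%} utilization: ${rate:.2f}/hr, {saved:.0f}% cheaper")
```

Because this starts from the unrounded per-second rate rather than the rounded $1.91/hr, individual rows can differ from the table by about a cent.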

Serverless becomes less attractive when:

  • Your API needs sub-100ms first-token latency (cold-start adds 20-60s of delay).
  • You are running training jobs (continuous, no idle time, cold-starts irrelevant).
  • Your workload is already at near-100% utilization around the clock.

For the inference throughput math behind token costs, see Ollama vs vLLM and KV cache optimization.

Hidden Costs on RunPod H100

The per-hour compute rate is not the total cost. RunPod charges separately for storage and network volumes:

  • Network volumes: $0.10/GB/month for persistent NFS-backed storage accessible across pods.
  • Container volumes: $0.10/GB/month for pod-local storage.
  • Idle pod billing: A pod that is paused rather than terminated may still be billed at a reduced storage-only rate, depending on pod status.
  • Cold-start billing: Serverless pods bill the full $0.00053/sec during initialization, before any work is processed.
  • Community Cloud egress: Some hosts apply network transfer fees. The RunPod platform does not explicitly itemize this; check with individual hosts.

A realistic monthly bill for a team running 2x H100 SXM5 instances:

| Item | Rate | Monthly (2x H100 SXM5, 720 hrs) |
| --- | --- | --- |
| Compute (2x @ $2.69/hr) | $2.69/hr/GPU | $3,873.60 |
| Network volumes (100 GB) | $0.10/GB/month | $10.00 |
| Container volumes (200 GB) | $0.10/GB/month | $20.00 |
| Total | | $3,903.60 |

Compared with the compute-only figure of $3,873.60 implied by the $2.69/hr headline rate, storage adds only about 0.8% in this example. For storage-intensive pipelines (checkpointing, large datasets), that share grows quickly.
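To rerun this estimate with your own GPU count and storage footprint, a minimal sketch follows. The rates are the ones quoted in this post, the 720-hour month matches the example, and the function name and return shape are hypothetical.

```python
def monthly_bill(gpus: int,
                 gpu_rate_per_hr: float,
                 network_gb: float,
                 container_gb: float,
                 storage_rate_per_gb: float = 0.10,
                 hours: int = 720) -> dict:
    """Break a monthly bill into compute and storage, using this post's rates."""
    compute = gpus * gpu_rate_per_hr * hours
    storage = (network_gb + container_gb) * storage_rate_per_gb
    return {
        "compute": compute,
        "storage": storage,
        "total": compute + storage,
        # Storage as a fraction of the compute-only headline figure.
        "storage_share": storage / compute,
    }


# The example from the text: 2x H100 SXM5, 100 GB network, 200 GB container.
bill = monthly_bill(gpus=2, gpu_rate_per_hr=2.69, network_gb=100, container_gb=200)
print(f"compute ${bill['compute']:,.2f}  storage ${bill['storage']:,.2f}  "
      f"total ${bill['total']:,.2f}  (+{bill['storage_share']:.1%})")
```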

RunPod H100 vs Spheron H100: Per-Hour and Per-Million-Token Cost

Spheron's live pricing for H100 as of 22 May 2026, fetched from the Spheron marketplace:

| Metric | RunPod Secure Cloud | Spheron On-Demand | Spheron Spot |
| --- | --- | --- | --- |
| H100 SXM5 $/hr | ~$2.69 | $3.90 | $1.66 |
| H100 PCIe $/hr | ~$2.79 | $2.09 | N/A |
| Billing granularity | Per minute | Per minute | Per minute |
| Storage fees | $0.10/GB/month | None (compute only) | None (compute only) |
| Min commitment | None | None | None |

For inference workloads, per-million-token cost matters more than raw $/hr. Using vLLM serving Llama 3.1 70B at approximately 1,200 tokens/sec on an H100 SXM5:

Cost per million tokens = (1,000,000 tokens / 1,200 tokens/sec) / 3,600 sec/hr × price ($/hr)

| Provider | H100 SXM5 $/hr | Cost per Million Tokens |
| --- | --- | --- |
| RunPod Secure Cloud | $2.69 | $0.623 |
| Spheron On-Demand | $3.90 | $0.903 |
| Spheron Spot | $1.66 | $0.384 |
| RunPod Serverless (100% util) | $1.91 | $0.442 |
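The per-million-token formula is easy to rerun for a different throughput or price. The sketch below uses the approximate 1,200 tokens/sec figure from this post; actual throughput depends on model, batch size, and serving configuration, and the function name is hypothetical.

```python
def cost_per_million_tokens(price_per_hr: float, tokens_per_sec: float) -> float:
    """Dollars to generate one million tokens at a sustained throughput."""
    seconds_needed = 1_000_000 / tokens_per_sec
    hours_needed = seconds_needed / 3600
    return hours_needed * price_per_hr


THROUGHPUT = 1200  # tokens/sec, the approximate figure used in this post

# Hourly prices as quoted in this post.
prices = {
    "RunPod Secure Cloud": 2.69,
    "Spheron On-Demand": 3.90,
    "Spheron Spot": 1.66,
}

for provider, price in prices.items():
    cost = cost_per_million_tokens(price, THROUGHPUT)
    print(f"{provider}: ${cost:.3f} per million tokens")
```

Since cost scales inversely with throughput, doubling tokens/sec through batching or quantization halves every figure in the table, which can matter more than the choice of provider.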

Spheron's on-demand H100 SXM5 at $3.90/hr is higher than RunPod Secure Cloud's $2.69/hr for the same variant. The comparison flips on PCIe: Spheron PCIe on-demand at $2.09/hr beats RunPod PCIe at $2.79/hr. On spot, Spheron SXM5 at $1.66/hr is the cheapest option in this comparison, dropping the per-million-token cost to $0.384 vs $0.623 at RunPod Secure Cloud. Check H100 on Spheron for current SXM5 and PCIe availability.

Pricing fluctuates based on GPU availability. The prices above are based on 22 May 2026 and may have changed. Check current GPU pricing for live rates.

RunPod vs Lambda vs CoreWeave H100 Pricing Side-by-Side

| Provider | H100 SXM On-Demand | H100 PCIe On-Demand | Spot/Interruptible | Billing | Notes |
| --- | --- | --- | --- | --- | --- |
| RunPod | ~$2.69/hr | ~$2.79/hr | Community Cloud ~$1.80-$2.40/hr | Per minute | Serverless also available at $0.00053/sec |
| Spheron | $3.90/hr | $2.09/hr | $1.66/hr (SXM5) | Per minute | Aggregates from 5+ providers, no storage markup |
| Lambda Labs | ~$2.49/hr | ~$2.49/hr | Not available | Per hour | On-demand reservations often required for availability |
| CoreWeave | ~$2.09/hr | ~$2.09/hr | Limited | Per second | Contract pricing standard for large clusters |

Lambda Labs typically has strong availability but bills hourly and often requires on-demand reservations to guarantee capacity. CoreWeave's public rates are competitive, but sustained access for teams without contracts can be unpredictable. Spheron aggregates supply from 5+ providers, which generally keeps spot and on-demand availability higher than single-datacenter platforms.
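Billing granularity matters most for short jobs. The sketch below estimates what a job costs under per-hour, per-minute, and per-second billing. It assumes usage is rounded up to the next full billing increment, which this post does not confirm for any provider, so check each provider's terms. The function name is hypothetical and the rates are the ones in the table above.

```python
import math


def billed_cost(duration_sec: float, rate_per_hr: float, granularity_sec: int) -> float:
    """Cost of a job when usage is rounded UP to whole billing increments.

    The round-up rule is an assumption made for illustration.
    """
    increments = math.ceil(duration_sec / granularity_sec)
    return increments * granularity_sec / 3600 * rate_per_hr


JOB_SECONDS = 10 * 60  # a hypothetical 10-minute batch job

# (provider, quoted $/hr, billing increment in seconds)
options = [
    ("Lambda Labs (per hour)", 2.49, 3600),
    ("RunPod Secure SXM5 (per minute)", 2.69, 60),
    ("CoreWeave (per second)", 2.09, 1),
]

for name, rate, granularity in options:
    print(f"{name}: ${billed_cost(JOB_SECONDS, rate, granularity):.2f}")
```

Under that assumption, a 10-minute job on hourly billing pays for the full hour, so the lowest headline rate is not always the lowest bill for short or bursty work.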

For a wider market survey covering AWS, Azure, Lambda, CoreWeave, and other providers, see GPU cloud pricing 2026.

When RunPod Serverless Beats On-Demand (and When It Doesn't)

Serverless wins for workloads where idle time is significant:

  • Bursty inference APIs: If your API sees traffic spikes with 30-70% idle time between bursts, the scale-to-zero billing cuts your effective rate by 30-70% vs on-demand.
  • Prototyping and low-volume endpoints: Running an H100 inference endpoint for occasional queries at $2.79/hr is expensive. Serverless lets you pay only for the seconds you actually use.
  • Event-driven pipelines: Batch jobs that run overnight or on a schedule benefit from not paying for idle daytime hours.

Serverless loses for:

  • Sustained training runs: A 72-hour fine-tuning job runs continuously with no idle time. At 100% utilization, Serverless ($1.91/hr) is cheaper than Secure Cloud ($2.69/hr), but training frameworks often need persistent state that conflicts with stateless pod restarts.
  • Low-latency production inference: If first-token latency must be under 500ms, a 20-60s cold start is a non-starter. On-demand with a warm pod is the only option.
  • High-throughput continuous inference: At 85%+ sustained utilization, the difference between $1.91/hr and $2.69/hr matters less than infrastructure predictability. On-demand with reserved capacity scales more reliably.

| Workload | Best Pricing Model | Provider Pick |
| --- | --- | --- |
| Bursty inference API (<50% utilization) | RunPod Serverless | RunPod |
| Production inference (low latency required) | On-demand PCIe | Spheron ($2.09/hr) |
| Training runs (24h+) | On-demand SXM5 or Spot | RunPod Secure (SXM5 on-demand) or Spheron Spot |
| Prototyping / dev endpoints | RunPod Serverless | RunPod |
| Batch inference (scheduled overnight) | Spot or Serverless | Spheron Spot |
| Multi-GPU distributed training | Spot SXM5 | Spheron Spot ($1.66/hr) |

Spheron H100 Spot Availability and Marketplace Pricing

Spheron's marketplace model aggregates H100 capacity from data center partners globally, which creates more spot inventory than a single-provider platform can typically offer. Spot pricing for H100 SXM5 sits at $1.66/hr, roughly 57% below the $3.90/hr on-demand rate for the same SKU. That gap can narrow when supply tightens.

Spot instances suit workloads that checkpoint regularly. Training jobs using tools like Determined AI or custom PyTorch checkpoint loops can tolerate interruption every few hours without losing significant progress. For these workloads, running spot and re-queuing on interruption is cheaper than paying on-demand rates continuously.
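A checkpoint-and-resume loop of the kind described above might look like the following. This is a framework-agnostic sketch using only the standard library: it assumes the training state fits in a JSON-serializable dict, and the `stop_after` parameter is a hypothetical stand-in for a spot interruption. In a PyTorch job the same pattern would wrap `torch.save` and `torch.load` instead of JSON.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so an interruption mid-write
    never corrupts the last good checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic rename within the same directory


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"step": 0, "loss": None}
    with open(path) as f:
        return json.load(f)


def train(path: str, total_steps: int, checkpoint_every: int, stop_after=None) -> int:
    """Resume from the last checkpoint and run until done or interrupted.

    stop_after simulates a spot interruption after that many steps in this run.
    Returns the step reached in memory, which may be ahead of the checkpoint.
    """
    state = load_checkpoint(path)
    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            break  # interrupted: work since the last checkpoint is lost
        state["step"] += 1
        state["loss"] = 1.0 / state["step"]  # placeholder for a real training step
        steps_this_run += 1
        if state["step"] % checkpoint_every == 0 or state["step"] == total_steps:
            save_checkpoint(path, state)
    return state["step"]
```

The checkpoint interval sets the worst-case loss per interruption: with a save every 10 steps, an interruption at step 25 resumes from step 20. Shorter intervals waste less compute on re-runs but spend more time writing to storage.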

Spheron's marketplace also differs structurally from RunPod's Community Cloud: Spheron vets providers for Tier 2/3/4 data center compliance before listing their hardware. This is a different reliability tradeoff than RunPod Community Cloud, where hosts range from enterprise to individual operators.

Check Spheron spot GPU instances for current availability and pricing. For a detailed comparison including architecture differences, see Spheron vs RunPod.

RunPod H100 Secure Cloud starts at $2.69/hr. Spheron H100 PCIe starts at $2.09/hr on-demand, with SXM5 spot from $1.66/hr for fault-tolerant workloads. No storage markups, no cold-start billing, per-minute granularity.


Frequently Asked Questions

How much does a RunPod H100 cost per hour?

RunPod Secure Cloud charges approximately $2.69-$2.79/hr for H100 80GB instances, depending on SXM vs PCIe variant and availability. Community Cloud listings are cheaper (sometimes $1.80-$2.40/hr) but vary by host. Serverless H100 pricing is billed per second at approximately $0.00053/sec ($1.91/hr equivalent) plus cold-start overhead.

How does RunPod Serverless H100 pricing work?

RunPod Serverless bills in per-second increments with a minimum execution floor. H100 Serverless runs around $0.00053/sec (roughly $1.91/hr equivalent when fully utilized). The catch is cold-start time: a typical H100 pod takes 20-60 seconds to initialize, billed at the full per-second rate. For workloads with bursty, unpredictable traffic and extended idle periods, Serverless can save money. For continuous or near-continuous inference loads, the per-second rate still works out below the Secure Cloud on-demand rates quoted here, but on-demand instances avoid cold starts and offer more predictable capacity.

Does RunPod charge for storage on top of compute?

Yes. RunPod charges for network volumes ($0.10/GB/month), container volumes ($0.10/GB/month), and persistent storage separately from compute. Data egress is not charged by RunPod itself, but some community hosts may apply their own network transfer fees. Always check the total bill including storage when comparing to other providers.

Does RunPod offer H100 GPUs in Europe?

RunPod has EU-based Secure Cloud regions including Norway, France, and the Netherlands. EU H100 availability is generally lower than US regions, and prices may be 5-15% higher depending on supply. Community Cloud hosts in Europe are available but vary more in reliability and pricing.

Does RunPod offer volume discounts on H100s?

RunPod does not publish a standard volume discount tier. Enterprise pricing is available by contacting their sales team. For sustained multi-GPU workloads, platforms like Spheron and CoreWeave are more transparent about volume pricing: Spheron offers committed-use arrangements via direct contact, and CoreWeave prices reserved clusters through contracts.
