RunPod H100 Pricing 2026: Per-Hour and Serverless Cost vs Spheron

Q: How much does RunPod charge per hour for an H100?

RunPod Secure Cloud charges approximately $2.89-$3.29/hr for H100 80GB instances, depending on SXM vs PCIe variant and availability. Community Cloud listings are cheaper (sometimes $1.80-$2.40/hr) but vary by host. Serverless H100 pricing is billed per second at approximately $0.00053/sec ($1.91/hr equivalent) plus cold-start overhead.

Q: What is RunPod Serverless GPU pricing for H100?

RunPod Serverless bills in per-second increments with a minimum execution floor. H100 Serverless runs around $0.00053/sec (roughly $1.91/hr equivalent when fully utilized). The catch is cold-start time: a typical H100 pod takes 20-60 seconds to initialize, billed at the full per-second rate. For workloads with bursty, unpredictable traffic and extended idle periods, Serverless can save money. For continuous or near-continuous inference loads, the scale-to-zero advantage disappears, and an on-demand instance with a warm pod is usually the better fit.

Q: Does RunPod charge for storage and egress on H100 instances?

Yes. RunPod charges for network volumes ($0.10/GB/month), container volumes ($0.10/GB/month), and persistent storage separate from compute. Data egress is not charged by RunPod itself, but underlying network usage at some community hosts may apply. Always check the total bill including storage when comparing to other providers.

Q: Does RunPod offer H100 in the EU?

RunPod has EU-based Secure Cloud regions including Norway, France, and the Netherlands. EU H100 availability is generally lower than US regions, and prices may be 5-15% higher depending on supply. Community Cloud hosts in Europe are available but vary more in reliability and pricing.

Q: Can I get a volume discount on RunPod H100?

RunPod does not publish a standard volume discount tier. Enterprise pricing is available by contacting their sales team. For sustained multi-GPU workloads, platforms like Spheron and CoreWeave are more transparent about volume pricing: Spheron offers committed-use arrangements via direct contact, and CoreWeave prices reserved clusters through contracts.

RunPod lists H100 GPUs across three distinct pricing tiers: Community Cloud (community-hosted, variable), Secure Cloud (RunPod-operated, SLA-backed), and Serverless (scale-to-zero, per-second billing). Each tier targets a different tradeoff between cost, reliability, and latency. This post covers exact per-hour rates for each tier, the per-second serverless math, hidden storage fees, and a direct comparison against Spheron, Lambda Labs, and CoreWeave.

RunPod H100 Pricing: Three Tiers Explained

RunPod's tiered model puts different GPU inventory under different guarantees. Understanding which tier fits your workload matters before comparing raw $/hr figures.

Community Cloud connects you to third-party hosts who rent their hardware through RunPod's marketplace. These are the cheapest listings, often $1.80-$2.40/hr for H100 80GB, but availability is host-dependent and hardware condition varies. If a host goes offline, your pod can be interrupted with limited recourse.

Secure Cloud is RunPod's own data center capacity. Hardware is RunPod-owned, availability is more predictable, and uptime guarantees apply. Prices are higher: H100 80GB (PCIe) runs around $2.89/hr and SXM5 variants are approximately $3.29/hr. Most production inference deployments land here.

Serverless is not an always-on instance. You deploy a container template and RunPod spins up H100 pods on demand, billing per second of active execution. When traffic drops to zero, you pay zero. This is ideal for APIs with unpredictable or bursty traffic patterns.

Tier	H100 Type	Price Range	Billing	Reliability
Community Cloud	H100 80GB (PCIe)	$1.80-$2.40/hr	Per minute	Variable (host-dependent)
Secure Cloud	H100 80GB (PCIe)	~$2.89/hr	Per minute	SLA-backed
Secure Cloud	H100 SXM5	~$3.29/hr	Per minute	SLA-backed
Serverless	H100 (configured)	~$0.00053/sec	Per second	Managed, cold-start applies

RunPod H100 SXM5 vs PCIe Per-Hour Rates

Most H100 listings on RunPod show "H100 80GB" without specifying interconnect. The majority are PCIe models. SXM5 variants with NVLink are listed separately and typically carry a small premium due to higher memory bandwidth (3.35 TB/s vs 2 TB/s) and better multi-GPU scaling.

Variant	VRAM	Tier	Price/hr	Notes
H100 80GB PCIe	80 GB HBM2e	Community Cloud	$1.80-$2.40	Host-variable, may be interrupted
H100 80GB PCIe	80 GB HBM2e	Secure Cloud	~$2.89	RunPod-operated, predictable
H100 SXM5	80 GB HBM3	Secure Cloud	~$3.29	Higher bandwidth, better for multi-GPU
H100 80GB (Serverless)	80 GB	Serverless	$0.00053/sec	Scale-to-zero, cold-start billed

For single-GPU inference where bandwidth headroom is not critical, the PCIe variant is fine. For distributed training or large model inference requiring multi-GPU NVLink bandwidth, the SXM5 variant is the right call.

RunPod Serverless GPU Pricing: Per-Second Billing and Cold-Start Math

RunPod Serverless charges $0.00053/sec for H100 access when a pod is running. At 100% utilization, this works out to $1.91/hr, which is cheaper than the PCIe Secure Cloud on-demand rate ($2.89/hr). The scale-to-zero mechanism means you pay nothing during idle periods. That sounds straightforward, but cold-start overhead changes the math significantly.

Every time a pod initializes (a cold start), RunPod bills the full per-second rate for the initialization period. H100 pods typically take 20-60 seconds to become ready, depending on image size and container startup time. That adds $0.011-$0.032 per cold start, which is negligible individually but accumulates at scale.
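The cold-start arithmetic above can be sketched in a few lines. This is a minimal illustration, assuming the approximate $0.00053/sec rate quoted in this post; the function names and the example traffic pattern (500 cold starts per day at 40 seconds each) are hypothetical and not taken from RunPod's documentation.

```python
# Approximate H100 Serverless rate quoted in this post (USD per second).
SERVERLESS_RATE_PER_SEC = 0.00053


def cold_start_cost(init_seconds: float,
                    rate_per_sec: float = SERVERLESS_RATE_PER_SEC) -> float:
    """Cost in USD of one cold start billed at the full per-second rate."""
    return init_seconds * rate_per_sec


def monthly_cold_start_cost(cold_starts_per_day: int,
                            init_seconds: float,
                            days: int = 30) -> float:
    """Accumulated cold-start cost over a month for an assumed traffic pattern."""
    return cold_starts_per_day * days * cold_start_cost(init_seconds)


if __name__ == "__main__":
    # The 20-60 second range from the text.
    print(f"20s cold start: ${cold_start_cost(20):.4f}")
    print(f"60s cold start: ${cold_start_cost(60):.4f}")
    # Hypothetical endpoint: 500 cold starts per day, 40 seconds each.
    print(f"Monthly overhead: ${monthly_cold_start_cost(500, 40):.2f}")
```

Under those assumed numbers the overhead comes to a few hundred dollars a month, which is why image size and startup time are worth trimming on endpoints that scale to zero often.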

For inference APIs with highly variable traffic:

Utilization Rate	Effective Hourly Rate	vs PCIe Secure Cloud ($2.89/hr)
100% (continuous)	$1.91/hr	34% cheaper
80%	$1.53/hr	47% cheaper
50%	$0.96/hr	67% cheaper
20%	$0.38/hr	87% cheaper
<10% with frequent cold-starts	under $0.19/hr + cold-start cost	Still cheaper
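The utilization rows above come from one multiplication: the per-second rate times the fraction of each hour the pod is active. The sketch below reproduces that calculation. It takes the on-demand rate as an input so you can substitute the current Secure Cloud price, and it deliberately ignores cold-start seconds. The function names are illustrative only.

```python
def effective_hourly_rate(utilization: float,
                          rate_per_sec: float = 0.00053) -> float:
    """Average serverless spend per wall-clock hour at a given active fraction (0.0-1.0)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return utilization * rate_per_sec * 3600


def savings_vs_on_demand(utilization: float, on_demand_per_hour: float) -> float:
    """Fractional saving versus an always-on instance; negative means serverless costs more."""
    return 1.0 - effective_hourly_rate(utilization) / on_demand_per_hour


if __name__ == "__main__":
    on_demand = 2.89  # assumed PCIe Secure Cloud rate; replace with the live price
    for u in (1.0, 0.8, 0.5, 0.2):
        rate = effective_hourly_rate(u)
        saving = savings_vs_on_demand(u, on_demand)
        print(f"{u:>4.0%}  ${rate:.2f}/hr  {saving:.0%} cheaper")
```

To account for cold starts, add the billed initialization seconds to the active time before computing the utilization fraction.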

Serverless becomes less attractive when:

Your API needs sub-100ms first-token latency (cold-start adds 20-60s of delay).
You are running training jobs (continuous, no idle time, cold-starts irrelevant).
Your workload is already at near-100% utilization around the clock.

For the inference throughput math behind token costs, see Ollama vs vLLM and KV cache optimization.

Hidden Costs on RunPod H100

The per-hour compute rate is not the total cost. RunPod charges separately for storage and network volumes:

Network volumes: $0.10/GB/month for persistent NFS-backed storage accessible across pods.
Container volumes: $0.10/GB/month for pod-local storage.
Idle pod billing: If you stop a pod without terminating it, RunPod may still bill at a reduced storage-only rate depending on pod status. A pod that is left running but idle continues to bill at the full compute rate.
Cold-start billing: Serverless pods bill the full $0.00053/sec during initialization, before any work is processed.
Community Cloud egress: Some hosts apply network transfer fees. The RunPod platform does not explicitly itemize this; check with individual hosts.

A realistic monthly bill for a team running 2x H100 SXM5 instances:

Item	Rate	Monthly (2x H100 SXM5, 720 hrs)
Compute (2x @ $3.29/hr)	$3.29/hr/GPU	$4,737.60
Network volumes (100 GB)	$0.10/GB/month	$10.00
Container volumes (200 GB)	$0.10/GB/month	$20.00
Total	$4,767.60

Compared with the $3.29/hr headline rate alone ($4,737.60), storage adds only ~0.6% in this example, but for storage-intensive pipelines (checkpointing, large datasets), it adds up faster.
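To see how quickly storage can grow as a share of the bill, the same calculation can be parameterized. This is a rough sketch: `monthly_bill` is a hypothetical helper, the $0.10/GB/month storage rate and the ~$3.29/hr SXM5 Secure Cloud rate are the approximate figures quoted earlier in this post, and the 5,000 GB volume is an assumed storage-heavy scenario rather than a measured one.

```python
def monthly_bill(gpu_count: int,
                 rate_per_gpu_hour: float,
                 network_gb: float,
                 container_gb: float,
                 hours: float = 720,
                 storage_per_gb_month: float = 0.10) -> dict:
    """Split a monthly bill into compute and storage and report the storage share."""
    compute = gpu_count * rate_per_gpu_hour * hours
    storage = (network_gb + container_gb) * storage_per_gb_month
    total = compute + storage
    return {
        "compute": compute,
        "storage": storage,
        "total": total,
        "storage_share": storage / total,
    }


if __name__ == "__main__":
    # Assumed storage-heavy pipeline: 2 GPUs, 5,000 GB of checkpoints and datasets.
    bill = monthly_bill(gpu_count=2, rate_per_gpu_hour=3.29,
                        network_gb=5000, container_gb=0)
    for key in ("compute", "storage", "total"):
        print(f"{key:>8}: ${bill[key]:,.2f}")
    print(f"storage share: {bill['storage_share']:.1%}")
```

Pass whichever per-GPU rate and volume sizes match your deployment; the point is that the storage share scales with data volume, not with GPU hours.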

RunPod H100 vs Spheron H100: Per-Hour and Per-Million-Token Cost

Spheron's live pricing for H100 as of 22 May 2026, fetched from the Spheron GPU rental marketplace:

Metric	RunPod Secure Cloud	Spheron On-Demand	Spheron Spot
H100 SXM5 $/hr	~$3.29	$3.90	$1.66
H100 PCIe $/hr	~$2.89	$2.09	N/A
Billing granularity	Per minute	Per minute	Per minute
Storage fees	$0.10/GB/month	None (compute only)	None (compute only)
Min commitment	None	None	None

For inference workloads, per-million-token cost matters more than raw $/hr. Using vLLM serving Llama 3.1 70B at approximately 1,200 tokens/sec on an H100 SXM5:

Cost per million tokens = (1,000,000 tokens / 1,200 tokens/sec) / 3,600 sec/hr × price per hour

Provider	H100 SXM5 $/hr	Cost per Million Tokens
RunPod Secure Cloud	$3.29	$0.762
Spheron On-Demand	$3.90	$0.903
Spheron Spot	$1.66	$0.384
RunPod Serverless (100% util)	$1.91	$0.442
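The per-million-token figures follow directly from the formula above. Here is a small sketch that recomputes them, assuming the same ~1,200 tokens/sec throughput; the function name is illustrative, and real throughput depends on batch size, sequence length, and quantization.

```python
def cost_per_million_tokens(price_per_hour: float, tokens_per_sec: float) -> float:
    """USD to generate one million tokens at a sustained throughput."""
    seconds_needed = 1_000_000 / tokens_per_sec
    return seconds_needed / 3600 * price_per_hour


if __name__ == "__main__":
    throughput = 1200  # tokens/sec, the approximate figure assumed in this post
    hourly_rates = {
        "RunPod Secure Cloud": 3.29,
        "Spheron On-Demand": 3.90,
        "Spheron Spot": 1.66,
        "RunPod Serverless (100% util)": 1.91,
    }
    for provider, price in hourly_rates.items():
        print(f"{provider}: ${cost_per_million_tokens(price, throughput):.3f}")
```

Because cost scales inversely with throughput, doubling tokens/sec through batching halves every row, which can matter more than the hourly price difference between providers.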

Spheron's on-demand H100 SXM5 at $3.90/hr is higher than RunPod Secure Cloud's $3.29/hr for the same variant. The comparison flips on PCIe: Spheron PCIe on-demand at $2.09/hr beats RunPod PCIe at $2.89/hr. On spot, Spheron SXM5 at $1.66/hr is the cheapest option in this comparison, dropping the per-million-token cost to $0.384 vs $0.762 at RunPod Secure Cloud. Check H100 on Spheron for current SXM5 and PCIe availability.

Pricing fluctuates based on GPU availability. The prices above are based on 22 May 2026 and may have changed. Check current GPU pricing for live rates.

RunPod vs Lambda vs CoreWeave H100 Pricing Side-by-Side

Provider	H100 SXM On-Demand	H100 PCIe On-Demand	Spot/Interruptible	Billing	Notes
RunPod	~$3.29/hr	~$2.89/hr	Community Cloud ~$1.80-$2.40/hr	Per minute	Serverless also available at $0.00053/sec
Spheron	$3.90/hr	$2.09/hr	$1.66/hr (SXM5)	Per minute	Aggregates from 5+ providers, no storage markup
Lambda Labs	~$2.49/hr	~$2.49/hr	Not available	Per hour	On-demand reservations often required for availability
CoreWeave	$6.16/hr (8-GPU node only)	N/A	~$2.46/hr	Per hour	No single-GPU option, contract pricing standard for large clusters

Lambda Labs typically has strong availability but bills hourly and often requires on-demand reservations to guarantee capacity. CoreWeave's on-demand rate is the highest in this table, and it's sold only as an 8-GPU HGX node with no single-GPU tier; see our full CoreWeave H100 and H200 pricing breakdown for the per-GPU math and how its two pricing pages quote different numbers. Spheron aggregates supply from 5+ providers, which generally keeps spot and on-demand availability higher than single-datacenter platforms. Nebius runs $3.85/hr for HGX H100 on-demand with preemptible instances at $2.15/hr; the full Nebius pricing breakdown covers its per-hour rates, including H200 pricing and committed-use math.

For a wider market survey covering AWS, Azure, Lambda, CoreWeave, and other providers, see GPU cloud pricing 2026.

When RunPod Serverless Beats On-Demand (and When It Doesn't)

Serverless wins for workloads where idle time is significant:

Bursty inference APIs: If your API sees traffic spikes with 30-70% idle time between bursts, the scale-to-zero billing cuts your effective rate by 30-70% vs on-demand.
Prototyping and low-volume endpoints: Running an H100 inference endpoint for occasional queries at $2.89/hr is expensive. Serverless lets you pay only for the seconds you actually use.
Event-driven pipelines: Batch jobs that run overnight or on a schedule benefit from not paying for idle daytime hours.

Serverless loses for:

Sustained training runs: A 72-hour fine-tuning job runs continuously with no idle time. At 100% utilization, Serverless ($1.91/hr) is nominally cheaper than Secure Cloud SXM5 ($3.29/hr), but training frameworks often need persistent state that conflicts with stateless pod restarts.
Low-latency production inference: If first-token latency must be under 500ms, a 20-60s cold start is a non-starter. On-demand with a warm pod is the only option.
High-throughput continuous inference: At 85%+ sustained utilization, the gap between $1.91/hr Serverless and $2.89-$3.29/hr Secure Cloud matters less than infrastructure predictability. On-demand with reserved capacity scales more reliably.

Workload	Best Pricing Model	Provider Pick
Bursty inference API (<50% utilization)	RunPod Serverless	RunPod
Production inference (low latency required)	On-demand PCIe	Spheron ($2.09/hr)
Training runs (24h+)	On-demand SXM5 or Spot	RunPod Secure (SXM5 on-demand) or Spheron Spot
Prototyping / dev endpoints	RunPod Serverless	RunPod
Batch inference (scheduled overnight)	Spot or Serverless	Spheron Spot
Multi-GPU distributed training	Spot SXM5	Spheron Spot ($1.66/hr)

Spheron H100 Spot Availability and Marketplace Pricing

Spheron's marketplace model aggregates H100 capacity from data center partners globally, which creates more spot inventory than a single-provider platform can typically offer. Spot pricing for H100 SXM5 sits at $1.66/hr, roughly 57% below the $3.90/hr on-demand rate for the same SKU. That gap can narrow when supply tightens.

Spot instances suit workloads that checkpoint regularly. Training jobs using tools like Determined AI or custom PyTorch checkpoint loops can tolerate interruption every few hours without losing significant progress. For these workloads, running spot and re-queuing on interruption is cheaper than paying on-demand rates continuously.
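The checkpoint-and-resume pattern described above can be sketched without any framework. This is a simplified stand-in, not a real training loop: the JSON file stands in for what a PyTorch job would write with its own checkpoint format, and the function names and the `stop_after` parameter (used here to simulate a spot interruption) are hypothetical.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temp file, then rename, so a mid-write interruption cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic replace on the same filesystem


def load_checkpoint(path: str) -> dict:
    """Return the last saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}


def train(path: str, total_steps: int, checkpoint_every: int,
          stop_after: Optional[int] = None) -> int:
    """Run or resume a placeholder loop; stop_after simulates a spot interruption."""
    state = load_checkpoint(path)
    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            break  # simulated preemption: progress since the last checkpoint is lost
        state["step"] += 1  # placeholder for one real training step
        steps_this_run += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["step"]
```

On a re-queued spot instance, calling `train` again with the same path picks up from the last saved step, so the work at risk is bounded by the checkpoint interval. The checkpoint file needs to live on storage that survives the instance, which is an assumption about your setup.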

Spheron's marketplace also differs structurally from RunPod's Community Cloud: Spheron vets providers for Tier 2/3/4 data center compliance before listing their hardware. This is a different reliability tradeoff than RunPod Community Cloud, where hosts range from enterprise to individual operators.

Check Spheron spot GPU instances for current availability and pricing. For a detailed comparison including architecture differences, see Spheron vs RunPod.

RunPod H100 SXM5 Secure Cloud runs $3.29/hr. Spheron H100 PCIe starts at $2.09/hr on-demand, with SXM5 spot from $1.66/hr for fault-tolerant workloads. No storage markups, no cold-start billing, per-minute granularity.
