H100 spot prices swung from $2.10/hr to $14.80/hr on AWS within a single week in April 2026. That 7x range is not an anomaly. It is the normal operating condition of the spot GPU market, and teams that treat it as a fixed cost are leaving tens of thousands of dollars on the table every quarter.
Cross-cloud GPU spot arbitrage means routing your compute to whichever provider has the cheapest available spot at any given moment, with automatic failover when prices spike or capacity disappears. Done well, it cuts training costs by 50-70%. Done poorly, it creates a reliability nightmare that costs more in engineer time than it saves.
This guide covers the mechanics: how spot pricing actually works across AWS, GCP, RunPod, Lambda Labs, and Spheron; the three bidding strategies and when to use each; how to build price forecasting with ARIMA and Prophet; how to wire up a SkyPilot scheduler with custom bid logic; and the checkpointing approach that makes all of it fault-tolerant. For reference pricing context across providers, see our GPU cloud pricing comparison for 2026.
The Spheron figures below were fetched live from the Spheron API (GET https://app.spheron.ai/api/gpu-offers) on 2026-06-03. Other provider ranges are approximate weekly bands based on public pricing data collected the same week.
The 2026 Spot GPU Market: Price Volatility Data
The spot GPU market is not a single market. It is a collection of separate provider auctions that occasionally move together but mostly do not. Understanding where the volatility comes from tells you where to arbitrage.
Spheron spot pricing (2026-06-03):
| GPU Model | On-Demand ($/hr) | Spot ($/hr) | Spot Tier Available |
|---|---|---|---|
| H100 SXM5 | $5.07 | $2.91 | Yes |
| H100 PCIe | $2.01 | N/A | No |
| H200 SXM5 | $5.55 | $3.31 | Yes |
| A100 80GB SXM4 | $1.69 | $0.82 | Yes |
| A100 80GB PCIe | $1.48 | $1.19 | Yes |
| B200 SXM6 | $8.32 | $2.68 | Yes |
Pricing fluctuates based on GPU availability. The prices above are based on 03 Jun 2026 and may have changed. Check current GPU pricing → for live rates.
H100 SXM5 spot price ranges across providers (weekly band, week of 2026-06-03):
| Provider | Weekly Low | Weekly High | Volatility |
|---|---|---|---|
| AWS (us-east-1) | $3.20 | $18.40 | Very high |
| GCP (us-central1) | $2.80 | $14.60 | High |
| RunPod (global) | $2.49 | $5.99 | Moderate |
| Lambda Labs | $2.49 | $2.49 | None (no spot tier, on-demand only) |
| Spheron | $2.91 | $2.91 | None (stable spot price) |
What this table makes obvious: Spheron's pricing functions as a stable reference floor. When hyperscaler H100 spot prices spike into double digits during a capacity crunch, Spheron's transparent on-demand rate is often cheaper than AWS or GCP spot.
The volatility patterns have structure. US East business hours (9am-6pm ET, Monday-Friday) consistently show the tightest spot capacity across AWS and GCP. EU capacity (specifically the Frankfurt and Amsterdam regions) often runs 20-40% cheaper during US business hours, because by then the European workday is winding down: 9am-6pm ET is roughly 3pm-midnight in Frankfurt and Amsterdam. Asia-Pacific capacity shows the opposite pattern: prices spike overnight US time, when Asian compute demand peaks.
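A scheduler that wants to avoid (or deliberately exploit) the US East peak has to evaluate the window in Eastern local time, including the daylight-saving shift. The following is a minimal sketch using Python's standard `zoneinfo` module. `in_us_east_peak` is a hypothetical helper that covers only the US East window named above, and it assumes the system time-zone database (or the `tzdata` package) is installed.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Peak window from the text: 9am-6pm Eastern, Monday-Friday.
# ZoneInfo applies the EST/EDT switch automatically.
EASTERN = ZoneInfo("America/New_York")


def in_us_east_peak(ts: datetime) -> bool:
    """Return True if a timezone-aware timestamp falls in US East business hours."""
    local = ts.astimezone(EASTERN)
    is_weekday = local.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and 9 <= local.hour < 18


# Tuesday 2026-06-02 15:00 UTC is 11:00 EDT (inside the window);
# Saturday 2026-06-06 at the same UTC time is outside it.
print(in_us_east_peak(datetime(2026, 6, 2, 15, 0, tzinfo=timezone.utc)))
print(in_us_east_peak(datetime(2026, 6, 6, 15, 0, tzinfo=timezone.utc)))
```

If you store prices in UTC, convert each timestamp at query time like this rather than hard-coding a UTC offset, because the offset changes twice a year.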
Teams running cross-cloud GPU pricing comparisons between hyperscalers and Spheron have found 3-5x price differences within the same week on the same GPU model. That is the arbitrage opportunity.
The Three Bidding Strategies
Fixed-Bid Instances
A fixed bid means you set a maximum price before launching an instance and the provider either accepts or rejects based on current spot price. Your instance runs until the spot price exceeds your bid, at which point you get preempted.
The math is simple: set your bid at the right multiple of recent spot prices, and you balance capacity access against preemption frequency.
```python
import statistics

import requests


def get_7day_median_spot(provider_api_url: str, gpu_model: str) -> float:
    """Fetch 7-day historical spot prices and return the median."""
    # Placeholder endpoint and response shape: replace with the real provider API call
    resp = requests.get(provider_api_url, params={"model": gpu_model, "days": 7}, timeout=10)
    resp.raise_for_status()
    prices = [p["price"] for p in resp.json()["data"]]
    if not prices:
        raise ValueError("No price data returned")
    # statistics.median averages the two middle values for even-length data
    return statistics.median(prices)


def compute_fixed_bid(median_price: float, multiplier: float = 1.2) -> float:
    """Set the bid ceiling at `multiplier` x the 7-day median (1.2x by default)."""
    return round(median_price * multiplier, 4)


# Example: H100 SXM5 on RunPod (placeholder URL; substitute your provider's pricing endpoint)
median = get_7day_median_spot("https://api.runpod.io/v2/spot-prices", "H100_SXM5")
max_price = compute_fixed_bid(median)
print(f"Fixed bid: ${max_price}/hr (median was ${median}/hr)")
```

A 1.2x multiplier keeps you in the market roughly 85% of the time based on typical H100 spot distributions. A 1.5x multiplier pushes that to ~95% but increases your average cost by 15%. For critical long-running jobs, use 1.5x. For short experimental runs where a restart is acceptable, 1.2x is fine.
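How much time-in-market a given multiplier actually buys depends on the shape of your provider's price distribution, so check it against your own history before settling on 1.2x or 1.5x. The following is a minimal sketch that assumes evenly spaced samples, so each observation counts equally. `in_market_fraction` and `compare_multipliers` are hypothetical helpers, and the sample history is illustrative, not market data.

```python
import statistics


def in_market_fraction(prices: list[float], bid: float) -> float:
    """Share of historical observations where the spot price was at or below the bid."""
    if not prices:
        raise ValueError("need at least one price observation")
    return sum(p <= bid for p in prices) / len(prices)


def compare_multipliers(prices: list[float], multipliers=(1.2, 1.5)) -> dict[float, float]:
    """Map each bid multiplier (applied to the median) to its historical in-market share."""
    median = statistics.median(prices)
    return {m: in_market_fraction(prices, median * m) for m in multipliers}


# Illustrative hourly history (not real market data)
history = [2.5, 2.6, 2.8, 3.0, 3.1, 3.2, 3.4, 4.5, 6.0, 9.0]
for mult, share in compare_multipliers(history).items():
    print(f"{mult}x median -> in market {share:.0%} of sampled hours")
```

On a heavy-tailed series like this one, raising the multiplier recovers only the hours just above the old ceiling. The extreme spikes stay out of reach either way.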
Trade-offs:
| Factor | Fixed-Bid |
|---|---|
| Simplicity | High (set once and forget) |
| Cost control | Predictable ceiling |
| Capacity access | Bid does not adapt as the market moves; preempted whenever spot exceeds it |
| Best for | Jobs with clear cost ceiling tolerance |
Dynamic Bidding
Dynamic bidding polls provider APIs continuously and adjusts your max_price in real time based on current spot market conditions. The goal is to track the market without overpaying during spikes.
```python
import time
from dataclasses import dataclass

import requests


@dataclass
class SpotBidder:
    provider_api: str  # base URL; the /current route below is a placeholder
    gpu_model: str
    target_multiplier: float = 1.15
    poll_interval: int = 60  # seconds

    def get_current_spot_price(self) -> float:
        resp = requests.get(
            f"{self.provider_api}/current",
            params={"model": self.gpu_model},
            timeout=10,
        )
        resp.raise_for_status()
        return float(resp.json()["spot_price"])

    def compute_dynamic_bid(self, current_price: float, trailing_avg: float) -> float:
        """Bid at target_multiplier * current price, capped at 1.5x trailing average."""
        bid = current_price * self.target_multiplier
        cap = trailing_avg * 1.5
        return min(bid, cap)

    def run_bidding_loop(self, trailing_prices: list[float]) -> None:
        while True:
            current = self.get_current_spot_price()
            # Last 60 samples = trailing hour at the default 60 s poll interval
            recent = trailing_prices[-60:]
            trailing_avg = sum(recent) / len(recent) if recent else current
            bid = self.compute_dynamic_bid(current, trailing_avg)
            print(f"Current: ${current:.2f}, Trailing avg: ${trailing_avg:.2f}, Bid: ${bid:.2f}")
            # Submitting `bid` as the new max_price is provider-specific and omitted here
            trailing_prices.append(current)
            del trailing_prices[:-60]  # keep the history bounded
            time.sleep(self.poll_interval)
```

The key parameter is the spread threshold that triggers a re-bid. If the cheapest provider drops 30% below your current provider, that gap covers migration costs for most jobs longer than 4 hours.
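The bidding loop above only tracks one provider. The cross-provider spread rule in the previous paragraph can be written as a small predicate. This is a sketch: `should_rebid` is a hypothetical helper, and the 30% threshold and 4-hour minimum come straight from the paragraph above. Tune both to your own measured migration overhead.

```python
def should_rebid(
    current_price: float,
    cheapest_price: float,
    remaining_hours: float,
    threshold: float = 0.30,
    min_remaining_hours: float = 4.0,
) -> bool:
    """True if the cheapest provider is at least `threshold` below the current one
    and enough of the job remains for the gap to pay back a migration."""
    if current_price <= 0:
        raise ValueError("current_price must be positive")
    discount = (current_price - cheapest_price) / current_price
    return discount >= threshold and remaining_hours >= min_remaining_hours


print(should_rebid(4.00, 2.91, remaining_hours=10))  # ~27% cheaper: below threshold
print(should_rebid(5.00, 2.91, remaining_hours=10))  # ~42% cheaper with 10h left
```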
Market-Price Spot
Market-price means accepting whatever the provider quotes at launch time. No ceiling, no bidding logic. You take the current spot price and accept preemption at any price spike.
This makes sense for batch evaluation jobs, data preprocessing, or embedding generation pipelines where:
- Each task takes under 30 minutes
- Checkpointing is trivial (just a file write)
- The cost of a preempted restart is under 5% of job value
Trade-off table:
| Strategy | Code Complexity | Average Cost | Preemption Risk | Best Workload |
|---|---|---|---|---|
| Fixed-bid | Low | Medium | Medium | Training with known budget |
| Dynamic | High | Low | Low | Long training runs |
| Market-price | None | Variable | High | Short batch jobs |
Cross-Cloud Arbitrage in Practice
When H100 Spot in us-east-1 Costs 3x EU Pricing
On a Tuesday afternoon in May 2026 (US East business hours), AWS H100 SXM5 spot in us-east-1 was at $11.20/hr. The same week, RunPod's EU capacity had H100 PCIe at $3.10/hr, and Spheron's global H100 SXM5 spot was sitting at $2.91/hr.
That is a 3.8x spread. A team running an 8-GPU job for 12 hours at those prices pays:
- AWS us-east-1: $11.20 × 8 × 12 = $1,075
- RunPod EU: $3.10 × 8 × 12 = $298
- Spheron H100 SXM5: $2.91 × 8 × 12 = $279
The arbitrage logic is straightforward: poll prices across providers, route to the cheapest one that has capacity available, and keep Spheron in the list as the stable fallback with predictable pricing.
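The per-job arithmetic and the routing rule can be sketched together. This minimal sketch assumes you already have a per-GPU hourly quote and a capacity flag from each provider. `Quote`, `job_cost`, and `cheapest_with_capacity` are hypothetical names, and the capacity flags in the usage example are assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quote:
    provider: str
    gpu: str
    price_per_gpu_hr: float
    has_capacity: bool


def job_cost(quote: Quote, gpus: int, hours: float) -> float:
    """Total cost = per-GPU hourly rate x GPU count x wall-clock hours."""
    return quote.price_per_gpu_hr * gpus * hours


def cheapest_with_capacity(quotes: list[Quote]) -> Optional[Quote]:
    """Pick the lowest-priced quote that currently reports free capacity."""
    available = [q for q in quotes if q.has_capacity]
    return min(available, key=lambda q: q.price_per_gpu_hr) if available else None


# Prices from the Tuesday-afternoon example above (8 GPUs for 12 hours)
quotes = [
    Quote("aws-us-east-1", "H100 SXM5", 11.20, True),
    Quote("runpod-eu", "H100 PCIe", 3.10, True),
    Quote("spheron", "H100 SXM5", 2.91, True),
]
for q in quotes:
    print(f"{q.provider}: ${job_cost(q, gpus=8, hours=12):,.2f}")
best = cheapest_with_capacity(quotes)
print(f"Route to: {best.provider if best else 'nothing available'}")
```

A provider whose flag flips to `has_capacity=False` simply drops out of the comparison. Keeping a stable-priced entry in the list means there is usually something left to route to.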
The Arbitrage Decision Tree
Before routing to a cheaper provider mid-job, the decision has to be worth it. Migration has costs: checkpoint save time, instance termination, re-provisioning latency, and checkpoint load time. For H100 training jobs, this overhead typically runs 15-25 minutes.
```python
def should_migrate(
    current_price: float,
    cheapest_alternative_price: float,
    remaining_hours: float,
    migration_cost_hours: float = 0.35,  # ~21 minutes equivalent
) -> bool:
    """Return True if migrating to cheaper provider saves money."""
    price_delta = current_price - cheapest_alternative_price
    hours_to_recoup = (
        (migration_cost_hours * current_price) / price_delta if price_delta > 0 else float("inf")
    )
    # Migrate only if we recoup migration cost within 20% of remaining runtime
    return hours_to_recoup < (remaining_hours * 0.20)


# Example
current = 11.20  # AWS us-east-1 H100 spot
alternative = 2.91  # Spheron H100 SXM5 spot price
remaining = 10.0  # hours left in job

if should_migrate(current, alternative, remaining):
    print("Migrate: savings outweigh overhead")
else:
    print("Stay: migration cost not worth it")
```

Time-of-day patterns (H100 SXM5, based on May-June 2026 data):
| UTC Hour | AWS Spot Range | GCP Spot Range | Best Provider |
|---|---|---|---|
| 13:00-21:00 (US daytime) | $8-18 | $7-15 | RunPod / Spheron |
| 21:00-01:00 (US evening) | $4-9 | $4-8 | RunPod / GCP |
| 01:00-09:00 (US night) | $2.50-5 | $3-6 | AWS / Spheron |
| 09:00-13:00 (EU peak) | $3-7 | $3-6 | Spheron / RunPod |
Forecasting Spot GPU Prices
Forecasting is useful for one decision: should I launch now or wait? It does not need to be precise; it needs to answer "will prices be lower in 6 hours" with better than coin-flip accuracy.
Time-Series Baselines: ARIMA and Prophet
Both ARIMA and Prophet work on the same input: a time series of historical spot prices at fixed intervals. Collect hourly prices for 30 days, fit the model, and forecast 24 hours ahead.
ARIMA with statsmodels:
```python
import warnings

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

warnings.filterwarnings("ignore")


def fit_arima_forecast(prices: list[float], forecast_hours: int = 24) -> list[float]:
    """Fit ARIMA(2,1,2) on hourly spot prices and return forecast."""
    series = pd.Series(prices)
    model = ARIMA(series, order=(2, 1, 2))
    fit = model.fit()
    forecast = fit.forecast(steps=forecast_hours)
    return forecast.tolist()


# Simulate 30 days of hourly H100 spot prices (replace with real feed)
np.random.seed(42)
base_price = 4.0
noise = np.random.normal(0, 0.8, 720)
trend = np.linspace(0, -1.5, 720)
prices = (base_price + trend + noise).clip(min=1.5).tolist()

forecast = fit_arima_forecast(prices)
print(f"24h forecast: min=${min(forecast):.2f}, max=${max(forecast):.2f}")
print(f"Recommended action: {'wait' if min(forecast) < prices[-1] * 0.9 else 'launch now'}")
```

Prophet for seasonality-aware forecasting:
```python
from datetime import datetime, timedelta

import pandas as pd
from prophet import Prophet


def fit_prophet_forecast(prices: list[float], forecast_hours: int = 24) -> pd.DataFrame:
    """Fit Prophet on hourly prices, capturing daily and weekly seasonality."""
    # Synthetic hourly timestamps; use the real observation times from your feed
    start = datetime(2026, 5, 1)
    timestamps = [start + timedelta(hours=i) for i in range(len(prices))]
    df = pd.DataFrame({"ds": timestamps, "y": prices})
    model = Prophet(
        daily_seasonality=True,
        weekly_seasonality=True,
        yearly_seasonality=False,
        changepoint_prior_scale=0.05,  # lower = smoother, better for spot price trends
    )
    model.fit(df)
    future = model.make_future_dataframe(periods=forecast_hours, freq="h")
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(forecast_hours)


# Use the same simulated data as the ARIMA example
result = fit_prophet_forecast(prices)
next_6h = result.head(6)
min_predicted = next_6h["yhat"].min()
print(f"Predicted range (next 6h): ${next_6h['yhat_lower'].min():.2f} - ${next_6h['yhat_upper'].max():.2f}")
print(f"Recommended action: {'wait' if min_predicted < prices[-1] * 0.9 else 'launch now'}")
```

Prophet handles the daily and weekly seasonality patterns (US business hours, weekend dips) better than vanilla ARIMA. ARIMA is faster to fit and easier to deploy in a polling loop.
LLM-Based Price Prediction
LLMs can incorporate context signals that ARIMA and Prophet miss: upcoming quarter-end dates when providers flush reserved capacity, major model release announcements that cause sudden demand spikes, and publicly announced datacenter capacity expansions.
A simple zero-shot prompt for LLM-based prediction:
```python
import anthropic


def llm_price_signal(recent_prices: list[float], context_events: list[str]) -> str:
    """Use Claude to generate a qualitative spot price signal."""
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    price_str = ", ".join(f"${p:.2f}" for p in recent_prices[-24:])
    events_str = "\n".join(f"- {e}" for e in context_events)
    prompt = f"""You are a GPU spot market analyst. Based on historical prices and context events, predict whether H100 SXM5 spot prices will be higher or lower in the next 6 hours.
Recent 24h hourly prices (oldest to newest): {price_str}
Relevant context events:
{events_str}
Answer with: LOWER, HIGHER, or STABLE. Then explain in one sentence why."""
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=150,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text


# `prices` is the simulated series from the ARIMA example; the events are example inputs
signal = llm_price_signal(
    prices[-24:],
    [
        "AWS announced 30% more H100 capacity in us-east-2 on 2026-06-01",
        "Quarter end (June 30) is 27 days away - reserved capacity typically releases",
        "Llama 5 launch scheduled for 2026-06-15, expect demand spike",
    ],
)
print(f"LLM signal: {signal}")
```

Practical Accuracy Expectations
Directional accuracy (will prices go up or down in the next 24 hours):
- ARIMA: 60-68%
- Prophet: 65-73%
- LLM-based: 65-75% (better on discrete events, worse on random capacity fluctuations)
Point prediction (price within 10% of actual): all three methods hit roughly 30-45% accuracy at 24h ahead. Do not use these forecasts for precise cost budgeting. Use them for binary go/no-go decisions: launch now vs wait 6 hours.
Beyond 48 hours, all methods degrade to near-random because spot prices are driven by discrete inventory events that no model can anticipate.
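To check whether a model clears these bars on your own price feed, backtest it. Replay history, record each forecast, and compare the predicted direction with what actually happened. The following is a minimal sketch of the scoring step only. `directional_accuracy` is a hypothetical helper, and scoring an exactly unchanged price as its own "flat" class is a simplifying assumption.

```python
def directional_accuracy(
    last_observed: list[float],
    predicted: list[float],
    actual: list[float],
) -> float:
    """Share of forecasts whose predicted direction matched the realized one.

    Index i is one forecast: the price when it was made, the model's prediction
    for the horizon (e.g. +24h), and the price actually observed at that horizon.
    """
    if not actual or not (len(last_observed) == len(predicted) == len(actual)):
        raise ValueError("inputs must be non-empty and equal length")

    def sign(x: float) -> int:
        return (x > 0) - (x < 0)  # +1 up, -1 down, 0 unchanged

    hits = sum(
        sign(p - base) == sign(a - base)
        for base, p, a in zip(last_observed, predicted, actual)
    )
    return hits / len(actual)


# Four illustrative forecasts, all made when the price was $4.00
print(directional_accuracy([4.0] * 4, [3.5, 4.5, 3.8, 4.2], [3.6, 4.1, 4.3, 3.9]))
```

Score hundreds of forecasts, not a handful. A model is only worth wiring into launch-or-wait decisions if its score stays well above 0.5 across different weeks.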
Building a Multi-Cloud Spot Scheduler with SkyPilot
Installing SkyPilot and Registering Spheron
For full SkyPilot setup and Spheron cloud registration steps, see the SkyPilot multi-cloud GPU orchestration guide. This section covers only the custom bid-logic extension that is specific to an arbitrage scheduler.
```bash
pip install "skypilot[aws,gcp,runpod]"
# Register Spheron custom cloud (requires spheron-sky adapter)
pip install spheron-sky
sky check  # Verify all providers authenticate correctly
```

Writing a Cost-Aware SkyPilot Task YAML
SkyPilot's resources.ordered key accepts a priority list of provider/GPU combinations. The scheduler tries each in order and uses the first one with available capacity.
```yaml
# arbitrage-train.yaml
name: qwen-finetune-arbitrage

resources:
  ordered:
    # First choice: cheapest spot
    - cloud: runpod
      accelerators: H100:8
      use_spot: true
    # Second choice: Spheron on-demand (predictable fallback; requires the spheron-sky adapter)
    - cloud: spheron
      accelerators: H100:8
      use_spot: false
    # Third choice: AWS spot. H100:8 resolves to p5.48xlarge (p4d.24xlarge is 8x A100 40GB)
    - cloud: aws
      region: us-east-1
      accelerators: H100:8
      use_spot: true
    # Final fallback: GCP spot, dropping to A100 80GB
    - cloud: gcp
      accelerators: A100-80GB:8
      use_spot: true
  # Preemption recovery is applied by managed jobs (`sky jobs launch`).
  # Older SkyPilot releases named this setting spot_recovery.
  job_recovery:
    strategy: FAILOVER

file_mounts:
  # MOUNT mode writes checkpoints through to the bucket; a bare s3:// path is only copied at launch
  /checkpoints:
    source: s3://my-bucket/checkpoints
    mode: MOUNT

setup: |
  pip install torch transformers datasets accelerate peft

run: |
  python train.py \
    --model_name Qwen/Qwen3-7B \
    --output_dir /checkpoints \
    --resume_from_checkpoint /checkpoints/latest \
    --save_steps 200
```

Adding Custom Bid Logic
SkyPilot does not natively expose a bid-price callback, but you can wrap the launch logic to poll prices before selecting a provider:
```python
import subprocess
from typing import Optional

import requests

# Spheron's offers endpoint is the one cited above; the RunPod URL is a placeholder
PROVIDERS = {
    "spheron": "https://app.spheron.ai/api/gpu-offers",
    "runpod": "https://api.runpod.io/v2/spot-prices",
}


def get_cheapest_provider(gpu_model: str = "H100_SXM5") -> Optional[dict]:
    """Poll providers and return the cheapest quoted option.

    Capacity is not checked here; SkyPilot's failover handles a provider that
    turns out to have no free GPUs at launch time.
    """
    candidates = []
    name_match = gpu_model.replace("_", " ")

    # Spheron: on-demand price as the stable floor, plus the spot tier if offered
    try:
        resp = requests.get(PROVIDERS["spheron"], timeout=10)
        resp.raise_for_status()
        for offer in resp.json().get("data", []):
            if name_match not in offer.get("displayName", ""):
                continue
            candidates.append(
                {"provider": "spheron", "price": float(offer.get("lowestPrice", 999)), "is_spot": False}
            )
            for o in offer.get("offers", []):
                sp = o.get("spot_price")
                gc = max(o.get("gpuCount", 1), 1)
                if sp:
                    # Divide by gpuCount for a per-GPU price (assumes spot_price is per instance)
                    candidates.append({"provider": "spheron", "price": sp / gc, "is_spot": True})
    except Exception as e:
        print(f"Spheron pricing unavailable: {e}")

    # RunPod: spot prices (response shape assumed; adjust to the real API)
    try:
        resp = requests.get(PROVIDERS["runpod"], timeout=10)
        resp.raise_for_status()
        body = resp.json()
        items = body if isinstance(body, list) else body.get("data", [])
        for item in items:
            name = item.get("gpu_name", item.get("displayName", ""))
            if name_match in name or gpu_model in name:
                price = item.get("spot_price", item.get("lowestPrice", 999))
                candidates.append({"provider": "runpod", "price": float(price), "is_spot": True})
    except Exception as e:
        print(f"RunPod pricing unavailable: {e}")

    # Sort by price and return the cheapest
    candidates.sort(key=lambda x: x["price"])
    return candidates[0] if candidates else None


def launch_with_arbitrage(task_yaml: str) -> None:
    """Launch a SkyPilot task, steering it toward the cheapest quoted provider."""
    best = get_cheapest_provider()
    cmd = ["sky", "launch", "-y", "--detach-run", task_yaml]
    if best:
        print(f"Routing to {best['provider']} at ${best['price']:.2f}/hr (spot={best['is_spot']})")
        # "spheron" only resolves as a cloud name with the spheron-sky adapter installed
        cmd += ["--cloud", best["provider"]]
        if best["is_spot"]:
            cmd.append("--use-spot")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    launch_with_arbitrage("arbitrage-train.yaml")
```

Preemption-Aware Checkpointing Tied to Price Signals
For the complete FSDP and ZeRO-3 checkpointing setup, see the spot GPU training resilience guide. This section covers only the price-signal integration: triggering a checkpoint before preemption hits, based on spot price movement rather than waiting for SIGTERM.
The key insight: when a spot price spikes sharply (your provider's spot price rises more than 30% above its 30-minute average), preemption is likely within 10-20 minutes. Save now, while the checkpoint write competes with training rather than with an imminent eviction.
```python
import threading
import time
from collections import deque

import requests
import torch


class PriceAwareCheckpointer:
    def __init__(self, model, optimizer, checkpoint_dir: str, provider_api: str):
        self.model = model
        self.optimizer = optimizer
        self.checkpoint_dir = checkpoint_dir
        self.provider_api = provider_api
        self.price_history = deque(maxlen=30)  # 30-minute window at 1-min polling
        self.last_checkpoint_step = 0
        self._ckpt_lock = threading.Lock()  # guards model/optimizer state reads vs training updates

    def _fetch_spot_price(self) -> float:
        """Per-GPU H100 SXM5 spot price from a Spheron-style offers feed (0.0 if absent)."""
        resp = requests.get(self.provider_api, timeout=10)
        resp.raise_for_status()
        offers = resp.json().get("data", [])
        for offer in offers:
            if "H100 SXM5" in offer.get("displayName", ""):
                for o in offer.get("offers", []):
                    sp = o.get("spot_price")
                    gc = max(o.get("gpuCount", 1), 1)
                    if sp:
                        return sp / gc
        return 0.0

    def _is_preemption_likely(self) -> bool:
        if len(self.price_history) < 5:
            return False  # not enough history for a meaningful average
        current = self.price_history[-1]
        avg_30m = sum(self.price_history) / len(self.price_history)
        return current > avg_30m * 1.30  # 30% spike above 30-min average

    def save_checkpoint(self, step: int) -> None:
        path = f"{self.checkpoint_dir}/step_{step}.pt"
        with self._ckpt_lock:
            torch.save({
                "step": step,
                "model_state": self.model.state_dict(),
                "optimizer_state": self.optimizer.state_dict(),
            }, path)
            self.last_checkpoint_step = step
        print(f"Checkpoint saved at step {step}")

    def monitor_loop(self, current_step_fn) -> None:
        """Run in a background thread, watching prices and triggering early saves."""
        while True:
            try:
                price = self._fetch_spot_price()
                if price:
                    self.price_history.append(price)
                if self._is_preemption_likely():
                    step = current_step_fn()
                    if step > self.last_checkpoint_step + 50:  # avoid thrashing
                        print(f"Price spike detected (${price:.2f}/hr). Pre-emptive checkpoint at step {step}.")
                        self.save_checkpoint(step)
            except Exception as e:
                print(f"Price monitor error (retrying): {e}")
            time.sleep(60)

    def start_monitoring(self, current_step_fn) -> None:
        t = threading.Thread(target=self.monitor_loop, args=(current_step_fn,), daemon=True)
        t.start()
```

Wire this into your training loop at startup. The monitor thread runs in the background while the main training loop continues uninterrupted. When a price spike triggers an early save, you restart from that recent checkpoint instead of a stale one from the regular 200-step save interval.
In your training loop, hold checkpointer._ckpt_lock around optimizer.step() so the background thread cannot read state_dict() while a parameter update is mid-flight:
```python
with checkpointer._ckpt_lock:
    optimizer.step()
    optimizer.zero_grad()
```

This prevents a corrupt checkpoint where some layers have post-update weights and others still have pre-update weights.
When NOT to Use Spot GPUs
Spot instances are the wrong choice in these scenarios:
Latency-sensitive production inference. Any preemption causes user-visible downtime. Even a 15-minute restart is unacceptable for a live API. Use dedicated or reserved instances for production inference endpoints.
Training without checkpointing set up. If you cannot recover from a mid-job restart, every preemption is a full restart. The cost of lost progress exceeds spot savings within a few interruptions. Set up checkpointing first (see the resilience checkpointing guide), then use spot.
Jobs where restart overhead exceeds 15% of total runtime. A 2-hour job with 20-minute restart overhead (17% restart tax) will lose money to spot preemptions on any provider with more than one interruption per job. For these workloads, use on-demand and optimize the training loop instead.
Regulated environments requiring compute continuity. Some compliance frameworks (HIPAA, FedRAMP, specific EU AI Act audit trails) require that compute infrastructure be documented and stable. Spot instances that migrate across providers mid-job create audit gaps. Use reserved or dedicated instances with a fixed provider.
Multi-day runs without tested recovery. A fine-tune that runs continuously for 72 hours will almost certainly get preempted at least once. If your recovery path has never been tested end-to-end, the first real preemption will cost more in debugging than you saved on spot pricing.
Real Cost Case Study: 70% Reduction on a Qwen 3.5 Fine-Tune
The task: fine-tune Qwen 3.5 7B on a 500K-sample domain-specific dataset. Estimated runtime: 40 hours on 4x H100 (160 GPU-hours). Baseline cost at AWS on-demand (4x H100 SXM5 at $13.00/GPU-hr for 40 hours): $2,080 over 5 days with experimental runs included.
The strategy: run all experimental phases on spot, use Spheron for stable production runs, and route to the cheapest available provider every 4 hours.
The checkpointing patterns that made recovery fast are documented in the 70B model spot training case study.
Phase breakdown:
| Phase | Provider | GPU Config | Duration | Rate | Cost |
|---|---|---|---|---|---|
| Hyperparameter search (6 runs, 1000 steps each) | Lambda Labs H100 PCIe | 4x H100 | 8h | $2.49/hr × 4 | $79.68 |
| Full fine-tune run 1 (failed at step 3200, preempted) | RunPod H100 PCIe spot | 4x H100 | 6h | $2.90/hr × 4 | $69.60 |
| Full fine-tune run 2 (completed from checkpoint) | Spheron H100 SXM5 | 4x H100 | 18h | $2.91/hr × 4 (spot) | $209.52 |
| Evaluation and merge | Spheron A100 80GB spot | 4x A100 | 6h | $0.82/hr × 4 | $19.68 |
| Total | | | 38h | | $378.48 |
Versus the AWS on-demand baseline: $2,080. Actual cost: $378.48. Savings: 81.8%, exceeding the 70% target because the A100 80GB spot tier on Spheron was significantly cheaper than expected.
The two interruptions (one RunPod preemption, one restart from checkpoint after the run migration) each cost 12-18 minutes of recovery time. The Spheron runs had zero interruptions.
Key configuration that enabled this:
```python
# Cost-aware provider selector used throughout the experiment
PROVIDER_PRIORITY = [
    {"name": "lambda_labs", "model": "H100_PCIe", "price_ceiling": 3.50},
    {"name": "runpod", "model": "H100_PCIe", "spot": True, "price_ceiling": 4.00},
    {"name": "spheron", "model": "H100_SXM5", "spot": False},  # stable fallback
    {"name": "spheron", "model": "A100_80G", "spot": True},  # eval fallback
]
```

Spheron as the Stable Floor Under Your Spot Strategy
A multi-cloud spot scheduler needs a fallback that does not fluctuate. When AWS and GCP spot prices spike during capacity crunches, you need somewhere to route that will not surprise you with a 5x price increase.
Spheron fills that role. The per-hour pricing is transparent and does not swing intraday. The H200 GPU rentals and A100 on Spheron are backed by 5+ providers on Spheron's supply network, which smooths out single-datacenter capacity events that spike prices on hyperscalers.
During the case study above, in the week when RunPod H100 PCIe spot hit $5.99/hr and AWS spot hit $14.80/hr, Spheron's H100 SXM5 spot stayed at $2.91/hr. That predictability is what lets you hold Spheron as the fallback entry in your resources.ordered list without worrying about a surprise bill.
Spheron vs hyperscaler H100 spot during a simulated capacity crunch (week of 2026-06-03):
| Provider | Normal Spot | Crunch Spot | On-Demand | Crunch Behavior |
|---|---|---|---|---|
| AWS us-east-1 | $4-6 | $14-18 | $18.40 | 3-4x spike |
| GCP us-central1 | $3.50-5 | $12-15 | $16.20 | 3-4x spike |
| RunPod global | $2.49-3.50 | $5-6 | $4.99 | 1.5-2x spike |
| Spheron | $2.91 (fixed, no auction) | $2.91 | $5.07 | Stable |
For teams that want to check live GPU pricing across models before committing to a provider or scheduling a long training run, Spheron's pricing page shows current rates without requiring an account. Spheron's per-hour pricing is the reference point most teams use when building their arbitrage scripts: set it as the fallback bid so you never route to a spot that costs more than on-demand on a stable provider.
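The fallback rule in this paragraph reduces to a single comparison. This minimal sketch assumes per-GPU hourly quotes are already collected. `pick_route` is a hypothetical helper. The $5.07 figure is the Spheron H100 SXM5 on-demand rate from the 2026-06-03 table and will drift, and the spot quotes are illustrative.

```python
def pick_route(spot_quotes: dict[str, float], stable_provider: str, stable_price: float) -> tuple[str, float]:
    """Cheapest spot quote, unless it is not below the stable provider's on-demand rate."""
    if spot_quotes:
        provider, price = min(spot_quotes.items(), key=lambda kv: kv[1])
        if price < stable_price:
            return provider, price
    # No spot quotes, or every spot quote costs at least the stable rate: use the fallback
    return stable_provider, stable_price


# Capacity crunch: every spot quote is above the stable on-demand rate
print(pick_route({"aws": 14.00, "gcp": 12.50, "runpod": 5.50}, "spheron", 5.07))
# Normal week: a spot quote undercuts it
print(pick_route({"aws": 4.50, "runpod": 2.49}, "spheron", 5.07))
```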
The spot instance page on Spheron has details on which GPU models have spot tiers and current availability. Not every model has a spot tier (H100 PCIe does not; H200 SXM5 and A100 80GB SXM4 do), so check availability before building your scheduler around a specific model's spot pricing.
When spot prices spike on hyperscalers, Spheron's predictable on-demand H100 pricing becomes the arbitrage floor. Teams running multi-cloud spot schedulers keep Spheron in their resource list as the stable fallback: no surprise prices, no capacity queues, per-minute billing.
Quick Setup Guide
Poll provider APIs every 60 seconds and store spot prices in a local time-series database. For Spheron, use GET https://app.spheron.ai/api/gpu-offers to retrieve lowestPrice and spot_price fields. For RunPod and Lambda Labs, use their respective REST APIs. Store per-provider, per-GPU, per-region prices in a SQLite table keyed on (provider, gpu_model, region, timestamp). This feed drives every other step.
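The storage layout described in this step can be sketched with Python's built-in `sqlite3` module. The table name, column names, and helper functions are hypothetical. Timestamps are stored as second-precision ISO-8601 strings so that plain string comparison orders them chronologically. That ordering holds only if every writer passes timezone-aware UTC datetimes.

```python
import sqlite3
import statistics
from datetime import datetime, timedelta, timezone
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS spot_prices (
    provider  TEXT NOT NULL,
    gpu_model TEXT NOT NULL,
    region    TEXT NOT NULL,
    ts        TEXT NOT NULL,  -- ISO-8601 UTC timestamp, second precision
    price     REAL NOT NULL,  -- $/GPU-hr
    PRIMARY KEY (provider, gpu_model, region, ts)
)
"""


def open_feed(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the price database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_price(conn: sqlite3.Connection, provider: str, gpu_model: str, region: str,
                 price: float, ts: Optional[datetime] = None) -> None:
    """Store one observation; ts must be timezone-aware UTC."""
    ts = ts or datetime.now(timezone.utc)
    conn.execute(
        "INSERT OR REPLACE INTO spot_prices VALUES (?, ?, ?, ?, ?)",
        (provider, gpu_model, region, ts.isoformat(timespec="seconds"), price),
    )
    conn.commit()


def trailing_median(conn: sqlite3.Connection, provider: str, gpu_model: str,
                    days: int = 7, now: Optional[datetime] = None) -> float:
    """Median price over the trailing window, pooled across regions."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days)).isoformat(timespec="seconds")
    rows = conn.execute(
        "SELECT price FROM spot_prices WHERE provider = ? AND gpu_model = ? AND ts >= ?",
        (provider, gpu_model, cutoff),
    ).fetchall()
    if not rows:
        raise ValueError("no observations in the trailing window")
    return statistics.median(row[0] for row in rows)


# Usage with a few illustrative daily samples (not real market data)
conn = open_feed()
now = datetime(2026, 6, 3, 12, tzinfo=timezone.utc)
for i, p in enumerate([2.49, 2.90, 3.10, 5.99]):
    record_price(conn, "runpod", "H100_SXM5", "global", p, ts=now - timedelta(days=i))
record_price(conn, "runpod", "H100_SXM5", "global", 20.0, ts=now - timedelta(days=10))  # outside window
median_7d = trailing_median(conn, "runpod", "H100_SXM5", now=now)
print(f"7-day median: ${median_7d:.2f}/hr")
```

`trailing_median` pools every region for a provider. If you bid per region, add a `region = ?` filter to the query.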
Compute the 7-day trailing median spot price for your target GPU from your price feed. Set your fixed-bid ceiling at 1.2x that median (conservative) or 1.5x (captures more capacity). For dynamic bidding, re-bid every 60 seconds by computing the current spread across providers and adjusting your max_price if the cheapest provider shifts. For commodity batch jobs, accept market price (no ceiling) to maximize capacity access.
Install SkyPilot with pip install skypilot. Register Spheron as a custom cloud in ~/.sky/clouds/spheron.yaml using the Spheron API endpoint and credentials. In your task YAML, define resources.ordered as a list from cheapest to most expensive, with Spheron as the stable fallback after AWS and RunPod spot entries. SkyPilot will attempt each in order and fall through to the next if capacity is unavailable.
Add a price-signal trigger that fires a checkpoint before SIGTERM arrives. Poll your spot price feed every 30 seconds. If the current provider's spot price rises above 1.3x the 30-minute trailing average (a preemption signal), trigger torch.save() immediately and begin migrating to the next cheapest provider. Do not wait for the provider's preemption notice. This pre-emptive save typically cuts restart overhead from 15-20 minutes to under 5 minutes.
Compute the spread ratio between your current provider and the cheapest alternative every 5 minutes. When the ratio exceeds 1.4x (the current provider costs 40% more than the best alternative), evaluate whether the migration cost (checkpoint save + re-provision time) is worth the savings over the remaining job duration. Use: savings = (price_delta × remaining_hours) - (migration_cost_hours × current_price). Multiplying by the current price converts the migration overhead into dollars, so both terms share units. If savings > 0, trigger migration.
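Here is a sketch of this step using per-GPU hourly prices. It takes the same 0.35-hour (~21-minute) overhead default as `should_migrate` earlier and assumes that overhead time is billed at the current provider's rate. `migration_savings` is a hypothetical helper. Multiply its result by your GPU count to get the job total.

```python
def migration_savings(
    current_price: float,
    best_alt_price: float,
    remaining_hours: float,
    migration_cost_hours: float = 0.35,  # ~21 minutes, same default as should_migrate
    min_spread_ratio: float = 1.4,
) -> float:
    """Net per-GPU dollars saved by migrating; migrate only if the result is > 0.

    Returns 0.0 when the spread ratio is below the trigger (not worth evaluating).
    """
    if best_alt_price <= 0 or current_price / best_alt_price < min_spread_ratio:
        return 0.0
    price_delta = current_price - best_alt_price
    # Overhead dollars: migration time billed at the current provider's rate
    migration_cost = migration_cost_hours * current_price
    return price_delta * remaining_hours - migration_cost


# The us-east-1 example: $11.20 vs $2.91 with 10 hours left
print(f"Per-GPU savings: ${migration_savings(11.20, 2.91, 10.0):.2f}")
```

With only a few minutes of job left, the result can go negative even when the spread ratio clears 1.4x. That is the case the `savings > 0` check exists to catch.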
Frequently Asked Questions
50-70% on real training jobs when you run across 3-4 providers and route to the cheapest available spot each session. The biggest swings come from geo-arbitrage: US East H100 spot during business hours can run 3x the price of EU or Asia-Pacific capacity at the same moment. The 5-day Qwen 3.5 7B fine-tune in the case study above would have cost $2,080 on AWS H100 on-demand. It ran for $378.48 using a cross-cloud spot scheduler with Spheron as the fallback floor.
Dynamic bidding with a price ceiling beats both fixed-bid and market-price strategies for most training workloads. Set your ceiling at 1.2-1.5x the trailing 7-day median spot price for the target GPU. This keeps you in the market 85-90% of the time while avoiding the top 10% of price spikes. For batch inference or evaluation jobs where restarts are cheap, market-price (accept whatever the provider quotes) is simpler and captures more capacity.
Yes, with realistic expectations. ARIMA and Prophet give 60-73% directional accuracy on 24-hour-ahead spot price predictions. That is good enough to decide whether to start a job now or wait 4-6 hours for prices to drop. Point-prediction accuracy degrades quickly beyond 24 hours because spot prices are driven by discrete capacity events (new hardware batch releases, major model launches pulling capacity) that no time-series model can anticipate.
Avoid spot for: latency-sensitive production inference where any restart causes user-visible downtime, training runs longer than 48 hours without checkpointing set up (restart overhead exceeds savings), jobs in regulated environments that require audit-proof compute continuity, and any workload where the restart cost exceeds 15% of total runtime. For these, dedicated or reserved instances are the right call.
