GPU Spot Instance Arbitrage: Bidding, Failover, Forecasting (2026)

Written by Mitrasish, Co-founder | Jun 3, 2026
Tags: GPU Spot Instance Arbitrage, Spot GPU Price Prediction, Spot GPU Bidding Strategy, Multi-Cloud GPU Spot, Spot GPU Cost Optimization, GPU Spot Market Forecasting, SkyPilot, GPU Cloud
H100 spot prices swung from $2.10/hr to $14.80/hr on AWS within a single week in April 2026. That 7x range is not an anomaly. It is the normal operating condition of the spot GPU market, and teams that treat it as a fixed cost are leaving tens of thousands of dollars on the table every quarter.

Cross-cloud GPU spot arbitrage means routing your compute to whichever provider has the cheapest available spot at any given moment, with automatic failover when prices spike or capacity disappears. Done well, it cuts training costs by 50-70%. Done poorly, it creates a reliability nightmare that costs more in engineer time than it saves.

This guide covers the mechanics: how spot pricing actually works across AWS, GCP, RunPod, Lambda Labs, and Spheron; the three bidding strategies and when to use each; how to build price forecasting with ARIMA and Prophet; how to wire up a SkyPilot scheduler with custom bid logic; and the checkpointing approach that makes all of it fault-tolerant. For reference pricing context across providers, see our GPU cloud pricing comparison for 2026.

A note on the pricing numbers: the Spheron figures below were fetched live from the Spheron API (GET https://app.spheron.ai/api/gpu-offers) on 2026-06-03. Other provider ranges are approximate weekly bands based on public pricing data collected the same week.

The 2026 Spot GPU Market: Price Volatility Data

The spot GPU market is not a single market. It is a collection of separate provider auctions that occasionally move together but mostly do not. Understanding where the volatility comes from tells you where to arbitrage.

Spheron spot pricing (2026-06-03):

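| GPU Model | On-Demand ($/hr) | Spot ($/hr) | Spot Tier Available |
| --- | --- | --- | --- |
| H100 SXM5 | $5.07 | $2.91 | Yes |
| H100 PCIe | $2.01 | N/A | No |
| H200 SXM5 | $5.55 | $3.31 | Yes |
| A100 80GB SXM4 | $1.69 | $0.82 | Yes |
| A100 80GB PCIe | $1.48 | $1.19 | Yes |
| B200 SXM6 | $8.32 | $2.68 | Yes |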

Pricing fluctuates based on GPU availability. The prices above are based on 03 Jun 2026 and may have changed. Check current GPU pricing → for live rates.

H100 SXM5 spot price ranges across providers (weekly band, week of 2026-06-03):

| Provider | Weekly Low ($/hr) | Weekly High ($/hr) | Volatility |
| --- | --- | --- | --- |
| AWS (us-east-1) | $3.20 | $18.40 | Very high |
| GCP (us-central1) | $2.80 | $14.60 | High |
| RunPod (global) | $2.49 | $5.99 | Moderate |
| Lambda Labs | $2.49 | $2.49 | None (no spot tier, on-demand only) |
| Spheron | $2.91 | $2.91 | None (stable spot price) |

What this table makes obvious: Spheron's pricing functions as a stable reference floor. When hyperscaler H100 spot prices spike into double digits during a capacity crunch, Spheron's transparent on-demand rate is often cheaper than AWS or GCP spot.

The volatility patterns have structure. US East business hours (9am-6pm ET, Monday-Friday) consistently show the tightest spot capacity across AWS and GCP. EU capacity (specifically Frankfurt and Amsterdam regions) often runs 20-40% cheaper during US business hours because demand from EU teams has not ramped to the same level. Asia-Pacific capacity shows the opposite pattern: prices spike overnight US time when Asian compute demand peaks.

Teams running cross-cloud GPU pricing comparisons between hyperscalers and Spheron have found 3-5x price differences within the same week on the same GPU model. That is the arbitrage opportunity.

The Three Bidding Strategies

Fixed-Bid Instances

A fixed bid means you set a maximum price before launching an instance and the provider either accepts or rejects based on current spot price. Your instance runs until the spot price exceeds your bid, at which point you get preempted.

The math is simple: set your bid at the right multiple of recent spot prices, and you balance capacity access against preemption frequency.

python
import statistics

import requests

def get_7day_median_spot(provider_api_url: str, gpu_model: str) -> float:
    """Fetch 7-day historical spot prices and return the median."""
    # Placeholder request shape: replace with the actual provider API call
    resp = requests.get(
        provider_api_url,
        params={"model": gpu_model, "days": 7},
        timeout=10,
    )
    resp.raise_for_status()
    prices = [p["price"] for p in resp.json()["data"]]
    if not prices:
        raise ValueError("No price data returned")
    # statistics.median averages the two middle values for even-length samples
    return statistics.median(prices)

def compute_fixed_bid(median_price: float, multiplier: float = 1.2) -> float:
    """Set bid at `multiplier` times the 7-day median (default 1.2x)."""
    return round(median_price * multiplier, 4)

# Example: H100 SXM5 on RunPod
median = get_7day_median_spot("https://api.runpod.io/v2/spot-prices", "H100_SXM5")
max_price = compute_fixed_bid(median)
print(f"Fixed bid: ${max_price:.2f}/hr (median was ${median:.2f}/hr)")

A 1.2x multiplier keeps you in the market roughly 85% of the time based on typical H100 spot distributions. A 1.5x multiplier pushes that to ~95% but increases your average cost by 15%. For critical long-running jobs, use 1.5x. For short experimental runs where a restart is acceptable, 1.2x is fine.
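The in-market percentages above depend on the shape of your own price history, so it is worth measuring them rather than assuming them. A minimal sketch, assuming you already have a list of hourly spot prices; the function name `in_market_fraction` and the sample data are illustrative, not part of any provider API:

```python
from statistics import median

def in_market_fraction(prices: list[float], multiplier: float) -> float:
    """Fraction of observed hours where the spot price was at or below the bid."""
    if not prices:
        raise ValueError("prices must not be empty")
    bid = median(prices) * multiplier
    return sum(1 for p in prices if p <= bid) / len(prices)

# Hypothetical hourly H100 spot prices ($/hr) containing two spikes
history = [2.5, 2.6, 2.4, 2.7, 2.5, 3.1, 2.6, 5.9, 2.5, 3.8]

for m in (1.2, 1.5):
    share = in_market_fraction(history, m)
    print(f"{m}x bid keeps you in-market {share:.0%} of observed hours")
```

Running this against your real feed before picking a multiplier shows how much extra uptime the higher bid actually buys on your provider.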

Trade-offs:

| Factor | Fixed-Bid |
| --- | --- |
| Simplicity | High (set once and forget) |
| Cost control | Predictable ceiling |
| Capacity access | Misses cheap prices below bid; gets preempted above |
| Best for | Jobs with clear cost ceiling tolerance |

Dynamic Bidding

Dynamic bidding polls provider APIs continuously and adjusts your max_price in real time based on current spot market conditions. The goal is to track the market without overpaying during spikes.

python
import time
import requests
from dataclasses import dataclass

@dataclass
class SpotBidder:
    provider_api: str
    gpu_model: str
    target_multiplier: float = 1.15
    poll_interval: int = 60  # seconds

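    def get_current_spot_price(self) -> float:
        resp = requests.get(
            f"{self.provider_api}/current",
            params={"model": self.gpu_model},
            timeout=10,  # do not let a hung request stall the bidding loop
        )
        resp.raise_for_status()  # fail loudly instead of parsing an error body
        return resp.json()["spot_price"]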

    def compute_dynamic_bid(self, current_price: float, trailing_avg: float) -> float:
        """Bid at target_multiplier * current price, capped at 1.5x trailing average."""
        bid = current_price * self.target_multiplier
        cap = trailing_avg * 1.5
        return min(bid, cap)

    def run_bidding_loop(self, trailing_prices: list[float]) -> None:
        while True:
            current = self.get_current_spot_price()
            recent = trailing_prices[-60:]
            trailing_avg = sum(recent) / len(recent) if recent else current
            bid = self.compute_dynamic_bid(current, trailing_avg)
            print(f"Current: ${current:.2f}, Trailing avg: ${trailing_avg:.2f}, Bid: ${bid:.2f}")
            trailing_prices.append(current)
            time.sleep(self.poll_interval)

The loop above tracks a single provider. Once you watch several providers, the key parameter is the spread threshold that triggers a re-bid. If the cheapest provider drops 30% below your current provider, that gap covers migration costs for most jobs longer than 4 hours.
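The 30% spread rule can be written as a small predicate. This is a sketch only: the function name `should_rebid` and its `spread_threshold` parameter are illustrative choices, and the prices in the usage lines are made up.

```python
def should_rebid(current_provider_price: float,
                 cheapest_price: float,
                 spread_threshold: float = 0.30) -> bool:
    """True when the cheapest provider undercuts the current one by at least the threshold."""
    if current_provider_price <= 0:
        raise ValueError("current_provider_price must be positive")
    discount = (current_provider_price - cheapest_price) / current_provider_price
    return discount >= spread_threshold

# Hypothetical quotes in $/GPU-hour
print(should_rebid(5.00, 3.40))  # 32% cheaper elsewhere
print(should_rebid(5.00, 4.00))  # only 20% cheaper elsewhere
```

The discount is measured relative to what you pay now, so the same threshold works whether the market is at $3 or $15.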

Market-Price Spot

Market-price means accepting whatever the provider quotes at launch time. No ceiling, no bidding logic. You take the current spot price and accept preemption at any price spike.

This makes sense for batch evaluation jobs, data preprocessing, or embedding generation pipelines where:

  • Each task takes under 30 minutes
  • Checkpointing is trivial (just a file write)
  • The cost of a preempted restart is under 5% of job value
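The two numeric criteria in this list are easy to encode as a guard before submitting a batch. A minimal sketch, assuming you can estimate restart cost and job value in dollars; the function name is hypothetical, and the "checkpointing is trivial" criterion is left to your judgment because it is not a number:

```python
def suits_market_price(task_minutes: float,
                       restart_cost_usd: float,
                       job_value_usd: float) -> bool:
    """Check the thresholds above: task under 30 minutes, restart cost under 5% of job value."""
    if job_value_usd <= 0:
        raise ValueError("job_value_usd must be positive")
    short_enough = task_minutes < 30
    cheap_restart = (restart_cost_usd / job_value_usd) < 0.05
    return short_enough and cheap_restart

# Hypothetical embedding batch: 20-minute tasks, $1 to redo a task, $50 total job value
print(suits_market_price(task_minutes=20, restart_cost_usd=1.0, job_value_usd=50.0))
```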

Trade-off table:

| Strategy | Code Complexity | Average Cost | Preemption Risk | Best Workload |
| --- | --- | --- | --- | --- |
| Fixed-bid | Low | Medium | Medium | Training with known budget |
| Dynamic | High | Low | Low | Long training runs |
| Market-price | None | Variable | High | Short batch jobs |

Cross-Cloud Arbitrage in Practice

When H100 Spot in us-east-1 Costs 3x EU Pricing

On a Tuesday afternoon in May 2026 (US East business hours), AWS H100 SXM5 spot in us-east-1 was at $11.20/hr. The same week, RunPod's EU capacity had H100 PCIe at $3.10/hr, and Spheron's H100 SXM5 was sitting at $2.91/hr on spot.

That is a 3.8x spread. A team running an 8-GPU job for 12 hours at those prices pays:

  • AWS us-east-1: $11.20 × 8 × 12 = $1,075
  • RunPod EU: $3.10 × 8 × 12 = $298
  • Spheron H100 SXM5: $2.91 × 8 × 12 = $279

The arbitrage logic is straightforward: poll prices across providers, route to the cheapest one that has capacity available, and keep Spheron in the list as the stable fallback with predictable pricing.
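That routing rule fits in a few lines. The sketch below assumes you already have one quote per provider with a capacity flag; the `Quote` dataclass, the `route` function, and the provider labels are hypothetical names for illustration, and the prices are the ones from the example above.

```python
from dataclasses import dataclass

@dataclass
class Quote:
    provider: str
    price_per_gpu_hr: float
    has_capacity: bool

def job_cost(price_per_gpu_hr: float, gpus: int, hours: float) -> float:
    """Total job cost in dollars."""
    return price_per_gpu_hr * gpus * hours

def route(quotes: list[Quote], fallback: str = "spheron") -> Quote:
    """Pick the cheapest quote that has capacity; price ties go to the fallback provider."""
    available = [q for q in quotes if q.has_capacity]
    if not available:
        raise RuntimeError("no provider has capacity")
    return min(available, key=lambda q: (q.price_per_gpu_hr, q.provider != fallback))

quotes = [
    Quote("aws-us-east-1", 11.20, True),
    Quote("runpod-eu", 3.10, True),
    Quote("spheron", 2.91, True),
]
best = route(quotes)
print(best.provider, f"${job_cost(best.price_per_gpu_hr, gpus=8, hours=12):.2f}")
```

Filtering on capacity before sorting matters: the cheapest quote is useless if the provider cannot actually place 8 GPUs.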

The Arbitrage Decision Tree

Before routing to a cheaper provider mid-job, the decision has to be worth it. Migration has costs: checkpoint save time, instance termination, re-provisioning latency, and checkpoint load time. For H100 training jobs, this overhead typically runs 15-25 minutes.

python
def should_migrate(
    current_price: float,
    cheapest_alternative_price: float,
    remaining_hours: float,
    migration_cost_hours: float = 0.35  # ~21 minutes equivalent
) -> bool:
    """Return True if migrating to cheaper provider saves money."""
    price_delta = current_price - cheapest_alternative_price
    hours_to_recoup = (migration_cost_hours * current_price) / price_delta if price_delta > 0 else float("inf")
    # Migrate only if we recoup migration cost within 20% of remaining runtime
    return hours_to_recoup < (remaining_hours * 0.20)

# Example
current = 11.20   # AWS us-east-1 H100 spot
alternative = 2.91  # Spheron H100 SXM5 spot price
remaining = 10.0   # hours left in job

if should_migrate(current, alternative, remaining):
    print("Migrate: savings outweigh overhead")
else:
    print("Stay: migration cost not worth it")

Time-of-day patterns (H100 SXM5, based on May-June 2026 data):

| UTC Hour | AWS Spot Range ($/hr) | GCP Spot Range ($/hr) | Best Provider |
| --- | --- | --- | --- |
| 13:00-21:00 (US daytime) | $8-18 | $7-15 | RunPod / Spheron |
| 21:00-01:00 (US evening) | $4-9 | $4-8 | RunPod / GCP |
| 01:00-09:00 (US night) | $2.50-5 | $3-6 | AWS / Spheron |
| 09:00-13:00 (EU peak) | $3-7 | $3-6 | Spheron / RunPod |
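A scheduler can use these windows as a prior for which providers to poll first. The sketch below hard-codes the four windows from the table; the `WINDOWS` structure and the lowercase provider labels are illustrative, and the table reflects May-June 2026 observations rather than a guarantee.

```python
from datetime import datetime, timezone

# (start_hour_utc, end_hour_utc, providers to try first), copied from the table above
WINDOWS = [
    (13, 21, ("runpod", "spheron")),  # US daytime
    (21, 1, ("runpod", "gcp")),       # US evening, wraps past midnight
    (1, 9, ("aws", "spheron")),       # US night
    (9, 13, ("spheron", "runpod")),   # EU peak
]

def preferred_providers(hour_utc: int) -> tuple[str, ...]:
    """Return the providers the table lists as best for a given UTC hour (0-23)."""
    if not 0 <= hour_utc <= 23:
        raise ValueError(f"hour_utc must be 0-23, got {hour_utc}")
    for start, end, providers in WINDOWS:
        if start < end:
            in_window = start <= hour_utc < end
        else:  # window wraps past midnight
            in_window = hour_utc >= start or hour_utc < end
        if in_window:
            return providers
    raise AssertionError("windows should cover all 24 hours")

print(preferred_providers(datetime.now(timezone.utc).hour))
```

Treat the result as a polling order, not a routing decision: live prices should still override the time-of-day prior.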

Forecasting Spot GPU Prices

Forecasting is useful for one decision: should I launch now or wait? It does not need to be precise; it needs to answer "will prices be lower in 6 hours" with better than coin-flip accuracy.

Time-Series Baselines: ARIMA and Prophet

Both ARIMA and Prophet work on the same input: a time series of historical spot prices at fixed intervals. Collect hourly prices for 30 days, fit the model, and forecast 24 hours ahead.

ARIMA with statsmodels:

python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
import warnings
warnings.filterwarnings("ignore")

def fit_arima_forecast(prices: list[float], forecast_hours: int = 24) -> list[float]:
    """Fit ARIMA(2,1,2) on hourly spot prices and return forecast."""
    series = pd.Series(prices)
    model = ARIMA(series, order=(2, 1, 2))
    fit = model.fit()
    forecast = fit.forecast(steps=forecast_hours)
    return forecast.tolist()

# Simulate 30 days of hourly H100 spot prices (replace with real feed)
np.random.seed(42)
base_price = 4.0
noise = np.random.normal(0, 0.8, 720)
trend = np.linspace(0, -1.5, 720)
prices = (base_price + trend + noise).clip(min=1.5).tolist()

forecast = fit_arima_forecast(prices)
print(f"24h forecast: min=${min(forecast):.2f}, max=${max(forecast):.2f}")
print(f"Recommended action: {'wait' if min(forecast) < prices[-1] * 0.9 else 'launch now'}")

Prophet for seasonality-aware forecasting:

python
from prophet import Prophet
import pandas as pd
from datetime import datetime, timedelta

def fit_prophet_forecast(prices: list[float], forecast_hours: int = 24) -> pd.DataFrame:
    """Fit Prophet on hourly prices, capturing daily and weekly seasonality."""
    start = datetime(2026, 5, 1)
    timestamps = [start + timedelta(hours=i) for i in range(len(prices))]

    df = pd.DataFrame({"ds": timestamps, "y": prices})
    model = Prophet(
        daily_seasonality=True,
        weekly_seasonality=True,
        yearly_seasonality=False,
        changepoint_prior_scale=0.05  # lower = smoother, better for spot price trends
    )
    model.fit(df)

    future = model.make_future_dataframe(periods=forecast_hours, freq="h")
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(forecast_hours)

# Use the same simulated data
result = fit_prophet_forecast(prices)
next_6h = result.head(6)
min_predicted = next_6h["yhat"].min()
print(f"Predicted range (next 6h): ${next_6h['yhat_lower'].min():.2f} - ${next_6h['yhat_upper'].max():.2f}")

Prophet handles the daily and weekly seasonality patterns (US business hours, weekend dips) better than vanilla ARIMA. ARIMA is faster to fit and easier to deploy in a polling loop.

LLM-Based Price Prediction

LLMs can incorporate context signals that ARIMA and Prophet miss: upcoming quarter-end dates when providers flush reserved capacity, major model release announcements that cause sudden demand spikes, and publicly announced datacenter capacity expansions.

A simple zero-shot prompt for LLM-based prediction (the prompt below carries context events but no worked examples):

python
import anthropic

def llm_price_signal(recent_prices: list[float], context_events: list[str]) -> str:
    """Use Claude to generate a qualitative spot price signal."""
    client = anthropic.Anthropic()

    price_str = ", ".join(f"${p:.2f}" for p in recent_prices[-24:])
    events_str = "\n".join(f"- {e}" for e in context_events)

    prompt = f"""You are a GPU spot market analyst. Based on historical prices and context events, predict whether H100 SXM5 spot prices will be higher or lower in the next 6 hours.

Recent 24h hourly prices (oldest to newest): {price_str}

Relevant context events:
{events_str}

Answer with: LOWER, HIGHER, or STABLE. Then explain in one sentence why."""

    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=150,
        messages=[{"role": "user", "content": prompt}]
    )
    return message.content[0].text

signal = llm_price_signal(
    prices[-24:],
    [
        "AWS announced 30% more H100 capacity in us-east-2 on 2026-06-01",
        "Quarter end (June 30) is 27 days away - reserved capacity typically releases",
        "Llama 5 launch scheduled for 2026-06-15, expect demand spike"
    ]
)
print(f"LLM signal: {signal}")
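The model returns free text, so a scheduler needs to reduce it to one of the three labels before acting on it. A minimal sketch of that parsing step; `parse_signal` is a hypothetical helper, and it simply takes the first label that appears, so a reply like "not LOWER" would be misread and is worth guarding against with a stricter prompt.

```python
import re

def parse_signal(response_text: str) -> str:
    """Return the first LOWER/HIGHER/STABLE token in the reply, or 'UNKNOWN' if none appears."""
    match = re.search(r"\b(LOWER|HIGHER|STABLE)\b", response_text.upper())
    return match.group(1) if match else "UNKNOWN"

# Hypothetical replies
print(parse_signal("LOWER. New us-east-2 capacity should ease pressure."))
print(parse_signal("I cannot tell from this data."))
```

Mapping anything unparseable to UNKNOWN lets the caller fall back to the ARIMA or Prophet signal instead of acting on a malformed reply.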

Practical Accuracy Expectations

Directional accuracy (will prices go up or down in the next 24 hours):

  • ARIMA: 60-68%
  • Prophet: 65-73%
  • LLM-based: 65-75% (better on discrete events, worse on random capacity fluctuations)

Point prediction (exact price within 10% of actual): all three methods hit roughly 30-45% accuracy at 24h ahead. Do not use these forecasts for precise cost budgeting. Use them for binary go/no-go decisions: launch now vs wait 6 hours.
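Both metrics are simple to compute on your own backtest, which is the only way to know where your feed falls in these ranges. A sketch, assuming three aligned lists: the last observed price at forecast time, the forecast, and the price that actually occurred. The function names and sample numbers are illustrative.

```python
def directional_accuracy(last_prices: list[float],
                         forecasts: list[float],
                         actuals: list[float]) -> float:
    """Share of forecasts that called the direction (up vs not up) correctly."""
    if not last_prices or not (len(last_prices) == len(forecasts) == len(actuals)):
        raise ValueError("inputs must be non-empty and the same length")
    hits = 0
    for last, pred, actual in zip(last_prices, forecasts, actuals):
        hits += (pred > last) == (actual > last)
    return hits / len(last_prices)

def point_accuracy(forecasts: list[float], actuals: list[float], tol: float = 0.10) -> float:
    """Share of forecasts within tol (default 10%) of the actual price."""
    if not actuals or len(forecasts) != len(actuals):
        raise ValueError("inputs must be non-empty and the same length")
    hits = sum(abs(f - a) <= tol * a for f, a in zip(forecasts, actuals))
    return hits / len(actuals)

# Hypothetical backtest rows ($/hr)
last = [4.0, 4.0, 4.0, 4.0]
pred = [3.5, 4.5, 3.8, 4.2]
real = [3.6, 3.9, 3.0, 4.4]
print(f"directional: {directional_accuracy(last, pred, real):.0%}")
print(f"within 10%:  {point_accuracy(pred, real):.0%}")
```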

Beyond 48 hours, all methods degrade to near-random because spot prices are driven by discrete inventory events that no model can anticipate.

Building a Multi-Cloud Spot Scheduler with SkyPilot

Installing SkyPilot and Registering Spheron

For full SkyPilot setup and Spheron cloud registration steps, see the SkyPilot multi-cloud GPU orchestration guide. This section covers only the custom bid-logic extension that is specific to an arbitrage scheduler.

bash
pip install "skypilot[aws,gcp,runpod]"
# Register Spheron custom cloud (requires spheron-sky adapter)
pip install spheron-sky
sky check  # Verify all providers authenticate correctly

Writing a Cost-Aware SkyPilot Task YAML

SkyPilot's resources.ordered key accepts a priority list of provider/GPU combinations. The scheduler tries each in order and uses the first one with available capacity.

yaml
# arbitrage-train.yaml
name: qwen-finetune-arbitrage

resources:
  ordered:
    # First choice: cheapest spot
    - cloud: runpod
      accelerators: H100:8
      use_spot: true
      spot_recovery: failover

    # Second choice: Spheron on-demand (predictable fallback)
    - cloud: spheron
      accelerators: H100:8
      use_spot: false

    # Third choice: AWS spot
    - cloud: aws
      region: us-east-1
      accelerators: H100:8
      use_spot: true
      spot_recovery: failover

    # Final fallback: GCP
    - cloud: gcp
      accelerators: A100-80GB:8
      use_spot: true
      spot_recovery: failover

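file_mounts:
  # Mount the bucket so checkpoints written here persist across preemptions
  # (the short "path: s3://..." form copies the bucket contents at launch instead)
  /checkpoints:
    source: s3://my-bucket
    mode: MOUNT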

setup: |
  pip install torch transformers datasets accelerate peft

run: |
  python train.py \
    --model_name Qwen/Qwen3-7B \
    --output_dir /checkpoints \
    --resume_from_checkpoint /checkpoints/latest \
    --save_steps 200

Adding Custom Bid Logic

SkyPilot does not natively expose a bid-price callback, but you can wrap the launch logic to poll prices before selecting a provider:

python
import subprocess
import requests
import json
from typing import Optional

PROVIDERS = {
    "spheron": "https://app.spheron.ai/api/gpu-offers",
    "runpod": "https://api.runpod.io/v2/spot-prices",
}

def get_cheapest_provider(gpu_model: str = "H100_SXM5") -> Optional[dict]:
    """Poll providers and return the cheapest option with available capacity."""
    candidates = []

    # Spheron: fetch on-demand as stable floor
    try:
        resp = requests.get(PROVIDERS["spheron"])
        resp.raise_for_status()
        offers = resp.json().get("data", [])
        for offer in offers:
            if gpu_model.replace("_", " ") in offer.get("displayName", ""):
                candidates.append({
                    "provider": "spheron",
                    "price": offer.get("lowestPrice", 999),
                    "is_spot": False
                })
                # Check for spot tier
                for o in offer.get("offers", []):
                    sp = o.get("spot_price")
                    gc = max(o.get("gpuCount", 1), 1)
                    if sp:
                        candidates.append({
                            "provider": "spheron",
                            "price": sp / gc,
                            "is_spot": True
                        })
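    except Exception as e:
        # Skip this provider, but surface the failure instead of hiding it
        print(f"Spheron price fetch failed: {e}")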

    # RunPod: fetch spot prices
    try:
        resp = requests.get(PROVIDERS["runpod"])
        resp.raise_for_status()
        items = resp.json() if isinstance(resp.json(), list) else resp.json().get("data", [])
        for item in items:
            name = item.get("gpu_name", item.get("displayName", ""))
            if gpu_model.replace("_", " ") in name or gpu_model in name:
                price = item.get("spot_price", item.get("lowestPrice", 999))
                candidates.append({
                    "provider": "runpod",
                    "price": float(price),
                    "is_spot": True
                })
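    except Exception as e:
        # Skip this provider, but surface the failure instead of hiding it
        print(f"RunPod price fetch failed: {e}")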

    # Sort by price and return cheapest
    candidates.sort(key=lambda x: x["price"])
    return candidates[0] if candidates else None

def launch_with_arbitrage(task_yaml: str) -> None:
    """Launch SkyPilot task, routing to cheapest available provider."""
    best = get_cheapest_provider()
    if best:
        print(f"Routing to {best['provider']} at ${best['price']:.2f}/hr (spot={best['is_spot']})")

    # Launch via SkyPilot CLI
    cmd = ["sky", "launch", "-y", "--detach-run", task_yaml]
    if best:
        cmd += ["--cloud", best["provider"]]
        if best["is_spot"]:
            cmd.append("--use-spot")
    subprocess.run(cmd, check=True)

launch_with_arbitrage("arbitrage-train.yaml")

Preemption-Aware Checkpointing Tied to Price Signals

For the complete FSDP and ZeRO-3 checkpointing setup, see the spot GPU training resilience guide. This section covers only the price-signal integration: triggering a checkpoint before preemption hits, based on spot price movement rather than waiting for SIGTERM.

The key insight: when a spot price spikes sharply (your provider's spot price rises more than 30% above its 30-minute average), preemption is likely within 10-20 minutes. Save now, while the checkpoint write competes with training rather than with an imminent eviction.

python
import threading
import time
import torch
import requests
from collections import deque

class PriceAwareCheckpointer:
    def __init__(self, model, optimizer, checkpoint_dir: str, provider_api: str):
        self.model = model
        self.optimizer = optimizer
        self.checkpoint_dir = checkpoint_dir
        self.provider_api = provider_api
        self.price_history = deque(maxlen=30)  # 30-minute window at 1-min polling
        self.last_checkpoint_step = 0
        self._ckpt_lock = threading.Lock()  # guards model/optimizer state reads vs training updates

    def _fetch_spot_price(self) -> float:
        resp = requests.get(self.provider_api, timeout=10)  # a hung request must not stall the monitor thread
        resp.raise_for_status()
        offers = resp.json().get("data", [])
        for offer in offers:
            if "H100 SXM5" in offer.get("displayName", ""):
                for o in offer.get("offers", []):
                    sp = o.get("spot_price")
                    gc = max(o.get("gpuCount", 1), 1)
                    if sp:
                        return sp / gc
        return 0.0

    def _is_preemption_likely(self) -> bool:
        if len(self.price_history) < 5:
            return False
        current = self.price_history[-1]
        avg_30m = sum(self.price_history) / len(self.price_history)
        return current > avg_30m * 1.30  # 30% spike above 30-min average

    def save_checkpoint(self, step: int) -> None:
        path = f"{self.checkpoint_dir}/step_{step}.pt"
        with self._ckpt_lock:
            torch.save({
                "step": step,
                "model_state": self.model.state_dict(),
                "optimizer_state": self.optimizer.state_dict(),
            }, path)
            self.last_checkpoint_step = step
        print(f"Checkpoint saved at step {step}")

    def monitor_loop(self, current_step_fn) -> None:
        """Run in a background thread, watching prices and triggering early saves."""
        while True:
            try:
                price = self._fetch_spot_price()
                if price:
                    self.price_history.append(price)

                if self._is_preemption_likely():
                    step = current_step_fn()
                    if step > self.last_checkpoint_step + 50:  # avoid thrashing
                        print(f"Price spike detected (${price:.2f}/hr). Pre-emptive checkpoint at step {step}.")
                        self.save_checkpoint(step)
            except Exception as e:
                print(f"Price monitor error (retrying): {e}")

            time.sleep(60)

    def start_monitoring(self, current_step_fn) -> None:
        t = threading.Thread(target=self.monitor_loop, args=(current_step_fn,), daemon=True)
        t.start()

Wire this into your training loop at startup. The monitor thread runs in the background while the main training loop keeps going; training only waits at optimizer.step() for as long as a save holds the checkpoint lock. When a price spike triggers an early save, you restart from that recent checkpoint instead of a stale one from the regular 200-step save interval.

In your training loop, hold checkpointer._ckpt_lock around optimizer.step() so the background thread cannot read state_dict() while a parameter update is mid-flight:

python
with checkpointer._ckpt_lock:
    optimizer.step()
optimizer.zero_grad()

This prevents a corrupt checkpoint where some layers have post-update weights and others still have pre-update weights.

When NOT to Use Spot GPUs

Spot instances are the wrong choice in these scenarios:

Latency-sensitive production inference. Any preemption causes user-visible downtime. Even a 15-minute restart is unacceptable for a live API. Use dedicated or reserved instances for production inference endpoints.

Training without checkpointing set up. If you cannot recover from a mid-job restart, every preemption is a full restart. The cost of lost progress exceeds spot savings within a few interruptions. Set up checkpointing first (see the resilience checkpointing guide), then use spot.

Jobs where restart overhead exceeds 15% of total runtime. A 2-hour job with 20-minute restart overhead (17% restart tax) will lose money to spot preemptions on any provider with more than one interruption per job. For these workloads, use on-demand and optimize the training loop instead.

Regulated environments requiring compute continuity. Some compliance frameworks (HIPAA, FedRAMP, specific EU AI Act audit trails) require that compute infrastructure be documented and stable. Spot instances that migrate across providers mid-job create audit gaps. Use reserved or dedicated instances with a fixed provider.

Multi-day runs without tested recovery. A fine-tune that runs continuously for 72 hours will almost certainly get preempted at least once. If your recovery path has never been tested end-to-end, the first real preemption will cost more in debugging than you saved on spot pricing.
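The 15% restart-overhead rule is worth computing per job before choosing spot. A minimal sketch under simplifying assumptions: each preemption costs a fixed restart time, and progress lost since the last checkpoint is ignored. The function names are illustrative.

```python
def restart_tax(job_hours: float, restart_minutes: float, expected_preemptions: float) -> float:
    """Restart overhead as a fraction of the job's uninterrupted runtime."""
    if job_hours <= 0:
        raise ValueError("job_hours must be positive")
    return (restart_minutes / 60.0) * expected_preemptions / job_hours

def effective_spot_cost(spot_price: float, job_hours: float,
                        restart_minutes: float, expected_preemptions: float) -> float:
    """Per-GPU spot cost in dollars, including the hours re-spent on restarts."""
    tax = restart_tax(job_hours, restart_minutes, expected_preemptions)
    return spot_price * job_hours * (1 + tax)

# The 2-hour job with a 20-minute restart from the text, one preemption
tax = restart_tax(job_hours=2, restart_minutes=20, expected_preemptions=1)
print(f"restart tax: {tax:.0%}")
print(f"effective cost per GPU at $2.91/hr: ${effective_spot_cost(2.91, 2, 20, 1):.2f}")
```

Comparing the effective spot cost against the plain on-demand cost for the same job gives a quick go/no-go check.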

Real Cost Case Study: 70% Reduction on a Qwen 3.5 Fine-Tune

The task: fine-tune Qwen 3.5 7B on a 500K sample domain-specific dataset. Estimated runtime: 40 hours on 4x H100 (160 GPU-hours). Baseline cost at AWS on-demand (4x H100 SXM5 at $13.00/hr per GPU): $2,080 over 5 days with experimental runs included.

The strategy: run all experimental phases on spot, use Spheron for stable production runs, and route to the cheapest available provider every 4 hours.

The checkpointing patterns that made recovery fast are documented in the 70B model spot training case study.

Phase breakdown:

| Phase | Provider | GPU Config | Duration | Rate | Cost |
| --- | --- | --- | --- | --- | --- |
| Hyperparameter search (6 runs, 1000 steps each) | Lambda Labs H100 PCIe | 4x H100 | 8h | $2.49/hr × 4 | $79.68 |
| Full fine-tune run 1 (failed at step 3200, preempted) | RunPod H100 PCIe spot | 4x H100 | 6h | $2.90/hr × 4 | $69.60 |
| Full fine-tune run 2 (completed from checkpoint) | Spheron H100 SXM5 | 4x H100 | 18h | $2.91/hr × 4 (spot) | $209.52 |
| Evaluation and merge | Spheron A100 80GB spot | 4x A100 | 6h | $0.82/hr × 4 | $19.68 |
| Total | | | 38h | | $378.48 |

Versus the AWS on-demand baseline: $2,080. Actual cost: $378.48. Savings: 81.8%, exceeding the 70% target because the A100 80GB spot tier on Spheron was significantly cheaper than expected.
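The totals can be reproduced directly from the phase table. The snippet below only restates the table's rates, GPU counts, and durations, plus the $13.00/hr per-GPU AWS baseline over 40 hours; the variable names are illustrative.

```python
# (phase, $/GPU-hour, GPUs, hours) taken from the phase breakdown table
phases = [
    ("hyperparameter search", 2.49, 4, 8),
    ("fine-tune run 1", 2.90, 4, 6),
    ("fine-tune run 2", 2.91, 4, 18),
    ("evaluation and merge", 0.82, 4, 6),
]

total = sum(rate * gpus * hours for _, rate, gpus, hours in phases)
baseline = 13.00 * 4 * 40  # AWS on-demand: $13.00/GPU-hour, 4 GPUs, 40 hours
savings_pct = (1 - total / baseline) * 100

print(f"actual: ${total:.2f}  baseline: ${baseline:.2f}  savings: {savings_pct:.1f}%")
```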

The two interruptions (one RunPod preemption, one restart from checkpoint after the run migration) each cost 12-18 minutes of recovery time. The Spheron runs had zero interruptions.

Key configuration that enabled this:

python
# Cost-aware provider selector used throughout the experiment
PROVIDER_PRIORITY = [
    {"name": "lambda_labs", "model": "H100_PCIe", "price_ceiling": 3.50},
    {"name": "runpod", "model": "H100_PCIe", "spot": True, "price_ceiling": 4.00},
    {"name": "spheron", "model": "H100_SXM5", "spot": True},   # stable-priced spot fallback (run 2 in the table)
    {"name": "spheron", "model": "A100_80G", "spot": True},    # eval fallback
]

Spheron as the Stable Floor Under Your Spot Strategy

A multi-cloud spot scheduler needs a fallback that does not fluctuate. When AWS and GCP spot prices spike during capacity crunches, you need somewhere to route that will not surprise you with a 5x price increase.

Spheron fills that role. The per-hour pricing is transparent and does not swing intraday. H200 and A100 capacity on Spheron is backed by 5+ providers on Spheron's supply network, which smooths out the single-datacenter capacity events that spike prices on hyperscalers.

During the case study above, in the week when RunPod H100 PCIe spot hit $5.99/hr and AWS spot hit $14.80/hr, Spheron's H100 SXM5 spot stayed at $2.91/hr. That predictability is what lets you hold Spheron as the fallback entry in your resources.ordered list without worrying about a surprise bill.

Spheron vs hyperscaler H100 spot during a simulated capacity crunch (week of 2026-06-03):

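| Provider | Normal Spot ($/hr) | Crunch Spot ($/hr) | On-Demand ($/hr) | Crunch Behavior |
| --- | --- | --- | --- | --- |
| AWS us-east-1 | $4-6 | $14-18 | $18.40 | 3-4x spike |
| GCP us-central1 | $3.50-5 | $12-15 | $16.20 | 3-4x spike |
| RunPod global | $2.49-3.50 | $5-6 | $4.99 | 1.5-2x spike |
| Spheron | N/A (no auction) | N/A | $5.07 | Stable |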

For teams that want to check live GPU pricing across models before committing to a provider or scheduling a long training run, Spheron's pricing page shows current rates without requiring an account. Spheron's per-hour pricing is the reference point most teams use when building their arbitrage scripts: set it as the fallback bid so you never route to a spot that costs more than on-demand on a stable provider.

The spot instance page on Spheron has details on which GPU models have spot tiers and current availability. Not every model has a spot tier (H100 PCIe does not; H200 SXM5 and A100 80GB SXM4 do), so check availability before building your scheduler around a specific model's spot pricing.


When spot prices spike on hyperscalers, Spheron's predictable on-demand H100 pricing becomes the arbitrage floor. Teams running multi-cloud spot schedulers keep Spheron in their resource list as the stable fallback: no surprise prices, no capacity queues, per-minute billing.

Quick Setup Guide

  1. Build a spot price feed across providers

    Poll provider APIs every 60 seconds and store spot prices in a local time-series database. For Spheron, use GET https://app.spheron.ai/api/gpu-offers to retrieve lowestPrice and spot_price fields. For RunPod and Lambda Labs, use their respective REST APIs. Store per-provider, per-GPU, per-region prices in a SQLite table keyed on (provider, gpu_model, region, timestamp). This feed drives every other step.

  2. Choose a bidding strategy (fixed, dynamic, or market)

    Compute the 7-day trailing median spot price for your target GPU from your price feed. Set your fixed-bid ceiling at 1.2x that median (conservative) or 1.5x (captures more capacity). For dynamic bidding, re-bid every 60 seconds by computing the current spread across providers and adjusting your max_price if the cheapest provider shifts. For commodity batch jobs, accept market price (no ceiling) to maximize capacity access.

  3. Set up SkyPilot with Spheron as a custom cloud target

    Install SkyPilot with pip install skypilot. Register Spheron as a custom cloud in ~/.sky/clouds/spheron.yaml using the Spheron API endpoint and credentials. In your task YAML, define resources.ordered as a list from cheapest to most expensive, with Spheron as the stable fallback after AWS and RunPod spot entries. SkyPilot will attempt each in order and fall through to the next if capacity is unavailable.

  4. Configure preemption-aware checkpointing for your arbitrage scheduler

    Add a price-signal trigger that fires a checkpoint before SIGTERM arrives. Poll your spot price feed every 60 seconds, so that a 30-sample window covers the 30-minute trailing average. If the current provider's spot price rises above 1.3x that average (a preemption signal), trigger torch.save() immediately and begin migrating to the next cheapest provider. Do not wait for the provider's preemption notice. This pre-emptive save typically cuts restart overhead from 15-20 minutes to under 5 minutes.

  5. Monitor cross-cloud spot spreads and trigger re-routing

    Compute the ratio between your current provider's price and the cheapest alternative every 5 minutes. When the ratio exceeds 1.4x (current provider costs 40% more than the best alternative), evaluate whether the migration cost (checkpoint save + re-provision time) is worth the savings over the remaining job duration. Use: savings = (price_delta * remaining_hours) - (migration_cost_hours * current_price), so that both terms are in dollars. If savings > 0, trigger migration.
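The price feed from step 1 and the trailing median from step 2 can be sketched with the standard library alone. This is one possible layout, not a prescribed schema: the table name, column names, and helper functions are assumptions, and the polling code that would call `record_price` every 60 seconds is left out.

```python
import sqlite3
from statistics import median

SCHEMA = """
CREATE TABLE IF NOT EXISTS spot_prices (
    provider  TEXT NOT NULL,
    gpu_model TEXT NOT NULL,
    region    TEXT NOT NULL,
    ts        INTEGER NOT NULL,  -- Unix seconds, UTC
    price     REAL NOT NULL,     -- dollars per GPU-hour
    PRIMARY KEY (provider, gpu_model, region, ts)
)
"""

def open_feed(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the price feed database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def record_price(conn: sqlite3.Connection, provider: str, gpu_model: str,
                 region: str, ts: int, price: float) -> None:
    """Store one observation; a repeated key overwrites the earlier row."""
    conn.execute(
        "INSERT OR REPLACE INTO spot_prices VALUES (?, ?, ?, ?, ?)",
        (provider, gpu_model, region, ts, price),
    )
    conn.commit()

def trailing_median(conn: sqlite3.Connection, provider: str, gpu_model: str,
                    now_ts: int, days: int = 7) -> float:
    """Median price for one provider and GPU over the trailing window."""
    cutoff = now_ts - days * 86400
    rows = conn.execute(
        "SELECT price FROM spot_prices "
        "WHERE provider = ? AND gpu_model = ? AND ts BETWEEN ? AND ?",
        (provider, gpu_model, cutoff, now_ts),
    ).fetchall()
    if not rows:
        raise ValueError("no prices in the trailing window")
    return median(row[0] for row in rows)
```

A fixed-bid ceiling then follows as `trailing_median(...) * 1.2`, and the same table can feed the forecasting and spread checks in the later steps.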

Frequently Asked Questions

How much can cross-cloud GPU spot arbitrage save?

50-70% on real training jobs when you run across 3-4 providers and route to the cheapest available spot each session. The biggest swings come from geo-arbitrage: US East H100 spot during business hours can run 3x the price of EU or Asia-Pacific capacity at the same moment. In the case study above, a 5-day Qwen 3.5 7B fine-tune that would cost $2,080 on AWS H100 on-demand ran for $378.48 using a cross-cloud spot scheduler with Spheron as the fallback floor.

Which bidding strategy works best for training workloads?

Dynamic bidding with a price ceiling beats both fixed-bid and market-price strategies for most training workloads. Set your ceiling at 1.2-1.5x the trailing 7-day median spot price for the target GPU. This keeps you in the market 85-90% of the time while avoiding the top 10% of price spikes. For batch inference or evaluation jobs where restarts are cheap, market-price (accept whatever the provider quotes) is simpler and captures more capacity.

Can spot GPU prices be forecast?

Yes, with realistic expectations. ARIMA and Prophet give 60-73% directional accuracy on 24-hour-ahead spot price predictions. That is good enough to decide whether to start a job now or wait 4-6 hours for prices to drop. Point-prediction accuracy degrades quickly beyond 24 hours because spot prices are driven by discrete capacity events (new hardware batch releases, major model launches pulling capacity) that no time-series model can anticipate.

When should you avoid spot GPUs?

Avoid spot for: latency-sensitive production inference where any restart causes user-visible downtime, training runs longer than 48 hours without checkpointing set up (restart overhead exceeds savings), jobs in regulated environments that require audit-proof compute continuity, and any workload where the restart cost exceeds 15% of total runtime. For these, dedicated or reserved instances are the right call.
