Comparison

Spheron vs Shadeform: Marketplace vs Marketplace: Which GPU Cloud Wins?

Written by Mitrasish, Co-founder · Mar 12, 2026
GPU Cloud · Shadeform Alternative · Multi-Cloud GPU · AI Infrastructure · Cost Comparison · GPU Rental

Most GPU cloud comparisons pit a marketplace against a dedicated provider, the aggregator vs the hyperscaler. This one is different. Both Spheron and Shadeform take the same foundational approach: aggregate GPU supply from multiple cloud providers and give you a single interface to access all of it. Neither owns the underlying hardware. Both give you access to capacity across multiple data centers through one account.

But that's where the similarity ends. The two platforms were built with different buyers in mind, serve different use cases, and make different tradeoffs on access model, bare metal guarantees, spot instance availability, and what "the product" actually means.

The single most important difference for most teams: Shadeform has no spot instance offering at all. Every workload you run on Shadeform pays full on-demand rates, with no cheaper preemptible tier available. Spheron offers spot instances across its entire GPU catalog at up to 68% below dedicated rates. For teams running model training, hyperparameter search, batch inference, or any fault-tolerant workload, this one difference can cut compute spend by tens to hundreds of thousands of dollars per year. The spot instance section below breaks down exactly which workloads benefit and by how much.

Workloads where Spheron spot saves up to 68% vs dedicated rates (Shadeform has no spot tier, so all of these pay full on-demand price there): LLM training and fine-tuning (checkpoint every 500 steps and resume instantly), hyperparameter search sweeps (each trial is independent so a preemption only loses one config), batch inference over large datasets, data preprocessing and tokenization pipelines, embedding generation at scale, RLHF training stages, model distillation, diffusion model and LoRA fine-tuning, ablation studies and architecture search, audio transcription pipelines (Whisper, Parakeet), RAG index building and vector corpus ingestion, protein structure prediction (AlphaFold over compound libraries), CI/ML validation pipelines, and continuous pre-training on domain corpora. On Shadeform, all of these workloads run at full on-demand rates. On Spheron, they run on spot. A team spending $200K per year on GPU compute saves approximately $91K per year by routing eligible workloads through Spheron spot instead of Shadeform on-demand. A team at $500K per year saves approximately $213K. See the full cost breakdown and use case guide below.

Here's where the differences actually matter for your team.

Quick Comparison

| Feature | Spheron | Shadeform |
| --- | --- | --- |
| Model | Multi-provider marketplace + direct rentals | Multi-provider API brokerage |
| GPU selection | 30+ SKUs from 5+ vetted providers | 21 cloud providers active in API (marketed as '30+ clouds'), including Lambda, Nebius, Crusoe, Hyperstack, DigitalOcean, Vultr, Verda, Boost Run, Excess Supply, Massed Compute, FPT, Paperspace, and others |
| Pricing model | Marketplace competition | Provider rates via unified API |
| Consumer product | Yes, full dashboard | Console + primarily API/CLI focused |
| Bare metal access | Yes, for select configurations (H100 PCIe, RTX 4090) | Select providers only (Latitude, Cudo, Voltage Park, Amaya, Boost Run, Excess Supply, FPT, Evergreen) |
| Spot instances | Yes, across all major GPU models (H100 SXM5 from $0.97/hr spot vs $2.50/hr dedicated, ~61% savings; H200 SXM5 from $1.43/hr spot vs $4.54/hr dedicated, ~68% savings; B300 from $2.45/hr spot vs $8.55/hr dedicated, ~71% savings) | No spot tier; every workload (training, batch inference, fine-tuning, preprocessing) runs at full on-demand rates |
| Reserved instances | Yes | Yes, via Reserved Commitments feature (request-based, not self-service; terms from 1 week to 3 years; minimum 1 node = 8 GPUs) |
| API access | Yes, REST API | Yes, core product with REST API |
| GPU catalog highlights | RTX 5090, B300, GH200, H100, H200, A100, RTX 4090, L40S | H100, H200, GH200, A100, B200, B300, RTX 5090 (via network partners) |
| Signup to deploy | Minutes via dashboard | Minutes via console or API |
| Multi-node clusters | Yes | Yes, on-demand clusters up to 64 GPUs (2-8 nodes of 8 GPUs each, no commitment required); larger deployments via Reserved Commitments (contact-based, no GPU count limit) |

Of all the differences in this table, spot instance availability has the largest direct impact on your compute budget. Shadeform has no spot tier at all. Every instance provisioned on Shadeform runs at the standard on-demand rate. Spheron offers spot instances across all major GPU models at up to 68% below dedicated rates on current-generation GPUs, depending on which GPU you use. For teams running model training, hyperparameter search, batch inference, or any workload that tolerates restarts, this single difference can save tens to hundreds of thousands of dollars per year at scale. The full breakdown is in the Spot Instances section below.

Provider Network: How Each Sources GPU Supply

Both platforms aggregate supply, but the philosophy behind each network is different.

Shadeform's approach: Shadeform (YC S23) markets itself as a '30+ clouds' marketplace. The API currently includes 21 active providers for on-demand GPU deployments, including Lambda, Nebius, Crusoe, Hyperstack, Massed Compute, DigitalOcean, Scaleway, Voltage Park, Latitude, Denvr, Vultr, Cudo Compute, Verda, Horizon Computing, Boost Run, Evergreen Compute, Excess Supply, FPT, IMWT, Amaya, and Paperspace. Users access all active providers through Shadeform's unified API and console without needing individual accounts with each provider. Shadeform handles billing centrally. They do apply a platform markup, but it is built into the prices displayed on their dashboard rather than shown as a separate fee, so the rates you see already include their margin. For teams that want access to the widest possible market through one API, this is genuinely valuable. Shadeform provisions on underlying providers using its own pooled accounts, so you do not need to sign up with the individual provider directly. Provider availability changes as Shadeform expands its network; always check the current provider list at platform.shadeform.ai before making procurement decisions.

Spheron's approach: Spheron sources GPU capacity from vetted data center partners with a focus on consistency of hardware access rather than raw provider count. The emphasis is on direct access to GPU resources, with bare-metal configurations available for select instance types. Fewer providers, deeper integration with each.

The practical difference: Shadeform gives you access to more providers across its network (21 currently active for on-demand deployments, marketed as '30+ clouds'). Spheron gives you a more curated provider network with spot instance availability and a first-class product dashboard. If provider breadth and the ability to tap into specific clouds (Lambda, Nebius, Crusoe) matters for your team, Shadeform's wider network is an advantage. If spot instances, competitive pricing, and a full product experience matter more, Spheron's model is the stronger choice. See our multi-cloud GPU strategy guide for context on how provider networks affect real workloads.

API-First vs Product-First

This is the most important distinction between the two platforms.

Shadeform: Shadeform offers both a full UI console at platform.shadeform.ai and a public REST API. The console lets you browse GPU availability, compare pricing across providers, and launch instances without writing any code. The REST API covers instance creation, cluster management, SSH key management, volume management, and template deployment. They have documented SkyPilot integration (single-node only via SkyPilot; multi-node clusters are handled directly through Shadeform's own API) and support Docker container launch configurations and startup scripts natively in the API. Shadeform was designed with API-first teams in mind, but the platform is fully usable through its console UI for teams that prefer a visual workflow.

Spheron: Both a full consumer product (dashboard, instant deployment, GPU catalog browser) and a complete REST API. You can use Spheron the way you use AWS: pick a GPU from the catalog and deploy in the browser, or provision programmatically. The dashboard is not an afterthought. It's designed for teams that want to spin up and manage instances from a UI as part of their daily workflow.

Which matters to you:

  • If your team will use the UI regularly to spin up and manage instances, Spheron's product experience is built for this from day one
  • If you need to embed GPU provisioning into your own platform, CI pipeline, or internal tooling via API, both work, but Shadeform is more purpose-built for API-first workflows
  • If you want to link your existing cloud provider credits (Lambda, AWS, etc.), Shadeform's account-linking feature handles this directly

Bare Metal Access: The Critical Difference for AI Workloads

Spheron's access model: When you provision a GPU on Spheron, you get full VM or bare-metal access with root control. Bare-metal instances are available for select configurations, particularly H100 PCIe and RTX 4090. You can install custom CUDA versions, modify kernel parameters, and run workloads that require direct hardware access.

Shadeform's brokerage model: Shadeform provisions instances on top of its partner clouds. The majority of instances are VMs. Bare metal is available through select providers in their network: Latitude, Cudo Compute, Voltage Park, Amaya, Boost Run, Excess Supply, FPT, and Evergreen. Latitude offers bare-metal H100 PCIe at $1.99/hr in Dallas, for example. When you deploy through Shadeform on a VM-based provider, you get SSH root access to the instance, but the underlying hardware may carry hypervisor overhead depending on the provider's infrastructure.

For AI training and inference workloads where custom CUDA configurations, kernel tuning, or driver-level optimizations matter, bare-metal access is a real advantage on either platform. When it matters less (standard inference workloads with pre-packaged Docker images and no driver-level customization requirements), the VM-based instances available through Shadeform's broader network work fine. For more on why hardware access level matters for AI workloads, see our dedicated vs shared GPU comparison.

GPU Catalog: What You Can Actually Access

Spheron: RTX 5090, B300, H100 PCIe, H100 SXM5, H200, A100 80GB, GH200, L40S, RTX 4090, RTX PRO 6000, and more. Includes the latest Blackwell hardware (B300) and Hopper-generation data center GPUs.

Shadeform: Through its network of 21 active providers (marketed as '30+ clouds'), Shadeform offers access to H100, H200, GH200, A100, L40S, RTX 4090, RTX 5090, B200, B300 Blackwell Ultra, RTX Pro 6000 (Blackwell-architecture workstation GPU, available via IMWT from $1.25/GPU/hr as of March 2026), and consumer GPUs like A5000, A6000, RTX 6000 Ada Generation, and V100. B300 is available through Verda at $6.59/hr (1x VM, verified March 2026). B200 is listed through Verda at $5.63/hr and Lambda Labs at $5.29/hr, though both had limited availability as of March 2026; always check current inventory before planning a deployment. RTX 5090 is available from Excess Supply at $0.65/hr (Oslo, Norway) and from Evergreen at $0.70/GPU/hr as part of an 8-GPU node ($5.60/hr total). New GPU availability expands as provider partners add capacity.

The catalog difference: Both platforms have access to Blackwell hardware. Spheron maintains its own vetted relationships with providers carrying the latest GPUs. Shadeform's catalog expands as its partner providers add new hardware. Shadeform has no spot tier for any GPU; all instances run at on-demand rates. Spheron offers spot pricing for B300 (from $2.45/hr spot vs $8.55/hr dedicated - ~71% savings) and H100 SXM5 (from $0.97/hr spot vs $2.50/hr dedicated - ~61% savings). If you need a specific GPU model today, check current availability on both platforms as GPU inventory moves quickly.

Pricing: Marketplace Competition

Both platforms use an aggregation model that generally undercuts hyperscalers significantly. Exact pricing depends on which underlying provider you use.

Spheron: Multiple providers competing for your business with prices that vary by provider and region. Browse Spheron's live pricing for current rates across all GPU models.

Shadeform: Applies a platform markup that is baked into the prices shown on their dashboard and API. There is no separate fee line item; the markup is included in the displayed rate, so you are not paying the raw provider price directly.

Here's a representative comparison based on rates verified against live Spheron and Shadeform platform APIs in March 2026. GPU cloud pricing is highly dynamic and can shift daily as providers add and retire capacity. Treat these as directional benchmarks only, not current quotes. Always verify pricing directly on each platform before making deployment decisions:

| Configuration | Spheron | Shadeform | Notes |
| --- | --- | --- | --- |
| H100 PCIe 80GB (1x) | from $2.01/hr (dedicated) | from $1.66/hr (Latitude VM, unavailable as of March 2026) / $1.90/hr (Hyperstack VM, Montreal) / $1.99/hr (Latitude bare metal, Dallas) | Shadeform's $1.90/hr includes their built-in platform markup; the raw provider cost is lower than displayed. Spheron's price is transparent with no baked-in margin |
| H100 SXM5 (1x) | from $0.97/hr (spot) / from $2.50/hr (dedicated) | from $2.26/hr (Verda VM, on-demand only) | Spheron spot is ~57% cheaper than Shadeform on-demand; Shadeform has no spot tier, so every H100 SXM5 workload pays full price |
| H200 SXM5 (1x) | from $1.43/hr (spot) / from $4.54/hr (dedicated) | Not widely available through Shadeform's network | Spheron H200 spot at $1.43/hr has no equivalent on Shadeform |
| H100 (8x node cluster) | Available | from $14.34/hr (Latitude H100 PCIe bare metal NVLink) / $15.20/hr (Hyperstack VM) / $15.60/hr (Hyperstack NVLink VM) | Compare multi-node options directly on each platform |
| RTX 5090 (1x) | from $0.71/hr (US-based) | from $0.65/hr (Excess Supply, Oslo, Norway only, markup included) / $0.70/GPU/hr (Evergreen, 8x node only at $5.60/hr total) | Shadeform's cheaper RTX 5090 is geographically restricted to Norway; Spheron offers US-based access at $0.71/hr |
| B300 SXM6 (1x) | from $2.45/hr (spot) / from $8.55/hr (dedicated) | from $6.59/hr (Verda, on-demand) | Spheron spot is ~63% cheaper than Shadeform on-demand; Shadeform has no spot option for B300 |

Pricing was verified against live Spheron and Shadeform APIs in March 2026. Always check current live rates on both platforms before committing, as GPU pricing shifts frequently as providers add and retire capacity.

Spot Instances: A Major Cost Advantage Shadeform Does Not Offer

This is the single most impactful cost difference between the two platforms. For teams running training jobs, batch inference, or any workload that can tolerate occasional restarts, this one factor can reduce your annual compute bill by up to 68% on the workloads that qualify.

Bottom line on savings: A team spending $100K/year on GPU compute can save approximately $46K annually by running eligible workloads on Spheron spot instead of Shadeform on-demand. At $200K/year GPU spend, that grows to ~$91K saved. At $500K/year, the savings reach ~$213K. Shadeform has no spot option; every single GPU hour pays full price regardless of workload type. See the cost impact table below for a full breakdown by team budget.

Shadeform does not offer spot instances. Spheron does. This single feature difference determines whether your team pays full on-demand rates for every GPU hour, or whether you can slash up to 68% off the compute cost of training, fine-tuning, batch inference, and experimentation workloads. If your team runs any of the workload types listed below, Spheron's spot instances represent a direct, quantifiable cost reduction. On Shadeform, those same workloads run at full price with no cheaper tier available.

Shadeform has no spot instance offering. There is no spot or preemptible tier available through Shadeform's API: every instance you provision runs at the standard on-demand rate, for every workload. There is no cheaper interruptible tier, no way to access excess capacity at reduced cost, and no option to run training jobs on discounted preemptible hardware. If a provider in their network offers spot capacity elsewhere, that pricing is not exposed through Shadeform. (Verify this against current Shadeform documentation, as their product continues to evolve.)

The practical cost impact is direct: every training job, hyperparameter sweep, or batch inference workload on Shadeform pays full on-demand rates with no option for a cheaper tier. On Spheron, those same workloads can run on spot at a fraction of the price. Running 8x H100 SXM5 GPUs for training 40 hours per week costs approximately $310 per week on Spheron spot ($0.97/hr per GPU × 8 GPUs × 40 hrs). The same compute on Shadeform, where no spot option exists, runs at $2.26/hr per H100 SXM5 on-demand through Verda, putting the same 8x H100 SXM5 for 40 hours at approximately $723 per week, more than twice the cost. That gap compounds over a project lifecycle. Over a year, the difference between a spot-enabled workflow on Spheron and an all-on-demand workflow on Shadeform can save teams tens to hundreds of thousands of dollars on training compute alone.
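The weekly arithmetic is simple enough to sanity-check yourself. A directional sketch using the March 2026 rates quoted above (verify live prices before planning a budget around them):

```python
# Weekly training cost for 8x H100 SXM5 at 40 hours/week,
# using the representative March 2026 rates quoted in this article.
GPUS, HOURS = 8, 40
spheron_spot = 0.97   # $/GPU/hr, Spheron spot
shadeform_od = 2.26   # $/GPU/hr, Shadeform on-demand (Verda)

print(f"Spheron spot:        ${spheron_spot * GPUS * HOURS:,.0f}/week")  # ~$310
print(f"Shadeform on-demand: ${shadeform_od * GPUS * HOURS:,.0f}/week")  # ~$723
```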

Spheron offers spot instances across its GPU catalog at substantially lower prices than dedicated (on-demand) rates. Spot instances are preemptible (the provider can reclaim capacity with short notice), but for the right workloads they represent the most cost-effective GPU compute available. Representative rates as of March 2026:

| GPU | Spot Price | Dedicated Price | Savings |
| --- | --- | --- | --- |
| H100 SXM5 (per GPU) | from $0.97/hr | from $2.50/hr | ~61% |
| H200 SXM5 (per GPU) | from $1.43/hr | from $4.54/hr | ~68% |
| B300 SXM6 (per GPU) | from $2.45/hr | from $8.55/hr | ~71% |
| A100 80G SXM4 (per GPU) | from $0.61/hr | from $1.14/hr | ~47% |

GPU pricing fluctuates over time based on availability and demand. On-demand (dedicated) and spot prices above are fetched from Spheron's live GPU catalog as of 12 Mar 2026. H200, B300, and A100 spot rates reflect the last observed available price; spot availability may vary. Always check current GPU pricing for live rates.

In practice, teams running interruptible workloads with proper checkpointing infrastructure achieve up to 68% effective savings even after accounting for occasional interruptions. One startup documented training a large language model on 8x H100 GPUs for $11,200 on Spheron spot instances, compared to an estimated $41,500 on dedicated compute, a 73% cost reduction. See our spot GPU training case study for the full breakdown.

Workloads That Benefit from Spot Instances

The table below summarizes the best use cases for GPU spot instances on Spheron. These workloads share a key property: they can tolerate an occasional restart without losing meaningful work, because they either checkpoint state regularly or process data in independent chunks. On Shadeform, none of these workloads get a cheaper tier; every run pays the full on-demand rate.

| Use Case | Why Spot Works | Typical Savings vs Dedicated | Checkpoint Strategy |
| --- | --- | --- | --- |
| LLM training and fine-tuning | Long-running job; save full checkpoints every 500 steps, lightweight state every 100 steps; resume in minutes | Up to 61% on H100 SXM5, 68% on H200, 71% on B300 | HuggingFace resume_from_checkpoint=True; custom SpotCheckpointCallback |
| Hyperparameter search | Each trial is independent; a lost trial means rerunning one config, not the whole sweep | Up to 71% on current-gen GPUs | No persistent state needed between trials |
| Batch inference | Embarrassingly parallel; stateless per batch; requeue failed batches | Up to 71% on B300; up to 61% on H100 SXM5 | Write output per shard; restart skips completed shards |
| Data preprocessing and ETL | Tokenization, feature extraction, dataset formatting; checkpoint at file or shard level | 47-71% depending on GPU | Track processed file list; skip on restart |
| Embedding generation at scale | Pure batch job over a fixed corpus; output accumulates per document | Up to 71% on B300 | Append embeddings to output file; resume from last doc ID |
| Model evaluation and benchmarking | Eval tasks are independent; a preemption reruns a subset, not the full suite | 47-71% | Track completed eval IDs; skip on restart |
| RLHF pipeline (SFT, reward modeling, PPO) | Each stage checkpoints independently; policy rollouts are stateless per step | Up to 71% on B300; up to 61% on H100 SXM5 | Stage-level checkpointing; resume any phase independently |
| Diffusion model and LoRA fine-tuning | Short fine-tuning runs; HuggingFace Diffusers has built-in checkpointing | Up to 71% on B300; up to 61% on H100 SXM5 | --resume_from_checkpoint in Diffusers training scripts |
| Synthetic data and annotation pipelines | Output builds per batch; restart from last written batch with no reprocessing | 47-71% | Track output shards; deduplicate on final merge |
| Ablation studies and architecture search | Dozens of independent runs; losing one ablation is trivial | Up to 71% on current-gen GPUs | Per-run result logging; no cross-run dependencies |
| CI/ML validation pipelines | Fixed validation set; auto-retry on interrupt; bounded runtime | 47-71% | Test-level tracking; skip passed tests on rerun |
| Video processing and computer vision batch jobs | Frame-level or clip-level parallelism; stateless per unit of work | Up to 71% on B300 | Track processed clip IDs; resume from last completed |
| Audio transcription and speech-to-text (Whisper, Parakeet) | Pure batch job over audio files; each file is independent; stateless per audio segment | Up to 71% on B300; up to 61% on H100 SXM5 | Track transcribed file IDs; skip completed on restart |
| RAG index building and vector embedding ingestion | Chunking and embedding large corpora (documents, codebases, support tickets) for retrieval pipelines; embarrassingly parallel per document chunk | Up to 71% on B300 | Track indexed chunk IDs; append to vector store incrementally |
| Protein structure prediction and scientific compute | AlphaFold, RoseTTAFold, and molecular dynamics simulations over large compound libraries; each structure prediction is independent | Up to 71% on B300; up to 61% on H100 SXM5 | Track completed structure IDs; resume from last completed prediction |
| Model distillation | Training smaller student models from a larger teacher is a fully checkpointable training run; each distillation step is independent and recoverable | Up to 61% on H100 SXM5; up to 71% on B300 | Checkpoint every 500 steps; resume from latest checkpoint |
| Continuous pre-training and domain adaptation | Extending a foundation model on domain-specific corpora (medical, legal, code, finance) proceeds step-by-step with full recoverability | Up to 61% on H100 SXM5; up to 71% on B300 | Standard HuggingFace or DeepSpeed checkpoint every 500 steps |
| Multi-modal fine-tuning | CLIP, LLaVA, and vision-language model fine-tuning on custom image-text datasets proceeds in discrete steps with built-in HuggingFace checkpointing | Up to 71% on B300; up to 61% on H100 SXM5 | save_steps in Trainer config; resume from latest checkpoint |
| Video generation model training | Training video diffusion models (Sora-class architectures) and video understanding models is compute-intensive and structurally identical to image diffusion training; each step is independent and recoverable | Up to 71% on B300; up to 61% on H100 SXM5 | save_steps in training config; checkpoint to persistent storage every 200-500 steps |
| LLM-as-judge evaluation pipelines | Using LLMs to score, rank, or evaluate other models' outputs across fixed test sets (MT-Bench, Arena-style evaluation, red-teaming) is batch inference; each call is independent and output accumulates per test item | Up to 71% on B300; up to 61% on H100 SXM5 | Track completed eval IDs; append results to output file; resume from last scored item |
| Background agentic workflows | Scheduled agent pipelines running multi-step reasoning over documents, contracts, or code without a live user waiting (nightly data analysis, automated report generation, document review agents); externalized task state lets an interruption resume mid-task | Up to 71% on B300; up to 61% on H100 SXM5 | Persist agent state to external database or object storage between steps; restart picks up from last persisted state |
| RLAIF synthetic data generation | Generating large-scale RLHF or RLAIF training data with teacher models (preference pairs, reasoning traces, tool-use demonstrations) is embarrassingly parallel per example | Up to 71% on B300; up to 61% on H100 SXM5 | Track generated example IDs; append to dataset incrementally; resume from last written example |

Model training and fine-tuning: The majority of a training run can execute on spot with aggressive checkpointing. Save full model state every 500 steps and a lightweight resume checkpoint (optimizer state, step counter, RNG state) every 100 steps to persistent network storage. When a spot instance is reclaimed, a new instance auto-provisions and resumes from the last checkpoint. This pattern cuts training costs by up to 61% on H100 SXM5 (from $0.97/hr spot), up to 68% on H200 SXM5 (from $1.43/hr spot), and up to 71% on B300 (from $2.45/hr spot), with no impact on final model quality.
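A minimal sketch of this checkpoint-and-resume pattern using the HuggingFace Trainer; the model, dataset, and storage path are placeholders, and exact arguments will vary with your training setup:

```python
# Checkpoint-and-resume loop for spot training, assuming a HuggingFace
# Trainer and an output_dir on persistent network storage.
# `model` and `train_dataset` are placeholders for your own objects.
from transformers import Trainer, TrainingArguments
from transformers.trainer_utils import get_last_checkpoint

args = TrainingArguments(
    output_dir="/mnt/persistent/checkpoints",  # survives spot preemption
    save_strategy="steps",
    save_steps=500,          # full model + optimizer state every 500 steps
    save_total_limit=2,      # keep only the two most recent checkpoints
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)

# On a fresh instance, resume from the last checkpoint if one exists;
# get_last_checkpoint returns None when the directory is empty,
# in which case training starts from step 0.
last_ckpt = get_last_checkpoint(args.output_dir)
trainer.train(resume_from_checkpoint=last_ckpt)
```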

Hyperparameter search and experimentation: Each experimental run is independent and short-lived. If a spot instance is interrupted mid-run, you lose one configuration result, not your entire experiment. Running a grid search or Bayesian optimization sweep on spot vs. dedicated cuts experimentation costs significantly depending on GPU model. Since experimentation typically represents 20-40% of total training compute, the savings compound significantly.
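A sketch of a preemption-tolerant sweep, assuming a hypothetical run_trial function and an illustrative grid: each finished trial writes its result to persistent storage, so a restarted instance skips configs that already completed.

```python
# Resume-safe hyperparameter sweep: one result file per config means a
# preemption only costs the in-flight trial. `run_trial` is a placeholder.
import itertools, json, pathlib

RESULTS = pathlib.Path("/mnt/persistent/sweep")
RESULTS.mkdir(parents=True, exist_ok=True)

grid = itertools.product([1e-4, 3e-4], [16, 32], [0.0, 0.1])
for lr, batch, dropout in grid:
    out = RESULTS / f"lr{lr}_bs{batch}_do{dropout}.json"
    if out.exists():        # finished before the last preemption; skip
        continue
    metric = run_trial(lr=lr, batch_size=batch, dropout=dropout)
    out.write_text(json.dumps({"val_loss": metric}))
```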

Batch inference: If you're processing a dataset offline (generating embeddings, running evaluations, transcribing audio, classifying images), spot instances work well. Jobs are parallelizable, stateless, and can be requeued if interrupted. Serving real-time traffic is different and belongs on dedicated instances, but offline batch jobs are a natural fit for spot.
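A minimal sketch of shard-level resume for offline batch jobs, with load_shard and run_model as placeholders for your own pipeline (run_model is assumed to return a pandas DataFrame):

```python
# Each input shard writes its own output file; after a preemption the
# loop skips shards whose output already exists and requeues the rest.
import pathlib

IN_DIR = pathlib.Path("/mnt/persistent/shards")
OUT_DIR = pathlib.Path("/mnt/persistent/outputs")
OUT_DIR.mkdir(parents=True, exist_ok=True)

for shard in sorted(IN_DIR.glob("*.parquet")):
    out = OUT_DIR / f"{shard.stem}.done.parquet"
    if out.exists():
        continue                        # completed before the interruption
    predictions = run_model(load_shard(shard))   # placeholders
    predictions.to_parquet(out)         # writing the file marks it complete
```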

Data preprocessing pipelines: Tokenization runs, dataset formatting, feature extraction, and ETL pipelines are tolerant of restarts because they can checkpoint progress at the file or batch level. Running preprocessing on spot GPU instances at a fraction of dedicated prices saves significant budget for teams working with large datasets.

Model evaluation and benchmarking: Running your model through an eval suite (MMLU, HumanEval, domain-specific benchmarks, red-teaming runs) is embarrassingly parallel and naturally interruptible. Each evaluation task is independent; a spot interruption means re-running a subset of tasks, not losing the entire run. Spot is almost always the right choice here.

RLHF and reinforcement learning training: Reward model training and RL policy optimization runs operate over fixed datasets or simulator interactions and save progress through standard checkpoints. Each stage of an RLHF pipeline (supervised fine-tuning, reward modeling, PPO optimization) can be checkpointed and resumed independently. Teams building instruction-following or preference-aligned models routinely run RLHF stages on spot capacity, saving up to 61% on H100 SXM5 (from $0.97/hr spot) or up to 71% on B300 (from $2.45/hr spot).

Diffusion model and generative AI training: Text-to-image and video generation model training (Stable Diffusion variants, Flux, and similar architectures) is computationally intensive and benefits enormously from spot pricing. Standard training libraries like HuggingFace Diffusers have checkpointing built into the training loop. Community-trained diffusion models have historically been trained on preemptible clusters. Custom fine-tunes and LoRA adapters for image generation are short enough that even a single restart has minimal impact on total cost.

Synthetic data generation and annotation pipelines: Running large models to generate synthetic training data, perform data augmentation, label datasets at scale, or produce embeddings across a large corpus is a pure batch workload. Output accumulates incrementally per batch or shard, so jobs restart from the last written output without reprocessing earlier work. As teams build custom training datasets to improve domain-specific models, synthetic data generation often becomes a major compute line item, and spot instances cut that cost significantly depending on GPU model (up to 61% on H100 SXM5, up to 71% on B300).

Ablation studies and architecture search: Research teams running ablations (testing the effect of removing a component, changing a hyperparameter, or comparing architectural choices) execute dozens to hundreds of independent training runs. Each run is a separate experiment with no dependencies on other runs. A preemption during one ablation means restarting that single experiment, not the entire study. Running ablations on spot vs. dedicated cuts research compute costs substantially, with savings reaching up to 61% on H100 SXM5 and up to 71% on B300.

Continuous integration and model validation pipelines: Teams shipping ML-powered features run automated test suites that verify model quality, check for regression on held-out datasets, and validate output distributions before production deployment. These CI jobs run against a fixed validation set, complete in bounded time, and are retried automatically if interrupted. They are a natural fit for spot instances, and many teams run their entire model validation pipeline on preemptible compute, cutting CI infrastructure costs significantly compared to always-on dedicated hardware.

Audio transcription and speech-to-text at scale: Teams building voice products, transcribing podcasts, generating subtitles, or processing call center recordings with models like Whisper or Parakeet are running a pure batch workload over independent audio files. Each file is a separate unit of work with no dependency on other files in the queue. A spot interruption means requeuing a handful of files, not losing the entire job. Running audio transcription pipelines on spot GPU instances at a fraction of dedicated rates is one of the most straightforward spot use cases because the workload is naturally stateless and granular. On Shadeform, the same transcription infrastructure runs at full on-demand rates with no cheaper tier available.

RAG pipeline index building and vector embedding ingestion: Teams building retrieval-augmented generation (RAG) systems need to embed large corpora, whether internal documentation, support ticket histories, legal filings, or codebases, into vector stores before queries can be served. This embedding step is embarrassingly parallel: each document chunk is processed independently, and results append to the vector store incrementally. A spot interruption mid-batch means picking up from the last completed chunk, with completed embeddings already written. For teams reindexing large knowledge bases (millions of documents), running the ingestion pipeline on spot instances cuts the compute cost by up to 61% on H100 SXM5 ($0.97/hr spot vs $2.50/hr dedicated). Shadeform has no spot option for this workload; every embedding run pays full on-demand pricing.
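A sketch of resumable ingestion under these assumptions; iter_document_chunks, embed, and vector_store stand in for your chunker, embedding model, and vector database client:

```python
# Track indexed chunk IDs in a durable log so a restarted job resumes
# from the last completed chunk instead of re-embedding the corpus.
import pathlib

done_file = pathlib.Path("/mnt/persistent/indexed_ids.txt")
done = set(done_file.read_text().split()) if done_file.exists() else set()

with done_file.open("a") as log:
    for chunk_id, text in iter_document_chunks(corpus_dir):  # placeholder
        if chunk_id in done:
            continue
        vector_store.add(chunk_id, embed(text))  # append incrementally
        log.write(chunk_id + "\n")               # durable progress marker
```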

Protein structure prediction and scientific computing: Teams working on drug discovery, materials science, or bioinformatics run models like AlphaFold or RoseTTAFold over large compound libraries. Each structure prediction is independent, taking seconds to minutes per sequence, making the workload naturally resumable at the individual structure level. Molecular dynamics simulations over fixed trajectories also checkpoint state at regular intervals. Spot instances are widely used for high-throughput screening workloads in computational biology precisely because the cost difference at scale is enormous: running AlphaFold structure prediction over 100,000 protein sequences on Spheron H100 SXM5 spot at $0.97/hr per GPU versus $2.26/hr on-demand through Shadeform means roughly 57% lower compute costs for the same scientific output.

Model distillation: Knowledge distillation (training a smaller student model to mimic a larger teacher model) follows the same checkpoint-and-resume pattern as any supervised training run. Each forward pass through the teacher is deterministic, and the student's training state is fully captured in standard HuggingFace or DeepSpeed checkpoints. Teams compressing 70B models into smaller 7B or 13B deployable versions can run the entire distillation pipeline on spot capacity at up to 61% below dedicated rates on H100 SXM5. A distillation run that would cost $8,000 on dedicated H100s costs approximately $3,100 on Spheron H100 SXM5 spot ($0.97/hr vs $2.50/hr dedicated); on Shadeform, no spot option exists and the full $8,000 on-demand rate applies.

Continuous pre-training and domain adaptation: Extending a foundation model on domain-specific text corpora (medical literature, legal filings, financial reports, code repositories) is compute-intensive and fully checkpointable. The training loop is identical to standard pre-training with standard checkpoint intervals. Teams building vertical AI products (healthcare, legal tech, fintech) typically run multiple domain adaptation experiments before production fine-tuning. Running this phase on spot instances at up to 61% below dedicated rates on H100 SXM5 (from $0.97/hr) or up to 71% on B300 (from $2.45/hr) saves significant budget at a stage where teams are still iterating on corpus curation and training configuration.

Multi-modal fine-tuning: Fine-tuning vision-language models like CLIP, LLaVA, or InternVL on custom image-text datasets uses the same checkpoint mechanism as standard LLM training. Each training step processes image-text pairs independently, making the workload naturally interruptible. Teams building domain-specific visual search, image classification, or visual QA systems can run fine-tuning passes on spot GPU instances at a fraction of dedicated rates. With HuggingFace's save_steps parameter, a spot interruption means at most 500 steps of recomputation, typically 3-10 minutes of wall-clock time on modern hardware. On Shadeform, the same fine-tuning run pays full on-demand rates with no spot option available.

Video generation model training: Training video diffusion models (Sora-class architectures, video VAEs, video understanding models) is one of the most GPU-intensive workloads in AI and is structurally identical to image diffusion training in terms of checkpointing. Each training step is independent and recoverable via standard checkpoint intervals. As video generation becomes a core product capability for media, advertising, and entertainment teams, the compute costs compound quickly. Running video model training on spot instances at up to 61% below dedicated rates on H100 SXM5 (from $0.97/hr) translates directly into faster iteration cycles and lower total training cost. Shadeform has no spot tier; teams training video models on Shadeform pay full on-demand rates for every GPU-hour regardless of workload type.

LLM-as-judge evaluation pipelines: Using large language models to score, rank, or evaluate other models' outputs at scale (MT-Bench-style evaluation, Arena-style pairwise comparison, red-teaming runs, benchmark evaluation across hundreds of prompts) is a batch inference workload. Each evaluation call processes a fixed prompt independently, outputs a score or ranking, and appends the result to an output dataset. A spot interruption mid-batch means requeuing a subset of prompts, not losing the entire evaluation run. As AI teams run larger and more frequent model evaluations to track quality improvements and catch regressions, the compute cost of evaluation pipelines becomes a meaningful line item. Spot instances cut evaluation pipeline costs by roughly 57% on H100 SXM5 ($0.97/hr spot vs Shadeform's $2.26/hr on-demand) under Shadeform's all-on-demand model.

Background agentic workflows: Not all AI agents serve interactive users in real time. A growing category of AI agent workloads runs in the background: scheduled document review agents, nightly financial analysis pipelines, automated code review across large codebases, and multi-step reasoning over research corpora. These background agents process tasks asynchronously without a live user waiting, making them compatible with spot instances when agent state is persisted externally. By saving task state to a database or object storage between steps, a spot interruption causes the agent to resume mid-task on a new instance rather than restarting from scratch. Teams building automation products with agentic pipelines can run background agent workloads on spot capacity at up to 61% below dedicated rates on H100 SXM5 (from $0.97/hr). On Shadeform, every agent workload, interactive or background, runs at full on-demand rates.
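A sketch of externalized agent state under these assumptions; the step functions, pipeline structure, and state path are illustrative placeholders:

```python
# Persist agent progress after every step so a preempted background agent
# resumes mid-task on a new instance instead of restarting from scratch.
import json, pathlib

STATE = pathlib.Path("/mnt/persistent/agent_state.json")
STEPS = [("gather", gather_documents),   # placeholder step functions
         ("analyze", analyze_findings),
         ("report", write_report)]

state = (json.loads(STATE.read_text()) if STATE.exists()
         else {"next_step": 0, "partial": {}})

for i in range(state["next_step"], len(STEPS)):
    name, step_fn = STEPS[i]
    state["partial"][name] = step_fn(state["partial"])  # step reads prior outputs
    state["next_step"] = i + 1
    STATE.write_text(json.dumps(state))  # durable after every step
```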

RLAIF synthetic data generation: Teams building instruction-following, preference-aligned, or reasoning models increasingly generate large-scale synthetic training data using teacher models to produce preference pairs, reasoning traces, and tool-use demonstrations. Generating 1 million preference pairs using a large teacher model is embarrassingly parallel at the individual example level: each example is generated independently, and results accumulate in the output dataset. A spot interruption mid-generation means picking up from the last written example, with completed examples already saved. As RLAIF pipelines become standard practice for model alignment and capability improvement, the compute cost of synthetic data generation scales with dataset size. Running generation pipelines on Spheron H100 SXM5 spot at $0.97/hr instead of Shadeform's all-on-demand H100 SXM5 pricing (from $2.26/hr through Verda) reduces the per-example generation cost by approximately 57%, directly lowering the total compute budget for building alignment training datasets.

Real-World Cost Impact at Scale

The compounding effect of spot pricing is significant at the team level. A typical ML team spending $150,000-$300,000 per year on GPU compute will have roughly 70-80% of that workload running training, experimentation, preprocessing, and batch inference, categories that are fully compatible with spot instances. Spot savings reach up to 61% on H100 SXM5 and up to 71% on B300, depending on which GPU powers your training workloads.

| Team Annual GPU Budget | Spot-Eligible Share | Annual Cost on Spheron (Spot + Dedicated) | Annual Cost on Shadeform (All On-Demand) | Annual Savings with Spheron |
| --- | --- | --- | --- | --- |
| $100K | 75% ($75K) | $75K × 39% + $25K ≈ $54K | $100K | ~$46K saved (46% reduction) |
| $200K | 75% ($150K) | $150K × 39% + $50K ≈ $109K | $200K | ~$91K saved (46% reduction) |
| $500K | 70% ($350K) | $350K × 39% + $150K ≈ $287K | $500K | ~$213K saved (43% reduction) |

Spot-eligible work runs at up to 61% below dedicated on H100 SXM5 (paying 39% of on-demand cost at $0.97/hr spot vs $2.50/hr dedicated). Serving, always-on monitoring, and production inference remain on dedicated. Actual savings depend on GPU model mix and workload interruption tolerance. Prices as of 12 Mar 2026. Always check current GPU pricing for live rates.
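To make the table's arithmetic explicit, here is a small sketch of the blended-cost model it uses: spot-eligible spend pays ~39% of the dedicated rate (the H100 SXM5 ratio of $0.97/$2.50), and the rest stays on dedicated.

```python
def blended_annual_cost(budget: float, spot_eligible_share: float,
                        spot_price_ratio: float = 0.39) -> float:
    """Annual GPU cost when eligible work runs on spot at
    spot_price_ratio of the dedicated rate; the rest pays full price."""
    eligible = budget * spot_eligible_share
    return eligible * spot_price_ratio + (budget - eligible)

for budget, share in [(100_000, 0.75), (200_000, 0.75), (500_000, 0.70)]:
    cost = blended_annual_cost(budget, share)
    print(f"${budget:,}: ~${cost:,.0f} on Spheron vs ${budget:,} all on-demand "
          f"(~${budget - cost:,.0f} saved)")
```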

Teams that cannot access spot instances pay dedicated rates for all workloads, including training runs and batch jobs that could safely run on preemptible capacity. Over a year, the difference between a Spheron workflow (spot for training, dedicated for serving) and an all-dedicated workflow on Shadeform can represent tens to hundreds of thousands of dollars depending on compute scale.

When to Use Dedicated Instead

Production inference serving real-time user traffic requires Dedicated instances. You cannot have your API endpoint go down mid-request because a spot instance was reclaimed. Database workloads, always-on monitoring, real-time serving infrastructure, and anything with uptime SLAs belong on Dedicated.

The practical pattern most cost-conscious teams use: run training experiments and preprocessing on Spot, serve production traffic on Dedicated. This hybrid approach consistently delivers significant GPU cost reductions without compromising production reliability.

Browse Spheron's spot GPU pricing to see current spot rates across all available GPU models.

Who Should Choose Shadeform

Shadeform is the right choice for:

  • Engineering teams that need a unified API to programmatically provision GPUs across many different cloud providers with minimal setup overhead
  • Teams building internal platforms or infrastructure tools that embed GPU procurement into larger automated workflows
  • Organizations that want to leverage existing cloud credits from specific providers (Lambda, Nebius, etc.) through Shadeform's account-linking feature
  • Teams that need access to Shadeform's specific partner ecosystem. If you have a vendor preference for Lambda or Crusoe, Shadeform is the single-pane-of-glass access point
  • Workloads that run well on VMs and don't require kernel-level or driver-level customization
  • Teams comfortable with API-first products and a code-driven provisioning workflow
  • Organizations sourcing large GPU clusters: on-demand clusters up to 64 GPUs (2-8 nodes of 8 GPUs each) with no commitment required, or larger reserved capacity via Reserved Commitments (request-based, terms from 1 week to 3 years, no maximum GPU count)
  • Teams whose compute budgets are not spot-sensitive: Shadeform has no spot or preemptible tier, so every workload (training, batch inference, experimentation, preprocessing) runs at full on-demand rates. Budget planning should account for all-on-demand pricing across the entire compute footprint

Who Should Choose Spheron

Spheron is the right choice for:

  • Teams that want both a great product dashboard AND a complete API: deploy from the browser in 5 minutes or provision programmatically via REST API
  • Workloads that need bare-metal access for custom CUDA configurations, kernel tuning, or driver-level optimization (available for select configurations like H100 PCIe)
  • Teams that want the latest Blackwell hardware (RTX 5090, B300) and Hopper/Lovelace GPUs through direct provider relationships
  • H100 SXM5 workloads where Spheron's pricing (from $0.97/hr spot) is ~57% cheaper than Shadeform's $2.26/hr on-demand
  • Teams running training, fine-tuning, batch inference, or experimentation workloads that can leverage spot instances for up to 61% cost savings on H100 SXM5 (no spot option exists on Shadeform)
  • Teams processing large audio or video corpora with models like Whisper (speech-to-text transcription at scale on spot GPU instances vs. full on-demand rates on Shadeform)
  • Teams building RAG systems that need to embed large document corpora into vector stores (embarrassingly parallel workload, ideal for spot pricing)
  • Research teams running protein structure prediction (AlphaFold), molecular dynamics, or other scientific HPC workloads over large compound libraries on spot capacity
  • Teams building video generation models or video understanding systems that need GPU-intensive training at reduced cost
  • AI teams running LLM-as-judge evaluation pipelines, benchmark suites, or red-teaming workloads at scale on spot capacity (batch inference, naturally interruptible, ~57% savings vs Shadeform on-demand)
  • Teams building agentic AI products with background automation pipelines (document review, nightly analysis, code review agents) that can tolerate spot preemption with externalized state
  • Organizations generating large-scale RLAIF or RLHF synthetic data with teacher models (embarrassingly parallel, ideal for spot; significant cost reduction vs Shadeform all-on-demand)
  • Startups and individuals who want to deploy a GPU instance quickly without needing to write API integration code first
  • Teams running multi-GPU training jobs who want to leverage spot instances for significant compute cost reductions
  • Organizations that want multi-provider redundancy with competitive pricing and spot availability across the GPU catalog

For a broader look at the GPU marketplace landscape, see our top GPU rental providers guide and our Spheron vs RunPod comparison.


Spheron is both a full product and a full API. Deploy a GPU from the dashboard in 5 minutes or provision programmatically via REST API. Spot instances available across all major GPU models.

See GPU catalog and pricing →

Build what's next.

The most cost-effective platform for building, training, and scaling machine learning models, ready when you are.