Four releases shipped across April. v1.15.0 on April 4 added Sesterce as a persistent volume provider with region picking, deploy-time attachment, and automatic compatibility filtering. v1.16.0 on April 24 launched Spheron AI volumes with hot-attach support, raised the per-instance volume limit to 10 for both Spheron AI and Verda, and fixed a long-standing bug where terminated instances could permanently lock volumes. v1.17.0 on April 27 refreshed the marketplace design, applied team volume discounts to storage, flagged NVLink-bridged Spheron AI GPUs in the marketplace, and improved the reserved GPU request form. v1.17.1 on April 28 restructured the API reference, rebuilt the referrals page, cleaned up wizard typography, and increased spot offer refresh cadence.
What Shipped in April
- Sesterce Persistent Volume Support (v1.15.0): Sesterce added as a volume provider; region picker, deployment-time attachment, automatic compatibility filtering, and new volume regions API
- Spheron AI Persistent Volumes (v1.16.0): Spheron AI volumes with Cloud-SSD backing, up to 10 volumes per instance, 40 TB max per volume, hot-attach and hot-detach, provider rules dialogs
- Verda Multi-Volume Expansion (v1.16.0): Verda raised from 1 to 10 shared volumes per instance; stale terminated instances no longer lock volumes
- Marketplace Design Refresh (v1.17.0): unified styling across billing, teams, volumes, API keys, and SSH keys, with a consolidated team switcher and improved reserved GPU contact form
- Volume Discounts on Persistent Storage (v1.17.0): active team discounts now apply automatically at volume creation and resize time; the rate locks in at provisioning and remains in effect while the discount tier is active
- NVLink GPU Flagging (v1.17.0): nvlink: true field added for Spheron AI NVLink-bridged GPU variants; visual flag in marketplace UI
- API Reference and Referrals Refresh (v1.17.1): endpoint cards grouped by category, cleaner examples, restructured referral stats, deployment wizard polish, and faster spot offer refresh
Sesterce Persistent Volume Support (v1.15.0)
The first April release added Sesterce as a supported volume provider. Previously, persistent volumes were limited to a subset of providers. v1.15.0 opens storage to Sesterce-hosted regions.
Creating a Sesterce volume follows a picker flow: you select the cloud provider and region during volume creation, and Spheron validates compatibility between your volume region and the compute region you plan to deploy to. The deployment wizard then automatically filters out incompatible volumes, so only valid options appear at attach time. Mismatched regions never make it to the provisioning step.
Volumes are attached automatically at deployment time. After instance termination, the volume persists and requires explicit deletion. If you are running batch workloads on Sesterce compute where you want data to outlive the instance, this gives you the same durable storage workflow available on other providers.
The new API endpoint for region discovery:
GET /api/volumes/regions?provider=sesterce
Returns region identifiers, cloud providers, and pricing details per region. If you are building tooling to provision Sesterce resources programmatically, this is the endpoint to hit before volume creation.
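A minimal sketch of that pre-flight call, assuming the same app.spheron.ai host used by the GPU offers endpoint later in this post (the exact response shape is not documented here, so the example only logs it):
// Discover Sesterce volume regions before creating a volume.
// Host is assumed to match the GPU offers endpoint; response shape is unverified.
const res = await fetch("https://app.spheron.ai/api/volumes/regions?provider=sesterce");
if (!res.ok) throw new Error(res.statusText);
const regions = await res.json();
// Expect region identifiers, cloud providers, and per-region pricing details.
console.log(regions);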
Spheron AI Persistent Volumes (v1.16.0)
Three weeks later, v1.16.0 delivered a larger expansion. Spheron AI is now a supported volume provider, backed by Cloud-SSD storage. The release also raised the ceiling on how many volumes an instance can have and added the ability to attach volumes to running instances.
Multi-volume support
The previous limit was one volume per instance on most providers. v1.16.0 raises this to 10 volumes per instance for Spheron AI and bumps Verda from 1 to 10 shared volumes as well.
The difference between providers matters here. Spheron AI volumes are single-attach: one volume can only be mounted to one instance at a time, but an instance can hold up to 10 distinct volumes simultaneously. Verda volumes work differently: they are multi-attach by design, so a single Verda volume can be mounted on several instances at once. With the 10-volume limit raised, a Verda instance can now pull from up to 10 shared volumes concurrently, which is useful for distributed workloads that need to read from a common dataset.
Maximum volume size on Spheron AI is 40 TB per volume.
Hot-attach and hot-detach
Before this release, volumes had to be specified before deployment. If you forgot to add a volume at deploy time, you had to terminate the instance and redeploy with the volume configured.
Hot-attach removes that constraint. From the instance sidebar, you can attach a persistent volume to an already-running instance without stopping or redeploying it. Hot-detach works the same way: you can remove a volume from a running instance from the sidebar.
Volumes survive instance termination and require explicit deletion. This is by design: the volume lifecycle is independent of the instance lifecycle. Do not rely on instance termination to clean up volumes or you will accumulate storage costs.
Provider rules dialogs
Each volume provider has different attachment timing, resize policies, and lifecycle behavior. v1.16.0 adds provider-specific rules dialogs that surface these constraints during the provisioning flow. Before you create a volume, the dialog explains what you should know about that provider's behavior. This is mostly useful for teams onboarding to a new provider who have not read the docs.
Stale deployments no longer lock volumes
Before this fix, if an instance terminated without a clean detach (a crash, a forced stop, or a failed deployment), single-attach volumes attached to that instance would remain locked indefinitely. There was no way to reclaim them without a support ticket.
v1.16.0 fixes this. When evaluating whether a volume is already in use, Spheron now skips terminated, failed, stopped, and deleted deployments when checking occupancy. If your volume was stuck because of a dead instance, it is now immediately available for reattachment to a new instance. This applies to both Spheron AI and Verda volumes.
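In rough terms, the occupancy check now behaves like the sketch below. This is an illustration of the described behavior, not Spheron's actual implementation; the status names and attachment shape are assumptions.
// Simplified illustration of the new occupancy check, not actual platform code.
// A volume only counts as in use if it is attached to a live deployment.
const DEAD_STATES = new Set(["terminated", "failed", "stopped", "deleted"]);

function isVolumeInUse(attachments) {
  return attachments.some(a => !DEAD_STATES.has(a.deploymentStatus));
}

// A volume whose only attachment points at a crashed instance is free again:
console.log(isVolumeInUse([{ deploymentStatus: "failed" }])); // false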
Wizard UX improvements
The create-volume wizard now checks whether the selected provider and region actually have live GPU offers before you finalize. If the region has no active GPU supply to attach to, a warning appears in the order summary so you can pick a different region before committing.
The wizard also auto-selects the first region that has live GPU offers rather than defaulting to the first region alphabetically. Regions with no available GPU supply are labeled clearly so you can see the options at a glance.
After a Spheron AI volume is created, a confirmation modal appears summarizing the volume rules with a one-click shortcut to go directly into the deployment wizard with the new volume pre-selected.
The API gained three new endpoints alongside this release: volume region discovery per provider, pricing by provider, and deployment volume pre-attachment.
Marketplace Design Refresh (v1.17.0)
The marketplace had accumulated visual inconsistencies across sections built at different times. Billing pages, team management, volume settings, API key management, and SSH key sections each had slightly different card styles, spacing, and button treatments. v1.17.0 brings them to a single visual baseline.
The most visible change is the team switcher. Previously it appeared in different locations depending on which section of the app you were in. It is now consolidated in one fixed location in the navigation, so switching between team contexts does not require hunting for the control.
The billing page saw the biggest surface area of changes: balance display, deposit history, usage analytics, the discount overview, and the add-credits flow were all rebuilt with clearer hierarchy and tighter spacing.
The underlying component library got aligned on a consistent card border treatment and typography scale across all these sections. If you have been using the Spheron GPU marketplace for a while, the pages look noticeably cleaner, though nothing moved significantly enough to break any existing workflow.
Reserved GPU contact form
The reserved GPU request form got a useful set of additional fields in v1.17.0. It now captures a phone number with country code, preferred GPU model, and requested GPU quantity alongside the existing fields. There is also an optional marketing consent checkbox. The extra detail helps the Spheron team route and respond to inquiries faster, since the GPU model and quantity are no longer free-text in a general notes field.
Volume Discounts on Persistent Storage
This one has been requested for a while. Teams with negotiated volume discounts could already see those discounts applied to on-demand GPU pricing. Persistent storage was excluded, so for teams doing heavy data work, the effective discount on total spend fell short of the contracted rate.
Starting with v1.17.0, when you create or resize a persistent volume and your team has an active discount tier, the discounted hourly rate applies automatically. No coupon code, no support ticket. The discounted rate applies at provisioning time and remains in effect as long as the discount tier is active.
To make this concrete: the standard persistent volume rate on Spheron is roughly $0.00016/GB/hr. At a 15% discount tier, that becomes approximately $0.000136/GB/hr. For a team storing 10TB of training data:
| Storage | Standard rate | 15% discount rate | Monthly delta |
|---|---|---|---|
| 10 TB persistent volume | $0.00016/GB/hr | $0.000136/GB/hr | ~$172.80 saved/month |
| 50 TB persistent volume | $0.00016/GB/hr | $0.000136/GB/hr | ~$864.00 saved/month |
The math compounds quickly at scale. If you are running long pretraining or CPT jobs where checkpoint volumes accumulate over weeks, check your current volume spend in billing.
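If you want to sanity-check those numbers, the arithmetic behind the table is straightforward. The sketch below assumes 720 billable hours per month and a decimal 1 TB = 1,000 GB conversion, which is how the figures above work out:
// Rough monthly savings estimate: rate * GB * hours * discount.
// Assumes 720 billable hours/month and 1 TB = 1,000 GB (decimal).
const STANDARD_RATE = 0.00016; // $/GB/hr
const HOURS_PER_MONTH = 720;

function monthlySavings(tb, discount) {
  const gb = tb * 1000;
  return STANDARD_RATE * gb * HOURS_PER_MONTH * discount;
}

console.log(monthlySavings(10, 0.15).toFixed(2)); // "172.80"
console.log(monthlySavings(50, 0.15).toFixed(2)); // "864.00"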
The "What's Next" section below covers persistent volume multi-attach, which will change how teams share volumes across compute nodes.
NVLink GPU Flagging in the Marketplace
Spheron AI GPU offers now include an nvlink: true flag in the extras object for models that are NVLink-bridged variants, such as the H100 NVL and A100 NVL. These are distinct from the standard H100 SXM5 or PCIe variants: the NVL form factor uses NVLink to connect GPUs within a node rather than PCIe. For single-GPU workloads this does not matter. For multi-GPU inference and training, it is the difference between 900 GB/s intra-node bandwidth (H100/H200 NVLink 4) and the ~64 GB/s ceiling of PCIe 5.0 x16.
Here is how to filter for NVLink offers programmatically:
const res = await fetch("https://app.spheron.ai/api/gpu-offers");
if (!res.ok) throw new Error(res.statusText);
const { data } = await res.json();
if (!Array.isArray(data)) throw new Error("Unexpected response shape");
// Filter for NVLink-connected GPU bundles
const nvlinkOffers = data.flatMap(gpu =>
(gpu.offers ?? []).filter(offer => offer.extras?.nvlink === true)
);
The same flag is surfaced in the marketplace UI. NVLink-connected offers are visually marked on the GPU offer cards so you can identify them without reading the API response manually.
Why this matters: collective communication operations in distributed training (all-reduce, all-gather, reduce-scatter) consume a significant fraction of step time. On PCIe-only nodes, NCCL falls back to host-memory-routed transfers. On NVLink nodes, intra-node transfers happen at full NVSwitch bandwidth. For a 70B model on 8 GPUs, the difference can shift all-reduce from 20-30% of step time down to 5-8%. The NCCL tuning guide for multi-GPU LLM training covers the specific environment variables to set once you have confirmed NVLink connectivity.
For inference disaggregation, NVLink-connected prefill nodes are better suited for tensor-parallel prefill because the all-reduce between TP shards stays within the NVSwitch fabric at full bandwidth. The prefill-decode disaggregation guide has the hardware planning details.
For a breakdown of NVLink form factors by GPU generation, the H100 NVL vs SXM5 vs PCIe form factor guide explains what each variant offers.
Current pricing for NVLink-capable GPU models on Spheron:
| GPU | On-demand (per GPU/hr) | Spot (per GPU/hr) |
|---|---|---|
| H100 SXM5 | $3.70 | $1.66 |
| H200 SXM5 | $4.36 | $1.76 |
| B200 SXM6 | $6.62 | $3.50 |
Pricing fluctuates based on GPU availability. The prices above are as of 12 May 2026 and may have changed. Check current GPU pricing → for live rates.
To deploy a multi-GPU NVLink workload, start with H100 GPU rental on Spheron for the best price-to-performance ratio on SXM5 hardware, or H200 GPU rental if your workload is memory-bandwidth bound.
API Reference and Referrals Refresh (v1.17.1)
The API reference at docs.spheron.ai/api-reference got a layout pass in v1.17.1. Previously, endpoints were listed in a flat order that mixed authentication, resource management, and deployment endpoints without clear grouping. They are now organized into categories, so related endpoints appear together. If you are onboarding onto the API for the first time, the grouped layout is significantly easier to scan.
Code examples were also cleaned up. Several endpoint examples had verbose headers that obscured the key fields; those are trimmed down to the relevant parameters.
The referrals section saw a parallel update. Referral stats (clicks, conversions, payouts) are now in a cleaner table format, and reward callouts are displayed more prominently. If you run a community or course where you share Spheron links, the updated stats view makes it easier to track what is converting.
v1.17.1 also brought two quality-of-life improvements to the deployment experience. The deployment wizard got a typography and spacing cleanup: labels now have consistent font weights and the step indicator has proper padding throughout the flow. Spot offer cadence is faster too. The Spheron AI pricing engine now refreshes spot offers more frequently, so the prices you see in the marketplace are closer to real-time availability. If you were seeing stale spot prices and coming back to find the offer gone or repriced, this update reduces that gap. See the spot instances feature page for how spot pricing works.
What This Means for Builders
Persistent storage across more providers. The April releases expanded volume support beyond a narrow set of providers, adding Sesterce and Spheron AI (Cloud-SSD). If you were blocked on storage because your preferred compute provider did not support volumes, check the current provider list.
Multi-volume instances and hot-attach. The 10-volume limit per instance and hot-attach from v1.16.0 change how you can structure data pipelines. You no longer need to pack all data into a single volume or redeploy to add storage mid-run. Dataset volumes, checkpoint volumes, and output volumes can each be managed independently on the same instance. Verda teams benefit as well: the same 10-volume ceiling now applies, and those volumes are multi-attach, so multiple compute nodes can share a single dataset volume without copying data.
Volumes no longer get stuck. The stale-deployment fix in v1.16.0 means volumes from crashed or terminated instances are immediately reclaimable. If you previously had volumes that appeared in-use but were not, they can now be attached to new instances without a support ticket.
Lower effective storage spend. If your team is on a volume discount tier and you have been running persistent volumes, check your billing for May onward. The discount now applies at provisioning time, so volumes created or resized after April 27 are already priced correctly.
Explicit NVLink selection. Before v1.17.0, confirming whether a Spheron AI offer was an NVLink-bridged variant required asking support or cross-referencing provider specs. The nvlink flag in the extras object removes that ambiguity. For any workload where intra-node bandwidth determines throughput, you can now filter offers programmatically and pick NVLink variants (H100 NVL, A100 NVL) without guessing from model names alone.
Faster API onboarding. The grouped API reference is a direct reduction in time-to-first-call for new teams. Finding the right endpoint in a flat 80-endpoint list takes longer than it should. Category grouping plus cleaner examples cuts the "where do I even start" friction for engineers who are new to the platform.
How to Take Advantage Today
The NVLink field is live in the GPU offers endpoint:
const res = await fetch("https://app.spheron.ai/api/gpu-offers");
if (!res.ok) throw new Error(res.statusText);
const { data } = await res.json();
if (!Array.isArray(data)) throw new Error("Unexpected response shape");
// Filter for NVLink-connected GPU bundles
const nvlinkOffers = data.flatMap(gpu =>
(gpu.offers ?? []).filter(offer => offer.extras?.nvlink === true)
);
console.log(`Found ${nvlinkOffers.length} NVLink-connected offers`);
For Spheron AI volumes, the region and pricing API is live:
GET /api/volumes/regions?provider=spheron-ai
GET /api/volumes/regions?provider=sesterce
Volume discounts apply automatically. If your team is not on a discount tier and you are spending more than $5,000/month on compute, reach out to the Spheron team for enterprise terms.
For current GPU and storage rates, see Spheron pricing. For API integration details, see the updated API reference.
What's Next
Three features are on the roadmap for May and beyond:
Spheron ES provider: a new provider category currently in testing. Details when it ships.
Per-region storage pricing: persistent volume rates that reflect actual storage costs by region rather than a global blended rate. This will benefit teams co-locating compute and storage in regions where underlying storage is cheaper.
Persistent volume multi-attach: the ability to mount a single persistent volume to multiple compute nodes simultaneously. Coming in May. Useful for training runs where multiple workers need read access to the same dataset volume without copying data between nodes.
April's four releases expanded storage provider coverage, removed the single-volume limit per instance, cut storage costs for discount-tier teams, and made NVLink selection explicit in the API. If you have not set up your team discount or explored multi-volume instances, now is the right time.
Explore Spheron GPU pricing → | Rent H100 → | Launch on Spheron →
