Case Study

Should You Rent or Buy GPUs? The 3-Year TCO Math for AI Training

Rent vs Buy GPUs · GPU TCO · Cost Optimization · Total Cost of Ownership · GPU Infrastructure · Capex vs Opex · GPU Depreciation · On-Premise vs Cloud GPU
For most AI teams, the answer is: rent. At the utilization most teams actually reach, the 3-year math comes out 50 to 70 percent cheaper than owning the same hardware once you count power, cooling, staff, and depreciation. This post walks the numbers in both directions so you can decide for your own workload. If you already know you want to rent GPUs and just need live rates, the Spheron catalog has current pricing on H100, H200, A100, B200, RTX 5090, and more.

The math has shifted hard over the last two years. A four-GPU A100 box that runs about $246,000 over three years on-premises now costs around $122,000 on cloud, and that gap widens every quarter as neocloud pricing gets sharper. The same shift that put H100s in the hands of two-person startups also made the rent-or-buy decision more interesting for the teams that used to own by default: research labs, mid-size AI companies, anyone with steady training load.

The framing below is straightforward. First, the real cost of buying. Then the real cost of renting. Then a side-by-side on an 8x H100 cluster. Then the cases where buying still wins, the cases where renting wins, and a short decision framework. If you only need a number for a specific training run first, estimate the GPU cost to train your model and weigh that against the cost of owning.

What Buying GPUs Actually Costs Over 3 Years

The sticker price of the GPUs is the small part. A retail H100 lists around $30,000. An 8-GPU server with NVLink runs roughly $250,000 before you plug it in. From there, four cost categories pile on.

Supporting hardware

You don't run a $30,000 GPU on a desktop motherboard. An eight-GPU node needs an enterprise chassis, two high-PCIe-lane CPUs, 1-2 TB of system RAM, NVMe storage for checkpoints, and a 400 Gb/s NIC if you ever want to scale to multiple nodes. That's another $40,000 to $80,000 on top of the GPUs. Add InfiniBand switches for a multi-node setup and you're past $100,000 in supporting gear.

Power and cooling

Eight H100s pull around 5.6 kW. Add CPUs, NVMe, and switching and a real node draws 10 kW under sustained load. At U.S. commercial electricity rates ($0.10-$0.18/kWh) that's $700 to $1,300 a month per node, every month, before cooling. Liquid cooling for high-density racks adds capex and ongoing maintenance, easily another 20-30 percent of the power bill.
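The monthly power range above can be reproduced with a few lines of Python. This is a minimal sketch: the 730-hour month and the function name `monthly_power_cost` are assumptions made for illustration, and cooling is left out, as in the paragraph.

```python
# Assumption: 730 hours per month (8,760 hours per year / 12).
HOURS_PER_MONTH = 730


def monthly_power_cost(node_kw: float, usd_per_kwh: float) -> float:
    """Electricity cost in USD for one node drawing node_kw continuously for a month."""
    return node_kw * HOURS_PER_MONTH * usd_per_kwh


# A 10 kW node at the low and high ends of the quoted commercial rate range.
low = monthly_power_cost(10, 0.10)
high = monthly_power_cost(10, 0.18)
print(f"${low:,.0f} to ${high:,.0f} per month before cooling")
```

Swap in your own measured draw and utility rate; the quoted range assumes the node sits at sustained load all month.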

Facility and networking

You need a climate-controlled space, redundant power (UPS plus generator), redundant networking, and physical security. If you have a server room already, the marginal cost is real but bounded. If you don't, colocation runs $1,500 to $3,000 per rack per month for power-dense GPU configurations.

People

Someone keeps the rack alive: firmware updates, driver bumps, failed PSUs, NVLink errors that show up at 2 a.m. Budget $500-$1,500 a month minimum for that share of a sysadmin's time, more if you don't already have an ops team. Lose a GPU under warranty and you still wait days for a replacement.

The 3-year on-premise bill (8x H100)

A reasonable build, run for three years:

| Line item | 3-year cost |
| --- | --- |
| 8x H100 SXM5 + server | $247,766 |
| Networking, racks, cooling capex | $42,624 |
| Power and cooling (3 yr) | $108,000 |
| Personnel share | $36,000 |
| Total | $434,390 |

That's $144,797 per year, or about $2.07/GPU/hr if you actually hit 24/7 utilization. Most teams don't get close to 24/7 in year one or two.
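As a quick check on those figures, the sketch below sums the four line items and converts the total to an annual and a per-GPU-hour cost. The variable names are illustrative, and the per-hour figure assumes every GPU is busy for all 8,760 hours of the year.

```python
# Line items from the 3-year on-premise bill (USD).
line_items = {
    "8x H100 SXM5 + server": 247_766,
    "Networking, racks, cooling capex": 42_624,
    "Power and cooling (3 yr)": 108_000,
    "Personnel share": 36_000,
}

GPUS = 8
YEARS = 3
HOURS_PER_YEAR = 8_760  # 24/7 utilization, the best case for ownership

total = sum(line_items.values())
annual = total / YEARS
per_gpu_hour = annual / (GPUS * HOURS_PER_YEAR)

print(f"3-year total: ${total:,}")
print(f"Per year:     ${annual:,.0f}")
print(f"Per GPU-hour: ${per_gpu_hour:.2f}")
```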

What Renting GPUs Actually Costs Per Year

Renting collapses all four cost categories above into a single per-hour rate. On Spheron, an H100 SXM5 runs $3.90/hr on-demand and $0.80/hr on spot. If you rent eight H100s 24/7 for one year at on-demand rates, the bill is roughly $273,312. If you can run spot for half of that time, the bill drops to about $164,688 per year. For teams that can run spot 24/7, the three-year cloud total comes to $168,192, well below the $434,390 on-prem total.
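The rental arithmetic is a single multiplication once you fix the on-demand and spot mix. The sketch below is one way to compute it from the quoted hourly rates and an 8,760-hour year; `annual_rental_cost` and `spot_fraction` are hypothetical names, and the model assumes spot capacity is available whenever you ask for it.

```python
GPUS = 8
HOURS_PER_YEAR = 8_760
ON_DEMAND_RATE = 3.90  # USD per GPU-hour, H100 SXM5 on-demand as quoted
SPOT_RATE = 0.80       # USD per GPU-hour, H100 SXM5 spot as quoted


def annual_rental_cost(spot_fraction: float = 0.0) -> float:
    """Annual cost of 8 GPUs rented 24/7 with spot_fraction of the hours on spot."""
    if not 0.0 <= spot_fraction <= 1.0:
        raise ValueError("spot_fraction must be between 0 and 1")
    blended_rate = spot_fraction * SPOT_RATE + (1 - spot_fraction) * ON_DEMAND_RATE
    return GPUS * HOURS_PER_YEAR * blended_rate


on_demand = annual_rental_cost(0.0)
half_spot = annual_rental_cost(0.5)
three_year_spot = 3 * annual_rental_cost(1.0)

print(f"On-demand, 1 year:  ${on_demand:,.0f}")
print(f"Half spot, 1 year:  ${half_spot:,.0f}")
print(f"All spot, 3 years:  ${three_year_spot:,.0f}")
```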

Renting from a hyperscaler is a different story. Run 24/7 at on-demand rates, the same 8-GPU H100 workload costs per year:

  • AWS p5.48xlarge at ~$6.88/hr per GPU after the June 2025 price cut: ~$481,824 per year
  • Microsoft Azure ND H100 v5 at ~$12.29/hr per GPU: ~$859,296 per year
  • Google Cloud A3 at ~$11.06/hr per GPU: ~$772,512 per year

The hyperscaler markup is real. Spheron aggregates capacity from 5+ providers and exposes it on a single per-minute billed catalog, which is why renting H100s on Spheron lands at about 43% below AWS, 68% below Azure, and 65% below GCP for the same hardware. For deeper rate comparisons across providers, see the cloud GPU pricing guide and the GPU cost optimization playbook for tactics to cut effective spend another 40-60 percent.
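The percentage gaps follow directly from the per-GPU hourly rates quoted above. A minimal sketch, using those rates as inputs (they are approximate and date-sensitive, so treat the output as illustrative):

```python
SPHERON_RATE = 3.90  # USD per GPU-hour, H100 on-demand as quoted

# Approximate per-GPU on-demand rates quoted in the list above.
hyperscaler_rates = {"AWS": 6.88, "Azure": 12.29, "GCP": 11.06}

# Percent saved relative to each hyperscaler's rate.
savings_pct = {
    name: round((1 - SPHERON_RATE / rate) * 100)
    for name, rate in hyperscaler_rates.items()
}

for name, pct in savings_pct.items():
    print(f"{name}: about {pct}% cheaper")
```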

Pricing fluctuates based on GPU availability. The prices above are as of 24 May 2026 and may have changed. Check current GPU pricing for live rates.

Side-by-Side: Annual Cost for 8x H100 (24/7)

| Path | Annual cost (24/7) | Effective $/GPU/hr (24/7) | Capex required |
| --- | --- | --- | --- |
| Buy + run on-prem | $144,797 | $2.07 | $290,000 upfront |
| AWS p5.48xlarge on-demand | $481,824 | $6.88 | $0 |
| Azure ND H100 v5 on-demand | $859,296 | $12.29 | $0 |
| GCP A3 on-demand | $772,512 | $11.06 | $0 |
| Spheron H100 on-demand | $273,312 | $3.90 | $0 |
| Spheron H100 spot (half-time) | $164,688 | $2.35 (blended) | $0 |

Cloud vs On-Premise Cost Comparison

A few honest caveats on those numbers. At true 24/7 utilization, on-prem ($2.07/GPU/hr) is actually cheaper than Spheron on-demand ($3.90/GPU/hr). Only Spheron spot ($0.80/hr) beats on-prem at full saturation. That changes fast as utilization drops: fall to 40% and the on-prem effective rate climbs above $5/GPU/hr, while cloud costs fall proportionally because you only pay for hours used. Cloud costs scale with use; on-prem costs are mostly fixed.
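To see how quickly the on-prem rate degrades, divide the fixed annual cost by the GPU-hours you actually use. The sketch below treats the full $144,797 as fixed regardless of utilization, which is a simplifying assumption (power draw does fall somewhat when GPUs idle); `onprem_effective_rate` is a hypothetical helper name.

```python
ONPREM_ANNUAL_COST = 144_797  # USD per year, all-in, from the 3-year bill
GPUS = 8
HOURS_PER_YEAR = 8_760


def onprem_effective_rate(utilization: float) -> float:
    """USD per *used* GPU-hour when the hardware is busy `utilization` of the time."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    used_gpu_hours = GPUS * HOURS_PER_YEAR * utilization
    return ONPREM_ANNUAL_COST / used_gpu_hours


for u in (1.0, 0.7, 0.4, 0.2):
    print(f"{u:>4.0%} utilization: ${onprem_effective_rate(u):.2f}/GPU/hr")
```

The cloud side needs no equivalent function: the hourly rate stays the same at any utilization because idle hours are simply not billed.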

When Buying Still Makes Sense

There are real cases where ownership wins. Don't take the rent-bias above as universal.

Genuine 24/7 saturation, sustained for years. If you're running a production inference fleet that holds 70-90 percent GPU utilization continuously, on-prem can come out ahead by year three. The break-even on an 8x H100 build versus Spheron on-demand sits around 53% sustained utilization over the full 3-year window. Below that, cloud wins.

Data sovereignty and regulated industries. Financial services, healthcare, and government workloads sometimes have to keep data inside a specific facility or country. GDPR, PIPL, and similar regimes can rule out the simpler cloud options. Even here, hybrid setups (on-prem for the regulated piece, cloud for everything else) usually beat full ownership.

Specialized hardware or topology. If you need a custom interconnect, a non-standard CPU-GPU ratio, or specific co-located storage that no provider offers, ownership gets you that control. This is a small slice of teams.

You already own the building. A research university with a data center and a power budget treats the marginal cost of one more rack very differently from a startup signing a colo contract. Own the building, own the staff, own the cooling, and the buy math gets friendlier.

If none of these apply to your situation, the answer is probably rent.
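The 53% break-even quoted for the saturated case can be derived directly. On-prem cost per used GPU-hour is the full-utilization rate divided by utilization, so it matches a cloud rate when utilization equals the ratio of the two rates. A sketch under the same fixed-cost assumption as the rest of this post, with `break_even_utilization` as an illustrative name:

```python
# On-prem cost per GPU-hour at 24/7 utilization (about $2.07).
ONPREM_RATE_AT_FULL_USE = 144_797 / (8 * 8_760)


def break_even_utilization(cloud_rate: float) -> float:
    """Utilization at which on-prem cost per used GPU-hour equals cloud_rate.

    A result above 1.0 means on-prem cannot catch that rate even at 24/7.
    """
    return ONPREM_RATE_AT_FULL_USE / cloud_rate


vs_on_demand = break_even_utilization(3.90)  # H100 on-demand rate
vs_spot = break_even_utilization(0.80)       # H100 spot rate

print(f"Break-even vs on-demand: {vs_on_demand:.0%}")
print(f"Break-even vs spot:      {vs_spot:.0%} (unreachable if above 100%)")
```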

When Renting Wins

The cases for renting cover most teams shipping AI in 2026:

Variable load. A training run for a few weeks, then quieter inference for a few months, then another training push. On-prem charges you for the troughs. Cloud doesn't.

Frontier hardware access. A new architecture lands every 18-24 months. Cloud users switch instance types. Owners write off depreciating assets. Right now teams renting B200 spot at $1.71/hr on Spheron are paying less than a properly amortized H100 on-prem build.

No capex, no procurement. Spinning up an 8-GPU H100 node takes two minutes from a credit card. Procuring the same hardware for ownership runs 4-12 weeks, plus facility prep.

Multi-region inference. If you serve users in three continents, distributed cloud capacity is genuinely cheaper than building three on-prem footprints.

Short experiments. A 90-minute fine-tuning run on a rented A100 costs about $1.64. The same run on owned hardware costs whatever fraction of the $144,797 annual all-in cost that 90 minutes represents, except you're paying for the other 8,758 hours too.
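The short-experiment figure is easy to verify. The sketch below uses Python's `decimal` module so the half-cent rounds the way an invoice would; the A100 rate is the on-demand price quoted later in this post, and per-minute billing is assumed.

```python
from decimal import Decimal, ROUND_HALF_UP

A100_RATE = Decimal("1.09")  # USD per hour, A100 80GB PCIe on-demand as quoted
RUN_MINUTES = 90
HOURS_PER_YEAR = 8_760

# Per-minute billing: pay only for the 90 minutes used.
run_cost = (A100_RATE * RUN_MINUTES / 60).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)

# Hours in the year that owned hardware still costs money while this run is idle.
other_hours = HOURS_PER_YEAR - RUN_MINUTES / 60

print(f"Rented run cost: ${run_cost}")
print(f"Remaining hours in the year: {other_hours:,.1f}")
```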

The closer your workload looks to "predictable, saturated, multi-year", the more buying makes sense. The further from that, the more renting wins. Most AI teams sit well inside the renting zone.

The Decision Framework

A short version, four questions:

1. Will the hardware be 70%+ utilized 24/7 for three years? If no, rent.

2. Do you have a hard regulatory or data-sovereignty reason to keep workloads in a specific facility? If yes, look at a hybrid: on-prem for the regulated path, cloud for everything else.

3. Can you afford to commit $250,000-$400,000 of capex before any training runs? If no, rent. Capex preserved is runway extended.

4. Do you have an ops team that can run the rack reliably? If no, the all-in cost of hiring or contracting that work eats most of the savings, and rent wins again.

If you answered "no" to question 1, 3, or 4, the recommendation is straightforward: rent on-demand for production, and use spot for fault-tolerant training and batch work. Specialist GPU clouds with competitive rates remove the cost barriers that ownership used to impose. Spheron lists A100 80GB starting at $1.09/hr on-demand (PCIe) with spot from $0.45/hr (SXM4), with H100 and other GPUs available for heavier workloads.
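The four questions can be written down as a small function, which is handy if you want to run the same check across several teams or projects. This is a rough sketch rather than a definitive rule: the `Workload` fields and the `recommend` function are hypothetical names, the 70% threshold comes from question 1, and checking the regulatory question first is an assumption about ordering that the framework itself leaves open.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    sustained_utilization: float  # expected fraction of 24/7 use over three years
    must_stay_in_facility: bool   # hard regulatory or data-sovereignty constraint
    can_fund_capex: bool          # $250,000-$400,000 available before any training
    has_ops_team: bool            # someone can run the rack reliably


def recommend(w: Workload) -> str:
    """Map the four framework questions to 'hybrid', 'rent', or 'consider buying'."""
    # Assumption: a hard facility constraint is evaluated before cost questions.
    if w.must_stay_in_facility:
        return "hybrid"
    if w.sustained_utilization < 0.70:
        return "rent"
    if not w.can_fund_capex or not w.has_ops_team:
        return "rent"
    return "consider buying"


startup = Workload(0.35, False, False, False)
inference_fleet = Workload(0.85, False, True, True)
print(recommend(startup))          # rent
print(recommend(inference_fleet))  # consider buying
```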

For workload-specific GPU selection, see the guide to the best NVIDIA GPUs for LLMs and the best GPU for AI inference in 2026.

Bottom Line

Buying GPUs in 2026 is the right call for a narrow slice of teams: regulated industries, true 24/7 production workloads sustained over years, or organizations that already own the facility and staff. For everyone else, the 3-year TCO math points the same direction. Renting is cheaper, faster to start, and matches the variable usage patterns AI work actually has.

The harder decision is which provider to rent from. Hyperscalers price GPU compute as a premium add-on to their broader cloud, which is why their per-GPU rates run 2-3x what a focused GPU cloud charges. Spheron sits at the lower end of that range because it aggregates capacity from 5+ providers and skips the hyperscaler markup, but the broader point holds with any specialist neocloud: don't pay AWS retail for a GPU job that doesn't need AWS services.

Cut your AI training costs by half or more. Spheron offers H100 at $3.90/hr, H200 at $4.56/hr, and A100 at $1.09/hr with per-minute billing and no commit.

Check H100 availability → | A100 80GB on Spheron → | View all GPU pricing →

Frequently Asked Questions

For most teams, yes. A 3-year TCO analysis shows cloud GPU rental costs 40-60% less than on-premises ownership when you factor in hardware, power, cooling, facility infrastructure, and staffing. On-premises breaks even only if you have genuine 24/7 utilization sustained over multiple years, which is rare.

For models up to 70B parameters quantized, an H100 SXM5 ($3.90/hr) or A100 80GB PCIe ($1.09/hr on-demand) fits most inference workloads. For 100B+ models, multi-GPU setups (4-8x H100 or H200 141GB at $4.56/hr) are necessary. See our GPU requirements guide for exact VRAM needs by model.

Yes, many cloud GPU providers including Spheron offer per-minute billing. This is ideal for development and experimentation. Just set auto-shutdown policies to avoid idle charges.

On-demand instances have instant access with no interruption risk but cost several times more than spot. Spot instances are 70-90% cheaper but can be reclaimed with notice. Spot works well for training jobs with frequent checkpointing, batch jobs, and hyperparameter sweeps.

Spot instances offer the lowest per-hour cost (currently $0.80/hr vs $3.90/hr on-demand for an H100 SXM5). For production workloads, reserved instances save 30-60% off on-demand. Spheron offers competitive spot and on-demand pricing across multiple regions.

Yes, on-demand instances require zero commitment and can be terminated immediately. Per-minute billing lets you pay only for what you use. This makes cloud rentals ideal for experimentation and unpredictable workloads.
