Rent NVIDIA A100 80GB GPUs on Demand from $0.45/hr
80GB HBM2e, NVLink 600 GB/s, MIG, per-minute billing. Live in under 2 minutes.
Renting an NVIDIA A100 80GB on Spheron starts at $0.45 per GPU per hour on dedicated (99.99% SLA), with interruptible spot instances cheaper still. There is no minimum commit, billing is per minute, and most instances are live inside two minutes. The A100 has 80GB of HBM2e and 2.0 TB/s of memory bandwidth, enough to train or fine-tune models up to about 30B parameters on a single card and serve quantized 70B models at production latency. SXM variants add 600 GB/s NVLink between GPUs for multi-GPU training. Hyperscaler on-demand A100 80GB pricing runs roughly $3.43 per GPU per hour on AWS p4de, $4.10 on Azure ND A100 v4, and $5.07 on GCP a2-ultragpu.
Technical specifications
Pricing comparison
| Provider | Price/hr | Compared with Spheron |
|---|---|---|
| Spheron (your price) | $0.45/hr | - |
| Jarvislabs | $1.49/hr | 3.3x more expensive |
| TensorDock | $1.57/hr | 3.5x more expensive |
| Lambda Labs | $2.49/hr | 5.5x more expensive |
| AWS p4de | $3.43/hr | 7.6x more expensive |
| Azure ND A100 v4 | $4.10/hr | 9.1x more expensive |
| Google Cloud | $5.07/hr | 11.3x more expensive |
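The multiples in the table are each provider's hourly price divided by the Spheron price. A minimal sketch of that arithmetic follows, using the prices exactly as listed above. The `monthly_cost` helper and its 730-hour month are illustrative assumptions, not figures from this page.

```python
# Hourly on-demand prices per GPU in USD, copied from the table above.
PRICES = {
    "Spheron": 0.45,
    "Jarvislabs": 1.49,
    "TensorDock": 1.57,
    "Lambda Labs": 2.49,
    "AWS p4de": 3.43,
    "Azure ND A100 v4": 4.10,
    "Google Cloud": 5.07,
}


def price_multiple(provider: str, baseline: str = "Spheron") -> float:
    """Return how many times more expensive `provider` is than `baseline`."""
    return round(PRICES[provider] / PRICES[baseline], 1)


def monthly_cost(provider: str, gpus: int = 1, hours: float = 730.0) -> float:
    """Cost of running `gpus` GPUs for `hours` hours.

    Assumption: 730 hours approximates one month of continuous use.
    """
    return round(PRICES[provider] * gpus * hours, 2)


if __name__ == "__main__":
    for name in PRICES:
        print(f"{name:18s} {price_multiple(name):5.1f}x  "
              f"${monthly_cost(name, gpus=8):,.2f} per month for 8 GPUs")
```

Swapping in current prices is a one-line change to `PRICES`, which is useful because on-demand rates move over time.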
Need More A100 Than What's Listed?
Reserved Capacity
Commit to a duration, lock in availability and better rates
Custom Clusters
8 to 512+ GPUs, specific hardware, InfiniBand configs on request
Supplier Matchmaking
Spheron sources from its certified data center network, negotiates pricing, handles setup
Need more A100 capacity? Tell us your requirements and we'll source it from our certified data center network.
Typical turnaround: 24–48 hours
When to pick the A100
Pick the A100 if
You are training or fine-tuning a 7B to 30B parameter model, serving a quantized 70B model, or running classic workloads like BERT, ResNet, recommender systems, and RAPIDS analytics. The A100 is also the right call when you want the most mature ML stack on the market and are happy to give up FP8 acceleration for a 40 to 60 percent lower hourly cost than the H100.
Pick the H100 instead if
Your workload is FP8-native (Llama 3 or DeepSeek inference, FP8 training runs) or you need Transformer Engine speedups. The H100 is roughly 2.5 to 3x faster on Tensor Core math and has 1.7x the memory bandwidth, but it costs about 2x as much. If the speedup pays for itself, make the jump.
Pick the L40S instead if
You are running pure inference on sub-30B models, or batch image and video generation. L40S has 48GB GDDR6 and a much lower hourly cost, with strong FP8 and Ada Lovelace Tensor Cores. It has no NVLink, so it is not the right pick for multi-GPU training.
Pick the RTX 4090 instead if
You are doing development, small-scale fine-tuning, or sub-13B inference on a budget. The 4090 has 24GB VRAM and no NVLink, but it is the cheapest way to run modern AI stacks. Step up to A100 once you need more memory or multi-GPU scaling.
Ideal use cases
LLM training and fine-tuning
Train or fine-tune models in the 7B to 30B range with mixed precision. FSDP and DeepSpeed ZeRO scale cleanly across 8x A100 with NVLink, and QLoRA (LoRA adapters on a 4-bit quantized base model) brings 70B fine-tuning within reach on a single card.
Production LLM inference
Serve models at steady latency with vLLM, TensorRT-LLM, or Triton. INT8 and FP16 paths are well optimized, and MIG lets you carve one A100 into up to 7 isolated inference slots.
Classic ML and computer vision
The A100 still holds the line on computer vision and recommender workloads that predate the LLM wave. Mature CUDA kernels, stable ecosystem, predictable throughput.
GPU data analytics and HPC
RAPIDS, cuDF, cuGraph, and GPU-accelerated SQL engines all target A100 first. FP64 throughput is 9.7 TFLOPS, enough for most simulation work that does not need Hopper-class double precision.
Performance benchmarks
Serve Llama 3.1 8B on an A100 in under 2 minutes
Spin up a Spheron A100 80GB, pull the vLLM image, and serve Llama 3.1 8B with an OpenAI-compatible API. Point any OpenAI SDK client at the endpoint and you are done.
    # 1. Provision an A100 80GB from the Spheron CLI (or use the dashboard)
    spheron deploy --gpu a100-80gb --image vllm/vllm-openai:latest

    # 2. Inside the instance, serve Llama 3.1 8B Instruct
    vllm serve meta-llama/Llama-3.1-8B-Instruct \
      --max-model-len 8192 \
      --gpu-memory-utilization 0.92 \
      --port 8000

    # 3. Hit the endpoint from any OpenAI-compatible client
    curl http://<instance-ip>:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Summarize MIG partitioning on A100."}]
      }'

For 70B inference, add --tensor-parallel-size 2 and rent 2x A100 80GB with NVLink. For multi-node training, contact us for InfiniBand-connected clusters.
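The same chat completions request can be sent from Python. The sketch below uses only the standard library so it runs without installing an SDK. It assumes the server from step 2 is reachable on port 8000, and `<instance-ip>` is a placeholder you replace with your own instance address. The helper names `build_chat_request` and `ask` are illustrative, not part of any Spheron or vLLM API.

```python
import json
import urllib.request

# Placeholder address: substitute the public IP of your own instance.
BASE_URL = "http://<instance-ip>:8000/v1"
MODEL = "meta-llama/Llama-3.1-8B-Instruct"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first choice."""
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `ask("Summarize MIG partitioning on A100.")` mirrors the curl command in step 3. Any OpenAI SDK client should work the same way once its base URL points at the instance.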
Multi-GPU A100 with NVLink and InfiniBand
A100 SXM4 nodes on Spheron link 8 GPUs with NVLink at 600 GB/s intra-node, and multi-node jobs use 200 Gb/s HDR InfiniBand with GPUDirect RDMA. That is the same fabric NVIDIA ships in DGX A100 systems, so PyTorch DDP, DeepSpeed ZeRO, and Megatron-LM run at close to linear scaling.
Need a custom multi-node cluster or reserved capacity? Talk to us about topology, regions, and committed pricing.
A100 vs alternatives
Compared with the V100, the A100 is roughly 2.5 to 3x faster on training and inference, with 2.5x the memory (80GB vs 32GB). V100 is effectively end-of-life for modern LLM work.
L40S is cheaper and strong at single-GPU inference with FP8, but has no NVLink. A100 wins for multi-GPU training and 70B INT4 serving that needs 80GB.
Related resources
NVIDIA A100 vs V100: Specs, Benchmarks, and When to Upgrade
Side-by-side Ampere vs Volta comparison with benchmarks and migration guidance.
A100 Deployment Guide: SXM vs PCIe, Spot vs Dedicated, MIG
Deep dive on A100 configurations, interconnects, MIG partitioning, and deployment patterns on Spheron.
Best NVIDIA GPUs for LLMs
Framework for matching GPU choice to model size, from 7B on A100 to 670B on B200.
GPU Memory Requirements for Large Language Models
Calculate VRAM needs across precision levels and KV-cache pressure for every major model class.
How a 12-Person Startup Trained a 70B Model for $11,200
Cost breakdown for training a 70B model using spot A100 instances with aggressive checkpointing.
GPU Cost Optimization Playbook
Practical tactics to cut A100 spend: spot scheduling, MIG, batching, right-sizing.
Frequently asked questions
How much does it cost to rent an A100 GPU?
On Spheron the A100 80GB starts at $0.45 per GPU per hour on dedicated (99.99% SLA, non-interruptible), with interruptible spot instances cheaper still. There is no minimum commit and billing is per minute. For reference, Lambda Labs runs ~$2.49/hr, AWS p4de ~$3.43/hr per GPU, Azure ND A100 v4 ~$4.10/hr per GPU, and Google Cloud a2-ultragpu around $5/hr.
What is the cheapest way to rent an A100?
Spot instances on Spheron are the cheapest path, often 50 to 70 percent below the dedicated rate. The trade-off is that the instance can be reclaimed when demand spikes, so checkpoint every 15 to 30 minutes and treat spot as a fit for fault-tolerant training, batch jobs, and experimentation. For steady production serving, stay on dedicated (99.99% SLA, non-interruptible). Both are on-demand tiers with per-minute billing.
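One way to follow the checkpointing advice is a time-based save with an atomic write, so a reclaim in the middle of a save cannot leave a half-written file. This is a minimal sketch under stated assumptions: the state is a small JSON-serializable dict, `checkpoint.json` is a hypothetical path on storage that survives the instance, and `step_fn` stands in for one real training step. A real job would save model and optimizer state through its framework instead.

```python
import json
import os
import tempfile
import time

CHECKPOINT_PATH = "checkpoint.json"  # hypothetical path; use persistent storage in practice
CHECKPOINT_EVERY_S = 15 * 60         # 15 minutes, the low end of the suggested range


def save_checkpoint(state: dict, path: str = CHECKPOINT_PATH) -> None:
    """Write to a temp file in the same directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic when source and target share a filesystem


def load_checkpoint(path: str = CHECKPOINT_PATH) -> dict:
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}


def train(total_steps: int, path: str = CHECKPOINT_PATH,
          every_s: float = CHECKPOINT_EVERY_S, step_fn=lambda step: None) -> dict:
    """Run up to `total_steps`, resuming from the last checkpoint if one exists."""
    state = load_checkpoint(path)
    last_save = time.monotonic()
    while state["step"] < total_steps:
        step_fn(state["step"])  # placeholder for one real training step
        state["step"] += 1
        if time.monotonic() - last_save >= every_s:
            save_checkpoint(state, path)
            last_save = time.monotonic()
    save_checkpoint(state, path)  # final save once the run completes
    return state
```

Because `train` always starts from `load_checkpoint`, relaunching the same command on a fresh spot instance picks up at the last saved step rather than at zero.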
Can I rent an A100 by the hour?
Yes. Spheron bills per minute with no minimum. A one-hour benchmark costs you one hour. No contracts, no reserved-instance lock-in on dedicated or spot, and no commit fees.
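Per-minute billing makes short runs easy to estimate. The sketch below assumes the $0.45 dedicated starting rate and assumes that a partial minute is rounded up to a whole minute. The rounding rule is a guess for illustration, so check your invoice or the dashboard for the actual behavior.

```python
import math

RATE_PER_HOUR = 0.45  # USD per GPU per hour, the dedicated starting rate quoted on this page


def billed_cost(runtime_seconds: float, gpus: int = 1,
                rate_per_hour: float = RATE_PER_HOUR) -> float:
    """Estimate the charge for a run under per-minute billing.

    Assumption: partial minutes are rounded up to the next whole minute.
    """
    billed_minutes = math.ceil(runtime_seconds / 60)
    return round(billed_minutes * gpus * rate_per_hour / 60, 4)


if __name__ == "__main__":
    print(billed_cost(3600))               # a one-hour benchmark on one GPU
    print(billed_cost(37 * 60, gpus=8))    # a 37-minute run on an 8-GPU node
```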
How fast can I deploy an A100 instance?
Most A100 instances are live in 45 to 90 seconds. Hardware is pre-warmed, so provisioning behaves more like a container start than a VM boot. If your Docker image is ready, you can be running a training script inside two minutes of hitting deploy.
What is the difference between A100 SXM and A100 PCIe?
SXM4 is the higher-power variant (400W) with NVLink between all GPUs in the node at 600 GB/s, which matters for multi-GPU training and model parallelism. PCIe is lower-power (300W) and easier to mix with standard servers, but has no node-wide NVLink fabric: on PCIe cards NVLink is limited to an optional bridge between a pair of GPUs. Pick SXM for distributed training or 70B FP16 inference across 2+ GPUs. Pick PCIe for single-GPU inference or data processing.
What is the difference between A100 40GB and 80GB?
The 80GB variant doubles VRAM and bumps memory bandwidth from 1.55 TB/s to 2.0 TB/s. That matters for larger batch sizes, long-context inference, and 70B-class quantized models. Spheron defaults to the 80GB SKU because the memory headroom usually pays for itself.
Does A100 support Multi-Instance GPU (MIG)?
Yes. A single A100 splits into up to 7 isolated MIG instances, each with dedicated compute, memory, and bandwidth. MIG is perfect for running multiple small inference workloads on one card without noisy-neighbor effects. It is exposed on both SXM and PCIe variants.
Do you support multi-node A100 clusters with InfiniBand?
Yes. Spheron offers 8x A100 per node with NVLink, and multi-node clusters connected by 200 Gb/s HDR InfiniBand with GPUDirect RDMA. Clusters are tested with PyTorch DDP, DeepSpeed ZeRO-3, and Megatron-LM. Larger configurations are available on request.
What regions are A100s available in?
A100 capacity is online across North America, Europe, and Asia, sourced from data center partners. Availability shifts with demand and the dashboard shows live capacity per region.
What frameworks and drivers come pre-installed?
PyTorch, TensorFlow, JAX, and the major serving stacks (vLLM, TensorRT-LLM, Triton, SGLang) all ship in the default images. CUDA 12.6+, cuDNN, NCCL, and RAPIDS are pre-tuned for A100. You can also bring your own Docker image.
Can A100 handle 70B-parameter models?
For inference, yes. A 70B model in INT4 (~35GB) runs on a single A100 80GB. At INT8 you need two A100 80GB cards with tensor parallelism. FP16 training or inference at 70B requires 2+ A100 80GB with NVLink. The sweet spot for the A100 remains 7B to 30B parameters.
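The sizes above follow from parameter count times bytes per parameter. A rough sketch of that calculation is below. It counts weights only and treats 1 GB as 10^9 bytes. The KV cache, activations, and framework overhead all come out of the remaining headroom, which is why a 70GB INT8 model is tight on a single 80GB card.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def headroom_gb(params_billions: float, precision: str,
                gpus: int = 1, gpu_gb: float = 80.0) -> float:
    """Memory left across `gpus` cards after loading the weights.

    This remainder has to hold the KV cache, activations, and runtime overhead.
    """
    return gpus * gpu_gb - weight_memory_gb(params_billions, precision)


if __name__ == "__main__":
    for precision, gpus in [("int4", 1), ("int8", 1), ("int8", 2), ("fp16", 2)]:
        print(f"70B {precision} on {gpus}x A100 80GB: "
              f"{weight_memory_gb(70, precision):.0f} GB weights, "
              f"{headroom_gb(70, precision, gpus):.0f} GB headroom")
```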
Is the A100 worth it over the H100?
If your workload is FP8-native or memory-bandwidth-bound, H100 pays for itself. If you are doing classic training or fine-tuning up to 30B parameters, or inference on models that fit in 80GB without FP8, A100 usually wins on dollars per token. Start on A100, move to H100 when the speedup justifies the cost.
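"When the speedup justifies the cost" can be made concrete: a faster GPU is cheaper per job only when its measured speedup exceeds its price ratio. The sketch below is a simplification that assumes cost scales linearly with runtime and ignores setup time. The 2x price ratio and 2.5x speedup in the example are the rough figures quoted earlier on this page, and you should substitute numbers measured on your own workload.

```python
def relative_job_cost(price_ratio: float, speedup: float) -> float:
    """Cost of one job on the faster GPU relative to the A100 (1.0 means equal cost).

    price_ratio: faster GPU's hourly price divided by the A100 hourly price.
    speedup:     A100 runtime divided by the faster GPU's runtime for the same job.
    """
    return price_ratio / speedup


def worth_switching(price_ratio: float, speedup: float) -> bool:
    """True when the faster GPU finishes the job for less money than the A100."""
    return speedup > price_ratio


if __name__ == "__main__":
    # Example inputs: about 2x the price, about 2.5x faster on Tensor Core math.
    print(relative_job_cost(2.0, 2.5))
    print(worth_switching(2.0, 2.5))
```

The break-even point is a speedup equal to the price ratio. Below that, the A100 is cheaper per job even though each job takes longer.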
Do you offer enterprise SLAs and dedicated support for A100?
For 100+ GPU deployments and production-critical workloads, Spheron offers dedicated Slack or Discord support, sourcing assistance, and SLA-backed instances. Smaller deployments are self-serve through the dashboard.
How does A100 pricing on Spheron compare to AWS, GCP, and Azure?
For the same A100 80GB hardware, Spheron is meaningfully cheaper than AWS p4de, Azure ND A100 v4, and GCP a2-ultragpu on-demand. As of April 2026, hyperscaler on-demand A100 80GB pricing runs roughly $3.43/hr per GPU on AWS p4de, $4.10/hr on Azure ND A100 v4, and about $5/hr on GCP. Spheron starts at $0.45/hr. Same silicon, different pricing model.