Azure's ND96isr H100 v5 costs $98.32/hr on-demand, or $12.29/hr per GPU. On Spheron, the same H100 SXM5 80GB hardware starts at $2.64/hr on-demand ($1.66/hr spot) with no quota application, no reserved commitment, and no egress fees. For broader context on hyperscaler versus neocloud GPU pricing, see the full GPU cloud pricing comparison covering H100, H200, and B200 across multiple providers.
TL;DR: Azure ND H100 v5 vs Spheron H100 (May 2026)
| Tier | Azure ND96isr H100 v5 (per GPU) | Spheron H100 SXM5 (per GPU) |
|---|---|---|
| Pay-as-you-go | ~$12.29/hr | $2.64/hr |
| 1-year reserved (upfront) | ~$7.93/hr | N/A (no commitment required) |
| 3-year reserved (upfront) | ~$5.47/hr | N/A |
| Spot / low-priority | ~$2.25-$3.69/hr | $1.66/hr |
| Egress fees | $0.087/GB (East US) | $0 |
| Quota wait time | 1-4 weeks | Instant |
Pricing fluctuates based on GPU availability. The prices above are based on 21 May 2026 and may have changed. Check current GPU pricing for live rates.
What Is the Azure ND H100 v5 (NDv5)?
The ND96isr H100 v5 is Microsoft's current top-tier GPU training instance. The SKU name decodes as: ND (the N-family GPU sub-series aimed at deep learning training), 96 vCPUs, i (isolated size), s (Premium Storage capable), r (RDMA capable), H100 (accelerator type), v5 (version). Each instance has:
- 8x NVIDIA H100 SXM5 80GB GPUs
- 96 vCPUs (4th Gen Intel Xeon Scalable, Sapphire Rapids)
- 1.9 TB RAM
- NVLink 4 (900 GB/s GPU-to-GPU bandwidth)
- InfiniBand NDR (400 Gb/s per GPU, 3.2 Tb/s per instance, inter-node)
- 2x 1.92 TB NVMe SSD local storage
This replaces the NDv4, which used 8x A100 SXM4 80GB GPUs (see the A100 vs H100 comparison for a full spec breakdown). The H100 SXM5 delivers roughly 3x the dense BF16/FP16 tensor core throughput of the A100, adds FP8 support that the A100 lacks, and has 1.7x the HBM bandwidth (3.35 TB/s vs 2 TB/s). The NDv4 is still available but being phased toward end-of-life for new workloads.
Azure also has NC-series instances for inference-class work: NC A100 v4 (A100 PCIe) and NCads H100 v5 (H100 NVL). These lack the NDv5's 8-GPU NVLink fabric and InfiniBand, making them unsuitable for multi-node training but cheaper for single-GPU inference. The NDv5 is the right SKU specifically for distributed training and high-bandwidth multi-GPU inference workloads.
Azure NDv5 Per-Hour Pricing: Full Breakdown
All figures below are for East US (generally the cheapest Azure region for NDv5) as of 21 May 2026.
| Tier | ND96isr H100 v5 (instance/hr) | Per GPU (÷8) | Payment |
|---|---|---|---|
| Pay-as-you-go | ~$98.32 | ~$12.29 | Billed per second |
| 1-year reserved (upfront) | ~$63.40 | ~$7.93 | Paid fully upfront |
| 1-year reserved (monthly) | ~$67.00 | ~$8.38 | Monthly installments |
| 3-year reserved (upfront) | ~$43.79 | ~$5.47 | Paid fully upfront |
| 3-year reserved (monthly) | ~$47.20 | ~$5.90 | Monthly installments |
| Spot (Low Priority VMs) | ~$18.00-$29.50 | ~$2.25-$3.69 | Subject to eviction |
A few notes on the reserved pricing math:
Upfront vs monthly payment. Paying upfront locks in the lowest effective hourly rate. Monthly-payment reserved instances cost about 6-8% more over the term (5.7% more on the 1-year rate and 7.8% more on the 3-year rate in the table above). The tables in this article use upfront rates as the headline, since that is the best-case scenario Azure advertises. If you cannot pay the full term upfront, the monthly-payment rate applies.
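The monthly-payment premium and the per-GPU rates follow directly from the instance-level figures in the table. A minimal sketch of that arithmetic, where the dictionary layout and function name are illustrative choices rather than anything Azure publishes:

```python
# Instance-level hourly rates copied from the East US table (21 May 2026).
RATES = {
    "1-year": {"upfront": 63.40, "monthly": 67.00},
    "3-year": {"upfront": 43.79, "monthly": 47.20},
}
GPUS_PER_INSTANCE = 8  # ND96isr H100 v5 has 8 GPUs


def monthly_premium_pct(upfront: float, monthly: float) -> float:
    """Percentage by which the monthly-payment rate exceeds the upfront rate."""
    return (monthly / upfront - 1) * 100


for term, rate in RATES.items():
    per_gpu = rate["upfront"] / GPUS_PER_INSTANCE
    premium = monthly_premium_pct(rate["upfront"], rate["monthly"])
    print(f"{term}: ~${per_gpu:.2f}/GPU/hr upfront, monthly payment costs {premium:.1f}% more")
```

Swapping in current rates from the Azure pricing calculator gives the premium for whatever date you run it.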
Reserved capacity vs reserved pricing. A reserved instance purchase is a billing commitment; it does not guarantee that Azure will have NDv5 capacity available in your target region. If you buy a reservation and the region runs out of capacity, you still pay. Separate "capacity reservations" (a different product) guarantee compute slots but typically come with 1-year minimums of their own.
Spot availability. Low Priority VMs on NDv5 are not consistently available. Some East US and South Central US deployments have reasonable spot availability; Southeast Asia and UK South frequently show zero NDv5 spot capacity. If your workload depends on spot, verify regional availability before building on it.
Regional Pricing Deltas
Azure pay-as-you-go pricing for the NDv5 series varies by up to roughly 10% across the regions listed below. Reserved pricing is consistent regardless of region.
| Region | Pay-as-you-go (per GPU) | Spot (per GPU) | Notes |
|---|---|---|---|
| East US | ~$12.29 | ~$2.25-$3.69 | Best capacity, typical baseline |
| South Central US | ~$12.29 | ~$2.60-$3.80 | Good capacity, similar OD pricing |
| West Europe | ~$13.15 | ~$2.80-$4.00 | ~7% premium vs East US |
| Southeast Asia | ~$13.50 | Not available | High OD price, frequent capacity gaps |
| UK South | ~$13.20 | Limited | Capacity often exhausted |
East US and South Central US offer the most consistent NDv5 capacity. West Europe is a reasonable fallback but adds cost. Southeast Asia and UK South should be treated as secondary options: capacity is frequently constrained, and spot is often unavailable for the NDv5 SKU.
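The regional premium over East US can be derived from the per-GPU pay-as-you-go column. A short sketch, assuming the table values above and using East US as the baseline (the function name is illustrative):

```python
# Pay-as-you-go $/GPU/hr by region, copied from the table above.
PAYG_PER_GPU = {
    "East US": 12.29,
    "South Central US": 12.29,
    "West Europe": 13.15,
    "Southeast Asia": 13.50,
    "UK South": 13.20,
}


def premium_vs_baseline(region: str, baseline: str = "East US") -> float:
    """Percent premium of a region's rate over the baseline region's rate."""
    return (PAYG_PER_GPU[region] / PAYG_PER_GPU[baseline] - 1) * 100


for region in PAYG_PER_GPU:
    print(f"{region}: {premium_vs_baseline(region):+.1f}% vs East US")
```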
The Hidden Costs Azure Does Not Advertise
The $98.32/hr instance rate is only the start. Several additional charges push actual Azure NDv5 costs well above what the pricing calculator shows.
Outbound data transfer. Azure charges approximately $0.087/GB for the first 5 TB/month transferred out of East US, dropping slightly for higher volumes. Data moving from your NDv5 instances to on-premises storage, other cloud providers, or end users is billable. For teams running training jobs that produce multi-hundred-gigabyte checkpoints, this adds up fast.
Premium SSD managed disks. NDv5 local NVMe is ephemeral and not persisted across instance stops. For persistent storage, you need managed disks. Premium SSD P30 (1 TB) costs $0.17/GB/month, or roughly $170/TB/month. A 2 TB training dataset stored persistently costs $340/month before you run a single training step.
Azure Files for shared datasets. Multi-node training jobs that share datasets across ND96isr nodes typically use Azure Files Premium. At $0.21/GB/month provisioned, a 10 TB shared dataset volume costs $2,100/month in storage alone.
Azure Monitor and Log Analytics. Default monitoring is limited. Useful observability (GPU utilization, memory, temperature) requires Azure Monitor Metrics and Log Analytics workspaces. Costs vary by data ingested, but training workloads generating 10-50 GB of logs/month add $20-150/month at standard Log Analytics rates.
Support tiers. Azure's Developer support plan costs $29/month. Standard (with 2-hour severity-A SLA) is $100/month. Professional Direct (1-hour SLA, architecture support) is $1,000/month. These are per-subscription costs and are rarely accounted for in per-job cost estimates.
Worked example: A team running a single ND96isr H100 v5 node ($98.32/hr) for 40 hours/week training, with 5 TB of checkpoint egress per month and 2 TB on Premium SSD P30:
| Line item | Monthly cost |
|---|---|
| Compute (40 hrs/week x 4.3 weeks) | $16,911 |
| Egress (5 TB at $0.087/GB) | $435 |
| Premium SSD 2 TB | $340 |
| Azure Monitor (estimated) | $80 |
| Support (Standard tier) | $100 |
| Total | $17,866/month |
Pure compute is $16,911. The extras add $955/month, or a ~5.6% overhead. For a team running 4+ nodes simultaneously, egress and storage grow with checkpoint volume, so the overhead scales up with the cluster rather than shrinking as a share of the bill.
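The worked example can be reproduced with a few lines of arithmetic. This is a sketch under the same assumptions as the table (4.3 weeks per month, 1 TB counted as 1,000 GB, an estimated $80 for monitoring); the function and parameter names are illustrative:

```python
INSTANCE_RATE = 98.32    # $/hr, ND96isr H100 v5 pay-as-you-go
EGRESS_RATE = 0.087      # $/GB, East US, first 5 TB/month
SSD_RATE = 0.17          # $/GB/month, Premium SSD figure used in this article
WEEKS_PER_MONTH = 4.3


def monthly_cost(hours_per_week: float, egress_tb: float, ssd_tb: float,
                 monitoring: float = 80.0, support: float = 100.0) -> dict:
    """Break a month of NDv5 usage into compute and ancillary charges."""
    compute = INSTANCE_RATE * hours_per_week * WEEKS_PER_MONTH
    egress = egress_tb * 1000 * EGRESS_RATE
    storage = ssd_tb * 1000 * SSD_RATE
    extras = egress + storage + monitoring + support
    return {
        "compute": compute,
        "extras": extras,
        "total": compute + extras,
        "overhead_pct": extras / compute * 100,
    }


bill = monthly_cost(hours_per_week=40, egress_tb=5, ssd_tb=2)
for item, value in bill.items():
    print(f"{item}: {value:,.2f}")
```

Changing `egress_tb` or `ssd_tb` shows how quickly the ancillary share moves for checkpoint-heavy jobs.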
Azure ND H100 v5 vs Spheron H100: Direct Comparison
Per-GPU Hourly Cost
| Metric | Azure (NDv5 pay-as-you-go) | Azure (1-yr reserved) | Spheron H100 SXM5 |
|---|---|---|---|
| Per GPU/hr | $12.29 | $7.93 | $2.64 |
| vs Spheron | 4.7x higher | 3.0x higher | baseline |
| Egress fees | Yes ($0.087/GB) | Yes ($0.087/GB) | No |
| Commitment | None | 1 year | None |
| Quota wait | 1-4 weeks | 1-4 weeks | Instant |
Running H100 on Spheron costs $2.64/hr per GPU on-demand. Azure pay-as-you-go at $12.29/hr is 4.7x that. Even Azure's 3-year reserved rate ($5.47/hr) is 2.1x Spheron's on-demand price, and it requires a multi-year financial commitment up front.
Cost-Per-Million Tokens (Llama 3 70B FP8 Inference)
H100 SXM5 at batch size 8 delivers approximately 3,000 tokens/second for Llama 3 70B FP8. Using that benchmark, cost per million tokens works out as follows. For the full benchmark methodology, see cost-per-token benchmarks for LLM inference.
Formula: ($/hr / 3,600 sec) x (1,000,000 / tokens_per_sec)
| Provider | $/hr per GPU | $/million tokens (Llama 3 70B FP8, ~3,000 tok/s) |
|---|---|---|
| Azure NDv5 pay-as-you-go | $12.29 | $1.14 |
| Azure NDv5 1-yr reserved | $7.93 | $0.73 |
| Spheron H100 SXM5 (on-demand) | $2.64 | $0.24 |
| Spheron H100 SXM5 (spot) | $1.66 | $0.15 |
Note: Tokens/sec varies significantly with batch size, quantization, framework, and system prompt length. These figures assume batch size 8, FP8 quantization, and vLLM. Single-request throughput will be lower; high-batch throughput will be higher. The relative cost ratio between providers holds regardless of absolute throughput.
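The formula above translates directly into code. A minimal sketch, assuming the ~3,000 tokens/second figure quoted in this section; substitute your own measured throughput, since that number drives the result:

```python
def cost_per_million_tokens(dollars_per_hour: float, tokens_per_sec: float) -> float:
    """($/hr / 3,600 sec) x (1,000,000 / tokens_per_sec)."""
    return (dollars_per_hour / 3600) * (1_000_000 / tokens_per_sec)


TOKENS_PER_SEC = 3000  # assumed throughput: Llama 3 70B FP8, batch size 8

# Per-GPU hourly rates from the comparison table.
PROVIDERS = {
    "Azure NDv5 pay-as-you-go": 12.29,
    "Azure NDv5 1-yr reserved": 7.93,
    "Spheron H100 SXM5 (on-demand)": 2.64,
    "Spheron H100 SXM5 (spot)": 1.66,
}

for name, rate in PROVIDERS.items():
    print(f"{name}: ${cost_per_million_tokens(rate, TOKENS_PER_SEC):.2f} per million tokens")
```

Because throughput appears only in the denominator, halving or doubling `TOKENS_PER_SEC` changes every row by the same factor and leaves the ratio between providers unchanged.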
1-Year TCO: 8x H100, 40 Hours/Week Training
| Cost component | Azure NDv5 pay-as-you-go | Azure 3-yr reserved (yr 1) | Spheron H100 SXM5 |
|---|---|---|---|
| GPU hours/year (8 GPUs x 40 hrs/wk x 52 wks) | 16,640 | 16,640 | 16,640 |
| Compute cost/yr | $204,506 | $91,021 | $43,930 |
| Egress (10 TB/mo, $0.087/GB) | $10,440 | $10,440 | $0 |
| Storage (2 TB Premium SSD) | $4,080 | $4,080 | Included during compute |
| Monitoring + support | $2,160 | $2,160 | Included |
| Total 1-year TCO | $221,186 | $107,701 | ~$43,930 |
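The 3-year reserved option gets Azure's annual TCO to $107,701 for this workload. That is still 2.5x Spheron's on-demand TCO without any multi-year lock-in. It is also a best case for Azure: the table prices only the 16,640 GPU-hours actually used at the reserved rate, whereas a reservation bills for every hour of the term whether you use it or not. At $43.79/hr across all 8,760 hours, one reserved node amortizes to about $383,600 per year in compute.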
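The TCO table reduces to one formula: GPU-hours times rate, plus twelve months of ancillary charges. A sketch that reproduces the three totals, assuming the same inputs as the table (10 TB/month egress, $340/month storage, $180/month monitoring plus support); the function and parameter names are illustrative:

```python
GPU_HOURS_PER_YEAR = 8 * 40 * 52  # 8 GPUs x 40 hrs/week x 52 weeks = 16,640


def one_year_tco(rate_per_gpu_hr: float, egress_gb_per_month: float = 0.0,
                 egress_rate: float = 0.0, storage_per_month: float = 0.0,
                 monitoring_support_per_month: float = 0.0) -> float:
    """Compute plus 12 months of egress, storage, and monitoring/support."""
    compute = GPU_HOURS_PER_YEAR * rate_per_gpu_hr
    egress = egress_gb_per_month * egress_rate * 12
    ancillary = (storage_per_month + monitoring_support_per_month) * 12
    return compute + egress + ancillary


# Ancillary charges applied to both Azure columns in the table.
AZURE_EXTRAS = dict(egress_gb_per_month=10_000, egress_rate=0.087,
                    storage_per_month=340.0, monitoring_support_per_month=180.0)

azure_payg = one_year_tco(12.29, **AZURE_EXTRAS)
azure_3yr = one_year_tco(5.47, **AZURE_EXTRAS)
spheron = one_year_tco(2.64)

print(f"Azure pay-as-you-go: ${azure_payg:,.0f}")
print(f"Azure 3-yr reserved: ${azure_3yr:,.0f}")
print(f"Spheron on-demand:   ${spheron:,.0f}")
print(f"Reserved vs Spheron: {azure_3yr / spheron:.1f}x")
```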
The hyperscaler GPU alternative analysis across AWS, GCP, and Azure shows this cost structure is consistent across all three platforms: raw GPU compute is expensive, and the ancillary fees make the gap wider on real workloads.
Azure Quota and Capacity Reality Check
Getting NDv5 capacity on Azure is not just a matter of paying for it. The process has two distinct friction points: quota approval and actual capacity availability.
Quota approval. By default, Azure subscriptions have zero quota for ND-series instances. To use NDv5, you submit a quota increase request through Azure Portal: Subscriptions > Usage + Quotas, filter by the ND H100 series, and submit with a justification. Microsoft's SLA for quota decisions is typically 1-4 business days for small requests and up to 2 weeks for large vCPU counts. Teams that need 8+ ND96isr nodes (64+ H100 GPUs, 768+ vCPUs) report processing times stretching to 3-4 weeks, with occasional rejections if the region lacks capacity commitments.
Capacity vs quota. Quota approval grants permission to request instances; it does not guarantee those instances will be available. You can have quota and still get SkuNotAvailable or OverconstrainedAllocationRequest errors when Azure's regional NDv5 capacity is exhausted. This happens most frequently in West Europe, Southeast Asia, and UK South. East US and South Central US are the most reliable regions, but even those see periodic capacity crunches.
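One way to check for subscription-level restrictions before attempting a deployment is to query the SKU list through the Azure CLI. The sketch below assumes the CLI is installed and logged in, and that the JSON returned by `az vm list-skus` carries `name` and `restrictions[].reasonCode` fields as described here; treat those field names and the helper functions as assumptions to verify against your own CLI output. An empty restriction list only means the subscription is not blocked from the SKU. It does not prove that live capacity exists.

```python
import json
import subprocess

SKU_NAME = "Standard_ND96isr_H100_v5"


def fetch_skus(location: str) -> list:
    """Run `az vm list-skus` for a region and return the parsed JSON list.

    Requires the Azure CLI on PATH and an authenticated session.
    """
    cmd = ["az", "vm", "list-skus", "--location", location,
           "--size", "Standard_ND96isr", "--all", "--output", "json"]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def sku_restrictions(skus: list, sku_name: str = SKU_NAME):
    """Return the restriction reason codes for a SKU, or None if it is not listed."""
    for sku in skus:
        if sku.get("name") == sku_name:
            return [r.get("reasonCode") for r in (sku.get("restrictions") or [])]
    return None


# Hypothetical sample shaped like the CLI output, so the parser can be tried offline.
sample = [
    {"name": "Standard_ND96isr_H100_v5",
     "restrictions": [{"reasonCode": "NotAvailableForSubscription", "type": "Location"}]},
]
print(sku_restrictions(sample))
```

Against a real subscription the call would be `sku_restrictions(fetch_skus("eastus"))`; a `None` result means the SKU is not offered in that region at all.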
Capacity reservations (separate product). Azure sells "capacity reservations" as a separate product from reserved instances. A capacity reservation guarantees compute availability but charges you the pay-as-you-go rate whether you use the instance or not. For NDv5, capacity reservations are typically only available with 1-year commitments. This means you pay $98.32/hr for 8,760 hours per year ($861,283/yr per node) regardless of actual utilization.
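The practical effect of paying for every hour is easiest to see as an effective rate per GPU-hour actually used. A sketch of that calculation, assuming the pay-as-you-go instance rate and a 52-week year (the function name is illustrative):

```python
HOURS_PER_YEAR = 8_760
INSTANCE_RATE = 98.32   # $/hr, billed whether or not the node is in use
GPUS_PER_NODE = 8


def reservation_economics(hours_used_per_week: float):
    """Return (annual cost, utilization fraction, effective $/GPU-hr actually used)."""
    annual_cost = INSTANCE_RATE * HOURS_PER_YEAR
    used_node_hours = hours_used_per_week * 52
    utilization = used_node_hours / HOURS_PER_YEAR
    effective_rate = annual_cost / (used_node_hours * GPUS_PER_NODE)
    return annual_cost, utilization, effective_rate


annual, utilization, effective = reservation_economics(hours_used_per_week=40)
print(f"Annual cost per node: ${annual:,.0f}")
print(f"Utilization: {utilization:.1%}")
print(f"Effective rate: ${effective:.2f} per GPU-hour used")
```

At 40 hours per week the node sits idle for more than three quarters of the billed hours, which is why the effective per-GPU rate lands far above the $12.29 list price.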
For teams with urgent training timelines, this quota-and-capacity friction is often the deciding factor in switching. Spheron provisions H100 capacity in under 5 minutes with no quota process.
When to Stay on Azure vs When to Switch
Stay on Azure when
Microsoft ecosystem integration is a hard requirement. If your authentication runs on Microsoft Entra ID (formerly Azure Active Directory), your data lives in Azure Data Lake, and your compliance team requires Microsoft-signed BAAs, the integration cost of migrating GPU compute to a different provider may exceed the savings. Entra ID, Azure ML pipelines, and AKS integrations are genuinely useful when you are already all-in on the Azure platform.
You have Enterprise Agreement credits to spend. EA credits cannot be transferred to other providers. If your company signed a multi-year EA with large committed spend, using those credits on NDv5 at $12.29/hr may be better than paying cash on a cheaper provider.
FedRAMP High or HIPAA compliance is mandatory. Azure has FedRAMP High authorization and provides a signed HIPAA BAA. For government contractors and regulated healthcare organizations where these certifications are non-negotiable, Azure or AWS are often the only viable options.
Workload is tightly coupled to Azure-specific services. SageMaker-equivalent managed training (Azure ML), managed Kubernetes (AKS), cognitive services, and Azure Blob Storage integrations take engineering time to replicate elsewhere. If decoupling these services would require weeks of refactoring, the ROI on switching diminishes.
Switch to Spheron when
You need H100 capacity now, not in three weeks. If your training timeline cannot absorb a 1-4 week quota approval process, Azure NDv5 is not a realistic option. Spheron provisions H100 on-demand in under 5 minutes.
The ~5x pricing delta is a budget constraint. At $12.29/hr per GPU vs $2.64/hr, Azure costs 4.7x more for the same NVIDIA H100 SXM5 hardware. For a 6-month training project running 40 hours/week on 8 GPUs (8,320 GPU-hours), that is the difference between spending $102,253 and spending $21,965.
Your stack is Docker/SSH portable. If your training scripts run in Docker containers with standard PyTorch or JAX, moving them to Spheron requires changing an SSH host and an API endpoint. No code rewrites. No proprietary SDKs to remove.
You do not want a reserved commitment. Azure's best H100 rates require 1 or 3-year financial commitments. Spheron's $2.64/hr is the on-demand rate with no commitment ($1.66/hr on spot).
For teams considering the migration path, the Azure migration guide covers the full process step by step, from auditing current Azure spend to running your first workload on alternative infrastructure.
Azure NDv5 is capable hardware at an expensive price with substantial procurement friction. If your use case requires tight Azure integration or mandatory Microsoft compliance certifications, the cost may be justified. For pure GPU compute, the ~5x pricing difference and 1-4 week quota wait are hard to defend against available alternatives.
Spheron provides H100 SXM5 at $2.64/hr on-demand, with $1.66/hr spot pricing also available. No quota applications, no reserved commitments, and no egress fees. For teams that need H100 capacity without the Azure overhead, the math is straightforward.
Quick Setup Guide
Navigate to Azure Portal > Subscriptions > Usage + Quotas. Filter by the ND H100 v5 family (listed as 'Standard NDSH100v5 Family vCPUs'). Submit a quota increase request with your intended use case, target region, and vCPU count (96 vCPUs per ND96isr node). Include a business justification, as capacity varies significantly by region. Expect anywhere from a few business days to 1-4 weeks for approval.
In Azure Portal or CLI, search for 'Standard_ND96isr_H100_v5'. This is the primary 8-GPU NDv5 SKU. Confirm the instance type is available in your region before committing to a reservation. East US and South Central US have the most consistent capacity, West Europe is a workable fallback, and Southeast Asia and UK South are frequently constrained.
Calculate your expected data transfer out per month (training data, model checkpoints, API responses). Multiply by $0.087/GB for East US egress. Add Premium SSD costs for your dataset size. Add this to the hourly compute rate before comparing against alternative GPU clouds, where egress fees are typically zero.
Frequently Asked Questions
Azure's ND96isr H100 v5 instance (8x H100 SXM5 80GB) costs approximately $98.32/hr pay-as-you-go, which works out to $12.29/hr per GPU. With a 1-year reserved commitment the instance drops to roughly $63.40/hr ($7.93/hr per GPU). Spot pricing varies by region and availability, typically ranging from $2.25 to $3.69 per GPU per hour when capacity is available. For comparison, Spheron's H100 SXM5 is available at $2.64/hr on-demand or $1.66/hr on spot pricing, with no quota requirements.
The ND H100 v5 series (ND96isr H100 v5) uses NVIDIA H100 SXM5 80GB GPUs with 3.35 TB/s HBM3 bandwidth and NVLink 4. The older ND A100 v4 series uses NVIDIA A100 80GB SXM4 GPUs with 2 TB/s HBM2e bandwidth. The NDv5 provides roughly 3x the dense BF16/FP16 Tensor Core throughput of NDv4, adds FP8 support that the A100 lacks, and has 1.7x the memory bandwidth. Both are 8-GPU, 96-vCPU instances; the NDv5 comes with 1.9 TB of RAM.
NDv5 instances require a quota increase request through Azure support and are subject to regional capacity constraints. Most teams report 1-4 week wait times for quota approval, and on-demand availability is not guaranteed even with quota. Reserved instances are a billing commitment and do not guarantee capacity either; only a separate capacity reservation does, and it bills at the pay-as-you-go rate whether or not the instance is used. Alternative GPU cloud providers like Spheron offer H100 capacity without quota applications, typically provisioning in under 5 minutes.
Azure charges approximately $12.29/hr per H100 GPU on the ND96isr H100 v5 instance. AWS charges approximately $6.88/hr per H100 on the p5.48xlarge (8-GPU) instance, making Azure roughly 79% more expensive than AWS for the same H100 hardware. Spheron's H100 on-demand rate is $2.64/hr, which is roughly 79% below Azure and 62% below AWS.
Beyond the hourly compute rate, Azure charges for: outbound data transfer (approximately $0.087/GB for the first 5 TB/month from East US, dropping slightly for higher volumes), Premium SSD managed disks ($0.17/GB/month for P30 tier), Azure Files storage for shared datasets, and Azure Monitor/Log Analytics for observability. For a team training a 70B model with 5 TB of checkpoint egress per month and 2 TB on Premium SSD, these extras add roughly $855/month ($435 egress, $340 storage, about $80 monitoring) on top of the ND96isr compute cost, before any support plan.
