TL;DR: GCP A3 H100 vs Spheron H100 (22 May 2026)
| Metric | GCP A3 High (H100) | Spheron H100 SXM5 |
|---|---|---|
| On-demand $/hr per GPU | ~$10.98 | $2.64 |
| Spot $/hr per GPU | ~$3.69 | $1.66 |
| 1yr CUD $/hr per GPU | ~$8.78 | N/A |
| 3yr CUD $/hr per GPU | ~$5.93 | N/A |
| Monthly TCO (8xH100, 720 hr) | ~$63,245 | ~$15,206 (on-demand) / ~$9,562 (spot) |
| Egress fees | $0.08-$0.12/GB | None |
| Persistent disk | Separate billing | Included |
| Minimum commitment | None (on-demand) | None |
Pricing fluctuates with GPU availability. The prices above are as of 22 May 2026 and may have changed. Check current GPU pricing for live rates. GCP rates are from the public GCP pricing page as of the same date; verify with the GCP pricing calculator before budgeting.
GCP's A3 instances give you H100 SXM5 GPUs with the full Google Cloud stack attached. If your team is already deep in Vertex AI or BigQuery ML, that stack has real value. If not, you're paying roughly a 4x on-demand premium (about 6x against Spheron spot) for the same CUDA hardware. This post covers every pricing tier for GCP A3, including the costs that don't appear in the headline hourly rate, and compares them against Spheron's current H100 rates.
For a broader comparison across 5+ providers, see our GPU cloud pricing comparison 2026.
GCP A3 Instance Family Overview
The A3 series is Google Cloud's current flagship GPU family. Three configurations exist:
| Instance | GPUs | vCPUs | RAM | Local SSD | Network | Primary Use |
|---|---|---|---|---|---|---|
| a3-highgpu-8g | 8x H100 SXM5 80GB | 208 | 1.872 TB | 6 TB | 200 Gbps | General large model training |
| a3-megagpu-8g | 8x H100 SXM5 80GB | 208+ | Higher | 6 TB | 200 Gbps | Memory-intensive workloads |
| a3-edgegpu-8g | 8x H100 SXM5 80GB | 208 | 1.872 TB | 6 TB | 200 Gbps | Low-latency edge inference |
A3 High has the broadest regional footprint, available in us-central1, us-east4, europe-west4, asia-southeast1, and additional zones. A3 Mega and A3 Edge have more limited availability. Pricing for A3 Mega is not published on the public GCP pricing page and requires a quota request or direct sales contact. A3 Edge pricing similarly varies by region and availability tier.
For a complete H100 SXM5 spec breakdown including memory bandwidth, Tensor Core throughput, and MIG profiles, see our NVIDIA H100 specs guide.
A3 H100 Per-Hour Pricing
The table below shows a3-highgpu-8g pricing per GPU (full node price divided by 8) in us-central1. All prices in USD.
| Pricing Model | $/hr per GPU (us-central1) |
|---|---|
| On-demand | ~$10.98 |
| Spot (preemptible) | ~$3.69 |
| 1yr CUD (approximate) | ~$8.78 |
| 3yr CUD (approximate) | ~$5.93 |
The CUD figures above apply standard GCP discount tiers (approximately 20% for 1yr, 46% for 3yr). Actual committed-use pricing varies by negotiation and region. Verify with the GCP pricing calculator before signing any commitment.
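The CUD rows can be reproduced from the on-demand rate. The following is a minimal sketch, assuming the approximate 20% and 46% tiers quoted above; `cud_rate` is an illustrative helper name, not part of any GCP tool, and negotiated rates will differ.

```python
ON_DEMAND_PER_GPU = 10.98  # us-central1 on-demand, $/hr per GPU (from the table above)

# Approximate public discount tiers quoted in this post (assumption for illustration).
CUD_DISCOUNTS = {"1yr": 0.20, "3yr": 0.46}


def cud_rate(on_demand: float, discount: float) -> float:
    """Return the committed-use $/hr per GPU for a fractional discount."""
    return round(on_demand * (1.0 - discount), 2)


for term, discount in CUD_DISCOUNTS.items():
    rate = cud_rate(ON_DEMAND_PER_GPU, discount)
    print(f"{term} CUD: ~${rate:.2f}/hr per GPU")
```

Swapping in a negotiated discount or a different regional on-demand rate gives the corresponding committed rate.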
Regional pricing for europe-west4 and asia-southeast1 typically runs 7-15% above us-central1 rates. us-east4 is close to parity with us-central1.
A3 Mega pricing is not publicly listed. A3 Edge pricing varies by region and availability tier and should be confirmed through a quota request or GCP sales.
What GCP A3 Pricing Includes
The hourly rate covers the compute instance only. Here is what each a3-highgpu-8g provides:
| Component | Specification |
|---|---|
| GPUs | 8x NVIDIA H100 SXM5 80GB |
| GPU interconnect | NVLink 900 GB/s bidirectional |
| vCPUs | 208 cores |
| Host RAM | 1.872 TB |
| Local SSD | 6 TB (ephemeral) |
| Network | 200 Gbps |
What the hourly rate does NOT cover:
- Persistent disk: Standard HDD is approximately $0.04/GB/month, SSD is approximately $0.17/GB/month. The included 6 TB local SSD is ephemeral: data is deleted when the instance stops or is preempted.
- Data egress: $0.08-$0.12/GB for data leaving the GCP VPC, and $0.01-$0.08/GB for cross-region transfers within GCP.
- Managed services: Vertex AI, Managed Notebooks, and other Google-managed ML services bill separately on top of the compute rate.
- Cloud NAT and IP addresses: Outbound NAT gateway usage and reserved static IP addresses carry additional fees.
Hidden Costs in GCP A3 Billing
The sticker price understates the real cost of typical AI workloads. Here is where the gap grows:
Data egress fees
GCP charges $0.08-$0.12/GB for data leaving the VPC. Training jobs with large model checkpoints that move between nodes, regions, or to external storage accumulate egress charges fast. A one-time 2TB (2,048 GB) dataset transfer out of GCP costs roughly $164-$246 depending on destination.
Persistent disk
Local SSD is ephemeral. Stop the instance and the data is gone. Any persistent checkpoint storage requires attaching a Persistent Disk. Standard PD at $0.04/GB/month means 10TB of checkpoint storage costs $400/month on top of the GPU rate. SSD PD at $0.17/GB/month for 10TB is $1,700/month extra.
Sustained-use vs committed-use discounts
GCP applies sustained-use discounts (SUDs) automatically to on-demand instances that run more than 25% of the month, up to 30% at full monthly runtime. But SUDs and CUDs are mutually exclusive: switching to a committed-use contract removes the automatic SUD you were already receiving on on-demand usage. Teams often miscalculate the CUD breakeven because they forget to account for the SUD they give up when committing.
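The breakeven effect described above can be sketched in a few lines. This is an illustration under stated assumptions, not GCP's billing logic: a commitment is billed for every hour of the month whether used or not, and the SUD is modeled as a single flat fraction (`sud_fraction`, a hypothetical parameter), whereas real SUDs are tiered by usage. Whether SUDs apply to a given machine type should be checked against GCP's documentation.

```python
HOURS_IN_MONTH = 720.0  # same monthly basis as the TCO tables in this post


def breakeven_hours(on_demand_rate: float, cud_rate: float,
                    sud_fraction: float = 0.0,
                    hours_in_month: float = HOURS_IN_MONTH) -> float:
    """Monthly usage (hours) above which a commitment beats on-demand.

    A commitment is billed for every hour of the month, used or not.
    On-demand is billed only for hours used, reduced here by a flat
    sud_fraction (a simplification: real SUDs are tiered by usage).
    """
    effective_on_demand = on_demand_rate * (1.0 - sud_fraction)
    return cud_rate * hours_in_month / effective_on_demand


for label, cud in (("1yr", 8.78), ("3yr", 5.93)):
    for sud in (0.0, 0.30):
        hours = breakeven_hours(10.98, cud, sud_fraction=sud)
        if hours > HOURS_IN_MONTH:
            verdict = "never within a 720 hr month"
        else:
            verdict = f"{hours:.0f} hr"
        print(f"{label} CUD, SUD {sud:.0%}: breakeven {verdict}")
```

With no SUD, the 1yr commitment pays off above roughly 576 of 720 hours. If a flat 30% SUD did apply, the 1yr breakeven lands beyond 720 hours, so under that assumption the 1yr commitment would never pay off, which is the miscalculation the paragraph above warns about.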
A worked example: 500hr/month with 2TB egress and 10TB SSD storage
| Cost item | GCP A3 High | Spheron On-Demand | Spheron Spot |
|---|---|---|---|
| Compute (8x H100, 500hr) | $43,920 | $10,560 | $6,640 |
| Egress (2TB at $0.08/GB) | $164 | $0 | $0 |
| Persistent SSD (10TB) | $1,700 | $0 | $0 |
| Total | $45,784 | $10,560 | $6,640 |
Spheron bills compute per minute, charges no egress fees, and includes storage for the duration of the compute session.
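The totals in the worked example can be recomputed for other usage levels. The sketch below assumes the rates quoted in this post, treats 2TB as 2,048 GB (which is how the $164 egress line works out), and takes the $1,700 SSD figure as a flat monthly charge; the `Scenario` class and `monthly_cost` function are illustrative names.

```python
from dataclasses import dataclass

GB_PER_TB = 1024  # the egress line above uses 2 TB = 2,048 GB


@dataclass
class Scenario:
    name: str
    gpu_hourly: float     # $/hr per GPU
    egress_per_gb: float  # $/GB leaving the provider
    disk_monthly: float   # flat $/month for persistent storage


def monthly_cost(s: Scenario, gpus: int = 8, hours: float = 500.0,
                 egress_tb: float = 2.0) -> float:
    """Compute + egress + persistent disk for one month, in USD."""
    compute = s.gpu_hourly * gpus * hours
    egress = s.egress_per_gb * egress_tb * GB_PER_TB
    return round(compute + egress + s.disk_monthly, 2)


scenarios = [
    Scenario("GCP A3 High", 10.98, 0.08, 1700.0),
    Scenario("Spheron on-demand", 2.64, 0.0, 0.0),
    Scenario("Spheron spot", 1.66, 0.0, 0.0),
]

for s in scenarios:
    print(f"{s.name}: ${monthly_cost(s):,.2f}")
```

Changing `hours` or `egress_tb` shows how quickly the non-compute items grow relative to the GPU bill for a given workload.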
Region-by-Region Price Differences
A3 High availability and approximate pricing by region:
| Region | Relative rate | Notes |
|---|---|---|
| us-central1 | Base (~$10.98/hr per GPU) | Broadest availability |
| us-east4 | ~0-3% higher | Limited zones |
| europe-west4 | ~7-12% higher | Netherlands; GDPR residency option |
| asia-southeast1 | ~10-15% higher | Singapore |
A3 Mega and A3 Edge have sparser coverage. As of 22 May 2026, A3 Mega is limited to select US and European regions, and A3 Edge is primarily in early-access zones. Verify zone availability before building deployment plans around specific configurations.
GCP A3 vs Spheron H100: Full TCO Comparison
Per-GPU hourly cost
| Config | GCP A3 On-Demand | GCP A3 Spot | Spheron On-Demand | Spheron Spot |
|---|---|---|---|---|
| H100 SXM5 | ~$10.98 | ~$3.69 | $2.64 | $1.66 |
Spheron on-demand runs ~76% below GCP A3 on-demand list price; Spheron spot runs ~85% below.
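The percentages above follow directly from the per-GPU rates. A short sketch, using only the figures in the table (the function name is illustrative):

```python
def percent_below(reference: float, candidate: float) -> float:
    """How far candidate sits below reference, as a percentage."""
    return (1.0 - candidate / reference) * 100.0


GCP_ON_DEMAND = 10.98  # $/hr per GPU, from the table above

for label, rate in (("Spheron on-demand", 2.64), ("Spheron spot", 1.66)):
    pct = percent_below(GCP_ON_DEMAND, rate)
    multiple = GCP_ON_DEMAND / rate
    print(f"{label}: {pct:.0f}% below GCP on-demand ({multiple:.1f}x cheaper)")
```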
Teams looking for H100 SXM5 cloud instances without requiring a GCP-native stack can eliminate the premium entirely.
Cost per million output tokens (LLaMA 3.3 70B, vLLM, FP8)
Based on 1,850 tokens/sec throughput from our vLLM vs TensorRT-LLM vs SGLang benchmarks on H100 SXM5 FP8:
| Provider | $/hr per GPU | Cost per 1M output tokens |
|---|---|---|
| Spheron spot | $1.66 | $0.25 |
| Spheron on-demand | $2.64 | $0.40 |
| GCP A3 3yr CUD | ~$5.93 | ~$0.89 |
| GCP A3 1yr CUD | ~$8.78 | ~$1.32 |
| GCP A3 on-demand | ~$10.98 | ~$1.65 |
Formula: cost per 1M tokens = 1,000,000 tokens / (1,850 tok/s × 3,600 s/hr) × $/hr ≈ 0.150 × $/hr
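The same formula as a small function, for plugging in a different throughput or hourly rate. This is a sketch that assumes the 1,850 tokens/sec per-GPU figure quoted above holds steadily for the whole hour; real throughput varies with batch size and sequence length.

```python
def cost_per_million_tokens(gpu_hourly: float,
                            tokens_per_sec: float = 1850.0) -> float:
    """USD per 1M output tokens at a sustained per-GPU throughput."""
    tokens_per_hour = tokens_per_sec * 3600.0
    return gpu_hourly / tokens_per_hour * 1_000_000


rates = {
    "Spheron spot": 1.66,
    "Spheron on-demand": 2.64,
    "GCP A3 3yr CUD": 5.93,
    "GCP A3 1yr CUD": 8.78,
    "GCP A3 on-demand": 10.98,
}

for provider, hourly in rates.items():
    print(f"{provider}: ${cost_per_million_tokens(hourly):.2f} per 1M tokens")
```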
Monthly TCO for 8x H100 running 720 hours
| Config | Compute only | With typical overhead |
|---|---|---|
| GCP A3 on-demand | ~$63,245 | ~$65,109 |
| GCP A3 1yr CUD | ~$50,573 | ~$52,437 |
| GCP A3 3yr CUD | ~$34,157 | ~$36,021 |
| Spheron on-demand | ~$15,206 | ~$15,206 |
| Spheron spot | ~$9,562 | ~$9,562 |
GCP overhead uses 2TB egress at $0.08/GB ($164) plus 10TB SSD persistent disk at $0.17/GB/month ($1,700). Spheron has no egress fees and includes storage during compute sessions.
When GCP A3 Makes Sense
GCP A3 is worth the premium in these specific cases:
- Deep Vertex AI integration: If Vertex Pipelines, AutoML, or Vertex AI Workbench is core infrastructure rather than incidental tooling, the tight A3 integration reduces operational overhead and narrows the effective cost gap.
- BigQuery ML pipelines: Training loops that read large datasets directly from BigQuery benefit from internal transfer rates that are cheaper than cross-cloud egress. If your data is already in BigQuery and relocating it would cost more than the per-GPU premium, staying on GCP makes sense.
- Compliance or data residency requirements: Workloads bound to a specific GCP-certified region with no equivalent alternative may have no practical option.
- Negotiated enterprise CUD rates: Large accounts sometimes negotiate CUD pricing below the public list. At the right discount, the effective per-GPU rate can approach what independent GPU cloud providers charge on-demand.
When to Use Spheron Instead
- Framework-portable workloads: Any stack built on vLLM, SGLang, TensorRT-LLM, PyTorch, or standard CUDA containers runs identically on Spheron's H100 SXM5 instances without modification.
- No GCP service dependencies: If your data lives in S3-compatible storage or can be moved without significant egress cost, there is no structural reason to pay the GCP premium.
- Checkpoint-heavy training: Large models generate large checkpoints. GCP's per-GB egress adds up when checkpoints cross instance, region, or cloud boundaries. Spheron's zero-egress model eliminates this variable.
- On-demand without a multi-year commitment: Spheron charges per-minute with no minimum. This fits teams whose GPU utilization varies or whose model roadmap evolves faster than a 3-year commitment horizon.
Migration Considerations
Moving from GCP A3 to Spheron is primarily a container portability story. Any Docker image running on A3 runs on Spheron without modification, because both platforms provide the same H100 SXM5 hardware with the same CUDA environment.
What changes:
- Storage: Replace Cloud Storage buckets with S3-compatible storage or direct instance storage. Use `rclone` or `gsutil rsync` for one-time migration. Expect $0.08/GB egress from GCP for the initial data transfer. After migration, ongoing checkpoint I/O has no egress charges.
- Access model: Spheron provides SSH root access. There is no IAM role system, service account model, or VPC security group to configure. Access is key-based.
- Networking: No VPC peering setup required. Instances are accessible directly via SSH with a public IP.
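To judge whether the one-time egress bill is worth paying, it helps to express it as hours of compute savings. The sketch below assumes the $0.08/GB egress rate and the on-demand per-GPU rates quoted in this post; the function name and the 10,000 GB dataset size are hypothetical, chosen only for illustration.

```python
def migration_payback_hours(dataset_gb: float, egress_per_gb: float,
                            gcp_gpu_hourly: float, spheron_gpu_hourly: float,
                            gpus: int = 8) -> float:
    """Node-hours of use needed for compute savings to cover one-time egress."""
    one_time_cost = dataset_gb * egress_per_gb
    hourly_savings = (gcp_gpu_hourly - spheron_gpu_hourly) * gpus
    if hourly_savings <= 0:
        raise ValueError("no hourly savings, so the migration never pays back")
    return one_time_cost / hourly_savings


# Hypothetical 10,000 GB dataset moved out of GCP at $0.08/GB.
hours = migration_payback_hours(10_000, 0.08, 10.98, 2.64)
print(f"Egress cost recovered after about {hours:.1f} hours on an 8x H100 node")
```

For a negotiated GCP rate, pass the effective per-GPU price instead of the list price; the smaller the gap, the longer the payback.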
What stays the same:
- CUDA drivers and versions
- Container runtime (Docker, NVIDIA Container Toolkit)
- Framework code (PyTorch, JAX, vLLM, etc.)
- Data format and checkpoint structure
For a step-by-step migration guide, see Migrating from AWS, GCP, or Azure to Spheron.
Teams running H100 inference without GCP-native service dependencies can cut compute bills significantly. Spheron provides bare-metal H100 SXM5 access through data center partners globally, with no egress fees and per-minute billing.
Frequently Asked Questions
How much does a GCP A3 H100 cost per hour?
GCP A3 High (a3-highgpu-8g) runs approximately $10.98/hr per GPU on-demand in us-central1 as of 22 May 2026. The full 8-GPU node is approximately $87.84/hr. Spot pricing is approximately $3.69/hr per GPU. A 1yr committed-use discount brings it to around $8.78/hr per GPU. Verify the current rate with the GCP pricing calculator before budgeting. Spheron H100 SXM5 starts at $2.64/hr per GPU on-demand or $1.66/hr spot. Check current rates at /pricing/.
What does a GCP A3 instance include?
Each a3-highgpu-8g instance includes 8x NVIDIA H100 SXM5 80GB GPUs, 208 vCPUs, 1.872 TB RAM, 6 TB local SSD, and 200 Gbps network. GPUs are connected via NVLink. Pricing does not include persistent disk, data egress, or managed services such as Vertex AI.
What is the difference between A3 High, A3 Mega, and A3 Edge?
A3 High (a3-highgpu-8g) is the standard 8xH100 SXM5 configuration. A3 Mega (a3-megagpu-8g) adds more host RAM and is positioned for memory-intensive multi-tenant workloads. A3 Edge (a3-edgegpu-8g) targets lower-latency edge deployments. All three use the same H100 SXM5 80GB GPU die but differ in host configuration and availability by region.
Does GCP offer spot pricing for A3 H100 instances?
GCP offers Spot VMs for A3 instances in select regions, at approximately $3.69/hr per GPU in us-central1 as of 22 May 2026. Spot VMs can be reclaimed by GCP at any time with only a short notice period, making them unsuitable for training runs without checkpoint/resume logic. Spheron spot pricing for H100 SXM5 is $1.66/hr per GPU with per-minute billing. See current rates at /pricing/.
When does GCP A3 make sense over Spheron?
GCP A3 makes sense when your stack is tightly integrated with BigQuery ML, Vertex AI, or Google Cloud Storage and the migration cost of decoupling exceeds the GPU price difference. For teams running standalone training or inference without GCP-native service dependencies, Spheron H100 on-demand rates run approximately 76% lower than GCP A3 list price, and spot rates run approximately 85% lower, with no committed-use requirement.
