NVIDIA GH200 Grace Hopper: Specs, Pricing & Rental. Rent GH200 GPU from $3.02/hr
96GB HBM3 + 432GB LPDDR5X unified memory over 900 GB/s NVLink-C2C. Per-minute billing on GH200 GPU rentals.
You can rent an NVIDIA GH200 Grace Hopper Superchip on Spheron starting at $3.02/hr per GPU per hour, the lowest live marketplace rate. Per-minute billing, no long-term contracts, and instances deploy in under 2 minutes across data center partners in multiple regions. Each module ships with 96GB HBM3 on the Hopper GPU plus 432GB LPDDR5X on the Grace ARM CPU, connected by 900 GB/s NVLink-C2C. That gives you ~528GB of cache-coherent unified memory in a single socket, eliminating the PCIe bottleneck for inference workloads with large KV caches, graph workloads with billion-edge datasets, and genomics pipelines that spill beyond GPU VRAM.
NVIDIA GH200 specifications
NVIDIA GH200 pricing
| Provider | Price/hr | Savings |
|---|---|---|
SpheronYour price | $3.02/hr | - |
Lambda Labs | $1.99/hr | - |
CoreWeave | $6.50/hr | 2.2x more expensive |
Need More GH200 Than What's Listed?
Reserved Capacity
Commit to a duration, lock in availability and better rates
Custom Clusters
8 to 512+ GPUs, specific hardware, InfiniBand configs on request
Supplier Matchmaking
Spheron sources from its certified data center network, negotiates pricing, handles setup
Need more GH200 capacity? Tell us your requirements and we'll source it from our certified data center network.
Typical turnaround: 24–48 hours
When to pick the GH200
Pick GH200 if
Your workload needs memory beyond 96GB of HBM but isn't worth paying B200/H200 rates, or your model spills KV cache onto system memory and you need coherent access. Also the sweet spot for graph neural networks, genomics pipelines, and recommendation models with huge embedding tables.
Pick H100 80GB instead if
Your model fits in 80GB HBM3 and you want maximum multi-GPU training throughput with NVLink + InfiniBand. H100 SXM5 is the standard for 8-way tensor parallelism and pre-training runs where CPU memory isn't in the critical path.
Pick H200 141GB instead if
You need more GPU-side HBM than 96GB, but don't need the unified memory architecture. H200 gives you 141GB HBM3e at 4.8 TB/s, a cleaner fit for 70B+ inference without going ARM.
Pick B200 192GB instead if
You need Blackwell FP4 Transformer Engine, 8 TB/s bandwidth, and the latest NVLink 5. B200 is the choice for 200B+ model training, and its dedicated HBM3e beats GH200's unified memory for bandwidth-bound workloads.
NVIDIA GH200 use cases
AI Inference & Serving
Use the 432GB unified memory pool to serve large AI models with enormous KV caches, enabling high-throughput inference without CPU-GPU data transfer overhead.
Large Dataset Processing
Utilize the 432GB unified memory architecture to process datasets that don't fit in GPU VRAM alone, eliminating costly data transfers between CPU and GPU memory.
Scientific Computing & HPC
Combine the energy-efficient ARM Grace CPU with the powerful Hopper GPU for high-performance computing workloads.
Edge AI & Autonomous Systems
Deploy the compact superchip form factor for edge AI applications requiring powerful inference in a single integrated module.
NVIDIA GH200 benchmarks
Serve Llama 3.1 70B with a massive KV cache on GH200
The GH200's 96GB HBM3 holds Llama 3.1 70B at FP8 (~70GB), and the 432GB LPDDR5X CPU memory over NVLink-C2C lets you extend the effective working set far beyond what a pure HBM card can hold.
# SSH into your GH200 instance (ARM64 / aarch64)ssh ubuntu@<instance-ip> # Install vLLM for ARM with CUDA 12.4+pip install vllm # Launch Llama 3.1 70B with FP8, long contextvllm serve meta-llama/Llama-3.1-70B-Instruct \ --quantization fp8 \ --max-model-len 32768 \ --gpu-memory-utilization 0.9 \ --enforce-eager # Sanity checkcurl http://localhost:8000/v1/modelsMost major ML frameworks (PyTorch, JAX, vLLM) have native ARM64 wheels. If you hit a package without an ARM build, NVIDIA's NGC containers cover the common cases.
NVLink-C2C Configuration
The GH200 Grace Hopper Superchip features NVLink-C2C (Chip-to-Chip) interconnect providing 900 GB/s bidirectional coherent bandwidth between the Grace CPU and Hopper GPU, eliminating the traditional PCIe bottleneck and enabling seamless unified memory access across the entire module.
Need a custom multi-node cluster or reserved capacity? Talk to us about topology, regions, and committed pricing.
NVIDIA GH200 guides and resources
NVIDIA GH200 Grace Hopper Superchip: Architecture and Performance Guide
Deep dive into GH200 architecture, unified memory, ARM-based Grace CPU, and ideal use cases.
Best NVIDIA GPUs for LLMs: Complete Ranking Guide
How the GH200 ranks against H100, H200, and A100 for large language model workloads.
GPU Memory Requirements for LLMs: VRAM Calculator and Sizing Guide
Calculate exactly how much VRAM you need, and why GH200's 96GB + 432GB unified memory matters.
NVIDIA GH200 Grace Hopper Superchip Release Date and Cloud Availability
The NVIDIA GH200 Grace Hopper Superchip was announced at Computex May 2023 and started shipping in Q3 2023. The HBM3 variant launched first with 96GB of HBM3 on the Hopper GPU plus 480GB of LPDDR5X on the Grace ARM CPU. The HBM3e refresh followed in 2024, bumping the GPU memory to 144GB. Cloud availability rolled out through 2024: CoreWeave was first to GA, with Lambda Labs, Nebius, and smaller GPU-focused clouds following through Q3-Q4 2024.
On Spheron the GH200 is available with per-minute billing and no contract. Live availability and pricing is on the pricing page. The GH200 sits in a niche between H100 SXM5 (faster GPU-only workloads) and the upcoming Rubin generation (HBM4). Its successor is the GB200 NVL72 rack-scale system, which pairs Grace CPUs with Blackwell B200 GPUs at much larger scale.
GH200 VRAM and Unified Memory: 96GB HBM3 + 432GB LPDDR5X
The GH200 ships with two memory pools accessible as one unified address space: 96GB of HBM3 at 4 TB/s directly on the Hopper GPU, plus 432GB of LPDDR5X on the Grace ARM CPU connected over 900 GB/s NVLink-C2C (cache-coherent chip-to-chip). The HBM3e variant has 144GB instead of 96GB on the GPU side. Total addressable cache-coherent memory per socket: 528GB or 576GB depending on variant.
Where the unified memory architecture matters: graph neural network training where the embedding table exceeds GPU VRAM, genomics pipelines that load full datasets into LPDDR5X then stream working sets through HBM3, billion-edge graph queries, and any long-context LLM workload that needs KV cache offload without the PCIe bottleneck. For standard LLM inference under 70B parameters, the GH200's advantage over a plain H100 is smaller, the H100 SXM5 or H200 SXM5 is often the better economic fit. For memory-bound workloads beyond 96GB, the GH200's unified memory is the unique architectural advantage.