Spheron GPU Catalog

NVIDIA RTX PRO 6000 GPU: 96GB Blackwell Specs, Pricing & Rental

Rent an RTX PRO 6000 GPU from $0.90/hr

96GB GDDR7 ECC Blackwell. RTX PRO 6000 GPU rentals built to run 70B FP8 LLMs on a single GPU.

At a glance

You can rent an NVIDIA RTX PRO 6000 Blackwell on Spheron starting at $0.90 per GPU per hour, the lowest live marketplace rate. Billing is per minute with no long-term contracts, and instances deploy in under 2 minutes across data center partners in multiple regions. Each card ships with 96GB GDDR7 ECC, 1.79 TB/s memory bandwidth, 24,064 CUDA cores, and 5th generation Tensor Cores with native FP4 support, giving you the largest single-GPU VRAM available outside HBM datacenter SKUs. Perfect for teams that need to run 30B-70B LLMs at FP8 on a single GPU, fine-tune medium models with LoRA, or handle professional rendering and visualization workloads without stepping up to H100 pricing.

GPU Architecture: NVIDIA Blackwell
VRAM: 96 GB GDDR7 ECC
Memory Bandwidth: 1.79 TB/s
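
To make the per-minute billing concrete, here is a minimal sketch of the cost arithmetic at the listed $0.90/hr starting rate. It assumes partial minutes are rounded up to the next whole minute, which is an illustrative assumption and not a documented Spheron billing rule; `job_cost` is a hypothetical helper name.

```python
import math

HOURLY_RATE_USD = 0.90  # listed starting rate per GPU per hour


def job_cost(runtime_seconds: float, gpus: int = 1,
             hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Estimate the cost of a job under per-minute billing.

    Assumption (not from the provider's docs): partial minutes are
    rounded up to the next whole minute.
    """
    billed_minutes = math.ceil(runtime_seconds / 60)
    return round(billed_minutes * (hourly_rate / 60) * gpus, 4)


# A 37.5-minute single-GPU job is billed as 38 minutes.
print(job_cost(37.5 * 60))
# The same job on 4 GPUs scales linearly.
print(job_cost(37.5 * 60, gpus=4))
```

Under hourly billing the same 37.5-minute job would be charged the full hourly rate, so short experiments are where per-minute billing saves the most.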

NVIDIA RTX PRO 6000 specifications

GPU Architecture: NVIDIA Blackwell
VRAM: 96 GB GDDR7 ECC
Memory Bandwidth: 1.79 TB/s
Tensor Cores: 5th Gen (752 cores)
CUDA Cores: 24,064
RT Cores: 4th Gen (188 cores)
FP32 Performance: 126 TFLOPS
FP16 Tensor (dense): 504 TFLOPS
FP8 Tensor (dense): 1,008 TFLOPS
FP4 Tensor (dense): 2,016 TFLOPS
Form Factor: Workstation (dual-slot PCIe)
Interconnect: PCIe Gen5 x16
NVLink: Not supported
TDP: 600W (Max-Q: 300W)

NVIDIA RTX PRO 6000 pricing

Provider | Price/hr | Savings
Spheron (your price) | $0.90/hr | -
Hyperstack | $1.80/hr | 2.0x more expensive
RunPod | $2.09/hr | 2.3x more expensive
CoreWeave | $2.50/hr | 2.8x more expensive
Latitude.sh | $5.9975/hr | 6.7x more expensive
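
The "more expensive" multiples in the pricing comparison are each provider's hourly rate divided by the Spheron rate, rounded to one decimal place. A short sketch of that arithmetic, using only the rates listed above (`price_multiple` is a hypothetical helper name):

```python
SPHERON_RATE = 0.90  # USD per GPU per hour, from the pricing comparison

# Listed hourly rates for the same GPU on other providers.
competitor_rates = {
    "Hyperstack": 1.80,
    "RunPod": 2.09,
    "CoreWeave": 2.50,
    "Latitude.sh": 5.9975,
}


def price_multiple(rate: float, baseline: float = SPHERON_RATE) -> float:
    """Return how many times more expensive `rate` is than `baseline`."""
    return round(rate / baseline, 1)


multiples = {name: price_multiple(rate) for name, rate in competitor_rates.items()}
for name, multiple in multiples.items():
    print(f"{name}: {multiple}x")
```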
Custom & Reserved

Need More RTX PRO 6000 Than What's Listed?

Reserved Capacity

Commit to a duration, lock in availability and better rates

Custom Clusters

8 to 512+ GPUs, specific hardware, InfiniBand configs on request

Supplier Matchmaking

Spheron sources from its certified data center network, negotiates pricing, handles setup

Need more RTX PRO 6000 capacity? Tell us your requirements and we'll source it from our certified data center network.

Typical turnaround: 24–48 hours

When to pick the RTX PRO 6000

Scenario 01

Pick RTX PRO 6000 Blackwell if

You want to run 30B-70B LLMs at FP8 on a single GPU without paying H100 rates. 96GB GDDR7 lets Llama 3.3 70B FP8, Qwen 2.5 32B FP16, and 70B AWQ models fit comfortably with KV cache headroom. Best single-GPU VRAM capacity below the H100/H200 price tier.

Recommended fit
Scenario 02

Pick RTX 5090 instead if

Your models fit in 32GB and you want the cheapest Blackwell hourly rate. RTX 5090 matches PRO 6000 on memory bandwidth (1.79 TB/s) and FP4 support, but lacks ECC and caps out at 32GB. Great for 7B-13B inference, SDXL, and Flux.

Recommended fit
Scenario 03

Pick L40S instead if

You need a datacenter-certified SKU with 48GB ECC and long-term multi-tenant support, and you don't need Blackwell FP4. L40S is purpose-built for inference serving and is widely available across hyperscalers.

Recommended fit
Scenario 04

Pick H100 or B200 instead if

You need HBM bandwidth (3.35-8 TB/s) and NVLink for multi-GPU tensor parallelism on 100B+ models. The RTX PRO 6000 is PCIe-only with no NVLink, so tensor-parallel traffic has to cross PCIe Gen5 and multi-GPU scale-out works best as data parallelism. For trillion-parameter training, go B200.

Recommended fit
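
The fit claims in Scenario 01 can be sanity-checked with a back-of-the-envelope weight estimate: parameters times bits per parameter, divided by 8. The sketch below uses nominal parameter counts (70B, 32B) and treats AWQ as 4 bits per weight; it ignores quantization scales, activations, and KV cache, so treat the numbers as lower bounds. `weight_gb` and `fits` are hypothetical helper names.

```python
VRAM_GB = 96  # RTX PRO 6000 Blackwell


def weight_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def fits(params_billion: float, bits_per_param: int, vram_gb: float = VRAM_GB) -> bool:
    """True if the weights alone fit in VRAM (no KV cache or activations)."""
    return weight_gb(params_billion, bits_per_param) < vram_gb


models = {
    "Llama 3.3 70B FP8": (70, 8),
    "Qwen 2.5 32B FP16": (32, 16),
    "70B AWQ (4-bit, approximate)": (70, 4),
    "70B FP16": (70, 16),
}
for name, (params, bits) in models.items():
    print(f"{name}: ~{weight_gb(params, bits):.0f} GB, fits={fits(params, bits)}")
```

The last entry shows why 70B models need FP8 or 4-bit quantization on this card: at FP16 the weights alone exceed 96GB.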

NVIDIA RTX PRO 6000 use cases

Use case / 01

Professional Rendering

Use 4th generation RT Cores and Blackwell architecture for real-time ray tracing, CAD/CAM workflows, and digital content creation.

- Real-time ray tracing for architectural visualization
- CAD/CAM design and engineering workflows
- Digital content creation and VFX pipelines
- Product design and photorealistic rendering

Use case / 02

AI Development & Fine-Tuning

Perfect for fine-tuning 7B-32B models and running 70B FP8 on a single GPU with 96GB of GDDR7 ECC memory.

- LoRA and QLoRA fine-tuning of 7B-32B models
- Llama 3.3 70B FP8 and 70B AWQ inference
- Qwen 2.5 32B FP16 fine-tuning with headroom for KV cache
- Transfer learning and domain adaptation

Use case / 03

AI Inference

Cost-effective inference for 30B-70B models on a single GPU, with FP4 and FP8 Tensor Core acceleration.

- Llama 3.3 70B FP8 and 70B AWQ on a single GPU
- Real-time image generation and diffusion models
- Production inference APIs with dynamic batching
- Edge AI and embedded deployment testing

Use case / 04

Scientific Visualization

Accelerate medical imaging, molecular visualization, and engineering simulation with professional-grade GPU compute.

- Medical imaging and DICOM visualization
- Molecular dynamics and protein structure visualization
- Engineering simulation and CFD post-processing
- Geospatial data analysis and 3D mapping

NVIDIA RTX PRO 6000 benchmarks

Llama 3.1 8B Inference: ~8,990 tokens/s (vLLM, batched aggregate)
Llama 3.1 70B Inference: ~24,000 tokens/s (vLLM FP8, 100 concurrent requests, aggregate)
30B AWQ Throughput: ~8,400 tokens/s, matches 4x RTX 4090 (CloudRift)
SDXL 1024x1024: ~11 img/min (FP16, base + refiner)
Memory Bandwidth: 1.79 TB/s (GDDR7, 512-bit bus)
vs RTX 6000 Ada: ~2x faster (Blackwell FP4 + 2x VRAM)
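
The 1.79 TB/s figure follows from the 512-bit bus: peak bandwidth is the bus width in bytes multiplied by the per-pin data rate. The sketch below assumes a 28 Gbps GDDR7 per-pin rate, which is not stated on this page and is included only to show how the number can be derived.

```python
BUS_WIDTH_BITS = 512       # from the benchmark summary above
PIN_RATE_GBPS = 28         # assumption: GDDR7 per-pin data rate, not stated on this page


def peak_bandwidth_gb_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s = (bus width / 8 bits per byte) * per-pin Gbps."""
    return bus_width_bits / 8 * pin_rate_gbps


bandwidth = peak_bandwidth_gb_s(BUS_WIDTH_BITS, PIN_RATE_GBPS)
print(f"{bandwidth:.0f} GB/s = {bandwidth / 1000:.2f} TB/s")
```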

Serve Llama 3.3 70B FP8 on a single RTX PRO 6000

96GB GDDR7 is enough to load Llama 3.3 70B at FP8 (~70GB weights) with room for KV cache at moderate batch sizes. vLLM gives you an OpenAI-compatible endpoint in one command.

```bash
# SSH into your RTX PRO 6000 instance
ssh root@<instance-ip>

# Install vLLM. Quote the version spec so the shell does not treat ">=" as a redirect.
# Blackwell GPUs need a CUDA 12.8+ build.
pip install "vllm>=0.6.3"

# Launch Llama 3.3 70B at FP8
vllm serve meta-llama/Llama-3.3-70B-Instruct \
  --quantization fp8 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.92

# Test the endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"meta-llama/Llama-3.3-70B-Instruct","messages":[{"role":"user","content":"Hello"}]}'
```
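
Because the endpoint is OpenAI-compatible, the same request as the curl test can be sent from application code. Below is a minimal sketch using only the Python standard library; the host, port, and model name mirror the commands above, while `build_chat_request`, `chat`, and the `max_tokens` default are illustrative choices rather than anything Spheron or vLLM prescribes.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # same host and port as the curl test
MODEL = "meta-llama/Llama-3.3-70B-Instruct"


def build_chat_request(prompt: str, max_tokens: int = 128) -> urllib.request.Request:
    """Build a POST request for the /chat/completions route."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires the vLLM server to be running on the instance):
#   print(chat("Hello"))
```

If you reach the instance from your own machine instead of from inside the SSH session, replace `localhost` with the instance address or forward port 8000 over SSH.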

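For 30B-class models such as Qwen 2.5 32B, FP16 weights (about 64GB) fit comfortably and leave more room for KV cache, so you can serve higher concurrency. Mixtral 8x7B has about 47B total parameters (roughly 93GB at FP16), so serve it at FP8 or lower precision instead.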

NVIDIA RTX PRO 6000 guides and resources

01 Technical Brief

NVIDIA RTX PRO 6000 Blackwell Release Date and Cloud Availability

The NVIDIA RTX PRO 6000 Blackwell was announced at GTC in March 2025 as the professional workstation flagship of the Blackwell generation, succeeding the RTX 6000 Ada Generation. Production shipments began in Q2 2025 in both workstation and Server Edition (passively cooled, datacenter form factor) variants. GPU cloud availability followed in Q3-Q4 2025, with Spheron, RunPod, Lambda, and other neo-clouds adding the Server Edition through late 2025 and into 2026.

On Spheron, the RTX PRO 6000 Blackwell is available with per-minute billing and no contract, deployed via data center partners. Live availability and pricing are on the pricing page. The RTX PRO 6000 sits between the consumer RTX 5090 (32GB GDDR7) and the data-center-class H100 (80GB HBM3) on hourly cost while exceeding both on VRAM at 96GB.

02 Technical Brief

RTX PRO 6000 VRAM: 96GB GDDR7 ECC at 1.79 TB/s

The RTX PRO 6000 Blackwell ships with 96GB of GDDR7 ECC memory at 1.79 TB/s of bandwidth. That is 3x the VRAM of the RTX 5090 (32GB GDDR7 at 1.79 TB/s) and 1.2x the VRAM of the H100 (80GB HBM3 at 3.35 TB/s, where bandwidth is the H100's advantage). The 96GB GDDR7 makes the RTX PRO 6000 the largest single-GPU VRAM available outside of HBM-equipped data center cards.

Where the 96GB VRAM matters: a 70B parameter model fits in FP8 quantization on a single RTX PRO 6000 with room for KV cache and a small batch, a 30B-class model fits in BF16, and Stable Diffusion XL with all the LoRA adapters and ControlNets you can throw at it runs comfortably without OOM. ECC support adds enterprise-grade reliability that the consumer RTX 5090 lacks, which matters for production rendering, visualization, and 24/7 inference serving. For workloads requiring HBM bandwidth (large-batch inference, distributed training with InfiniBand), the H100 or H200 is the better fit.
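
How much "room for KV cache" a 70B FP8 model leaves can be estimated from the model shape. The sketch below assumes the Llama 3 70B architecture (80 layers, 8 key/value heads with grouped-query attention, head dimension 128), an FP16 KV cache, and a 92% memory budget matching the `--gpu-memory-utilization 0.92` flag used in the serving example. It ignores activations, CUDA context, and framework overhead, so real capacity will be lower; all constants and function names here are illustrative.

```python
# Assumed Llama 3 70B shape (not stated on this page).
NUM_LAYERS = 80
NUM_KV_HEADS = 8      # grouped-query attention
HEAD_DIM = 128
KV_DTYPE_BYTES = 2    # FP16 KV cache


def kv_bytes_per_token() -> int:
    """Bytes of KV cache per token: keys and values for every layer."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * KV_DTYPE_BYTES


def kv_token_budget(vram_gb: float = 96, utilization: float = 0.92,
                    weights_gb: float = 70) -> int:
    """Upper bound on cached tokens after loading the weights (1 GB = 1e9 bytes)."""
    free_bytes = (vram_gb * utilization - weights_gb) * 1e9
    return int(free_bytes // kv_bytes_per_token())


tokens = kv_token_budget()
print(f"KV cache per token: {kv_bytes_per_token() / 1024:.0f} KiB")
print(f"Token budget: ~{tokens:,} tokens")
print(f"Full 8192-token sequences: ~{tokens / 8192:.1f}")
```

Under these assumptions the card holds roughly 56,000 cached tokens alongside the FP8 weights, or about seven sequences at the full 8192-token context, which is what "a small batch" means in practice. Shorter prompts allow proportionally more concurrent requests.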

FAQ / 11

NVIDIA RTX PRO 6000 FAQ

NVIDIA RTX PRO 6000 alternatives and related GPUs