Can I run open-source Ray on a GPU cloud without Anyscale?

Yes. Open-source Ray is Apache-licensed and runs on any Linux machine with Python 3.8+. You provision GPU instances (one head node, one or more workers), install Ray with pip install "ray[default]" (the default extra includes the dashboard and Jobs API that ray job submit relies on), start the head with ray start --head --port=6379, connect workers with ray start --address=<head-ip>:6379 --num-gpus=<n>, and submit jobs with ray job submit. For Kubernetes-based clusters, KubeRay provides a Kubernetes operator that manages RayCluster, RayJob, and RayService custom resources. No Anyscale account or license is required.

10 Anyscale Alternatives for Ray Training and KubeRay (2026)

TL;DR: For self-hosted Ray with no markup: Spheron (bare-metal, KubeRay, per-minute billing). For AWS-native teams: Ray on EKS. For multi-cloud cost arbitrage: SkyPilot.

Provider	H100/hr	Ray Support	KubeRay	Best For
Spheron	$4.34	KubeRay, vanilla Ray	Yes	Max flexibility, lowest markup
AWS (Ray on EKS)	~$4.00	KubeRay native	Yes	AWS-native orgs
SkyPilot	Any cloud	Ray-compatible	Yes	Multi-cloud cost arbitrage

Full 10-provider comparison with RLHF cluster fit and migration steps below.

Anyscale charges a platform markup on top of whatever cloud GPUs your cluster runs on. That markup typically adds 50-100% on top of bare-metal GPU rates. An 8x H100 cluster that costs around $34/hr on bare metal works out to roughly $48-$64/hr at Anyscale's effective rate ($6-8/hr per GPU), depending on your tier and configuration. For teams running training jobs that span hundreds of hours per month, that gap is substantial.

The more important thing to understand: Ray is Apache-licensed open-source software. Everything Anyscale builds on top of it (cluster lifecycle management, managed Ray Serve, the hosted dashboard, the RLHF training stack) can be replicated with open-source tooling on any GPU cloud. Teams with H100 SXM5 instances on Spheron can run the exact same Ray code they run on Anyscale today. The only difference is who manages the control plane.

The question is whether the operational work Anyscale saves you justifies the markup. For a five-person research team that treats Ray as an afterthought, Anyscale's managed ops might be worth it. For a team where Ray is central infrastructure and GPU cost is a meaningful line item, the math almost always favors self-hosting. Teams that want to rent GPUs directly and run their own Ray clusters can deploy bare-metal H100 nodes on Spheron in minutes. This post covers a decision framework and 10 alternatives with specific H100 cluster pricing and workload fit breakdowns.

What Anyscale Actually Offers

Anyscale is a managed platform that wraps open-source Ray. Understanding what it provides helps clarify what you are paying for and what you give up by moving off it.

Managed Ray clusters are the core product. Anyscale handles cluster lifecycle: auto-scaling node pools up and down, replacing failed nodes, managing head-node failover, and versioning cluster environments (the combination of Ray version, Python version, and ML framework dependencies that you pin per project). Without a managed platform, you handle all of this with scripts, monitoring, and manual intervention.

RLHF and post-training stack: Anyscale ships pre-built environment images for common post-training workflows, including OpenRLHF, alignment tuning, and GRPO training. These are essentially Docker images with curated dependency combinations that are known to work together. You can replicate this with your own container images, but the curated environments save setup time.

Ray Serve hosting: Anyscale's managed inference deployment layer runs on Ray Serve under the hood. You get auto-scaling, A/B traffic routing, and a Ray Serve dashboard without setting up a KubeRay operator or managing Ray head nodes yourself.

What Anyscale is NOT: it is not a GPU cloud. It runs on AWS, GCP, and Azure. You pay cloud GPU rates (AWS p4d, p5, GCP A3, Azure ND H100) plus the Anyscale platform markup on top. This means you also inherit the GPU SKU limitations of those providers. If AWS does not have H200 instances in the region you need, Anyscale does not either.

Why Teams Look for Alternatives

Managed orchestration tax

The effective per-hour cost on Anyscale for H100 compute is typically 1.5-2x bare-metal rates. On AWS, the p5.48xlarge (8x H100 SXM5) works out to roughly $4.00/hr per GPU when you divide the instance price by its eight GPUs. Anyscale's platform markup brings the effective rate to roughly $5-8/hr per GPU depending on cluster size and tier. Neither figure includes data transfer or storage, which AWS bills separately and Anyscale passes through.

For an 8x H100 cluster running 200 hours a month, the difference between Anyscale effective rates (~$48/hr) and bare-metal GPU cloud (~$34/hr) is roughly $2,800 per month ($14/hr × 200 hours) on that one cluster alone.
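The monthly figure is just the hourly gap multiplied by hours used, so it is easy to rerun with your own quotes. A minimal sketch that uses the article's rounded rates as inputs; the helper names are illustrative, and you should swap in your actual managed and self-hosted cluster prices:

```python
# Back-of-envelope monthly cost gap between two hourly cluster rates.
# Inputs are the article's illustrative figures; substitute your own quotes.


def monthly_cost(cluster_rate_per_hr: float, hours_per_month: float) -> float:
    """Cost of running one cluster for the given number of hours."""
    return cluster_rate_per_hr * hours_per_month


def monthly_gap(managed_rate: float, self_hosted_rate: float, hours: float) -> float:
    """Extra monthly spend at the managed rate versus the self-hosted rate."""
    return monthly_cost(managed_rate, hours) - monthly_cost(self_hosted_rate, hours)


if __name__ == "__main__":
    hours = 200
    # ~$48/hr managed vs ~$34/hr bare metal (rounded 8x H100 figures)
    print(f"Rounded rates: ${monthly_gap(48.0, 34.0, hours):,.0f} per month")
    # Spheron's listed $34.72/hr for 8x H100 narrows the gap slightly
    print(f"Listed rate:   ${monthly_gap(48.0, 34.72, hours):,.0f} per month")
```

Using the listed $34.72/hr instead of the rounded $34/hr brings the gap closer to $2,650 per month, which is why the paragraph above says "roughly".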

Multi-cloud lock-in

Anyscale clusters run exclusively on AWS, GCP, and Azure. You cannot route workloads to specialty GPU providers with different GPU availability profiles. When AWS p5 instances are constrained in your region, your only option on Anyscale is to switch regions or wait. On a multi-cloud setup, you can shift to a provider with availability.

Custom GPU SKU access

Anyscale does not expose H200, B200, B300, GH200, or RTX PRO 6000 instances. Those GPUs are not widely available through Anyscale's partner cloud providers at scale. Custom post-training labs running RLHF on H200 or B200 clusters, where VRAM per GPU is the binding constraint, need direct bare-metal access that Anyscale cannot provide.

Operational complexity vs control

Some teams want full root access, the ability to set custom NCCL flags, custom InfiniBand topology configs, and the option to run non-Ray orchestrators (DeepSpeed with SLURM, FSDP with torchrun) alongside Ray on the same cluster. Managed platforms abstract away that layer, which is a genuine feature for teams that do not need it and a real constraint for teams that do.

Decision Framework

Situation	Recommendation
Small team, Ray expertise is secondary, want managed ops	Anyscale managed Ray
Ray-first team, need full GPU SKU access, cost-sensitive	Self-hosted Ray on bare-metal GPU cloud (Spheron, CoreWeave)
Not Ray-native, workload is training-only or inference-only	Consider Ray-free alternatives (Modal, Together AI, SkyPilot)

Quick Comparison: Anyscale vs 10 Alternatives

Provider	H100/hr	8x H100/hr	Ray support	Billing	Best for
Anyscale (via AWS)	~$5-8 effective	~$40-64 effective	Managed (first-class)	Per-hour + platform fee	Teams wanting managed Ray ops
Spheron	$4.34	$34.72	KubeRay, vanilla Ray	Per-minute	Max flexibility, lowest cost
AWS (Ray on EKS)	~$4.00	~$32	KubeRay on EKS	Per-hour	AWS-native teams
GCP (Ray on GKE)	~$3.85	~$30.80	KubeRay on GKE	Per-hour	GCP-native teams
CoreWeave	Custom	Custom	KubeRay	Per-hour	Large-scale bare-metal clusters
RunPod	~$2.69	~$21.52	Self-install Ray	Per-second (serverless) / per-hour	Mixed dedicated + serverless
Lambda Labs	~$2.49	~$19.92	Ray + SLURM	Per-hour	Training-focused teams
Modal	~$3.95 effective	N/A (serverless)	Python-native	Per-second	Serverless burst jobs
Together AI	$3.49 (Instant Clusters)	$27.92	Platform-managed	Per-hour	Fine-tuning workloads
Nebius	Custom	Custom	KubeRay (self-deployed)	Per-hour	EU data residency
SkyPilot	Any cloud	Any cloud	Ray-compatible	Depends on cloud	Multi-cloud Ray orchestration

GPU rates fetched 06 May 2026. Third-party rates are publicly listed on-demand prices and fluctuate. The Anyscale effective rate includes the platform markup over underlying cloud compute.

1. Spheron

Bare-metal GPU cloud, no orchestration markup, per-minute billing

Spheron gives you raw GPU instances with root access and no managed orchestration layer on top. For Ray workloads, that means you pay infrastructure cost only: no platform fee, no cluster management markup, no lock-in to a specific Ray version or environment image. The tradeoff is that you handle Ray cluster setup and lifecycle yourself, which is a few hours of work the first time and trivial ongoing maintenance for a team that already runs Ray.

How to run Ray on Spheron

The simplest setup is a vanilla Ray cluster with a head node and worker nodes over a private IP network. Provision your instances at app.spheron.ai, ensure they share a private subnet, then:

bash

# Head node (use the same Ray and Python versions on every node)
pip install "ray[default,train,serve]"
ray start --head --port=6379 --dashboard-host=0.0.0.0 --num-gpus=8 --block

# Worker nodes (repeat for each)
pip install "ray[default,train,serve]"
ray start --address=<HEAD_PRIVATE_IP>:6379 --num-gpus=8 --block

# Verify from a second shell on the head node (--block keeps ray start in the foreground)
ray status
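If you drive the setup from a provisioning script rather than typing commands on each node, the per-node commands can be generated from the head's private IP. A minimal sketch: the helper names and the 10.0.0.x addresses are hypothetical, the flags mirror the commands above, and the script only prints commands without running them.

```python
# Generate (but do not run) the `ray start` command for each node.
# The IPs and helper names are hypothetical; flags mirror the commands above.
from __future__ import annotations

import shlex

RAY_PORT = 6379  # GCS port given to `ray start --head --port`


def head_command(gpus: int, port: int = RAY_PORT) -> str:
    return shlex.join(["ray", "start", "--head", f"--port={port}",
                       "--dashboard-host=0.0.0.0", f"--num-gpus={gpus}", "--block"])


def worker_command(head_ip: str, gpus: int, port: int = RAY_PORT) -> str:
    return shlex.join(["ray", "start", f"--address={head_ip}:{port}",
                       f"--num-gpus={gpus}", "--block"])


def cluster_commands(head_ip: str, worker_ips: list[str], gpus_per_node: int) -> dict[str, str]:
    """Map each node's private IP to the command it should run."""
    commands = {head_ip: head_command(gpus_per_node)}
    for ip in worker_ips:
        commands[ip] = worker_command(head_ip, gpus_per_node)
    return commands


if __name__ == "__main__":
    plan = cluster_commands("10.0.0.10", ["10.0.0.11", "10.0.0.12"], gpus_per_node=8)
    for ip, cmd in plan.items():
        print(f"{ip}: {cmd}")
```

Keeping command generation separate from execution (SSH, a startup script, or a systemd unit) makes the plan easy to review before anything starts.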

For Kubernetes-based clusters, KubeRay runs on any conformant Kubernetes cluster. Install the operator via Helm and apply a RayCluster custom resource:

yaml

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: my-training-cluster
spec:
  rayVersion: '2.40.0'
  headGroupSpec:
    rayStartParams:
      dashboard-host: '0.0.0.0'
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.40.0-gpu
          resources:
            requests:
              cpu: "4"
              memory: "32Gi"
            limits:
              nvidia.com/gpu: "1"
  workerGroupSpecs:
  - replicas: 8
    minReplicas: 8
    maxReplicas: 8
    groupName: gpu-workers
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.40.0-gpu
          resources:
            requests:
              cpu: "16"
              memory: "128Gi"
            limits:
              nvidia.com/gpu: "8"
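One thing worth checking before you apply this manifest: it requests one GPU on the head plus replicas × GPUs per worker pod, which is 1 + 8 × 8 = 65 GPUs, or eight full 8-GPU nodes rather than a single 8x H100 box. A minimal sketch that counts the request, assuming the manifest is represented as a Python dict mirroring the YAML above (only the fields needed for counting); parsing the YAML file itself would need a third-party loader such as PyYAML. The helper names are illustrative.

```python
# Count the GPUs a RayCluster manifest requests (head + replicas x per-pod GPUs).
from __future__ import annotations


def container_gpus(pod_template: dict) -> int:
    """Sum nvidia.com/gpu limits across all containers in a pod template."""
    total = 0
    for container in pod_template["spec"]["containers"]:
        limits = container.get("resources", {}).get("limits", {})
        total += int(limits.get("nvidia.com/gpu", 0))
    return total


def raycluster_gpus(manifest: dict) -> int:
    spec = manifest["spec"]
    total = container_gpus(spec["headGroupSpec"]["template"])
    for group in spec.get("workerGroupSpecs", []):
        # Each worker group contributes replicas x GPUs per pod
        total += group["replicas"] * container_gpus(group["template"])
    return total


def pod(gpus: int) -> dict:
    """Minimal pod template carrying only a GPU limit."""
    return {"spec": {"containers": [{"resources": {"limits": {"nvidia.com/gpu": str(gpus)}}}]}}


# Mirrors the YAML above: 1 GPU on the head, 8 worker pods with 8 GPUs each
manifest = {
    "spec": {
        "headGroupSpec": {"template": pod(1)},
        "workerGroupSpecs": [{"groupName": "gpu-workers", "replicas": 8, "template": pod(8)}],
    }
}

if __name__ == "__main__":
    print(raycluster_gpus(manifest))
```

For a single 8-GPU node, one option is replicas (and minReplicas/maxReplicas) of 1 with no GPU limit on the head.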

You can also use SkyPilot targeting Spheron as the cloud provider for multi-cloud Ray job submission with cost arbitrage.

Pricing

Spheron H100 SXM5 runs at $4.34/hr per GPU on-demand. An 8x H100 cluster costs $34.72/hr on-demand, so a 200-hour training month comes to $6,944. Pricing updates with GPU availability, so check live rates before you provision. See the Spheron H100 instances page for current availability and spot pricing when offered.

Workload fit

Best for distributed pretraining, RLHF/GRPO post-training (policy + reference model setups typically need 8x H100 SXM5 nodes with InfiniBand), Ray Tune hyperparameter sweeps where per-minute billing reduces idle cost, and Ray Serve deployments that need full control over replica placement and NCCL flags. For setup specifics, see the Ray Serve setup guide and the multi-node training guide.
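In practice, control over NCCL flags mostly means setting NCCL environment variables on every Ray worker process, which Ray's runtime_env `env_vars` field can carry. A minimal sketch, assuming InfiniBand HCAs whose names start with mlx5 and a bootstrap interface named eth0 (both hypothetical; check `ibv_devices` and `ip link` on your nodes). The helper name is illustrative; NCCL_SOCKET_IFNAME, NCCL_IB_HCA, NCCL_IB_DISABLE, and NCCL_DEBUG are standard NCCL variables.

```python
# Build a Ray runtime_env dict that pins NCCL settings for every worker process.
from __future__ import annotations


def nccl_runtime_env(socket_ifname: str, ib_hca: str | None, debug: bool = False) -> dict:
    env_vars = {
        # Network interface NCCL uses for bootstrap and socket traffic
        "NCCL_SOCKET_IFNAME": socket_ifname,
    }
    if ib_hca is None:
        # No InfiniBand on this cluster: keep NCCL on sockets
        env_vars["NCCL_IB_DISABLE"] = "1"
    else:
        # Restrict NCCL to HCAs matching this name prefix
        env_vars["NCCL_IB_HCA"] = ib_hca
    if debug:
        env_vars["NCCL_DEBUG"] = "INFO"
    return {"env_vars": env_vars}


if __name__ == "__main__":
    env = nccl_runtime_env("eth0", "mlx5", debug=True)
    print(env)
    # With Ray installed, pass it when connecting to the cluster:
    #   ray.init(runtime_env=env)
```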

For SSH key setup and GPU instance provisioning steps, see the Spheron documentation.

2. AWS with Ray on EKS

Managed Kubernetes, AWS GPU instances, KubeRay-native

AWS Elastic Kubernetes Service with KubeRay is the most common enterprise path for teams that already run AWS infrastructure. The p4d.24xlarge instance provides 8x A100 40GB; the p5.48xlarge provides 8x H100 SXM5 with 3200 Gbps EFA networking.

The KubeRay Helm chart deploys the operator onto an EKS cluster in under 10 minutes. From there, RayCluster, RayJob, and RayService CRDs work the same as anywhere else. The AWS-native advantage is ecosystem integration: S3 for checkpoints, ECR for container images, CloudWatch for metrics, and IAM for fine-grained access control.

H100 on-demand on AWS (p5.48xlarge) works out to roughly $4.00/hr per GPU ($32/hr for 8x). That is higher than bare-metal GPU clouds but lower than Anyscale's effective rate. EKS cluster management adds $0.10/hr per cluster plus node overhead.

Good fit for: teams with existing AWS infrastructure, compliance requirements that mandate AWS, and orgs that want Kubernetes-native Ray without a dedicated GPU cloud contract.

3. GCP with Ray on GKE

GKE Autopilot or Standard, A3 instances, officially supported KubeRay path

Google Cloud's A3 instance family uses H100 SXM5 GPUs. GKE has first-class KubeRay support, partly because of the Google-Anyscale partnership around Ray's development. The KubeRay operator is available as a GKE add-on, which simplifies installation.

GCP's H100 on-demand rate for A3 instances is approximately $3.85/hr per GPU ($30.80/hr for 8x). GKE Standard clusters cost around $0.10/hr per cluster plus node costs. For teams using Vertex AI for experiment tracking, BigQuery for data pipelines, or GCS for checkpoint storage, the GCP ecosystem integration is meaningful.

One practical advantage over AWS: GCP's TPU Multislice and A3 Mega instances (H100 with 1800 Gbps inter-node networking via Titanium offload) are available in some regions. For teams that run both GPU and TPU workloads, GKE as a single Kubernetes plane for both is a real operational simplification.

4. CoreWeave with KubeRay

Bare-metal GPU cloud, Kubernetes-native, InfiniBand available

CoreWeave is the other major bare-metal GPU cloud with Kubernetes-native deployment. Like Spheron, there is no managed orchestration markup. KubeRay runs natively on CoreWeave's Kubernetes clusters.

The main CoreWeave advantage for large Ray workloads is InfiniBand availability at scale. For multi-node training runs that exceed 8 GPUs and need full bisection bandwidth, CoreWeave's IB network fabric is production-ready. NCCL configs are accessible at the infrastructure level.

CoreWeave pricing is negotiated via contract rather than self-serve on-demand rates. For teams running sustained large-scale clusters (16+ H100 nodes), contract pricing can undercut on-demand rates significantly. For smaller teams or variable workloads, the lack of self-serve hourly access is a friction point.

5. RunPod with Ray Clusters

Self-install Ray on dedicated pods, per-second serverless for inference

RunPod supports two deployment modes: dedicated pods (per-hour billing, persistent storage, full GPU access) and serverless endpoints (per-second billing, scale-to-zero, no cluster management). For Ray workloads, dedicated pods are the relevant mode since serverless endpoints do not support long-running Ray clusters.

Install Ray on a RunPod H100 pod the same way you would on any Linux machine: pip install "ray[default,train,serve]" (quoted so shells like zsh do not expand the brackets), then start head and worker nodes over RunPod's private network. Per-hour H100 rates on RunPod are roughly $2.69/hr (the actual rate varies with availability and GPU model).

The main limitation: RunPod does not offer InfiniBand interconnect. NVLink only connects GPUs within a node, so multi-node training beyond a single 8x H100 node sends NCCL traffic over standard networking and may bottleneck on inter-node communication. Fine for single-node Ray workloads and Ray Tune sweeps; less ideal for large-scale distributed training.
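To see why inter-node bandwidth matters, consider the bandwidth term of a ring all-reduce: each of N GPUs sends and receives 2(N - 1)/N × S bytes for an S-byte gradient buffer, and the slowest link in the ring sets the pace. A minimal sketch with hypothetical inputs (a 7B-parameter model with bf16 gradients on two 8-GPU nodes, and two illustrative per-GPU link speeds). It ignores latency, compute/communication overlap, and NCCL's tree and hierarchical algorithms, so read it as a rough lower bound rather than a benchmark.

```python
# Bandwidth-only estimate of one ring all-reduce (rough lower bound).


def ring_allreduce_seconds(buffer_bytes: float, num_gpus: int, link_gbps: float) -> float:
    """Time for the bandwidth term of a ring all-reduce over the slowest link."""
    if num_gpus < 2:
        return 0.0
    bytes_per_second = link_gbps * 1e9 / 8  # Gbit/s -> bytes/s
    traffic = 2 * (num_gpus - 1) / num_gpus * buffer_bytes
    return traffic / bytes_per_second


if __name__ == "__main__":
    grads = 7e9 * 2  # hypothetical: 7B parameters, 2 bytes each (bf16)
    gpus = 16        # two 8-GPU nodes
    for label, gbps in [("400 Gb/s per GPU (InfiniBand-class)", 400),
                        ("100 Gb/s per GPU (Ethernet-class)", 100)]:
        print(f"{label}: {ring_allreduce_seconds(grads, gpus, gbps):.2f} s per sync")
```

Because the estimate scales inversely with link speed, a 4x slower inter-node link means roughly 4x longer gradient syncs, which is time the GPUs spend waiting unless communication overlaps with compute.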

6. Lambda Labs with Ray + SLURM

Bare-metal clusters, SLURM support, training-focused

Lambda Labs operates GPU clusters with SLURM job scheduling. Ray can run inside SLURM jobs by starting Ray head and worker processes from within a SLURM batch script. This hybrid approach lets you use SLURM for resource allocation and Ray for task-level orchestration within the allocated nodes.
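The SLURM pattern above boils down to: take the allocated hostnames, start the Ray head on the first node, and start workers on the rest pointing at the head's IP. A minimal sketch that only builds the `srun` command lines; it assumes the hostnames were already expanded (for example with `scontrol show hostnames "$SLURM_JOB_NODELIST"`) and that the head's IP was looked up separately (for example with `hostname --ip-address` on that node). The hostnames, IP, and helper name are hypothetical.

```python
# Build srun commands that start a Ray head and workers inside a SLURM allocation.
from __future__ import annotations

import shlex


def slurm_ray_commands(hostnames: list[str], head_ip: str, gpus_per_node: int,
                       port: int = 6379) -> list[str]:
    if not hostnames:
        raise ValueError("the allocation has no nodes")
    head, workers = hostnames[0], hostnames[1:]
    # First allocated node runs the Ray head, bound to its cluster IP
    commands = [shlex.join([
        "srun", "--nodes=1", "--ntasks=1", "-w", head,
        "ray", "start", "--head", f"--node-ip-address={head_ip}", f"--port={port}",
        f"--num-gpus={gpus_per_node}", "--block",
    ])]
    # Every other node joins the head as a worker
    for node in workers:
        commands.append(shlex.join([
            "srun", "--nodes=1", "--ntasks=1", "-w", node,
            "ray", "start", f"--address={head_ip}:{port}",
            f"--num-gpus={gpus_per_node}", "--block",
        ]))
    return commands


if __name__ == "__main__":
    for cmd in slurm_ray_commands(["gpu01", "gpu02", "gpu03"], "10.1.0.5", gpus_per_node=8):
        print(cmd + " &")
```

In a batch script, each line would typically run in the background with `&`, with a short sleep after the head starts so workers can reach it before they try to join.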

Lambda's H100 on-demand rate is approximately $2.49/hr per GPU, which is among the lowest public on-demand rates for H100. Per-hour billing with no per-minute granularity means short jobs pay for the full hour.
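Billing granularity matters most for short or bursty jobs such as Ray Tune trials. A minimal sketch comparing per-hour and per-minute billing, assuming each billing unit is rounded up and ignoring provider-specific minimum charges; the $19.92/hr rate is the 8x H100 Lambda figure from the tables, and the job lengths are hypothetical.

```python
# Compare what a job costs when billed per hour vs per minute (units rounded up).
import math


def billed_cost(job_minutes: float, rate_per_hour: float, unit_minutes: int) -> float:
    """Charge for a job billed in whole units of `unit_minutes`."""
    units = math.ceil(job_minutes / unit_minutes)
    return units * unit_minutes / 60 * rate_per_hour


if __name__ == "__main__":
    rate = 19.92  # 8x H100 at ~$2.49 per GPU-hour
    for minutes in (10, 45, 61):
        hourly = billed_cost(minutes, rate, unit_minutes=60)
        per_minute = billed_cost(minutes, rate, unit_minutes=1)
        print(f"{minutes:>3} min job: ${hourly:6.2f} hourly vs ${per_minute:6.2f} per-minute")
```

A lower hourly rate can still lose to per-minute billing when most jobs finish well short of the hour.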

H100 availability on Lambda can be limited, particularly for on-demand (non-reserved) access. Teams that need guaranteed cluster access for production training runs typically need a reserved instance contract. For batch training workloads with flexible scheduling, Lambda's SLURM clusters with Ray are a cost-effective option.

7. Modal Labs

Python-native serverless, not Ray-native

Modal takes a different approach: instead of Ray's actor and remote function model, Modal uses Python decorators to define functions that run on managed GPU instances. You create a modal.App, decorate a function with @app.function(gpu="H100"), and Modal handles provisioning, scaling, and teardown.

This is a fundamentally different paradigm from Ray. Modal functions are stateless by default, which makes them a poor fit for long-running Ray actors, stateful Ray Serve deployments, or multi-step Ray Data pipelines. Cold starts apply because Modal scales to zero between invocations.

Where Modal works well: batch inference jobs where you can express the work as a Python function call, fine-tuning runs where you want to submit work without managing a cluster, and one-off training scripts where you do not want to think about cluster setup. The $3.95/hr effective H100 rate is for serverless compute, not a dedicated instance.

Teams that are not invested in Ray and want serverless-first semantics for similar workloads (training, inference, data processing) should evaluate Modal on its own terms rather than as a Ray replacement. Teams comparing managed-API options including Modal, Replicate, and per-replica platforms should also see our Baseten alternatives guide.

8. Together AI

Instant Clusters for fine-tuning, platform-managed compute

Together AI's Instant Clusters product provides on-demand multi-GPU fine-tuning at $3.49/hr per H100. The environment is managed: Together AI handles CUDA drivers, framework installation, and cluster provisioning. You submit a fine-tuning job via their API or UI.

The limitation is that Together AI does not provide raw GPU access. You cannot SSH into instances, run custom Ray clusters, or install arbitrary frameworks. The platform is purpose-built for fine-tuning open-weight LLMs (Llama, Mistral, Qwen families) and inference on their supported model catalog.

Fine for: teams running standard supervised fine-tuning or LoRA fine-tuning on supported model families without custom post-training frameworks. Not suitable for teams that need KubeRay, custom NCCL configurations, or RLHF pipelines that require direct process control.

9. Nebius AI Cloud

European GPU cloud, managed Kubernetes and Slurm, H100 clusters in EU regions

Nebius (a spin-off from Yandex infrastructure) operates GPU data centers in Europe. It offers managed Kubernetes and Slurm, on which you can deploy your own KubeRay setup. Nebius is not a first-party managed Ray product like Anyscale, but combined with EU data residency it can substitute for Anyscale for GDPR-bound teams.

For teams subject to GDPR or EU data localization requirements, Nebius H100 clusters in EU regions keep data within EU jurisdiction. Pricing is custom and negotiated rather than posted publicly.

Nebius is a smaller platform than the major cloud providers, so the ecosystem integrations (object storage, databases, monitoring) are less mature. For teams that specifically need EU-based GPU access with Kubernetes-native orchestration, Nebius is worth evaluating.

10. SkyPilot

Multi-cloud Ray orchestrator, not a cloud provider

SkyPilot is an open-source framework that abstracts GPU provisioning across multiple clouds. You write a task YAML specifying the GPU type and job command, and SkyPilot picks whichever enabled cloud offers the cheapest available instance that meets the request (spot instances only if you ask for them).

bash

sky launch -c my-cluster cluster.yaml

SkyPilot is Ray-compatible: it can start Ray head and worker nodes on each cluster it launches, so you can submit Ray jobs to clusters on whichever provider is currently cheapest (each individual cluster lives in a single cloud). The key use case is cost arbitrage: running training sweeps or batch inference jobs on whichever provider has H100 spot availability and the lowest price.

SkyPilot works on top of Spheron, AWS, GCP, Lambda, and others. It does not provide compute itself. Teams using SkyPilot still pay the underlying cloud's GPU rates. The value is in job scheduling, spot interruption handling, and automatic failover to the next cheapest provider. For Ray Tune hyperparameter sweeps where jobs are naturally parallel and checkpointed, SkyPilot with spot instances can cut costs significantly.
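The selection-and-failover behavior described above can be pictured as "sort eligible offers by price, then try each until one has capacity". This is a toy model only, not SkyPilot's actual optimizer or API; the Offer type, cloud names, and prices are hypothetical.

```python
# Toy model of "cheapest eligible offer first, fail over when capacity is missing".
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Offer:
    cloud: str
    region: str
    price_per_hr: float  # price for the whole requested node, e.g. 8x H100
    spot: bool


def launch_order(offers: list[Offer], allow_spot: bool) -> list[Offer]:
    """Eligible offers sorted cheapest first; spot offers only if requested."""
    eligible = [o for o in offers if allow_spot or not o.spot]
    return sorted(eligible, key=lambda o: o.price_per_hr)


def pick_with_failover(offers: list[Offer], allow_spot: bool,
                       has_capacity: Callable[[Offer], bool]) -> Offer | None:
    """Try offers cheapest first; return the first one that has capacity."""
    for offer in launch_order(offers, allow_spot):
        if has_capacity(offer):
            return offer
    return None


if __name__ == "__main__":
    offers = [
        Offer("cloud-a", "us-east", 32.00, spot=False),
        Offer("cloud-b", "us-central", 14.00, spot=True),
        Offer("cloud-c", "eu-west", 20.00, spot=False),
    ]
    # Pretend cloud-c is out of capacity right now
    chosen = pick_with_failover(offers, allow_spot=False,
                                has_capacity=lambda o: o.cloud != "cloud-c")
    print(chosen)
```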

Workload-by-Workload Fit

Workload	Best fit	Why
Distributed pretraining (70B+)	Spheron, CoreWeave	InfiniBand, bare-metal, full NCCL control
RLHF/GRPO post-training	Spheron, CoreWeave	Multi-node H100/H200, custom policy+reward model setup
Hyperparameter sweeps (Ray Tune)	Spheron spot, SkyPilot	Spot pricing cuts sweep cost; Ray Tune runs natively
Batch inference (Ray Data)	Spheron, RunPod	Per-minute billing; no idle cost between batches
Online serving (Ray Serve)	Spheron, AWS EKS, GCP GKE	Autoscaling, always-on, SLA requirements
Fine-tuning workloads	Lambda, Together AI	Simpler setup if no custom framework needed

8x H100 Cluster Cost Comparison

Monthly cost for an 8x H100 cluster running 200 hours per month:

Provider	H100/hr	8x H100/hr	200-hr month cost	Notes
Anyscale (via AWS)	~$6 effective	~$48 effective	~$9,600	Platform markup included
Spheron (on-demand)	$4.34	$34.72	$6,944	Per-minute billing
AWS p5.48xlarge (8x H100)	~$4.00	~$32	~$6,400	On-demand, US-East
GCP a3-highgpu-8g (8x H100)	~$3.85	~$30.80	~$6,160	On-demand, US-Central
CoreWeave	Custom	Custom	Contact sales	InfiniBand available
RunPod (dedicated)	~$2.69	~$21.52	~$4,304	Per-hour
Lambda Labs	~$2.49	~$19.92	~$3,984	Per-hour

Pricing fluctuates based on GPU availability. The prices above are based on 06 May 2026 and may have changed. Check current GPU pricing for live rates.

Migration Playbook: Anyscale to Self-Hosted KubeRay

1. Export your Anyscale cluster environment

bash

# List existing cluster environments
anyscale cluster-env list

# Build and capture the environment definition (subcommands differ between Anyscale CLI versions; check anyscale cluster-env --help)
anyscale cluster-env build --name <env-name>

Capture: the base Docker image name, the pip requirements file, the Ray version, and the cluster YAML from your Anyscale project settings. This is your migration blueprint.

2. Install the KubeRay operator

bash

helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm install kuberay-operator kuberay/kuberay-operator --version 1.1.0

The operator installs into your current namespace (default unless you pass --namespace to helm install). Verify with kubectl get pods | grep kuberay.

3. Write a RayCluster CRD

yaml

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: my-training-cluster
spec:
  rayVersion: '2.40.0'
  headGroupSpec:
    rayStartParams:
      dashboard-host: '0.0.0.0'
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.40.0-gpu
          resources:
            requests:
              cpu: "4"
              memory: "32Gi"
            limits:
              nvidia.com/gpu: "1"
  workerGroupSpecs:
  - replicas: 8  # 8 worker pods x 8 GPUs each = 64 GPUs, plus 1 GPU on the head
    minReplicas: 8
    maxReplicas: 8
    groupName: gpu-workers
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.40.0-gpu
          resources:
            requests:
              cpu: "16"
              memory: "128Gi"
            limits:
              nvidia.com/gpu: "8"

Replace the container image with your own image built from the Anyscale base image if you have custom dependencies. The apiVersion ray.io/v1 is the stable API introduced in KubeRay 1.0; manifests written for the older ray.io/v1alpha1 API need updating.
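If you are mirroring several Anyscale cluster configs, generating the manifest from a few parameters avoids hand-editing YAML. A minimal sketch that reproduces the structure above with the image, worker count, and per-worker resources as inputs; the raycluster helper and the registry.example.com image name are hypothetical. kubectl accepts JSON manifests as well as YAML, so the printed output can be piped to kubectl apply -f -. Unlike the example YAML, the head here requests no GPU, so Ray schedules no GPU work on it.

```python
# Generate a RayCluster manifest (as JSON) from a few Anyscale-style parameters.
from __future__ import annotations

import json


def raycluster(name: str, ray_version: str, image: str, workers: int,
               gpus_per_worker: int, worker_cpu: str = "16",
               worker_memory: str = "128Gi") -> dict:
    def container(role: str, requests: dict, gpus: int) -> dict:
        resources: dict = {"requests": requests}
        if gpus:
            resources["limits"] = {"nvidia.com/gpu": str(gpus)}
        return {"name": f"ray-{role}", "image": image, "resources": resources}

    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayCluster",
        "metadata": {"name": name},
        "spec": {
            "rayVersion": ray_version,
            "headGroupSpec": {
                "rayStartParams": {"dashboard-host": "0.0.0.0"},
                # CPU-only head: no GPU limit
                "template": {"spec": {"containers": [
                    container("head", {"cpu": "4", "memory": "32Gi"}, gpus=0)]}},
            },
            "workerGroupSpecs": [{
                "groupName": "gpu-workers",
                "replicas": workers,
                "minReplicas": workers,
                "maxReplicas": workers,
                "rayStartParams": {},
                "template": {"spec": {"containers": [
                    container("worker", {"cpu": worker_cpu, "memory": worker_memory},
                              gpus=gpus_per_worker)]}},
            }],
        },
    }


if __name__ == "__main__":
    # One 8-GPU worker pod, e.g. a single 8x H100 machine
    manifest = raycluster("my-training-cluster", "2.40.0",
                          "registry.example.com/my-ray-env:2.40.0-gpu",
                          workers=1, gpus_per_worker=8)
    print(json.dumps(manifest, indent=2))  # pipe to: kubectl apply -f -
```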

4. Submit a job

bash

ray job submit --address http://<head-node-service-ip>:8265 --working-dir . -- python train.py

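--working-dir uploads the current directory so train.py is available on the cluster. The Ray APIs are the same as on Anyscale, so training and inference scripts typically need no changes unless they reference Anyscale-specific storage paths or services.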

5. Validate

bash

kubectl get raycluster
ray status

Check the Ray Dashboard at http://<head-node-service-ip>:8265. Verify GPU resources are visible: run python -c "import ray; ray.init(); print(ray.cluster_resources())" inside the head pod (or submit the same command with ray job submit) and confirm the GPU count matches your cluster spec.
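The GPU check can be made explicit so a smoke-test job fails loudly instead of relying on someone reading the output. A minimal sketch: missing_gpus is a hypothetical helper that takes the dict returned by ray.cluster_resources() (resource name to float, for example {"GPU": 16.0}); a hand-written dict stands in here so the logic runs without a cluster.

```python
# Compare a ray.cluster_resources()-style dict against the expected GPU count.
from __future__ import annotations


def missing_gpus(resources: dict[str, float], expected_gpus: int) -> int:
    """How many expected GPUs Ray cannot see (0 means all are registered)."""
    visible = int(resources.get("GPU", 0))
    return max(expected_gpus - visible, 0)


if __name__ == "__main__":
    # Hypothetical result: only one of two expected 8-GPU workers has joined
    resources = {"CPU": 64.0, "GPU": 8.0, "memory": 5.0e11}
    gap = missing_gpus(resources, expected_gpus=16)
    if gap:
        raise SystemExit(f"{gap} GPUs missing: check worker pod logs and ray status")
    print("all GPUs registered")
```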

Which Alternative Should You Choose?

If you...	Choose
Want lowest cost for training, can manage Ray yourself	Spheron or Lambda Labs
Need InfiniBand for multi-node 70B+ training	Spheron (H100 SXM5 with IB) or CoreWeave
Are AWS-native and want managed Kubernetes	Ray on EKS
Want multi-cloud cost arbitrage without rewriting jobs	SkyPilot on top of any GPU cloud
Run fine-tuning only, no custom framework needed	Together AI
Need EU data residency	Nebius
Want serverless batch with Python-native API	Modal

Teams running distributed training, RLHF, or Ray Serve workloads can cut compute costs significantly by moving off managed Ray platforms onto bare-metal GPUs. Spheron gives you full Ray flexibility - KubeRay, vanilla clusters, or SkyPilot - with no orchestration markup and per-minute billing.


Quick Setup Guide

Export your Anyscale cluster configuration
From the Anyscale console or CLI, download your cluster environment spec: the pip dependencies, base Docker image, Ray version, and cluster YAML. Run `anyscale cluster-env build --name <env-name>` to capture the environment, and export the cluster config JSON from Settings. This becomes your migration blueprint.
Provision GPU instances on a bare-metal cloud
On Spheron or another GPU cloud, provision one head node (can be CPU-only or a small GPU instance) and one or more worker nodes matching the GPU type from your Anyscale cluster. Enable a private network between instances so Ray can communicate over a fixed IP range. Note the private IP of the head node.
Install Ray and your dependencies
On all nodes, install matching Ray and Python versions: `pip install "ray[default,train,serve]==<version>"`. Install your ML framework dependencies (PyTorch, vLLM, DeepSpeed, etc.) that match your Anyscale environment. If using Docker, build a container image from your Anyscale base image and push it to a registry.
Start the Ray cluster
On the head node: `ray start --head --port=6379 --dashboard-host=0.0.0.0 --num-gpus=<n> --block`. On each worker: `ray start --address=<head-private-ip>:6379 --num-gpus=<n> --block`. Run `ray status` from the head node to confirm all workers are registered. For KubeRay, apply a RayCluster CRD instead.
Submit your first job
Point your job submission at the new cluster: `ray job submit --address http://<head-ip>:8265 --working-dir . -- python train.py`. To avoid repeating `--address`, set `RAY_ADDRESS=http://<head-ip>:8265` in the environment for the `ray job` CLI. Inside a submitted job, `ray.init()` with no address connects to the cluster the job runs on. The Ray API surface is the same on Anyscale and self-hosted clusters, so training and inference scripts typically need no changes.
Validate and monitor
Access the Ray Dashboard at http://<head-ip>:8265 to view node utilization, job status, and actor logs. For KubeRay clusters, use `kubectl get raycluster` and `kubectl logs`. Set up a simple Ray remote function test before running production workloads: submit a job that returns `ray.cluster_resources()` and verify GPUs are visible.
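A common source of connection errors in these steps is mixing up the three addresses a Ray head exposes. A minimal sketch listing them with Ray's default ports: 6379 for the GCS address that ray start --address and ray.init(address=...) use from cluster nodes, 8265 for the dashboard and Jobs API that ray job submit and RAY_ADDRESS use, and 10001 for Ray Client ray:// connections from outside the cluster. The helper name and IP are hypothetical; substitute any non-default ports you started the head with.

```python
# The three addresses a self-hosted Ray head exposes, at Ray's default ports.
from __future__ import annotations


def ray_addresses(head_ip: str, gcs_port: int = 6379, dashboard_port: int = 8265,
                  client_port: int = 10001) -> dict[str, str]:
    return {
        # `ray start --address=...` on workers, or ray.init(address=...) on a cluster node
        "gcs": f"{head_ip}:{gcs_port}",
        # `ray job submit --address ...`, or RAY_ADDRESS for the `ray job` CLI
        "jobs": f"http://{head_ip}:{dashboard_port}",
        # ray.init("ray://...") from a machine outside the cluster (Ray Client)
        "client": f"ray://{head_ip}:{client_port}",
    }


if __name__ == "__main__":
    for purpose, address in ray_addresses("10.0.0.10").items():
        print(f"{purpose:>6}: {address}")
```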


Frequently Asked Questions

What does Anyscale actually offer beyond open-source Ray?

Anyscale adds a managed control plane on top of open-source Ray: cluster lifecycle management (auto-scaling, node replacement, head-node failover), a hosted Ray dashboard with job history, pre-built environment images for common ML frameworks, and a managed Ray Serve deployment layer for production inference. It also provides an RLHF-ready stack for post-training workloads. The core compute underneath is still standard cloud GPUs. Teams pay the platform markup in exchange for not managing Ray infrastructure themselves.

Can I run open-source Ray on a GPU cloud without Anyscale?

Yes. Open-source Ray is Apache-licensed and runs on any Linux machine with Python 3.8+. You provision GPU instances (one head node, one or more workers), install Ray with pip install "ray[default]" (the default extra includes the dashboard and Jobs API that ray job submit relies on), start the head with ray start --head --port=6379, connect workers with ray start --address=<head-ip>:6379 --num-gpus=<n>, and submit jobs with ray job submit. For Kubernetes-based clusters, KubeRay provides a Kubernetes operator that manages RayCluster, RayJob, and RayService custom resources. No Anyscale account or license is required.

How much cheaper is self-hosted Ray on a GPU cloud compared to Anyscale?

The cost difference depends on cluster size and utilization. Anyscale's managed Ray typically costs 1.5-2x more than bare-metal GPU rates for the same hardware because the platform markup is applied on top of compute costs. For an 8x H100 cluster running 200 hours per month, Anyscale's effective rate translates to $10,000-$14,000 vs roughly $6,944 on Spheron bare-metal at on-demand rates. Spot pricing on a cloud like Spheron can reduce that further for interruptible training jobs.

Which Anyscale alternative is best for RLHF and GRPO post-training?

For RLHF and GRPO workloads, the key requirements are multi-node GPU clusters with high-bandwidth interconnect (NVLink or InfiniBand), the ability to run custom post-training frameworks like OpenRLHF or verl, and enough VRAM for the policy and reference models simultaneously. Bare-metal GPU cloud providers (Spheron, CoreWeave) with 8x H100 SXM5 nodes and InfiniBand are the best fit. Managed services like Together AI or Modal are less suited because they restrict cluster configuration and custom framework installation.

How do I migrate an existing Anyscale cluster to self-hosted KubeRay?

The migration involves three steps: (1) Export your Anyscale cluster environment by capturing the pip requirements, Docker image, and Ray version from your current cluster config. (2) Provision equivalent GPU instances on a bare-metal GPU cloud and deploy a Kubernetes cluster, then install the KubeRay operator via Helm. (3) Write a RayCluster custom resource YAML that mirrors your Anyscale cluster spec (head node resources, worker node pools, container image), and apply it with kubectl apply. Your Ray job scripts typically work without changes, since the Ray client and job submission APIs are the same on Anyscale and self-hosted KubeRay.
