10 Anyscale Alternatives for Ray Training and KubeRay (2026)

TL;DR: For self-hosted Ray with no markup: Spheron (bare-metal, KubeRay, per-minute billing). For AWS-native teams: Ray on EKS. For multi-cloud cost arbitrage: SkyPilot.

| Provider | H100/hr | Ray Support | KubeRay | Best For |
| --- | --- | --- | --- | --- |
| Spheron | $4.34 | KubeRay, vanilla Ray | Yes | Max flexibility, lowest markup |
| AWS (Ray on EKS) | ~$4.00 | KubeRay native | Yes | AWS-native orgs |
| SkyPilot | Any cloud | Ray-compatible | Yes | Multi-cloud cost arbitrage |

Full 10-provider comparison with RLHF cluster fit and migration steps below.

Anyscale charges a platform markup on top of whatever cloud GPUs your cluster runs on. That markup typically adds 50-100% to the underlying GPU rate. An 8x H100 cluster that costs around $32/hr at AWS on-demand rates (about $34/hr on bare-metal) comes to roughly $48-$64/hr at Anyscale's effective rate, depending on your tier and configuration. For teams running training jobs that span hundreds of hours per month, that gap is substantial.

The more important thing to understand: Ray is Apache-licensed open-source software. Everything Anyscale builds on top of it (cluster lifecycle management, managed Ray Serve, the hosted dashboard, the RLHF training stack) can be self-hosted on any GPU cloud. Teams with H100 SXM5 instances on Spheron can run the exact same Ray code they run on Anyscale today. The only difference is who manages the control plane.

The question is whether the operational savings justify the markup. For a five-person research team that treats Ray as an afterthought, Anyscale's managed ops might be worth it. For a team where Ray is central infrastructure and GPU cost is a meaningful line item, the math almost always favors self-hosting. Teams that want to rent GPUs directly and run their own Ray clusters can deploy bare-metal H100 nodes on Spheron in minutes. This post covers a decision framework and 10 alternatives with specific H100 cluster pricing and workload fit breakdowns.

What Anyscale Actually Offers

Anyscale is a managed platform that wraps open-source Ray. Understanding what it provides helps clarify what you are paying for and what you give up by moving off it.

Managed Ray clusters are the core product. Anyscale handles cluster lifecycle: auto-scaling node pools up and down, replacing failed nodes, managing head-node failover, and versioning cluster environments (the combination of Ray version, Python version, and ML framework dependencies that you pin per project). Without a managed platform, you handle all of this with scripts, monitoring, and manual intervention.

RLHF and post-training stack: Anyscale ships pre-built environment images for common post-training workflows, including OpenRLHF, alignment tuning, and GRPO training. These are essentially Docker images with curated dependency combinations that are known to work together. You can replicate this with your own container images, but the curated environments save setup time.

Ray Serve hosting: Anyscale's managed inference deployment layer runs on Ray Serve under the hood. You get auto-scaling, A/B traffic routing, and a Ray Serve dashboard without setting up a KubeRay operator or managing Ray head nodes yourself.

What Anyscale is NOT: it is not a GPU cloud. It runs on AWS, GCP, and Azure. You pay cloud GPU rates (AWS p4d, p5, GCP A3, Azure ND H100) plus the Anyscale platform markup on top. This means you also inherit the GPU SKU limitations of those providers. If AWS does not have H200 instances in the region you need, Anyscale does not either.

Why Teams Look for Alternatives

Managed orchestration tax

The effective per-hour cost on Anyscale for H100 compute is typically 1.5-2x the underlying GPU rate. On AWS, the p5.48xlarge (8x H100 SXM5) works out to roughly $4.00/hr per GPU when you divide the bundle cost. Anyscale's platform markup brings the effective rate to roughly $6-8/hr per GPU depending on cluster size and tier. That figure covers compute only: data transfer and storage are additional AWS charges that Anyscale passes through.

For an 8x H100 cluster running 200 hours a month, the difference between Anyscale effective rates (~$48/hr) and bare-metal GPU cloud (~$34/hr) is about $2,800 per month on that one cluster alone.
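
The arithmetic behind that gap can be sketched in a few lines of Python. This is a rough model, assuming the markup behaves like a flat 1.5x to 2x multiplier on the AWS per-GPU rate; the function names are illustrative, and the rates are the approximate figures quoted in this post.

```python
def effective_rate(base_per_gpu_hr: float, markup_multiplier: float) -> float:
    """Per-GPU hourly rate after a flat platform markup (assumed model)."""
    return base_per_gpu_hr * markup_multiplier


def monthly_cost(per_gpu_hr: float, gpus: int, hours: float) -> float:
    """Total cost of running `gpus` GPUs for `hours` hours."""
    return per_gpu_hr * gpus * hours


if __name__ == "__main__":
    aws_per_gpu = 4.00   # approximate p5.48xlarge rate per GPU, from this post
    bare_metal = 4.34    # Spheron on-demand rate per GPU, from this post
    gpus, hours = 8, 200

    self_hosted = monthly_cost(bare_metal, gpus, hours)
    for multiplier in (1.5, 2.0):
        managed = monthly_cost(effective_rate(aws_per_gpu, multiplier), gpus, hours)
        print(f"{multiplier}x markup: ${managed:,.0f}/month managed, "
              f"${managed - self_hosted:,.0f} more than self-hosted")
```

Swap in your own cluster size and monthly hours to see where the gap lands for your workload.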

Multi-cloud lock-in

Anyscale clusters run exclusively on AWS, GCP, and Azure. You cannot route workloads to specialty GPU providers with different GPU availability profiles. When AWS p5 instances are constrained in your region, your only option on Anyscale is to switch regions or wait. On a multi-cloud setup, you can shift to a provider with availability.

Custom GPU SKU access

Anyscale does not expose H200, B200, B300, GH200, or RTX PRO 6000 instances. Those GPUs are not widely available through Anyscale's partner cloud providers at scale. Custom post-training labs running RLHF on H200 or B200 clusters, where VRAM per GPU is the binding constraint, need direct bare-metal access that Anyscale cannot provide.

Operational complexity vs control

Some teams want full root access, the ability to set custom NCCL flags, custom InfiniBand topology configs, and the option to run non-Ray orchestrators (DeepSpeed with SLURM, FSDP with torchrun) alongside Ray on the same cluster. Managed platforms abstract away that layer, which is a genuine feature for teams that do not need it and a real constraint for teams that do.

Decision Framework

| Situation | Recommendation |
| --- | --- |
| Small team, Ray expertise is secondary, want managed ops | Anyscale managed Ray |
| Ray-first team, need full GPU SKU access, cost-sensitive | Self-hosted Ray on bare-metal GPU cloud (Spheron, CoreWeave) |
| Not Ray-native, workload is training-only or inference-only | Consider Ray-free alternatives (Modal, Together AI, SkyPilot) |

Quick Comparison: Anyscale vs 10 Alternatives

| Provider | H100/hr | 8x H100/hr | Ray support | Billing | Best for |
| --- | --- | --- | --- | --- | --- |
| Anyscale (via AWS) | ~$6-8 effective | ~$48-64 effective | Managed (first-class) | Per-hour + platform fee | Teams wanting managed Ray ops |
| Spheron | $4.34 | $34.72 | KubeRay, vanilla Ray | Per-minute | Max flexibility, no orchestration markup |
| AWS (Ray on EKS) | ~$4.00 | ~$32 | KubeRay on EKS | Per-hour | AWS-native teams |
| GCP (Ray on GKE) | ~$3.85 | ~$30.80 | KubeRay on GKE | Per-hour | GCP-native teams |
| CoreWeave | Custom | Custom | KubeRay | Per-hour | Large-scale bare-metal clusters |
| RunPod | ~$2.69 | ~$21.52 | Self-install Ray | Per-second (serverless) / per-hour | Mixed dedicated + serverless |
| Lambda Labs | ~$2.49 | ~$19.92 | Ray + SLURM | Per-hour | Training-focused teams |
| Modal | ~$3.95 effective | N/A (serverless) | Python-native | Per-second | Serverless burst jobs |
| Together AI | $3.49 (Instant Clusters) | $27.92 | Platform-managed | Per-hour | Fine-tuning workloads |
| Nebius | Custom | Custom | KubeRay (self-deployed) | Per-hour | EU data residency |
| SkyPilot | Any cloud | Any cloud | Ray-compatible | Depends on cloud | Multi-cloud Ray orchestration |

GPU rates are publicly listed on-demand prices fetched on 06 May 2026 and fluctuate. Anyscale's effective rate includes the platform markup over the underlying cloud compute.

1. Spheron

Bare-metal GPU cloud, no orchestration markup, per-minute billing

Spheron gives you raw GPU instances with root access and no managed orchestration layer on top. For Ray workloads, that means you pay infrastructure cost only: no platform fee, no cluster management markup, no lock-in to a specific Ray version or environment image. The tradeoff is that you handle Ray cluster setup and lifecycle yourself, which is a few hours of work the first time and trivial ongoing maintenance for a team that already runs Ray.

How to run Ray on Spheron

The simplest setup is a vanilla Ray cluster with a head node and worker nodes over a private IP network. Provision your instances at app.spheron.ai, ensure they share a private subnet, then:

```bash
# Head node
pip install "ray[default,train,serve]"
ray start --head --port=6379 --dashboard-host=0.0.0.0 --num-gpus=8 --block

# Worker nodes (repeat for each)
pip install "ray[default,train,serve]"
ray start --address=<HEAD_PRIVATE_IP>:6379 --num-gpus=8 --block

# Verify
ray status
```
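
If you bring up more than a couple of workers, it can help to generate the per-node commands rather than typing them. The sketch below only builds the same `ray start` command strings shown above; the helper names and the example private IP are hypothetical, and how you run the commands on each node (SSH, cloud-init, a provisioning tool) is left to you.

```python
import shlex


def head_command(num_gpus: int, port: int = 6379) -> str:
    """Build the head-node start command used in this guide."""
    args = ["ray", "start", "--head", f"--port={port}",
            "--dashboard-host=0.0.0.0", f"--num-gpus={num_gpus}", "--block"]
    return shlex.join(args)


def worker_command(head_ip: str, num_gpus: int, port: int = 6379) -> str:
    """Build the worker start command pointing at the head's private IP."""
    args = ["ray", "start", f"--address={head_ip}:{port}",
            f"--num-gpus={num_gpus}", "--block"]
    return shlex.join(args)


if __name__ == "__main__":
    head_ip = "10.0.0.10"  # hypothetical private IP of the head node
    print("head:  ", head_command(num_gpus=8))
    print("worker:", worker_command(head_ip, num_gpus=8))
```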

For Kubernetes-based clusters, KubeRay runs natively on any Kubernetes deployment. Install the operator via Helm and apply a RayCluster custom resource:

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: my-training-cluster
spec:
  rayVersion: '2.40.0'
  headGroupSpec:
    rayStartParams:
      dashboard-host: '0.0.0.0'
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.40.0-gpu
          resources:
            requests:
              cpu: "4"
              memory: "32Gi"
            limits:
              nvidia.com/gpu: "1"
  workerGroupSpecs:
  - replicas: 8
    minReplicas: 8
    maxReplicas: 8
    groupName: gpu-workers
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.40.0-gpu
          resources:
            requests:
              cpu: "16"
              memory: "128Gi"
            limits:
              nvidia.com/gpu: "8"
```
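
Note that this manifest asks for 8 worker pods with 8 GPUs each, which is 64 worker GPUs; for a single 8x H100 node you would set the replica counts to 1. If you need several variants of the manifest, one option is to generate it. The sketch below builds the same structure as a Python dict and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The function names are illustrative, and the CPU and memory requests are simply copied from the manifest above.

```python
import json


def pod_template(name: str, image: str, cpu: str, memory: str, gpus: int) -> dict:
    """Pod template with the same resource layout as the manifest above."""
    return {"spec": {"containers": [{
        "name": name,
        "image": image,
        "resources": {
            "requests": {"cpu": cpu, "memory": memory},
            "limits": {"nvidia.com/gpu": str(gpus)},
        },
    }]}}


def ray_cluster(name: str, ray_version: str, workers: int, gpus_per_worker: int) -> dict:
    """Build a RayCluster custom resource with a fixed-size GPU worker group."""
    image = f"rayproject/ray:{ray_version}-gpu"
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayCluster",
        "metadata": {"name": name},
        "spec": {
            "rayVersion": ray_version,
            "headGroupSpec": {
                "rayStartParams": {"dashboard-host": "0.0.0.0"},
                "template": pod_template("ray-head", image, "4", "32Gi", 1),
            },
            "workerGroupSpecs": [{
                "replicas": workers, "minReplicas": workers, "maxReplicas": workers,
                "groupName": "gpu-workers",
                "rayStartParams": {},
                "template": pod_template("ray-worker", image, "16", "128Gi", gpus_per_worker),
            }],
        },
    }


def total_worker_gpus(manifest: dict) -> int:
    """Sum GPU limits across all worker groups (replicas x GPUs per pod)."""
    total = 0
    for group in manifest["spec"]["workerGroupSpecs"]:
        limits = group["template"]["spec"]["containers"][0]["resources"]["limits"]
        total += group["replicas"] * int(limits["nvidia.com/gpu"])
    return total


if __name__ == "__main__":
    manifest = ray_cluster("my-training-cluster", "2.40.0", workers=8, gpus_per_worker=8)
    print(json.dumps(manifest, indent=2))
```

Redirect the output to a file such as `raycluster.json` and apply it the same way you would the YAML version.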

You can also use SkyPilot targeting Spheron as the cloud provider for multi-cloud Ray job submission with cost arbitrage.

Pricing

Spheron H100 SXM5 runs at $4.34/hr per GPU on-demand. An 8x H100 cluster costs $34.72/hr on-demand. For a 200-hour training month, that is $6,944. Pricing changes with GPU availability, so check live rates before budgeting. See the Spheron H100 instances page for current availability and spot pricing when offered.

Workload fit

Best for distributed pretraining, RLHF/GRPO post-training (policy + reference model setups typically need 8x H100 SXM5 nodes with InfiniBand), Ray Tune hyperparameter sweeps where per-minute billing reduces idle cost, and Ray Serve deployments that need full control over replica placement and NCCL flags. For Ray Serve setup specifics, see the Ray Serve setup guide and multi-node training guide.
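
To get a feel for why policy + reference setups push you toward multi-node clusters, here is a back-of-the-envelope memory estimate. It rests on common rules of thumb that are assumptions, not measurements: about 16 bytes per parameter for a model trained with mixed-precision Adam (bf16 weights and gradients, fp32 master weights and two optimizer moments), 2 bytes per parameter for a frozen bf16 reference model, and 80 GB per H100. Activations, KV cache, and any reward or critic model are ignored, so real requirements are higher.

```python
import math

TRAIN_BYTES_PER_PARAM = 16  # assumed: mixed-precision Adam, no activations
INFER_BYTES_PER_PARAM = 2   # assumed: frozen bf16 weights only


def rlhf_state_gb(policy_params_b: float, reference_params_b: float) -> float:
    """Approximate model-state memory in GB; parameter counts are in billions."""
    return (policy_params_b * TRAIN_BYTES_PER_PARAM
            + reference_params_b * INFER_BYTES_PER_PARAM)


def min_nodes(total_gb: float, gpus_per_node: int = 8, gb_per_gpu: float = 80) -> int:
    """Lower bound on node count if the state is sharded evenly across GPUs."""
    return math.ceil(total_gb / (gpus_per_node * gb_per_gpu))


if __name__ == "__main__":
    for size_b in (7, 70):
        state = rlhf_state_gb(size_b, size_b)
        print(f"{size_b}B policy + {size_b}B reference: ~{state:,.0f} GB of state, "
              f"at least {min_nodes(state)} x 8x H100 node(s)")
```

Treat the node count as a floor: sharding overhead and activation memory usually add headroom on top of it.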

For SSH key setup and GPU instance provisioning steps, see the Spheron documentation.


2. AWS with Ray on EKS

Managed Kubernetes, AWS GPU instances, KubeRay-native

AWS Elastic Kubernetes Service with KubeRay is the most common enterprise path for teams that already run AWS infrastructure. p4d.24xlarge instances provide 8x A100 40GB (p4de.24xlarge for the 80GB variant); p5.48xlarge provides 8x H100 SXM5 with 3200 Gbps EFA network interconnect.

The KubeRay Helm chart deploys the operator onto an EKS cluster in under 10 minutes. From there, RayCluster, RayJob, and RayService CRDs work the same as anywhere else. The AWS-native advantage is ecosystem integration: S3 for checkpoints, ECR for container images, CloudWatch for metrics, and IAM for fine-grained access control.

H100 on-demand on AWS (p5.48xlarge) works out to roughly $4.00/hr per GPU ($32/hr for 8x). That is higher than bare-metal GPU clouds but lower than Anyscale's effective rate. EKS cluster management adds $0.10/hr per cluster plus node overhead.

Good fit for: teams with existing AWS infrastructure, compliance requirements that mandate AWS, and orgs that want Kubernetes-native Ray without a dedicated GPU cloud contract.


3. GCP with Ray on GKE

GKE Autopilot or Standard, A3 instances, officially supported KubeRay path

Google Cloud's A3 instance family uses H100 SXM5 GPUs. GKE has first-class KubeRay support, partly because of the Google-Anyscale partnership around Ray's development. The KubeRay operator is available as a GKE add-on, which simplifies installation.

GCP's H100 on-demand rate for A3 instances is approximately $3.85/hr per GPU ($30.80/hr for 8x). GKE Standard clusters cost around $0.10/hr per cluster plus node costs. For teams using Vertex AI for experiment tracking, BigQuery for data pipelines, or GCS for checkpoint storage, the GCP ecosystem integration is meaningful.

One practical advantage over AWS: GCP's TPU Multislice and A3 Mega instances (H100 with 1800 Gbps inter-node networking via Titanium offload) are available in some regions. For teams that run both GPU and TPU workloads, GKE as a single Kubernetes plane for both is a real operational simplification.


4. CoreWeave with KubeRay

Bare-metal GPU cloud, Kubernetes-native, InfiniBand available

CoreWeave is the other major bare-metal GPU cloud with Kubernetes-native deployment. Like Spheron, there is no managed orchestration markup. KubeRay runs natively on CoreWeave's Kubernetes clusters.

The main CoreWeave advantage for large Ray workloads is InfiniBand availability at scale. For multi-node training runs that exceed 8 GPUs and need full bisection bandwidth, CoreWeave's IB network fabric is production-ready. NCCL configs are accessible at the infrastructure level.

CoreWeave pricing is negotiated via contract rather than self-serve on-demand rates. For teams running sustained large-scale clusters (16+ H100 nodes), contract pricing can undercut on-demand rates significantly. For smaller teams or variable workloads, the lack of self-serve hourly access is a friction point.


5. RunPod with Ray Clusters

Self-install Ray on dedicated pods, per-second serverless for inference

RunPod supports two deployment modes: dedicated pods (per-hour billing, persistent storage, full GPU access) and serverless endpoints (per-second billing, scale-to-zero, no cluster management). For Ray workloads, dedicated pods are the relevant mode since serverless endpoints do not support long-running Ray clusters.

Install Ray on a RunPod H100 pod the same way you would on any Linux machine: `pip install "ray[default,train,serve]"`, then start head and worker nodes over RunPod's private network. Per-hour H100 rates on RunPod are roughly $2.69/hr (the actual rate varies with availability and GPU model).

The main limitation: RunPod does not offer InfiniBand interconnect. NVLink only connects GPUs within a node, so for multi-node training beyond a single 8x H100 node, inter-node NCCL traffic runs over standard networking and may become the bottleneck. That is fine for single-node Ray workloads and Ray Tune sweeps, but less ideal for large-scale distributed training.
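
A rough estimate shows why inter-node bandwidth matters. The sketch below uses the standard bandwidth-only model of a ring all-reduce, where each participant moves 2(N-1)/N times the payload, and ignores latency, overlap with compute, and hierarchical or sharded strategies that real training stacks use. The 3200 Gbps figure is the p5 EFA number quoted earlier; the 100 Gbps link is a hypothetical Ethernet setup chosen for contrast, not a measured RunPod figure.

```python
def ring_allreduce_seconds(payload_gb: float, nodes: int, link_gbps: float) -> float:
    """Bandwidth-only time for one ring all-reduce across `nodes` nodes."""
    if nodes < 2:
        return 0.0  # nothing crosses the network on a single node
    traffic_factor = 2 * (nodes - 1) / nodes   # data each node sends and receives
    link_gb_per_s = link_gbps / 8              # gigabits/s -> gigabytes/s
    return traffic_factor * payload_gb / link_gb_per_s


if __name__ == "__main__":
    gradients_gb = 140  # assumed: 70B parameters in bf16 (2 bytes each)
    for label, gbps in (("3200 Gbps fabric", 3200), ("100 Gbps Ethernet (hypothetical)", 100)):
        seconds = ring_allreduce_seconds(gradients_gb, nodes=2, link_gbps=gbps)
        print(f"{label}: ~{seconds:.2f} s per full gradient all-reduce")
```

If a training step itself takes a few seconds, a sync that costs longer than the step dominates wall-clock time, which is the bottleneck described above.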


6. Lambda Labs with Ray + SLURM

Bare-metal clusters, SLURM support, training-focused

Lambda Labs operates GPU clusters with SLURM job scheduling. Ray can run inside SLURM jobs by starting Ray head and worker processes from within a SLURM batch script. This hybrid approach lets you use SLURM for resource allocation and Ray for task-level orchestration within the allocated nodes.

Lambda's H100 on-demand rate is approximately $2.49/hr per GPU, which is among the lowest public on-demand rates for H100. Per-hour billing with no per-minute granularity means short jobs pay for the full hour.
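
Billing granularity can outweigh the headline rate for short jobs. The sketch below assumes usage is rounded up to the next billing increment, which is a simplifying assumption about how providers meter; the rates are the per-GPU figures from the comparison table.

```python
import math


def billed_cost(runtime_minutes: float, rate_per_gpu_hr: float,
                gpus: int, granularity_minutes: int) -> float:
    """Cost when runtime is rounded up to the billing increment."""
    increments = math.ceil(runtime_minutes / granularity_minutes)
    billed_minutes = increments * granularity_minutes
    return billed_minutes / 60 * rate_per_gpu_hr * gpus


if __name__ == "__main__":
    for minutes in (20, 60, 200):
        per_hour = billed_cost(minutes, 2.49, gpus=8, granularity_minutes=60)
        per_minute = billed_cost(minutes, 4.34, gpus=8, granularity_minutes=1)
        print(f"{minutes:>3} min job: ${per_hour:.2f} at $2.49/hr per-hour billing, "
              f"${per_minute:.2f} at $4.34/hr per-minute billing")
```

Under these assumptions a 20-minute job is cheaper on the higher per-minute rate, while anything close to a full hour favors the lower hourly rate.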

H100 availability on Lambda can be limited, particularly for on-demand (non-reserved) access. Teams that need guaranteed cluster access for production training runs typically need a reserved instance contract. For batch training workloads with flexible scheduling, Lambda's SLURM clusters with Ray are a cost-effective option.


7. Modal Labs

Python-native serverless, not Ray-native

Modal takes a different approach: instead of Ray's actor and remote function model, Modal uses Python decorators to define functions that run on managed GPU instances. You decorate a function with `@app.function(gpu="H100")` on a `modal.App`, and Modal handles provisioning, scaling, and teardown.

This is a fundamentally different paradigm from Ray. Modal functions are stateless by default, which makes them a poor fit for long-running Ray actors, stateful Ray Serve deployments, or multi-step Ray Data pipelines. Cold starts apply because Modal scales to zero between invocations.

Where Modal works well: batch inference jobs where you can express the work as a Python function call, fine-tuning runs where you want to submit work without managing a cluster, and one-off training scripts where you do not want to think about cluster setup. The $3.95/hr effective H100 rate is for serverless compute, not a dedicated instance.

Teams that are not invested in Ray and want serverless-first semantics for similar workloads (training, inference, data processing) should evaluate Modal on its own terms rather than as a Ray replacement. Teams comparing managed-API options including Modal, Replicate, and per-replica platforms should also see our Baseten alternatives guide.


8. Together AI

Instant Clusters for fine-tuning, platform-managed compute

Together AI's Instant Clusters product provides on-demand multi-GPU fine-tuning at $3.49/hr per H100. The environment is managed: Together AI handles CUDA drivers, framework installation, and cluster provisioning. You submit a fine-tuning job via their API or UI.

The limitation is that Together AI does not provide raw GPU access. You cannot SSH into instances, run custom Ray clusters, or install arbitrary frameworks. The platform is purpose-built for fine-tuning open-weight LLMs (Llama, Mistral, Qwen families) and inference on their supported model catalog.

Fine for: teams running standard supervised fine-tuning or LoRA fine-tuning on supported model families without custom post-training frameworks. Not suitable for teams that need KubeRay, custom NCCL configurations, or RLHF pipelines that require direct process control.


9. Nebius AI Cloud

European GPU cloud, managed Kubernetes and Slurm, H100 clusters in EU regions

Nebius (a spin-off from Yandex infrastructure) operates GPU data centers in Europe. It offers managed Kubernetes and Slurm, on which you can deploy your own KubeRay setup. Nebius is not a first-party managed Ray product like Anyscale, but combined with EU data residency it can substitute for Anyscale for GDPR-bound teams.

For teams subject to GDPR or EU data localization requirements, Nebius H100 clusters in EU regions keep data within EU jurisdiction. Pricing is custom and negotiated rather than posted publicly.

Nebius is a smaller platform than the major cloud providers, so the ecosystem integrations (object storage, databases, monitoring) are less mature. For teams that specifically need EU-based GPU access with Kubernetes-native orchestration, Nebius is worth evaluating.


10. SkyPilot

Multi-cloud Ray orchestrator, not a cloud provider

SkyPilot is an open-source framework that abstracts GPU provisioning across multiple clouds. You write a task YAML (cluster.yaml in the command below) specifying the GPU type and job command, and SkyPilot picks whichever configured cloud has the cheapest available instance of that type at that moment, including spot capacity if the task opts into it.

```bash
sky launch -c my-cluster cluster.yaml
```

SkyPilot is Ray-compatible: it can start Ray head and worker nodes across cloud providers, letting you submit Ray jobs to multi-cloud clusters. The key use case is cost arbitrage, running training sweeps or batch inference jobs on whichever provider has H100 spot availability and the lowest price.

SkyPilot works on top of Spheron, AWS, GCP, Lambda, and others. It does not provide compute itself. Teams using SkyPilot still pay the underlying cloud's GPU rates. The value is in job scheduling, spot interruption handling, and automatic failover to the next cheapest provider. For Ray Tune hyperparameter sweeps where jobs are naturally parallel and checkpointed, SkyPilot with spot instances can cut costs significantly.
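
The selection logic behind cost arbitrage is simple to picture. The sketch below is a toy illustration, not SkyPilot's actual optimizer: it takes a table of per-GPU prices and an availability flag per provider, and returns the order in which to try them. Prices come from the comparison table in this post; the availability flags are made up for the example.

```python
from typing import Dict, List, Tuple

Offer = Tuple[float, bool]  # (price per GPU-hour, currently available?)


def failover_order(offers: Dict[str, Offer]) -> List[str]:
    """Available providers sorted cheapest first; unavailable ones are skipped."""
    available = [(price, name) for name, (price, ok) in offers.items() if ok]
    return [name for _price, name in sorted(available)]


if __name__ == "__main__":
    offers = {
        "Lambda Labs": (2.49, False),  # hypothetical: no on-demand capacity right now
        "RunPod": (2.69, True),
        "GCP": (3.85, True),
        "AWS": (4.00, True),
        "Spheron": (4.34, True),
    }
    order = failover_order(offers)
    print("launch on:", order[0])
    print("fall back to:", order[1:])
```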


Workload-by-Workload Fit

| Workload | Best fit | Why |
| --- | --- | --- |
| Distributed pretraining (70B+) | Spheron, CoreWeave | InfiniBand, bare-metal, full NCCL control |
| RLHF/GRPO post-training | Spheron, CoreWeave | Multi-node H100/H200, custom policy+reward model setup |
| Hyperparameter sweeps (Ray Tune) | Spheron spot, SkyPilot | Spot pricing cuts sweep cost; Ray Tune runs natively |
| Batch inference (Ray Data) | Spheron, RunPod | Per-minute billing; no idle cost between batches |
| Online serving (Ray Serve) | Spheron, AWS EKS, GCP GKE | Autoscaling, always-on, SLA requirements |
| Fine-tuning workloads | Lambda, Together AI | Simpler setup if no custom framework needed |

8x H100 Cluster Cost Comparison

Monthly cost for an 8x H100 cluster running 200 hours per month:

| Provider | H100/hr | 8x H100/hr | 200-hr month cost | Notes |
| --- | --- | --- | --- | --- |
| Anyscale (via AWS) | ~$6 effective | ~$48 effective | ~$9,600 | Platform markup included |
| Spheron (on-demand) | $4.34 | $34.72 | $6,944 | Per-minute billing |
| AWS p5.48xlarge (8x H100) | ~$4.00 | ~$32 | ~$6,400 | On-demand, US-East |
| GCP a3-highgpu-8g (8x H100) | ~$3.85 | ~$30.80 | ~$6,160 | On-demand, US-Central |
| CoreWeave | Custom | Custom | Contact sales | InfiniBand available |
| RunPod (dedicated) | ~$2.69 | ~$21.52 | ~$4,304 | Per-hour |
| Lambda Labs | ~$2.49 | ~$19.92 | ~$3,984 | Per-hour |

Pricing fluctuates with GPU availability. The prices above are as of 06 May 2026 and may have changed. Check current GPU pricing for live rates.


Migration Playbook: Anyscale to Self-Hosted KubeRay

1. Export your Anyscale cluster environment

```bash
# List existing cluster environments
anyscale cluster-env list

# Build and capture the environment definition
anyscale cluster-env build --name <env-name>
```

Capture: the base Docker image name, the pip requirements file, the Ray version, and the cluster YAML from your Anyscale project settings. This is your migration blueprint.

2. Install the KubeRay operator

```bash
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --version 1.1.0
```

The operator installs into the default namespace. Verify with `kubectl get pods | grep kuberay`.

3. Write a RayCluster custom resource

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: my-training-cluster
spec:
  rayVersion: '2.40.0'
  headGroupSpec:
    rayStartParams:
      dashboard-host: '0.0.0.0'
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.40.0-gpu
          resources:
            requests:
              cpu: "4"
              memory: "32Gi"
            limits:
              nvidia.com/gpu: "1"
  workerGroupSpecs:
  - replicas: 8
    minReplicas: 8
    maxReplicas: 8
    groupName: gpu-workers
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.40.0-gpu
          resources:
            requests:
              cpu: "16"
              memory: "128Gi"
            limits:
              nvidia.com/gpu: "8"
```

Replace the container image with your own image built from the Anyscale base image if you have custom dependencies. The apiVersion: ray.io/v1 is the stable API from KubeRay 1.x (Ray 2.30+).

4. Submit a job

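```bash
# --working-dir uploads the local directory so train.py is available on the cluster
ray job submit --address http://<head-node-service-ip>:8265 --working-dir . -- python train.py
```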

Job submission works the same way as on Anyscale, and your training and inference scripts require no changes.
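
If you prefer to submit from Python (for example from a CI pipeline), Ray's job submission SDK does the same thing as the CLI. The sketch below is a minimal example assuming `ray[default]` is installed where it runs and that the dashboard port is reachable; the helper names and the example IP are hypothetical.

```python
def dashboard_url(head_ip: str, port: int = 8265) -> str:
    """HTTP address of the Ray dashboard, which also serves the Jobs API."""
    return f"http://{head_ip}:{port}"


def submit_training_job(head_ip: str, entrypoint: str = "python train.py",
                        working_dir: str = ".") -> str:
    """Submit a job to a self-hosted cluster and return its submission ID."""
    # Imported lazily so this module still loads on machines without Ray.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(dashboard_url(head_ip))
    # working_dir is uploaded to the cluster so the entrypoint script exists there.
    return client.submit_job(entrypoint=entrypoint,
                             runtime_env={"working_dir": working_dir})


# Example (requires a reachable cluster):
#   job_id = submit_training_job("10.0.0.10")
#   print("submitted", job_id)
```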

5. Validate

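```bash
kubectl get raycluster
# ray status must run where Ray is installed, e.g. inside the head pod
kubectl exec -it <head-pod-name> -- ray status
```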

Check the Ray Dashboard at http://<head-node-service-ip>:8265. Verify GPU resources are visible: submit `python -c "import ray; ray.init(); print(ray.cluster_resources())"` and confirm GPU count matches your cluster spec.
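
The same check can be scripted so it fails loudly instead of relying on a visual inspection. This is a small sketch: `missing_gpus` is a hypothetical helper that compares the dictionary returned by `ray.cluster_resources()` against the GPU count you expect, and `check_live_cluster` assumes it runs on a node (or in a submitted job) that can attach to the cluster.

```python
def missing_gpus(resources: dict, expected_gpus: int) -> int:
    """How many GPUs are missing relative to the spec (0 means all visible)."""
    visible = int(resources.get("GPU", 0))
    return max(expected_gpus - visible, 0)


def check_live_cluster(expected_gpus: int) -> int:
    """Attach to the running cluster and report missing GPUs."""
    import ray  # imported lazily; only needed when talking to a real cluster

    ray.init(address="auto")
    return missing_gpus(ray.cluster_resources(), expected_gpus)


# Example (run on the cluster, e.g. via `ray job submit`):
#   shortfall = check_live_cluster(expected_gpus=64)
#   if shortfall:
#       raise SystemExit(f"{shortfall} GPU(s) not registered with Ray")
```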


Which Alternative Should You Choose?

| If you... | Choose |
| --- | --- |
| Want lowest cost for training, can manage Ray yourself | Spheron or Lambda Labs |
| Need InfiniBand for multi-node 70B+ training | Spheron (H100 SXM5 with IB) or CoreWeave |
| Are AWS-native and want managed Kubernetes | Ray on EKS |
| Want multi-cloud cost arbitrage without rewriting jobs | SkyPilot on top of any GPU cloud |
| Run fine-tuning only, no custom framework needed | Together AI |
| Need EU data residency | Nebius |
| Want serverless batch with Python-native API | Modal |

Teams running distributed training, RLHF, or Ray Serve workloads can cut compute costs significantly by moving off managed Ray platforms onto bare-metal GPUs. Spheron gives you full Ray flexibility - KubeRay, vanilla clusters, or SkyPilot - with no orchestration markup and per-minute billing.

Quick Setup Guide

  1. Export your Anyscale cluster configuration

    From the Anyscale console or CLI, download your cluster environment spec: the pip dependencies, base Docker image, Ray version, and cluster YAML. Run `anyscale cluster-env build --name <env-name>` to capture the environment, and export the cluster config JSON from Settings. This becomes your migration blueprint.

  2. Provision GPU instances on a bare-metal cloud

    On Spheron or another GPU cloud, provision one head node (can be CPU-only or a small GPU instance) and one or more worker nodes matching the GPU type from your Anyscale cluster. Enable a private network between instances so Ray can communicate over a fixed IP range. Note the private IP of the head node.

  3. Install Ray and your dependencies

    On all nodes, install matching Ray and Python versions: `pip install "ray[default,train,serve]==<version>"`. Install your ML framework dependencies (PyTorch, vLLM, DeepSpeed, etc.) that match your Anyscale environment. If using Docker, build a container image from your Anyscale base image and push it to a registry.

  4. Start the Ray cluster

    On the head node: `ray start --head --port=6379 --dashboard-host=0.0.0.0 --num-gpus=<n> --block`. On each worker: `ray start --address=<head-private-ip>:6379 --num-gpus=<n> --block`. Run `ray status` from the head node to confirm all workers are registered. For KubeRay, apply a RayCluster custom resource instead.

  5. Submit your first job

    Point your job submission at the new cluster: `ray job submit --address http://<head-ip>:8265 -- python train.py`. To avoid repeating the flag, set `RAY_ADDRESS=http://<head-ip>:8265` in the environment; the `ray job` CLI reads it. Scripts submitted this way can call `ray.init()` with no address argument and attach to the cluster they run on. The Ray API surface is identical between Anyscale and self-hosted clusters, so your training and inference scripts require no changes.

  6. Validate and monitor

    Access the Ray Dashboard at http://<head-ip>:8265 to view node utilization, job status, and actor logs. For KubeRay clusters, use `kubectl get raycluster` and `kubectl logs`. Set up a simple Ray remote function test before running production workloads: submit a job that returns `ray.cluster_resources()` and verify GPUs are visible.

Frequently Asked Questions

What does Anyscale add on top of open-source Ray?

Anyscale adds a managed control plane on top of open-source Ray: cluster lifecycle management (auto-scaling, node replacement, head-node failover), a hosted Ray dashboard with job history, pre-built environment images for common ML frameworks, and a managed Ray Serve deployment layer for production inference. It also provides an RLHF-ready stack for post-training workloads. The core compute underneath is still standard cloud GPUs. Teams pay the platform markup in exchange for not managing Ray infrastructure themselves.

Can I run Ray without Anyscale?

Yes. Open-source Ray is Apache-licensed and runs on any Linux machine with a Python version your Ray release supports. You provision GPU instances (one head node, one or more workers), install Ray with `pip install "ray[default]"`, start the head with `ray start --head --port=6379`, connect workers with `ray start --address=<head-ip>:6379 --num-gpus=<n>`, and submit jobs with `ray job submit`. For Kubernetes-based clusters, KubeRay provides a Kubernetes operator that manages RayCluster, RayJob, and RayService custom resources. No Anyscale account or license required.

How much cheaper is self-hosted Ray than Anyscale?

The cost difference depends on cluster size and utilization. Anyscale's managed Ray typically costs 1.5-2x more than bare-metal GPU rates for the same hardware because the platform markup is applied on top of compute costs. For an 8x H100 cluster running 200 hours per month, Anyscale's effective rate of roughly $48-$64/hr translates to about $9,600-$12,800 vs roughly $6,944 on Spheron bare-metal at on-demand rates. Spot pricing on a cloud like Spheron can reduce that further for interruptible training jobs.

Which Anyscale alternative is best for RLHF and GRPO workloads?

For RLHF and GRPO workloads, the key requirements are multi-node GPU clusters with high-bandwidth interconnect (NVLink or InfiniBand), the ability to run custom post-training frameworks like OpenRLHF or verl, and enough VRAM for the policy and reference models simultaneously. Bare-metal GPU cloud providers (Spheron, CoreWeave) with 8x H100 SXM5 nodes and InfiniBand are the best fit. Managed services like Together AI or Modal are less suited because they restrict cluster configuration and custom framework installation.

How do I migrate from Anyscale to self-hosted KubeRay?

The migration involves three steps: (1) Export your Anyscale cluster environment by capturing the pip requirements, Docker image, and Ray version from your current cluster config. (2) Provision equivalent GPU instances on a bare-metal GPU cloud and deploy a Kubernetes cluster, then install the KubeRay operator via Helm. (3) Write a RayCluster custom resource YAML that mirrors your Anyscale cluster spec (head node resources, worker node pools, container image), and apply it with kubectl apply. Your Ray job scripts work without changes - the Ray client and job submission API are identical across Anyscale and self-hosted KubeRay.
