Wan 2.1 at 720p takes 10–12 minutes on an H100 and requires 65–80GB VRAM. Runway Gen-3 Alpha Turbo costs approximately $0.25 for a 5-second clip; Gen-3 Alpha costs approximately $0.50. Both run in your browser. Both are real options for AI video generation in 2026, and the right choice depends entirely on your volume, budget, and tolerance for infrastructure complexity.
This guide helps you choose between open-source models and cloud APIs based on your specific situation. It covers Wan 2.1/2.2, HunyuanVideo, LTX-2.3, CogVideoX-1.5-5B, and Runway, with GPU requirements, quality rankings, and cost breakdowns for each.
Why AI Video Generation Needs More GPU Than Image Generation
A 512×512 image with SDXL generates in 3–5 seconds on an H100 using 8–12GB of VRAM. A 5-second 720p video with Wan 2.1 takes 10–12 minutes on the same hardware and needs 65–80GB. That's not a linear difference.
Video generation is fundamentally more expensive because the model must maintain temporal consistency across dozens or hundreds of frames simultaneously. The attention mechanism scales quadratically with token count, and token count scales with both frame count and resolution. Going from 480p to 720p doesn't just double the pixels; it grows the token count, and with it each side of the attention matrix, by roughly 2–3x.
Clip length makes this worse. A 10-second Wan 2.1 clip at 720p often pushes past the 80GB H100 capacity. The 5-second version uses 65–70GB comfortably. This is why consumers with RTX 4090s (24GB) or RTX 5090s (32GB) cannot run the top-tier models at all.
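A back-of-envelope sketch makes the scaling concrete. The 16-pixel patch size, 4-frame temporal grouping, and 81-frame clip below are illustrative assumptions, not any specific model's real tokenizer settings:

```python
# Back-of-envelope token and attention scaling for video diffusion.
# Patch size, temporal grouping, and frame count are illustrative
# assumptions, not any model's actual settings.

def token_count(width, height, frames, patch=16, temporal_patch=4):
    """Spatial patches per frame times temporal patch groups."""
    return (width // patch) * (height // patch) * (frames // temporal_patch)

tokens_480p = token_count(832, 480, frames=81)    # 5s clip, 832x480
tokens_720p = token_count(1280, 720, frames=81)   # same clip at 1280x720

print(f"480p tokens: {tokens_480p}")                               # 31200
print(f"720p tokens: {tokens_720p}")                               # 72000
print(f"token ratio:         {tokens_720p / tokens_480p:.1f}x")    # 2.3x
print(f"attention-op ratio:  {(tokens_720p / tokens_480p)**2:.1f}x")  # 5.3x
```

Under these assumptions the token count grows ~2.3x, and because self-attention is quadratic, the attention work grows ~5x, which is why VRAM and generation time jump so sharply between 480p and 720p.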
| Content Type | Model | H100 Gen Time | VRAM Needed |
|---|---|---|---|
| 512×512 image | SDXL | 3–5 seconds | 8–12GB |
| 1024×1024 image | Flux.1 Dev | 10–20 seconds | 20–25GB |
| 5s 480p video | Wan 2.1 | ~4 minutes | ~40–48GB |
| 5s 720p video | Wan 2.1 | ~10–12 minutes | ~65–80GB |
| 5s 720p video | HunyuanVideo | ~20 minutes | ~60–80GB |
For the full breakdown of why video models are so VRAM-hungry, see GPU cloud for video AI 2026.
The Four Open-Source Video Models Worth Using
Wan 2.1 and 2.2
Released by Alibaba in February 2025 (Wan 2.1) and July 2025 (Wan 2.2), this series is the most deployed open-source video model for production use. The 14B transformer generates broadcast-quality output at 480p and 720p.
Wan 2.2 uses a Mixture-of-Experts architecture with the same VRAM footprint as 2.1 and was trained on significantly more image and video data. If you're starting a new deployment, use the Wan 2.2 weights. Existing Wan 2.1 setups can switch to Wan 2.2 with a weight swap and no infrastructure changes.
- Min VRAM: ~40–48GB at 480p (fp8 quantization), 65–80GB at 720p
- Minimum GPU: H100 PCIe (80GB) for 720p
- Quality: High. Best motion coherence and instruction following in its cost tier.
- Best for: Production pipelines where cost-per-clip matters. The economics are better than HunyuanVideo for high-volume use.
For full deployment instructions, see Deploy Wan 2.1/2.2 on GPU Cloud.
HunyuanVideo
Tencent's HunyuanVideo (13B parameters) benchmarks ahead of Wan 2.2 on motion realism and scene coherence. It's the right choice when quality is the primary metric and generation time is secondary.
The original 13B model requires 60–80GB VRAM at 720p, with OOM risk on exactly-80GB H100 hardware due to VRAM spikes during generation. The H200 (141GB) is the recommended minimum for reliable production use.
HunyuanVideo 1.5 (released November 2025) is a separate 8.3B model that runs on consumer GPUs with 14GB+ VRAM. It's a different product from the original. If someone says "HunyuanVideo runs on 14GB," they mean version 1.5, not the full-quality original.
- Min VRAM: ~60–80GB at 720p (80GB tight with OOM risk), ~100–120GB at 1080p
- Minimum GPU: H100 PCIe (risky at 720p), H200 SXM (recommended)
- Quality: Highest single-GPU quality for motion realism as of March 2026.
- Best for: High-end video production where output quality justifies the cost.
LTX-2.3
From Lightricks, released March 5, 2026. The current version in the LTX series supports native 4K at 50 FPS. The official minimum is 32GB VRAM for up to 720p; native 4K at full precision requires 48GB+, since the full 22B model's FP16 weights alone occupy ~44GB. With fp8 quantization at 720p, it runs on 12–24GB.
LTX-2.3 is a substantially rebuilt model from the earlier LTX-Video series (2B and 13B variants, 2024–early 2025). The new architecture includes a rebuilt VAE, a 4x larger text connector, and native audio generation. If you're evaluating LTX for a new project, use LTX-2.3 specifically.
- Min VRAM: 12–24GB at 720p (fp8 quant), 24–32GB at 720p full quality
- Minimum GPU: RTX 4090 (24GB) or RTX 5090 (32GB)
- Quality: Good. Surprisingly strong for a model that runs on consumer hardware. Not at Wan 2.2 or HunyuanVideo's level for motion coherence.
- Best for: Teams that want open-source video AI without committing to H100-tier hardware.
CogVideoX-1.5-5B
Released by Zhipu AI in November 2024. The 5B-parameter model runs 10-second near-720p clips (native output is 1360×768, i.e. 768p) on 24–32GB VRAM; 8-bit quantization reduces this to ~16GB. The smaller model means faster generation than Wan 2.1 but noticeably lower quality on complex scenes.
- Min VRAM: 24–32GB at 720p (16GB with 8-bit quantization)
- Minimum GPU: RTX 4090 (24GB)
- Quality: Good for its hardware tier. Motion stability is the weak point on longer clips.
- Best for: Consumer GPU users who need 10-second clips and can accept quality tradeoffs.
Master Comparison Table
| Model | Min VRAM | Best GPU | Gen Time (5s 720p, Best GPU) | Quality Tier | Cost/clip (Best GPU) |
|---|---|---|---|---|---|
| Wan 2.1/2.2 | 65–80GB | H100 SXM5 | ~10–12 min | High | ~$0.40–0.48 |
| HunyuanVideo | 60–80GB | H200 SXM | ~12–18 min | Highest | ~$0.74–1.11 (H200) |
| LTX-2.3 | 24–32GB | RTX 5090 | ~5–8 min | Good | ~$0.06–0.10 |
| CogVideoX-1.5-5B | 24–32GB | RTX 4090 | ~8–12 min | Moderate | ~$0.07–0.10 |
Costs calculated at Spheron on-demand prices. GPU pricing fluctuates based on availability. Check current GPU pricing for live rates.
Runway vs Open-Source: When to Use Each
Runway (Gen-3 Alpha and Gen-4) is a commercial cloud API for AI video generation. You write a prompt, optionally upload a reference image, and get back a polished video clip in under a minute. No GPU, no setup, no VRAM budgets.
Runway uses credit-based pricing on subscription plans: approximately $0.25 for a 5-second clip on Gen-3 Alpha Turbo, or $0.50 on Gen-3 Alpha. A 10-second clip costs ~$0.50–1.00 on the equivalent plans. There are no infrastructure costs, but you have limited control over the generation process and cannot fine-tune the model on your own data.
Open-source on cloud GPU: Full control over the model, weights, and pipeline. Higher complexity. You manage the GPU, the software stack, and the generation queue. Costs scale with GPU hours consumed, not output length.
| Dimension | Runway | Open-Source (Wan 2.1/HunyuanVideo) |
|---|---|---|
| Cost per 5s clip | ~$0.25–0.50 | $0.40–1.11 (on-demand H100/H200) |
| Cost at 1,000 clips/day | ~$250/day | ~$440/day (on-demand H100 SXM5) |
| Setup time | Minutes | Hours to days |
| Minimum VRAM | None | 60–80GB |
| Output control | Limited | Full |
| Fine-tuning | No | Yes |
| Quality ceiling | High | Higher (HunyuanVideo) |
| Infrastructure complexity | None | Moderate to high |
The right answer is not always open-source. Runway makes sense for:
- Teams with no GPU infrastructure experience
- Low-volume use (under a few hundred clips per day)
- Projects where generation speed matters more than cost
Open-source makes sense for:
- High-volume pipelines where per-clip economics matter
- Projects that need fine-tuning or custom model behavior
- Teams with existing GPU infrastructure or GPU cloud access
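The per-day figures in the comparison table can be sketched in a few lines. Rates are the article's numbers; the model is simplified in that it assumes the GPU is billed only for active generation time, which favors self-hosting:

```python
# Per-clip and per-day cost comparison, using the article's rates.
# Assumes the GPU is billed only while generating (a simplification).

RUNWAY_TURBO_PER_CLIP = 0.25    # Gen-3 Alpha Turbo, 5-second clip
H100_HOURLY = 2.40              # H100 SXM5 on-demand, $/hr
GEN_MINUTES = 11                # Wan 2.1, 5-second clip at 720p

def runway_daily(clips):
    return clips * RUNWAY_TURBO_PER_CLIP

def self_hosted_daily(clips, hourly=H100_HOURLY, gen_minutes=GEN_MINUTES):
    gpu_hours = clips * gen_minutes / 60
    return gpu_hours * hourly

for clips in (50, 250, 1000):
    print(f"{clips:>5} clips/day  Runway ${runway_daily(clips):>7.2f}  "
          f"self-hosted ${self_hosted_daily(clips):>7.2f}")
```

At these rates Runway Turbo stays cheaper per raw clip even at 1,000 clips/day ($250 vs ~$440); the open-source case rests on fine-tuning, output control, and cheaper spot capacity rather than raw per-clip price.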
GPU Requirements Per Model
The key threshold is 60–80GB VRAM. Consumer GPUs (RTX 4090 at 24GB, RTX 5090 at 32GB) cannot run Wan 2.1 or HunyuanVideo at production settings; these models require datacenter hardware. For a detailed breakdown of consumer vs datacenter GPU tradeoffs for AI workloads, see best NVIDIA GPUs for LLMs.
| Model | Min VRAM | Recommended GPU | Notes |
|---|---|---|---|
| Wan 2.1 (480p) | ~40–48GB | H100 PCIe | With fp8 quantization |
| Wan 2.1 (720p) | ~65–80GB | H100 SXM5 | Tight on 80GB; H200 for 10s clips |
| HunyuanVideo (720p) | ~60–80GB | H200 SXM | H100 possible but OOM risk |
| HunyuanVideo (1080p) | ~100–120GB | H200 SXM | Community-tested |
| LTX-2.3 (720p) | 24–32GB | RTX 4090 (fp8) or RTX 5090 | fp8 quant required below 32GB; 48GB+ for 4K |
| CogVideoX-1.5-5B (720p) | 24–32GB | RTX 4090 | 8-bit reduces to ~16GB |
| AnimateDiff v3 (512p) | ~18–24GB | RTX 5090 | Short clips, limited motion |
LTX-2.3 and CogVideoX-1.5-5B are the only models in this list that don't require datacenter hardware. If you're constrained to consumer GPUs, these are your options.
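As a quick sanity check, here is a toy feasibility lookup built from the lower bounds of the VRAM ranges in the table above, simplified to single numbers:

```python
# Toy feasibility check: which models fit on which GPUs?
# Minimums are the lower bound of each VRAM range from the table, in GB.

MIN_VRAM_GB = {
    "Wan 2.1 (720p)": 65,
    "HunyuanVideo (720p)": 60,
    "LTX-2.3 (720p, fp8)": 12,
    "CogVideoX-1.5-5B (8-bit)": 16,
    "AnimateDiff v3 (512p)": 18,
}
GPU_VRAM_GB = {"RTX 4090": 24, "RTX 5090": 32, "H100": 80, "H200": 141}

def runnable_on(gpu):
    """Return the models whose minimum VRAM fits on the given GPU."""
    vram = GPU_VRAM_GB[gpu]
    return sorted(m for m, need in MIN_VRAM_GB.items() if need <= vram)

print(runnable_on("RTX 5090"))
# ['AnimateDiff v3 (512p)', 'CogVideoX-1.5-5B (8-bit)', 'LTX-2.3 (720p, fp8)']
```

Both consumer cards pass the same three models, and neither clears the 60–65GB floor for Wan 2.1 or HunyuanVideo, which is the threshold the section describes.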
For detailed per-model VRAM breakdowns and quantization options, see GPU cloud for video AI 2026.
Quality vs Speed vs Cost: The Real Tradeoffs
Quality rankings here are based on community benchmarks and testing as of March 2026, focusing on motion coherence (does movement look natural?), prompt adherence (does the video match the description?), and structural stability (do objects maintain consistent appearance?).
| Model | Quality (1–5) | Speed (clips/hr, Best GPU) | Cost/min output | Practical Rating |
|---|---|---|---|---|
| HunyuanVideo | 5 | ~3.3–5 clips/hr (H200) | ~$8.86–13.28/min | Best quality, high cost |
| Wan 2.2 | 4.5 | ~5 clips/hr | ~$4.80–5.76/min | Best balance for production |
| Wan 2.1 | 4 | ~5 clips/hr | ~$4.80–5.76/min | Strong; Wan 2.2 preferred |
| LTX-2.3 | 3.5 | ~7.5–12 clips/hr (RTX 5090) | ~$0.76–1.22/min | Best value on cheaper hardware |
| CogVideoX-1.5-5B | 3 | ~5–7 clips/hr (RTX 4090) | ~$0.89–1.25/min | OK; 10s clips are unique strength |
Cost figures are based on Spheron on-demand prices. Cost per minute of output = (gen_time_minutes / clip_duration_seconds × 60) × (hourly_rate / 60).
Three concrete tradeoffs worth knowing:
HunyuanVideo vs Wan 2.1 at 480p: HunyuanVideo costs roughly 3–4x more per clip. On motion realism and scene coherence, the quality difference is visible in side-by-side comparisons. On structural stability and prompt adherence, the gap is smaller.
Wan 2.1 at 480p vs 720p: The quality jump is real. 720p reveals detail in hair, fabric, and background objects that 480p smooths over. But generation time, and with it cost, roughly triples. For draft review, always generate at 480p first.
Multi-GPU scaling: All four open-source models scale linearly across multiple GPUs because each GPU handles one generation job independently, with no inter-GPU communication (no NVLink required). Four H100s run four concurrent jobs. For throughput benchmarks across GPU types, see GPU cloud benchmarks.
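Because the jobs are independent, scaling out is just a work queue. A minimal sketch, where `generate()` is a hypothetical stand-in for the real inference call (real code would pin each worker to one GPU via `CUDA_VISIBLE_DEVICES`):

```python
# Data-parallel dispatch sketch: each GPU worker pulls independent jobs
# from a shared queue. No cross-GPU traffic is involved. generate() is
# a stand-in for the real inference entry point (hypothetical).
from queue import Queue
from threading import Thread

def generate(gpu_id, prompt):
    # Stand-in: real code would launch the model on GPU gpu_id.
    return f"gpu{gpu_id}:{prompt}"

def worker(gpu_id, jobs, results):
    while True:
        prompt = jobs.get()
        if prompt is None:          # sentinel: shut this worker down
            return
        results.append(generate(gpu_id, prompt))

jobs, results = Queue(), []
threads = [Thread(target=worker, args=(g, jobs, results)) for g in range(4)]
for t in threads:
    t.start()
for prompt in [f"clip-{i}" for i in range(8)]:
    jobs.put(prompt)
for _ in threads:
    jobs.put(None)                  # one sentinel per worker
for t in threads:
    t.join()
print(len(results))                 # 8 clips across 4 independent workers
```

Throughput scales with worker count precisely because no step in this loop requires two GPUs to talk to each other.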
Cloud GPU Recommendations for Video AI
Entry and Testing: RTX 5090 (32GB, ~$0.76/hr on-demand)
The RTX 5090 supports AnimateDiff, Mochi 1, LTX-2.3 at 720p (fp8 quantized), and CogVideoX-1.5-5B. It cannot run Wan 2.1 or HunyuanVideo at any useful quality setting. Use it to evaluate video AI pipelines and test prompt strategies before committing to H100-tier costs.
Rent an RTX 5090 on Spheron.
Production Standard: H100 SXM5 (80GB, ~$2.40/hr on-demand)
The H100 SXM5 is the recommended starting point for Wan 2.1 production workloads. It handles 480–720p Wan 2.1 generation, LTX-2.3 at full quality, and CogVideoX-1.5-5B. HunyuanVideo at 720p is technically possible on H100 PCIe (80GB) but carries OOM risk during generation spikes. The SXM5 variant's higher memory bandwidth (3.35 TB/s vs 2 TB/s PCIe) reduces generation times for video workloads.
Rent an H100 on Spheron.
Production High-Quality: H200 SXM (141GB, ~$3.69/hr on-demand)
The H200 runs all current video models without VRAM pressure. HunyuanVideo at 1080p, Wan 2.1 10-second clips at 720p, any workload where OOM errors are unacceptable. The 141GB HBM3e at 4.8 TB/s gives meaningful speed advantages over H100 for HunyuanVideo's long generation times.
Rent an H200 on Spheron.
Summary Table
| GPU | VRAM | Supported Models | On-Demand | Spot | Verdict |
|---|---|---|---|---|---|
| RTX 5090 | 32GB | LTX-2.3, CogVideoX, AnimateDiff | ~$0.76/hr | N/A | Testing and cheap models only |
| H100 PCIe | 80GB | Wan 2.1 (480–720p), CogVideoX | ~$2.01/hr | N/A | Good for Wan at lower cost |
| H100 SXM5 | 80GB | Wan 2.1 (720p), LTX-2.3 (full) | ~$2.40/hr | N/A | Recommended for Wan production |
| H200 SXM | 141GB | All models, HunyuanVideo 1080p | ~$3.69/hr | ~$1.43/hr | Best for quality-first workloads |
Pricing as of 24 Mar 2026. GPU pricing fluctuates based on availability. Check current GPU pricing for live rates.
Cost Per Minute of Generated Video
Cost per minute of output video is more useful than cost per clip when comparing models across different clip lengths. The formula: (gen_time_minutes / clip_duration_seconds × 60) × (hourly_rate / 60).
Wan 2.1 at 480p on H100 SXM5 (~$2.40/hr):
- Gen time: ~4 minutes for a 5-second clip
- Cost per clip: ~$0.16
- Cost per minute of output: (4/5 × 60) × $2.40/60 = ~$1.92/minute of video
Wan 2.1 at 720p on H100 SXM5 (~$2.40/hr):
- Gen time: ~11 minutes for a 5-second clip
- Cost per clip: ~$0.44
- Cost per minute of output: (11/5 × 60) × $2.40/60 = ~$5.28/minute of video
HunyuanVideo at 720p on H200 SXM (~$3.69/hr on-demand, ~$1.43/hr spot):
- Gen time: ~15 minutes for a 5-second clip
- Cost per clip (on-demand): ~$0.92
- Cost per clip (spot): ~$0.36 — roughly 60% cheaper, making it competitive with Wan 2.1 on-demand on H100
- Cost per minute of output (on-demand): (15/5 × 60) × $3.69/60 = ~$11.07/minute of video
At-scale projection for Wan 2.1 720p: 1,000 clips per day at 720p = approximately 11,000 GPU-minutes = ~183 GPU-hours per day. At H100 SXM5 on-demand (~$2.40/hr): ~$440/day.
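The worked figures above can be reproduced in a few lines; rates and generation times are the article's numbers:

```python
# Reproduces the worked cost figures above from the article's rates.

def cost_per_clip(gen_minutes, hourly_rate):
    """GPU time consumed by one clip, priced at the hourly rate."""
    return gen_minutes / 60 * hourly_rate

def cost_per_output_minute(gen_minutes, clip_seconds, hourly_rate):
    # Same formula as the text:
    # (gen_time_minutes / clip_duration_seconds * 60) * (hourly_rate / 60)
    return (gen_minutes / clip_seconds * 60) * (hourly_rate / 60)

# Wan 2.1 at 720p on H100 SXM5 ($2.40/hr)
print(round(cost_per_clip(11, 2.40), 2))               # 0.44
print(round(cost_per_output_minute(11, 5, 2.40), 2))   # 5.28

# HunyuanVideo at 720p on H200 SXM ($3.69/hr on-demand)
print(round(cost_per_output_minute(15, 5, 3.69), 2))   # 11.07

# At scale: 1,000 Wan 2.1 720p clips/day
gpu_hours_per_day = 1000 * 11 / 60                     # ~183 GPU-hours
print(round(gpu_hours_per_day * 2.40))                 # ~440 dollars/day
```

Swapping in spot rates (e.g. ~$1.43/hr for the H200) into the same functions is how the ~$0.36 spot figure above is derived.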
| Model | Resolution | Gen Time | GPU Cost | Cost/clip | Cost/min output |
|---|---|---|---|---|---|
| AnimateDiff v3 | 512×512 | ~3–5 min | $0.76/hr (RTX 5090) | ~$0.04–0.06 | ~$0.46–0.76/min |
| Wan 2.1 | 832×480 | ~4–5 min | $2.40/hr (H100 SXM5) | ~$0.16–0.20 | ~$1.92–2.40/min |
| Wan 2.1 | 1280×720 | ~10–12 min | $2.40/hr (H100 SXM5) | ~$0.40–0.48 | ~$4.80–5.76/min |
| HunyuanVideo | 720p | ~12–18 min | $3.69/hr (H200 SXM) | ~$0.74–1.11 | ~$8.86–13.28/min |
| LTX-2.3 | 720p | ~5–8 min | $0.76/hr (RTX 5090) | ~$0.06–0.10 | ~$0.76–1.22/min |
Pricing as of 24 Mar 2026. GPU pricing fluctuates over time. Check current GPU pricing for live rates.
Which Model Should You Use?
Four concrete scenarios:
1. Experimenting or learning video AI
Use LTX-2.3 on an RTX 5090 (~$0.76/hr). You get 720p output on affordable hardware, decent quality, and fast iteration. Monthly cost for light use (a few hours per day): ~$46–68. The limitation: LTX-2.3 doesn't match Wan 2.1 on motion coherence for complex scenes. But it's a good starting point before you commit to H100 spending.
2. Production at scale, cost matters
Use Wan 2.1/2.2 on H100 SXM5 (~$2.40/hr on-demand). At 720p, a 5-second clip costs $0.40–0.48. For 1,000 clips per day, you're looking at ~$440/day on-demand. The limitation: 10-second clips at 720p push VRAM limits, so batch at 5-second clips and concatenate if needed. For a real-world scaling example, see running 100 concurrent AI agents on Spheron.
3. Maximum quality, budget secondary
Use HunyuanVideo on H200 SXM (~$3.69/hr on-demand). You get the best motion realism of any open-source model at 720p, plus 1080p capability. Monthly cost for heavy use: several thousand dollars. The limitation: 12–18 minutes per clip on H200 makes this slow for high-volume production.
4. No infrastructure, small volume
Use Runway. Browser-based, no setup, ~$0.25–0.50 per 5-second clip depending on model variant. For teams generating a few dozen clips per day, the per-clip cost is similar to on-demand H100, without the infrastructure overhead. The limitation: no fine-tuning, limited output control, and costs don't improve with scale.
These models are running on Spheron's H100 and H200 GPUs today. No contracts, no waitlists.
