Teams shipping agents in 2026 keep getting stuck on the same question: LangGraph or LangChain? The framing is wrong. LangGraph is built on top of LangChain. The real question is whether your agent needs what LangGraph adds: stateful graph execution, checkpointing, time-travel debugging, and interruption support.
If you have a linear pipeline with no branching, LangChain is sufficient. If your agent needs to loop, branch, resume after failure, or wait for human approval mid-execution, you need LangGraph. Most production agents end up in the second category.
TL;DR Decision Matrix
| Scenario | Use LangChain | Use LangGraph | Use Both |
|---|---|---|---|
| Simple RAG pipeline (retrieve, generate, return) | Yes | No | Optional |
| Linear chatbot with no memory across sessions | Yes | No | Optional |
| Multi-step tool use with fixed sequence | Yes | No | Optional |
| Agent with conditional branches or retries | No | Yes | Yes |
| Human-in-the-loop approval gate | No | Yes | Yes |
| Long-running session that must resume | No | Yes | Yes |
| Multi-agent supervisor with specialized workers | No | Yes | Yes |
| Complex workflow with state replay/debugging | No | Yes | Yes |
What LangChain Actually Is (and Isn't)
LangChain is a composition toolkit. It gives you building blocks: retrievers that connect to vector stores, tool definitions, prompt templates, output parsers, and LCEL (LangChain Expression Language) for wiring them into pipelines. The ecosystem is the real moat. Hundreds of integrations with vector stores, document loaders, embedding models, and third-party APIs exist out of the box.
LCEL makes composition readable:
chain = prompt | llm | output_parser
result = chain.invoke({"question": "What is the capital of France?"})
That is clean, testable, and easy to understand. For fixed-flow pipelines, it's hard to beat.
The weakness shows up with AgentExecutor. LangChain's built-in agent loop handles tool calling, but it's a black box. No native state persistence. No way to interrupt and resume. No branching or conditional routing. If you call agent_executor.invoke() and it fails halfway through a long chain of tool calls, you restart from scratch. For short, low-stakes tasks, this is fine. For production agents running 5-15 tool calls on expensive context, it's a problem.
What LangGraph Adds
LangGraph gives you a directed graph where nodes are Python functions and edges are transitions between them. The state is explicit: a typed Python dict (TypedDict) that every node reads and writes to.
from langgraph.graph import StateGraph, END
from typing import TypedDict
class AgentState(TypedDict):
    messages: list
    tool_calls_remaining: int
    last_tool_output: str

def call_llm(state: AgentState) -> AgentState:
    # call LLM, return partial state update
    ...

def call_tool(state: AgentState) -> AgentState:
    # execute tool, update state
    ...

def should_continue(state: AgentState) -> str:
    if state["tool_calls_remaining"] > 0:
        return "tool"
    return END
graph = StateGraph(AgentState)
graph.add_node("llm", call_llm)
graph.add_node("tool", call_tool)
graph.add_conditional_edges("llm", should_continue)
graph.add_edge("tool", "llm")
graph.set_entry_point("llm")
app = graph.compile()
The graph is explicit, inspectable, and testable. You can visualize it, step through it in a debugger, and replay any state from any checkpoint.
Checkpointing
Checkpointing is the feature that changes what's possible in production. After every node execution, LangGraph serializes the full state. In development, you use MemorySaver. In production, you use AsyncPostgresSaver or AsyncRedisSaver.
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
async with AsyncPostgresSaver.from_conn_string(DATABASE_URL) as checkpointer:
    await checkpointer.setup()  # creates the checkpoint tables on first run
    app = graph.compile(checkpointer=checkpointer)
    result = await app.ainvoke(input, config={"configurable": {"thread_id": "session-123"}})
With PostgreSQL checkpointing, a spot instance interruption means resuming from the last completed node, not restarting from scratch. For a 10-node graph, that's the difference between losing 9 LLM calls and losing 0.
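Resuming is a matter of re-attaching to the same thread_id. A minimal sketch, assuming the same graph and DATABASE_URL as above and a run that was interrupted mid-graph (the helper name is illustrative):

```python
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver

async def resume_session(graph, database_url: str):
    async with AsyncPostgresSaver.from_conn_string(database_url) as checkpointer:
        app = graph.compile(checkpointer=checkpointer)
        config = {"configurable": {"thread_id": "session-123"}}

        # Inspect the last persisted state for this thread.
        snapshot = await app.aget_state(config)
        print("resuming at node(s):", snapshot.next)

        # Invoking with None as the input continues from the latest checkpoint
        # instead of starting the graph over from the entry point.
        return await app.ainvoke(None, config=config)
```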
Human-in-the-loop
interrupt_before and interrupt_after let you pause graph execution at specific nodes, send state to a human reviewer, and resume:
app = graph.compile(
    checkpointer=checkpointer,
    interrupt_before=["execute_code"],  # pause before code execution for review
)
LangChain's AgentExecutor has no equivalent primitive.
Time-Travel Debugging
LangGraph Studio (the visual debugger) lets you replay any prior state, branch off a new execution path from any point, and compare outcomes. For debugging complex multi-turn agent failures, this saves hours.
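The same replay mechanism is exposed programmatically through the checkpointer, so you don't need Studio to use it. A rough sketch, assuming a compiled app with a checkpointer and an existing thread (the thread_id and the choice of snapshot are illustrative):

```python
config = {"configurable": {"thread_id": "session-123"}}

# Every checkpoint for the thread, newest first.
history = list(app.get_state_history(config))
for snapshot in history:
    print(snapshot.config["configurable"]["checkpoint_id"], snapshot.next)

# Re-run from an earlier checkpoint: pass its config (which carries the
# checkpoint_id) back into invoke. Execution branches from that point and
# the original history is preserved.
replay_config = history[3].config  # pick any prior snapshot
result = app.invoke(None, config=replay_config)
```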
State Management Deep Dive
LangChain handles memory through objects like ConversationBufferMemory and RedisChatMessageHistory. These work well for chatbots that need the last N messages. They fall apart for agents that need to track structured state across turns.
Compare:
LangChain memory (conversation history only):
from langchain.memory import RedisChatMessageHistory, ConversationBufferWindowMemory
history = RedisChatMessageHistory(session_id="user-123", url=REDIS_URL)
memory = ConversationBufferWindowMemory(chat_memory=history, k=10)
LangGraph state (structured, typed, persisted):
class ResearchState(TypedDict):
    query: str
    sources_found: list[str]
    drafts: list[str]
    approval_status: str
    token_budget_remaining: int
    user_id: str
LangGraph's state is explicit about everything your agent tracks. There's no implicit message buffer that you hope contains the right context. Every field is visible, reducible (you can define how fields merge on updates), and checkpointed.
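A minimal sketch of that merge behavior, using the standard Annotated-reducer pattern on a trimmed version of ResearchState (field names as above):

```python
import operator
from typing import Annotated, TypedDict

class ResearchState(TypedDict):
    query: str
    sources_found: Annotated[list[str], operator.add]  # merged by concatenation
    approval_status: str                               # no reducer: last write wins

def search_node(state: ResearchState) -> dict:
    # Nodes return partial updates; the reducer appends instead of overwriting.
    return {"sources_found": ["https://example.com/paper-1"]}
```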
When checkpoints outperform naive memory:
- Multi-session resume: a user comes back after three days. LangGraph can restore the exact state from where they left off. LangChain's RedisChatMessageHistory stores messages but not the full execution state.
- Parallel branch evaluation: you can fork a graph at checkpoint N, run two different paths, and compare outcomes for A/B testing or debugging.
- Compliance audit trails: regulated industries need a complete record of every agent decision. LangGraph's checkpoint history provides this out of the box.
For long-term cross-session memory that goes beyond graph state (embedding-based recall of facts across sessions), LangGraph checkpoints and vector memory serve different purposes. The guide on persistent agent memory with Mem0 and Zep covers how embedding-based memory sits alongside LangGraph checkpoints as a separate retrieval layer.
Multi-Agent Orchestration: Where LangGraph Wins Decisively
LangChain's AgentExecutor has no native multi-agent primitive. You can chain two agents together, but managing state across them, routing between them conditionally, or running them in parallel requires custom code.
LangGraph handles this with the supervisor pattern:
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from typing import TypedDict, Annotated
class SupervisorState(TypedDict):
    messages: Annotated[list, add_messages]
    next_agent: str

def router(state: SupervisorState) -> str:
    return state["next_agent"]
# Build supervisor graph
supervisor = StateGraph(SupervisorState)
supervisor.add_node("supervisor", supervisor_node)
supervisor.add_node("researcher", researcher_subgraph)
supervisor.add_node("writer", writer_subgraph)
supervisor.add_conditional_edges("supervisor", router)
Parallel subgraph execution via the Send API runs multiple agents simultaneously:
from langgraph.types import Send
def spawn_parallel_agents(state):
    return [
        Send("researcher", {"query": q})
        for q in state["queries"]
    ]
Each sub-agent is its own graph with its own state. The parent graph manages routing, aggregation, and final output. This maps directly to GPU batching at the inference layer: parallel branches issue concurrent requests that a single vLLM inference pool can batch together efficiently.
For the infrastructure side of scaling these multi-agent topologies, the guide on scaling agent fleets with MCP orchestration covers autoscaling patterns, GPU tiering, and cost modeling for fleets of 1,000+ concurrent agents.
Streaming, Interruptions, and Replay
graph.astream_events() gives you per-node streaming. You get a stream of events as each node starts, runs, and completes. This lets you show users incremental progress rather than waiting for the full graph to complete:
async for event in app.astream_events(input, version="v2"):
    if event["event"] == "on_chat_model_stream":
        print(event["data"]["chunk"].content, end="", flush=True)
LangChain's streaming works at the chain level. LangGraph's streaming works at the graph level, with visibility into which node is running.
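A small sketch of that graph-level view: stream_mode="updates" yields one dict per node as it finishes, keyed by node name, so a UI can report which step the agent is on (input shape and thread_id here are illustrative, assuming a messages-based state):

```python
async def stream_progress(app, user_input: str):
    async for update in app.astream(
        {"messages": [("user", user_input)]},
        config={"configurable": {"thread_id": "thread-1"}},
        stream_mode="updates",
    ):
        for node_name, node_output in update.items():
            print(f"[{node_name}] finished")
```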
LangGraph Studio is the visual debugger. It shows the graph structure, lets you inspect state at any node, replay runs from any checkpoint, and branch off new execution paths from any point. For debugging complex multi-turn agent failures, this is the tool that saves hours.
Production interruption patterns: placing interrupt_before on dangerous nodes (code execution, database writes, external API calls) gives you a human approval gate without changing the agent logic. The graph pauses, the state is checkpointed, a notification is sent, and execution resumes when a human approves.
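A sketch of that approval loop, assuming the graph was compiled with interrupt_before=["execute_code"] and a persistent checkpointer (the thread_id and messages are illustrative):

```python
config = {"configurable": {"thread_id": "ticket-42"}}

# 1. Run until the graph pauses in front of the gated node.
app.invoke({"messages": [("user", "Fix the failing migration")]}, config=config)

# 2. Surface the pending state to a reviewer.
snapshot = app.get_state(config)
print("waiting on:", snapshot.next)        # e.g. ("execute_code",)
print("proposed action:", snapshot.values)  # full state for review

# 3a. Approve: invoke with None to continue from the checkpoint.
app.invoke(None, config=config)

# 3b. Or edit the state first, then resume.
# app.update_state(config, {"messages": [("user", "Use a dry run instead")]})
# app.invoke(None, config=config)
```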
Production Observability
Langfuse is the most straightforward observability layer for LangGraph. The callback handler traces every node execution:
from langfuse.callback import CallbackHandler
langfuse_handler = CallbackHandler()
result = app.invoke(
    input,
    config={"callbacks": [langfuse_handler]}
)
Every node execution shows up as a span with token counts, latency, and model parameters. For cost tracking, Langfuse calculates cost per trace automatically using its model pricing table.
Helicone works similarly for cost tracking across inference backends. If you're routing LangGraph nodes to different models (e.g., a cheap 8B model for routing decisions, an expensive 70B model for final output), Helicone gives you a unified cost view.
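That per-node model split is plain LangGraph: each node holds its own client pointed at a different backend. A sketch, with placeholder endpoints and model names and node functions written against the SupervisorState shape from earlier:

```python
from langchain_openai import ChatOpenAI

router_llm = ChatOpenAI(                       # cheap, fast routing model
    model="Qwen/Qwen3-8B",
    base_url="http://router-backend:8000/v1",
    api_key="router-key",
)
writer_llm = ChatOpenAI(                       # expensive, high-quality output model
    model="meta-llama/Llama-3.3-70B-Instruct",
    base_url="http://writer-backend:8000/v1",
    api_key="writer-key",
)

def router_node(state):
    decision = router_llm.invoke(state["messages"])
    return {"next_agent": decision.content.strip()}

def writer_node(state):
    answer = writer_llm.invoke(state["messages"])
    return {"messages": [answer]}
```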
For a full observability setup covering OpenTelemetry instrumentation, DCGM metric correlation, and compliance requirements, the LLM observability guide covering Langfuse, Arize Phoenix, and Helicone covers the complete stack.
Migration Guide: LangChain AgentExecutor to LangGraph
The migration is more structural than a simple API swap. You're not changing the tools, the prompt, or the model. You're changing the execution loop.
Before (LangChain AgentExecutor):
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_openai import ChatOpenAI
from langchain.tools import tool

@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    return web_search(query)

llm = ChatOpenAI(model="gpt-4o")
tools = [search_web]
prompt = hub.pull("hwchase17/openai-functions-agent")  # standard prompt for this agent type
agent = create_openai_functions_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)
result = executor.invoke({"input": "Research the latest LLM benchmarks"})
After (LangGraph StateGraph):
from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode
from langgraph.checkpoint.memory import MemorySaver
from typing import TypedDict, Annotated
from langchain_core.messages import BaseMessage, HumanMessage
from langgraph.graph.message import add_messages
class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]

llm_with_tools = llm.bind_tools(tools)

def agent_node(state: AgentState):
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def should_continue(state: AgentState):
    last = state["messages"][-1]
    if last.tool_calls:
        return "tools"
    return END
graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.add_node("tools", ToolNode(tools))
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", should_continue)
graph.add_edge("tools", "agent")
app = graph.compile(checkpointer=MemorySaver())
result = app.invoke({"messages": [HumanMessage("Research the latest LLM benchmarks")]}, config={"configurable": {"thread_id": "thread-1"}})The tool list, prompt template, and LLM are unchanged. Only the execution loop changes. The key translation: AgentExecutor.invoke() becomes graph.invoke() with a {"messages": [...]} state dict.
What you gain in the migration: full state visibility, checkpointing, streaming at the node level, and the ability to add human-in-the-loop gates without restructuring.
GPU Infrastructure for Both Frameworks
Both LangChain and LangGraph are inference-backend-agnostic. They call LLMs through HTTP APIs. The GPU layer determines your production latency and cost, not the orchestration framework.
vLLM exposes an OpenAI-compatible API that either framework connects to with a single line change:
# Before: OpenAI API
llm = ChatOpenAI(model="gpt-4o")
# After: Self-hosted vLLM on Spheron
llm = ChatOpenAI(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    base_url="http://<spheron-instance-ip>:8000/v1",
    api_key="your-vllm-key"
)
That change works for both LangChain and LangGraph nodes. The orchestration layer never needs to know whether the model is hosted at OpenAI or on a bare-metal H100 in a Spheron data center.
For VRAM sizing, throughput estimates, and latency budgets for specific models, the GPU infrastructure requirements for AI agents guide covers the math. The short version: for a 70B agent model serving under 20 concurrent sessions at 8K context, an H100 PCIe 80GB is the practical minimum.
Reference Architecture: LangGraph + vLLM on Spheron H100
Here is a concrete production setup for a hierarchical multi-agent LangGraph workflow:
LangGraph Supervisor Graph
|
├── Router Node (8B model, fast, routing decisions)
| calls vLLM: Qwen3-8B on H100 PCIe
|
├── Researcher Sub-graph (17B active model, deep analysis)
| calls vLLM: Llama-4-Scout-17B-16E-Instruct on H100 PCIe
|
├── Code Writer Sub-graph (7B model, code generation)
| calls vLLM: Qwen2.5-Coder-7B-Instruct on H100 PCIe
|
└── Synthesizer Node (32B model, final output)
calls vLLM: DeepSeek-R1-Distill-Qwen-32B on H100 PCIe
PostgreSQL checkpointer (RDS or self-hosted)
Langfuse callback handler (traces all nodes)
Redis for session affinity across vLLM instances
Start the vLLM backend on Spheron's H100 instances:
# H100 PCIe for router + worker nodes (starts at $2.01/hr)
docker run --gpus all --ipc=host -p 8000:8000 \
vllm/vllm-openai:latest \
--model Qwen/Qwen3-8B \
--host 0.0.0.0 \
--port 8000 \
--api-key your-secret-key \
--max-model-len 32768 \
--max-num-seqs 64
# H100 PCIe for the researcher node (starts at $2.01/hr)
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
--host 0.0.0.0 \
--port 8001 \
--tensor-parallel-size 1 \
--max-model-len 65536
Note: Spheron's H100 instances are available as on-demand PCIe instances. For a 17B-active MoE model at the researcher tier, a single H100 PCIe 80GB handles the load with enough VRAM headroom for 65K context.
GPU cost for this reference architecture:
| GPU | Use Case | On-Demand Price | Est. Tokens/hr at 70% Util |
|---|---|---|---|
| H100 PCIe | Router / Researcher / Code Writer / Synthesizer | $2.01/hr | ~2.5M tokens (8B), ~1.4M tokens (17B active) |
Pricing fluctuates based on GPU availability. The prices above are as of 01 May 2026 and may have changed. Check current GPU pricing for live rates.
For a typical production setup with four H100 PCIe instances (router, researcher, code writer, synthesizer at $2.01/hr each), the baseline cost is roughly $8.04/hr. At 70% utilization across all nodes, this architecture handles approximately 4-5M tokens per hour across the entire graph.
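A quick sanity check on those numbers, using only the quoted rate and the article's throughput estimate:

```python
# Back-of-the-envelope cost per token for the reference architecture.
instances = 4
hourly_rate = 2.01                             # $/hr per H100 PCIe (quoted rate)
baseline_per_hour = instances * hourly_rate    # $8.04/hr

for tokens_per_hour in (4_000_000, 5_000_000):  # estimated graph-wide throughput
    cost_per_million = baseline_per_hour / (tokens_per_hour / 1_000_000)
    print(f"{tokens_per_hour:,} tok/hr -> ${cost_per_million:.2f} per 1M tokens")
# Roughly $1.61-$2.01 per million tokens at the assumed 70% utilization.
```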
When the Framework Choice Doesn't Matter
Most teams that argue about LangGraph vs LangChain are optimizing the wrong layer. The bottleneck is almost never the orchestration framework. It's TTFT (time to first token), throughput, and cost per token from the inference layer.
A well-tuned vLLM backend on a bare-metal GPU delivers TTFT under 200 ms for 8B models at moderate concurrency. A poorly provisioned managed API serving the same model can sit at 1-3 seconds under load. That latency gap swamps any efficiency difference between LangGraph and LangChain.
The framework choice matters for:
- Control flow complexity: LangGraph for anything beyond linear.
- State management: LangGraph for checkpointed, resumable state.
- Multi-agent routing: LangGraph, full stop.
- Team onboarding speed: LangChain wins if your agents are simple.
- Debugging capability: LangGraph Studio is significantly better.
The framework choice does not matter for:
- Throughput and latency: entirely determined by inference backend.
- Cost per token: entirely determined by GPU type and utilization.
- Model quality: entirely determined by model selection and prompting.
If your agents run slowly or cost too much, the fix is usually a better vLLM configuration or a more efficient GPU provisioning strategy, not switching orchestration frameworks.
Both LangGraph and LangChain run faster and cheaper when the inference layer is bare metal. Spheron's H100 instances start at $2.01/hr for PCIe with per-second billing and no seat licenses.
Rent H100 on Spheron → | View all GPU pricing → | Get started →