LangGraph Tutorial: Build Your First Multi-Agent System in Python

Most production AI systems that actually work aren’t single agents with a long system prompt. They’re systems: an orchestrator that routes work, specialist agents that do specific jobs well, and a shared state object that keeps everything coherent. LangGraph is the framework that makes this pattern buildable, debuggable, and deployable without rewriting your architecture every time requirements change.

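This tutorial builds a real multi-agent system from scratch: a web research pipeline where an orchestrator delegates to a researcher agent (with tool use) and a writer agent, passing typed state between them. You’ll see exactly how LangGraph’s state machine model works, how tool calls connect to agent nodes, and how LangSmith gives you trace visibility across the whole graph. The complete files in this post (tools.py, researcher.py, writer.py, orchestrator.py, and the two graph scripts) run as written once langgraph and langchain-openai are installed and OPENAI_API_KEY is set; the shorter snippets are fragments that illustrate a single call.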

By the end you’ll understand the architecture well enough to apply it to your own workflows — and you’ll see why this stack is the production-ready foundation we build on at Agentic Runbook.

What Is LangGraph — and Why Use It for Multi-Agent Systems?

LangGraph is a stateful orchestration framework built on top of LangChain. It models your application as a directed graph: nodes are Python functions (your agents, tools, transforms), edges define control flow, and a typed state object is passed and updated at every step.

That sounds abstract. Here’s why it matters in practice:

Single agents don’t compose well. A single-agent loop — LLM decides, calls tool, observes result, repeats — works for simple tasks. When you need two agents with different prompts, different tools, and different output formats to coordinate on the same task, you need a framework that manages state hand-offs explicitly. LangGraph’s graph model makes every hand-off a named, inspectable edge.

State needs to be first-class. In a multi-agent system, state is shared infrastructure. LangGraph’s TypedDict state schema gives every node a typed contract: here’s what you receive, here’s what you’re allowed to update. No more global dictionaries or ad-hoc message passing.

Production requires observability. When a five-node graph produces a wrong answer, you need to know which node was responsible. Once tracing is enabled through environment variables, LangGraph sends structured traces to LangSmith with no changes to your code, giving you per-node latency, token counts, and full I/O at every step.

This is the model Agentic Runbook uses for every production system we build. Let’s build one.

Core Concepts You Need First

StateGraph and TypedDict

Every LangGraph application starts with a StateGraph parameterized by a TypedDict schema. The schema defines the fields every node can read and write.

from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, END
import operator

class AgentState(TypedDict):
    query: str                              # The original user question
    search_results: str                     # Raw results from web search tool
    draft: str                              # Writer agent's draft output
    final_output: str                       # Reviewed, finalized content
    messages: Annotated[list, operator.add] # Message history — uses a reducer

Notice the messages field. It uses Annotated[list, operator.add], which attaches a state reducer. Instead of each node replacing the messages list, every update is appended to it. LangGraph merges node return values back into state using the reducer; without one, the default is to overwrite. For message histories you want an appending reducer such as operator.add (LangGraph also provides add_messages in langgraph.graph.message, which additionally replaces messages that share an ID). For scalar fields like query or draft, the default overwrite is correct.
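
To make the merge rule concrete, here is a minimal pure-Python sketch of reducer-aware merging. It is not LangGraph's internal code: DemoState and apply_update are hypothetical names used only for illustration, and the sketch assumes Python 3.9+ for typing.Annotated.

```python
import operator
from typing import Annotated, TypedDict, get_args, get_origin, get_type_hints


class DemoState(TypedDict):
    draft: str                               # no reducer: overwritten on update
    messages: Annotated[list, operator.add]  # reducer: old + new


def apply_update(schema, state: dict, update: dict) -> dict:
    """Return a new state with `update` merged in, honoring Annotated reducers."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)  # copy so the caller's state is left untouched
    for key, value in update.items():
        hint = hints[key]
        if get_origin(hint) is Annotated:
            reducer = get_args(hint)[1]       # second argument of Annotated[...]
            merged[key] = reducer(state[key], value)
        else:
            merged[key] = value               # default behavior: overwrite
    return merged


state = {"draft": "v1", "messages": ["hello"]}
state = apply_update(DemoState, state, {"draft": "v2", "messages": ["world"]})
print(state)
```

The draft field ends up replaced while messages grows, which is the behavior the reducer annotation asks for.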

Nodes

A node is a plain Python function that takes the current AgentState and returns a dict of updates:

def my_node(state: AgentState) -> dict:
    # Read from state
    query = state["query"]
    # Do work...
    return {"draft": "result here"}  # Partial update — only touched fields

LangGraph merges the returned dict back into state. You never mutate state directly.

Edges and Conditional Routing

# Normal edge: always goes A → B
graph.add_edge("researcher", "writer")

# Conditional edge: routing function decides which node comes next
graph.add_conditional_edges(
    "orchestrator",
    routing_function,          # returns a string: name of next node
    {
        "researcher": "researcher",
        "writer": "writer",
        "end": END,
    }
)

The routing function inspects state and returns a string. No LLM required — it’s pure Python logic (or an LLM-as-judge if you need it). This is where your branching, looping, and error-handling logic lives.

Tutorial Part 1 — Single Agent with Tool Calling

Before adding a second agent, get one agent with a tool working. We’ll build a researcher agent that calls a web search stub.

Define the Tool

# tools.py
from langchain_core.tools import tool

@tool
def web_search(query: str) -> str:
    """Search the web for information about a topic. Returns a summary of results."""
    # Stub: replace with Tavily, SerpAPI, or your actual search provider
    return (
        f"[Web search results for '{query}']\n"
        "- Result 1: Relevant information about the topic from source A.\n"
        "- Result 2: Supporting data from source B, published 2024.\n"
        "- Result 3: Expert perspective from source C.\n"
        "Note: Replace this stub with a real search provider in production."
    )

The @tool decorator wraps web_search in a LangChain tool object. It reads the function name, the docstring (used as the tool description shown to the LLM), and the type signature (used to build the argument schema) automatically.
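
As a rough illustration of what a tool wrapper has to work with, the sketch below pulls the same three pieces of information out of a function using only the standard library. It is not how LangChain implements @tool; describe_function is a hypothetical helper, and the function here is a simplified copy of the stub above.

```python
import inspect


def web_search(query: str) -> str:
    """Search the web for information about a topic. Returns a summary of results."""
    return f"results for {query}"


def describe_function(fn) -> dict:
    """Collect the metadata a tool wrapper would need (illustrative only)."""
    signature = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        # Map each parameter name to the name of its annotated type
        "parameters": {
            name: param.annotation.__name__
            for name, param in signature.parameters.items()
        },
    }


tool_info = describe_function(web_search)
print(tool_info)
```

This is why the docstring and type hints on a tool function matter: they are the only description of the tool the model ever sees.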

Build the Researcher Agent Node

# researcher.py
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage, ToolMessage
from tools import web_search

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools([web_search])

RESEARCHER_SYSTEM = """You are a research specialist. Your job is to gather accurate, 
relevant information on a given topic using the web_search tool.

Call web_search at least once. Return a structured summary of what you found — 
bullet points, key facts, and source context. Be specific, not general."""


def researcher_node(state: dict) -> dict:
    """Researcher agent: calls web search tool and returns structured notes."""
    messages = [
        SystemMessage(content=RESEARCHER_SYSTEM),
        HumanMessage(content=f"Research this topic thoroughly: {state['query']}"),
    ]

    # First LLM call — may return tool calls
    response = llm_with_tools.invoke(messages)
    messages.append(response)

    # Execute any tool calls the LLM requested
    if response.tool_calls:
        for tool_call in response.tool_calls:
            if tool_call["name"] == "web_search":
                result = web_search.invoke(tool_call["args"])
                messages.append(
                    ToolMessage(
                        content=result,
                        tool_call_id=tool_call["id"],
                    )
                )

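        # Second LLM call: synthesize tool results into research notes.
        # Use the plain llm (no tools bound) so the reply is text, not another tool call.
        final_response = llm.invoke(messages)
        messages.append(final_response)
        search_results = final_response.content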
    else:
        search_results = response.content

    return {
        "search_results": search_results,
        "messages": messages,
    }
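
The loop above only handles calls named web_search. Once a node binds several tools, a name-to-function registry keeps the dispatch in one place and guarantees that every requested call gets a reply. The following is a standalone sketch under stated assumptions: TOOL_REGISTRY, run_tool_calls, and web_search_stub are hypothetical names, the tool calls are plain dicts with the same name/args/id keys used above, and the results are plain dicts rather than ToolMessage objects so the example runs without LangChain installed.

```python
from typing import Callable, Dict, List


def web_search_stub(query: str) -> str:
    """Stand-in for the real tool; returns a canned string."""
    return f"[stub results for '{query}']"


# Hypothetical registry: tool name -> plain callable taking keyword arguments
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {"web_search": web_search_stub}


def run_tool_calls(tool_calls: List[dict]) -> List[dict]:
    """Execute each requested call and return one result record per call.

    Every call gets a record, even when the tool is unknown or raises,
    so no tool call id is left without a reply.
    """
    results = []
    for call in tool_calls:
        tool = TOOL_REGISTRY.get(call["name"])
        if tool is None:
            content = f"Error: unknown tool '{call['name']}'"
        else:
            try:
                content = tool(**call["args"])
            except Exception as exc:  # report the failure back to the model
                content = f"Error running {call['name']}: {exc}"
        results.append({"tool_call_id": call["id"], "content": content})
    return results


calls = [
    {"name": "web_search", "args": {"query": "LangGraph"}, "id": "call_1"},
    {"name": "calculator", "args": {"expression": "2+2"}, "id": "call_2"},
]
tool_results = run_tool_calls(calls)
for record in tool_results:
    print(record)
```

In the researcher node, each record would become a ToolMessage(content=..., tool_call_id=...) appended to messages before the follow-up LLM call.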

Wire It Into a Minimal Graph

# single_agent_graph.py
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from typing import TypedDict, Annotated
import operator

from researcher import researcher_node

class SimpleState(TypedDict):
    query: str
    search_results: str
    messages: Annotated[list, operator.add]

def build_single_agent():
    graph = StateGraph(SimpleState)
    graph.add_node("researcher", researcher_node)
    graph.set_entry_point("researcher")
    graph.add_edge("researcher", END)
    return graph.compile(checkpointer=MemorySaver())

if __name__ == "__main__":
    app = build_single_agent()
    config = {"configurable": {"thread_id": "test-1"}}
    result = app.invoke(
        {"query": "What are the main use cases for LangGraph in production?",
         "search_results": "", "messages": []},
        config=config
    )
    print(result["search_results"])

Run this and you’ll see the researcher agent call the search stub and return structured notes. One node, one tool, full state tracking.

Tutorial Part 2 — Extend to Multi-Agent: Orchestrator + Specialist Pattern

Now add the writer agent and an orchestrator that routes between them.

Add the Writer Agent

# writer.py
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.3)

WRITER_SYSTEM = """You are a technical writer. Given a research brief and source notes, 
write a clear, well-structured answer to the original question.

Requirements:
- Use the research notes as your only source — don't invent facts
- Structure with a brief intro, key points, and a conclusion
- Aim for clarity and precision over length
- Write for a technical audience"""


def writer_node(state: dict) -> dict:
    """Writer agent: turns research notes into polished output."""
    messages = [
        SystemMessage(content=WRITER_SYSTEM),
        HumanMessage(content=(
            f"Original question: {state['query']}\n\n"
            f"Research notes:\n{state['search_results']}\n\n"
            "Write the final answer now."
        )),
    ]
    response = llm.invoke(messages)
    return {
        "final_output": response.content,
        "messages": [response],
    }

Add the Orchestrator

The orchestrator decides what to do based on current state. For this pipeline it’s a simple sequence — but in production this is where you’d add conditional logic: skip research if the cache is warm, escalate to human review if quality is low, branch to a different specialist for different query types.

# orchestrator.py

def orchestrator_node(state: dict) -> dict:
    """Orchestrator: a pass-through step that makes no state changes.

    The decision about what runs next (researcher, writer, or END) is made
    by route_from_orchestrator below, which reads the same state.
    """
    # An empty list is a no-op for the operator.add reducer on messages
    return {"messages": []}


def route_from_orchestrator(state: dict) -> str:
    """Routing function: returns the name of the next node."""
    if not state.get("search_results"):
        return "researcher"
    if not state.get("final_output"):
        return "writer"
    return "end"
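
Because the routing function is plain Python with no LLM or graph dependency, it can be checked in isolation before the graph is assembled. A small table-driven sketch follows; the sample states are made up for illustration, and the function body is copied from above so the block runs on its own.

```python
def route_from_orchestrator(state: dict) -> str:
    """Routing function: returns the name of the next node."""
    if not state.get("search_results"):
        return "researcher"
    if not state.get("final_output"):
        return "writer"
    return "end"


# (state, expected route) pairs covering each branch
cases = [
    ({"query": "q", "search_results": "", "final_output": ""}, "researcher"),
    ({"query": "q", "search_results": "notes", "final_output": ""}, "writer"),
    ({"query": "q", "search_results": "notes", "final_output": "answer"}, "end"),
    ({"query": "q"}, "researcher"),  # missing keys behave like empty strings
]

routes = [route_from_orchestrator(state) for state, _ in cases]
for (state, expected), actual in zip(cases, routes):
    assert actual == expected, f"{state} routed to {actual}, expected {expected}"
print(routes)
```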

Assemble the Full Multi-Agent Graph

# multi_agent_graph.py
from typing import TypedDict, Annotated
import operator

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver

from orchestrator import orchestrator_node, route_from_orchestrator
from researcher import researcher_node
from writer import writer_node


class AgentState(TypedDict):
    query: str
    search_results: str
    final_output: str
    messages: Annotated[list, operator.add]


def build_multi_agent_graph():
    graph = StateGraph(AgentState)

    # Register nodes
    graph.add_node("orchestrator", orchestrator_node)
    graph.add_node("researcher", researcher_node)
    graph.add_node("writer", writer_node)

    # Entry point: always start at the orchestrator
    graph.set_entry_point("orchestrator")

    # Orchestrator uses conditional routing
    graph.add_conditional_edges(
        "orchestrator",
        route_from_orchestrator,
        {
            "researcher": "researcher",
            "writer": "writer",
            "end": END,
        },
    )

    # Specialists always return to the orchestrator after completing their work
    graph.add_edge("researcher", "orchestrator")
    graph.add_edge("writer", "orchestrator")

    return graph.compile(checkpointer=MemorySaver())


if __name__ == "__main__":
    app = build_multi_agent_graph()
    config = {"configurable": {"thread_id": "multi-agent-test-1"}}

    initial_state: AgentState = {
        "query": "How does LangGraph handle state persistence across multiple agent calls?",
        "search_results": "",
        "final_output": "",
        "messages": [],
    }

    result = app.invoke(initial_state, config=config)

    print("=== RESEARCH NOTES ===")
    print(result["search_results"])
    print("\n=== FINAL OUTPUT ===")
    print(result["final_output"])

The execution path is:

START → orchestrator → researcher → orchestrator → writer → orchestrator → END

Every specialist hands control back to the orchestrator when it finishes. The orchestrator inspects state and decides what’s still needed. This pattern scales: add a fact-checker node, an editor node, or a quality-evaluator node, and the orchestrator gets one more branch in route_from_orchestrator (plus a matching entry in the conditional-edge mapping and a return edge), while the existing nodes stay unchanged.
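
The execution path can be reproduced without LangGraph or an API key. The sketch below is a hand-rolled driver loop with stub nodes, meant only to show the control flow; it is not how LangGraph executes graphs. The END sentinel, NODES, ROUTE_MAP, and run are local names for this example, and state merging is a plain overwrite with no reducers.

```python
END = "__end__"  # local sentinel for this sketch, not imported from langgraph


def orchestrator(state: dict) -> dict:
    return {}  # pass-through: routing happens in route()


def researcher(state: dict) -> dict:
    return {"search_results": "stub research notes"}


def writer(state: dict) -> dict:
    return {"final_output": f"Answer based on: {state['search_results']}"}


def route(state: dict) -> str:
    if not state.get("search_results"):
        return "researcher"
    if not state.get("final_output"):
        return "writer"
    return "end"


NODES = {"orchestrator": orchestrator, "researcher": researcher, "writer": writer}
ROUTE_MAP = {"researcher": "researcher", "writer": "writer", "end": END}


def run(state: dict, max_steps: int = 10):
    """Walk the graph by hand and record which nodes ran, in order."""
    path = ["START"]
    current = "orchestrator"
    for _ in range(max_steps):
        path.append(current)
        state = {**state, **NODES[current](state)}  # merge the partial update
        if current == "orchestrator":
            current = ROUTE_MAP[route(state)]       # conditional edge
            if current == END:
                path.append("END")
                return state, path
        else:
            current = "orchestrator"                # fixed edge back
    raise RuntimeError("step limit reached without hitting END")


final_state, path = run({"query": "demo", "search_results": "", "final_output": ""})
print(" → ".join(path))
```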

Adding LangSmith Observability

Set three environment variables and every node execution appears as a trace in LangSmith. No code changes required.

export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=your_langsmith_api_key_here
export LANGCHAIN_PROJECT=multi-agent-tutorial

What you see in the LangSmith UI:

One top-level trace per app.invoke() call — the full graph run, with total latency and token cost
A child span per node — orchestrator → researcher → orchestrator → writer → orchestrator, each with its own latency, token count, and I/O
Tool call spans nested inside the researcher node — the web_search call logged with its input query and returned result
Full state at every boundary — the exact dict going into each node and the exact dict coming out

For production environments, use the @traceable decorator to add custom trace metadata:

from langsmith import traceable

@traceable(name="researcher-agent", tags=["production", "v2"])
def researcher_node(state: dict) -> dict:
    # ... same implementation
    pass

Tag by environment, version, and experiment name. In LangSmith you can then filter traces by tag, compare latency distributions across versions, and set up automated evaluators to score output quality on a sample of live runs.

Set LANGCHAIN_PROJECT to environment-specific values — multi-agent-prod, multi-agent-staging — so traces from different environments don’t mix. Cost reporting, error rate tracking, and latency monitoring are all per-project.
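
If you prefer to set these from Python rather than the shell, a small helper can derive the project name from the deployment environment. This is a sketch, not part of LangSmith: configure_tracing and its parameters are hypothetical, and it assumes the helper runs at process start, before anything that traces is imported or invoked.

```python
import os


def configure_tracing(environment: str, base_project: str = "multi-agent") -> str:
    """Set the tracing env vars for one deployment environment (hypothetical helper)."""
    project = f"{base_project}-{environment}"
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_PROJECT"] = project
    # LANGCHAIN_API_KEY is deliberately not set here; supply it from your
    # secret manager or deployment config instead of hard-coding it.
    return project


project_name = configure_tracing("staging")
print(project_name)
```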

Three Anti-Patterns to Avoid

1. The Monolithic Node

The most common mistake: putting research, synthesis, quality evaluation, and formatting into a single large node because “it’s simpler.” It isn’t simpler — it’s a procedure masquerading as a graph. You can’t trace individual steps, you can’t route conditionally between them, you can’t reuse any piece of that node in a different workflow, and when it fails you have no idea which part of the logic broke.

Rule of thumb: if a node is doing more than one distinct job, it’s two nodes.

2. Mutable State Side Effects

Nodes should only update state through their return value. Never do this:

# WRONG: mutating state directly
def bad_node(state: dict) -> dict:
    state["search_results"] += "extra data"  # Bypasses LangGraph's state management
    return {}

Direct mutation bypasses LangGraph’s merge logic, breaks checkpointing, and produces state that diverges from what LangSmith shows in traces. Always return a dict of updates and let LangGraph apply them.
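
For contrast, here is the same change written as a returned update. The sample state is made up for illustration; the point is that the incoming dict is left exactly as it was, and the new value travels only through the return value.

```python
def good_node(state: dict) -> dict:
    """Return the new value instead of mutating the incoming state."""
    return {"search_results": state["search_results"] + "extra data"}


before = {"query": "q", "search_results": "notes; "}
update = good_node(before)

print(before["search_results"])  # unchanged input
print(update)                    # the partial update handed back for merging
```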

3. No Iteration Guard in Routing Functions

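If your routing function has a bug, or if state never reaches the condition that triggers END, your graph keeps looping until it hits LangGraph’s recursion limit (25 steps by default) and raises GraphRecursionError. Until then, every pass makes real LLM calls, which runs up your API bill and can trip rate limits.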

# Always add an iteration ceiling
MAX_ITERATIONS = 5

def route_from_orchestrator(state: dict) -> str:
    if state.get("iterations", 0) >= MAX_ITERATIONS:
        return "end"  # Hard stop
    if not state.get("search_results"):
        return "researcher"
    if not state.get("final_output"):
        return "writer"
    return "end"

Include an iterations counter in your state schema and increment it in the orchestrator node. One guard prevents the entire class of runaway-loop failures.
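
Here is one way the counter and the guard might fit together, shown with a deliberately broken writer so the guard actually fires. It is a standalone sketch: run_until_end is a hypothetical driver loop standing in for the graph, and broken_writer simulates a node that never sets final_output. In the real graph you would also add iterations: int to AgentState; it needs no reducer because each update overwrites the previous count.

```python
MAX_ITERATIONS = 5


def orchestrator_node(state: dict) -> dict:
    """Count each pass through the orchestrator."""
    return {"iterations": state.get("iterations", 0) + 1}


def route_from_orchestrator(state: dict) -> str:
    if state.get("iterations", 0) >= MAX_ITERATIONS:
        return "end"  # Hard stop
    if not state.get("search_results"):
        return "researcher"
    if not state.get("final_output"):
        return "writer"
    return "end"


def broken_writer(state: dict) -> dict:
    return {}  # simulated bug: final_output is never produced


def run_until_end(state: dict) -> dict:
    """Tiny driver loop standing in for the compiled graph."""
    while True:
        state = {**state, **orchestrator_node(state)}
        next_node = route_from_orchestrator(state)
        if next_node == "end":
            return state
        if next_node == "researcher":
            state = {**state, "search_results": "stub notes"}
        else:
            state = {**state, **broken_writer(state)}


final = run_until_end({"query": "q", "search_results": "", "final_output": ""})
print(final["iterations"], repr(final["final_output"]))
```

The run stops on the fifth orchestrator pass with an empty final_output, which is a signal worth logging or surfacing to the caller rather than returning silently.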

Ready to deploy agents like this at scale?

The architecture in this tutorial is the same stack we deploy for production clients — LangGraph graphs, LangSmith observability, persistent checkpointers, and real tools. Our Diagnostic Sprint gets your team from proof-of-concept to production-ready in two weeks.

Frequently Asked Questions

What is LangGraph and how is it different from LangChain?

LangGraph is a stateful orchestration layer built on top of LangChain. LangChain provides the primitives — LLM wrappers, tool interfaces, prompt templates — while LangGraph provides the control-flow layer: a directed graph model with typed state, conditional routing, and checkpointing. You use LangChain components (like ChatOpenAI and @tool) inside LangGraph nodes. Think of LangChain as the component library and LangGraph as the system architecture framework that orchestrates those components.

How does LangGraph state management work?

LangGraph state is a TypedDict schema shared across all nodes. When a node returns a dict of updates, LangGraph merges those updates back into the current state before passing it to the next node. For fields with reducers (like Annotated[list, operator.add]), updates are accumulated rather than overwritten. For scalar fields, the node’s return value replaces the previous value. When a checkpointer is configured, it persists a snapshot of state after every step of graph execution (a super-step, which may run several nodes in parallel), enabling resume-from-failure and human-in-the-loop pause points.

Can I run LangGraph agents in production without LangSmith?

Yes — LangSmith is optional. LangGraph runs without any observability configured. However, for any system with more than two nodes, conditional routing, or real users, debugging without traces is significantly harder. LangSmith’s free tier handles substantial volume, and the integration is three environment variables with no code changes. The cost of adding it is low enough that there’s no reason to skip it in production.

What’s the difference between a single-agent loop and a multi-agent LangGraph system?

A single-agent loop is one LLM making decisions and calling tools repeatedly until a stopping condition is met. It’s a cycle in graph terms: one node with edges back to itself. A multi-agent system has multiple distinct nodes (agents), each with a specific role, prompt, and potentially different tools, coordinating through shared state. Multi-agent systems are better when: (a) different parts of the task require genuinely different capabilities or prompts, (b) you want independent traceability for each agent’s work, or (c) you need to route different inputs to the appropriate specialist without one agent needing to do everything.

How do I add memory to a LangGraph agent across sessions?

Replace MemorySaver with a persistent checkpointer. SqliteSaver writes state to a local SQLite database and requires no additional infrastructure, which makes it suitable for single-server deployments. For distributed systems, use the Postgres checkpointer. Both ship as separate packages (langgraph-checkpoint-sqlite and langgraph-checkpoint-postgres). Each conversation or workflow run gets a thread_id in the config dict; the checkpointer uses that ID to store and retrieve state across restarts, deployments, and arbitrarily long pauses. Long-term semantic memory (user facts, past decisions) is a separate concern handled by a vector store retrieval step in the relevant node, not the checkpointer itself.
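
The idea of state keyed by thread_id can be illustrated with the standard library alone. The sketch below is not the SqliteSaver API and stores far less than a real checkpointer does; TinyCheckpointStore is a hypothetical class that only shows why a stable thread_id lets a later process pick up where an earlier one stopped.

```python
import json
import os
import sqlite3
import tempfile


class TinyCheckpointStore:
    """Hypothetical thread_id -> state store; illustrates the idea only."""

    def __init__(self, path: str):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints "
            "(thread_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )
        self.conn.commit()

    def save(self, thread_id: str, state: dict) -> None:
        # Keep only the latest snapshot per thread
        self.conn.execute(
            "INSERT OR REPLACE INTO checkpoints (thread_id, state) VALUES (?, ?)",
            (thread_id, json.dumps(state)),
        )
        self.conn.commit()

    def load(self, thread_id: str):
        row = self.conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ?", (thread_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def close(self) -> None:
        self.conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "checkpoints.db")

# First "process": save state, then shut down
store = TinyCheckpointStore(db_path)
store.save("user-42", {"query": "q", "search_results": "notes"})
store.close()

# Second "process": reopen the same file and resume by thread_id
reopened = TinyCheckpointStore(db_path)
restored = reopened.load("user-42")
missing = reopened.load("unknown-thread")
reopened.close()
print(restored, missing)
```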