This experiment explores a multi-agent architecture where specialized AI agents work together under the guidance of a central supervisor. The supervisor acts as an intelligent coordinator, managing communication between agents and strategically delegating tasks based on each agent's expertise and the specific requirements of the problem at hand. In this experiment, you'll create a multi-agent system powered by LangGraph with two agents: a research expert and a math expert. The research agent performs web searches with Tavily; visit the Tavily website to get an API key.

Start the FlexAI endpoints

Create the FlexAI secret that contains your HF token in order to access the inference models:
# Enter your HF token value when prompted
flexai secret create hf-token
Export your Tavily API key:
export TAVILY_API_KEY=<TAVILY_API_KEY>
Start the FlexAI endpoint of the LLM:
LLM_INFERENCE_NAME=qwen-llm
export LLM_MODEL_NAME=Qwen/Qwen2.5-32B-Instruct
flexai inference serve $LLM_INFERENCE_NAME --hf-token-secret hf-token -- --model=$LLM_MODEL_NAME --enable-auto-tool-choice --tool-call-parser hermes --max-model-len 16384
Store the returned Inference Endpoint API key and Endpoint URL:
export LLM_API_KEY=<INFERENCE_ENDPOINT_API_KEY>
export LLM_URL=$(flexai inference inspect $LLM_INFERENCE_NAME --json | jq .config.endpointUrl -r)
You’ll notice the last export line uses the jq tool to extract the value of endpointUrl from the JSON output of the inspect command. If you don’t have it already, you can get jq from its official website: https://jqlang.org/
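For illustration, here is how that jq filter behaves on a sample of the inspect output (the JSON below is a made-up example; your real endpoint URL will differ):

```shell
# Hypothetical inspect output piped through the same jq filter
echo '{"config":{"endpointUrl":"https://inference.example.com/qwen-llm"}}' \
  | jq -r .config.endpointUrl
# → https://inference.example.com/qwen-llm
```

The `-r` flag prints the raw string without surrounding quotes, which is what you want when storing the value in a shell variable.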

Setup

1

Navigate to the experiment directory

cd code/multi-agent/
2

Install the required dependencies

pip install -r requirements.txt
3

Run the application

python main.py
4

Interact with the Multi-Agent System

When prompted, enter your question. The system will automatically route it to the appropriate agents (the research and math experts).

Research + Math question:
In 2025, how old are Trump and Macron? Also sum their ages.
Expected Output:
In 2025, Donald Trump will be 79 years old and Emmanuel Macron will be 48 years old.
The sum of their ages in 2025 will be 127 years.

Code

requirements.txt

langgraph>=0.6.6
langgraph-supervisor>=0.0.29
langchain-tavily>=0.2.11
langchain[openai]>=0.3.25

main.py

# Copyright (c) 2025 FlexAI
# This file is part of the FlexAI Experiments repository.
# SPDX-License-Identifier: MIT

import getpass
import os

from agents.supervisor import supervisor
from langchain_core.messages import convert_to_messages


def _set_if_undefined(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"Please provide your {var}")


_set_if_undefined("LLM_API_KEY")
_set_if_undefined("LLM_URL")
_set_if_undefined("LLM_MODEL_NAME")
_set_if_undefined("TAVILY_API_KEY")


def pretty_print_message(message, indent=False):
    pretty_message = message.pretty_repr(html=True)
    if not indent:
        print(pretty_message)
        return

    indented = "\n".join("\t" + c for c in pretty_message.split("\n"))
    print(indented)


def pretty_print_messages(update, last_message=False):
    is_subgraph = False
    if isinstance(update, tuple):
        ns, update = update
        # skip parent graph updates in the printouts
        if len(ns) == 0:
            return

        graph_id = ns[-1].split(":")[0]
        print(f"Update from subgraph {graph_id}:")
        print("\n")
        is_subgraph = True

    for node_name, node_update in update.items():
        update_label = f"Update from node {node_name}:"
        if is_subgraph:
            update_label = "\t" + update_label

        print(update_label)
        print("\n")

        messages = convert_to_messages(node_update["messages"])
        if last_message:
            messages = messages[-1:]

        for m in messages:
            pretty_print_message(m, indent=is_subgraph)
        print("\n")


while True:
    user_question = input("Please enter your question: ")

    for chunk in supervisor.stream(
        {
            "messages": [
                {
                    "role": "user",
                    "content": user_question,
                }
            ]
        },
    ):
        if not ("supervisor" in chunk and chunk["supervisor"] is None):
            pretty_print_messages(chunk, last_message=True)

    # The final streamed chunk comes from the supervisor and carries the full message history
    final_message_history = chunk["supervisor"]["messages"]
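The `_set_if_undefined` helper above only prompts for a variable when it is missing from the environment. A standalone sketch of that behavior, with `getpass` swapped for a fixed value so it runs non-interactively (`DEMO_VAR` is a made-up name for this demo):

```python
import os


def _set_if_undefined(var: str) -> None:
    # Same guard as in main.py: only set the variable if it is unset or empty
    if not os.environ.get(var):
        os.environ[var] = "dummy-value"  # main.py uses getpass.getpass() here


os.environ.pop("DEMO_VAR", None)  # ensure it starts unset
_set_if_undefined("DEMO_VAR")
print(os.environ["DEMO_VAR"])  # → dummy-value

os.environ["DEMO_VAR"] = "already-set"
_set_if_undefined("DEMO_VAR")
print(os.environ["DEMO_VAR"])  # → already-set (existing values are preserved)
```

This is why exporting `TAVILY_API_KEY` (and the `LLM_*` variables) beforehand skips the interactive prompts entirely.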

agents/supervisor.py

# Copyright (c) 2025 FlexAI
# This file is part of the FlexAI Experiments repository.
# SPDX-License-Identifier: MIT

import os

from langchain_openai import ChatOpenAI
from langgraph_supervisor import create_supervisor

from .math import math_agent
from .web_search import research_agent

llm = ChatOpenAI(
    model_name=os.environ.get("LLM_MODEL_NAME"),
    openai_api_key=os.environ.get("LLM_API_KEY"),
    openai_api_base=os.environ.get("LLM_URL") + "/v1",
)

supervisor = create_supervisor(
    model=llm,
    agents=[research_agent, math_agent],
    prompt=(
        "You are a supervisor managing two agents:\n"
        "- a research agent. Assign research-related tasks to this agent\n"
        "- a math agent. Assign math-related tasks to this agent\n"
        "Assign work to one agent at a time, do not call agents in parallel.\n"
        "Do not do any work yourself."
    ),
    add_handoff_back_messages=True,
    output_mode="full_history",
).compile()

agents/math.py

# Copyright (c) 2025 FlexAI
# This file is part of the FlexAI Experiments repository.
# SPDX-License-Identifier: MIT

import os

from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI(
    model_name=os.environ.get("LLM_MODEL_NAME"),
    openai_api_key=os.environ.get("LLM_API_KEY"),
    openai_api_base=os.environ.get("LLM_URL") + "/v1",
)


def add(a: float, b: float):
    """Add two numbers."""
    return a + b


def multiply(a: float, b: float):
    """Multiply two numbers."""
    return a * b


def divide(a: float, b: float):
    """Divide two numbers."""
    return a / b


math_agent = create_react_agent(
    model=llm,
    tools=[add, multiply, divide],
    prompt=(
        "You are a math agent.\n\n"
        "INSTRUCTIONS:\n"
        "- Assist ONLY with math-related tasks\n"
        "- After you're done with your tasks, respond to the supervisor directly\n"
        "- Respond ONLY with the results of your work, do NOT include ANY other text."
    ),
    name="math_agent",
)
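The math agent's tools are ordinary Python functions, so you can sanity-check them directly. They are re-declared below so the snippet is standalone; the numbers mirror the example question above:

```python
def add(a: float, b: float):
    """Add two numbers."""
    return a + b


def multiply(a: float, b: float):
    """Multiply two numbers."""
    return a * b


# Trump (79) + Macron (48), as in the expected output above
print(add(79, 48))      # → 127
print(multiply(79, 2))  # → 158
```

When the agent runs, the LLM emits a tool call and LangGraph executes these same functions, so verifying them in isolation is a quick way to rule out tool bugs when debugging agent answers.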

agents/web_search.py

# Copyright (c) 2025 FlexAI
# This file is part of the FlexAI Experiments repository.
# SPDX-License-Identifier: MIT

import os

from langchain_openai import ChatOpenAI
from langchain_tavily import TavilySearch
from langgraph.prebuilt import create_react_agent

web_search = TavilySearch(max_results=3)


llm = ChatOpenAI(
    model_name=os.environ.get("LLM_MODEL_NAME"),
    openai_api_key=os.environ.get("LLM_API_KEY"),
    openai_api_base=os.environ.get("LLM_URL") + "/v1",
)

research_agent = create_react_agent(
    model=llm,
    tools=[web_search],
    prompt=(
        "You are a research agent.\n\n"
        "INSTRUCTIONS:\n"
        "- Assist ONLY with research-related tasks, DO NOT do any math\n"
        "- After you're done with your tasks, respond to the supervisor directly\n"
        "- Respond ONLY with the results of your work, do NOT include ANY other text."
    ),
    name="research_agent",
)