Build span execution trees for multi-step AI agents

Single-step agents are easy to debug. Multi-step agents — where an LLM reasons, triggers a tool, then reasons again based on the result — are not. TracePilot AI builds a visual execution tree from your spans by linking each step to its parent using the spanId returned by the previous step. You can see the full chain of decisions in the dashboard and fork any step to investigate a failure.

How the tree is built

Every wrapOpenAI and wrapToolCall call returns a spanId. Pass that spanId as parentSpanId to the next call that logically follows from it. TracePilot uses these links to render a nested tree rather than a flat list. The stepOrder parameter controls the display order of sibling spans under the same parent. Always pass it — the dashboard sorts siblings by stepOrder, not by wall-clock time, so the tree stays readable even when steps run concurrently.
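To make the linking rule concrete, here is a minimal sketch of how a flat list of spans could be assembled into a nested tree. The `Span` shape and the `buildTree` and `renderTree` helpers are hypothetical and written only for illustration; they are not part of the TracePilot SDK, and the dashboard's actual implementation may differ.

```typescript
// Hypothetical span record: only the fields needed to build the tree.
interface Span {
  spanId: string;
  parentSpanId?: string;
  stepOrder: number;
  name: string;
}

interface SpanNode extends Span {
  children: SpanNode[];
}

// Attach each span to its parent, then sort siblings by stepOrder.
function buildTree(spans: Span[]): SpanNode[] {
  const byId: Record<string, SpanNode> = {};
  const nodes: SpanNode[] = [];
  for (const span of spans) {
    const node: SpanNode = { ...span, children: [] };
    byId[span.spanId] = node;
    nodes.push(node);
  }

  const roots: SpanNode[] = [];
  for (const node of nodes) {
    const parent: SpanNode | undefined =
      node.parentSpanId !== undefined ? byId[node.parentSpanId] : undefined;
    // Spans without a parent (or with an unknown parent) become roots.
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node);
    }
  }

  const sortSiblings = (siblings: SpanNode[]): void => {
    siblings.sort((a, b) => a.stepOrder - b.stepOrder);
    for (const sibling of siblings) sortSiblings(sibling.children);
  };
  sortSiblings(roots);
  return roots;
}

// Render the tree as indented lines, two spaces per nesting level.
function renderTree(nodes: SpanNode[], depth: number = 0): string[] {
  const indent = new Array(depth + 1).join('  ');
  const lines: string[] = [];
  for (const node of nodes) {
    lines.push(indent + node.stepOrder + '. ' + node.name);
    for (const line of renderTree(node.children, depth + 1)) lines.push(line);
  }
  return lines;
}

// Spans listed out of order, as they might arrive when steps run concurrently.
const spans: Span[] = [
  { spanId: 'c', parentSpanId: 'a', stepOrder: 3, name: 'calculate' },
  { spanId: 'a', stepOrder: 1, name: 'plan' },
  { spanId: 'b', parentSpanId: 'a', stepOrder: 2, name: 'web-search' },
];

console.log(renderTree(buildTree(spans)).join('\n'));
// 1. plan
//   2. web-search
//   3. calculate
```

Sorting by stepOrder rather than arrival order is what keeps the two sibling spans in a stable position in this sketch, regardless of which one was recorded first.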

Full example: research agent

The agent below runs three steps: an LLM call produces a research plan, a web search tool executes the plan, and a second LLM call synthesizes the search results into a final answer. Each step is linked to the one before it.

import { TracePilot } from 'tracepilot-sdk';
import OpenAI from 'openai';

const tp = new TracePilot('tp_live_YOUR_KEY');
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function researchAgent(query: string): Promise<string> {
  await tp.startTrace('research-agent');

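  // Typed explicitly so the array satisfies the OpenAI SDK's message union type.
  const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [{ role: 'user', content: query }];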

  // ── Step 1: Initial reasoning ──────────────────────────────────────────────
  // No parentSpanId — this is the root span of the execution tree.
  const { result: plan, spanId: planSpanId } = await tp.wrapOpenAI(
    () => openai.chat.completions.create({ model: 'gpt-4o', messages }),
    messages,
    undefined, // no parent
    1          // stepOrder 1 — displayed first in the tree
  );

  // ── Step 2: Web search tool ────────────────────────────────────────────────
  // parentSpanId = planSpanId links this span as a child of step 1.
  // The dashboard shows this span nested beneath the reasoning span.
  // webSearch stands for your own search function (not shown here).
  const { result: searchResult, spanId: searchSpanId } = await tp.wrapToolCall(
    'web-search',
    () => webSearch(plan.choices[0].message.content ?? ''),
    planSpanId, // parent is the LLM reasoning span
    2           // stepOrder 2 — displayed second
  );

  // ── Step 3: Final synthesis ────────────────────────────────────────────────
  // parentSpanId = searchSpanId links this span as a child of step 2.
  // The full tree reads: plan → search → answer.
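  // The search result is passed back as a user message. A 'tool' role message
  // needs a tool_call_id matching a tool call in the preceding assistant
  // message, and step 1 did not request one.
  const followUp: OpenAI.Chat.ChatCompletionMessageParam[] = [
    ...messages,
    plan.choices[0].message,
    { role: 'user', content: `Search results:\n${JSON.stringify(searchResult)}` }
  ];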

  const { result: answer } = await tp.wrapOpenAI(
    () => openai.chat.completions.create({ model: 'gpt-4o', messages: followUp }),
    followUp,
    searchSpanId, // parent is the tool span
    3             // stepOrder 3 — displayed third
  );

  return answer.choices[0].message.content ?? '';
}

What the execution tree looks like

After this agent runs, the dashboard renders the following tree:

research-agent (trace)
└── Step 1 · gpt-4o · plan (root span)
    └── Step 2 · web-search · tool call (child of step 1)
        └── Step 3 · gpt-4o · synthesis (child of step 2)

Each node in the tree is clickable. Select any span to inspect its input messages, output, token count, latency, and cost.

Always pass stepOrder — even when steps run sequentially. The dashboard uses it to sort siblings within the same parent, which keeps the tree readable if steps arrive out of order due to network or async timing differences.

Understanding each `parentSpanId` linkage

The three linkages in the example create a strict chain of causality:

| Step | `parentSpanId` | What it means |
| --- | --- | --- |
| Step 1 (plan) | `undefined` | Root of the tree, no parent |
| Step 2 (search) | `planSpanId` | The search was triggered by the plan |
| Step 3 (answer) | `searchSpanId` | The final answer was shaped by the search results |

If you pass the wrong parentSpanId — for example, linking step 3 to planSpanId instead of searchSpanId — the tree still renders, but the relationship between the search and the answer becomes invisible. Be deliberate about which span is the logical cause of the next one.
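The effect of a wrong link can be seen by walking the parent links from the final span back to the root. The sketch below is illustrative only: the `ParentLinks` type and the `causalChain` helper are hypothetical names, and span names stand in for real span IDs.

```typescript
// Maps each span name to the name of its parent (undefined for the root).
type ParentLinks = Record<string, string | undefined>;

// Follow parent links from `start` up to the root and return the path
// in root-first order. Assumes the links contain no cycles.
function causalChain(links: ParentLinks, start: string): string[] {
  const chain: string[] = [];
  let current: string | undefined = start;
  while (current !== undefined) {
    chain.unshift(current);
    current = links[current];
  }
  return chain;
}

// Linkage from the example: answer is a child of search.
const correctLinks: ParentLinks = { plan: undefined, search: 'plan', answer: 'search' };

// Mislinked variant: answer is attached directly to plan.
const mislinked: ParentLinks = { plan: undefined, search: 'plan', answer: 'plan' };

console.log(causalChain(correctLinks, 'answer').join(' -> ')); // plan -> search -> answer
console.log(causalChain(mislinked, 'answer').join(' -> '));    // plan -> answer
```

In the mislinked case the search span still exists as a child of the plan, but it no longer appears on the path that leads to the answer.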

Forking a failing step

If step 3 returns a bad answer, open the dashboard, find the step 3 span, and click Fork & Rerun. Edit the followUp messages directly in the UI — for example, adjust the tool output passed as context — and run the span again. The new result appears immediately without redeploying your agent.

Branching trees (fan-out)

When an agent runs multiple tools in parallel from a single LLM call, give each tool span the same parentSpanId (the LLM span) and unique stepOrder values. They appear as siblings in the tree.

const { result: plan, spanId: planSpanId } = await tp.wrapOpenAI(
  () => openai.chat.completions.create({ model: 'gpt-4o', messages }),
  messages,
  undefined,
  1
);

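// Run two tools in parallel, both children of the plan span.
// Each element resolves to a { result, spanId } pair, as in the example above.
// query and expr are assumed to be defined earlier in your agent.
const [searchCall, calcCall] = await Promise.all([
  tp.wrapToolCall('web-search', () => webSearch(query), planSpanId, 2),
  tp.wrapToolCall('calculate', () => calculate(expr), planSpanId, 3),
]);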

The dashboard renders:

Step 1 · plan
├── Step 2 · web-search
└── Step 3 · calculate

Deeply nested trees

You can nest spans as many levels deep as your agent requires. Each call returns a spanId you can use as the parentSpanId for the next level. There is no hard limit on tree depth.
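One way to build a deep chain is to carry the most recent spanId forward in a loop. The sketch below uses a hypothetical stub tracer in place of the SDK so that it runs on its own; `createStubTracer`, its `wrapStep` signature, and the generated `span-N` IDs are assumptions made for illustration. In real code, the stub call would be replaced by wrapToolCall or wrapOpenAI.

```typescript
interface RecordedSpan {
  spanId: string;
  parentSpanId: string | undefined;
  stepOrder: number;
  name: string;
}

interface StepResult {
  result: string;
  spanId: string;
}

// Hypothetical stand-in for the SDK: runs fn, records a span, and returns
// { result, spanId } in the same shape the examples above destructure.
function createStubTracer() {
  const spans: RecordedSpan[] = [];
  async function wrapStep(
    name: string,
    fn: () => Promise<string>,
    parentSpanId: string | undefined,
    stepOrder: number
  ): Promise<StepResult> {
    const result = await fn();
    const spanId = 'span-' + (spans.length + 1);
    spans.push({ spanId, parentSpanId, stepOrder, name });
    return { result, spanId };
  }
  return { spans, wrapStep };
}

// Run the steps in sequence, nesting each one under the previous step.
async function runChain(stepNames: string[]): Promise<RecordedSpan[]> {
  const tracer = createStubTracer();
  let parentSpanId: string | undefined = undefined; // the first step is the root
  let stepOrder = 1;
  for (const name of stepNames) {
    const step: StepResult = await tracer.wrapStep(
      name,
      async () => name + ' done',
      parentSpanId,
      stepOrder
    );
    parentSpanId = step.spanId; // the next step becomes a child of this one
    stepOrder += 1;
  }
  return tracer.spans;
}

runChain(['plan', 'search', 'extract', 'answer']).then((spans) => {
  for (const s of spans) {
    console.log(s.stepOrder + '. ' + s.name + ' (parent: ' + (s.parentSpanId ?? 'none') + ')');
  }
});
```

Each iteration adds one level of depth, so a list of four step names produces a tree four levels deep.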

Mixing wrapOpenAI and wrapToolCall in the same tree

parentSpanId accepts the spanId from either wrapOpenAI or wrapToolCall. You can freely interleave LLM spans and tool spans as parents and children in the same tree.
