OpenAI agent infinite loops are silent budget killers. Your agent retries, re-calls tools, or re-enters its own memory, and nothing in your logs tells you why.
Running npm install tracepilot-openai gives you structured span traces that expose every loop iteration, every tool re-invocation, and every token spent, so you can replay the exact execution that went wrong.
Why OpenAI agent infinite loops happen
Infinite loops in LLM agents aren’t random. They fall into three predictable patterns:
Recursive tool calls
The model calls a tool, receives output it doesn’t know how to use, and calls the same tool again with a slightly different input. Without a hard exit condition, this repeats until you hit a rate limit or your budget runs out.
Retry loops
Error-handling logic retries failed completions unconditionally. If the underlying cause (bad context, token overflow, invalid tool schema) isn’t fixed, every retry triggers the same failure, which triggers the same retry.
Memory corruption loops
The agent appends its own output back to the messages array on every iteration. After several turns, the context window fills with redundant or contradictory content. The model starts producing low-quality outputs, which trigger more retries.
Installation
npm: npm install tracepilot-openai
yarn: yarn add tracepilot-openai
pnpm: pnpm add tracepilot-openai
Instrument your OpenAI agent
Wrap every completion call with TracePilot. Each iteration of your agent loop becomes a numbered span, so you can see exactly when and why the loop started repeating.
Loop detection with span tracing
With TracePilot active, open your dashboard after a suspicious run. Look for:

| Signal | What it means |
|---|---|
| Same toolName repeated 3+ times | Recursive tool call loop |
| Rising token count per span, same prompt | Memory bloat loop |
| Identical stepOrder groups repeating | Retry loop |
| Span count > MAX_ITERATIONS | Guard not firing correctly |
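
If you export span data, three of the signals in the table can also be checked in code. The sketch below is only an illustration: the Span shape (including the promptHash and totalTokens fields), the signal names, and the default limit of 10 are assumptions made for this example, not TracePilot's actual schema or API. The stepOrder check is left out because it depends on how your spans are grouped.

```typescript
// Hypothetical span shape for illustration; not TracePilot's actual schema.
interface Span {
  stepOrder: number;
  toolName?: string;   // present only when the span is a tool call
  promptHash: string;  // assumed: a stable hash of the prompt text
  totalTokens: number;
}

function detectLoopSignals(spans: Span[], maxIterations: number = 10): string[] {
  const signals: string[] = [];

  // Signal: the same toolName appears 3 or more times in one trace.
  const toolCounts: Record<string, number> = {};
  for (const span of spans) {
    if (span.toolName) {
      toolCounts[span.toolName] = (toolCounts[span.toolName] || 0) + 1;
    }
  }
  for (const tool of Object.keys(toolCounts)) {
    if ((toolCounts[tool] || 0) >= 3) signals.push("recursive-tool-call:" + tool);
  }

  // Signal: token count rises across 3+ consecutive spans that share a prompt.
  let prev: Span | undefined;
  let risingRun = 1;
  let bloat = false;
  for (const cur of spans) {
    if (prev && cur.promptHash === prev.promptHash && cur.totalTokens > prev.totalTokens) {
      risingRun++;
    } else {
      risingRun = 1;
    }
    if (risingRun >= 3) bloat = true;
    prev = cur;
  }
  if (bloat) signals.push("memory-bloat");

  // Signal: more spans than the loop guard should ever allow.
  if (spans.length > maxIterations) signals.push("guard-not-firing");

  return signals;
}
```

Running a check like this over recent traces is a cheap way to flag suspicious runs before opening each one in the dashboard.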
How to replay an infinite loop execution
Once you’ve captured the looping trace, you don’t need to reproduce it locally. Fork the span where the loop started and rerun with a modified prompt or context.
Find the looping trace
Open tracepilotai.com/dashboard. Filter by high span count or high token usage — loops produce both. Select the trace.
Identify the entry point
Expand the span tree. Find the first span where the repeated tool call appears. That’s your entry point for the fix.
Fork the span
Click Fork & Rerun on the entry-point span. You’ll see the exact messages array and tool definitions the model received.
Apply the fix
Edit the prompt to add an explicit stopping instruction, remove the ambiguous tool, or reduce the context size. Click Run.

Prevent burning your OpenAI API budget
Every iteration of an infinite loop costs tokens. A 10-step loop on gpt-4o with 1k input tokens per step costs roughly $0.50 per runaway execution. At scale, that’s a real incident.
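
The real cost of a runaway loop depends on the model's current price and on how much context each step resends. As a rough way to estimate it under your own assumptions, the sketch below sums input tokens across steps. The function name, its parameters, and the rate of 10 USD per million tokens are placeholders for illustration, not quoted prices.

```typescript
// Estimates the input-token cost of a loop. Output tokens are ignored.
// pricePerMillionInputTokensUsd is a placeholder: use your model's current rate.
function estimateLoopInputCostUsd(
  steps: number,
  baseInputTokens: number,
  tokensAddedPerStep: number,
  pricePerMillionInputTokensUsd: number
): number {
  let totalTokens = 0;
  for (let step = 0; step < steps; step++) {
    // Each step resends the base prompt plus everything appended so far.
    totalTokens += baseInputTokens + step * tokensAddedPerStep;
  }
  return (totalTokens / 1e6) * pricePerMillionInputTokensUsd;
}

const PLACEHOLDER_RATE_USD = 10; // hypothetical rate, not a real price

// Flat context: 10 steps x 1,000 tokens = 10,000 input tokens.
const flatCost = estimateLoopInputCostUsd(10, 1000, 0, PLACEHOLDER_RATE_USD);

// Growing context: each step appends 1,000 tokens, so 55,000 input tokens in total.
const growingCost = estimateLoopInputCostUsd(10, 1000, 1000, PLACEHOLDER_RATE_USD);

console.log({ flatCost, growingCost });
```

When the agent keeps appending to its context, total tokens grow quadratically with the number of steps, so a memory bloat loop costs far more than the same number of flat steps.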
TracePilot gives you three layers of protection:
- Span count alerts — set a threshold in the dashboard. Get notified when any trace exceeds N spans.
- Cost-per-trace visibility — see the total token spend for every run before it compounds.
- Replay without re-execution — fix the bug in the dashboard instead of running the agent again.
Common loop fixes
Recursive tool calls
Add a MAX_ITERATIONS guard and pass { parallel_tool_calls: false } to force sequential tool execution. In the Fork & Rerun view, add an explicit instruction like "If you have already called search-web, do not call it again." to the system prompt.
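
A minimal sketch of such a guard follows. To keep it runnable without network access, callModel is a hypothetical synchronous stand-in for a real completion call (which would be async and could pass parallel_tool_calls: false); the ModelTurn and LoopResult types are likewise assumptions for this example. It also stops early when the model repeats an identical tool call.

```typescript
const MAX_ITERATIONS = 5; // assumed limit; tune for your agent

type ModelTurn =
  | { kind: "tool_call"; toolName: string; args: string }
  | { kind: "final"; content: string };

// Hypothetical stand-in for a real (async) completion call.
type CallModel = (iteration: number) => ModelTurn;

interface LoopResult {
  status: "completed" | "max_iterations" | "repeated_tool_call";
  iterations: number;
  content?: string;
}

function runAgent(callModel: CallModel, maxIterations: number = MAX_ITERATIONS): LoopResult {
  const seenCalls: Record<string, boolean> = {};

  for (let i = 1; i <= maxIterations; i++) {
    const turn = callModel(i);

    if (turn.kind === "final") {
      return { status: "completed", iterations: i, content: turn.content };
    }

    // Same tool with the same arguments again: the model is going in circles.
    const key = turn.toolName + ":" + turn.args;
    if (seenCalls[key]) {
      return { status: "repeated_tool_call", iterations: i };
    }
    seenCalls[key] = true;

    // A real agent would execute the tool here and append its output to messages.
  }

  // Hard exit: the loop can never run more than maxIterations times.
  return { status: "max_iterations", iterations: maxIterations };
}
```

Returning a status instead of throwing makes it easy to record why the loop ended, which is the detail you want to see when you inspect the run later.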
Unconditional retry loops
Classify errors before retrying. Only retry on transient errors such as rate limits or an overloaded server (429, 503). For context-window overflows (400) or invalid tool schemas, fix the root cause, because retrying the same request will always fail.
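
One way to classify before retrying is sketched below. The Attempt type, the retryTransient helper, and the exact list of retryable status codes are assumptions for this example; the attempt callback is a synchronous stand-in for a real API call, and a real implementation would also wait with backoff between attempts.

```typescript
// Status codes treated as transient here are an assumption; adjust for your provider.
const RETRYABLE_STATUS = [429, 503];

type Attempt<T> =
  | { kind: "ok"; value: T }
  | { kind: "error"; status: number; message: string };

function isRetryable(status: number): boolean {
  return RETRYABLE_STATUS.indexOf(status) !== -1;
}

function retryTransient<T>(
  attempt: () => Attempt<T>,
  maxRetries: number = 3
): { result: Attempt<T>; attempts: number } {
  let attempts = 0;
  while (true) {
    attempts++;
    const result = attempt();

    if (result.kind === "ok") return { result, attempts };

    // Non-transient failure (for example a 400): the same request will fail again.
    if (!isRetryable(result.status)) return { result, attempts };

    // Transient failure, but the retry budget is spent.
    if (attempts > maxRetries) return { result, attempts };

    // A real implementation would sleep with exponential backoff here.
  }
}
```

The retry budget bounds the loop even for transient errors, so a long outage cannot turn into an unbounded retry storm.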
Memory corruption / growing context
Implement a sliding window: keep only the last N messages plus the system prompt. Use tp.wrapToolCall to trace your memory pruning step so you can inspect what was dropped.
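
A sliding window can be sketched as a pure function that returns both the kept and the dropped messages, so the dropped part is available to trace or log. The ChatMessage type and the pruneMessages name are assumptions for this example, and the tracing call itself is not shown because its signature is not given here.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

function pruneMessages(
  messages: ChatMessage[],
  keepLast: number
): { kept: ChatMessage[]; dropped: ChatMessage[] } {
  // System prompts are always kept, whatever their age.
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");

  let cut = Math.max(0, rest.length - keepLast);

  // Do not start the window on a tool result whose originating
  // assistant message was dropped; drop the orphaned result as well.
  while (cut < rest.length && rest[cut]?.role === "tool") {
    cut++;
  }

  return {
    kept: system.concat(rest.slice(cut)),
    dropped: rest.slice(0, cut),
  };
}
```

The orphan check matters because a tool result sent without the assistant message that requested it is rejected by the Chat Completions API, which would turn a pruning step into a new source of failed requests.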
Tool output causing re-invocation
The model re-calls a tool when the output is ambiguous or empty. Use TracePilot to inspect the exact tool output in the span. Add output validation before appending to messages.
Next steps
Quickstart: tracepilot-openai
Get OpenAI agents tracing in under 5 minutes.
Time-travel debugging
Fork any failing span and replay it with edited inputs.
Tracing tool calls
Instrument every tool your agent invokes.
Cost tracking
Monitor token spend per span to catch runaway costs early.