Why Context Windows Kill AI Agent Performance
TL;DR: Your AI agent gets dumber as its context window fills up. The context window is the maximum amount of text (measured in tokens) a model can process in a single interaction: its working memory. Everything in the window is available for the model to reference when generating a response.
Context Window Hygiene
This week at the gym I overheard a couple regulars, including a power lifter, discussing AI agents for optimized training program generation. I joined the conversation and quickly realized their problem wasn't the AI itself; they were keeping long-running chats where multiple distinct tasks got smushed together. Their chat completions were all over the place because they were hitting context window limitations.
We went over basic context window hygiene, and that conversation was the catalyst for this blog post. I realized the Ralph Wiggum pattern, when applied correctly, minimizes context window usage on each iteration. It's the same principle whether you're generating training programs or writing code.
The pattern I see constantly: your agent starts strong, completes a few tasks perfectly, then gradually degrades. It misses obvious bugs. It rewrites code it just fixed. It ignores your instructions.
The issue isn't the model itself; it's where in the context window you're operating.
The Smart Zone vs The Dumb Zone
Let's talk about how context windows actually work. Take Claude Sonnet 4.5 with its 200k token window. That sounds like a lot of room, but:
Performance zones:
- 0-30% (Smart Zone): Peak performance, optimal attention, fastest responses
- 30-60% (Okay Zone): Still functional but starting to degrade
- 60%+ (Dumb Zone): Severe degradation, unreliable outputs
For a 200k context window, that means:
- First 60k tokens: Your agent is sharp
- Next 60k tokens: It's okay, not great
- Last 80k tokens: It's struggling
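The zone boundaries are simple arithmetic over the window size (a quick sketch, using the 30%/60% thresholds above and Sonnet 4.5's 200k window):

```shell
# Zone boundaries for a 200k-token window, using the 30% / 60%
# thresholds from the zones above.
WINDOW=200000
SMART_END=$(( WINDOW * 30 / 100 ))   # smart zone ends here
DUMB_START=$(( WINDOW * 60 / 100 ))  # dumb zone starts here
echo "Smart zone: 0-$SMART_END tokens"
echo "Okay zone:  $SMART_END-$DUMB_START tokens"
echo "Dumb zone:  $DUMB_START+ tokens"
```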
Why does this happen? Two mechanisms compound. Models like GPT and Claude are auto-regressive: they generate text one token at a time, attending over the entire context history for every prediction, so the longer the context, the more thinly attention is spread. Meanwhile the KV cache, the stored attention key and value matrices that let transformers skip redundant computation, becomes a memory bandwidth bottleneck as the context window grows.
Research backs this up. The Lost in the Middle paper from Stanford showed that LLMs have a U-shaped performance curve; they remember stuff at the beginning and end, but information in the middle gets lost.
Even before you write your first prompt, 10% of that smart zone is already gone:
- System prompt: 8.3%
- System tools: 1.4%
- Skills you've loaded: varies
- MCP (Model Context Protocol) tools: varies
- Agent config files: varies
The entire script of "The Fellowship of the Ring", one of my favorite movies ever, is about 53k Claude tokens or 47k Gemini tokens. That's your smart zone.
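If you want a rough sense of how much of that zone a file will eat, a common rule of thumb is about 4 characters per token for English text (a heuristic only; actual counts depend on the model's tokenizer):

```shell
# Rough token estimate: ~4 characters per token (heuristic only;
# real tokenizers vary by model).
estimate_tokens() {
  local chars
  chars=$(wc -c < "$1")
  echo $(( chars / 4 ))
}

# Demo with a small sample file.
printf 'The entire script of The Fellowship of the Ring' > sample.txt
estimate_tokens sample.txt
```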
What Ralph Wiggum Actually Is
The Ralph Wiggum loop was created by Geoffrey Huntley. It's dead simple: a bash loop that gives an AI agent the exact same prompt over and over again.
Hereâs the canonical implementation:
while true; do
cat prompt.md | claude --dangerously-skip-permissions
done
That's it. That's the whole pattern.
The prompt.md file tells the agent:
- Read the plan.md file (contains tasks)
- Pick the most important task
- Make the changes
- Run tests
- Commit and push
- Mark the task as done in plan.md
- Repeat
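A minimal prompt.md covering those steps might look like this (my own sketch of the shape, not Huntley's exact wording):

```markdown
Read plan.md. Pick the single most important unfinished task.
Make the changes for that task only.
Run the test suite and fix any failures.
Commit and push.
Mark the task as done in plan.md.
```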
The genius is that each iteration gets a fresh context window. No compaction. No accumulated history. Just the prompt, the current state of the codebase, and the task list.
Why Most Implementations Get It Wrong
I've seen so many Ralph implementations that miss the point. The most common mistakes:
Mistake 1: Using Compaction
Anthropic's official Ralph plugin uses compaction. When it moves to the next task, it summarizes what happened previously instead of resetting the context.
The problem? The model doesnât know whatâs actually important. It guesses. It picks what it thinks matters and discards the rest. Critical information gets lost.
Mistake 2: Max Iterations
Some implementations have max iteration limits. The loop stops after X attempts.
But Ralph's power is in letting it run. I've seen agents find performance issues I never would have noticed because they kept iterating after "completing" all the tasks. They found edge cases, tightened up error handling, improved naming.
If you're watching the loop (human-in-the-loop), you can stop it when it goes off the rails. But don't artificially limit it.
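One way to keep the human in control without a hard cap is a stop file the operator creates by hand (a sketch; the STOP-file convention and the run_agent stub, standing in for the real claude invocation, are my assumptions):

```shell
# No max-iteration cap: the loop runs until an operator creates a
# STOP file. run_agent stands in for
# `cat prompt.md | claude --dangerously-skip-permissions`.
run_agent() { echo "iteration complete"; }

rm -f STOP
iterations=0
while [ ! -f STOP ]; do
  run_agent
  iterations=$(( iterations + 1 ))
  # Demo only: a real operator creates STOP by hand when the loop
  # goes off the rails. Here we stop ourselves after 3 runs.
  if [ "$iterations" -ge 3 ]; then touch STOP; fi
done
echo "stopped after $iterations iterations"
```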
Mistake 3: Growing the Agent Config
Ryan Carson's approach adds to the AGENTS.md file on each iteration. The file grows. Token count increases. If you do that, you're pushing the model out of the smart zone.
Models are wordy by default. If you let them append to a config file on every iteration, you're just adding tokens to the beginning of each prompt. Eventually you hit the dumb zone again.
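A cheap guard against config creep is to check the file against a token budget before each run (a sketch; the 5,000-token budget and the ~4-chars-per-token heuristic are assumptions):

```shell
# Warn when AGENTS.md drifts past a rough token budget.
BUDGET_TOKENS=5000
printf 'Keep responses terse. Prefer small diffs.' > AGENTS.md  # demo file
tokens=$(( $(wc -c < AGENTS.md) / 4 ))
if [ "$tokens" -gt "$BUDGET_TOKENS" ]; then
  echo "WARNING: AGENTS.md is ~$tokens tokens; trim it before it eats the smart zone."
else
  echo "AGENTS.md is ~$tokens tokens; within budget."
fi
```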
Why This Pattern Matters
Back to those gym conversations: they were keeping a single chat window open for months, asking the agent to track their whole progress. "What is my 1RM deadlift?", then "Analyze my protein intake for the past week", then "Create a deload week schedule". Each request added more tokens. After the first few messages, their agent was operating at 70-80% context capacity.
That's when quality tanks. At that point your agent isn't "thinking" clearly. It's like asking someone to solve complex problems while they're exhausted and distracted. Sure, they might get it done, but the quality suffers.
The same thing happens with LLMs. I'm sure you have seen agents start making circular changes once they hit high context usage. They edit a file, then edit it again differently, then second-guess themselves and revert. They're stuck in a loop because they've either compacted away critical context or they're operating in the dumb zone where attention is diluted.
When to Deviate from Canonical Ralph
Don't get me wrong; there are good reasons to customize the pattern.
Parallel Ralphs (Raz Mike's Ralphy script): Running multiple Ralph loops in parallel for independent tasks. Excellent idea. Each loop gets its own fresh context.
GitHub Issues Integration (Matt Perco's version): Using actual GitHub issues as the task list. Clever. The filesystem still acts as the source of truth, just synced with GitHub.
Browser Testing (Ralphy with Vel's agent browser tool): Adding browser automation for E2E testing. Makes sense for web projects.
The key is keeping the core principle: fresh context per iteration. If your modification preserves that, you're good.
The Real Trade-Off
Ralph is slower than compaction. A lot slower.
Each iteration is a new session. The model has to read the entire codebase state again. There's no accumulated knowledge from previous iterations.
Slow and correct beats fast and wrong. Every time.
I've seen compacted agents complete 20 tasks in an hour, half of them introducing bugs. I've seen Ralph complete 8 tasks in the same time, all of them solid.
Which would you rather have?
Another major trade-off, of course, is cost. You can mitigate dollar burn on a subscription by running the agent while you sleep, taking advantage of rate-limit windows you don't otherwise use.
How to Actually Use Ralph
If you want to use canonical Ralph:
1. Keep your prompt.md simple. Don't write a novel. Be direct. Tell the agent what to do, not how to think about doing it.
2. Make plan.md granular. Break tasks into small, verifiable chunks. Not "implement authentication" but "add JWT validation to the login endpoint".
3. Use filesystem state, not memory. Don't rely on the agent remembering things. Write it down. Use files. The filesystem is your state management system.
4. Watch it run. Human-in-the-loop is powerful. You'll spot patterns. You'll see when the agent gets stuck. You'll learn which prompts work.
5. Stop when it's done. Don't let it run forever. When all tasks are complete and it's not finding new issues, stop it. Use a promise as a circuit breaker.
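The "promise" circuit breaker in step 5 can be wired up like this (a sketch; the DONE.md convention and the run_agent stub are my assumptions; the idea is that the prompt instructs the agent to create DONE.md only once every task in plan.md is checked off):

```shell
# Loop exits when the agent fulfills its "promise" by creating DONE.md.
run_agent() { touch DONE.md; }  # stand-in for the real claude call

rm -f DONE.md
while true; do
  run_agent
  if [ -f DONE.md ]; then
    echo "Promise fulfilled; stopping the loop."
    break
  fi
done
```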
The Context Reset Pattern Beyond Ralph
Ralph is one implementation of a broader pattern: context resets. Anthropic's own documentation recommends this pattern for agent systems. If you're looking for more strategies to optimize your workflow with Claude Code, check out how to get the most out of Claude Code.
Instead of long-running agents with compaction, spawn fresh subagents for specific tasks. Each subagent gets a clean context window optimized for its job.
Want to analyze a log file? Spawn a subagent with just the log file and analysis prompt. Want to fix a bug? Spawn a subagent with the relevant files and error message. Want to run tests? Spawn a subagent with the test command and failure output.
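In practice, each of those jobs can be a fresh single-shot invocation rather than a turn in one long chat (a sketch; spawn_subagent is a hypothetical wrapper around something like Claude Code's non-interactive print mode, and the prompts and file names are made up):

```shell
# Each call gets a fresh context window scoped to one job.
spawn_subagent() {
  echo "[subagent] $1"
  # a real version might run: claude -p "$1"
}

spawn_subagent "Analyze app.log and list the top three error types."
spawn_subagent "Fix the failing test in src/auth.test.ts."
```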
Each subagent operates in the smart zone. No accumulated cruft. No compacted half-memories. Just focused execution. Also, make it a habit to create Skills.
What's Next for Ralph
Geoffrey Huntley is working on Loom and Weaver, which build on Ralph's concepts for autonomous software creation. The idea is scaling the pattern: more sophisticated task management, better verification, smarter state handling.
The core insight remains: context is a scarce resource. Treat it like one. And if you want to know how to effectively extend the context window in your agentic applications, why don't you give this one a read.
Sources
- How to Ralph Wiggum · Original pattern documentation
- Lost in the Middle: How Language Models Use Long Contexts · Stanford research on position bias
- Claude Code: Best Practices for Agentic Coding · Official Anthropic guidance
- Effective Context Engineering for AI Agents · Context reset patterns
- Context Rot: How Increasing Input Tokens Impacts LLM Performance · Research on degradation mechanisms