Context Engineering for AI Agents – Memory, Compression & Intelligent Orchestration
By Prady K | Published on DataGuy.in
When building AI agents, it’s tempting to think the magic lies entirely in crafting the right prompt. And while a good prompt gets you started, it’s not what keeps your agent smart, consistent, or reliable over time. That responsibility lies elsewhere — in what’s known as context engineering.
In fact, Andrej Karpathy put it best:
“Context engineering is the delicate art and science of filling the context window with just the right information for the next step.”
— Andrej Karpathy
Think of it like this: if a large language model (LLM) is the CPU of your AI system, then the context window is the RAM. It only holds a limited amount of active information at a time. And just like in computing, the performance of the system depends on what you load into that memory — and when.
That’s where context engineering comes in. It’s about feeding the model the right information at the right time — while keeping distractions, overload, and contradictions out of the way.
As agents become more autonomous, multi-modal, and multi-step — this discipline is no longer optional. It’s what turns a working prototype into a production-ready agent.
What Is Context Engineering?
Context engineering is more than prompt tweaking. It’s about designing the flow of information that reaches the LLM — not just once, but throughout the agent’s lifecycle.
Karpathy’s full quote lays it out perfectly:
“+1 for ‘context engineering’ over ‘prompt engineering.’ People associate prompts with short task descriptions you’d give an LLM in your day-to-day use.
In every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step. Science — because doing this right involves task descriptions, few-shot examples, RAG, multimodal inputs, tool outputs, memory, and history. Too little or in the wrong form, and performance drops. Too much or too irrelevant, and costs go up and responses degrade.
Art — because it requires intuition about model behavior and task flow.”
— Andrej Karpathy
So, how is this different from traditional prompt engineering?
- Prompt Engineering: A static input, crafted once, often without system-level awareness.
- Context Engineering: A dynamic orchestration of signals — memory, retrieval, tools, metadata — that evolves as the agent acts.
And when context is poorly managed, problems emerge fast:
- Context Poisoning: Malicious or misleading inputs distort output.
- Context Drift: The model subtly veers from the task over time.
- Context Overload: Too much input crowds the context window.
- Context Clash: Contradictory signals confuse reasoning and alignment.
In short, prompt engineering gets you in the door. But context engineering decides what happens after. It’s the system layer that transforms single-shot LLMs into long-running, capable agents.
Why Context Engineering Matters for Agents
The deeper you go into building LLM agents, the more obvious it becomes: a clever prompt isn’t enough. Agents don’t just generate a response and stop — they read tools, hold memory, adjust behavior, and evolve across multiple steps.
This means their success depends not just on what prompt you give, but what context they carry with them: what they remember, what tools they call, what state they’re in.
Longer Tasks, Longer Context
Think of agents doing complex work — researching, coding, planning, debugging. They need to track what they’ve already done, where they are, and what’s next. That’s not just a prompt — that’s stateful, engineered context.
With every step, the amount of information they carry grows — tool outputs, user interactions, memory retrievals. Managing that growth is what makes or breaks agent reliability.
What the Experts Say
Both Anthropic and Cognition have flagged the same issue. In Anthropic’s testing of multi-agent setups, they noted a 15x increase in token usage because of uncontrolled context sharing. That means costs go up and performance gets harder to track.
Cognition Labs (makers of Devin) also emphasize compression and selective memory as critical for long-running agents. Otherwise, it’s just token bloat and degraded reasoning.
Core Challenges
- Window Limits: Most LLMs still have hard limits on how much context they can take in one go.
- Performance Degradation: More context ≠ better output. Too much = confusion, contradiction, or hallucination.
- Cost Explosion: Every extra token adds to the bill. Context needs to be tight and meaningful.
In short: every agent call is a balance between what you tell it, how much you tell it, and when you tell it. That’s the job of context engineering.
The 4 Pillars of Context Engineering
If you want to build reliable, efficient LLM agents, you need more than prompts and plugins. You need a strategy to control what the agent remembers, sees, ignores, and shares. That’s where the four core strategies of context engineering come in:
1. Write
Agents don’t always need to hold everything in the context window. Sometimes, they just need a place to jot things down.
- Scratchpads: Temporary storage outside the LLM’s window — like a notebook for mid-task thinking.
- Memories: Long-term storage broken into episodic (event logs), semantic (knowledge), and procedural (how-to) memories.
Tools like ChatGPT’s custom GPTs, Cursor, Reflexion, and Windsurf all use writing mechanisms to give their agents working memory.
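The write layer can be sketched in a few lines. This is a minimal, illustrative model (the class and method names are my own, not any framework's API): a scratchpad for mid-task notes, plus the three long-term stores, with scratchpad contents promoted to episodic memory when a task ends.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Illustrative write layer: a scratchpad plus three long-term stores."""
    scratchpad: list = field(default_factory=list)   # mid-task notes, cleared per task
    episodic: list = field(default_factory=list)     # event logs: what happened
    semantic: dict = field(default_factory=dict)     # facts the agent has learned
    procedural: dict = field(default_factory=dict)   # how-to recipes, reusable steps

    def jot(self, note: str) -> None:
        """Write to the scratchpad instead of the context window."""
        self.scratchpad.append(note)

    def end_task(self) -> None:
        # Promote working notes to episodic memory, then clear them, so the
        # context window never has to carry stale mid-task thinking.
        self.episodic.extend(self.scratchpad)
        self.scratchpad.clear()

mem = AgentMemory()
mem.jot("step 1: fetched user profile")
mem.jot("step 2: drafted reply")
mem.end_task()
```

The key design choice is that the scratchpad lives outside the prompt: the agent decides what to write and when to promote it, rather than dragging every intermediate thought through each LLM call.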
2. Select
Not all information is useful all the time. Agents need to pull in just the right context at the right moment.
- Tool Selection: Choosing the right tools based on the task — and feeding the right outputs back in.
- Memory Retrieval: Fetching from long-term memory using either keyword rules or vector search (embeddings).
Poor selection means the agent gets confused — or worse, distracted. Bad retrieval = irrelevant noise.
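Here is a toy sketch of the selection step. Real systems use embedding vectors and a vector store; to keep this self-contained, word overlap stands in for embedding similarity, but the shape of the logic (score everything, admit only the top-k into the window) is the same.

```python
def score(query: str, memory: str) -> float:
    """Toy relevance score: word overlap as a stand-in for embedding similarity."""
    q, m = set(query.lower().split()), set(memory.lower().split())
    return len(q & m) / len(q | m) if q | m else 0.0

def select_context(query: str, memories: list[str], k: int = 2) -> list[str]:
    """Admit only the k most relevant memories into the context window."""
    return sorted(memories, key=lambda m: score(query, m), reverse=True)[:k]

memories = [
    "user prefers concise answers",
    "deploy script lives in scripts/deploy.sh",
    "the database migration failed last Tuesday",
]
picked = select_context("why did the database migration fail", memories, k=1)
```

Everything not selected simply never reaches the model, which is how selection keeps both cost and noise down.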
3. Compress
Context windows are finite. If you want to fit more, you need to shrink smartly.
- Summarization: Claude does this by automatically compressing past turns. Others use layered or hierarchical summaries.
- Trimming: Discarding older or less relevant information, especially in long-running tasks.
Compression is tricky. Go too far and you lose detail; don’t compress enough and you overwhelm the model.
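A minimal trimming strategy looks like this: keep the newest turns within a budget and collapse everything older into a single summary slot. The budget here is counted in words for simplicity (a real system would count tokens), and the summary is a placeholder where a summarization call would go.

```python
def compress_history(turns: list[str], budget: int) -> list[str]:
    """Keep the most recent turns within a word budget; collapse the rest
    into one summary stub (a real system would call an LLM to write it)."""
    kept, used = [], 0
    for turn in reversed(turns):          # newest turns matter most
        cost = len(turn.split())
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()
    dropped = len(turns) - len(kept)
    if dropped:
        kept.insert(0, f"[summary of {dropped} earlier turns]")
    return kept

history = ["hello there", "please refactor utils.py", "done, tests pass",
           "now add type hints", "added hints to all functions"]
window = compress_history(history, budget=10)
```

Layered summarization takes this one step further: instead of a stub, older turns are summarized, and old summaries are eventually summarized again.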
4. Isolate
When multiple agents or tools are running, isolation keeps context clean and contained.
- Agent Isolation: Each agent has its own memory and context stream (like OpenAI’s Swarm or Anthropic’s parallel agents).
- Runtime Context: Shared states encoded using structured schemas (like Pydantic) to prevent clashes.
Isolation prevents token explosion, accidental leaks, and cross-contamination in multi-agent flows.
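A rough sketch of the isolation idea, under stated assumptions: this is not the real Swarm API, and plain dataclasses stand in for Pydantic schemas. Each agent keeps a private context stream, and only explicitly published, keyed results cross the boundary into shared state.

```python
from dataclasses import dataclass, field

@dataclass
class AgentContext:
    """Per-agent scoped context: the private log never leaves this agent."""
    name: str
    private_log: list = field(default_factory=list)

@dataclass
class Orchestrator:
    shared: dict = field(default_factory=dict)

    def publish(self, agent: AgentContext, key: str, value: str) -> None:
        # Only structured, keyed results cross the isolation boundary,
        # never raw transcripts or scratch notes.
        self.shared[key] = value

researcher = AgentContext("researcher")
writer = AgentContext("writer")
orch = Orchestrator()

researcher.private_log.append("searched 12 sources, 3 relevant")
orch.publish(researcher, "findings", "3 relevant sources on topic X")
```

The writer agent reads `orch.shared["findings"]`, not the researcher’s raw log, which is what keeps token growth linear instead of multiplicative across agents.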
Master these four levers — write, select, compress, isolate — and you unlock scalable, intelligent behavior in agents.
Patterns of Failure in Poor Context Management
Even the smartest agents can fail — not because the model is broken, but because the context was poorly engineered. When we lose control of what goes into the model, we lose control of what comes out. Here are four common failure modes:
Context Poisoning
This happens when the model receives biased, misleading, or malicious inputs — either from users or prior tool output.
Example: A prompt chain subtly includes fake citations or manipulative phrasing, leading the LLM to generate distorted or harmful content.
Fix: Validate inputs, score source trustworthiness, and isolate risky context from core reasoning pathways.
Context Drift
The model gradually veers off-topic because accumulated context subtly shifts the task focus.
Example: A legal assistant chatbot slowly drifts into casual banter because older messages weren’t cleared or re-weighted.
Fix: Apply temporal weighting, task anchoring, and routine context alignment.
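The two halves of that fix can be combined in one small function. This is an illustrative sketch (the function and its weighting scheme are my own): the task anchor always keeps full weight, while ordinary chatter decays by half every `half_life` turns, so old banter cannot outweigh the job at hand.

```python
def weight_context(anchor: str, messages: list[str],
                   half_life: float = 3.0) -> list[tuple[str, float]]:
    """Task anchoring + temporal decay: the anchor keeps weight 1.0,
    older messages lose half their weight every `half_life` turns."""
    n = len(messages)
    decayed = [(m, 0.5 ** ((n - 1 - i) / half_life)) for i, m in enumerate(messages)]
    return [(anchor, 1.0)] + decayed

anchor = "task: review the contract for clause 4 risks"
chat = ["nice weather today", "haha good one", "back to clause 4 now"]
weighted = weight_context(anchor, chat, half_life=2.0)
```

Downstream, these weights decide what survives trimming or how strongly each message is restated in the next prompt, rather than being passed to the model directly.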
Context Overload
Too much conflicting information clutters the context window, making it harder for the model to focus.
Example: A summarization agent juggling a huge document, chat history, tool logs, and metadata may start hallucinating or losing coherence.
Fix: Use context filters, summarizers, and task-specific routing to reduce noise and highlight key signals.
Context Clash
When multiple competing signals enter the context window, the model becomes uncertain or contradictory.
Example: An agent receives conflicting instructions from two different tools — leading to ambiguous behavior or invalid outputs.
Fix: Disambiguate roles, establish hierarchy in context sources, and monitor conflicting signals with schema-driven isolation.
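Establishing a hierarchy among context sources can be as simple as a precedence table. In this sketch (the source names and ranking are illustrative), conflicting values for the same key are resolved before anything reaches the model, so it never sees both sides of a clash.

```python
# Lower rank wins when two sources give conflicting values for a key.
PRECEDENCE = {"system": 0, "user": 1, "tool": 2, "memory": 3}

def resolve(signals: list[tuple[str, str, str]]) -> dict[str, str]:
    """signals are (source, key, value); on a clash, keep the value from
    the highest-precedence source instead of passing both to the model."""
    resolved: dict[str, tuple[int, str]] = {}
    for source, key, value in signals:
        rank = PRECEDENCE.get(source, 99)
        if key not in resolved or rank < resolved[key][0]:
            resolved[key] = (rank, value)
    return {k: v for k, (r, v) in resolved.items()}

signals = [
    ("tool", "output_format", "xml"),
    ("user", "output_format", "json"),   # clashes with the tool's hint
]
final = resolve(signals)                 # user outranks tool
```

The point is that conflict resolution is a deterministic pre-processing step, not something the LLM is left to guess at.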
Context is a lever — but left unchecked, it’s also a liability. That’s why proactive context engineering is not just nice to have; it’s mission-critical.
Key Architectures & Systems
Let’s bring theory to life. Across the AI ecosystem, several groundbreaking systems are already implementing context engineering in practice. These architectures highlight how writing, selecting, compressing, and isolating context work at scale.
Claude Code (Anthropic)
Claude automatically compresses prior interactions using summarization layers—what Anthropic calls “auto-compact memory.” This allows it to remember the essence of long chats while staying within token limits.
It’s a strong example of the Compress pillar in action—prioritizing signal over size to sustain long conversations without drift.
Cursor AI
Cursor is a coding assistant that applies local memory, scratchpads, and file-scoped context to help developers without flooding the context window. It manages active documents and function calls intelligently.
Cursor shines at the Write and Select layers—helping users store and retrieve only what matters.
Reflexion
Reflexion enables agents to reflect on their failures and refine strategies across multiple attempts. It stores episodes and outcomes, feeding them into the model as refined stateful memories.
Reflexion exemplifies Write and Compress—building episodic and semantic memory through self-feedback. Source: Yao et al., Reflexion: Language Agents with Verbal Reinforcement Learning, 2023.
Windsurf
Windsurf experiments with task segmentation, memory recall, and dynamic context switching across agent chains. It routes relevant memory to sub-agents based on task stage.
This aligns with the Isolate and Select approaches—showing how modular workflows benefit from scoped context.
HuggingFace CodeAgent
CodeAgent uses a runtime state model—a structured JSON environment that tracks file states, tool results, and prior decisions. It enables agents to act with awareness of both history and file dependencies.
Here, context isn’t just text—it’s structured data. This is the future of Isolate and task-aware orchestration.
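To make the "structured data, not text" point concrete, here is a hypothetical runtime state in that spirit. The field names and the derivation rule are assumptions for illustration, not CodeAgent's actual format: the agent derives its next action from the state object rather than re-reading a transcript.

```python
import json

# Hypothetical runtime state: file status, tool results, and prior decisions
# live in one JSON-serializable object the agent reasons over directly.
state = {
    "files": {"app.py": {"modified": True}, "tests/test_app.py": {"modified": False}},
    "tool_results": [{"tool": "pytest", "exit_code": 1, "summary": "2 failures"}],
    "decisions": ["fix failing tests before adding features"],
}

def next_targets(state: dict) -> list[str]:
    """Derive the next action from structured state instead of raw history:
    if any tool failed, revisit the files that were recently modified."""
    if any(r["exit_code"] != 0 for r in state["tool_results"]):
        return [f for f, meta in state["files"].items() if meta["modified"]]
    return []

serialized = json.dumps(state)   # the whole state round-trips as JSON
targets = next_targets(state)
```

Because the state is structured, queries like "which files changed since the last failing test run" become dictionary lookups rather than prompt archaeology.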
OpenAI’s Swarm Framework
Swarm orchestrates multiple agents with parallel workflows. To prevent overload, it uses shared memory pools and scoped task instructions.
It’s a masterclass in Isolate and Write—partitioning context while enabling collaboration across agents.
Across these examples, one theme holds: context isn’t a prompt—it’s an architecture. Each system shows how designing smarter context flows leads to better agents, better outcomes, and more scalable intelligence.
Future Directions: Beyond Token Stuffing
So where are we heading? As agent architectures evolve, we’re entering an era where intelligent orchestration replaces brute-force context stuffing.
The old model? Shove everything into the context window and hope it works. The new model? Curate, route, and orchestrate that context based on task, timing, and intent.
Building True Agent Stacks
Leading systems like OpenAI’s Swarm and Anthropic’s Claude are evolving into full-stack agent platforms—not just prompting tools. These stacks include:
- UI/UX orchestration flows
- Guardrails and safety triggers
- Prefetching and parallel calls
- Custom evaluation and retry mechanisms
In these architectures, context isn’t just an ingredient—it’s a flow to be managed in real time.
Solving the Generation–Retrieval Asymmetry
One major bottleneck? LLMs are great at generating text—but not great at retrieving or choosing the right context on their own.
Research from the June 2024 arXiv survey points to this: the asymmetry between generation and retrieval. The fix? Build retrieval-aware agents that know what to ask, when to recall, and how to weight it.
Agent State Objects & Schema Modeling
We’re also seeing a shift toward structured state modeling—where agents operate on JSON-based state objects instead of raw text logs.
Tools like Pydantic schemas, runtime environments, and declarative memory objects are becoming critical for managing task flow, tool input/output, and long-term memory.
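A minimal version of that shift, sketched with stdlib dataclasses standing in for Pydantic (the class and field names are illustrative): the agent's state is a typed object with declared fields, and it travels between calls as JSON rather than as a growing text log.

```python
from dataclasses import dataclass, asdict, field
import json

@dataclass
class AgentState:
    """Typed state object (a stdlib stand-in for a Pydantic schema):
    fields are declared up front, so malformed context fails loudly."""
    task: str
    step: int = 0
    memory_keys: list = field(default_factory=list)

    def advance(self, remembered: str) -> "AgentState":
        """Return the next state instead of mutating in place."""
        return AgentState(self.task, self.step + 1, self.memory_keys + [remembered])

s0 = AgentState(task="summarize report")
s1 = s0.advance("report covers Q3 revenue")
blob = json.dumps(asdict(s1))            # state travels between calls as JSON
restored = AgentState(**json.loads(blob))
```

Immutable, serializable state objects are what make context debuggable and versionable: you can diff `s0` against `s1` the way you diff commits.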
The takeaway? Context is becoming programmable. And context engineering will soon look more like software engineering—complete with debuggers, version control, observability, and modular memory pipelines.
Quotes & Commentary
Andrej Karpathy on Context Engineering
“Context engineering is the delicate art and science of filling the context window with just the right information for the next step.”
In a widely cited post on X, Andrej Karpathy explained how modern LLM apps must curate task descriptions, few-shot examples, tools, states, and history—all without overloading or under-informing the model.
He emphasizes that context engineering is just one layer of a much deeper stack: problem decomposition, context packing, call orchestration, evals, and UI/UX feedback loops. The era of “just prompt it” is over.
Anthropic on Token Bloat in Agents
In a recent technical paper, Anthropic reported that multi-agent setups can use 15x more context tokens than single-agent systems. As agents pass messages and share memory, context windows fill up fast—leading to cost spikes and degraded quality.
Their solution? Smarter compression and context modularity. Claude’s hierarchical summarization pipeline is one approach that helps retain signal while cutting token weight.
Cognition on Summarization & State
The team behind Cognition Labs (creators of Devin, the coding agent) emphasizes structured agent memory and automatic summarization checkpoints to maintain state across long tasks.
They argue that compression isn’t optional—it’s foundational. Without it, even the best agents lose task continuity, especially in complex environments like coding IDEs or multi-step workflows.
Reflexion on Verbal Reinforcement
The Reflexion framework proposes that agents can learn and improve by self-verbalizing lessons at the end of tasks. These “verbal memories” help the system avoid repeating past mistakes.
This form of lightweight memory doesn’t require formal retraining—just smart reflection. It’s context engineering as both architecture and learning loop.
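The Reflexion loop reduces to a few lines when sketched abstractly. This is a simplified illustration, not the paper's implementation: after each failed attempt the agent verbalizes a lesson, and the growing list of lessons is fed back as context on the next try; the toy `attempt` function below stands in for a real LLM call.

```python
def run_with_reflection(task, attempt_fn, max_tries=3):
    """Reflexion-style loop (simplified): after each failure the agent
    verbalizes a lesson, which becomes context for the next attempt."""
    lessons = []
    for _ in range(max_tries):
        ok, lesson = attempt_fn(task, lessons)
        if ok:
            return True, lessons
        lessons.append(lesson)           # "verbal memory": no retraining needed
    return False, lessons

# Toy attempt: succeeds only once it has been told about the edge case.
def attempt(task, lessons):
    if "handle empty input" in lessons:
        return True, ""
    return False, "handle empty input"

ok, lessons = run_with_reflection("parse file", attempt)
```

All the "learning" lives in the context, which is exactly why Reflexion counts as context engineering rather than training.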
Conclusion: Architecting Intelligence with Context
In the world of AI agents, context is no longer just a helper—it’s the hidden infrastructure that determines whether your system is insightful or incoherent.
TL;DR – The 4 Context Engineering Levers
- Write: Create and manage externalized memories to preserve agent continuity.
- Select: Retrieve only the most relevant past signals, tools, or examples.
- Compress: Use summarization and pruning to reduce context bloat.
- Isolate: Separate agent scopes and runtime environments to avoid confusion and clash.
Get these right—and your agents won’t just complete tasks; they’ll sustain focus, adapt midstream, and behave with intent.
“In agent design, context isn’t an accessory—it’s the architecture.”
As AI systems become more agentic, orchestration matters more than optimization. If you’re building with LLMs—whether you’re designing workflows, deploying tools, or scaling intelligent systems—context engineering is your next frontier.
Dive deeper into the systems, memory flows, and orchestration patterns that power tomorrow’s AI agents. Context Engineering is how we scale reasoning.

