Context hygiene: why your AI forgets what matters and what to do about it
The biggest bottleneck in long-running AI sessions isn't model intelligence — it's context management. Here's what I've learned about keeping AI on track across complex, multi-session projects.
The more I work with LLMs on real projects, the more I'm convinced that the core skill isn't prompting. It's context management.
We talk a lot about which model is "smarter" or which tool generates better code. But in practice, the thing that makes or breaks a long-running AI collaboration is whether the model still remembers what you told it three hours ago. Spoiler: it usually doesn't.
The problem nobody warns you about
Every LLM has a context window — a fixed amount of text it can "see" at once. Gemini 1.5 Pro advertises up to 2 million tokens; Claude offers 200K. Sounds massive. But here's the thing: having a large context window and actually using it effectively are completely different problems.
Researchers call it the "lost in the middle" phenomenon. Models are significantly better at recalling information at the very beginning and very end of a conversation. Everything in the middle — which is where most of your actual work lives — gets progressively more diluted. The model doesn't "forget" in the human sense. It just stops paying attention.
And when the conversation eventually exceeds the window, the system has to compress. It summarizes your history, making executive decisions about what's "important." If your carefully specified architecture constraints get compressed into "user is building a web app," you're in trouble — and you probably won't notice until something breaks.
I call this silent context loss. It's the most dangerous failure mode in AI-assisted work, because it looks like the AI is still following your instructions when it's actually operating on a degraded version of them.
What this looks like in practice: building software
I've been using AI coding assistants to build projects for a while now. The pattern is always the same.
In the first session, everything is great. You lay out the architecture, define your tech stack, specify constraints. The AI understands. It generates code that follows your patterns. You feel productive.
By session three or four, things start drifting. You ask for a new feature and the AI suggests a library you explicitly ruled out in session one. It generates a database migration that conflicts with a schema decision from two days ago. It stops using the naming conventions you established.
This isn't the AI getting dumber. It's context rot. The initial decisions — the ones that matter most — are now buried under hundreds of messages about bug fixes, refactors, and minor tweaks. The model's attention has shifted to whatever happened in the last few exchanges.
I hit this recently while building this blog. By the time I was debugging CSS rendering issues, the AI had effectively lost track of the original design system decisions. Not because they weren't in the history, but because they were competing with dozens of more recent, more "salient" messages about Docker volumes and git lock files.
The fix isn't to write better prompts. It's to maintain external state that the AI can reference. For coding projects, this means keeping a living document — an architecture decision record, a progress file, a CLAUDE.md — that captures the decisions that matter. Every new session starts by reading that file, not by relying on conversational memory.
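A minimal sketch of what "start every session by reading the state file" can look like mechanically. The file path, function name, and prompt wording here are all illustrative, not any particular tool's API:

```python
from pathlib import Path

def build_session_prompt(state_file: str, task: str) -> str:
    """Prepend the project's living state document to the opening
    message of a new session, so decisions travel in text rather
    than in conversational memory."""
    state = Path(state_file).read_text()
    return (
        "Project state (authoritative -- overrides anything inferred "
        "from earlier conversation):\n"
        f"{state}\n\n"
        f"Task for this session:\n{task}"
    )
```

The point is less the code than the discipline: the state file is the source of truth, and every session opens by restating it verbatim.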
What this looks like in practice: legal work
The stakes get higher when you move from code to consequential decisions.
Consider using AI to help navigate a complex legal case over several months. You start by uploading contracts, case law, depositions — hundreds of pages of source material. The AI helps you identify precedents, draft motions, analyze opposing arguments. It's genuinely useful.
But legal cases evolve. New evidence surfaces. Rulings narrow the scope of what's admissible. Strategic pivots happen. And each of these developments pushes the original foundation — the specific contract clauses, the jurisdictional nuances, the initial theory of the case — further into the "middle" of the context where attention fades.
Five months in, you ask the AI to draft a response to a new motion. It produces something technically competent but strategically wrong — because it's forgotten a preliminary ruling from month two that limited exactly the line of argument it's now proposing. Or it references a contract clause without the specific interpretation you established early on.
In legal work, the AI hallucinates by omission. It doesn't make up facts (usually). It just forgets constraints. And a legally sound argument that ignores a prior ruling isn't just wrong — it's potentially malpractice.
The pattern here is the same as in coding: the solution is hierarchical context management. The core facts of the case — parties, jurisdiction, theory, key rulings — need to be explicitly maintained in a reference document that's injected at the start of every session. Evidence gets retrieved dynamically when needed. The AI shouldn't be trusted to maintain the "golden thread" of strategy on its own.
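The hierarchy above can be sketched as: core facts pinned into every prompt, evidence attached only when relevant. The word-overlap scoring below is a deliberately naive stand-in for real embedding-based retrieval; the names are illustrative:

```python
def assemble_context(core_facts: str, evidence: dict[str, str],
                     query: str, k: int = 2) -> str:
    """Always pin the case's core facts; attach only the k evidence
    documents most relevant to the current query (relevance here is
    crude word overlap, a placeholder for proper retrieval)."""
    q = set(query.lower().split())
    scored = sorted(
        evidence.items(),
        key=lambda kv: len(q & set(kv[1].lower().split())),
        reverse=True,
    )
    picked = "\n\n".join(f"[{name}]\n{text}" for name, text in scored[:k])
    return (
        f"CORE CASE FACTS (always present):\n{core_facts}\n\n"
        f"RETRIEVED EVIDENCE:\n{picked}"
    )
```

The asymmetry is the design choice: core facts are injected unconditionally because they're cheap and catastrophic to lose; evidence is retrieved on demand because it's bulky and only situationally relevant.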
Practical context hygiene
After enough sessions going sideways, I've settled on a few principles that actually help.
Keep a state file. For any project that spans multiple sessions, maintain a document that captures the current state: decisions made, constraints established, what's done, what's next. Feed this to the AI at the start of every session. Don't trust conversational history to carry this information forward.
Re-anchor before big asks. Before requesting anything significant — a new feature, a strategic decision, a complex analysis — explicitly restate the constraints that matter. "Using our established PostgreSQL schema with the audit logging pattern from week one, design the new endpoint." This costs a few tokens but saves hours of fixing drift.
Checkpoint regularly. Every 10-15 exchanges, ask the AI to summarize its current understanding of the project state. This forces it to consolidate context and gives you a chance to catch misalignment early. If the summary is wrong, you've found context rot before it caused damage.
Know when to reset. This is the most counterintuitive one. When a conversation gets too long and the AI starts making subtle errors, the best move is to end the session entirely. Have the AI generate a handover note — all current state, pending tasks, established constraints — and start fresh. A clean context window with a good handover note consistently outperforms a bloated conversation where the important bits are buried.
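The reset move has two halves: ask the dying session to write the handover, then seed the fresh one with nothing else. A sketch of both, with all prompt text being my own phrasing rather than anything standardized:

```python
# Prompt sent at the END of a bloated session to extract a handover note.
HANDOVER_REQUEST = (
    "This session is ending. Write a handover note for a fresh session: "
    "1) current state of the work, 2) decisions and constraints that must "
    "be preserved, 3) pending tasks in priority order. Be literal -- the "
    "next session will see nothing except this note."
)

def start_fresh_session(handover_note: str, first_task: str) -> str:
    """Open a clean context window seeded only with the handover note."""
    return (
        f"Handover note from the previous session:\n{handover_note}\n\n"
        f"First task:\n{first_task}"
    )
```

The "be literal" instruction matters: a handover note full of vague summaries recreates exactly the silent context loss you're resetting to escape.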
Modularize aggressively. Break complex work into discrete tasks with clear inputs and outputs. When one module is complete, save the results externally. Start the next module with only the relevant context, not the entire sprawling history. The AI doesn't need to remember every debugging session — it needs to know the current interface contract.
The real skill of 2026
We're past the era where "prompt engineering" meant crafting a clever one-shot instruction. The real skill now is context engineering — managing the information lifecycle across an entire project, deciding what the AI needs to know right now versus what can be retrieved later versus what can be safely forgotten.
The models will keep getting better at this on their own. Agentic retrieval, where the AI can proactively search its own files and history, is already making a difference. But for now, the human in the loop needs to be the one maintaining context hygiene. The AI is a brilliant collaborator with a very particular kind of amnesia, and the best results come from working with that limitation rather than pretending it doesn't exist.
The professionals who figure this out — who treat context management as a first-class concern rather than an afterthought — are the ones getting genuinely transformative results from these tools. Everyone else is just having the same conversation over and over, wondering why the AI keeps forgetting.