I've written two posts now about managing context in AI sessions — one about the problem and one about a practical fix using .md files as external memory. While writing those, I kept running into industry research that made me realize something: the crude file-based system I'm using daily is basically a hand-rolled version of what the agent memory researchers are trying to automate.

That's either validating or embarrassing, depending on how you look at it.

The five files are a memory architecture

My starter prompt creates five markdown files for every project: CLAUDE.md (project constitution), PROGRESS.md (build log), DECISIONS.md (architecture decision record), TECHNICAL.md (implementation details), and HANDOVER.md (session transition state). Each gets read and written at specific times, with specific rules about what goes where.

I built this by trial and error over a few months. But it maps almost exactly to a pattern described in the agent memory literature as "hierarchical memory with separated stores":

  • Long-term knowledge → CLAUDE.md — rarely changes, loaded every session, defines the project's identity and constraints
  • Episodic memory → DECISIONS.md and PROGRESS.md — records of what happened and why, searchable when you need to understand past choices
  • Working memory → HANDOVER.md — the current task state, overwritten each session, what the agent needs right now
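The mapping above can be sketched as a small data structure. The file names come from my setup; the read/write policies are my own shorthand for how I use them, not any tool's API:

```python
# Toy sketch of the five-file layout as a memory hierarchy.
# Policies are informal descriptions, not enforced by anything.

MEMORY_LAYERS = {
    "long_term": {
        "files": ["CLAUDE.md"],
        "read": "every session start",
        "write": "rarely (identity and constraints change slowly)",
    },
    "episodic": {
        "files": ["DECISIONS.md", "PROGRESS.md"],
        "read": "on demand, when past choices matter",
        "write": "append-only, as events happen",
    },
    "working": {
        "files": ["HANDOVER.md"],
        "read": "every session start",
        "write": "overwritten at every session end",
    },
}

def files_loaded_at_start(layers):
    """Which files get pulled into context at the top of a session."""
    return [
        f
        for layer in layers.values()
        if layer["read"] == "every session start"
        for f in layer["files"]
    ]
```

Note that only the long-term and working layers load automatically; the episodic files stay on disk until a past decision actually matters.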

The research calls these "different storage and retrieval strategies for different memory types." I call them "files I read in a specific order." Same idea, different packaging.

Compression is already happening — we're just doing it manually

One of the bigger trends in context engineering right now is semantic compression: instead of shoving raw conversation logs into the context window, systems generate layered summaries — session-level, topic-level — and keep only those plus the most recent raw exchanges.

That's exactly what HANDOVER.md does at session boundaries. A two-hour session with 150 exchanges gets compressed into a structured document: current task state, decisions made, constraints that carry forward, next steps. The raw conversation is gone, but the semantically important parts survive.

The difference is that I do this compression manually (or rather, I ask Claude to do it at the end of each session). The automated systems are building this into the infrastructure — running compression continuously, clustering by topic, allocating token budgets across different context layers. But the core insight is identical: raw history is a bad format for context. Structured summaries are better. And the compression needs to be lossy in the right way — preserving decisions and constraints while dropping the debugging tangents.
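A minimal sketch of "lossy in the right way," assuming exchanges are already tagged by type. Real systems would classify exchanges with a model rather than rely on labels; the tags here are hypothetical:

```python
# Keep tagged decisions, constraints, and next steps; drop debugging
# chatter; retain only the most recent raw exchanges verbatim.

def compress_session(exchanges, keep_recent=3):
    """exchanges: list of (tag, text) tuples, oldest first."""
    keep_tags = {"decision", "constraint", "next_step"}
    summary = [text for tag, text in exchanges if tag in keep_tags]
    recent_raw = [text for _, text in exchanges[-keep_recent:]]
    return {"summary": summary, "recent_raw": recent_raw}

session = [
    ("debug", "tried clearing the CSS cache, no effect"),
    ("decision", "header border changed to gradient"),
    ("debug", "inspected Docker volume mounts"),
    ("constraint", "Docker volume overlays content images; copy to public/images"),
    ("next_step", "deploy via Portainer"),
]
handover = compress_session(session, keep_recent=2)
```

The debugging lines vanish from the summary but the constraint they produced survives — which is exactly the asymmetry the handover document needs.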

What I've noticed: the quality of the handover document matters enormously. A lazy summary ("worked on CSS and deployment") is nearly useless. A structured one ("changed header border to gradient, deployed via Portainer, discovered Docker volume overlays content images — fix: copy to public/images") lets the next session pick up instantly. The automated systems will need to learn this same distinction, and I suspect many will struggle with it initially.

The "maximum effective context window" explains a lot

There's a finding from recent research that I wish I'd had six months ago: the gap between a model's advertised context window and its "maximum effective context window" — the point where performance actually holds up.

Models with 200K token windows sometimes start degrading at a few thousand tokens on certain tasks. The exact threshold is task-dependent, but the pattern is consistent: there's an inflection point where adding more context starts hurting rather than helping. Attention dilutes. The model has more text to search through but less ability to find the right piece at the right time.

This explains something I observed empirically but couldn't articulate: why a fresh session with a good 500-word handover note consistently outperforms a 4-hour session where all the information is technically "in context." The handover note is well below any model's effective window. The 4-hour session is way past it, even if it's within the theoretical limit.

The practical implication: don't treat context like a bucket you fill until it's full. Treat it like a workbench with limited surface area. Keep the active working set small and well-organized. Put everything else in files you can pull in when needed.
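The workbench idea can be made concrete with a toy packer: admit items to the active context by priority while a budget well below the advertised window holds, and leave everything else on disk. The token counts and priorities are illustrative, not measured:

```python
# Sketch: pack high-priority items into a small active context;
# everything that doesn't fit stays in files, retrievable on demand.

def pack_context(items, budget_tokens):
    """items: list of (name, tokens, priority); higher priority packs first."""
    active, spilled, used = [], [], 0
    for name, tokens, _prio in sorted(items, key=lambda i: -i[2]):
        if used + tokens <= budget_tokens:
            active.append(name)
            used += tokens
        else:
            spilled.append(name)  # stays on disk, pulled in when needed
    return active, spilled

items = [
    ("HANDOVER.md", 700, 10),
    ("CLAUDE.md", 1200, 9),
    ("raw 4-hour transcript", 90000, 1),
    ("DECISIONS.md excerpt", 2000, 5),
]
active, spilled = pack_context(items, budget_tokens=8000)
```

The point of the sketch: the raw transcript loses to the 500-word handover note not because it contains less information, but because it costs more surface area than the workbench has.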

Where my approach falls short: the agent-to-agent gap

The research on multi-agent context is where my manual system starts looking primitive. When a team of specialized agents needs to collaborate — a research agent feeding a writing agent feeding an editing agent — they need shared context that's richer than "pass the output text along."

I actually built exactly this kind of pipeline last week: an n8n workflow where a research agent searches the web and produces a brief, then a separate writer agent turns that into a blog post. The research agent's output gets passed to the writer as raw text in the prompt.

The more sophisticated version would have both agents referencing a shared semantic layer — structured entities with metadata and relationships, not just prose. The research agent would tag its findings with confidence levels, source quality, timeliness. The writer agent would query that structure rather than parsing unstructured text. Think of it as the difference between handing someone a stack of printouts versus giving them access to a well-organized database.
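One hypothetical shape for that semantic layer, as a sketch: the research agent emits structured findings instead of prose, and the writer queries them. The field names and the confidence threshold are my assumptions, not any protocol's schema:

```python
# Structured findings instead of raw text between agents.
from dataclasses import dataclass

@dataclass
class Finding:
    claim: str
    source: str
    confidence: float   # 0.0-1.0, the research agent's own estimate
    fresh: bool         # recent enough to cite

def citable(findings, min_confidence=0.7):
    """What the writer agent would actually pull into the draft."""
    return [f for f in findings if f.confidence >= min_confidence and f.fresh]

findings = [
    Finding("200K windows degrade early on some tasks", "vendor blog", 0.9, True),
    Finding("unverified benchmark claim", "forum post", 0.4, True),
    Finding("outdated pricing detail", "docs", 0.95, False),
]
usable = citable(findings)
```

The writer never parses the research agent's prose at all — it filters on metadata the researcher attached at the source, which is the whiteboard-versus-shouting difference in miniature.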

Protocols like A2A (Agent-to-Agent) are trying to standardize this: how agents exchange state, what format context takes when it crosses agent boundaries, how to avoid the "telephone game" problem where information degrades as it passes through multiple agents. My text-in, text-out pipeline works, but it's the equivalent of agents shouting across a room instead of sharing a whiteboard.

The thing nobody talks about: context governance

The research mentions "guardrails at the context layer" and "observability around context" almost as afterthoughts. I think this is actually the most important trend for practitioners.

Right now, most people using AI coding assistants have zero visibility into what's actually in the model's context at any given moment. They don't know what got compressed, what got dropped, or what the model is actually attending to. When things go wrong — the AI suggests a library you ruled out, or forgets a schema constraint — there's no way to debug why it forgot.

My file-based system provides crude governance by making context explicit and auditable. I can read DECISIONS.md and see exactly what the AI should know. If it contradicts a logged decision, I know the file wasn't read or wasn't weighted heavily enough. That's not sophisticated observability, but it's infinitely better than hoping the conversation history is intact somewhere in the attention mechanism.
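That crude governance can even be partially mechanized. A sketch, assuming decisions are logged with a "ruled out:" marker — my real DECISIONS.md is free-form prose, so this format is invented for illustration:

```python
# Crude audit: check a model suggestion against logged "ruled out" terms.

def ruled_out_terms(decisions_md):
    """Pull the term after any 'ruled out:' marker in the log."""
    terms = []
    for line in decisions_md.splitlines():
        if "ruled out:" in line.lower():
            terms.append(line.split(":", 1)[1].strip().lower())
    return terms

def audit(suggestion, decisions_md):
    """Return logged terms the suggestion contradicts, if any."""
    return [t for t in ruled_out_terms(decisions_md) if t in suggestion.lower()]

log = "2024-05-01 Ruled out: moment\n2024-05-03 Chose date-fns instead"
violations = audit("Let's add Moment for date parsing", log)
```

A non-empty result doesn't tell you *why* the model forgot, but it does tell you *that* it forgot — which is more observability than most context layers give you today.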

The enterprise version of this is what some teams are building: logging what was retrieved or summarized for each response, tracking context composition over time, and tuning retrieval and compression policies based on outcome data. Essentially treating context management as an ML pipeline with its own metrics and optimization loop.

I'd love to see this trickle down to individual developer tools. Imagine Claude Code showing you a sidebar with "here's what I'm currently considering from your project files" and letting you pin or remove items. That would make the context engineering workflow I do manually — reading files, re-anchoring, checkpointing — visible and interactive instead of implicit.

Where this is heading

If I had to bet, here's what context management looks like in a year:

The manual approach I'm using — markdown files with explicit read/write protocols — becomes a built-in feature of AI coding tools. Not as files you manage yourself, but as a structured memory layer the tool maintains automatically. Your architecture decisions, progress state, and session handovers get tracked without you writing a prompt that says "update PROGRESS.md."

The semantic compression gets good enough that you stop noticing session boundaries. Right now, starting a new session feels like a reset. With good automated compression and retrieval, it should feel continuous — the tool always has the right context loaded, whether that's from five minutes ago or five weeks ago.

The agent-to-agent context problem gets solved with standardized protocols and shared memory stores, making multi-agent workflows feel less like duct-taping outputs together and more like a team with shared understanding.

But the core principle won't change: context is a resource that needs to be managed, not a bucket that needs to be bigger. The models will keep getting longer windows, and those windows will keep having effective limits well below their theoretical maximums. The teams that treat context engineering as a first-class concern — whether they do it with markdown files or million-dollar infrastructure — will keep getting better results than those who don't.

For now, I'll keep my five .md files. They work. And apparently, I've been doing agent memory architecture all along. I just didn't have the vocabulary for it.