I run 60+ projects through synthesis engineering — sustained human-AI collaboration with Claude Code as the primary development partner. Each project has a CONTEXT.md file that the AI reads at the start of every session. For months, this worked. Then it stopped working.
The largest CONTEXT.md reached 1,014 lines. It combined session logs from December, team rosters that hadn’t changed in weeks, completed task checklists, architecture decisions made months ago, and the three tasks I actually needed to work on today. All of it was loaded into every session, consuming tokens and diluting the AI’s attention.
An audit of all 60+ projects revealed this wasn’t an isolated problem. Eighteen projects had CONTEXT.md files over 500 lines. The median was 178 lines and growing. There was no mechanism for information to leave these files once written, except by closing the entire project and starting fresh.
The problem is structural, and I built a structural solution: the tiered context architecture.
## The root cause
A single context file serves four functions with fundamentally different lifecycles:
| Function | How often it’s needed | How it grows | What it should do |
|---|---|---|---|
| Working memory — current state, active tasks | Every session | Stays constant | Refresh frequently |
| Reference facts — team, URLs, architecture | Most sessions, changes rarely | Slow, update-in-place | Stay current |
| Session history — what happened when | Rarely after 1 week | Unbounded append | Archive |
| Completed records — done checklists | Almost never | Unbounded append | Delete |
Combining all four in one file means the file grows linearly with session count. A project that runs for 15 sessions at 60 lines per session produces a 900-line context file. Most of those 900 lines aren’t needed today.
This is the hot/warm/cold data problem from database engineering. When you put hot transactional data and cold analytics data in the same table with no partitioning, performance degrades. The same thing happens when you put hot working memory and cold session archives in the same file with no lifecycle management.
## The architecture
Three tiers, each matching a different information lifecycle:
```
project/
├── CONTEXT.md      # Working memory (budget: ≤150 lines)
├── REFERENCE.md    # Stable facts (updated in place)
├── sessions/       # Session history (archived monthly)
│   └── YYYY-MM.md
└── [other files]
```
CONTEXT.md is working memory. It contains what the AI needs for today’s session: current project state, active tasks with priorities, and a brief summary of the last one or two sessions. It has a hard budget of 150 lines. Everything else lives elsewhere.
REFERENCE.md is semantic memory. Team rosters, repository URLs, deployment configuration, architecture decisions. This file is updated in place — when a team member leaves, you change the roster, you don’t append a dated note. It’s loaded on demand when the AI needs reference details, not every session.
sessions/ is episodic memory. Monthly archive files containing session logs moved from CONTEXT.md when they age past a week. Append-only. Rarely read, but searchable when historical context is needed.
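The layout above can be scaffolded in a few lines. This is an illustrative sketch, not part of the runbooks; the starter headers are placeholders:

```python
from pathlib import Path

def scaffold(project_dir: str) -> None:
    """Create the three-tier layout for a project (idempotent; safe to re-run)."""
    root = Path(project_dir)
    (root / "sessions").mkdir(parents=True, exist_ok=True)  # episodic tier
    for name, header in [
        ("CONTEXT.md", "# Working Context\n"),    # working memory, budget ≤150 lines
        ("REFERENCE.md", "# Reference Facts\n"),  # semantic tier, updated in place
    ]:
        path = root / name
        if not path.exists():                     # never clobber an existing file
            path.write_text(header)

scaffold("my-project")
```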
## Why these map to human memory
This isn’t a metaphor — it’s a design principle:
| Human memory type | Synthesis equivalent | Properties |
|---|---|---|
| Working memory | CONTEXT.md | Small capacity, constantly refreshed, what you’re thinking about right now |
| Semantic memory | REFERENCE.md | Facts and relationships, updated when facts change, not time-stamped |
| Episodic memory | sessions/ | Chronological events, “what happened when,” rarely recalled |
| Procedural memory | CLAUDE.md + _lessons/ | How to do things, patterns, rules |
Each memory type has different storage, retrieval, and maintenance characteristics. Treating them identically is like using a single data structure for a cache, a database, and a log file.
## Or, if you prefer systems analogies
| Cache level | Synthesis equivalent | Size | Load frequency |
|---|---|---|---|
| L1 cache | CONTEXT.md | ≤150 lines | Every session |
| L2 cache | REFERENCE.md | ≤300 lines | On demand |
| L3 cache | sessions/ | Unbounded | When searched |
| Disk | Transcripts, raw files | Unbounded | Explicit read |
## The archival protocol
The architecture only works if information moves between tiers. This is garbage collection for context.
At session start, if CONTEXT.md exceeds 120 lines:
- Archive first. Write completed tasks and old session summaries to sessions/YYYY-MM.md. Move stable facts to REFERENCE.md.
- Verify. Confirm the archived content exists in its destination file.
- Only then rewrite CONTEXT.md with the archived content removed.
- Check budget. Verify CONTEXT.md is under 150 lines.
The principle: content must exist in the destination before it’s removed from the source. Two-phase commit. Never delete from CONTEXT.md based on the assumption that something was “already captured” — verify first.
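The archive-then-verify discipline can be sketched as follows. Function and file names here are illustrative, not taken from the runbooks:

```python
from datetime import date
from pathlib import Path

def archive_cold_lines(context: Path, cold: list[str]) -> None:
    """Two-phase move: write to the archive, verify, only then rewrite the source."""
    archive = context.parent / "sessions" / f"{date.today():%Y-%m}.md"
    archive.parent.mkdir(parents=True, exist_ok=True)

    # Phase 1: append the cold content to the monthly archive.
    with archive.open("a") as f:
        f.write("\n".join(cold) + "\n")

    # Verify: every archived line must now exist in the destination file.
    archived = archive.read_text()
    assert all(line in archived for line in cold), "archive incomplete; aborting"

    # Phase 2: only now rewrite CONTEXT.md with the cold lines removed.
    keep = [l for l in context.read_text().splitlines() if l not in cold]
    context.write_text("\n".join(keep) + "\n")

ctx = Path("demo/CONTEXT.md")
ctx.parent.mkdir(parents=True, exist_ok=True)
ctx.write_text("## Active tasks\n1. [ ] Ship v2\n- [x] Old completed task\n")
archive_cold_lines(ctx, ["- [x] Old completed task"])
```

If the verification assertion fires, CONTEXT.md is left untouched, which is exactly the two-phase-commit property the protocol requires.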
The question for every piece of content: is this needed for today’s work?
- Yes → stays in CONTEXT.md
- No, but it’s a stable fact → REFERENCE.md
- No, but it’s a session record → sessions/
- No, and it’s a reusable lesson → _lessons/
- No, and it’s none of the above → delete it
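That decision tree is small enough to write down directly. A sketch, with the classification flags as placeholders you would answer per item:

```python
def triage(item: str, needed_today: bool, stable_fact: bool,
           session_record: bool, reusable_lesson: bool) -> str:
    """Route one piece of content to its destination tier."""
    if needed_today:
        return "CONTEXT.md"    # working memory: stays put
    if stable_fact:
        return "REFERENCE.md"  # semantic memory: update in place
    if session_record:
        return "sessions/"     # episodic archive: append-only
    if reusable_lesson:
        return "_lessons/"     # procedural memory
    return "delete"            # none of the above

triage("Q3 team roster", needed_today=False, stable_fact=True,
       session_record=False, reusable_lesson=False)  # → "REFERENCE.md"
```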
## Results
Applied to 62 projects in a single session:
| Metric | Before | After |
|---|---|---|
| Largest CONTEXT.md | 1,014 lines | 128 lines |
| Projects over 150 lines | 18 | 0 |
| Projects with REFERENCE.md | 1 | 21 |
| Projects with session archives | 1 | 14 |
Not every project needs all three tiers. 41 projects are small enough that CONTEXT.md alone is sufficient. The architecture scales down as well as up — you create REFERENCE.md and sessions/ when you need them, not before.
## How this differs from tool-native memory
Claude Code has MEMORY.md — an automatic memory file that persists facts across conversations. ChatGPT has its own persistent memory. These are useful, and the tiered context architecture coexists with them without conflict. But they solve different problems.
| | Tool-native memory (MEMORY.md, etc.) | Tiered context architecture |
|---|---|---|
| Scope | Per-tool, per-directory | Per-project, tool-agnostic |
| Maintenance | Automatic (AI decides what to save) | Deliberate (human-AI collaboration) |
| Structure | Flat, append-oriented | Three tiers by information lifecycle |
| Budget | Usually none — grows until truncated | 150-line hard limit on working memory |
| Lifecycle | None — facts accumulate | Garbage collection at session boundaries |
| Portability | Tied to one tool | Works with any AI assistant |
| Project awareness | Minimal — knows recent facts | Full — knows state, tasks, history, reference |
Tool-native memory is a useful supplement. It catches things you didn’t explicitly save. But it’s not a substitute for deliberate context management any more than auto-save is a substitute for version control.
## Getting started
### Step 1: Set up the structure
Create a projects/ directory in your workspace. Add index.yaml with your first project:
```yaml
projects:
  - id: my-project
    name: My Project Name
    status: active
    description: Brief description
    last_session: 2026-03-04
```
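A quick sanity check of that schema, assuming PyYAML is available (the field names are the ones shown above):

```python
import yaml  # PyYAML — an assumed dependency, not required by the architecture

INDEX = """\
projects:
  - id: my-project
    name: My Project Name
    status: active
    description: Brief description
    last_session: 2026-03-04
"""

index = yaml.safe_load(INDEX)
active = [p["id"] for p in index["projects"] if p["status"] == "active"]
print(active)  # → ['my-project']
```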
### Step 2: Create CONTEXT.md
In projects/my-project/, create CONTEXT.md:
```markdown
# My Project — Working Context

**Phase:** Initial
**Last session:** 2026-03-04

---

## Current State

[What exists, what doesn't, starting conditions]

## What's Next

1. [ ] [First task]
2. [ ] [Second task]

---

*This file follows the Tiered Context Architecture. Budget: ≤150 lines.*
```
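Stamping out that starter template can be automated. A sketch, with an illustrative function name and a trimmed version of the template above:

```python
from datetime import date
from pathlib import Path

# Starter template from Step 2; {name} and {today} are filled in per project.
TEMPLATE = """\
# {name} — Working Context

**Phase:** Initial
**Last session:** {today}

---

## Current State

[What exists, what doesn't, starting conditions]

## What's Next

1. [ ] [First task]
2. [ ] [Second task]

---

*This file follows the Tiered Context Architecture. Budget: ≤150 lines.*
"""

def init_context(project_dir: str, name: str) -> Path:
    """Write a fresh CONTEXT.md, stamped with today's date."""
    path = Path(project_dir) / "CONTEXT.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(TEMPLATE.format(name=name, today=date.today()))
    return path

init_context("my-project", "My Project")
```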
### Step 3: Add instructions to your AI assistant
Add context lifecycle instructions to your CLAUDE.md (or equivalent configuration for your AI tool):
```markdown
## Context Lifecycle

After completing ANY significant task:

1. Update CONTEXT.md immediately (budget: ≤150 lines)
2. Move stable facts to REFERENCE.md
3. Archive session logs older than 1 week to sessions/
4. Update index.yaml last_session date
5. Commit to git
```
### Step 4: Let the tiers emerge naturally
Don’t create REFERENCE.md or sessions/ on day one. Use CONTEXT.md alone until it approaches 120 lines. When it does, that’s when you:
- Extract stable facts (team, URLs, architecture) into REFERENCE.md
- Move old session logs into sessions/YYYY-MM.md
- Trim CONTEXT.md back to lean working memory
The tiers emerge from need, not planning.
### Step 5: Run garbage collection at session boundaries
Every time you start a new session on a project:
- Read CONTEXT.md.
- Identify cold content: completed tasks, old session logs, stable facts.
- Archive first: write completed tasks and old logs to sessions/, move stable facts to REFERENCE.md.
- Verify the archived content exists in its destination.
- Only then remove the archived content from CONTEXT.md.
- Now start working.
This takes 2-3 minutes and keeps everything clean.
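The session-start trigger is easy to automate. A sketch using the thresholds from the protocol above (the path and messages are illustrative):

```python
from pathlib import Path

SOFT_LIMIT = 120  # session-start trigger for garbage collection
HARD_LIMIT = 150  # working-memory budget

def needs_gc(context: Path) -> bool:
    """Return True when CONTEXT.md has crossed the soft limit."""
    lines = len(context.read_text().splitlines())
    if lines > HARD_LIMIT:
        print(f"{context}: {lines} lines, over budget: archive now")
    elif lines > SOFT_LIMIT:
        print(f"{context}: {lines} lines, nearing budget: archive this session")
    return lines > SOFT_LIMIT

ctx = Path("CONTEXT.md")
ctx.write_text("\n".join(f"line {i}" for i in range(130)) + "\n")
needs_gc(ctx)  # 130 lines → returns True
```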
## Where this is going
The tiered architecture is stage 3 of an evolution:
1. Ad hoc — Re-explain everything each session. Most AI users today.
2. Monolithic — Single context file that grows forever. The pre-March-2026 approach.
3. Tiered — Working memory + reference + archive with lifecycle management. Current.
4. Compiled — Context automatically assembled from live project state. Future.
Stage 4 is the long-term vision: the AI’s context at session start is compiled from live project state, code, and history rather than manually maintained. But stage 3 is the 80/20 solution that makes long-running AI-assisted projects sustainable today.
## The complete system
The full implementation is available as open-source runbooks:
- Context lifecycle management — Templates, decision trees, migration guides, and quality metrics
- Synthesis project management — The broader project management system that the tiered architecture is part of
Both are MIT licensed and part of the Ragbot.AI open-source project.
The tiered context architecture is part of synthesis project management, a discipline within synthesis engineering.
Rajiv Pant is President of Flatiron Software and Snapshot AI, where he leads organizational growth and AI innovation. He is former Chief Product & Technology Officer at The Wall Street Journal, The New York Times, and Hearst Magazines. Earlier in his career, he headed technology for Condé Nast’s brands including Reddit. Rajiv coined the terms “synthesis engineering” and “synthesis coding” to describe the systematic integration of human expertise with AI capabilities in professional software development. Connect with him on LinkedIn or read more at rajiv.com.