Session tokendynamics and the infrastructure gap in collaborative reasoning
Every engineer working with AI has experienced the same strange failure mode. The session is going well: ideas connect, code evolves, decisions accumulate. Then, somewhere near the middle of the conversation, something subtle breaks. The model still sounds coherent. It still answers confidently. But a constraint disappears. A design decision from twenty minutes ago vanishes. The reasoning quietly drifts off course.
Nothing crashed — the system simply forgot.
This isn't a bug in intelligence. It's a consequence of bounded memory.
Donald Knuth documented this phenomenon without naming it. In “Claude’s Cycles” (2026), after 31 explorations Claude produced a valid construction for odd values — but on the even case, it couldn't even write correct programs anymore. The model hadn't become less capable. Its context had filled with exploration residue, displacing the constraints it needed to reason correctly.
Most engineers encounter a milder version daily: a 200-turn coding session at 80% context, $54 spent. The AI starts repeating itself. Suggestions become generic. The context meter shows fullness, not quality.
Before instrumentation, reaching 85% context meant panic: save everything, brace for the restart tax — 10 or more turns rebuilding context the model just lost to compaction. Sessions above 80% were abandoned preemptively.
That failure mode has a name: context decay.
Context decay is the gradual degradation of a reasoning session's quality, cost efficiency, and structural coherence over time. Think of a conference room whiteboard that never gets erased — eventually every new idea is drawn on top of old ones. A fixed-size context window combined with an unbounded session inevitably produces decay.
It operates on three axes:
Economic decay. Every token in the context window gets re-read on every turn. Noise tokens don't just cost money when written — they cost money every turn they remain. Cache reads account for roughly 80% of session cost.
Reasoning decay. When compaction occurs, the model summarizes its own history. Constraints get softened. Decisions lose their rationale. Exploration dead-ends get compressed alongside conclusions, becoming ghost context — artifacts that influence reasoning without being visible.
Structural decay. Sessions drift across project boundaries. Scope creeps. The same project launched from two directories creates split contexts with no cross-visibility.
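The re-read tax behind economic decay compounds: a token written on turn *t* is re-read on every remaining turn, so the read component grows quadratically with session length. A minimal cost model makes this visible (the rates are illustrative placeholders, not any provider's actual pricing):

```python
def session_cost(tokens_per_turn, turns, write_rate=3.00, read_rate=0.30):
    """Estimate cumulative session cost when every prior token is re-read
    on each turn. Rates are illustrative $/million-token placeholders."""
    total = 0.0
    context = 0
    for _ in range(turns):
        total += context / 1e6 * read_rate           # re-read the whole window
        context += tokens_per_turn
        total += tokens_per_turn / 1e6 * write_rate  # pay once for new tokens
    return total

# The read term grows with turns * context, so noise written early is
# paid for again on every turn it survives.
lean = session_cost(tokens_per_turn=1_000, turns=100)
noisy = session_cost(tokens_per_turn=2_000, turns=100)
```

Doubling session length more than doubles cost in this model, because the re-read term scales with both the number of turns and the size of the accumulated window.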
With instrumentation showing context utilization and signal quality, an operator at 82% context launched a major architectural task in planning mode, let compaction occur naturally, and resumed at 26% without a single reasoning restart. The operating vector — the active objective, constraints, and design decisions — survived compression intact.
The context window didn't change. The operator's confidence in compaction quality did.
A session with clean signal at 82% compacts predictably because the context is mostly signal, not noise. A session with degraded signal at 82% is a coin flip. The signal grade makes compaction predictable instead of random — the same shift that occurred when CI/CD introduced build status indicators. Engineers didn't stop deploying. They stopped deploying blind.
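There is no standard signal grade; as a sketch, assume each turn's tokens can be tagged as signal (decisions, constraints, working code) or noise (dead ends, repetition), and grade the window by its signal ratio. The letter thresholds below are arbitrary illustrative cutoffs:

```python
def signal_grade(turns):
    """Grade the window by its signal ratio. `turns` is a list of
    (token_count, is_signal) pairs -- a hypothetical tagging scheme."""
    total = sum(tokens for tokens, _ in turns)
    if total == 0:
        return "A"  # an empty window has nothing to lose in compaction
    ratio = sum(tokens for tokens, is_sig in turns if is_sig) / total
    # Thresholds are illustrative choices, not a standard.
    if ratio >= 0.8:
        return "A"
    if ratio >= 0.6:
        return "B"
    if ratio >= 0.4:
        return "C"
    return "D"
```

Under this scheme a grade-A window at 82% compacts predictably, while a grade-D window at the same fullness is the coin flip described above.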
Uninstrumented session: explore → drift → compaction → restart → re-explain → drift

Instrumented session: explore → signal check → consolidate → compact → continue
Software engineering disciplines emerge when a system encounters a hard resource constraint. CPU cycles gave us schedulers. Memory limits gave us allocators. Network bandwidth gave us congestion control.
The context window is the first hard resource constraint in collaborative reasoning.
Once you recognize this, operational questions appear: How full is the context? How much is noise? What will compaction lose? Is the session still on track? These require infrastructure, not prompting technique.
The infrastructure that emerges follows a familiar pattern:
```
Operator (ideas)
    ↓ stage
Vector Staging (classify, measure, prepare)
    ↓ launch
LLM Session (reasoning under bounded context)
    ↓ telemetry
Session Telemetry (cost per decision, signal grade, drift detection)
    ↓ feedback
Operator (adjusts)
```
Engineers recognize this as a closed feedback loop — the same structure as observability stacks, CI/CD pipelines, and SRE feedback systems.
The LLM is not the system.
It is one component inside a control loop. Reasoning stability comes from the infrastructure around it.
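The loop above can be written down as a tiny skeleton. Everything here is hypothetical scaffolding: `SessionLoop`, `launch`, and the threshold values are illustrative names, and the actual LLM call is reduced to plain token accounting:

```python
from dataclasses import dataclass, field

@dataclass
class Telemetry:
    context_pct: float    # how full the bounded window is
    signal_ratio: float   # fraction of the window that is signal, not noise

@dataclass
class SessionLoop:
    """Operator -> staging -> session -> telemetry -> operator, as data flow.
    The LLM call itself is stubbed out as token accounting."""
    max_context: int = 200_000
    context_tokens: int = 0
    signal_tokens: int = 0
    history: list = field(default_factory=list)

    def launch(self, vector_tokens: int, is_signal: bool) -> Telemetry:
        # stage -> launch: the prepared vector enters the bounded context
        self.context_tokens += vector_tokens
        if is_signal:
            self.signal_tokens += vector_tokens
        # telemetry: the measurements the operator uses to adjust
        t = Telemetry(
            context_pct=self.context_tokens / self.max_context,
            signal_ratio=self.signal_tokens / self.context_tokens,
        )
        self.history.append(t)
        return t

loop = SessionLoop()
loop.launch(40_000, is_signal=True)       # clean exploration
t = loop.launch(20_000, is_signal=False)  # a dead end stays in the window
# operator adjusts: degraded signal or a filling window means consolidate
should_consolidate = t.signal_ratio < 0.7 or t.context_pct > 0.65
```

The point of the skeleton is the shape, not the stub: the decision to consolidate lives outside the model, in the loop.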
The most revealing metric isn't token count or session cost. It's Cost Per Decision: how many dollars it takes to reach each concrete commitment. Most sessions feel expensive, but CPD shows why.
In one observed session: $54.43 spent, 4 decisions made, $13.61 per decision. Decision density: one decision per 95 turns. The other 94 turns between decisions were exploration, clarification, and noise, all re-read on every subsequent turn.
Clean sessions have lower CPD because the model converges faster with less noise in the window. The real cost of AI development is not tokens. It's how expensive it is to reach the next decision.
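CPD itself is trivial arithmetic, which is exactly why it is cheap to instrument:

```python
def cost_per_decision(session_cost: float, decisions: int) -> float:
    """Dollars spent per concrete commitment. Zero decisions yields
    infinity, which is itself a useful warning signal."""
    if decisions == 0:
        return float("inf")
    return session_cost / decisions

cpd = cost_per_decision(54.43, 4)  # the observed session: $13.61 per decision
```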
Every session has a natural arc that nobody acknowledges. Operators treat sessions as infinite; context windows are finite. When idea velocity exceeds remaining capacity, ideas get silently dropped — not rejected, just forgotten.
| Phase | Context | Operator mode |
|---|---|---|
| Orient | 0–20% | Load context, set scope |
| Explore | 20–50% | Branch freely, investigate |
| Execute | 50–65% | Implement chosen approach |
| Consolidate | 65–80% | Commit, verify, document |
| Handoff | 80%+ | Save context for next session |
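The arc can be checked mechanically. A lookup that mirrors the table above, with the same thresholds treated as half-open intervals:

```python
def session_phase(context_pct: float) -> str:
    """Map context utilization (0.0 to 1.0) to the phase table above."""
    if context_pct < 0.20:
        return "Orient"
    if context_pct < 0.50:
        return "Explore"
    if context_pct < 0.65:
        return "Execute"
    if context_pct < 0.80:
        return "Consolidate"
    return "Handoff"

# The dangerous transition: ideas arriving here displace old constraints.
phase = session_phase(0.63)  # "Execute", with Consolidate imminent
```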
The dangerous transition is between Execute and Consolidate. The operator still has ideas but the window is filling. New ideas displace old constraints. The model sounds confident while silently dropping requirements.
In early use of a prototype implementing these ideas, observed effects include:
- 7.5 million tokens of redundant context removed: automated cleaning ran 241 cycles over 5 hours.
- $349 recovered from a single session: 8,243 progress entries and 824 tangents cleaned in 12 minutes.
- 3× throughput with the same operator and model: previously $200/day at under 30% productivity; with instrumentation, $163/day with multiple features shipped.
- Zero reasoning restarts after compaction: 82% → 26% context, with a multi-file feature designed, implemented, tested, and committed without losing its objective.

These numbers are from early prototype use. They demonstrate that the process is observable and measurable, not that specific savings are guaranteed.
You can't manage what you can't name. Context decay persists partly because its failure modes don't have names.
| Term | Meaning |
|---|---|
| Re-read tax | The cumulative cost of re-reading noise on every model invocation |
| Compaction loss | Constraints and decisions silently dropped during context summarization |
| Vector drift | Gradual divergence of the session's reasoning from its stated objective |
| Ghost context | Compacted exploration residue that influences reasoning without being visible |
Each corresponds to a measurable phenomenon in real sessions.
Most AI users operate in short sessions: 10–30 turns, $1–5. In that regime, context decay is invisible. The model occasionally forgets something, the user shrugs, and the session ends before it matters.
Long sessions — 100+ turns, $50+ — surface a different class of problems: compaction survival, reasoning drift, restart tax, decision economics. These are invisible at small scale, just as distributed systems problems are invisible on a single server.
Small websites didn't need DevOps. Small ML experiments didn't need MLOps. Short AI sessions don't need session infrastructure. But as agentic workflows push session lengths up and costs rise, more developers will cross the threshold where sessions stop being conversations and start behaving like runtime systems.
Runtime systems always demand monitoring, signals, control loops, and discipline.
Once collaborative reasoning becomes structured, instrumented, and observable, it begins to resemble something familiar to engineers: an operational discipline.
| Constraint | Discipline |
|---|---|
| CPU cycles | Scheduling |
| Memory limits | Memory management |
| Network bandwidth | Congestion control |
| Context windows | Session tokendynamics |
Session tokendynamics is the control of idea flow, compression, and execution pacing inside a bounded context window, aimed at avoiding silent forgetting and minimizing restart cost.
The bounded context window is not a limitation to be worked around. It is the constraint that will define how human-AI collaboration is engineered.
The prototype tools referenced in this essay are open source. They represent one implementation of the ideas described here — the practice of managing context decay can be applied with or without specific tooling.