Skip to content
Victor Del Puerto
Go back

Anatomy of a four-minute boot

Every Claude Code session in my office setup started with about four minutes of warm-up before any real work happened. I assumed it was my hooks. I run a heavy configuration: twelve hooks, two disk validators, a scarring system, persistent memory with three hundred entries. Something in that machinery had to be the problem.

So I asked the agent to measure it. The prompt said, roughly: “every session start takes four minutes while you review all the memory and feedback files — analyze it and get us to thirty seconds.”

That prompt triggered the bug. My memory-recall hook matched the words “agent”, “analysis”, “memory” against its index and forced the agent to read two files that had nothing to do with the question. Forty seconds gone before the analysis even started, demonstrating the problem it was asked to solve.

Where the four minutes actually went

I timed every component the way the harness invokes it:

ComponentMeasured
6 SessionStart hooks (Python spawns, validators, sync)~2.7 s
6 UserPromptSubmit hooks (run on every prompt)~1.0 s
The agent reading memory files, one turn at a time~3.5 min

The machinery I suspected was three seconds of the problem. The rest was the model doing the diligent thing: reading files sequentially, with full reasoning between each read. A single read turn costs 15 to 30 seconds. My session boot was spending eight to fifteen of them.

Two horizontal bars drawn to the same time scale. Before: a thin dark sliver for hooks (2.7 s) followed by roughly twelve wide clay blocks, the sequential read turns of 15 to 30 seconds each, totaling about four minutes. After: a thinner hook sliver (0.7 s) and a single batched read turn, totaling thirty seconds or less.
Hooks were three seconds of a four-minute problem. The bars share one time scale.

Three things were feeding that behavior:

  1. My memory index (the MEMORY.md that Claude Code auto-loads) had grown to 74 KB and 370 lines. The harness loads it truncated past a budget, with a warning. A conscientious agent reacts to “only part was loaded” by re-reading the whole file, then expanding the entries it points to.
  2. My recall hook fired on almost every prompt because it scored generic words from entry descriptions. Every false positive forced a read, and the agent obeys hooks.
  3. A rule in my CLAUDE.md said “read the office map before acting”, and the agent interpreted that as “at session start” instead of “before touching files”. That alone was a 52 KB read every boot.

None of this is exotic. If you run Claude Code with persistent memory and a few months of accumulated entries, some version of this is probably happening to you.

Eight things I learned fixing it

1. The bottleneck is the model’s turns. I went for the visible machinery first and was off by two orders of magnitude. Measure it: subprocess spawns cost milliseconds, model turns cost tens of seconds. Removing one read turn saves more than making every hook ten times faster.

2. A memory index that doesn’t fit in context stops being an index. Once it loads truncated, it becomes a tax: the agent pays tokens to re-read what the index was supposed to make free. The contract that works: one line per memory, under 150 characters, detail lives in the topic file. An index is for routing; storage belongs in the topic files.

3. Push context, don’t make the model pull it. The same information has two prices. Injected by a hook: milliseconds, deterministic. Fetched by the model through tool calls: seconds to minutes, stochastic. Anything needed every session (rules, validator status, active corrections) should be injected. Anything task-specific should be pulled lazily, and batched.

Two panels. Push: a hook injects additionalContext straight into the context window, in milliseconds, deterministically, before turn one. Pull: a model turn reasons first, issues a tool call to read a file, and the content lands in context one file later, at 15 to 30 seconds per turn, stochastic and compounding when sequential. Bottom rule: needed every session, push it; task-specific, pull it once in one batched message.
Same information, two prices.

4. A recall hook with false positives is worse than no hook. The cost is asymmetric. A false positive burns a full model turn, always, because the agent does what hooks tell it. A false negative is recoverable: the model can still search. My fix: stopwords for generic operational vocabulary, and requiring at least one keyword match against the memory’s identity (its filename or title), so the description text can reinforce a match but never trigger one by itself.

5. N parallel reads in one message cost about the same as one. N sequential reads cost N turns with reasoning in between. If your instructions say “read these files first”, they have to literally say “all in a single message”. The model won’t batch by default.

6. Archiving is safe when recall is mechanical. I moved 89 of 299 index entries out of the auto-loaded index: closed projects into an archive index, niche technical traps into a sub-index. Nothing was lost because the recall hook parses all three indexes — that’s the condition that makes an aggressive diet safe. And the verification is a script, not eyeballs: every entry lives in exactly one index, zero broken links.

The old 74 KB MEMORY.md, marked as loading truncated with the agent re-reading everything, becomes three files: a 31.8 KB MEMORY.md that auto-loads whole, an archive index for closed projects, and a traps sub-index. A recall hook on UserPromptSubmit parses all three, matching keywords against entry identity. A side panel lists the script-verified invariants: every entry in exactly one index, zero broken links, byte budget linted.
Archive aggressively once something mechanical can still find everything.

7. Consolidate hook processes and cache disk scans. Six Python spawns per event cost ~2.7 s on Windows. One dispatcher that runs the same six scripts in-process (via runpy, children unchanged) costs 0.2 to 0.7 s. Full-disk validators now run once a day with mtime-based invalidation; every other session reads the cached verdict, marked as cached so I can tell.

8. The anti-diligence rule has to be written down. The failure mode here is a virtue: “let me get context first”. If your CLAUDE.md doesn’t explicitly say “lazy boot, one batch of reads, no full memory sweep”, the agent will do the diligent, expensive thing in every new session. Whatever discipline you expect at boot has to be written into CLAUDE.md and enforced by hooks, because the model starts every session without any of yesterday’s intentions.

One measurement trap: Claude Code hooks read stdin. If you benchmark one with Measure-Command { python hook.py } and no stdin pipe, it blocks forever waiting for EOF, and you will spend a while staring at a hung terminal wondering which hook is slow. Measure the way the harness invokes them: '{}' | python hook.py.

The after

BeforeAfter
Session boot~4 min≤30 s
Memory index74.3 KB / 370 lines31.8 KB / 255 lines
SessionStart hooks~2.7 s0.7 s (0.2 s cached)
Recall hook on the prompt that started this2 forced irrelevant readssilent
Information lostnone (script-verified)

The informational result is identical. The index loads whole, the corrections still inject, the validators still report, and the task-specific memory arrives in one batched turn instead of a dozen sequential ones.

The generic versions of the two dispatchers and the index linter are in the agent-evals companion folder. They are reference implementations, the same model as the rest of that repo: read them, adapt them, no package to install.

The linter earned its place the same afternoon I wrote it: I ran it against my own freshly dieted index and it failed, because new entries had already pushed it 2.5 KB back over budget. Two more squeeze passes to get back to green, and the linter stays wired where it can catch the next drift early.


Share this post on:

Next Post
How do you know a correction held? Instrumenting an agent in production