Victor Del Puerto

Where this came from

A companion note to The Lucy Syndrome and AI. If the essay is the formal argument, this is the kitchen-table version: where the observation started, how the system around it took shape, and why this matters at all.


Where it started

I did not set out to write a paper about LLM amnesia. I set out to run a civil engineering firm with fewer recurring mistakes. But the essay reads colder than the work felt, and I want to say something about that before the framework starts.

I have been obsessed with these models for a while. Not as a tool I use for tasks — as something I want to understand. How they work, how they fail, what their dynamics actually are, what it would take to make them behave like a real collaborator in a real operation. The essay is engineering. The thing under it is closer to curiosity, the kind that does not let you put a problem down.

The path was not planned. It started with ChatGPT, the way most people start — conversational, one question at a time. Then I migrated to Claude. Then I discovered Projects and began keeping a bitácora per project, a log where decisions and corrections lived outside the chat. Then I moved into Claude Code, and the bitácoras turned into knowledge bases, and the knowledge bases turned into what is now the firm’s central intelligence layer. Along the way I spent a lot of time on things that look small from the outside — optimizing tokens, making the architecture faster, getting different parts of the system to talk to each other. None of it was a roadmap. Each step was an answer to a problem the previous step had created. And every step pushed the same philosophical question a little further open: what is this thing, really, and how do you give it the shape it needs to work with you instead of near you?

Somewhere in the middle of all of that, the errors started to feel different. Not all of them — the recurring ones. The same wrong answer to the same question on the same file, this week and last week and the week before. The kind of mistake that should not happen in a system where the correction was already in writing.

I remember one conversation in particular. I was inside a session with Claude trying to understand where the problem was actually coming from — and the model itself could not give me an answer that closed the question. It could describe the symptom. It could describe its architecture in general terms. It could not explain to me, in a way I could rest on, why writing a correction down did not protect the next session from repeating the error. We were both stuck on the same wall. It was during that conversation, with the model unable to answer me about itself, that the real shape of the problem came into focus. The interesting question stopped being “why did the model get this wrong?” and became “why does writing the correction down not protect the next session from repeating the error?”

That question is what the essay is about.

Lucy, and why scars

I have watched 50 First Dates more times than I can count. Twenty, maybe thirty. It is one of the movies I keep coming back to, and I never imagined it would intersect with a problem I was trying to solve at work. It did, and I want to say a word about how — because that part of the story is what holds the engineering up.

In the movie, Lucy Whitmore wakes up every morning with no memory of the day before. Henry, who loves her, makes her a video that recaps their relationship: who he is, what happened yesterday, what they mean to each other. She watches the video, orients herself, and functions. Some days beautifully. Some days she reacts differently to the same video. And she never — not once — remembers watching it.

The connection landed during one of those stuck conversations. It just dropped into place. The model is Lucy. The CLAUDE.md file is the video. The model reads it at the start of every session, orients itself, and functions. Some days beautifully. Some days it reacts differently to the same file. And it never remembers having read it. A movie I had watched for years, never thinking it had anything to do with the work, suddenly had everything to do with it.

But naming the failure was only half. The framework needed a name for the fix, and that came from somewhere else — from how humans actually learn.

A child learns to ride a bicycle by falling off it. A child learns the iron is hot by touching it once. An athlete refines a movement by repeating it badly until they stop repeating it badly. None of this happens because someone gave them more context. It happens because something registered — a fall, a burn, a missed shot. Reading that the iron is hot is not the same as having burned your hand on it. That gap, between reading something and registering it, is the gap a model lives in permanently.

The word that came to me, after a while, was scars. Not because it sounds dramatic, but because scars are exactly what the analogy needs. They are what remains after the lesson has been learned the hard way. They are specific — one per incident — and durable, and they shape the next action without needing to be re-explained. Each of us is partly defined by the scars we carry. The model arrived at me already trained on rewards: a long, careful pass of “this is good, do more of this”. What it does not have is the other half. It does not have a burn on the hand. Functional scars are what I tried to build to supply that other half, on the operator side, where it is reachable.

Two names from the same place: Lucy Syndrome for the failure, functional scars for the fix. One from a movie I happen to love. One from the basic observation that, in real life, we are shaped more by our mistakes than by our training.

The shape of the system

People who read the essay sometimes ask me what the working version actually looks like. The essay shows the framework. It does not really show the building.

The building is unglamorous: markdown files, shell hooks, a few Python scripts. It runs on Claude Code on a Windows laptop in Paraguay. There is no vector store, no embeddings pipeline, no fine-tune. There is one operator (me) and one model (Claude), and a layer of files between them that tries to keep yesterday available to today.

At the base sit three knowledge bases, written as documents for a reader who happens to be the model — explicit headings, no abbreviations the model would have to guess at, examples next to rules. Each business area has a CLAUDE.md primer at its root, two to five hundred words long, encoding the rules that have been violated enough times to deserve their own permanent line. Around them sits a small lab where new uses of the model get tested in low-stakes settings before they migrate into the primers, and a handful of skills — reusable workflows the model can invoke explicitly.

The scars themselves live in the firm’s settings file as hooks: short shell commands that fire before specific tool calls and inject a one-line reminder of the relevant past failure into the model’s working memory at exactly the moment that failure becomes possible again. That line is not a memory. It is a priming nudge, delivered when the prior failure is reachable and not before. It is the operational difference between a rule the model can read and a rule the model is forced to step over.
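For concreteness, this is the shape such a hook can take in a Claude Code settings file. It is a minimal sketch, not a real entry from the firm's file: the matcher and the scar text are invented for illustration (the unit-conversion example echoes the one later in this note), and the exact schema and the way hook output reaches the model are defined by Claude Code's hooks documentation, so treat this as a shape rather than a drop-in.

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "echo 'SCAR 2024-11: loads in this knowledge base are kN/m2, not kg/m2 — re-check units before touching load tables.'"
          }
        ]
      }
    ]
  }
}
```

The point of the structure is the timing: the reminder is not sitting in a primer waiting to be read, it fires only when the model is about to edit or write a file, which is exactly the moment the old failure becomes possible again.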

There is nothing in any of this you could not explain to a junior engineer in an afternoon. The interesting part is not the components. It is the discipline of treating each component as the place where a specific past failure should now be structurally harder to repeat.

The soul of the idea

Every piece of writing has an animating question, the thing it is really about underneath the formal argument. The Lucy Syndrome essay has one too, and I want to put it on the table here in plain words.

The animating question is: whose problem is LLM amnesia?

The default answer, the one the field has converged on by inertia, is that it is the lab’s problem. Anthropic, OpenAI, Google — they will solve memory eventually, with longer contexts and better retrieval. Meanwhile, the rest of us wait and absorb the cost of repeated errors as the cost of being early.

I think that answer is wrong as a posture. The labs cannot, for structural reasons, collect your specific failures. They cannot watch your knowledge base get the unit conversion wrong on Friday after you corrected it on Tuesday. That data does not exist outside of your operation. If you do not capture it, it never existed at all.

What the essay is really arguing — under the framework, under the diagrams, under the language of invariants and chains and scars — is that the operator is sitting on a dataset. A dataset of their own corrections, of the moments where the model and reality diverged. And that this dataset is wasted unless the operator builds the layer that turns it into a constraint on the next session.

The labs can build better models. They cannot, unprompted, build a corrective layer specific to your operation. That part is yours. Functional scars are not the technical centerpiece of the essay; they are a particular shape that this corrective layer can take. The deeper claim is that the corrective layer needs to exist.

The next link, if any of this resonated, is the essay itself: The Lucy Syndrome and AI. After that, the Q&A — the questions people have already asked, with the answers I have so far.

Victor

