The Pixar precedent: vibe coding has been here before

The Utah Teapot, the 1975 model by Martin Newell that became the canonical reference object for early 3D computer graphics.

Debates over disruptive tools follow a pattern. The current fight over vibe coding, whether engineers who lean heavily on LLM-generated code are still doing real work, is at a recognisable stage of that pattern. I want to argue something narrow: that the shape of the argument is older than the technology, and that recognising the shape changes which questions are worth asking.

I am not a software engineer. I run a civil engineering firm in Paraguay and I build AI tooling for it. Before any of that, in another life, I trained as a 3D character animator. I bring this up only because the animation industry had its own version of this debate thirty years ago, and the way it ended is instructive.

The 2D vs 3D argument, briefly

When Pixar released Toy Story in 1995, the hand-drawn animation community had a strong, well-articulated objection: the new medium did not capture what made animation animation. The line work was sterile. The performances felt mechanical. Worse, 3D would let people who did not understand the craft produce things that looked like animation but were not. The tools would lower the floor, and the floor was the whole point.

The objection was not stupid. Some of it was correct. Early 3D was often stiff. Tools did let underqualified people ship things. And yet the framing turned out to be wrong in a specific way: it treated the new technique as a substitute for the old craft, when it was really a substitute for the old toolchain. What survived the transition was not 2D technique. It was the underlying discipline: Thomas and Johnston’s twelve principles, the work on weight, anticipation, timing, and arc that any good animator internalises regardless of medium. A bad 3D animator and a bad 2D animator are bad in the same ways. So are the good ones.

Where the analogy holds for vibe coding

The vibe coding debate has the same structure. One side argues that engineers who let an LLM write most of the code are not really engineering. The other side notes, correctly, that the most respected practitioners in the field, including some of the loudest critics, now ship large amounts of AI-generated code themselves. The 2025 Stack Overflow Developer Survey found 84% of respondents using or planning to use AI tools, with around half of professional developers (51%) using them daily. GitHub has reported that Copilot now authors a substantial share of code in repositories where it is actively used. The dispute is not about whether the tool is used. It is about who is allowed to use it without losing status.

That is the same fight the 2D animators were having. The resolution is likely to be the same: the tool wins, the fundamentals stay, and the line between competent and incompetent practitioners moves but does not disappear.

Where the analogy breaks

I want to be careful here, because the analogy is not clean.

Bad animation bores people. Bad software leaks data, corrupts records, and takes down systems other people depend on. The cost of failure is asymmetric in a way that the 2D-vs-3D fight never was. A 3D animator producing weightless work in 1996 wasted a studio’s budget and damaged careers. A team shipping unreviewed LLM output to production in 2026 can produce outcomes that are not recoverable. The 2D animators did not have an equivalent problem. There was no version of bad animation that exfiltrated a customer database.

This is the part of the critique that is not gatekeeping. A December 2025 report by CodeRabbit found AI-co-authored pull requests carrying roughly 1.7x more issues and 1.4x more critical defects than human-written ones. And recent academic work on LLM-generated web code shows a persistent pattern of specific vulnerabilities, including path traversal in roughly a third of outputs across both Claude Sonnet 4 and GPT-4o. None of this is an argument against using the tool. It is an argument that the floor of competence required to deploy the tool’s output is higher than the floor required to produce the output, and that the gap between those two floors is where most of the damage happens.
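For readers who have not met it, path traversal is what happens when a user-supplied file name such as ../../etc/passwd is joined to a base directory without checking where the result lands. The following is a minimal sketch of the usual guard, not code from the cited studies; the function name resolve_inside and the directory paths are hypothetical.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir, refusing anything that escapes it.

    Hypothetical helper for illustration only.
    """
    base = Path(base_dir).resolve()
    # Joining and then resolving collapses any ".." segments, and an
    # absolute user_path replaces the base entirely, so the containment
    # check has to run on the resolved result, not on the raw string.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # requires Python 3.9+
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```

Code that skips the check and simply opens base_dir + user_path looks correct and works for every well-behaved input, which is the kind of gap between producing output and deploying it that the numbers above describe.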

A specific failure, from my own stack

I run a small multi-agent orchestration system called Karasu. It coordinates Claude Code, Codex, and a set of internal review hooks so that the human (me, usually) is not the message bus between them. It is the kind of system a vibe coding sceptic would point to as a worst case: an engineer who is not a professional software engineer, running multiple LLMs against his own repository, with imperfect supervision.

Here is the kind of failure that taught me what the sceptics are actually worried about.

A pull request went through the orchestration pipeline. The implementer agent wrote the change. The review agent inspected it, noted that the test suite was expected to pass, and approved. Karasu merged it. The problem: in that run, the review agent’s sandbox had a DNS issue and could not actually reach the test runner. It approved based on the implementer’s declared test results, not on tests it had executed itself. The merge was, on the surface, completely normal. Logs showed a green review. Nothing failed loudly.

I caught it because I went looking for the test runner’s output and found nothing. That is the entire skill. It is not “writing the code from scratch”: I did not write most of that PR. The skill was knowing that a green review without an executable trace is not actually a green review, and being willing to dig until I found the trace or its absence. That instinct does not come from prompting. It comes from understanding what tests are for, what a review is for, and what it means for a system to lie to you politely.
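The rule "a green review without an executable trace is not a green review" can be written down as a merge gate. What follows is a sketch under stated assumptions, not Karasu's actual code: the names Review, ExecutionTrace, and may_merge are hypothetical, and a real pipeline would have to decide for itself what counts as a trace.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExecutionTrace:
    """Evidence that the reviewer itself ran the tests (hypothetical shape)."""
    command: str
    exit_code: int
    output: str


@dataclass
class Review:
    approved: bool
    declared_passing: bool            # what the implementer claimed
    trace: Optional[ExecutionTrace]   # what the reviewer actually executed


def may_merge(review: Review) -> tuple[bool, str]:
    """Allow a merge only when approval is backed by an executed test run.

    The implementer's declared result is deliberately ignored here:
    a claim about tests is not the same thing as a test run.
    """
    if not review.approved:
        return False, "review not approved"
    if review.trace is None:
        return False, "approved without an executed test trace"
    if not review.trace.output.strip():
        return False, "test trace is empty"
    if review.trace.exit_code != 0:
        return False, "tests failed"
    return True, "approved with executed tests"
```

In the incident described above, the review would have arrived with approved=True, declared_passing=True, and trace=None, and this gate would have refused the merge. The design choice is that missing evidence counts as a failure rather than as a pass.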

This is the analogue of an animator knowing that a beautifully rendered character with no weight in the hips is still a bad performance. The tool produced an output that looked correct. The craft is recognising that the output is not what it appears to be.

What the pattern suggests

If the vibe coding debate follows the shape of the animation debate, three things are likely.

The first is that the tool wins. It is already winning. Arguments about whether developers should use LLMs to write code are not going to be resolved on their merits; they are going to be resolved by the fact that the developers everyone respects already do.

The second is that the fundamentals do not disappear. They get re-expressed. In animation it turned out that the twelve principles applied to 3D characters as much as to 2D ones; the surface changed, the underlying discipline did not. In software the equivalents are probably systems thinking, threat modelling, test discipline, observability, and the kind of suspicion that made me look for the missing test trace. Those are the things a vibe coder without prior training has to acquire, somehow, before deploying anything that matters. The tool does not provide them.

The third is that the gap between people who have the fundamentals and people who do not becomes more visible, not less, once the tool is universal. When everyone has the same instrument, the only thing left to differentiate the work is what the person behind it actually knows.

The question I would rather we were arguing about is not whether vibe coding counts as real engineering. It is which fundamentals carry across the transition, and how someone without a traditional path into the field is supposed to acquire them in time. I do not think we have a good answer to that yet.

Hero image: Utah Teapot, rendered by Dhatfield, released under CC BY-SA 3.0.