Skip to content
Victor Del Puerto
Go back

Calibrate against your own voice

I write in English as a second language. I ran four commercial AI detectors over blog posts I had written and edited by hand, and each came back over 90% AI. When I rewrote a paragraph to drop the real tells (aphorisms, hinge phrases, triplet negations), the score barely moved. The detectors were reading my passport, not my prose.

callus launch animation: a robot mouth morphs into human lips
From a machine voice to your own.

There is research behind that. In 2023 a Stanford group (Liang, Zou and colleagues) ran 91 TOEFL essays by non-native English speakers and 88 essays by native speakers through seven popular GPT detectors. On average the detectors flagged 61% of the non-native essays as AI. The native essays passed almost clean, and at least one detector flagged 97% of the non-native essays. The tools key on smaller vocabulary and simpler collocations, which is most of what writing English as a second language looks like.

callus scores something else. It reports two numbers apart: how far a draft sits from your own raw voice (a corpus pulled from how you actually write in real sessions), and the density of specific AI tells. Then it rewrites toward your voice, keeps your claims, numbers and links, and restructures paragraphs where that helps.

It calibrates per author. My corpus is mine, yours is yours. You point it at your own voice profile, it runs in English and Spanish, and the non-native bias from the Stanford work is corrected for inside the prompt.

pip install callus

Repo and setup: github.com/VDP89/callus.

callus is the third piece of a small open-source cluster: lucy-syndrome (the research on why agents forget your corrections), fscars (deterministic hooks that make corrections stick), and callus (voice). All operator-in-the-loop.

The detector industry sells one number. For a large share of writers that number measures the wrong thing.


For the record: callus score rates this post 26/100 (low_ai) against my own corpus — that is the tool described above, run on the post itself.


Share this post on:

Previous Post
The same scar, two agents
Next Post
A month of functional scars: 934 fires, one broken validation loop, and what it cost