The Escape Hatch
Hallucinations don’t disappear because the model got smarter. They decrease because you stop incentivizing it to lie.
One of the earliest sources of hallucinations we identified wasn’t a flaw in the model. It was a flaw in how we were prompting it. We were asking questions the AI had no way to honestly answer, then expecting it not to make something up.
Think of it like a submarine. Once you’ve submerged, the escape hatch is sealed. Now you ask the AI, “Can you bake me a cake?” It has no way to open that hatch, and you’ve put it in an impossible position. It can’t bake you a real cake, so it does what any cornered intelligence does: it raids whatever supplies it has lying around the sub and hands you something that vaguely resembles a cake. Sometimes it’s close. Sometimes it really isn’t. Sound familiar?
The fix is surprisingly simple: give it an out. When you explicitly allow the model to say “I don’t know”, whether through your prompt, your system instructions, or your tool design, something shifts. It stops guessing to satisfy you and starts being “honest” with you. The hallucinations don’t disappear because the model got smarter. They decrease because you stopped accidentally incentivizing it to lie.
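To make that concrete, here is a minimal sketch of what “giving it an out” can look like at all three layers. The structure loosely follows the common chat-with-tools request shape; the specific names (“report_answer”, “insufficient_information”) and the wording of the prompts are illustrative assumptions, not a prescribed API.

```python
# A minimal sketch of giving the model an out at three layers:
# the system instructions, the prompt itself, and the tool design.
# Nothing here calls a real API; it just assembles the request payload.
import json

# 1. System instructions: explicitly permit "I don't know".
system_prompt = (
    "You are a research assistant. If the answer is not in the provided "
    "context, or you are not confident, say 'I don't know' rather than "
    "guessing. An honest 'I don't know' beats a plausible-sounding guess."
)

# 2. The prompt: restate the escape hatch right next to the question.
user_prompt = (
    "Using only the attached release notes, what changed in version 2.3? "
    "If the notes don't say, tell me you don't know."
)

# 3. Tool design: make 'insufficient_information' a first-class outcome,
#    so the schema never forces the model to fabricate an answer.
report_answer_tool = {
    "name": "report_answer",
    "description": "Report the answer, or report that it cannot be determined.",
    "parameters": {
        "type": "object",
        "properties": {
            "status": {
                "type": "string",
                "enum": ["answered", "insufficient_information"],
            },
            "answer": {
                "type": "string",
                "description": "The answer, present only when status is 'answered'.",
            },
        },
        "required": ["status"],
    },
}

request_payload = {
    "messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    "tools": [{"type": "function", "function": report_answer_tool}],
}

print(json.dumps(request_payload, indent=2))
```

The detail that matters is the enum: once “insufficient_information” is a valid response, abstaining is no longer a failure mode the model has to route around.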
It’s jarring to sit and listen to an obvious lie, and we feel that same frustration when the AI confidently tells us something false, only to quickly follow up with “you’re absolutely right!”. We just seem to forget that in this case, we’re the ones who put it in the sub, sealed the hatch, and asked it to bake us a cake.