Data well with tripwire glyphs, one probe triggers a ripple

Salting your own well: defensive prompt injection as a tripwire

Defenders can deliberately plant content in their environments that triggers the refusal vectors of attacker-controlled agents. Against the median lazy adversary it works. Against a determined one with an abliterated model it doesn’t. Either way, it is a sensor — not a control.

2026-05-11 · 7 min · Markus Hupfauer