
Salting your own well: defensive prompt injection as a tripwire
Defenders can deliberately plant content in their environments that triggers the refusal vectors of attacker-controlled agents. Against the median lazy adversary it works. Against a determined one with an abliterated model it doesn’t. Either way, it is a sensor — not a control.