Notes on AI security, agentic systems, and non-human identity. Personal site, personal opinions.

Which agent bricked prod?
Every hop an agent makes has to answer two questions: whose authority is being exercised, and which agent exercised it? Lose either and your role-based access model, your audit trail, and your incident response are fiction. Part one of a series, from the web tier down to Active Directory.

The registry is the control plane
When "npx skills add" ships an agent capability the same way "npm install" ships a library, a large part of governance stops looking novel and starts looking like supply chain. The muscle to handle this already exists, and the OS has only just started cooperating. The missing pieces are a real capability manifest and stateful execution policy, and someone needs to ship them.

Local inference, on purpose
Notes on actually running models locally on an M4 Max: omlx as the server, Qwen3.6-35B-A3B and GLM-4.7-Flash for agent work, gpt-oss-120b for writing. Why pointing Claude Code at localhost is fine for offline ergonomics and wrong as a perimeter.

Auto mode is a sensor too
This is the third time this year I have written about the same epistemic mistake. Auto-accept loops in agentic IDEs trust a classifier to act as a control. Real bypasses already exist. The sufficiently motivated attacker problem is now sitting on your laptop, deciding which files to edit while you make coffee.

Identity is the control plane. Detection is a sensor.
Detection asks ‘is this input adversarial?’ Identity asks ‘what is this principal allowed to do, on whose behalf, right now?’ The first is probabilistic and bypassable. The second is enforceable and auditable.

Salting your own well: defensive prompt injection as a tripwire
Defenders can deliberately plant content in their environments that triggers the refusal vectors of attacker-controlled agents. It works against the lazy median adversary. It does not work against a determined one running an abliterated model. Either way, it is a sensor, not a control.