Markus Hupfauer

Notes on AI security, agentic systems, and non-human identity. Personal site, personal opinions.

A single thread passing through a series of gates, fraying to many loose ends at the last one

Which agent bricked prod?

Every hop an agent makes has to answer two questions: whose authority is being exercised, and which agent exercised it? Lose either and your role model, your audit trail, and your incident response are fiction. Part one of a series, from the web tier down to Active Directory.

Stacked horizontal shelves in off-white with one rust-colored shelf offset out of line

The registry is the control plane

When npx skills add ships an agent capability the same way npm install ships a library, a large part of governance stops looking novel and starts looking like supply chain. The muscle to handle this exists. The OS just started cooperating. The missing pieces are a real capability manifest and stateful execution policy — and someone needs to ship them.

Abstract laptop silhouette as a closed perimeter, with inference contained inside, on deep ink black

Local inference, on purpose

Notes on actually running models locally on an M4 Max: omlx as the server, Qwen3.6-35B-A3B and GLM-4.7-Flash for agent work, gpt-oss-120b for writing. Why pointing Claude Code at localhost is fine for offline ergonomics and wrong as a perimeter.

Automated gate opens for expected form while off-shape form slips past

Auto mode is a sensor too

The third time this year I have written down the same epistemic mistake. Auto-accept loops in agentic IDEs trust a classifier to be a control. Real bypasses already exist. The sufficiently motivated attacker problem is now sitting on your laptop, deciding which files to edit while you make coffee.

Identity perimeter deflecting one vector, permitting a credentialed one

Identity is the control plane. Detection is a sensor.

Detection asks ‘is this input adversarial?’ Identity asks ‘what is this principal allowed to do, on whose behalf, right now?’ The first is probabilistic and bypassable. The second is enforceable and auditable.

Data well with tripwire glyphs, one probe triggers a ripple

Salting your own well: defensive prompt injection as a tripwire

Defenders can deliberately plant content in their environments that triggers the refusal vectors of attacker-controlled agents. Against the median lazy adversary it works. Against a determined one with an abliterated model it doesn’t. Either way, it is a sensor — not a control.