Abstract laptop silhouette as a closed perimeter, with inference contained inside, on deep ink black

Local inference, on purpose

Notes on actually running models locally on an M4 Max: omlx as the server, Qwen3.6-35B-A3B and GLM-4.7-Flash for agent work, gpt-oss-120b for writing. Why pointing Claude Code at localhost is fine for offline ergonomics and wrong as a perimeter.

2026-05-25 · 5 min · Markus Hupfauer