
Local inference, on purpose
Notes on actually running models locally on an M4 Max: omlx as the server, Qwen3.6-35B-A3B and GLM-4.7-Flash for agent work, gpt-oss-120b for writing. Why pointing Claude Code at localhost is fine for offline ergonomics and wrong as a perimeter.