Run an agent locally. Download, switch, repeat.
locca sits on top of your llama.cpp setup: one-command model switching, defaults that work on iGPU / shared-VRAM machines, and OpenAI-compatible endpoints printed for the taking. It ships with the pi coding agent wired up to whatever GGUF you just downloaded.
Why locca
Defaults that respect your hardware.
Most local-LLM tools either hide llama.cpp behind a wrapper or expose every flag at once. locca ships defaults that actually work on shared-VRAM iGPUs.
Tuned out of the box
Vulkan, --flash-attn on, q8_0 KV cache, --parallel 1, --batch-size 1024, --jinja. A 7–9B model at 128k context fits on a 16 GB shared-VRAM iGPU.
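Those defaults map roughly onto a llama-server invocation like the one below. This is an illustrative sketch, not locca's exact command line: the model path and context length are placeholders, and flag spellings can vary between llama.cpp versions (check `llama-server --help`).

```shell
# Roughly the llama-server flags locca's defaults correspond to.
# Model path and context size are placeholders.
llama-server \
  -m ~/models/your-model.gguf \
  -c 131072 \
  --flash-attn \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --parallel 1 \
  --batch-size 1024 \
  --jinja
```

Vulkan isn't a runtime flag here; it comes from the llama.cpp build (or the Vulkan binary release) that locca detects.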
Catalog-aware picker
The first-run wizard and locca switch show every curated model with a fit hint ("5.6 GB dl, 14.3 GB RAM, 256k ctx") based on your detected hardware. No more 30 GB downloads that won't run.
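A fit hint boils down to comparing a model's estimated runtime footprint against the RAM the hardware probe found. The sketch below is hypothetical, not locca's actual logic; the function name, signature, and threshold rule are all assumptions for illustration.

```python
# Hypothetical sketch of a catalog "fit hint" -- not locca's implementation.
def fit_hint(download_gb: float, est_ram_gb: float, max_ctx_k: int,
             available_ram_gb: float) -> str:
    """Summarise whether a catalog model fits the detected hardware."""
    hint = f"{download_gb:.1f} GB dl, {est_ram_gb:.1f} GB RAM, {max_ctx_k}k ctx"
    fits = est_ram_gb <= available_ram_gb
    return f"{hint} ({'fits' if fits else 'too big'})"

# The 16 GB figure matches the shared-VRAM iGPU example above.
print(fit_hint(5.6, 14.3, 256, available_ram_gb=16.0))
# → 5.6 GB dl, 14.3 GB RAM, 256k ctx (fits)
```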
The fast, light stack
llama.cpp is the leanest local inference runtime; pi is the smallest agent that still ships real coding tools. Both stay out of your VRAM so the model gets the budget.
Get going
Install in two steps.
locca drives llama.cpp: install that first, then locca on top. First run launches a wizard that finishes config for you and prints the exact install hint for your distro if llama.cpp isn't on $PATH.
Surface
A small set of commands.
Run locca with no args for the menu, or jump straight to what you need.
locca pi        Launch the pi coding agent against your local server.
locca serve     Start llama-server with a picked model, detached.
locca switch    Catalog-aware picker: installed models + curated catalog with fit hints.
locca bench     Run llama-bench with a friendlier summary.
locca doctor    Health check: hardware, server, log warnings, config sanity.
locca optimise  Have pi review the deployment and rank concrete tweaks.
locca api       Print OpenAI-compatible connection info + LAN URLs.
locca logs      Tail the server log (locca-spawned servers only).
locca download  Pull a GGUF from HuggingFace into your models dir.
locca search    Fuzzy-search HuggingFace for GGUF models.
locca delete    Remove a model directory you no longer need.
locca stop      Stop the running server.
locca config    View / edit settings: get, set, reset, list, path.
locca setup     Re-run the setup wizard.
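A typical session might chain a few of these. The sketch below uses only command forms that appear in this page (argument syntax for the other commands may differ); outputs are omitted.

```shell
# Example workflow; model names are illustrative.
locca switch gpt-oss-20b   # pick a model, with fit hints
locca serve                # start llama-server, detached
locca api                  # print OpenAI-compatible endpoint + LAN URLs
locca pi qwen              # drop into the pi coding agent
locca stop                 # shut the server down
```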
Bonus
One command into the pi coding agent.
locca pi qwen fuzzy-matches the first *qwen*.gguf in your models dir, brings up the server if it isn't already running, and registers itself as a custom OpenAI-compatible provider in ~/.pi/agent/models.json. Switch model, switch brain: locca switch gpt-oss-20b.
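Because the endpoint is OpenAI-compatible, anything speaking the chat-completions wire format can use it, not just pi. A minimal sketch, assuming llama-server's default address of http://localhost:8080/v1 (run locca api to get the real URL); the helper function is illustrative, not part of locca.

```python
import json
import urllib.request

# Assumed base URL: llama-server's default. `locca api` prints the actual one.
BASE_URL = "http://localhost:8080/v1"

def build_chat_request(prompt: str, model: str = "local"):
    """Build the URL and JSON payload for an OpenAI-style /chat/completions call."""
    payload = {
        # A single-model llama-server typically accepts any model name here.
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return f"{BASE_URL}/chat/completions", payload

url, payload = build_chat_request("Write a haiku about VRAM.")
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# resp = urllib.request.urlopen(req)  # uncomment with the server running
```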