Run an agent locally. Download, switch, repeat.
locca sits on top of your llama.cpp setup: one-command model switching, defaults that work on iGPU / shared-VRAM machines, and OpenAI-compatible endpoints printed for the taking. It ships with the pi coding agent wired up to whatever GGUF you just downloaded.
Why locca
Defaults that respect your hardware.
Most local-LLM tools either hide llama.cpp behind a wrapper or expose every flag at once. locca ships defaults that actually work on shared-VRAM iGPUs.
Tuned out of the box
Vulkan, --flash-attn on, q8_0 KV cache, --parallel 1, --batch-size 1024, --jinja. A 7–9B model at 128k context fits on a 16 GB shared-VRAM iGPU.
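Those defaults map roughly onto a llama-server invocation like the one below. This is an illustrative sketch, not locca's exact command line: the model path and context length are placeholders, and flag spellings can vary between llama.cpp versions (check `llama-server --help`).

```shell
# Roughly the llama-server flags locca's defaults correspond to.
# Model path and context size are placeholders.
llama-server \
  -m ~/models/your-model.gguf \
  -c 131072 \
  --flash-attn \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --parallel 1 \
  --batch-size 1024 \
  --jinja
```

Vulkan isn't a runtime flag here; it comes from the llama.cpp build (or the Vulkan binary release) that locca detects.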
Catalog-aware picker
The first-run wizard and locca switch show every curated model with a fit hint ("5.6 GB dl, 14.3 GB RAM, 256k ctx") based on your detected hardware. No more 30 GB downloads that won't run.
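A fit hint boils down to comparing a model's estimated runtime footprint against the RAM the hardware probe found. The sketch below is hypothetical, not locca's actual logic; the function name, signature, and threshold rule are all assumptions for illustration.

```python
# Hypothetical sketch of a catalog "fit hint" -- not locca's implementation.
def fit_hint(download_gb: float, est_ram_gb: float, max_ctx_k: int,
             available_ram_gb: float) -> str:
    """Summarise whether a catalog model fits the detected hardware."""
    hint = f"{download_gb:.1f} GB dl, {est_ram_gb:.1f} GB RAM, {max_ctx_k}k ctx"
    fits = est_ram_gb <= available_ram_gb
    return f"{hint} ({'fits' if fits else 'too big'})"

# The 16 GB figure matches the shared-VRAM iGPU example above.
print(fit_hint(5.6, 14.3, 256, available_ram_gb=16.0))
# → 5.6 GB dl, 14.3 GB RAM, 256k ctx (fits)
```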
The fast, light stack
llama.cpp is the leanest local inference runtime; pi is the smallest agent that still ships real coding tools. Both stay out of your VRAM so the model gets the budget.
Get going
Install in two steps.
locca drives llama.cpp: install that first, then locca on top. First run launches a wizard that finishes config for you and prints the exact install hint for your distro if llama.cpp isn't on $PATH.
Surface
A small set of commands.
Run locca with no args for the menu, or jump straight to what you need.
locca pi        Launch the pi coding agent against your local server.
locca serve     Start llama-server with a picked model, detached.
locca switch    Catalog-aware picker: installed models + curated catalog with fit hints.
locca bench     Run llama-bench with a friendlier summary.
locca doctor    Health check: hardware, server, log warnings, config sanity.
locca optimise  Have pi review the deployment and rank concrete tweaks.
locca api       Print OpenAI-compatible connection info + LAN URLs.
locca logs      Tail the server log (locca-spawned servers only).
locca download  Pull a GGUF from HuggingFace into your models dir.
locca search    Fuzzy-search HuggingFace for GGUF models.
locca delete    Remove a model directory you no longer need.
locca stop      Stop the running server.
locca config    View / edit settings: get, set, reset, list, path.
locca setup     Re-run the setup wizard.
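A typical session might chain a few of these. The sketch below uses only command forms that appear in this page (argument syntax for the other commands may differ); outputs are omitted.

```shell
# Example workflow; model names are illustrative.
locca switch gpt-oss-20b   # pick a model, with fit hints
locca serve                # start llama-server, detached
locca api                  # print OpenAI-compatible endpoint + LAN URLs
locca pi qwen              # drop into the pi coding agent
locca stop                 # shut the server down
```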
Bonus
One command into the pi coding agent.
locca pi qwen fuzzy-matches the first *qwen*.gguf in your models dir, brings up the server if it isn't already running, and registers itself as a custom OpenAI-compatible provider in ~/.pi/agent/models.json. Switch model, switch brain: locca switch gpt-oss-20b.
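Because the endpoint is OpenAI-compatible, anything speaking the chat-completions wire format can use it, not just pi. A minimal sketch, assuming llama-server's default address of http://localhost:8080/v1 (run locca api to get the real URL); the helper function is illustrative, not part of locca.

```python
import json
import urllib.request

# Assumed base URL: llama-server's default. `locca api` prints the actual one.
BASE_URL = "http://localhost:8080/v1"

def build_chat_request(prompt: str, model: str = "local"):
    """Build the URL and JSON payload for an OpenAI-style /chat/completions call."""
    payload = {
        # A single-model llama-server typically accepts any model name here.
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return f"{BASE_URL}/chat/completions", payload

url, payload = build_chat_request("Write a haiku about VRAM.")
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# resp = urllib.request.urlopen(req)  # uncomment with the server running
```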