Local-first

Run agents locally.
Download, switch, repeat.

locca sits on top of your llama.cpp setup: one-command model switching, defaults that work on iGPU / shared-VRAM machines, and OpenAI-compatible endpoints printed for the taking. It ships with the pi coding agent wired up to whatever GGUF you just downloaded.

npm install -g @zeiq/locca

Why locca

Defaults that respect your hardware.

Most local-LLM tools either hide llama.cpp behind a wrapper or expose every flag at once. locca ships defaults that actually work on shared-VRAM iGPUs.

Tuned out of the box

Vulkan, --flash-attn on, q8_0 KV cache, --parallel 1, --batch-size 1024, --jinja. A 7–9B model at 128k context fits on a 16 GB shared-VRAM iGPU.
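Concretely, those defaults map onto llama-server flags along these lines. This is an illustrative sketch, not locca's exact invocation: the model path and `-ngl` value are placeholders, and Vulkan is a build-time backend of llama.cpp rather than a runtime flag.

```shell
# Illustrative sketch of a tuned launch; model path and layer count are placeholders.
llama-server \
  --model ~/models/your-model-q4_k_m.gguf \
  -ngl 99 \
  --ctx-size 131072 \
  --flash-attn on \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --parallel 1 \
  --batch-size 1024 \
  --jinja
```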

Catalog-aware picker

The first-run wizard and locca switch show every curated model with a fit hint like "fits · 5.6 GB dl, 14.3 GB RAM, 256k ctx", based on your detected hardware. No more 30 GB downloads that won't run.
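You can sanity-check a fit hint yourself with back-of-envelope arithmetic: weights plus KV cache plus some runtime overhead against your memory budget. This heuristic is an assumption for illustration, not locca's actual formula, and the KV-cache figure is model-dependent.

```shell
#!/bin/sh
# Back-of-envelope fit check (illustrative heuristic, NOT locca's actual formula):
# estimated RAM ≈ GGUF file size + KV cache + ~1 GB runtime overhead.
gguf_gb=5.6        # download size of the quantized weights
kv_gb=7.7          # q8_0 KV cache at the target context (model-dependent)
budget_gb=16       # shared-VRAM budget
awk -v w="$gguf_gb" -v kv="$kv_gb" -v b="$budget_gb" 'BEGIN {
  need = w + kv + 1.0
  printf "need ≈ %.1f GB of %d GB -> %s\n", need, b, (need <= b ? "fits" : "too big")
}'
```

With the sample numbers above this lands on the 14.3 GB figure from the hint.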

The fast, light stack

llama.cpp is the leanest local inference runtime; pi is the smallest agent that actually does coding tools. Both stay out of your VRAM so the model gets the budget.

Get going

Install in two steps.

locca drives llama.cpp — install that first, then locca on top. First run launches a wizard that finishes config for you and prints the exact install hint for your distro if llama.cpp isn’t on $PATH.

1 · install llama.cpp
# use your platform’s package manager (brew, pacman, apt, dnf, zypper, apk), or build from source.
# locca setup prints the exact line for your distro if it’s missing.
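Two concrete routes for step 1, as examples rather than a definitive list — package names vary by platform, and the wizard's own hint for your distro takes precedence:

```shell
# macOS
brew install llama.cpp

# or build from source with the Vulkan backend the defaults assume
git clone https://github.com/ggml-org/llama.cpp
cmake -S llama.cpp -B llama.cpp/build -DGGML_VULKAN=ON
cmake --build llama.cpp/build --config Release -j
```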
2 · install locca
$ npm install -g @zeiq/locca
$ locca # first run = wizard
 
# or run from source:
$ git clone …/locca
$ npm install && npm run build
$ npm link

Surface

A small set of commands.

Run locca with no args for the menu, or jump straight to what you need.

  • locca pi · Launch the pi coding agent against your local server.
  • locca serve · Start llama-server with a picked model, detached.
  • locca switch · Catalog-aware picker: installed models + curated catalog with fit hints.
  • locca bench · Run llama-bench with a friendlier summary.
  • locca doctor · Health check: hardware, server, log warnings, config sanity.
  • locca optimise · Have pi review the deployment and rank concrete tweaks.
  • locca api · Print OpenAI-compatible connection info + LAN URLs.
  • locca logs · Tail the server log (locca-spawned servers only).
  • locca download · Pull a GGUF from HuggingFace into your models dir.
  • locca search · Fuzzy-search HuggingFace for GGUF models.
  • locca delete · Remove a model directory you no longer need.
  • locca stop · Stop the running server.
  • locca config · View / edit settings: get, set, reset, list, path.
  • locca setup · Re-run the setup wizard.

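Strung together, a first session might look like this. A sketch, not a transcript: output is omitted, and the endpoint shown is llama-server's default port — substitute whatever locca api actually prints.

```shell
# A typical first session (commands from the list above; output omitted):
locca download          # pull a GGUF from HuggingFace
locca serve             # start llama-server, detached
locca api               # print the OpenAI-compatible base URL

# Any OpenAI-compatible client can then talk to the server, e.g.
# (host/port are whatever `locca api` printed; 8080 is llama-server's default):
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hello"}]}'
```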
Bonus

One command into the pi coding agent.

locca pi qwen fuzzy-matches the first *qwen*.gguf in your models dir, brings up the server if it isn’t already running, and registers itself as a custom OpenAI-compatible provider in ~/.pi/agent/models.json. Switch model, switch brain — locca switch gpt-oss-20b.
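Conceptually the lookup is just "first GGUF in your models dir whose name contains the query". A minimal sketch of that idea, under the assumption of plain substring matching — pick_model and MODELS_DIR are hypothetical names, not locca's actual matcher:

```shell
# Minimal sketch of the model lookup (illustrative, not locca's actual matcher):
# print the first *.gguf under $MODELS_DIR whose filename contains the query.
pick_model() {
  query="$1"
  for f in "$MODELS_DIR"/*"$query"*.gguf; do
    [ -e "$f" ] && { printf '%s\n' "$f"; return 0; }
  done
  echo "no model matching '$query'" >&2
  return 1
}
```

Usage: `MODELS_DIR=~/models pick_model qwen` prints the first matching path, or fails if nothing matches.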
