llama.vim + llama-server: an offline copilot that just works

I found a small tool that made me smile. It’s llama.vim, a Vim/Neovim plugin that gives inline code suggestions from a local model running behind llama-server. It just works, it’s fast, and it looks pretty in the editor. For me, it’s a simple, solid offline alternative to GitHub Copilot.

What I like

Setup I used
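
The exact config isn’t important; a minimal version looks something like this (the plugin path, model repo, and flags here are illustrative, and any FIM-capable GGUF model will do):

# install the plugin with Vim's native packages (any plugin manager works too)
git clone https://github.com/ggml-org/llama.vim ~/.vim/pack/plugins/start/llama.vim

# start llama-server on port 8012, the port the curl calls below use;
# -hf pulls a coder model from Hugging Face (swap in your own),
# -ngl 99 offloads all layers to the GPU if you have one
llama-server -hf ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF \
  --port 8012 -ngl 99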

How I verified requests

Open Vim and type. The inline red text is the suggestion. Press Tab to accept it.

Or you can probe the server directly (helpful for debugging or curiosity).
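
The simplest probe is the health endpoint, which reports whether the model is loaded and ready:

health check

curl -s http://127.0.0.1:8012/health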

chat style (OpenAI-compatible)

curl -s http://127.0.0.1:8012/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 32
  }'
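
Because the endpoint is OpenAI-compatible, the reply uses the familiar choices array; if you have jq around, pulling out just the text is a one-liner:

curl -s http://127.0.0.1:8012/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 32}' \
  | jq -r '.choices[0].message.content'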

llama.vim uses FIM for its inline suggestions, so you can exercise the same /infill endpoint directly:

infill (fill-in-the-middle)

curl -s http://127.0.0.1:8012/infill \
  -H 'Content-Type: application/json' \
  -d '{
    "input_prefix": "def add(a, b):",
    "input_suffix": "#",
    "prompt": "",
    "n_predict": 16
  }'
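
The infill reply isn’t OpenAI-shaped; as I understand it, it follows llama-server’s plain completion format, with the generated text in a content field, so the same jq trick applies:

# .content holding the generated text is an assumption based on the completion response shape
curl -s http://127.0.0.1:8012/infill \
  -H 'Content-Type: application/json' \
  -d '{"input_prefix": "def add(a, b):", "input_suffix": "#", "prompt": "", "n_predict": 16}' \
  | jq -r '.content'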

Where this fits