Petter Reinholdtsen

Talking to the Computer, and Getting Some Nonsense Back...
14th April 2026

At last, I can run my own large language model artificial idiocy generator at home on a Debian testing host using Debian packages directly from the Debian archive. After months of polishing the llama.cpp, whisper.cpp and ggml packages, and their dependencies, I was very happy to see today that they all entered Debian testing this morning. Several release-critical issues in dependencies have been blocking the migration for the last few weeks, and now finally the last one of these has been fixed. I would like to extend a big thanks to everyone involved in making this happen.

I've been running home-built editions of the whisper.cpp and llama.cpp packages for a while now, first building from the upstream Git repositories and later, as the Debian packaging progressed, from the relevant Salsa Git repositories for the ROCm packages, GGML, whisper.cpp and llama.cpp. The only snag with the official Debian packages is that the JavaScript chat client web pages are slightly broken in my setup, where I use a reverse proxy to make my home server visible on the public Internet, while the included web pages only want to communicate with localhost / 127.0.0.1. I suspect it might be simple to fix by making the JavaScript code dynamically look up the URL of the current page and use that to determine where to find the API service, but until someone fixes BTS report #1128381, I just have to edit /usr/share/llama.cpp-tools/llama-server/themes/simplechat/simplechat.js every time I upgrade the package. I start my server like this on my machine with a nice AMD GPU (donated to me as a Debian developer by AMD two years ago, thank you very much):

  LC_ALL=C llama-server \
    -ngl 256  \
    -c $(( 42 * 1024)) \
    --temp 0.7 \
    --repeat_penalty 1.1 \
    -n -1 \
    -m Qwen3-Coder-30B-A3B-Instruct-Q5_K_S.gguf

It only takes a few minutes to load the model for the first time and prepare a nice API server for me at https://my.reverse.proxy.example.com:8080/v1/, available for all the API clients I care to test (note, this sets the server up without authentication; use a reverse proxy with authentication if you need it). I switch models regularly to test new ones; the Qwen3-Coder one just happens to be the one I use at the moment. Perhaps these packages are something for you to have fun with too?
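Since llama-server speaks an OpenAI-compatible API, talking to it from a script needs nothing beyond the standard library. Here is a minimal sketch in Python; the API_BASE value and the use of the /chat/completions endpoint reflect my setup, so treat both as assumptions and adjust to your own server address:

```python
import json
import urllib.request

# Assumption: replace with the URL where your llama-server (or the
# reverse proxy in front of it) is reachable.
API_BASE = "http://localhost:8080/v1"

def build_chat_request(prompt, temperature=0.7):
    """Build an OpenAI-style chat completion payload for one user message."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def ask(prompt):
    """POST the prompt to the chat completions endpoint and return
    the text of the first reply choice."""
    req = urllib.request.Request(
        API_BASE + "/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as response:
        reply = json.loads(response.read())
    return reply["choices"][0]["message"]["content"]
```

With the server running, something like print(ask("Why is Debian called the universal operating system?")) should get you an answer, or at least some entertaining nonsense back.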

As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

Tags: debian, english.

Created by Chronicle v4.6