Pre-requisites
- Have a running LLM inference server with an OpenAI-compatible API (e.g. Ollama, llama.cpp)
Setup
We’ve built Skald to work on top of as much open source infrastructure as possible, meaning we use Postgres with pgvector instead of a service like Pinecone, for example.
That means the only parts of the stack that need to be adapted for a fully “local” setup are the LLM and the embeddings.
On our Cloud deployment, we use OpenAI as the primary LLM provider, and Voyage AI for vector embeddings.
As a result, we need substitutes for both of those when self-hosting.
Embeddings
We’ll start with embeddings because it’s the simpler of the two. Our docker-compose.yml comes configured with an optional embedding-service component attached to the local-embedding profile. This service uses the Python package sentence-transformers to expose an /embed and a /rerank endpoint that our other services can use to generate vector embeddings and rerank results without sending data elsewhere.
To use it, set the env var EMBEDDING_PROVIDER=local and run the stack with docker-compose --profile local-embedding up. It should then work out of the box.
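Concretely, that looks like the sketch below. The optional curl at the end is only a sanity check; its port and payload are assumptions rather than the service’s documented contract, so check the embedding-service configuration for the real values.

```sh
# Route embedding and reranking calls to the bundled local service
export EMBEDDING_PROVIDER=local

# Start the stack with the optional embedding-service profile enabled
docker-compose --profile local-embedding up

# Optional sanity check against the /embed endpoint.
# Port and request shape here are assumptions; adjust to your setup.
curl -s http://localhost:8000/embed \
  -H 'Content-Type: application/json' \
  -d '{"texts": ["hello world"]}'
```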
LLM
For the LLM configuration, you’ll need to run an LLM inference server yourself. It should expose an OpenAI-compatible API and support tool-calling. We’ve tested this using llama.cpp, but equivalent servers should also work.
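As a rough sketch, llama.cpp ships a llama-server binary that exposes an OpenAI-compatible API. The model path below is a placeholder, and the exact flags (particularly around chat templates and tool-calling) vary between llama.cpp builds, so treat this as a starting point rather than a recipe.

```sh
# llama.cpp's bundled server exposes an OpenAI-compatible API under /v1.
# The model path is a placeholder; pick a GGUF model that supports tool-calling.
llama-server \
  -m ./models/your-model.gguf \
  --host 127.0.0.1 \
  --port 8080 \
  --jinja   # enable the model's chat template (needed for tool calls on recent builds)
```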
Once your server is running, set LLM_PROVIDER=local and point LOCAL_LLM_BASE_URL at its endpoint.
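For example, assuming the llama.cpp server above is listening on localhost:8080 (whether the base URL needs a trailing /v1 depends on how your server and Skald compose paths, so verify this against your setup):

```sh
# Point Skald at the local inference server
export LLM_PROVIDER=local
export LOCAL_LLM_BASE_URL=http://127.0.0.1:8080/v1   # trailing /v1 is an assumption
```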
As of today, the only configuration we support is the URL, so for security reasons this is best suited to a server running on the same machine. We intend to add support for security-related variables that would make it viable to run the LLM server on another machine, and we’d be keen to hear from people deploying open-source models how they’re securing these servers.
Postgres
By default we spin up a Postgres instance as part of the Docker Compose stack for you and install pgvector on it. If you’re running a production deploy, you would ideally host and manage Postgres yourself. If you do, just set the DATABASE_URL env var to point to your instance and run the stack without starting the Postgres service.
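A DATABASE_URL for an externally hosted instance is a standard Postgres connection string; the credentials, host, and database name below are placeholders.

```sh
# Standard Postgres connection string; all values here are placeholders
export DATABASE_URL=postgresql://skald:your-password@your-db-host:5432/skald
```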
If you do host Postgres elsewhere, the one thing you need to remember is to install the pgvector extension on the instance.
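Provided the pgvector binaries are available on the server (most managed providers ship them), enabling the extension is a one-liner:

```sh
# The pgvector extension is named "vector" inside Postgres
psql "$DATABASE_URL" -c "CREATE EXTENSION IF NOT EXISTS vector;"
```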