Pre-requisites
- Have a running LLM inference server with an OpenAI-compatible API, e.g. Ollama or llama.cpp (we’ve explicitly tested with llama.cpp)
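As a rough illustration, a llama.cpp server can be started along these lines (the model path and port below are placeholders for your own setup, and flag names may differ between llama.cpp versions, so check your build’s --help):

```
# Sketch: serve a local GGUF model through llama.cpp's OpenAI-compatible HTTP server.
# Adjust the model path and port for your machine.
llama-server -m ./models/your-model.gguf --port 8080
```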
Setup
We’ve built Skald to work on top of as much open-source infrastructure as possible, so we use Postgres with pgvector for both our main DB and our vector DB.
That means the only parts of the core stack that need to be adapted to work with a fully “local” setup are the LLM, the embeddings setup, and the document extraction pipeline.
On our Cloud deployment, we support OpenAI, Anthropic, and Groq as the LLM providers, use Voyage AI for vector embeddings, and Datalab for document extraction.
As a result, we need substitutes for all three of those when self-hosting in a fully local setup.
Embeddings
Our docker-compose.yml comes configured with an optional embedding-service component that’s attached to the local-embedding profile. This service uses the Python package sentence-transformers to expose an /embed and a /rerank endpoint that our other services can use to generate vector embeddings and rerank results without sending data elsewhere.
To use it, set the env var EMBEDDING_PROVIDER=local and run the stack with docker-compose --profile local-embedding up. It should then work out of the box.
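For example, a minimal sketch, assuming your docker-compose.yml reads variables from a .env file alongside it:

```
# Point the stack at the bundled embedding service, then start it with the
# local-embedding profile so the optional service is included.
echo "EMBEDDING_PROVIDER=local" >> .env
docker-compose --profile local-embedding up -d
```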
LLM
For the LLM configuration, you’ll need to have a running LLM inference server yourself. This server should have an OpenAI-compatible API and support tool-calling. We’ve tested this using llama.cpp, but equivalent providers should also work.
Once you have this running, you can then set LLM_PROVIDER=local and LOCAL_LLM_BASE_URL to your server’s endpoint.
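A minimal sketch, assuming the inference server from the pre-requisites is listening locally and that the stack reads these variables from .env (whether the URL should include a path such as /v1 depends on your server):

```
# Route LLM calls to your own OpenAI-compatible server instead of a hosted provider.
echo "LLM_PROVIDER=local" >> .env
echo "LOCAL_LLM_BASE_URL=http://localhost:8080" >> .env
```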
As of today, the only configuration we support is the URL, so for security reasons this is best suited to a server running on the same machine. We intend to expand this to include support for security-related variables that would make it viable to run the LLM server on another machine, and we’d be keen to hear from people deploying open-source models how you’ve been securing these servers.
Document extraction (optional)
Skald does not require a document extraction setup to work, and you can skip this if you’ll be dealing exclusively with plaintext. If you do want to use document extraction features, you need to set the appropriate environment variables for connecting to S3 or an S3-compatible object storage service, which is where documents will be stored, and then run the stack with the relevant local profile. This will spin up a local Docling server and configure the appropriate environment variables for it.
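If you don’t already have an S3 bucket, an S3-compatible store such as MinIO can act as a stand-in. A rough sketch (the credentials and port are placeholders, and the exact Skald env vars for pointing at the bucket are whatever your .env template defines):

```
# Run a throwaway MinIO instance as an S3-compatible object store for documents.
# Replace the credentials before using this anywhere beyond local testing.
docker run -d --name skald-minio \
  -p 9000:9000 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  minio/minio server /data
```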
Postgres
By default we will spin up a Postgres instance as part of the Docker Compose stack for you, and we will install pgvector on it. If you’re running a production deploy, you would ideally host and manage Postgres yourself. If you do so, you just need to set the DATABASE_URL env var to point to your instance, and run the stack without starting the Postgres service.
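A sketch of what that could look like, assuming a standard Postgres connection URL (host, credentials, and database name are placeholders):

```
# Point Skald at an externally managed Postgres instead of the bundled container.
echo "DATABASE_URL=postgresql://skald:change-me@db.example.com:5432/skald" >> .env
```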
If you do host Postgres elsewhere, the one thing you need to remember is to install the pgvector extension on the instance.
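For example, once the pgvector packages are installed on the server, the extension can be enabled in the target database like so:

```
# Enable pgvector in the database Skald will use; requires the extension's
# files to already be installed on the Postgres server.
psql "$DATABASE_URL" -c "CREATE EXTENSION IF NOT EXISTS vector;"
```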