We understand that there are use cases that require that no data be sent to third-party services, including LLM providers like OpenAI. For this reason, we’ve set up Skald so that it can be deployed without connecting to any third-party provider at all, giving you full control over your data. As we’re new to the self-hosted LLM space, we’ve done the groundwork to make this happen, but we know there’s more to do to make it work really well. If you’re knowledgeable in this area, don’t hesitate to reach out or even submit a PR on our repo. We’ve tested and ensured that the setup outlined here works, but it’s certainly limited.

Pre-requisites

  • Have a running LLM inference server with an OpenAI-compatible API, e.g. Ollama or llama.cpp (we’ve explicitly tested with llama.cpp)
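For reference, here’s a minimal sketch of starting such a server with llama.cpp. The binary and flag names reflect recent llama.cpp builds and the model path is a placeholder, so check your version’s --help:

# Serve a local GGUF model over an OpenAI-compatible API on port 8080
# --jinja enables the chat-template handling that tool calling relies on
llama-server -m ./models/your-model.gguf --host 127.0.0.1 --port 8080 --jinja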

Setup

We’ve built Skald to work on top of as much open-source infrastructure as possible, so we use Postgres with pgvector as both our main DB and our vector DB. That means the only parts of the core stack that need to be adapted for a fully “local” setup are the LLM, the embeddings setup, and the document extraction pipeline. On our Cloud deployment, we support OpenAI, Anthropic, and Groq as LLM providers, use Voyage AI for vector embeddings, and use Datalab for document extraction. As a result, we need substitutes for all three of those when self-hosting in a fully local setup.

Embeddings

Our docker-compose.yml comes configured with an optional embedding-service component attached to the local-embedding profile. This service uses the Python package sentence-transformers to expose /embed and /rerank endpoints that our other services can use to generate vector embeddings and rerank results without sending data elsewhere. To use it, set the env var EMBEDDING_PROVIDER=local and run the stack with docker-compose --profile local-embedding up. It should then just work out of the box.
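For example:

# Tell Skald to use the bundled sentence-transformers service for embeddings and reranking
EMBEDDING_PROVIDER=local

# Start the stack with the optional embedding-service container included
docker-compose --profile local-embedding up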

LLM

For the LLM configuration, you’ll need to run an LLM inference server yourself. This server should expose an OpenAI-compatible API and support tool calling. We’ve tested this with llama.cpp, but equivalent servers should also work. Once you have it running, set LLM_PROVIDER=local and point LOCAL_LLM_BASE_URL at your server’s endpoint. As of today, the only configuration we support is the URL, so for security reasons this is best suited to a server running on the same machine. We intend to add support for security-related variables that would make it viable to run the LLM server on another machine, and we’d be keen to hear from people deploying open-source models about how they secure these servers.
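As a sketch, assuming a llama.cpp server listening on port 8080 on the same machine (the URL is a placeholder, and whether your server expects the /v1 path segment depends on how it exposes its OpenAI-compatible API):

LLM_PROVIDER=local
# Placeholder: point this at your inference server's OpenAI-compatible base URL
LOCAL_LLM_BASE_URL=http://localhost:8080/v1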

Document extraction (optional)

Skald does not require a document extraction setup to work, and you can skip this section if you’ll be dealing exclusively with plain text. If you do want to use the document extraction features, you need to set the environment variables for connecting to S3 or an S3-compatible object storage service, which is where documents will be stored:
AWS_REGION=<your_s3_region>
AWS_ACCESS_KEY_ID=<your_aws_access_key_id>
AWS_SECRET_ACCESS_KEY=<your_aws_secret_access_key>
S3_BUCKET_NAME=<your_s3_bucket>
Then, for the document extraction itself, run the stack with the local profile. This will spin up a local Docling server and configure the appropriate environment variables for it.
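For example (Docker Compose accepts multiple --profile flags, so you can combine this with the local embedding service from above):

# Spin up the local Docling server alongside the rest of the stack
docker-compose --profile local up

# Or, if you are also using the local embedding service, enable both profiles
docker-compose --profile local --profile local-embedding up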

Postgres

By default we spin up a Postgres instance as part of the Docker Compose stack for you and install pgvector on it. If you’re running a production deploy, you would ideally host and manage Postgres yourself. In that case, set the DATABASE_URL env var to point to your instance and run the stack without starting the Postgres service.
The one thing to remember if you do host Postgres elsewhere is to install the pgvector extension on the instance.
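As a minimal sketch, with placeholder host, credentials, and database name:

# Placeholder connection string; adjust for your managed Postgres instance
DATABASE_URL=postgresql://skald:<your_password>@your-postgres-host:5432/skald

# pgvector's extension is named "vector"; enable it on that database
psql "$DATABASE_URL" -c 'CREATE EXTENSION IF NOT EXISTS vector;'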

RabbitMQ

The same concepts that apply to Postgres apply to RabbitMQ. Ideally you’d host this yourself in a production deployment; to do so, spin up the stack without the RabbitMQ service and set the following vars:
RABBITMQ_HOST
RABBITMQ_PORT
RABBITMQ_USER
RABBITMQ_PASSWORD
RABBITMQ_VHOST
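For illustration, with placeholder credentials and RabbitMQ’s default port and vhost:

# 5672 is RabbitMQ's default AMQP port and / its default vhost; other values are placeholders
RABBITMQ_HOST=your-rabbitmq-host
RABBITMQ_PORT=5672
RABBITMQ_USER=<your_rabbitmq_user>
RABBITMQ_PASSWORD=<your_rabbitmq_password>
RABBITMQ_VHOST=/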