LiteLLM Kubernetes Deployment: Complete HashiCorp Vault Integration

LiteLLM requires multiple sensitive credentials: OpenAI and Anthropic API keys, a database password, and a master key and salt key for its own API. The obvious approach is to hardcode them in Kubernetes Secret manifests and move on. This post does it properly instead: HashiCorp Vault stores all credentials, External Secrets Operator syncs them into Kubernetes, and no LiteLLM credential lives in your manifests.

Prerequisites

This guide assumes a running Kubernetes cluster with an ingress controller, a Vault instance with the Kubernetes auth method available, and a PostgreSQL database. We’re focused on the LiteLLM deployment — not standing up these foundations.

Vault Configuration

Enable a KV v2 secrets engine scoped to LiteLLM, populate it with your credentials, then create a policy that grants read-only access to those paths. The policy is what you’ll attach to the External Secrets Operator token.

# Enable KV secrets engine
vault secrets enable -path=litellm kv-v2

# Store credentials
vault kv put litellm/api-keys \
  master-key="sk-your-secure-master-key" \
  salt-key="sk-your-secure-salt-key" \
  openai-key="sk-your-openai-api-key" \
  anthropic-key="sk-your-anthropic-api-key"

vault kv put litellm/database \
  url="postgresql://litellm:password@postgres:5432/litellm?sslmode=require"

# Create read-only policy for ESO
vault policy write eso-litellm-policy - <<EOF
path "litellm/data/api-keys" {
  capabilities = ["read"]
}

path "litellm/data/database" {
  capabilities = ["read"]
}
EOF

# Create token for ESO (1-year TTL requested; Vault caps it at the effective max TTL, 768h by default)
vault token create -policy=eso-litellm-policy -ttl=8760h -display-name="eso-litellm"
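The sk-your-secure-... values above are placeholders, so real keys have to come from somewhere. Here is a minimal sketch of one way to generate them, assuming a POSIX-like shell with /dev/urandom. The gen_key helper is hypothetical and is not part of Vault or LiteLLM.

```shell
# gen_key: hypothetical helper that prints a random key with the "sk-" prefix
# used by the placeholder values in this post.
gen_key() {
  # 32 random bytes rendered as 64 lowercase hex characters
  printf 'sk-%s\n' "$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')"
}

# Example usage (requires an authenticated vault CLI):
#   vault kv put litellm/api-keys \
#     master-key="$(gen_key)" \
#     salt-key="$(gen_key)" \
#     openai-key="sk-your-openai-api-key" \
#     anthropic-key="sk-your-anthropic-api-key"
```

Generate the salt key once and keep it stable. LiteLLM's documentation advises against changing it after models have been stored, since it is used to encrypt stored provider credentials.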

Authentication Note

This uses token-based auth for simplicity. Kubernetes authentication — where Vault validates the pod’s service account token directly — is more secure and removes the need to manage a long-lived token. That’ll be covered separately.

External Secrets Operator

ESO is the bridge between Vault and Kubernetes. It watches your ExternalSecret resources, fetches the referenced values from Vault, and creates or updates the corresponding Kubernetes Secret objects automatically — including on rotation.

helm repo add external-secrets https://charts.external-secrets.io
helm install external-secrets external-secrets/external-secrets \
  --namespace external-secrets-system --create-namespace
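The helm install above does not pin a chart version, and the API versions ESO serves for its CRDs have changed between releases. Before writing manifests, it can help to check which versions your cluster serves. A small sketch (eso_served_versions is a hypothetical helper name):

```shell
# Print the API versions currently served for the two ESO CRDs used in this post.
eso_served_versions() {
  for crd in externalsecrets.external-secrets.io secretstores.external-secrets.io; do
    printf '%s: ' "$crd"
    # Select only versions with served=true from the CRD spec
    kubectl get crd "$crd" \
      -o jsonpath='{.spec.versions[?(@.served==true)].name}'
    printf '\n'
  done
}

# Usage:
#   eso_served_versions
```

Use whichever version is listed as the apiVersion in the SecretStore and ExternalSecret manifests that follow.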

SecretStore

The SecretStore tells ESO where Vault lives and how to authenticate. Create the token secret first, then reference it in the store. The Secret manifest below carries a placeholder; keep the real token out of version control.

apiVersion: v1
kind: Secret
metadata:
  name: vault-token
  namespace: ai-services
type: Opaque
stringData:
  token: your-vault-token-here
---
apiVersion: external-secrets.io/v1
kind: SecretStore
metadata:
  name: vault-backend
  namespace: ai-services
spec:
  provider:
    vault:
      server: "https://vault.example.com:8200"
      path: "litellm"
      version: "v2"
      auth:
        tokenSecretRef:
          name: "vault-token"
          key: "token"
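If you would rather not write the Vault token into a manifest at all, the token Secret can be created imperatively. The following is a sketch under assumptions: create_vault_token_secret is a hypothetical helper, and VAULT_ESO_TOKEN is an assumed variable holding the token printed by vault token create.

```shell
# Create the ai-services namespace and the vault-token Secret without a manifest on disk.
create_vault_token_secret() {
  # Abort with a message if the token variable is unset or empty
  : "${VAULT_ESO_TOKEN:?set VAULT_ESO_TOKEN to the token from 'vault token create'}"

  # Render client-side and pipe to apply so re-running the helper is idempotent
  kubectl create namespace ai-services --dry-run=client -o yaml | kubectl apply -f -

  kubectl create secret generic vault-token \
    --namespace ai-services \
    --from-literal=token="$VAULT_ESO_TOKEN" \
    --dry-run=client -o yaml | kubectl apply -f -
}

# Usage:
#   export VAULT_ESO_TOKEN="..."   # paste the token here
#   create_vault_token_secret
```

Note that --from-literal places the token on the kubectl command line, where it is briefly visible in the local process list. If you take this route, apply only the SecretStore half of the manifest above.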

ExternalSecret

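Each entry in data maps a Vault key to a Kubernetes secret key. The refreshInterval controls how often ESO re-reads Vault for updated values. 15 seconds propagates rotations quickly at the cost of frequent Vault reads; raise it if load on Vault matters more than rotation latency.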

apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: litellm-secrets
  namespace: ai-services
spec:
  refreshInterval: 15s
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: litellm-secrets
    creationPolicy: Owner
  data:
  - secretKey: LITELLM_MASTER_KEY
    remoteRef:
      key: litellm/api-keys
      property: master-key
  - secretKey: LITELLM_SALT_KEY
    remoteRef:
      key: litellm/api-keys
      property: salt-key
  - secretKey: OPENAI_API_KEY
    remoteRef:
      key: litellm/api-keys
      property: openai-key
  - secretKey: ANTHROPIC_API_KEY
    remoteRef:
      key: litellm/api-keys
      property: anthropic-key
  - secretKey: DATABASE_URL
    remoteRef:
      key: litellm/database
      property: url
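After applying the ExternalSecret, it is worth confirming that the sync succeeded before deploying LiteLLM, since the Deployment fails to start if litellm-secrets is missing. A minimal sketch, assuming ESO reports a Ready condition on the ExternalSecret (wait_for_litellm_secrets is a hypothetical helper name):

```shell
# Block until ESO marks the ExternalSecret Ready, then list the synced key names.
wait_for_litellm_secrets() {
  kubectl wait externalsecret/litellm-secrets \
    --namespace ai-services \
    --for=condition=Ready \
    --timeout=60s || return 1

  # Print only the key names of the generated Secret, never the values
  kubectl get secret litellm-secrets --namespace ai-services \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'
}

# Usage:
#   wait_for_litellm_secrets
```

The five names printed should match the secretKey entries above. If the wait times out, kubectl describe externalsecret litellm-secrets -n ai-services usually shows whether the cause is the Vault path, the token, or the policy.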

LiteLLM Deployment

The deployment pulls every credential from the litellm-secrets object ESO manages. STORE_MODEL_IN_DB enables PostgreSQL persistence and RUN_MIGRATION handles schema setup on first boot.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm
  namespace: ai-services
spec:
  replicas: 2
  selector:
    matchLabels:
      app: litellm
  template:
    metadata:
      labels:
        app: litellm
    spec:
      containers:
      - name: litellm
        image: ghcr.io/berriai/litellm:main-stable
        ports:
        - containerPort: 4000
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: litellm-secrets
              key: DATABASE_URL
        - name: STORE_MODEL_IN_DB
          value: "True"
        - name: RUN_MIGRATION
          value: "True"
        - name: LITELLM_MASTER_KEY
          valueFrom:
            secretKeyRef:
              name: litellm-secrets
              key: LITELLM_MASTER_KEY
        - name: LITELLM_SALT_KEY
          valueFrom:
            secretKeyRef:
              name: litellm-secrets
              key: LITELLM_SALT_KEY
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: litellm-secrets
              key: OPENAI_API_KEY
        - name: ANTHROPIC_API_KEY
          valueFrom:
            secretKeyRef:
              name: litellm-secrets
              key: ANTHROPIC_API_KEY
        livenessProbe:
          httpGet:
            path: /health/liveliness
            port: 4000
          initialDelaySeconds: 40
          periodSeconds: 30
        readinessProbe:
          httpGet:
            path: /health/readiness
            port: 4000
          initialDelaySeconds: 10
          periodSeconds: 5
        resources:
          requests:
            memory: "512Mi"
            cpu: "200m"
          limits:
            memory: "1Gi"
            cpu: "1000m"

Service & Ingress

apiVersion: v1
kind: Service
metadata:
  name: litellm-service
  namespace: ai-services
spec:
  selector:
    app: litellm
  ports:
  - port: 4000
    targetPort: 4000
    name: http
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: litellm-ingress
  namespace: ai-services
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - litellm.example.com
    secretName: litellm-tls
  rules:
  - host: litellm.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: litellm-service
            port:
              number: 4000

Verification

# Confirm ESO synced the secrets from Vault
kubectl get externalsecret -n ai-services
kubectl describe externalsecret litellm-secrets -n ai-services

# Check pods are running
kubectl get pods -n ai-services
kubectl logs -n ai-services deployment/litellm

# Test health endpoint (port-forward blocks, so run curl from a second terminal)
kubectl port-forward -n ai-services svc/litellm-service 4000:4000
curl http://localhost:4000/health/liveliness

# Test the API (assumes a model named gpt-3.5-turbo has been added to the proxy)
curl -X POST https://litellm.example.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Key Points

Vault’s KV v2 engine keeps all credentials out of your manifests and version control
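ESO handles sync automatically: rotate a secret in Vault and the Kubernetes Secret is updated within the refresh interval. Running pods read these values as environment variables at startup, so they need a restart to pick up the new value
The policy grants read-only access to two specific paths, litellm/data/api-keys and litellm/data/database; the ESO token can’t read any other secret path and can’t write to these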
Token auth works but Kubernetes auth is better for production — it eliminates long-lived tokens entirely
RUN_MIGRATION only needs to be True on first boot; safe to leave on, but you can disable it after the schema is initialized
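One practical consequence of the rotation point: the Deployment consumes the secret through secretKeyRef environment variables, which Kubernetes resolves when a container starts. Pods that are already running keep the old values after ESO updates the Secret. A sketch of the follow-up step after rotating a credential in Vault (restart_litellm is a hypothetical helper name):

```shell
# Roll the LiteLLM pods so they re-read litellm-secrets, then wait for the rollout.
restart_litellm() {
  kubectl rollout restart deployment/litellm --namespace ai-services &&
    kubectl rollout status deployment/litellm --namespace ai-services --timeout=180s
}

# Usage, once the refresh interval has elapsed after the Vault update:
#   restart_litellm
```

With two replicas and the readiness probe defined above, the rolling restart replaces pods one at a time, so the proxy should stay reachable throughout.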
