Building a High-Availability HashiCorp Vault Cluster with Raft Storage

Building a 3-Node Raft Cluster Across Cloud and On-Premises Infrastructure

Deploy a production-grade HashiCorp Vault high-availability cluster using integrated Raft storage, spanning multiple cloud providers and on-premises infrastructure with automatic failover and load balancing.

Why Vault HA with Raft?

Single-node Vault deployments are a liability in production. When that server goes down, your entire infrastructure loses access to secrets – databases can’t connect, applications can’t authenticate, deployments grind to a halt. I learned this the hard way when a simple server restart took down my entire homelab for 20 minutes while everything waited for Vault to come back.

The Solution: HashiCorp Vault’s integrated Raft storage provides built-in high availability without external dependencies like Consul. With Raft, you get automatic leader election, strong consistency across nodes, and the ability to survive node failures while maintaining access to your secrets.

This guide shows you how to build a 3-node Vault cluster that can lose any single node and keep running. We’ll deploy nodes across cloud providers and on-premises infrastructure, use HAProxy for automatic failover, and configure everything for production use.

Why Raft Over External Storage?

The Traditional Approach

Historically, Vault HA required an external storage backend like Consul, etcd, or DynamoDB. This meant managing additional infrastructure, dealing with network dependencies, and troubleshooting two distributed systems instead of one.

Integrated Raft Benefits

  • No external storage dependencies
  • Simplified architecture and operations
  • Built-in leader election
  • Strong consistency guarantees
  • Lower operational complexity

How Raft Works

  • Quorum-based consensus (need N/2 + 1 nodes)
  • Automatic leader election on failure
  • Synchronous replication to majority
  • Followers forward writes to leader
  • Can tolerate (N-1)/2 node failures

Architecture Overview

3-Node Distributed Deployment

This setup demonstrates a hybrid cloud architecture – one node on-premises for low-latency local access, two nodes in the cloud for geographic redundancy. All nodes communicate over an encrypted overlay network (ZeroTier, Tailscale, or WireGuard work well).

Node 1: On-Premises

Location: Home network/datacenter

Role: Disaster recovery node

Benefits: Low-latency local access, no cloud costs

Node 2: Cloud Provider

Location: Oracle Cloud / Linode / DigitalOcean

Role: Primary leader candidate

Benefits: High availability, geographic redundancy

Node 3: Cloud Provider

Location: Same or different cloud region

Role: Follower / failover candidate

Benefits: Completes quorum, enables HA

Quorum Math

3 nodes = can lose 1 node: If you have 2 cloud nodes and 1 fails, you still have quorum (2/3). If your on-prem node goes down, the cloud nodes maintain service. This gives you true high availability.

Why not 2 nodes? With 2 nodes, you need both for quorum (2/2). Losing one means the cluster stops accepting writes. Always use odd numbers: 3, 5, or 7 nodes.

TLS Certificate Requirements

Critical: Mutual TLS for Raft

Raft cluster communication requires mutual TLS authentication. Your certificates MUST include both serverAuth and clientAuth in Extended Key Usage. Missing clientAuth will cause “tls: bad certificate” errors during cluster formation.
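
If you already have certificates on hand, you can confirm both usages are present before going any further (the filename here is just an example):

# Check Extended Key Usage on an existing certificate
openssl x509 -in vault1.pem -noout -text | grep -A 1 "Extended Key Usage"

# Expect: TLS Web Server Authentication, TLS Web Client Authentication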

Step 1: Create Certificate Authority

One-Time CA Setup

Before generating node certificates, create your own Certificate Authority. This CA will sign all Vault node certificates and must be trusted by all nodes in the cluster.

#!/bin/bash
# Create your own Certificate Authority

# Generate CA private key
openssl genrsa -out rootCA.key 4096

# Generate CA certificate (valid for 10 years)
openssl req -x509 -new -nodes \
  -key rootCA.key \
  -sha256 -days 3650 \
  -out rootCA.crt \
  -subj "/C=US/ST=State/L=City/O=Your Organization/OU=Certificate Authority/CN=Vault Root CA"

echo "✓ Created rootCA.crt and rootCA.key"
echo "⚠ Keep rootCA.key secure - it signs all your certificates"

Step 2: Generate Node Certificates

Important: HAProxy Hostname in All Certificates

If you’re using HAProxy for load balancing, all three node certificates must include the HAProxy hostname (e.g., vault.example.com) in their Subject Alternative Names. This allows clients to connect through HAProxy without certificate errors.

#!/bin/bash
# gen-cert.sh - Generate Vault TLS Certificates with Raft Requirements

if [ "$#" -ne 2 ]; then
    echo "Usage: $0 <hostname> <IP>"
    echo "Example: $0 vault1.example.com 192.168.1.10"
    exit 1
fi

HOSTNAME=$1
IP=$2

# Create certificate configuration
cat > cert.cnf <<EOF
[ req ]
default_bits       = 2048
default_md         = sha256
distinguished_name = req_distinguished_name
req_extensions     = v3_req
prompt             = no

[ req_distinguished_name ]
C  = US
ST = State
L  = City
O  = Your Organization
OU = Infrastructure
CN = ${HOSTNAME}

[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = critical, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth, clientAuth
subjectAltName = @alt_names

[ alt_names ]
DNS.1 = ${HOSTNAME}
DNS.2 = vault.example.com
IP.1  = ${IP}
EOF

# Generate private key and CSR
openssl req -new -newkey rsa:2048 -nodes \
  -keyout ${HOSTNAME}-key.pem \
  -out ${HOSTNAME}.csr \
  -config cert.cnf

# Sign certificate with CA
# CRITICAL: Both -extfile and -extensions must be specified
# Otherwise v3 extensions (including clientAuth) won't be included
openssl x509 -req \
  -in ${HOSTNAME}.csr \
  -CA rootCA.crt \
  -CAkey rootCA.key \
  -CAcreateserial \
  -out ${HOSTNAME}.pem \
  -days 365 \
  -sha256 \
  -extfile cert.cnf \
  -extensions v3_req

# Cleanup
rm ${HOSTNAME}.csr cert.cnf

echo "✓ Created: ${HOSTNAME}-key.pem and ${HOSTNAME}.pem"

# Verify certificate includes clientAuth
echo ""
echo "Verifying Extended Key Usage:"
openssl x509 -in ${HOSTNAME}.pem -text -noout | grep -A 1 "Extended Key Usage"

# Save the script above as gen-cert.sh, then make it executable
chmod +x gen-cert.sh

Generate Certificates for Each Node

# On-prem node
./gen-cert.sh vault1.example.com 192.168.1.10

# Cloud node 1
./gen-cert.sh vault2.example.com 10.0.1.20

# Cloud node 2
./gen-cert.sh vault3.example.com 10.0.1.30

# Verify each certificate shows:
# TLS Web Server Authentication, TLS Web Client Authentication

Certificate Deployment

Each Vault node needs the following files (example copy commands follow the list):

  • rootCA.crt – the same CA certificate on every node (for trust)
  • vaultX.pem – node-specific certificate (rename from the hostname-based file gen-cert.sh produces, e.g. vault1.example.com.pem → vault1.pem)
  • key.pem – node-specific private key (rename from the hostname-based key file, e.g. vault1.example.com-key.pem)
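
One straightforward way to distribute the files is plain scp from the machine that generated them – the users, hosts, and home-directory destinations below are illustrative, so adjust them to your environment:

# Copy CA, certificate, and key to each node (rename on arrival as noted above)
scp rootCA.crt vault1.example.com.pem vault1.example.com-key.pem admin@192.168.1.10:~/
scp rootCA.crt vault2.example.com.pem vault2.example.com-key.pem admin@10.0.1.20:~/
scp rootCA.crt vault3.example.com.pem vault3.example.com-key.pem admin@10.0.1.30:~/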

Docker Compose Configuration

Containerized Deployment

Using Docker makes Vault deployment consistent across different infrastructure providers. The same configuration works on bare metal, VMs, or cloud instances.

Directory Structure

~/vault/
├── docker-compose.yml
├── Dockerfile
├── rootCA.crt
├── config/
│   └── config.hcl
├── certs/
│   ├── vault1.pem (or vault2.pem, vault3.pem)
│   └── key.pem
└── data/  (Raft storage - created automatically)
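
Creating that layout is only a few commands per node. The cp targets below assume the filenames produced by gen-cert.sh and the renames noted earlier (shown for node 1; swap in vault2/vault3 on the other nodes):

# Build the directory layout and place the certificates
mkdir -p ~/vault/config ~/vault/certs
cp rootCA.crt ~/vault/
cp vault1.example.com.pem ~/vault/certs/vault1.pem
cp vault1.example.com-key.pem ~/vault/certs/key.pem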

Dockerfile with CA Trust

FROM hashicorp/vault:latest

# Add your internal CA to container trust store
COPY rootCA.crt /usr/local/share/ca-certificates/rootCA.crt
RUN apk add --no-cache ca-certificates && \
    update-ca-certificates

Why Custom Dockerfile?

Vault nodes need to trust each other’s certificates. By embedding your CA certificate in the container image, you ensure TLS verification works without tls_skip_verify hacks.

docker-compose.yml

version: "3.8"
services:
  vault:
    build: .
    container_name: vault
    restart: unless-stopped
    cap_add:
      - IPC_LOCK
    environment:
      VAULT_ADDR: "https://vault1.example.com:8200"
      VAULT_API_ADDR: "https://vault1.example.com:8200"
    ports:
      - "8200:8200"  # API port
      - "8201:8201"  # Cluster port
    volumes:
      - ./config:/vault/config:ro
      - ./data:/vault/data:rw
      - ./certs:/certs:ro
    entrypoint: ["vault", "server", "-config=/vault/config/config.hcl"]

Vault Configuration (config.hcl)

# Node 1 Configuration (adjust for each node)
storage "raft" {
  path = "/vault/data"
  node_id = "vault1"
  
  # Automatically try to join other nodes on startup
  retry_join {
    leader_api_addr = "https://vault2.example.com:8200"
  }
  retry_join {
    leader_api_addr = "https://vault3.example.com:8200"
  }
}

listener "tcp" {
  address = "0.0.0.0:8200"
  tls_cert_file = "/certs/vault1.pem"
  tls_key_file = "/certs/key.pem"
}

api_addr = "https://vault1.example.com:8200"
cluster_addr = "https://vault1.example.com:8201"
disable_mlock = true
ui = true

Configuration Notes

  • node_id: Must be unique per node (vault1, vault2, vault3)
  • retry_join: Allows automatic cluster formation on restart
  • api_addr: How clients reach this node
  • cluster_addr: How nodes communicate with each other (port 8201)
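
Before starting each node, it is worth a quick grep to confirm the per-node values were actually changed – a duplicated node_id or a wrong api_addr is the most common copy-paste mistake here:

# Confirm this node's identity and addresses (run from ~/vault)
grep -E "node_id|api_addr|cluster_addr" config/config.hcl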

Cluster Initialization

Critical Initialization Order

For Raft storage, you MUST join followers to the cluster before unsealing them – the opposite of external storage backends like Consul, where each node simply unseals on its own. The order matters because a node establishes Raft cluster membership while still sealed, and is then unsealed with the same keys generated during the leader's initialization.

Step 1: Initialize Leader Node

# On Node 2 (cloud node, will be initial leader)
cd ~/vault
docker-compose up -d

# Initialize the cluster (ONE TIME ONLY)
docker exec vault vault operator init

# OUTPUT - SAVE THIS SECURELY:
# Unseal Key 1: base64-encoded-key-1
# Unseal Key 2: base64-encoded-key-2
# Unseal Key 3: base64-encoded-key-3
# Unseal Key 4: base64-encoded-key-4
# Unseal Key 5: base64-encoded-key-5
# Initial Root Token: hvs.xxxxxxxxxxxxxxxx

# Unseal the leader (need 3 of 5 keys)
docker exec vault vault operator unseal <unseal-key-1>
docker exec vault vault operator unseal <unseal-key-2>
docker exec vault vault operator unseal <unseal-key-3>

# Verify it's unsealed and active
docker exec vault vault status

Step 2: Join Follower Nodes

# On Node 1 (on-prem) and Node 3 (cloud)
cd ~/vault
docker-compose up -d

# JOIN to the leader (while still sealed)
docker exec vault vault operator raft join https://vault2.example.com:8200

# OUTPUT: Joined    true

# NOW unseal with the SAME keys from initialization
docker exec vault vault operator unseal <unseal-key-1>
docker exec vault vault operator unseal <unseal-key-2>
docker exec vault vault operator unseal <unseal-key-3>

# Verify status
docker exec vault vault status

Step 3: Verify Cluster Formation

# On any node, set environment and check cluster
export VAULT_ADDR=https://vault2.example.com:8200
export VAULT_TOKEN=<root-token-from-initialization>

vault operator raft list-peers

# Expected output:
# Node     Address                    State       Voter
# ----     -------                    -----       -----
# vault2   vault2.example.com:8201    leader      true
# vault1   vault1.example.com:8201    follower    true
# vault3   vault3.example.com:8201    follower    true

✓ Cluster Successfully Formed

All three nodes are now part of the Raft cluster. Vault2 is the current leader, but if it fails, vault1 or vault3 will automatically be elected leader within seconds.
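
For a fuller picture than list-peers alone, you can loop over all three nodes and check their seal and HA status directly (hostnames match the examples used throughout this guide):

# Check seal and HA status on every node
for addr in https://vault1.example.com:8200 \
            https://vault2.example.com:8200 \
            https://vault3.example.com:8200; do
  echo "== ${addr}"
  VAULT_ADDR="${addr}" vault status | grep -E "Sealed|HA Mode"
done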

HAProxy Load Balancer

True High Availability

While Vault provides automatic leader election, your applications need a single endpoint that always works. HAProxy provides health-checked load balancing – it automatically detects failed nodes and routes traffic only to healthy Vault instances.

HAProxy Configuration

# /etc/haproxy/haproxy.cfg

global
    log /dev/log local0
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode tcp
    option tcplog
    timeout connect 5000
    timeout client  50000
    timeout server  50000

frontend vault_frontend
    bind *:8200
    mode tcp
    default_backend vault_backend

backend vault_backend
    mode tcp
    balance roundrobin
    option tcp-check
    tcp-check connect
    
    # Server definitions with health checks
    # Check every 5s, mark down after 3 failures, mark up after 2 successes
    server vault1 192.168.1.10:8200 check inter 5s fall 3 rise 2
    server vault2 10.0.1.20:8200 check inter 5s fall 3 rise 2
    server vault3 10.0.1.30:8200 check inter 5s fall 3 rise 2

Health Check Parameters

  • inter 5s: Check every 5 seconds
  • fall 3: Mark server down after 3 consecutive failures (15 seconds total)
  • rise 2: Mark server up after 2 consecutive successes (10 seconds total)

This configuration provides ~15 second failover time – acceptable for most production use cases.
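
Note that a plain TCP check only proves the port is open – a sealed or standby node still passes it. Vault's /v1/sys/health endpoint is more informative: it returns HTTP 200 on the active node, 429 on an unsealed standby, and 503 when sealed. A quick way to see what each node currently reports (assumes rootCA.crt is in the working directory):

# Query the health endpoint on each node
for addr in https://vault1.example.com:8200 \
            https://vault2.example.com:8200 \
            https://vault3.example.com:8200; do
  code=$(curl -s -o /dev/null -w '%{http_code}' --cacert rootCA.crt "${addr}/v1/sys/health")
  echo "${addr} -> HTTP ${code}"
done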

DNS Configuration

# Create single DNS entry pointing to HAProxy
vault.example.com → 203.0.113.50 (HAProxy IP)

# Applications use this single address:
export VAULT_ADDR=https://vault.example.com:8200

Benefits of HAProxy + DNS

  • Applications use one address: vault.example.com:8200
  • HAProxy routes to healthy nodes only
  • Automatic failover without client changes
  • Load distribution across all nodes for reads
  • No application-level retry logic needed

Vault Context Switching

Managing Multiple Vault Endpoints

With three individual nodes plus the HAProxy endpoint, you need a quick way to switch between them for administration and testing. Shell functions make this trivial.

# Add to ~/.zshrc or ~/.bashrc

# Vault addresses (adjust to your actual hostnames)
export VAULT1="https://vault1.example.com:8200"
export VAULT2="https://vault2.example.com:8200"
export VAULT3="https://vault3.example.com:8200"
export VAULT_HA="https://vault.example.com:8200"

# Unseal keys (source from secure file)
export UNSEAL_KEY1="your-key-1"
export UNSEAL_KEY2="your-key-2"
export UNSEAL_KEY3="your-key-3"

# Root token (source from secure file)
export VAULT_ROOT_TOKEN="hvs.xxxxxxxxxxxx"

# Context switching functions
vault-node1() {
  export VAULT_ADDR="$VAULT1"
  export VAULT_TOKEN="$VAULT_ROOT_TOKEN"
  echo "✓ Vault context: Node 1 (on-prem)"
}

vault-node2() {
  export VAULT_ADDR="$VAULT2"
  export VAULT_TOKEN="$VAULT_ROOT_TOKEN"
  echo "✓ Vault context: Node 2 (cloud leader)"
}

vault-node3() {
  export VAULT_ADDR="$VAULT3"
  export VAULT_TOKEN="$VAULT_ROOT_TOKEN"
  echo "✓ Vault context: Node 3 (cloud follower)"
}

vault-ha() {
  export VAULT_ADDR="$VAULT_HA"
  export VAULT_TOKEN="$VAULT_ROOT_TOKEN"
  echo "✓ Vault context: HAProxy (load balanced)"
}

# Quick unseal script
vault-unseal() {
  local addr=${1:-$VAULT_ADDR}
  echo "Unsealing $addr..."
  vault operator unseal -address="$addr" "$UNSEAL_KEY1" > /dev/null
  vault operator unseal -address="$addr" "$UNSEAL_KEY2" > /dev/null
  vault operator unseal -address="$addr" "$UNSEAL_KEY3" > /dev/null
  vault status -address="$addr" | grep "Sealed"
}

# Unseal all nodes
vault-unseal-all() {
  vault-unseal "$VAULT1"
  vault-unseal "$VAULT2"
  vault-unseal "$VAULT3"
  echo ""
  vault operator raft list-peers -address="$VAULT2"
}

Usage Examples

# Switch to HAProxy endpoint (normal usage)
vault-ha
vault kv put secret/myapp/config api_key="secret123"

# Check cluster status on leader
vault-node2
vault operator raft list-peers

# Unseal specific node after restart
vault-unseal https://vault1.example.com:8200

# Unseal all three nodes after full cluster restart
vault-unseal-all

Network Connectivity

Connecting Cloud and On-Premises

For a hybrid deployment spanning cloud and on-prem infrastructure, you need encrypted connectivity between locations. Modern overlay networks make this simple without complex VPN configurations.

ZeroTier

Software-defined networking with central management. Create a network, join nodes, assign IPs. Handles NAT traversal automatically.

curl -s https://install.zerotier.com | sudo bash
sudo zerotier-cli join YOUR_NETWORK_ID

Tailscale

WireGuard-based mesh network with SSO integration. Easiest setup, works everywhere, includes MagicDNS.

curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

WireGuard

Manual VPN configuration for full control. More setup work, maximum flexibility.

sudo apt install wireguard
# Configure peers manually
# in /etc/wireguard/wg0.conf

Firewall Configuration

Ensure these ports are open between all Vault nodes:

  • 8200/tcp: Vault API (client connections)
  • 8201/tcp: Vault cluster communication (Raft)

# Ubuntu/Debian
sudo ufw allow 8200/tcp
sudo ufw allow 8201/tcp

# RHEL/CentOS
sudo firewall-cmd --permanent --add-port=8200/tcp
sudo firewall-cmd --permanent --add-port=8201/tcp
sudo firewall-cmd --reload
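
A quick reachability test between locations catches overlay-network or firewall problems early (run from any node; assumes netcat is installed):

# Test connectivity to each peer on both Vault ports
for peer in vault1.example.com vault2.example.com vault3.example.com; do
  for port in 8200 8201; do
    nc -z -w 3 "${peer}" "${port}" && echo "OK   ${peer}:${port}" || echo "FAIL ${peer}:${port}"
  done
done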

Testing Failover

Verify Your HA Setup

The real test of high availability is graceful degradation under failure. Let’s verify automatic failover works.

Test 1: Leader Failure

# Check current leader
vault operator raft list-peers

# Simulate leader failure (stop vault2)
# On vault2:
docker-compose down

# Watch automatic leader election (on vault1 or vault3)
watch -n 1 'vault operator raft list-peers'

# Verify HAProxy still works
vault status -address=https://vault.example.com:8200

# Applications continue working - writes now go to new leader
vault kv put secret/test key="written during failover"

Test 2: Follower Failure

# Stop a follower node
# On vault3:
docker-compose down

# Cluster still has quorum (2/3 nodes)
vault operator raft list-peers

# Verify operations still work
vault kv get secret/test

# HAProxy automatically removes failed node from pool
# Check HAProxy stats to see node marked down

Test 3: Network Partition

# Simulate network partition (on-prem node isolated)
# On vault1, block cluster port:
sudo iptables -A INPUT -p tcp --dport 8201 -j DROP

# Cloud nodes (vault2, vault3) keep quorum and continue serving requests
# vault1 loses contact with the leader and returns errors until the
# partition heals - a standby cannot serve requests on its own

# Restore connectivity
sudo iptables -D INPUT -p tcp --dport 8201 -j DROP

# vault1 automatically rejoins cluster

Operational Considerations

Unseal After Restart

Vault starts sealed after any restart. With the default Shamir seal, every node must be unsealed manually before it can serve traffic again – followers do not unseal themselves when the leader comes back.

  • Unseal each node individually after it restarts
  • Use the vault-unseal-all helper above after a full-cluster restart
  • Consider auto-unseal (below) to remove the manual step entirely

Auto-Unseal (Future)

For production, consider auto-unseal with cloud KMS:

  • AWS KMS (essentially free under 20K req/month)
  • Azure Key Vault (similar free tier)
  • GCP Cloud KMS

Migration: vault operator unseal -migrate
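
A rough outline of that migration, assuming you have already added a seal stanza (for example seal "awskms") to config.hcl and restarted the node – the existing Shamir keys are supplied one last time with the -migrate flag and become recovery keys:

# After adding the seal stanza and restarting the node
vault operator unseal -migrate <unseal-key-1>
vault operator unseal -migrate <unseal-key-2>
vault operator unseal -migrate <unseal-key-3>

# Once a quorum of keys is provided, the node switches to auto-unseal
# and unseals itself automatically on future restarts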

Backup Strategy

# Take Raft snapshot (on leader)
vault operator raft snapshot save backup-$(date +%Y%m%d).snap

# Restore snapshot (emergency recovery)
vault operator raft snapshot restore -force backup.snap

# Automated daily backups
cat > /etc/cron.daily/vault-backup <<'EOF'
#!/bin/bash
BACKUP_DIR="/backup/vault"
DATE=$(date +%Y%m%d)
export VAULT_ADDR=https://vault.example.com:8200
export VAULT_TOKEN=<token-with-permission-to-take-snapshots>

vault operator raft snapshot save ${BACKUP_DIR}/vault-${DATE}.snap
find ${BACKUP_DIR} -name "vault-*.snap" -mtime +30 -delete
EOF
chmod +x /etc/cron.daily/vault-backup

Monitoring & Alerting

# Prometheus metrics endpoint (requires a token unless unauthenticated
# metrics access has been configured)
curl -H "X-Vault-Token: $VAULT_TOKEN" \
  "https://vault.example.com:8200/v1/sys/metrics?format=prometheus"

# Key metrics to monitor:
# - vault_core_unsealed (should be 1)
# - vault_core_active (1 on leader, 0 on followers)
# - vault_raft_peers (should equal number of nodes)
# - vault_raft_leader (should be 1)

# Example Prometheus alert
- alert: VaultSealed
  expr: vault_core_unsealed == 0
  for: 2m
  annotations:
    summary: "Vault node is sealed"
    
- alert: VaultNoLeader
  expr: sum(vault_raft_leader) == 0
  for: 1m
  annotations:
    summary: "Vault cluster has no leader"

Cloud Provider Recommendations

Oracle Cloud (Free Tier)

Best for homelabs:

  • 4 ARM Ampere instances (24GB RAM total)
  • 200GB storage
  • 10TB egress/month
  • Actually free forever

Linode/Akamai

Simple pricing:

  • $5/month for 1GB RAM instance
  • Excellent documentation
  • Fast provisioning
  • Good for production workloads

DigitalOcean

Developer friendly:

  • $4-6/month for basic droplets
  • Great API and tooling
  • VPC networking included
  • Managed databases available

Security Best Practices

Protect Your Unseal Keys

  • Never commit to git – use .gitignore
  • Store in password manager – 1Password, Bitwarden, etc.
  • Distribute to key holders – no single person should have all 5 keys
  • Print and store physically – safety deposit box or safe

Losing unseal keys means permanent data loss. There is no recovery mechanism.

Token Management

  • Don’t use root token for daily operations
  • Create admin tokens: vault token create -policy=admin -period=768h
  • Use service-specific policies for applications
  • Enable audit logging: vault audit enable file file_path=/vault/logs/audit.log
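
A minimal sketch of that workflow – the policy below is deliberately broad and only illustrative, so scope it down for real use:

# Create a broad admin policy (illustrative only)
cat > admin-policy.hcl <<'POLICY'
path "*" {
  capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}
POLICY

vault policy write admin admin-policy.hcl

# Issue a periodic admin token and enable file audit logging
vault token create -policy=admin -period=768h
vault audit enable file file_path=/vault/logs/audit.log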

Production-Ready Vault HA

You now have a production-grade HashiCorp Vault cluster that can survive node failures, provides automatic failover, and scales across cloud and on-premises infrastructure. This setup demonstrates proper secrets management architecture suitable for everything from homelabs to production deployments.

The combination of Raft storage, HAProxy load balancing, and multi-region deployment gives you true high availability without complex external dependencies. Your applications get a single endpoint that always works, while Vault handles leader election, replication, and failover automatically.

🔒 Highly Available • 🌍 Multi-Region • 🚀 Production-Ready

Next Steps:

  • Integrate with Kubernetes via External Secrets Operator
  • Migrate to auto-unseal with cloud KMS
  • Set up automated backups and monitoring
  • Configure audit logging and SIEM integration
  • Implement policy-based access control for applications