HashiCorp Vault HA Cluster
Building a 3-Node Raft Cluster Across Cloud and On-Premises Infrastructure
Deploy a production-grade HashiCorp Vault high-availability cluster using integrated Raft storage, spanning multiple cloud providers and on-premises infrastructure with automatic failover and load balancing.
Why Vault HA with Raft?
Single-node Vault deployments are a liability in production. When that server goes down, your entire infrastructure loses access to secrets – databases can’t connect, applications can’t authenticate, deployments grind to a halt. I learned this the hard way when a simple server restart took down my entire homelab for 20 minutes while everything waited for Vault to come back.
The Solution: HashiCorp Vault’s integrated Raft storage provides built-in high availability without external dependencies like Consul. With Raft, you get automatic leader election, strong consistency across nodes, and the ability to survive node failures while maintaining access to your secrets.
This guide shows you how to build a 3-node Vault cluster that can lose any single node and keep running. We’ll deploy nodes across cloud providers and on-premises infrastructure, use HAProxy for automatic failover, and configure everything for production use.
Why Raft Over External Storage?
The Traditional Approach
Historically, Vault HA required an external storage backend like Consul, etcd, or DynamoDB. This meant managing additional infrastructure, dealing with network dependencies, and troubleshooting two distributed systems instead of one.
Integrated Raft Benefits
- No external storage dependencies
- Simplified architecture and operations
- Built-in leader election
- Strong consistency guarantees
- Lower operational complexity
How Raft Works
- Quorum-based consensus (need N/2 + 1 nodes)
- Automatic leader election on failure
- Synchronous replication to majority
- Followers forward writes to leader
- Can tolerate (N-1)/2 node failures
Architecture Overview
3-Node Distributed Deployment
This setup demonstrates a hybrid cloud architecture – one node on-premises for low-latency local access, two nodes in the cloud for geographic redundancy. All nodes communicate over an encrypted overlay network (ZeroTier, Tailscale, or WireGuard work well).
Node 1: On-Premises
Location: Home network/datacenter
Role: Disaster recovery node
Benefits: Low-latency local access, no cloud costs
Node 2: Cloud Provider
Location: Oracle Cloud / Linode / DigitalOcean
Role: Primary leader candidate
Benefits: High availability, geographic redundancy
Node 3: Cloud Provider
Location: Same or different cloud region
Role: Follower / failover candidate
Benefits: Completes quorum, enables HA
Quorum Math
3 nodes = can lose 1 node: Quorum is a majority (2 of 3), so if one cloud node fails, the on-prem node and the remaining cloud node keep the cluster writable. If the on-prem node goes down instead, the two cloud nodes maintain service. This gives you true high availability.
Why not 2 nodes? With 2 nodes, you need both for quorum (2/2). Losing one means the cluster stops accepting writes. Always use odd numbers: 3, 5, or 7 nodes.
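To make the arithmetic concrete, here is a tiny helper script – purely illustrative, not part of any Vault tooling – that prints the quorum size and fault tolerance for a given cluster size:
#!/bin/bash
# quorum.sh - print quorum size and fault tolerance for N Raft nodes
N=${1:?usage: $0 <node-count>}
QUORUM=$(( N / 2 + 1 ))        # majority needed to elect a leader and commit writes
TOLERANCE=$(( (N - 1) / 2 ))   # nodes you can lose and still keep quorum
echo "nodes=$N quorum=$QUORUM can_lose=$TOLERANCE"
# ./quorum.sh 3  ->  nodes=3 quorum=2 can_lose=1
# ./quorum.sh 5  ->  nodes=5 quorum=3 can_lose=2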
TLS Certificate Requirements
Critical: Mutual TLS for Raft
Raft cluster communication requires mutual TLS authentication. Your certificates MUST include both serverAuth and clientAuth in Extended Key Usage. Missing clientAuth will cause “tls: bad certificate” errors during cluster formation.
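If you hit that error with a certificate you already generated, a quick way to inspect its Extended Key Usage (assumes OpenSSL 1.1.1 or newer; the filename is a placeholder):
# Both "TLS Web Server Authentication" and "TLS Web Client Authentication" must appear
openssl x509 -in vault1.example.com.pem -noout -ext extendedKeyUsage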
Step 1: Create Certificate Authority
One-Time CA Setup
Before generating node certificates, create your own Certificate Authority. This CA will sign all Vault node certificates and must be trusted by all nodes in the cluster.
#!/bin/bash
# Create your own Certificate Authority
# Generate CA private key
openssl genrsa -out rootCA.key 4096
# Generate CA certificate (valid for 10 years)
openssl req -x509 -new -nodes \
-key rootCA.key \
-sha256 -days 3650 \
-out rootCA.crt \
-subj "/C=US/ST=State/L=City/O=Your Organization/OU=Certificate Authority/CN=Vault Root CA"
echo "✓ Created rootCA.crt and rootCA.key"
echo "⚠ Keep rootCA.key secure - it signs all your certificates"
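Before moving on, a quick sanity check of the CA you just created – confirm the subject, the validity window, and that it is actually marked as a CA:
openssl x509 -in rootCA.crt -noout -subject -dates
openssl x509 -in rootCA.crt -noout -text | grep -A 1 "Basic Constraints"
# Expect CA:TRUE and a roughly 10-year validity window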
Step 2: Generate Node Certificates
Important: HAProxy Hostname in All Certificates
If you’re using HAProxy for load balancing, all three node certificates must include the HAProxy hostname (e.g., vault.example.com) in their Subject Alternative Names. This allows clients to connect through HAProxy without certificate errors.
#!/bin/bash
# gen-cert.sh - Generate Vault TLS Certificates with Raft Requirements
if [ "$#" -ne 2 ]; then
echo "Usage: $0 <hostname> <IP>"
echo "Example: $0 vault1.example.com 192.168.1.10"
exit 1
fi
HOSTNAME=$1
IP=$2
# Create certificate configuration
cat > cert.cnf <<EOF
[ req ]
default_bits = 2048
default_md = sha256
distinguished_name = req_distinguished_name
req_extensions = v3_req
prompt = no
[ req_distinguished_name ]
C = US
ST = State
L = City
O = Your Organization
OU = Infrastructure
CN = ${HOSTNAME}
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = critical, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth, clientAuth
subjectAltName = @alt_names
[ alt_names ]
DNS.1 = ${HOSTNAME}
DNS.2 = vault.example.com
IP.1 = ${IP}
EOF
# Generate private key and CSR
openssl req -new -newkey rsa:2048 -nodes \
-keyout ${HOSTNAME}-key.pem \
-out ${HOSTNAME}.csr \
-config cert.cnf
# Sign certificate with CA
# CRITICAL: Both -extfile and -extensions must be specified
# Otherwise v3 extensions (including clientAuth) won't be included
openssl x509 -req \
-in ${HOSTNAME}.csr \
-CA rootCA.crt \
-CAkey rootCA.key \
-CAcreateserial \
-out ${HOSTNAME}.pem \
-days 365 \
-sha256 \
-extfile cert.cnf \
-extensions v3_req
# Cleanup
rm ${HOSTNAME}.csr cert.cnf
echo "✓ Created: ${HOSTNAME}-key.pem and ${HOSTNAME}.pem"
# Verify certificate includes clientAuth
echo ""
echo "Verifying Extended Key Usage:"
openssl x509 -in ${HOSTNAME}.pem -text -noout | grep -A 1 "Extended Key Usage"
chmod +x gen-cert.sh
Generate Certificates for Each Node
# On-prem node
./gen-cert.sh vault1.example.com 192.168.1.10
# Cloud node 1
./gen-cert.sh vault2.example.com 10.0.1.20
# Cloud node 2
./gen-cert.sh vault3.example.com 10.0.1.30
# Verify each certificate shows:
# TLS Web Server Authentication, TLS Web Client Authentication
Certificate Deployment
Each Vault node needs:
- rootCA.crt – Same CA certificate on all nodes (for trust)
- vaultX.pem – Node-specific certificate
- key.pem – Node-specific private key (rename from vaultX-key.pem)
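A hedged example of copying the files into place for Node 1 – the user, hostname, and ~/vault paths are assumptions, so adjust them to your environment and repeat for the other nodes:
# From the machine holding the CA and the generated certificates
scp rootCA.crt user@vault1.example.com:~/vault/rootCA.crt
scp vault1.example.com.pem user@vault1.example.com:~/vault/certs/vault1.pem
scp vault1.example.com-key.pem user@vault1.example.com:~/vault/certs/key.pem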
Docker Compose Configuration
Containerized Deployment
Using Docker makes Vault deployment consistent across different infrastructure providers. The same configuration works on bare metal, VMs, or cloud instances.
Directory Structure
~/vault/
├── docker-compose.yml
├── Dockerfile
├── rootCA.crt
├── config/
│   └── config.hcl
├── certs/
│   ├── vault1.pem (or vault2.pem, vault3.pem)
│   └── key.pem
└── data/ (Raft storage - created automatically)
Dockerfile with CA Trust
FROM hashicorp/vault:latest
# Add your internal CA to container trust store
COPY rootCA.crt /usr/local/share/ca-certificates/rootCA.crt
RUN apk add --no-cache ca-certificates && \
update-ca-certificates
Why Custom Dockerfile?
Vault nodes need to trust each other’s certificates. By embedding your CA certificate in the container image, you ensure TLS verification works without tls_skip_verify hacks.
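A quick, optional sanity check that the CA really lands in the image's trust store – the bundle path is standard on the Alpine-based Vault image, and the grep count is just a rough signal:
docker-compose build
docker-compose run --rm --entrypoint sh vault \
  -c 'grep -c "BEGIN CERTIFICATE" /etc/ssl/certs/ca-certificates.crt'
# update-ca-certificates merges your rootCA.crt into this bundle alongside the stock certificates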
docker-compose.yml
version: "3.8"

services:
  vault:
    build: .
    container_name: vault
    restart: unless-stopped
    cap_add:
      - IPC_LOCK
    environment:
      VAULT_ADDR: "https://vault1.example.com:8200"
      VAULT_API_ADDR: "https://vault1.example.com:8200"
    ports:
      - "8200:8200"   # API port
      - "8201:8201"   # Cluster port
    volumes:
      - ./config:/vault/config:ro
      - ./data:/vault/data:rw
      - ./certs:/certs:ro
    entrypoint: ["vault", "server", "-config=/vault/config/config.hcl"]
Vault Configuration (config.hcl)
# Node 1 Configuration (adjust for each node)
storage "raft" {
  path    = "/vault/data"
  node_id = "vault1"

  # Automatically try to join other nodes on startup
  retry_join {
    leader_api_addr = "https://vault2.example.com:8200"
  }
  retry_join {
    leader_api_addr = "https://vault3.example.com:8200"
  }
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/certs/vault1.pem"
  tls_key_file  = "/certs/key.pem"
}

api_addr     = "https://vault1.example.com:8200"
cluster_addr = "https://vault1.example.com:8201"

disable_mlock = true
ui            = true
Configuration Notes
- node_id: Must be unique per node (vault1, vault2, vault3)
- retry_join: Allows automatic cluster formation on restart
- api_addr: How clients reach this node
- cluster_addr: How nodes communicate with each other (port 8201)
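For reference, only a handful of values change per node. A sketch of the Node 2 differences (Node 3 is analogous), assuming the hostnames used throughout this guide:
# config.hcl on Node 2 – only the node-specific values change:
#   node_id        = "vault2"
#   tls_cert_file  = "/certs/vault2.pem"
#   api_addr       = "https://vault2.example.com:8200"
#   cluster_addr   = "https://vault2.example.com:8201"
#   retry_join     -> point at vault1.example.com and vault3.example.com instead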
Cluster Initialization
Critical Initialization Order
For Raft storage, you MUST join a node to the cluster before unsealing it. This differs from a standalone node with file storage, where there is no join step at all – you simply unseal. The order matters because a node establishes its Raft cluster membership while still sealed; only after joining can it be unsealed with the cluster's unseal keys.
Step 1: Initialize Leader Node
# On Node 2 (cloud node, will be initial leader)
cd ~/vault
docker-compose up -d
# Initialize the cluster (ONE TIME ONLY)
docker exec vault vault operator init
# OUTPUT - SAVE THIS SECURELY:
# Unseal Key 1: base64-encoded-key-1
# Unseal Key 2: base64-encoded-key-2
# Unseal Key 3: base64-encoded-key-3
# Unseal Key 4: base64-encoded-key-4
# Unseal Key 5: base64-encoded-key-5
# Initial Root Token: hvs.xxxxxxxxxxxxxxxx
# Unseal the leader (need 3 of 5 keys)
docker exec -it vault vault operator unseal
docker exec -it vault vault operator unseal
docker exec -it vault vault operator unseal
# Verify it's unsealed and active
docker exec vault vault status
Step 2: Join Follower Nodes
# On Node 1 (on-prem) and Node 3 (cloud)
cd ~/vault
docker-compose up -d
# JOIN to the leader (while still sealed)
docker exec vault vault operator raft join https://vault2.example.com:8200
# OUTPUT: Joined true
# NOW unseal with the SAME keys from initialization
docker exec -it vault vault operator unseal
docker exec -it vault vault operator unseal
docker exec -it vault vault operator unseal
# Verify status
docker exec vault vault status
Step 3: Verify Cluster Formation
# On any node, set environment and check cluster
export VAULT_ADDR=https://vault2.example.com:8200
export VAULT_TOKEN=
vault operator raft list-peers
# Expected output:
# Node Address State Voter
# ---- ------- ----- -----
# vault2 vault2.example.com:8201 leader true
# vault1 vault1.example.com:8201 follower true
# vault3 vault3.example.com:8201 follower true
✓ Cluster Successfully Formed
All three nodes are now part of the Raft cluster. Vault2 is the current leader, but if it fails, vault1 or vault3 will automatically be elected leader within seconds.
HAProxy Load Balancer
True High Availability
While Vault provides automatic leader election, your applications need a single endpoint that always works. HAProxy provides health-checked load balancing – it automatically detects failed nodes and routes traffic only to healthy Vault instances.
HAProxy Configuration
# /etc/haproxy/haproxy.cfg
global
    log /dev/log local0
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log     global
    mode    tcp
    option  tcplog
    timeout connect 5000
    timeout client  50000
    timeout server  50000

frontend vault_frontend
    bind *:8200
    mode tcp
    default_backend vault_backend

backend vault_backend
    mode tcp
    balance roundrobin
    option tcp-check
    tcp-check connect
    # Server definitions with health checks
    # Check every 5s, mark down after 3 failures, mark up after 2 successes
    server vault1 192.168.1.10:8200 check inter 5s fall 3 rise 2
    server vault2 10.0.1.20:8200 check inter 5s fall 3 rise 2
    server vault3 10.0.1.30:8200 check inter 5s fall 3 rise 2
Health Check Parameters
- inter 5s: Check every 5 seconds
- fall 3: Mark server down after 3 consecutive failures (15 seconds total)
- rise 2: Mark server up after 2 consecutive successes (10 seconds total)
This configuration provides ~15 second failover time – acceptable for most production use cases.
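One caveat with a plain TCP check: it also routes requests to sealed or standby nodes, which then redirect or return errors. A variant worth considering – my own suggestion, not part of the original setup – is to health-check Vault's /v1/sys/health endpoint so HAProxy only sends traffic to the unsealed active node:
backend vault_backend
    mode tcp
    # /v1/sys/health returns 200 only on the unsealed active node
    # (429 on standbys, 503 when sealed), so only the leader passes the check
    option httpchk GET /v1/sys/health
    http-check expect status 200
    # Use "ca-file /etc/haproxy/rootCA.crt verify required" instead of "verify none"
    # if you copy your CA to the HAProxy host
    server vault1 192.168.1.10:8200 check check-ssl verify none inter 5s fall 3 rise 2
    server vault2 10.0.1.20:8200 check check-ssl verify none inter 5s fall 3 rise 2
    server vault3 10.0.1.30:8200 check check-ssl verify none inter 5s fall 3 rise 2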
DNS Configuration
# Create single DNS entry pointing to HAProxy
vault.example.com → 203.0.113.50 (HAProxy IP)
# Applications use this single address:
export VAULT_ADDR=https://vault.example.com:8200
Benefits of HAProxy + DNS
- Applications use one address: vault.example.com:8200
- HAProxy routes traffic only to healthy nodes
- Automatic failover without client changes
- Connections are spread across nodes (note: standby nodes forward requests to the active leader)
- No application-level retry logic needed
Vault Context Switching
Managing Multiple Vault Endpoints
With three individual nodes plus the HAProxy endpoint, you need a quick way to switch between them for administration and testing. Shell functions make this trivial.
# Add to ~/.zshrc or ~/.bashrc
# Vault addresses (adjust to your actual hostnames)
export VAULT1="https://vault1.example.com:8200"
export VAULT2="https://vault2.example.com:8200"
export VAULT3="https://vault3.example.com:8200"
export VAULT_HA="https://vault.example.com:8200"
# Unseal keys (source from secure file)
export UNSEAL_KEY1="your-key-1"
export UNSEAL_KEY2="your-key-2"
export UNSEAL_KEY3="your-key-3"
# Root token (source from secure file)
export VAULT_ROOT_TOKEN="hvs.xxxxxxxxxxxx"
# Context switching functions
vault-node1() {
  export VAULT_ADDR="$VAULT1"
  export VAULT_TOKEN="$VAULT_ROOT_TOKEN"
  echo "✓ Vault context: Node 1 (on-prem)"
}

vault-node2() {
  export VAULT_ADDR="$VAULT2"
  export VAULT_TOKEN="$VAULT_ROOT_TOKEN"
  echo "✓ Vault context: Node 2 (cloud leader)"
}

vault-node3() {
  export VAULT_ADDR="$VAULT3"
  export VAULT_TOKEN="$VAULT_ROOT_TOKEN"
  echo "✓ Vault context: Node 3 (cloud follower)"
}

vault-ha() {
  export VAULT_ADDR="$VAULT_HA"
  export VAULT_TOKEN="$VAULT_ROOT_TOKEN"
  echo "✓ Vault context: HAProxy (load balanced)"
}

# Quick unseal script
vault-unseal() {
  local addr=${1:-$VAULT_ADDR}
  echo "Unsealing $addr..."
  vault operator unseal -address="$addr" "$UNSEAL_KEY1" > /dev/null
  vault operator unseal -address="$addr" "$UNSEAL_KEY2" > /dev/null
  vault operator unseal -address="$addr" "$UNSEAL_KEY3" > /dev/null
  vault status -address="$addr" | grep "Sealed"
}

# Unseal all nodes
vault-unseal-all() {
  vault-unseal "$VAULT1"
  vault-unseal "$VAULT2"
  vault-unseal "$VAULT3"
  echo ""
  vault operator raft list-peers -address="$VAULT2"
}
Usage Examples
# Switch to HAProxy endpoint (normal usage)
vault-ha
vault kv put secret/myapp/config api_key="secret123"
# Check cluster status on leader
vault-node2
vault operator raft list-peers
# Unseal specific node after restart
vault-unseal https://vault1.example.com:8200
# Unseal all three nodes after full cluster restart
vault-unseal-all
Network Connectivity
Connecting Cloud and On-Premises
For a hybrid deployment spanning cloud and on-prem infrastructure, you need encrypted connectivity between locations. Modern overlay networks make this simple without complex VPN configurations.
ZeroTier
Software-defined networking with central management. Create a network, join nodes, assign IPs. Handles NAT traversal automatically.
curl -s https://install.zerotier.com | sudo bash
sudo zerotier-cli join YOUR_NETWORK_ID
Tailscale
WireGuard-based mesh network with SSO integration. Easiest setup, works everywhere, includes MagicDNS.
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
WireGuard
Manual VPN configuration for full control. More setup work, maximum flexibility.
sudo apt install wireguard
# Configure peers manually
# in /etc/wireguard/wg0.conf
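For completeness, a minimal sketch of what that peer configuration might look like – the keys, addresses, and endpoint are placeholders (generate real keys with wg genkey):
# /etc/wireguard/wg0.conf on the on-prem node (illustrative values only)
[Interface]
Address = 10.10.0.1/24
PrivateKey = <this-node-private-key>
ListenPort = 51820

[Peer]
# Cloud node vault2
PublicKey = <vault2-public-key>
AllowedIPs = 10.10.0.2/32
Endpoint = vault2.example.com:51820
PersistentKeepalive = 25
Bring the tunnel up with sudo wg-quick up wg0 and repeat with one peer block per remote node.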
Firewall Configuration
Ensure these ports are open between all Vault nodes:
- 8200/tcp: Vault API (client connections)
- 8201/tcp: Vault cluster communication (Raft)
# Ubuntu/Debian
sudo ufw allow 8200/tcp
sudo ufw allow 8201/tcp
# RHEL/CentOS
sudo firewall-cmd --permanent --add-port=8200/tcp
sudo firewall-cmd --permanent --add-port=8201/tcp
sudo firewall-cmd --reload
Testing Failover
Verify Your HA Setup
The real test of high availability is graceful degradation under failure. Let’s verify automatic failover works.
Test 1: Leader Failure
# Check current leader
vault operator raft list-peers
# Simulate leader failure (stop vault2)
# On vault2:
docker-compose down
# Watch automatic leader election (on vault1 or vault3)
watch -n 1 'vault operator raft list-peers'
# Verify HAProxy still works
vault status -address=https://vault.example.com:8200
# Applications continue working - writes now go to new leader
vault kv put secret/test key="written during failover"
Test 2: Follower Failure
# Stop a follower node
# On vault3:
docker-compose down
# Cluster still has quorum (2/3 nodes)
vault operator raft list-peers
# Verify operations still work
vault kv get secret/test
# HAProxy automatically removes failed node from pool
# Check HAProxy stats to see node marked down
Test 3: Network Partition
# Simulate network partition (on-prem node isolated)
# On vault1, block cluster port:
sudo iptables -A INPUT -p tcp --dport 8201 -j DROP
# Cloud nodes (vault2, vault3) maintain quorum and keep serving requests
# vault1 loses contact with the cluster and stops serving requests until the partition heals
# Restore connectivity
sudo iptables -D INPUT -p tcp --dport 8201 -j DROP
# vault1 automatically rejoins cluster
Operational Considerations
Unseal After Restart
Vault starts sealed after any restart. With Shamir seal (default), you must manually unseal nodes.
- Every node must be unsealed individually – with Shamir, followers do not unseal automatically when the leader comes up
- A restarted node rejoins the Raft cluster while sealed, but serves no requests until unsealed
- The vault-unseal-all helper above takes the sting out of full-cluster restarts
Auto-Unseal (Future)
For production, consider auto-unseal with cloud KMS:
- AWS KMS (essentially free under 20K req/month)
- Azure Key Vault (similar free tier)
- GCP Cloud KMS
Migration: vault operator unseal -migrate
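As a rough idea of what auto-unseal looks like in config.hcl, here is an AWS KMS example with a placeholder region and key alias (not part of this deployment):
seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "alias/vault-unseal"
}
After adding the stanza, restart each node and run vault operator unseal -migrate with the existing Shamir keys to migrate the seal.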
Backup Strategy
# Take Raft snapshot (on leader)
vault operator raft snapshot save backup-$(date +%Y%m%d).snap
# Restore snapshot (emergency recovery)
vault operator raft snapshot restore -force backup.snap
# Automated daily backups
cat > /etc/cron.daily/vault-backup <<'EOF'
#!/bin/bash
BACKUP_DIR="/backup/vault"
DATE=$(date +%Y%m%d)
export VAULT_ADDR=https://vault.example.com:8200
export VAULT_TOKEN=
vault operator raft snapshot save ${BACKUP_DIR}/vault-${DATE}.snap
find ${BACKUP_DIR} -name "vault-*.snap" -mtime +30 -delete
EOF
chmod +x /etc/cron.daily/vault-backup
Monitoring & Alerting
# Prometheus metrics endpoint (requires a Vault token unless the listener enables unauthenticated_metrics_access)
curl -H "X-Vault-Token: $VAULT_TOKEN" \
  "https://vault.example.com:8200/v1/sys/metrics?format=prometheus"
# Key metrics to monitor:
# - vault_core_unsealed (should be 1)
# - vault_core_active (1 on leader, 0 on followers)
# - vault_raft_peers (should equal number of nodes)
# - vault_raft_leader (should be 1)
# Example Prometheus alert
- alert: VaultSealed
  expr: vault_core_unsealed == 0
  for: 2m
  annotations:
    summary: "Vault node is sealed"

- alert: VaultNoLeader
  expr: sum(vault_raft_leader) == 0
  for: 1m
  annotations:
    summary: "Vault cluster has no leader"
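Note that the Prometheus-format endpoint only returns data once telemetry is configured. A minimal sketch for config.hcl (the 30s retention value is a common choice, not something this setup mandates):
telemetry {
  prometheus_retention_time = "30s"
  disable_hostname          = true
}
If you prefer to let Prometheus scrape without a token, the listener "tcp" stanza also accepts a telemetry block with unauthenticated_metrics_access = true.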
Cloud Provider Recommendations
Oracle Cloud (Free Tier)
Best for homelabs:
- 4 ARM Ampere instances (24GB RAM total)
- 200GB storage
- 10TB egress/month
- Actually free forever
Linode/Akamai
Simple pricing:
- $5/month for 1GB RAM instance
- Excellent documentation
- Fast provisioning
- Good for production workloads
DigitalOcean
Developer friendly:
- $4-6/month for basic droplets
- Great API and tooling
- VPC networking included
- Managed databases available
Security Best Practices
Protect Your Unseal Keys
- Never commit to git – use .gitignore
- Store in password manager – 1Password, Bitwarden, etc.
- Distribute to key holders – no single person should have all 5 keys
- Print and store physically – safety deposit box or safe
Losing unseal keys means permanent data loss. There is no recovery mechanism.
Token Management
- Don't use root token for daily operations
- Create admin tokens: vault token create -policy=admin -period=768h
- Use service-specific policies for applications
- Enable audit logging: vault audit enable file file_path=/vault/logs/audit.log
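The admin policy referenced above has to exist before you can issue tokens against it. A deliberately simple, hypothetical example – tighten the paths for real use:
# admin.hcl - illustrative admin policy, scope it down for production
path "secret/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}
path "sys/*" {
  capabilities = ["read", "list"]
}
Load it with vault policy write admin admin.hcl before creating tokens that reference it.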
Production-Ready Vault HA
You now have a production-grade HashiCorp Vault cluster that can survive node failures, provides automatic failover, and scales across cloud and on-premises infrastructure. This setup demonstrates proper secrets management architecture suitable for everything from homelabs to production deployments.
The combination of Raft storage, HAProxy load balancing, and multi-region deployment gives you true high availability without complex external dependencies. Your applications get a single endpoint that always works, while Vault handles leader election, replication, and failover automatically.
🔒 Highly Available • 🌍 Multi-Region • 🚀 Production-Ready
Next Steps:
- Integrate with Kubernetes via External Secrets Operator
- Migrate to auto-unseal with cloud KMS
- Set up automated backups and monitoring
- Configure audit logging and SIEM integration
- Implement policy-based access control for applications
