
Scaling flashQ with Clustering: High Availability Guide

Deploy flashQ in high availability mode with clustering. Learn leader election, automatic failover, and Kubernetes deployment strategies.

Clustering Architecture

flashQ clustering uses PostgreSQL as the coordination layer for leader election and shared state:

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Node 1     │     │   Node 2     │     │   Node 3     │
│  (Leader)    │     │ (Follower)   │     │ (Follower)   │
│              │     │              │     │              │
│ ✓ Push/Pull  │     │ ✓ Push/Pull  │     │ ✓ Push/Pull  │
│ ✓ Background │     │ ✗ Background │     │ ✗ Background │
└──────┬───────┘     └──────┬───────┘     └──────┬───────┘
       │                    │                    │
       └────────────────────┼────────────────────┘
                            │
                   ┌────────▼────────┐
                   │   PostgreSQL    │
                   │ (Shared State)  │
                   └─────────────────┘

What the Leader Does

The leader serves push/pull traffic like every other node, and is additionally the only node that runs background tasks.

What All Nodes Do

Every node, leader or follower, accepts client connections and serves push and pull operations, so request capacity scales with the number of nodes.

Enabling Clustering

Environment Variables

Variable       Required   Description
CLUSTER_MODE   Yes        Set to 1 to enable clustering
NODE_ID        Yes        Unique identifier for this node
DATABASE_URL   Yes        PostgreSQL connection string
NODE_HOST      No         Host address for node registration
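For local testing, the same environment variables can be wired up with Docker Compose. A minimal sketch of one node plus the database (service names and the postgres image tag are assumptions; the flashQ image name is taken from the Kubernetes example later in this guide):

```
# docker-compose.yml (illustrative)
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: flashq

  node-1:
    image: flashq/flashq-server:latest
    environment:
      CLUSTER_MODE: "1"
      NODE_ID: node-1
      DATABASE_URL: postgres://user:pass@db:5432/flashq
      HTTP: "1"
      HTTP_PORT: "6790"
      PORT: "6789"
    depends_on:
      - db
```

node-2 and node-3 follow the same pattern with their own NODE_ID and ports.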

Starting a Cluster

# Node 1 - Will become leader (first to acquire lock)
CLUSTER_MODE=1 \
NODE_ID=node-1 \
DATABASE_URL=postgres://user:pass@db:5432/flashq \
HTTP=1 HTTP_PORT=6790 PORT=6789 \
./flashq-server

# Node 2 - Follower
CLUSTER_MODE=1 \
NODE_ID=node-2 \
DATABASE_URL=postgres://user:pass@db:5432/flashq \
HTTP=1 HTTP_PORT=6792 PORT=6793 \
./flashq-server

# Node 3 - Follower
CLUSTER_MODE=1 \
NODE_ID=node-3 \
DATABASE_URL=postgres://user:pass@db:5432/flashq \
HTTP=1 HTTP_PORT=6794 PORT=6795 \
./flashq-server

Leader Election

flashQ uses PostgreSQL advisory locks for leader election:

-- Leader election query (internal)
SELECT pg_try_advisory_lock(12345);

-- Returns TRUE if lock acquired (becomes leader)
-- Returns FALSE if another node holds the lock
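In code, each node simply retries the advisory-lock query until it wins. A sketch of that loop (this is not flashQ's actual source; the query function is injected so the logic can be shown without a live database, and in production it would wrap a dedicated `pg` client connection, since the lock is tied to that session and is released when the connection drops):

```typescript
// Sketch of a leader-election loop built on pg_try_advisory_lock.
type QueryFn = (sql: string) => Promise<{ rows: Array<{ locked: boolean }> }>;

const LOCK_KEY = 12345; // every node must use the same lock key

async function tryBecomeLeader(query: QueryFn): Promise<boolean> {
  const res = await query(`SELECT pg_try_advisory_lock(${LOCK_KEY}) AS locked`);
  return res.rows[0].locked; // true => this node is now the leader
}

async function electionLoop(
  query: QueryFn,
  onLeader: () => void,
  intervalMs = 1000,
): Promise<void> {
  // Followers retry every second, which matches the ~1s failover
  // window described below: when the leader's connection dies,
  // PostgreSQL releases the lock and the next retry wins it.
  for (;;) {
    if (await tryBecomeLeader(query)) {
      onLeader(); // start background tasks; the lock is held until disconnect
      return;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```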

Automatic Failover

Timeline:
┌─────────────────────────────────────────────────────────────┐
│ T+0s    Node 1 is leader, holds advisory lock               │
│ T+5s    Node 1 crashes                                      │
│ T+5s    PostgreSQL releases advisory lock automatically     │
│ T+6s    Node 2 acquires lock, becomes new leader            │
│ T+6s    Node 2 starts running background tasks              │
└─────────────────────────────────────────────────────────────┘

# Failover happens within ~1 second of leader failure

Checking Cluster Status

# Check node health and leader status
curl http://localhost:6790/health
{
  "status": "healthy",
  "node_id": "node-1",
  "is_leader": true,
  "uptime_seconds": 3600
}

# List all nodes in cluster
curl http://localhost:6790/cluster/nodes
{
  "nodes": [
    { "id": "node-1", "is_leader": true },
    { "id": "node-2", "is_leader": false },
    { "id": "node-3", "is_leader": false }
  ],
  "leader": "node-1"
}

Load Balancing

Distribute traffic across all nodes using a load balancer:

nginx Configuration

# nginx.conf
# Note: nginx proxies raw TCP in the stream context, not the http context,
# so the TCP upstream and its server block live inside stream { }.

stream {
    upstream flashq_tcp {
        least_conn;
        server node1:6789;
        server node2:6793;
        server node3:6795;
    }

    server {
        listen 6789;
        proxy_pass flashq_tcp;
    }
}

http {
    upstream flashq_http {
        least_conn;
        server node1:6790;
        server node2:6792;
        server node3:6794;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://flashq_http;
        }
    }
}

Kubernetes Deployment

StatefulSet for flashQ Nodes

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: flashq
spec:
  serviceName: flashq
  replicas: 3
  selector:
    matchLabels:
      app: flashq
  template:
    metadata:
      labels:
        app: flashq
    spec:
      containers:
        - name: flashq
          image: flashq/flashq-server:latest
          env:
            - name: CLUSTER_MODE
              value: "1"
            - name: NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: flashq-secrets
                  key: database-url
            - name: HTTP
              value: "1"
          ports:
            - containerPort: 6789
              name: tcp
            - containerPort: 6790
              name: http
          livenessProbe:
            httpGet:
              path: /health
              port: 6790
            initialDelaySeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 6790
            initialDelaySeconds: 5

Service for Load Balancing

apiVersion: v1
kind: Service
metadata:
  name: flashq
spec:
  selector:
    app: flashq
  ports:
    - name: tcp
      port: 6789
      targetPort: 6789
    - name: http
      port: 6790
      targetPort: 6790
  type: ClusterIP

Client Configuration

Connect clients to the load balancer:

import { FlashQ } from 'flashq';

// Connect to load balancer
const client = new FlashQ({
  host: 'flashq-lb.internal',  // Load balancer address
  port: 6789,
  token: process.env.FLASHQ_TOKEN
});

await client.connect();

// All operations work transparently
await client.push('my-queue', { data: 'test' });

Best Practice: Always use an odd number of nodes (3, 5, 7) to prevent split-brain scenarios during network partitions.
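During a failover, the load balancer may briefly route a request to a node that is restarting, so individual operations can fail transiently even though the cluster recovers within about a second. A generic application-side retry wrapper handles this; this is a sketch, not part of the flashQ client API, and the attempt count and delays are arbitrary:

```typescript
// Retry an async operation with exponential backoff.
async function withRetry<T>(
  op: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await op();
    } catch (err) {
      lastErr = err;
      // Exponential backoff: 200ms, 400ms, 800ms, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastErr;
}

// Usage with the client from above:
// await withRetry(() => client.push('my-queue', { data: 'test' }));
```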

Ready for High Availability?

Deploy flashQ with clustering for production workloads.
