
Debugging flashQ: Logs, Metrics, and Troubleshooting Guide

Something's wrong with your queue? This guide covers the most common problems and how to diagnose them using logs, metrics, and built-in tools.

Diagnostic Tools Overview

1. HTTP API Endpoints

The HTTP API provides essential debugging endpoints:

# Health check - basic server status
curl http://localhost:6790/health

# Detailed statistics
curl http://localhost:6790/stats

# Prometheus metrics
curl http://localhost:6790/metrics/prometheus

# List all queues
curl http://localhost:6790/queues

# Get specific queue info
curl http://localhost:6790/queue/my-queue/stats
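To check all of these endpoints at once, you could script the curl calls. The sketch below requests each path and records the HTTP status. It assumes only the paths and port 6790 shown above. The fetch function is passed in (in Node 18+ you can pass the global `fetch`) so the logic can run without a live server. `smokeTest`, `SimpleFetch` and `ENDPOINTS` are hypothetical names.

```typescript
// Minimal response shape we rely on; the global fetch Response satisfies it.
type SimpleFetch = (url: string) => Promise<{ ok: boolean; status: number }>;

const ENDPOINTS = ['/health', '/stats', '/metrics/prometheus', '/queues'];

interface EndpointResult {
  path: string;
  ok: boolean;
  status: number | null; // null when the request itself failed (e.g. connection refused)
  error?: string;
}

// Request each endpoint in turn and record whether it answered with a 2xx status.
async function smokeTest(
  baseUrl: string,
  fetchFn: SimpleFetch,
  paths: string[] = ENDPOINTS,
): Promise<EndpointResult[]> {
  const results: EndpointResult[] = [];
  for (const path of paths) {
    try {
      const res = await fetchFn(baseUrl + path);
      results.push({ path, ok: res.ok, status: res.status });
    } catch (err) {
      results.push({ path, ok: false, status: null, error: String(err) });
    }
  }
  return results;
}
```

Usage: `console.table(await smokeTest('http://localhost:6790', (url) => fetch(url)));`. Pass `['/queue/my-queue/stats']` as the third argument to check a specific queue.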

2. SDK Debugging Methods

import { FlashQ } from 'flashq';

const client = new FlashQ({ host: 'localhost', port: 6789 });
await client.connect();

// Get server-wide statistics
const stats = await client.stats();
console.log('Server Stats:', JSON.stringify(stats, null, 2));

// Get metrics with history
const metrics = await client.metrics();
console.log('Metrics:', metrics);

// Get job counts by state
const counts = await client.getJobCounts('my-queue');
console.log('Job Counts:', counts);
// { waiting: 150, active: 10, completed: 5000, failed: 23, delayed: 50 }

// List jobs in a specific state
const failedJobs = await client.getJobs('my-queue', 'failed', 100);
failedJobs.forEach(job => {
  console.log(`Job ${job.id}: ${job.error}`);
});

// Get a specific job's details (jobId: the ID of the job you want to inspect)
const job = await client.getJob(jobId);
console.log('Job State:', job.state);
console.log('Job Data:', job.data);
console.log('Job Attempts:', job.attempts);
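Raw counts are easier to act on once you turn them into a couple of numbers. The sketch below assumes `getJobCounts` returns the shape shown in the comment above (`waiting`, `active`, `completed`, `failed`, `delayed`). `summarizeCounts` is a hypothetical helper.

```typescript
interface JobCounts {
  waiting: number;
  active: number;
  completed: number;
  failed: number;
  delayed: number;
}

interface CountsSummary {
  backlog: number;     // jobs not yet finished: waiting + active + delayed
  failureRate: number; // failed / (completed + failed); 0 when nothing has finished
}

function summarizeCounts(c: JobCounts): CountsSummary {
  const finished = c.completed + c.failed;
  return {
    backlog: c.waiting + c.active + c.delayed,
    failureRate: finished === 0 ? 0 : c.failed / finished,
  };
}
```

With the example counts above, the backlog is 210 jobs and the failure rate is 23 / 5023, about 0.46%.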

3. Useful Debugging Script

// debug-queue.ts - Save this for quick diagnostics
import { FlashQ } from 'flashq';

async function diagnose(queueName: string) {
  const client = new FlashQ({ host: 'localhost', port: 6789 });
  await client.connect();

  try {
    console.log(`\n=== Queue Diagnostics: ${queueName} ===\n`);

    // 1. Job counts
    const counts = await client.getJobCounts(queueName);
    console.log('📊 Job Counts:');
    console.log(`   Waiting:   ${counts.waiting}`);
    console.log(`   Active:    ${counts.active}`);
    console.log(`   Delayed:   ${counts.delayed}`);
    console.log(`   Completed: ${counts.completed}`);
    console.log(`   Failed:    ${counts.failed}`);

    // 2. Check for stuck jobs (active for more than 60 seconds)
    const activeJobs = await client.getJobs(queueName, 'active', 10);
    console.log('\n⏳ Active Jobs (first 10):');
    activeJobs.forEach(job => {
      const age = Date.now() - job.processedAt;
      const status = age > 60000 ? '⚠️ POSSIBLY STUCK' : '✅ OK';
      console.log(`   Job ${job.id}: ${Math.round(age / 1000)}s old ${status}`);
    });

    // 3. DLQ check (fetches at most 5 entries, so this is a sample, not a total)
    const dlq = await client.getDlq(queueName, 5);
    console.log(`\n☠️ Dead Letter Queue sample (${dlq.length} of up to 5 fetched):`);
    dlq.forEach(job => console.log(`   Job ${job.id}: ${job.error}`));
  } finally {
    // Always release the connection, even if one of the calls above throws
    await client.close();
  }
}

diagnose(process.argv[2] || 'default').catch(err => {
  console.error('Diagnostics failed:', err);
  process.exit(1);
});

Problem: Jobs Stuck in Processing

Symptoms:

Diagnosis Steps

// 1. Check how long jobs have been active
const activeJobs = await client.getJobs('my-queue', 'active', 100);

activeJobs.forEach(job => {
  const activeTime = Date.now() - job.processedAt;
  if (activeTime > 60000) { // More than 1 minute
    console.log(`⚠️ Job ${job.id} stuck for ${Math.round(activeTime/1000)}s`);
    console.log(`   Worker: ${job.workerId}`);
  }
});
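You can also factor the same check into a pure function. This makes the threshold explicit and makes the check easy to unit test. The sketch assumes active jobs expose `id` and a millisecond `processedAt` timestamp, as the snippet above does. Jobs without `processedAt` are skipped rather than reported as infinitely old. `findStuckJobs` is a hypothetical name.

```typescript
interface ActiveJob {
  id: string | number;
  processedAt?: number; // ms since epoch when a worker picked the job up
}

interface StuckJob {
  id: string | number;
  ageMs: number;
}

// Return jobs that have been active longer than thresholdMs, oldest first.
function findStuckJobs(jobs: ActiveJob[], now: number, thresholdMs = 60000): StuckJob[] {
  const stuck: StuckJob[] = [];
  for (const job of jobs) {
    if (typeof job.processedAt !== 'number') continue; // no start time, cannot judge
    const ageMs = now - job.processedAt;
    if (ageMs > thresholdMs) stuck.push({ id: job.id, ageMs });
  }
  return stuck.sort((a, b) => b.ageMs - a.ageMs);
}
```

Usage: `findStuckJobs(activeJobs, Date.now(), 60000).forEach(j => console.log(j.id, j.ageMs));`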

Common Causes & Solutions

1. Worker Crashed Without ACK

// Problem: Worker died before acknowledging
// Solution: flashQ auto-recovers after stall_timeout

// Set appropriate stall timeout when pushing jobs
await client.push('my-queue', data, {
  stall_timeout: 30000,  // 30 seconds
  timeout: 60000         // 60 second processing timeout
});

// Stalled jobs are returned to the queue automatically once stall_timeout elapses
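Conceptually, stall detection tracks when each active job last showed signs of life and reclaims any job that has been silent for longer than the stall timeout. The class below is only an illustrative model of that idea, not flashQ's implementation. This guide does not cover how flashQ actually measures liveness. `StallTracker` and its methods are hypothetical.

```typescript
// Illustrative model of stall detection; not flashQ's internal code.
class StallTracker {
  private lastSeen = new Map<string, number>();
  private readonly stallTimeoutMs: number;

  constructor(stallTimeoutMs: number) {
    this.stallTimeoutMs = stallTimeoutMs;
  }

  // Call when a worker picks up a job or reports progress.
  touch(jobId: string, now: number): void {
    this.lastSeen.set(jobId, now);
  }

  // Call when the job is acknowledged or failed; it is no longer tracked.
  finish(jobId: string): void {
    this.lastSeen.delete(jobId);
  }

  // Returns jobs silent for longer than the stall timeout and stops tracking
  // them, so the caller re-queues each stalled job exactly once.
  reclaimStalled(now: number): string[] {
    const stalled: string[] = [];
    this.lastSeen.forEach((seen, jobId) => {
      if (now - seen > this.stallTimeoutMs) stalled.push(jobId);
    });
    stalled.forEach((jobId) => this.lastSeen.delete(jobId));
    return stalled;
  }
}
```

This model shows why the stall timeout should be longer than your slowest legitimate job: a job that is still running but has not been seen within the timeout looks exactly like a crashed one.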

2. Infinite Loop in Processor

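// Problem: Processor never returns
// BAD
const badWorker = new Worker('my-queue', async (job) => {
  while (true) {  // Infinite loop: the job never completes or fails
    await doSomething();
  }
});

// GOOD - Always have exit conditions
const goodWorker = new Worker('my-queue', async (job) => {
  const maxIterations = 1000;
  for (let i = 0; i < maxIterations; i++) {
    const done = await doSomething();
    if (done) return;  // Finished normally
  }
  // Hitting the cap is a failure, not a success: throw so the job is marked as failed
  throw new Error(`Job ${job.id} did not finish within ${maxIterations} iterations`);
});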
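An iteration cap does not help if a single `await` never settles, such as a network call with no timeout of its own. A common complement is to race the work against a timer. The minimal sketch below uses only `Promise.race` and `setTimeout`. `withTimeout` is a hypothetical helper. It stops *waiting* for the operation but cannot cancel it.

```typescript
// Reject if `work` has not settled within `ms` milliseconds.
// The timer is cleared either way so it does not keep the process alive.
function withTimeout<T>(work: Promise<T>, ms: number, label = 'operation'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Inside a processor, for example: `return withTimeout(doSomething(), 30000, \`job ${job.id}\`);`. The thrown error then fails the job instead of leaving it active.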

3. Manual Recovery

// Force-fail stuck jobs to return them to queue
const activeJobs = await client.getJobs('my-queue', 'active', 100);

for (const job of activeJobs) {
  const activeTime = Date.now() - job.processedAt;
  if (activeTime > 300000) { // Stuck for 5+ minutes
    console.log(`Force-failing stuck job ${job.id}`);
    await client.fail(job.id, 'Manual recovery: job was stuck');
  }
}

Problem: DLQ Filling Up

Symptoms:

Diagnosis

// Analyze DLQ errors
const dlqJobs = await client.getDlq('my-queue', 100);

// Group by error type (the text before the first ':')
const errorGroups: Record<string, number> = {};
dlqJobs.forEach(job => {
  const errorType = job.error?.split(':')[0] || 'Unknown';
  errorGroups[errorType] = (errorGroups[errorType] || 0) + 1;
});

console.log('Error Distribution:');
Object.entries(errorGroups)
  .sort((a, b) => b[1] - a[1])
  .forEach(([error, count]) => {
    console.log(`  ${count}x ${error}`);
  });
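Splitting on `:` only groups errors whose messages share an identical prefix. Messages that embed IDs, counts or durations (for example `Timeout after 3012ms`) end up in separate buckets. One option is to normalize the variable parts before grouping. In the sketch below, the regular expressions are illustrative assumptions about what your error messages contain. `errorSignature` and `groupErrors` are hypothetical helpers.

```typescript
// Collapse the variable parts of an error message so similar errors group together.
function errorSignature(message: string | undefined): string {
  if (!message) return 'Unknown';
  return message
    // UUIDs first, since they also contain digits
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, '<uuid>')
    .replace(/\d+/g, '<n>')
    .trim();
}

// Count messages per signature, most frequent first.
function groupErrors(messages: Array<string | undefined>): Array<[string, number]> {
  const groups: Record<string, number> = {};
  for (const m of messages) {
    const sig = errorSignature(m);
    groups[sig] = (groups[sig] || 0) + 1;
  }
  return Object.entries(groups).sort((a, b) => b[1] - a[1]);
}
```

Usage: `groupErrors(dlqJobs.map(job => job.error)).forEach(([sig, n]) => console.log(\`  ${n}x ${sig}\`));`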

Common Causes & Solutions

1. External API Failures

// Problem: Third-party API returns errors
// Solution: Add circuit breaker and better retry logic

import CircuitBreaker from 'opossum';

const breaker = new CircuitBreaker(callExternalAPI, {
  timeout: 10000,
  errorThresholdPercentage: 50,
  resetTimeout: 30000
});

const worker = new Worker('api-calls', async (job) => {
  try {
    return await breaker.fire(job.data);
  } catch (error) {
    if (breaker.opened) {
      throw new Error('Circuit breaker open, will retry');
    }
    throw error;
  }
});
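To see what the breaker does, here is a deliberately simplified state machine with the same three states: closed, open and half-open. It trips after a number of *consecutive* failures. Opossum's `errorThresholdPercentage` instead uses a failure percentage over a rolling window. Treat this as an explanation, not a drop-in replacement. `SimpleBreaker` and its constructor parameters are hypothetical, and the clock is injected so the sketch is testable.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

// Simplified breaker: opens after `failureThreshold` consecutive failures and
// rejects calls while open. After `resetTimeoutMs` it moves to half-open, where
// a success closes it again and a failure re-opens it.
class SimpleBreaker {
  state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  async fire<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('Circuit breaker open');
      }
      this.state = 'half-open'; // let calls through to probe the dependency
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.state = 'closed';
      return result;
    } catch (err) {
      this.failures++;
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Throwing while the breaker is open lets the queue's own retry handling space out the next attempt. Calling the broken API again immediately would likely fail too.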

2. Invalid Job Data

// Problem: Jobs have bad data that will never succeed
// Solution: Validate before processing, discard invalid

const worker = new Worker('process', async (job) => {
  // Validate first
  if (!job.data.userId || !job.data.action) {
    // Don't retry invalid jobs - discard them
    await client.discard(job.id);
    console.log(`Discarded invalid job ${job.id}`);
    return;
  }

  await processJob(job.data);
});
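Once a payload has more than a couple of fields, a validator that reports *why* it is invalid makes the discard log far more useful. Below is a sketch using a hand-written type guard. The `userId`/`action` fields come from the example above. The allowed action names are made-up placeholders. A schema library such as Zod would serve the same purpose.

```typescript
interface ProcessPayload {
  userId: string;
  action: 'create' | 'update' | 'delete'; // placeholder action names
}

const ALLOWED_ACTIONS = ['create', 'update', 'delete'];

// Returns a list of problems; an empty list means the payload is valid.
function validatePayload(data: unknown): string[] {
  if (typeof data !== 'object' || data === null) return ['payload is not an object'];
  const d = data as Record<string, unknown>;
  const userId = d.userId;
  const action = d.action;
  const problems: string[] = [];
  if (typeof userId !== 'string' || userId.length === 0) {
    problems.push('userId must be a non-empty string');
  }
  if (typeof action !== 'string' || !ALLOWED_ACTIONS.includes(action)) {
    problems.push(`action must be one of ${ALLOWED_ACTIONS.join(', ')}`);
  }
  return problems;
}

function isProcessPayload(data: unknown): data is ProcessPayload {
  return validatePayload(data).length === 0;
}
```

In the worker, log `validatePayload(job.data).join('; ')` next to the discard message so the DLQ analysis above has something concrete to group on.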

3. Retry DLQ Jobs After Fix

// After fixing the root cause, retry DLQ jobs
const retried = await client.retryDlq('my-queue');
console.log(`Retried ${retried} jobs from DLQ`);

// Or retry specific jobs
const dlqJobs = await client.getDlq('my-queue', 100);
for (const job of dlqJobs) {
  if (job.error?.includes('API rate limit')) {
    await client.retryDlq('my-queue', job.id);
  }
}
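Retrying a large DLQ in one call can hit the dependency that just recovered with a burst of traffic. A gentler option is to retry in small batches with a pause between them. In the sketch below, the retry call is injected as a function, for example `(id) => client.retryDlq('my-queue', id)` to mirror the call above. `retryInBatches` and its parameters are hypothetical.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry ids in batches of `batchSize`, running each batch in parallel and
// pausing `pauseMs` between batches. Reports successes and the ids that failed.
async function retryInBatches<Id>(
  ids: Id[],
  retryOne: (id: Id) => Promise<unknown>,
  batchSize = 10,
  pauseMs = 1000,
): Promise<{ succeeded: number; failed: Id[] }> {
  let succeeded = 0;
  const failed: Id[] = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    const batch = ids.slice(i, i + batchSize);
    const results = await Promise.allSettled(batch.map((id) => retryOne(id)));
    results.forEach((r, j) => {
      if (r.status === 'fulfilled') succeeded++;
      else failed.push(batch[j]);
    });
    if (i + batchSize < ids.length) await sleep(pauseMs);
  }
  return { succeeded, failed };
}
```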

Problem: Slow Processing

Symptoms:

Diagnosis

// Check queue depth and processing rate
const stats = await client.stats();
console.log('Queue depths:', stats.queues);

// Monitor throughput over time
const metrics = await client.metrics();
console.log('Throughput history:', metrics.throughput);
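A single snapshot of queue depth does not tell you whether you are keeping up. Sampling twice and comparing gives you a processing rate and a rough time-to-drain estimate. The sketch below assumes you read `waiting` and `completed` counts (for example from `getJobCounts`) at two points in time. It also assumes `completed` only grows between the samples, which cleanup runs could violate. `rateReport` and its types are hypothetical.

```typescript
interface Sample {
  at: number;        // ms timestamp when the counts were read
  waiting: number;
  completed: number;
}

interface RateReport {
  completedPerSec: number;
  netDrainPerSec: number; // how fast the waiting backlog shrinks (negative = growing)
  etaSec: number;         // Infinity when the backlog is not shrinking
}

function rateReport(a: Sample, b: Sample): RateReport {
  const dtSec = (b.at - a.at) / 1000;
  if (dtSec <= 0) throw new Error('samples must be taken at increasing times');
  const completedPerSec = (b.completed - a.completed) / dtSec;
  // Using the change in `waiting` accounts for new jobs arriving meanwhile.
  const netDrainPerSec = (a.waiting - b.waiting) / dtSec;
  const etaSec = netDrainPerSec > 0 ? b.waiting / netDrainPerSec : Infinity;
  return { completedPerSec, netDrainPerSec, etaSec };
}
```

If `completedPerSec` is healthy but `netDrainPerSec` is near zero or negative, producers are outpacing your workers. Adding concurrency (below) helps more than tuning individual jobs.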

Common Causes & Solutions

1. Low Worker Concurrency

// Problem: Only processing one job at a time
// Solution: Increase concurrency

// Before
const worker = new Worker('my-queue', processor);  // Default: 1

// After - for I/O bound tasks
const worker = new Worker('my-queue', processor, {
  concurrency: 20  // Process 20 jobs concurrently
});
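Little's law gives a rough starting point for the concurrency value: jobs in flight ≈ throughput × average time per job. The sketch below assumes jobs spend most of their time waiting on I/O, as in the example above. It ignores rate limits and downstream capacity. `requiredConcurrency` is a hypothetical helper.

```typescript
// Little's law: average jobs in flight = throughput (jobs/s) × average latency (s).
// Rounds up, with an optional headroom multiplier for latency spikes.
function requiredConcurrency(targetJobsPerSec: number, avgLatencyMs: number, headroom = 1): number {
  if (targetJobsPerSec <= 0 || avgLatencyMs <= 0) return 1;
  return Math.max(1, Math.ceil((targetJobsPerSec * avgLatencyMs * headroom) / 1000));
}
```

For example, sustaining 100 jobs/s when each job spends about 200 ms waiting on an API works out to `requiredConcurrency(100, 200) === 20`, consistent with the `concurrency: 20` setting above.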

2. Rate Limiting Too Strict

// Increase rate limit
await client.setRateLimit('my-queue', {
  max: 100,      // 100 jobs
  window: 1000   // per second
});

// Or clear rate limit entirely for testing
await client.clearRateLimit('my-queue');
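To reason about what `{ max: 100, window: 1000 }` allows, it helps to model the limit. The sketch below is a fixed-window counter, one common way to enforce such a limit. This guide does not say which algorithm flashQ uses, so its behaviour at window edges may differ. `FixedWindowLimiter` is hypothetical.

```typescript
// Fixed-window limiter: at most `max` acquisitions per `windowMs` window.
class FixedWindowLimiter {
  private windowStart = 0;
  private count = 0;
  private readonly max: number;
  private readonly windowMs: number;

  constructor(max: number, windowMs: number) {
    this.max = max;
    this.windowMs = windowMs;
  }

  // Returns true if a job may start at time `now` (ms).
  tryAcquire(now: number): boolean {
    if (now - this.windowStart >= this.windowMs) {
      // Start a new window aligned to a multiple of windowMs.
      this.windowStart = now - (now % this.windowMs);
      this.count = 0;
    }
    if (this.count < this.max) {
      this.count++;
      return true;
    }
    return false;
  }
}
```

One property of fixed windows: a burst at the end of one window followed by a burst at the start of the next can admit up to 2 × `max` jobs within a single window length.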

3. Enable Binary Protocol

// 40% faster serialization with MessagePack
const client = new FlashQ({
  host: 'localhost',
  port: 6789,
  useBinary: true
});

Problem: Connection Issues

Symptoms:

Diagnosis

# Check if server is running
curl http://localhost:6790/health

# Check port is open
nc -zv localhost 6789

# Check server logs
docker logs flashq-server 2>&1 | tail -100

Common Causes & Solutions

1. Connection Pool Exhausted

// Problem: Too many connections
// Solution: Reuse client instances

// BAD - Creates new connection per request
app.post('/job', async (req, res) => {
  const client = new FlashQ({ host: 'localhost' });
  await client.connect();
  await client.push('queue', req.body);
  await client.close();  // Connection churn!
  res.sendStatus(202);
});

// GOOD - Singleton client
const client = new FlashQ({ host: 'localhost' });
await client.connect();

app.post('/job', async (req, res) => {
  await client.push('queue', req.body);  // Reuses connection
  res.sendStatus(202);
});
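A top-level `await client.connect()` only works in an ES module. If you would rather connect lazily, or several modules share the client, memoize the connection *promise* so concurrent callers share one connection. Below is a generic sketch. The factory would be something like `async () => { const c = new FlashQ({ host: 'localhost' }); await c.connect(); return c; }`. `lazySingleton` is a hypothetical helper. If creation fails, the cached promise is cleared so the next call can retry.

```typescript
// Returns a getter that creates the resource once and shares it between callers.
// Concurrent first calls await the same promise instead of each connecting.
function lazySingleton<T>(create: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (!pending) {
      pending = create().catch((err) => {
        pending = null; // allow a retry on the next call
        throw err;
      });
    }
    return pending;
  };
}
```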

2. Network Timeouts

// Increase timeout for slow networks
const client = new FlashQ({
  host: 'remote-server.com',
  port: 6789,
  timeout: 30000  // 30 second timeout
});
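A longer timeout helps on slow links. For brief outages such as server restarts, a common complement is to retry the initial connection with exponential backoff. In the sketch below, the connect call is injected, for example `() => client.connect()`. `connectWithRetry`, its defaults and the capped backoff schedule are illustrative assumptions, not flashQ settings.

```typescript
const wait = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Exponential backoff: baseMs, 2×baseMs, 4×baseMs, ... capped at maxMs.
function backoffDelay(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// `retries` is the number of extra attempts after the first one.
// Resolves with the number of attempts it took; rethrows the last error when out of retries.
async function connectWithRetry(
  connect: () => Promise<void>,
  retries = 5,
  baseMs = 500,
  maxMs = 10000,
): Promise<number> {
  for (let attempt = 0; ; attempt++) {
    try {
      await connect();
      return attempt + 1;
    } catch (err) {
      if (attempt >= retries) throw err;
      const delay = backoffDelay(attempt, baseMs, maxMs);
      console.warn(`connect failed (attempt ${attempt + 1}), retrying in ${delay}ms`);
      await wait(delay);
    }
  }
}
```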

Problem: Memory Issues

Symptoms:

Diagnosis

# Check server memory
docker stats flashq-server

# Check completed jobs count
curl http://localhost:6790/stats | jq '.completed_jobs'

Common Causes & Solutions

1. Too Many Completed Jobs Stored

// Problem: Keeping too many completed job results
// Solution: Set retention limits

// Per-job retention
await client.push('my-queue', data, {
  keepCompletedAge: 3600000,  // Keep for 1 hour only
  keepCompletedCount: 100     // Or keep last 100
});

// Clean old jobs periodically
await client.clean('my-queue', 3600000, 'completed', 1000);
// Removes completed jobs older than 1 hour, max 1000

Quick Reference: Debug Commands

Problem                             Command
----------------------------------  ---------------------------------------
Check server health                 curl localhost:6790/health
View all queue stats                curl localhost:6790/stats
Check a specific job                client.getJob(jobId)
View the DLQ                        client.getDlq('queue', 100)
Count jobs by state                 client.getJobCounts('queue')
List active (possibly stuck) jobs   client.getJobs('queue', 'active', 100)
Check if a queue is paused          client.isPaused('queue')
Retry jobs from the DLQ             client.retryDlq('queue')

Still stuck? Open an issue on GitHub with your diagnostic output and we'll help you troubleshoot.
