Something's wrong with your queue? This guide covers the most common problems and how to diagnose them using logs, metrics, and built-in tools.
Diagnostic Tools Overview
1. HTTP API Endpoints
The HTTP API provides essential debugging endpoints:
# Health check - basic server status
curl http://localhost:6790/health
# Detailed statistics
curl http://localhost:6790/stats
# Prometheus metrics
curl http://localhost:6790/metrics/prometheus
# List all queues
curl http://localhost:6790/queues
# Get specific queue info
curl http://localhost:6790/queue/my-queue/stats
2. SDK Debugging Methods
import { FlashQ } from 'flashq';
const client = new FlashQ({ host: 'localhost', port: 6789 });
await client.connect();
// Get server-wide statistics
const stats = await client.stats();
console.log('Server Stats:', JSON.stringify(stats, null, 2));
// Get metrics with history
const metrics = await client.metrics();
console.log('Metrics:', metrics);
// Get job counts by state
const counts = await client.getJobCounts('my-queue');
console.log('Job Counts:', counts);
// { waiting: 150, active: 10, completed: 5000, failed: 23, delayed: 50 }
// List jobs in a specific state
const failedJobs = await client.getJobs('my-queue', 'failed', 100);
failedJobs.forEach(job => {
  console.log(`Job ${job.id}: ${job.error}`);
});
// Get a specific job's details
// (jobId is the ID of a job you want to inspect, e.g. one printed by the listing above)
const job = await client.getJob(jobId);
console.log('Job State:', job.state);
console.log('Job Data:', job.data);
console.log('Job Attempts:', job.attempts);
3. Useful Debugging Script
// debug-queue.ts - Save this for quick diagnostics
import { FlashQ } from 'flashq';
async function diagnose(queueName: string): Promise<void> {
  const client = new FlashQ({ host: 'localhost', port: 6789 });
  await client.connect();
  try {
    console.log('\n=== Queue Diagnostics ===\n');
    // 1. Job counts
    const counts = await client.getJobCounts(queueName);
    console.log('📊 Job Counts:');
    console.log(`  Waiting: ${counts.waiting}`);
    console.log(`  Active: ${counts.active}`);
    console.log(`  Completed: ${counts.completed}`);
    console.log(`  Failed: ${counts.failed}`);
    // 2. Check for stuck jobs (active for more than 60 seconds)
    const activeJobs = await client.getJobs(queueName, 'active', 10);
    console.log('\n⏳ Active Jobs:');
    activeJobs.forEach(job => {
      const age = Date.now() - job.processedAt;
      const status = age > 60000 ? '⚠️ POSSIBLY STUCK' : '✅ OK';
      console.log(`  Job ${job.id}: ${Math.round(age / 1000)}s old ${status}`);
    });
    // 3. DLQ check (only the first 5 entries are fetched, so this is not a total)
    const dlq = await client.getDlq(queueName, 5);
    console.log(`\n☠️ Dead Letter Queue: ${dlq.length} jobs fetched (capped at 5)`);
  } finally {
    // Close the connection even if one of the calls above throws
    await client.close();
  }
}
diagnose(process.argv[2] || 'default').catch(err => {
  console.error('Diagnostics failed:', err);
  process.exitCode = 1;
});
Problem: Jobs Stuck in Processing
Symptoms:
- Jobs show as "active" but never complete
- getJobCounts shows a high active count that doesn't decrease
- Workers appear idle but jobs aren't being processed
Diagnosis Steps
// 1. Check how long jobs have been active
const activeJobs = await client.getJobs('my-queue', 'active', 100);
activeJobs.forEach(job => {
  const activeTime = Date.now() - job.processedAt;
  if (activeTime > 60000) { // More than 1 minute
    console.log(`⚠️ Job ${job.id} stuck for ${Math.round(activeTime/1000)}s`);
    console.log(` Worker: ${job.workerId}`);
  }
});
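The age check above can be pulled into a small pure helper, which makes the threshold easy to test without a live server. This is a minimal sketch, not part of the flashQ SDK: `ActiveJob`, `StuckJob`, and `findStuckJobs` are hypothetical names, and it assumes `processedAt` is an epoch timestamp in milliseconds, as the snippet above implies.

```typescript
// Hypothetical shape: only the fields the diagnosis snippet reads.
interface ActiveJob {
  id: string | number;
  processedAt?: number; // assumed: epoch ms when a worker picked the job up
}

interface StuckJob {
  id: string | number;
  activeMs: number;
}

// Returns jobs that have been active longer than thresholdMs, oldest first.
// Jobs without a numeric processedAt are skipped instead of producing NaN ages.
function findStuckJobs(
  jobs: ActiveJob[],
  thresholdMs: number,
  now: number = Date.now(),
): StuckJob[] {
  const stuck: StuckJob[] = [];
  for (const job of jobs) {
    if (typeof job.processedAt !== 'number') continue;
    const activeMs = now - job.processedAt;
    if (activeMs > thresholdMs) {
      stuck.push({ id: job.id, activeMs });
    }
  }
  return stuck.sort((a, b) => b.activeMs - a.activeMs);
}
```

Passing `now` explicitly keeps the function deterministic; in a real script you would feed it the result of `client.getJobs('my-queue', 'active', 100)`.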
Common Causes & Solutions
1. Worker Crashed Without ACK
// Problem: Worker died before acknowledging
// Solution: flashQ auto-recovers after stall_timeout
// Set appropriate stall timeout when pushing jobs
await client.push('my-queue', data, {
  stall_timeout: 30000, // 30 seconds
  timeout: 60000 // 60 second processing timeout
});
// Jobs will automatically return to queue after timeout
2. Infinite Loop in Processor
// Problem: Processor never returns
// BAD
const worker = new Worker('my-queue', async (job) => {
  while (true) { // Infinite loop!
    await doSomething();
  }
});
// GOOD - Always have exit conditions
const worker = new Worker('my-queue', async (job) => {
  const maxIterations = 1000;
  for (let i = 0; i < maxIterations; i++) {
    const done = await doSomething();
    if (done) break;
  }
});
3. Manual Recovery
// Force-fail stuck jobs to return them to queue
const activeJobs = await client.getJobs('my-queue', 'active', 100);
for (const job of activeJobs) {
  const activeTime = Date.now() - job.processedAt;
  if (activeTime > 300000) { // Stuck for 5+ minutes
    console.log(`Force-failing stuck job ${job.id}`);
    await client.fail(job.id, 'Manual recovery: job was stuck');
  }
}
Problem: DLQ Filling Up
Symptoms:
- Many jobs in Dead Letter Queue
- Same errors repeating
- Jobs failing after max retries
Diagnosis
// Analyze DLQ errors
const dlqJobs = await client.getDlq('my-queue', 100);
// Group by error type
const errorGroups: Record<string, number> = {};
dlqJobs.forEach(job => {
  // Use the text before the first ':' as a rough error category
  const errorType = job.error?.split(':')[0] || 'Unknown';
  errorGroups[errorType] = (errorGroups[errorType] || 0) + 1;
});
console.log('Error Distribution:');
Object.entries(errorGroups)
  .sort((a, b) => b[1] - a[1]) // Most frequent first
  .forEach(([error, count]) => {
    console.log(` ${count}x ${error}`);
  });
Common Causes & Solutions
1. External API Failures
// Problem: Third-party API returns errors
// Solution: Add circuit breaker and better retry logic
import CircuitBreaker from 'opossum';
const breaker = new CircuitBreaker(callExternalAPI, {
timeout: 10000,
errorThresholdPercentage: 50,
resetTimeout: 30000
});
const worker = new Worker('api-calls', async (job) => {
  try {
    return await breaker.fire(job.data);
  } catch (error) {
    if (breaker.opened) {
      throw new Error('Circuit breaker open, will retry');
    }
    throw error;
  }
});
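To show what the breaker is doing for the worker, here is a stripped-down circuit breaker written from scratch. It is an illustrative sketch, not opossum's implementation: opossum trips on an error percentage, while this version trips after a fixed number of consecutive failures. `SimpleBreaker` and its parameters are hypothetical names.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

// Illustrative only: trips after maxFailures consecutive failures.
class SimpleBreaker<A, R> {
  private state: BreakerState = 'closed';
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly action: (arg: A) => Promise<R>;
  private readonly maxFailures: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(
    action: (arg: A) => Promise<R>,
    maxFailures: number,
    resetTimeoutMs: number,
    now: () => number = () => Date.now(), // injectable clock for tests
  ) {
    this.action = action;
    this.maxFailures = maxFailures;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  get opened(): boolean {
    return this.state === 'open';
  }

  async fire(arg: A): Promise<R> {
    if (this.state === 'open') {
      // Fail fast until the reset timeout has elapsed.
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('Breaker is open');
      }
      this.state = 'half-open'; // let one trial call through
    }
    try {
      const result = await this.action(arg);
      this.state = 'closed';
      this.consecutiveFailures = 0;
      return result;
    } catch (error) {
      this.consecutiveFailures++;
      // A failed trial call, or too many failures in a row, opens the breaker.
      if (this.state === 'half-open' || this.consecutiveFailures >= this.maxFailures) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw error;
    }
  }
}
```

While the breaker is open, calls fail immediately without touching the external API, which gives the remote service time to recover and keeps workers from burning their retries on a known outage.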
2. Invalid Job Data
// Problem: Jobs have bad data that will never succeed
// Solution: Validate before processing, discard invalid
const worker = new Worker('process', async (job) => {
  // Validate first
  if (!job.data.userId || !job.data.action) {
    // Don't retry invalid jobs - discard them
    await client.discard(job.id);
    console.log(`Discarded invalid job ${job.id}`);
    return;
  }
  await processJob(job.data);
});
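The inline check above can be written as a TypeScript type guard, so the rest of the processor works with a typed payload and a missing or non-object `data` does not throw. A minimal sketch: `ProcessJobData` and `isProcessJobData` are hypothetical names, and treating `userId` and `action` as non-empty strings is an assumption about your payload.

```typescript
// Assumed payload shape for the 'process' queue.
interface ProcessJobData {
  userId: string;
  action: string;
}

// Type guard: true only when data is an object with non-empty string fields.
function isProcessJobData(data: unknown): data is ProcessJobData {
  if (typeof data !== 'object' || data === null) return false;
  const record = data as Record<string, unknown>;
  return (
    typeof record.userId === 'string' && record.userId.length > 0 &&
    typeof record.action === 'string' && record.action.length > 0
  );
}
```

Inside the worker, `if (!isProcessJobData(job.data))` would replace the two-field check before the discard.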
3. Retry DLQ Jobs After Fix
// After fixing the root cause, retry DLQ jobs
const retried = await client.retryDlq('my-queue');
console.log(`Retried ${retried} jobs from DLQ`);
// Or retry specific jobs
const dlqJobs = await client.getDlq('my-queue', 100);
for (const job of dlqJobs) {
  if (job.error?.includes('API rate limit')) {
    await client.retryDlq('my-queue', job.id);
  }
}
Problem: Slow Processing
Symptoms:
- Queue backlog growing
- High latency between push and processing
- Throughput lower than expected
Diagnosis
// Check queue depth and processing rate
const stats = await client.stats();
console.log('Queue depths:', stats.queues);
// Monitor throughput over time
const metrics = await client.metrics();
console.log('Throughput history:', metrics.throughput);
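Two `getJobCounts` readings taken a few seconds apart are enough to tell whether the backlog is growing or draining. The sketch below assumes the `waiting` and `completed` fields shown earlier; `CountSnapshot`, `BacklogTrend`, and `compareSnapshots` are hypothetical helpers, not SDK functions.

```typescript
interface CountSnapshot {
  takenAt: number;   // epoch ms when the counts were read
  waiting: number;
  completed: number;
}

interface BacklogTrend {
  completedPerSec: number;
  backlogGrowthPerSec: number;   // negative means the backlog is shrinking
  secondsToDrain: number | null; // null when the backlog is not shrinking
}

// Compares two snapshots of the same queue, taken in chronological order.
// Note: if completed jobs are pruned between readings, completedPerSec
// will be underestimated.
function compareSnapshots(prev: CountSnapshot, curr: CountSnapshot): BacklogTrend {
  const seconds = (curr.takenAt - prev.takenAt) / 1000;
  if (seconds <= 0) {
    throw new RangeError('snapshots must be in chronological order');
  }
  const completedPerSec = (curr.completed - prev.completed) / seconds;
  const backlogGrowthPerSec = (curr.waiting - prev.waiting) / seconds;
  const secondsToDrain =
    backlogGrowthPerSec < 0 ? curr.waiting / -backlogGrowthPerSec : null;
  return { completedPerSec, backlogGrowthPerSec, secondsToDrain };
}
```

A positive `backlogGrowthPerSec` means jobs arrive faster than workers finish them, which points at the concurrency and rate-limit settings covered next.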
Common Causes & Solutions
1. Low Worker Concurrency
// Problem: Only processing one job at a time
// Solution: Increase concurrency
// Before
const worker = new Worker('my-queue', processor); // Default: 1
// After - for I/O bound tasks
const worker = new Worker('my-queue', processor, {
  concurrency: 20 // Process 20 jobs concurrently
});
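How high to set `concurrency` depends on how fast jobs arrive and how long each one takes. One rough starting point is Little's Law: the average number of jobs in flight equals the arrival rate multiplied by the average time per job. The helper below is a back-of-the-envelope sketch, not part of flashQ; `suggestConcurrency` is a hypothetical name and the default headroom factor is an assumption to tune for your workload.

```typescript
// Little's Law: average jobs in flight = arrival rate * average time per job.
// headroom > 1 leaves spare capacity for bursts.
function suggestConcurrency(
  jobsPerSecond: number,
  avgJobMs: number,
  headroom: number = 1.5,
): number {
  if (jobsPerSecond < 0 || avgJobMs < 0 || headroom < 1) {
    throw new RangeError('rates and durations must be non-negative, headroom must be >= 1');
  }
  const inFlight = (jobsPerSecond * avgJobMs) / 1000;
  // Never suggest less than one concurrent job.
  return Math.max(1, Math.ceil(inFlight * headroom));
}
```

For example, 50 jobs per second at 200 ms each keeps about 10 jobs in flight, so `suggestConcurrency(50, 200)` returns 15. This applies to I/O-bound work; CPU-bound processors are limited by available cores rather than by this estimate.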
2. Rate Limiting Too Strict
// Increase rate limit
await client.setRateLimit('my-queue', {
  max: 100, // 100 jobs
  window: 1000 // per second
});
// Or clear rate limit entirely for testing
await client.clearRateLimit('my-queue');
3. Enable Binary Protocol
// 40% faster serialization with MessagePack
const client = new FlashQ({
  host: 'localhost',
  port: 6789,
  useBinary: true
});
Problem: Connection Issues
Symptoms:
- Connection timeouts
- "Connection refused" errors
- Intermittent disconnects
Diagnosis
# Check if server is running
curl http://localhost:6790/health
# Check port is open
nc -zv localhost 6789
# Check server logs
docker logs flashq-server 2>&1 | tail -100
Common Causes & Solutions
1. Connection Pool Exhausted
// Problem: Too many connections
// Solution: Reuse client instances
// BAD - Creates new connection per request
app.post('/job', async (req, res) => {
  const client = new FlashQ({ host: 'localhost' });
  await client.connect();
  await client.push('queue', req.body);
  await client.close(); // Connection churn!
  res.sendStatus(202); // Respond so the request doesn't hang (Express-style handler assumed)
});
// GOOD - Singleton client
const client = new FlashQ({ host: 'localhost' });
await client.connect();
app.post('/job', async (req, res) => {
  await client.push('queue', req.body); // Reuses connection
  res.sendStatus(202); // Respond so the request doesn't hang (Express-style handler assumed)
});
2. Network Timeouts
// Increase timeout for slow networks
const client = new FlashQ({
  host: 'remote-server.com',
  port: 6789,
  timeout: 30000 // 30 second timeout
});
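For intermittent failures, a longer timeout can be combined with retrying the initial connection using exponential backoff. The helper below is a generic sketch and makes no claim about what the flashQ SDK does internally (it may already reconnect on its own): `withRetry` and its parameters are hypothetical names.

```typescript
// Retries an async operation, doubling the delay after each failed attempt.
// sleep is injectable so the backoff schedule can be tested without waiting.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number = 5,
  baseDelayMs: number = 200,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Delays grow as base, 2x base, 4x base, ...
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  // All attempts failed: surface the most recent error.
  throw lastError;
}
```

Usage would look like `await withRetry(() => client.connect())`. Keep `maxAttempts` small for request-path code so callers are not blocked for long when the server is really down.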
Problem: Memory Issues
Symptoms:
- Server memory growing continuously
- OOM kills
- Slow response times
Diagnosis
# Check server memory
docker stats flashq-server
# Check completed jobs count
curl http://localhost:6790/stats | jq '.completed_jobs'
Common Causes & Solutions
1. Too Many Completed Jobs Stored
// Problem: Keeping too many completed job results
// Solution: Set retention limits
// Per-job retention
await client.push('my-queue', data, {
  keepCompletedAge: 3600000, // Keep for 1 hour only
  keepCompletedCount: 100 // Or keep last 100
});
// Clean old jobs periodically
await client.clean('my-queue', 3600000, 'completed', 1000);
// Removes completed jobs older than 1 hour, max 1000
Quick Reference: Debug Commands
| Problem | Command |
|---|---|
| Check server health | curl localhost:6790/health |
| View all queue stats | curl localhost:6790/stats |
| Check specific job | client.getJob(jobId) |
| View DLQ | client.getDlq('queue', 100) |
| Count jobs by state | client.getJobCounts('queue') |
| List stuck jobs | client.getJobs('queue', 'active', 100) |
| Check if paused | client.isPaused('queue') |
| Retry failed jobs | client.retryDlq('queue') |
Still Stuck?
Open an issue on GitHub with your diagnostic output and we'll help you troubleshoot.