If you've ever built an application that needs to process tasks in the background: sending emails, generating reports, processing images, or calling AI APIs, you've probably encountered job queues. They're the backbone of scalable applications, allowing you to offload work from your main application thread and process it asynchronously.
flashQ is a high-performance job queue built specifically for modern AI workloads. It's designed to be fast, simple, and reliable, without requiring you to manage Redis or any external infrastructure.
## The Problem with Traditional Job Queues
Most job queues in the Node.js ecosystem rely on Redis. Tools like BullMQ, Bull, and Bee-Queue are excellent, but they come with a significant operational overhead:
- Redis management: You need to provision, configure, and maintain a Redis instance
- Memory costs: Redis stores everything in memory, which gets expensive at scale
- Persistence concerns: Redis persistence (RDB/AOF) requires careful tuning
- Network latency: Every job operation requires a network round-trip to Redis
- Payload limitations: Redis has practical limits on value sizes (~512MB, but performance degrades much earlier)
For AI workloads specifically, these limitations become even more painful. AI applications often need to:
- Send large payloads (embeddings, images, long text contexts)
- Chain multiple operations together (embed → search → generate)
- Rate limit API calls to avoid hitting provider quotas
- Handle long-running jobs (minutes, not seconds)
## Enter flashQ
flashQ was built from the ground up to solve these problems. Here's what makes it different:
### 1. No Redis Required
flashQ is a standalone server written in Rust. You run a single binary, and you're done. No Redis to provision, no connection strings to manage, no memory to monitor.
```bash
# Start flashQ server
./flashq-server

# Or with Docker
docker run -p 6789:6789 flashq/flashq
```
For persistence, flashQ can optionally connect to PostgreSQL. But for many use cases, the in-memory mode is perfectly sufficient.
### 2. BullMQ-Compatible API
If you're already using BullMQ, switching to flashQ is trivial. The API is intentionally compatible:
```typescript
// Before (BullMQ)
import { Queue, Worker } from 'bullmq';

// After (flashQ)
import { Queue, Worker } from 'flashq';
```
That's it. Your existing code works with minimal changes.
### 3. 10x Faster
Because flashQ eliminates the network hop to Redis and is written in Rust with careful attention to performance, it's significantly faster:
| Metric | flashQ | BullMQ + Redis |
|---|---|---|
| Push throughput | 1.9M jobs/sec | ~50K jobs/sec |
| Processing throughput | 280K jobs/sec | ~30K jobs/sec |
| Latency (p99) | <1ms | ~5-10ms |
### 4. Built for AI Workloads
flashQ has features specifically designed for AI applications:
- 10MB payload limit: Send embeddings, images, and large contexts without workarounds
- Job dependencies: Chain jobs with `depends_on` for RAG pipelines
- Rate limiting: Built-in token bucket rate limiting per queue
- Long timeouts: Jobs can run for minutes without being marked as stalled
- Progress tracking: Report progress for long-running inference jobs
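The "token bucket" named in the list above is a standard rate-limiting algorithm worth unpacking: a bucket holds up to a fixed number of tokens, refills at a steady rate, and each dispatch spends one token. Here's a minimal TypeScript sketch of the algorithm itself; this is an illustration, not flashQ's actual Rust implementation, and the class and parameter names are my own:

```typescript
// Minimal token bucket: holds up to `capacity` tokens, refills at
// `refillPerSec` tokens per second. Each dispatch spends one token;
// when the bucket is empty, the caller must wait for a refill.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSec: number,
    now: number = Date.now()
  ) {
    this.tokens = capacity; // start full, allowing an initial burst
    this.lastRefill = now;
  }

  // Refill based on elapsed time, then try to take one token.
  tryTake(now: number = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSec
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Allow bursts of 3 jobs, refilling 1 token per second.
const bucket = new TokenBucket(3, 1, 0);
console.log(bucket.tryTake(0)); // true
console.log(bucket.tryTake(0)); // true
console.log(bucket.tryTake(0)); // true
console.log(bucket.tryTake(0)); // false: bucket empty
console.log(bucket.tryTake(2000)); // true: ~2 tokens refilled after 2s
```

The nice property for AI workloads is that bursts up to `capacity` go through immediately, while sustained throughput is capped at the refill rate, which maps directly onto provider quotas like "requests per minute."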
## How flashQ Works
At its core, flashQ is a TCP server that accepts commands and manages job queues. Here's the architecture:
```
┌─────────────────┐     ┌─────────────────┐
│    Your App     │     │     Workers     │
│   (Producer)    │     │   (Consumers)   │
└────────┬────────┘     └────────┬────────┘
         │                       │
         │     TCP/HTTP/gRPC     │
         └───────────┬───────────┘
                     │
              ┌──────▼──────┐
              │   flashQ    │
              │   Server    │
              └──────┬──────┘
                     │
              ┌──────▼──────┐
              │ PostgreSQL  │  (optional)
              └─────────────┘
```
The server maintains queues in memory using efficient data structures:
- 32 shards for parallel access without lock contention
- Priority queues (binary heaps) for job ordering
- Hash maps for O(1) job lookups
- Atomic counters for job IDs and metrics
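flashQ's internals are Rust and aren't shown here, but the sharding idea from the list above is easy to sketch in TypeScript: hash each queue name to one of 32 shards so concurrent producers rarely touch the same structure, and keep a binary min-heap per shard for priority ordering. The FNV-1a hash, the `Job` shape, and the heap layout below are illustrative assumptions, not flashQ's actual code:

```typescript
// Sketch of a sharded priority queue: queue names hash to one of
// 32 shards, and each shard orders its jobs with a binary min-heap
// (lower priority value = dispatched first).
interface Job {
  id: number;
  queue: string;
  priority: number;
}

const SHARDS = 32;

// FNV-1a string hash to pick a shard for a queue name.
function shardFor(queue: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < queue.length; i++) {
    h ^= queue.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h % SHARDS;
}

class MinHeap {
  private items: Job[] = [];

  push(job: Job): void {
    this.items.push(job);
    let i = this.items.length - 1;
    while (i > 0) {
      const parent = (i - 1) >> 1; // sift up while smaller than parent
      if (this.items[parent].priority <= this.items[i].priority) break;
      [this.items[parent], this.items[i]] = [this.items[i], this.items[parent]];
      i = parent;
    }
  }

  pop(): Job | undefined {
    const top = this.items[0];
    const last = this.items.pop();
    if (this.items.length > 0 && last !== undefined) {
      this.items[0] = last; // move last item to root, then sift down
      let i = 0;
      for (;;) {
        const l = 2 * i + 1, r = 2 * i + 2;
        let smallest = i;
        if (l < this.items.length && this.items[l].priority < this.items[smallest].priority) smallest = l;
        if (r < this.items.length && this.items[r].priority < this.items[smallest].priority) smallest = r;
        if (smallest === i) break;
        [this.items[smallest], this.items[i]] = [this.items[i], this.items[smallest]];
        i = smallest;
      }
    }
    return top;
  }
}

const shards: MinHeap[] = Array.from({ length: SHARDS }, () => new MinHeap());

function push(job: Job): void {
  shards[shardFor(job.queue)].push(job);
}

function pop(queue: string): Job | undefined {
  return shards[shardFor(queue)].pop();
}

push({ id: 1, queue: 'emails', priority: 5 });
push({ id: 2, queue: 'emails', priority: 1 });
console.log(pop('emails')?.id); // 2: lower priority value first
```

The payoff of sharding is that two producers pushing to different queues almost always land on different shards, so there's no single global lock serializing every push.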
## A Simple Example
Let's build a simple AI pipeline that generates embeddings and stores them:
```typescript
import { Queue, Worker } from 'flashq';
import OpenAI from 'openai';

const openai = new OpenAI();

// Create a queue
const queue = new Queue('embeddings');

// Add a job
await queue.add('generate', {
  text: 'The quick brown fox jumps over the lazy dog',
  model: 'text-embedding-3-small'
});

// Process jobs
const worker = new Worker('embeddings', async (job) => {
  const { text, model } = job.data;

  // Call the OpenAI embeddings API
  const response = await openai.embeddings.create({
    input: text,
    model
  });

  // Return the embedding
  return response.data[0].embedding;
});
```
## When Should You Use flashQ?
flashQ is ideal for:
- AI/ML pipelines: LLM calls, embeddings, image generation, batch inference
- Startups and small teams: No infrastructure to manage
- High-throughput applications: When you need more than 30K jobs/sec
- Large payloads: When your job data exceeds a few KB
- Development environments: No Docker Compose file just to run Redis
You might prefer BullMQ + Redis if:
- You're already running Redis for other purposes (caching, sessions)
- You need Redis-specific features (pub/sub, streams)
- Your team has deep Redis expertise
## Getting Started
Ready to try flashQ? It takes about 5 minutes to get started:
```bash
# Install the SDK
npm install flashq

# Start the server (using Docker)
docker run -d -p 6789:6789 flashq/flashq

# Or download the binary
curl -L https://github.com/egeominotti/flashq/releases/latest/download/flashq-linux -o flashq
chmod +x flashq
./flashq
```
Then in your code:
```typescript
import { Queue, Worker } from 'flashq';

const queue = new Queue('my-queue');
await queue.add('task', { hello: 'world' });

const worker = new Worker('my-queue', async (job) => {
  console.log(job.data); // { hello: 'world' }
});
```
Check out the documentation for advanced features like job dependencies, rate limiting, and clustering.
## Conclusion
flashQ represents a new approach to job queues, one that prioritizes simplicity and performance without sacrificing features. If you're building AI applications and tired of managing Redis, give flashQ a try.
We're open source and actively developing new features. Join us on GitHub and let us know what you think!