The Journey of Making 404fuzz Blazingly Fast ⚡
When I started building 404fuzz, I had one goal: make it fast. Really fast. But I quickly learned that speed in Node.js isn't just about throwing more requests at a server. It's about understanding how Node.js actually works.
Let me take you on the journey from my first "obvious" solution to a fuzzer that achieves 121 RPS on modest hardware.
Chapter 1: The Promise.all() Trap 🪤
My First Thought
"Easy! I'll just load all my wordlist paths and fire them all at once with Promise.all()!"
// My first naive approach
const wordlist = ['admin', 'backup', 'config', ...]; // 10,000 paths
const promises = wordlist.map(path => fetch(`${target}/${path}`));
await Promise.all(promises); // Fire everything!
The Brutal Reality
This crashed everything. My laptop froze. The target server probably hated me. What went wrong?
Here's what I learned: Promise.all() is NOT parallel execution. In fact, by the time Promise.all() even runs, the .map() above has already fired every single request.
Understanding Node.js: Concurrent, Not Parallel 🔄
Let me explain with a diagram:
┌─────────────────────────────────────────────────┐
│ Node.js Single Thread │
├─────────────────────────────────────────────────┤
│ │
│ Your Code (Asynchronous) │
│ ↓ │
│ Event Demultiplexer (Receives all events) │
│ ↓ │
│ Event Queue [Event1, Event2, Event3, ...] │
│ ↓ │
│ Event Loop (while(queue.length > 0)) │
│ ├─ Takes Event1 │
│ ├─ Executes Callback │
│ ├─ Returns immediately (non-blocking!) │
│ └─ Takes Event2... │
│ │
└─────────────────────────────────────────────────┘
Key Insight: Node.js is concurrent (non-blocking), not parallel (multiple things at once).
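Don't take my word for it. Here's a tiny standalone experiment (plain Node, nothing 404fuzz-specific) that shows the single thread in action:
// Both timers are scheduled concurrently, but their callbacks share
// one thread, so B cannot fire while A's callback is still running
setTimeout(() => {
  const end = Date.now() + 2000;
  while (Date.now() < end) {} // simulate 2s of CPU work (blocks the loop)
  console.log('A done');
}, 0);
setTimeout(() => console.log('B done'), 100); // due at 100ms, fires after ~2s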
When you do Promise.all() with 10,000 requests:
- ❌ You don't get 10,000 parallel threads
- ❌ You DO get 10,000 open connections
- ❌ You DO consume massive memory
- ❌ You DO overwhelm both your system and the target
Result: System crash, memory exhaustion, or you become an accidental DoS attacker.
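You can watch the unbounded-ness happen with a small counter. A minimal sketch (point target at a test server you own, not someone else's box):
const target = 'http://localhost:3000'; // a test server YOU control
const wordlist = Array.from({ length: 10_000 }, (_, i) => `path${i}`);
let inFlight = 0;
let peak = 0;
async function probe(path) {
  inFlight++;
  peak = Math.max(peak, inFlight);
  try {
    await fetch(`${target}/${path}`);
  } catch {
    // ignore network errors, we only care about the counter
  } finally {
    inFlight--;
  }
}
await Promise.all(wordlist.map(probe));
console.log(`Peak in-flight requests: ${peak}`); // ~10,000, all at once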
Chapter 2: The Queue Model - Controlled Chaos 🎯
The Better Approach
I needed bounded concurrency - control how many requests run at once, queue the rest.
┌────────────────────────────────────────┐
│ Wordlist (10,000 paths) │
└────────────┬───────────────────────────┘
↓
┌────────────────────────────────────────┐
│ Request Queue │
│ [req1, req2, req3, req4, req5, ...] │
└────────────┬───────────────────────────┘
↓
┌────────────────────────────────────────┐
│ Concurrency Limit (e.g., 50) │
│ │
│ [Active1] [Active2] ... [Active50] │
│ ↓ ↓ ↓ │
│ Response Response Response │
│ ↓ ↓ ↓ │
│ Next from queue (req51) │
└────────────────────────────────────────┘
The Implementation
class RequestQueue {
  constructor(concurrency = 50) {
    this.concurrency = concurrency; // max tasks running at once
    this.running = 0;
    this.queue = []; // resolvers for tasks waiting their turn
  }

  async add(task) {
    // If we're at the limit, park this task until a slot frees up
    // (while, not if: another caller could grab the freed slot first)
    while (this.running >= this.concurrency) {
      await new Promise(resolve => this.queue.push(resolve));
    }
    this.running++;
    try {
      return await task();
    } finally {
      this.running--;
      // Release the next queued task, if any
      if (this.queue.length > 0) {
        const resolve = this.queue.shift();
        resolve();
      }
    }
  }
}
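Using it looks like this (same wordlist and target as before): register every path up front and let the queue do the pacing.
const queue = new RequestQueue(50);
// All tasks are handed to the queue immediately, but at most 50
// fetches are actually in flight at any moment
await Promise.all(
  wordlist.map(path => queue.add(() => fetch(`${target}/${path}`)))
);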
The Results
- ✅ Memory stays stable
- ✅ Target server doesn't die
- ✅ Predictable resource usage
- ✅ You can tune it with the -t flag (concurrency level)
But I wanted MORE speed. Time for the next level.
Chapter 3: Multi-Core Power - The Cluster Model 💪
The Problem
Node.js runs your JavaScript on a single thread. My i5 processor has 8 cores, which means I was using only 12.5% of my CPU!
The Solution: Node.js Cluster Module
┌─────────────────────────────────────────────────┐
│ Primary Process (Master) │
│ │
│ - Loads wordlist │
│ - Splits work among workers │
│ - Collects results │
└────────────┬───────────────────────┬────────────┘
│ │
┌──────┴──────┐ ┌──────┴──────┐
↓ ↓ ↓ ↓
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Worker 1 │ │ Worker 2 │ │ Worker 3 │ │ Worker 4 │
│ │ │ │ │ │ │ │
│ Queue │ │ Queue │ │ Queue │ │ Queue │
│ Model │ │ Model │ │ Model │ │ Model │
│ (-t 10) │ │ (-t 10) │ │ (-t 10) │ │ (-t 10) │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
↓ ↓ ↓ ↓
Target Target Target Target
The Implementation
import cluster from 'node:cluster';

// Primary process: split the wordlist and hand each worker its share
if (cluster.isPrimary) {
  const numWorkers = getCoreCount(options.cores); // -c flag
  const workload = splitWordlist(wordlist, numWorkers);
  for (let i = 0; i < numWorkers; i++) {
    const worker = cluster.fork();
    worker.send({ paths: workload[i], concurrency: options.threads });
  }
}

// Worker process: run its chunk through the queue model from Chapter 2
if (cluster.isWorker) {
  process.on('message', async ({ paths, concurrency }) => {
    const queue = new RequestQueue(concurrency);
    // Hand every path to the queue up front; awaiting queue.add() inside
    // a plain for-loop would serialize the requests to one at a time
    await Promise.all(paths.map(path => queue.add(() => fuzzPath(path))));
  });
}
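getCoreCount and splitWordlist are helpers I'm glossing over here. splitWordlist can be as simple as even chunking; a hypothetical sketch:
// Hypothetical sketch: split the wordlist into numWorkers
// roughly equal chunks, one per worker
function splitWordlist(wordlist, numWorkers) {
  const chunkSize = Math.ceil(wordlist.length / numWorkers);
  const chunks = [];
  for (let i = 0; i < wordlist.length; i += chunkSize) {
    chunks.push(wordlist.slice(i, i + chunkSize));
  }
  return chunks;
}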
Chapter 4: The Sweet Spot - Balancing Act ⚖️
Here's where it gets interesting: more workers ≠ more speed.
The Complexity
You're now balancing TWO variables:
- Clusters (-c): Number of worker processes
- Concurrency (-t): Requests per worker
What I Discovered
Configuration    RPS     Why?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-c 8 -t 2        ~65     Too much IPC overhead
-c 4 -t 5        ~95     Better balance
-c 2 -t 10       ~121    SWEET SPOT! ⭐
-c 1 -t 20       ~85     Bottlenecked by single process
-c all -t 20     ~70     IPC kills performance
The Pattern: Fewer workers + higher concurrency = faster!
Why?
Fewer Workers (e.g., -c 2):
┌──────────────┐
│ Worker 1 │───┐
│ -t 10 │ │ Less communication
│ (10 reqs) │ ├─> overhead between
└──────────────┘ │ processes
│
┌──────────────┐ │
│ Worker 2 │───┘
│ -t 10 │
│ (10 reqs) │
└──────────────┘
More Workers (e.g., -c 8):
┌────────┐┌────────┐┌────────┐┌────────┐
│Worker 1││Worker 2││Worker 3││Worker 4│
│ -t 2 ││ -t 2 ││ -t 2 ││ -t 2 │
└───┬────┘└───┬────┘└───┬────┘└───┬────┘
│ │ │ │
└─────────┴─────────┴─────────┘
High IPC (Inter-Process
Communication) overhead!
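Want to feel the IPC tax yourself? Here's a throwaway ping-pong benchmark (my sketch, not 404fuzz code). Every message gets serialized, pushed through a pipe, and deserialized on the other side, and that adds up fast:
import cluster from 'node:cluster';

if (cluster.isPrimary) {
  const worker = cluster.fork();
  const start = process.hrtime.bigint();
  let count = 0;

  worker.on('message', () => {
    if (++count === 10_000) {
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      console.log(`10,000 IPC round trips: ${ms.toFixed(0)}ms`);
      worker.kill();
    } else {
      worker.send('ping');
    }
  });
  worker.send('ping');
} else {
  process.on('message', () => process.send('pong'));
}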
Chapter 5: Putting It All Together 🎯
The Final Architecture
┌───────────────────────────────────────────────┐
│ 404fuzz Primary Process │
│ │
│ 1. Load wordlist │
│ 2. Parse target & options │
│ 3. Calculate optimal worker count (-c flag) │
│ 4. Split wordlist into chunks │
│ 5. Spawn workers │
└────────────┬──────────────────────────────────┘
│
┌──────┴──────┐
↓ ↓
┌─────────────┐ ┌─────────────┐
│ Worker 1 │ │ Worker 2 │
│ │ │ │
│ ┌────────┐ │ │ ┌────────┐ │
│ │ Queue │ │ │ │ Queue │ │
│ │ Model │ │ │ │ Model │ │
│ │ (-t N) │ │ │ │ (-t N) │ │
│ └───┬────┘ │ │ └───┬────┘ │
│ ↓ │ │ ↓ │
│ [10 reqs] │ │ [10 reqs] │
└──────┬──────┘ └──────┬──────┘
↓ ↓
Target Target
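One detail the diagram hides: workers report findings back to the primary over the same IPC channel. A sketch of that round trip (the message shape is my assumption, not 404fuzz's actual format):
import cluster from 'node:cluster';

if (cluster.isWorker) {
  // inside fuzzPath, once an interesting response comes back:
  process.send({ type: 'result', path: '/admin', status: 200 });
}

if (cluster.isPrimary) {
  // the cluster module re-emits every worker message here
  cluster.on('message', (worker, msg) => {
    if (msg.type === 'result') {
      console.log(`[worker ${worker.id}] ${msg.status} ${msg.path}`);
    }
  });
}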
Usage Examples
# Fast & balanced (recommended)
404fuzz https://target.com -w wordlist.txt -c 2 -t 10
# Maximum concurrency, fewer workers
404fuzz https://target.com -w wordlist.txt -c half -t 20
# Use all cores (not always faster!)
404fuzz https://target.com -w wordlist.txt -c all -t 5
# Single core for testing
404fuzz https://target.com -w wordlist.txt -c 1 -t 50
The Results 📊
Hardware: Dell 7290, i5 8th Gen, 8GB RAM, 256GB SSD
Performance:
- Peak RPS: 121 requests/second
- Memory usage: Stable (~200-300MB)
- CPU usage: Efficient (50-60% on 2 cores)
Comparison:
Approach                     RPS     Memory    Crashed?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Promise.all() (naive)        N/A     >2GB      YES 💥
Queue only (single core)     ~45     ~150MB    No
Queue + Cluster (optimal)    ~121    ~250MB    No ✅
Key Takeaways 🎓
- Node.js is concurrent, not parallel - Understanding the event loop is crucial
- Unbounded concurrency is dangerous - Always implement a queue with limits
- More workers ≠ better performance - IPC overhead is real
- Sweet spot exists - Fewer workers + higher concurrency often wins
- Experimentation is key - Every system is different, test your configs!
Try 404fuzz Yourself
# Clone the repository
git clone https://github.com/toklas495/404fuzz.git
cd 404fuzz
# Install dependencies
npm install
# Build and link globally
npm run build
# Verify installation
404fuzz
# Start fuzzing with recommended settings
404fuzz https://target.com/FUZZ -w /path/to/wordlist.txt -c 2 -t 10
What's Next?
Now that we've achieved speed, the next step is adding intelligence - making 404fuzz learn from responses, adapt its strategies, and discover paths smarter, not just faster.
But that's a story for another blog post. 😉
Built with ❤️ and lots of trial & error. If this helped you understand Node.js concurrency better, drop a ⭐ on the repo!