pretty ncube
Posted on May 30

Treasure Hunt Engine: The Day We Realized the Event Bus Was Our Constraint

#webdev #programming #rust #performance

The Problem We Were Actually Solving

We weren't just chasing p99 latency; we were solving a fundamental mismatch between the event model and the treasure hunt logic. Each treasure hunt round emits thousands of micro-events: player joins, item picks, time updates, leaderboard recalculations, and realtime notifications. The Node.js event loop was choking under the backpressure. The BullMQ worker was blocked on Redis pubsub, not because of network latency, but because Node.js's single-threaded event loop couldn't keep up with the rate of incoming events. The Redis server itself was fine: CPU at 12%, memory at 68%, no evictions. The bottleneck wasn't the queue or the data store. It was the runtime.

I profiled the process with 0x and saw that 78% of CPU time was spent in uv__io_poll, the epoll/select wrapper. The Node.js process was spending more time waiting for events than processing them. And because BullMQ uses Redis streams, every publish and consume was a network roundtrip. The 250 microsecond RTT from us-east-1 to the Redis cluster was adding up when we were publishing 47,000 events per second: if each event costs one sequential round trip, that is 47,000 × 250 µs = 11.75 seconds of round-trip wait for every second of wall-clock time.

The p99 latency grew far faster than the number of concurrent players. At 5,000 players, it was 80 ms. At 10,000 players, it was 2.3 seconds, so doubling the load multiplied p99 by roughly 29. The system wasn't scaling linearly. It was falling off a cliff.

What We Tried First (And Why It Failed)

We tried scaling the BullMQ workers horizontally. We spun up 8 workers behind an SQS queue. The SQS throughput was fine (50,000 events/sec sustained), but BullMQ's Redis backpressure became a distributed locking nightmare. Workers fought over the same Redis key ranges, and the Redis pubsub fanout created a thundering herd on the Node.js event loop. We saw lock contention in XREADGROUP with 200 ms timeouts.

We tried sharding the Redis streams into 16 shards. The shard imbalance was brutal: some shards got 3x t
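
The round-trip and scaling figures quoted in the first section can be sanity-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope model, not code from the engine: it assumes every event costs exactly one sequential Redis round trip and that latency follows a simple power law between the two measured points. The function names are hypothetical and chosen for illustration.

```rust
/// Seconds of cumulative round-trip wait incurred per wall-clock second,
/// ASSUMING each event pays for one sequential network round trip.
fn serial_wait_per_second(events_per_sec: f64, rtt_micros: f64) -> f64 {
    events_per_sec * rtt_micros / 1_000_000.0
}

/// Smallest number of round trips that would have to overlap (through
/// pipelining, batching, or parallel connections) to keep up with the rate.
/// Same one-round-trip-per-event assumption as above.
fn min_in_flight(events_per_sec: f64, rtt_micros: f64) -> u64 {
    serial_wait_per_second(events_per_sec, rtt_micros).ceil() as u64
}

/// Exponent k in `latency ∝ players^k` implied by two measurements.
/// k = 1.0 would be linear scaling; k = 0.5 would be square-root scaling.
fn scaling_exponent(
    players_a: f64,
    latency_a_ms: f64,
    players_b: f64,
    latency_b_ms: f64,
) -> f64 {
    (latency_b_ms / latency_a_ms).ln() / (players_b / players_a).ln()
}
```

With the post's numbers, `serial_wait_per_second(47_000.0, 250.0)` comes to 11.75, so at least 12 round trips would need to be in flight at once just to break even. `scaling_exponent(5_000.0, 80.0, 10_000.0, 2_300.0)` comes out at roughly 4.8, well above the 1.0 of linear scaling, which matches the "falling off a cliff" description.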
Blizine Admin
📰Dev.to — dev.to