When the Default Postgres Pool Died at 3 AM

Blizine Admin
2 min read

pretty ncube · Posted on May 31
#webdev #programming #rust #performance

The Problem We Were Actually Solving

Our treasure-hunt engine at Veltrix was simple on paper: read JSON blobs from S3, parse them, and return the top 50 results by relevance score. By month three we had 2.3 million daily active users, but every Tuesday at 02:47 the API latency spiked to 1.4 seconds and the Postgres pool collapsed with "too many connections" errors.

The error message wasn't a surprise, because we were still using the default max_connections = 100. What stumped us was that the spike happened even though 45% of the connections were idle. Profiling with pg_stat_activity showed 89 blocked queries each time the relevance worker tried to UPDATE a cache table. The table's JSONB column had grown from 2 MB to 180 MB, and every UPDATE rewrote the whole row. Vacuum couldn't keep up because the autovacuum workers were also blocked. The constraint wasn't CPU or memory; it was the concurrency model baked into the default Postgres config.

What We Tried First (And Why It Failed)

We bolted a Redis cache on in front of Postgres. In the first week it cut median latency from 45 ms to 8 ms. Then the cache stampede hit: at 02:47 the Redis TTLs expired for 30k keys, and 18k simultaneous GETs raced to recompute the relevance scores. We tried SET key value NX PX to protect the recomputation, but the Lua script we pushed to Redis kept timing out after 5 ms because it called JSON.get on 500-line blobs. The Redis node saturated its network interface at 110 Mbps while the Postgres pool still ran into the same wall of row-level locks.

The JSONB scans were now off the hot path, but every spike left the Postgres shared_buffers full of dirty blocks that had to be fsynced under memory pressure. We measured 110k block reads per second during the spike, far above the 30k per second our SSD could sustain without latency ballooning.

The Architecture Decision

We rewrote the relevance scorer in Rust and move
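
The pg_stat_activity profiling described in the first section can be approximated as follows. This is a sketch, not the query the post used: the SQL relies on the standard pg_stat_activity view (the backend_type column needs PostgreSQL 10+) and the built-in pg_blocking_pids() function, while summarize_activity is a hypothetical helper that assumes your driver returns rows as (pid, state, blocked_by) tuples with blocked_by as a list of ints.

```python
from collections import Counter

# One row per client session: its state and the PIDs currently blocking it.
# pg_blocking_pids() returns an integer[]; an empty array means "not blocked".
ACTIVITY_SQL = """
SELECT pid, state, pg_blocking_pids(pid) AS blocked_by
FROM pg_stat_activity
WHERE backend_type = 'client backend';
"""

def summarize_activity(rows):
    """Tally sessions by state and count the ones waiting on another PID.

    rows: iterable of (pid, state, blocked_by) tuples (hypothetical shape,
    as a driver might return ACTIVITY_SQL results).
    """
    states = Counter()
    blocked = 0
    for _pid, state, blocked_by in rows:
        states[state or "unknown"] += 1  # state can be NULL for some backends
        if blocked_by:                   # non-empty list => waiting on a lock
            blocked += 1
    total = sum(states.values())
    return {
        "total": total,
        "idle": states["idle"],
        "idle_pct": round(100 * states["idle"] / total, 1) if total else 0.0,
        "blocked": blocked,
        "by_state": dict(states),
    }
```

Keeping "idle in transaction" as its own bucket is deliberate: those sessions occupy a connection slot and can keep holding row locks, unlike plain "idle" ones.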
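
One plausible reason 30k keys expired at the same instant is that they were written in one batch with the same fixed TTL; the post does not say how the cache was populated, so treat that as an assumption. The sketch below, with a hypothetical expiry_histogram helper and a made-up 3,600 s TTL, counts expirations per second for such a batch, once with a fixed TTL and once with random jitter added. Jitter is a common stampede mitigation swapped in here purely for contrast; the post does not say Veltrix used it.

```python
import random
from collections import Counter

def expiry_histogram(n_keys, write_at_s, ttl_s, jitter_s=0, seed=0):
    """Count how many keys expire in each whole second.

    Assumes every key is written at write_at_s (e.g. by one batch job).
    With jitter_s > 0, each key gets ttl_s plus a uniform random offset
    in [0, jitter_s] seconds (both ends inclusive).
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    expiries = Counter()
    for _ in range(n_keys):
        extra = rng.randint(0, jitter_s) if jitter_s else 0
        expiries[write_at_s + ttl_s + extra] += 1
    return expiries

fixed = expiry_histogram(30_000, write_at_s=0, ttl_s=3_600)
jittered = expiry_histogram(30_000, write_at_s=0, ttl_s=3_600, jitter_s=600)
print(max(fixed.values()))  # 30000: every key expires in the same second
print(len(jittered), max(jittered.values()))  # at most 601 distinct seconds
```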
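
The SET key value NX PX guard mentioned under "What We Tried First" works because NX makes the write succeed only when the key is absent, and PX gives the lock a millisecond expiry so a crashed worker cannot hold it forever. The sketch below models just those semantics with a hypothetical in-memory FakeRedis class so it runs without a server; against a real server you would issue the actual SET command (for example redis-py's set(name, value, nx=True, px=...)). The key name and the 5,000 ms lock length are made up, and this illustrates the guard rather than reproducing Veltrix's code.

```python
import uuid

class FakeRedis:
    """Hypothetical in-memory stand-in for the SET ... NX PX subset of Redis."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at_ms)

    def set_nx_px(self, key, value, px, now_ms):
        """Mimic SET key value NX PX px: write only if key is absent or expired."""
        entry = self._data.get(key)
        if entry is not None and entry[1] > now_ms:
            return False  # a live lock already exists
        self._data[key] = (value, now_ms + px)
        return True

def try_claim_recompute(r, cache_key, now_ms, lock_ms=5_000):
    """Return a lock token if this caller won the right to recompute, else None.

    The random token lets the winner later release only its own lock
    (compare-then-DEL), which this sketch leaves out. Losers should serve a
    stale value or retry instead of hitting Postgres.
    """
    token = uuid.uuid4().hex
    if r.set_nx_px(f"lock:{cache_key}", token, px=lock_ms, now_ms=now_ms):
        return token
    return None
```

The guard only decides who recomputes; it cannot make the recomputation itself faster, which is where the post's Lua script ran out of time.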
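
To put the 110k versus 30k block-read figures in bandwidth terms, the sketch below assumes PostgreSQL's default 8 KiB block size (BLCKSZ = 8192 is a compile-time setting; SHOW block_size reports the actual value). The helper name is hypothetical; the two rates are the ones measured in the post. Like the post, it treats every block read as an SSD read, although reads served from the OS page cache would not touch the disk.

```python
BLOCK_SIZE = 8192  # bytes; PostgreSQL default BLCKSZ (verify with SHOW block_size)

def blocks_to_mib_per_s(blocks_per_s, block_size=BLOCK_SIZE):
    """Convert a block-read rate into MiB/s."""
    return blocks_per_s * block_size / 2**20

spike = blocks_to_mib_per_s(110_000)   # 859.375 MiB/s during the spike
ceiling = blocks_to_mib_per_s(30_000)  # 234.375 MiB/s sustainable by the SSD
overload = 110_000 / 30_000            # how far past the ceiling the spike went
print(f"spike {spike:.1f} MiB/s vs ceiling {ceiling:.1f} MiB/s ({overload:.2f}x)")
# prints: spike 859.4 MiB/s vs ceiling 234.4 MiB/s (3.67x)
```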

Source: Dev.to (dev.to)