Lisa Zulu
Posted on May 30

Treasure Hunt Engine: Why One Bad Prometheus Rule Sank the Whole Veltrix Event

#webdev #programming #ai #machinelearning

The Problem We Were Actually Solving

Our real goal wasn't fancy LLM prompts or real-time leaderboards. It was keeping the Rails app under 450 ms p99 during peak load, when every team simultaneously scanned a code, requested a new clue, and tried to outbid the person next door for a limited-time power-up. We benchmarked with Locust at 5,000 concurrent users and saw that the slowest endpoint was /next-hint, which queried a pgvector vector store at 180 ms per query. That left only 270 ms for Rails routing, Redis reads for rate-limiting, and our custom concurrency limiter.

The marketing slide said "AI", but the product team really wanted a hint scheduler that wouldn't melt under load. We bolted a 1553-line llama.cpp wrapper written by the data science intern onto the hint endpoint, thinking we could cache all possible answers in a nightly cron job. The wrapper had a known hallucination rate of 3.2% on our own test set, but nobody configured the grammar mask to enforce that answers must contain only location names. So when someone asked "Where is the next clue hidden?" the engine happily returned "Under your chair in the Sagrada Familia crypt", even though the venue map had no crypt. One user's screenshot went viral, and suddenly the whole event looked like a scam.

What We Tried First (And Why It Failed)

The first fix was obvious: raise the error budget for the /next-hint endpoint from 10% to 30%, so the auto-scaler would spin up more pods when the vector query lagged. We pushed a Helm chart that updated the HPA target CPU from 70% to 85%, thinking the vector store would catch up. Five minutes later Prometheus fired the critical rule we had copied from the Kubernetes docs:

    expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) > 0.1

The rule used a
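To make the rule's expression easier to reason about, here is a minimal Python sketch of how a division of that shape evaluates. It relies on one documented PromQL behavior: without aggregation, a binary operation between two instant vectors pairs series that have identical label sets. The label names (`path`, `status`) and the traffic numbers are made up for illustration, and the sketch does not claim to reproduce what actually happened during the event.

```python
import re
from typing import Dict, Tuple

# A series is identified by its label set; values are hypothetical
# per-second request rates over a 5m window.
Labels = Tuple[Tuple[str, str], ...]
Rates = Dict[Labels, float]

THRESHOLD = 0.1  # the "> 0.1" from the rule


def is_5xx(labels: Labels) -> bool:
    """Mirror the status=~"5.." matcher (PromQL regexes are fully anchored)."""
    return re.fullmatch(r"5..", dict(labels).get("status", "")) is not None


def per_series_ratio(rates: Rates) -> Dict[Labels, float]:
    """Unaggregated division: each 5xx series is paired with the series
    that has the identical label set, which is the same series."""
    return {
        labels: rate / rates[labels]
        for labels, rate in rates.items()
        # 0/0 is NaN in PromQL and never passes "> 0.1", so skip idle series.
        if is_5xx(labels) and rate > 0
    }


def aggregate_ratio(rates: Rates) -> float:
    """Sum both sides first, then divide: 5xx rate over total rate."""
    total = sum(rates.values())
    errors = sum(rate for labels, rate in rates.items() if is_5xx(labels))
    return errors / total if total > 0 else 0.0


# Made-up traffic: 3% of /next-hint requests fail with a 503.
sample: Rates = {
    (("path", "/next-hint"), ("status", "200")): 97.0,
    (("path", "/next-hint"), ("status", "503")): 3.0,
}

fires_unaggregated = any(r > THRESHOLD for r in per_series_ratio(sample).values())
fires_aggregated = aggregate_ratio(sample) > THRESHOLD
print(fires_unaggregated, fires_aggregated)  # prints: True False
```

In this toy data the unaggregated ratio is 1.0 for the only 5xx series, so the threshold is crossed at a 3% error rate, while the summed ratio is 0.03 and stays quiet. In PromQL the summed form is usually written by wrapping both sides of the division in `sum(...)`.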