We saved $5k a month with a single Grafana query
I want to tell the story of how we shaved 300ms off every pod startup on Checkly, and made every check run more efficient. If you don’t know, Checkly is a synthetic monitoring tool that lets teams monitor their APIs and sites continually, and find problems faster. With some users sending millions of requests a day, that 300ms added up to massive overall compute savings. The story of how we saved that time takes us through Application Performance Monitoring (APM), to low-level instrumentation, to ‘one weird trick’ that saves tons of startup time, all the way to a coding practice that most would call an anti-pattern, but that nonetheless improved performance significantly.
Measuring the time it took for a pod to be ready didn’t require some cool Kubernetes call or ECS magic; instead, it’s just three lines of application code within our check runner. With the Grafana Loki logfmt parser, I could use real LogQL expressions (similar to PromQL) to query the unstructured logs and treat them as a metric. It’s truly outstanding how changing a few version numbers in a package.json could save thousands of dollars every month, but the real story is how a combination of observability techniques can reveal root causes in surprising places.
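To make the logging idea concrete, here is a minimal sketch of how a runner might emit its startup time as a logfmt line that Loki can later parse. This is not the original three lines from the check runner, which aren't reproduced here; the function names (`toLogfmt`, `logStartupTime`), the `startup_ms` field, and the `app="check-runner"` label in the query comment are all hypothetical.

```typescript
// Hypothetical sketch: names and fields are illustrative, not Checkly's actual code.

// Captured as early as possible when the process boots.
const processStart = Date.now();

/** Format key/value pairs as one logfmt line, quoting values that contain whitespace, quotes, or '='. */
function toLogfmt(fields: Record<string, string | number>): string {
  return Object.entries(fields)
    .map(([key, value]) => {
      const text = String(value);
      return /[\s"=]/.test(text) ? `${key}=${JSON.stringify(text)}` : `${key}=${text}`;
    })
    .join(" ");
}

/** Call once the runner is ready to accept work; logs and returns the logfmt line. */
function logStartupTime(now: number = Date.now()): string {
  const line = toLogfmt({ msg: "runner ready", startup_ms: now - processStart });
  console.log(line);
  return line;
}

// A LogQL query along these lines could then treat the log field as a metric
// (the label selector is an assumption):
//   avg_over_time({app="check-runner"} |= "runner ready" | logfmt | unwrap startup_ms [5m])
```

Because the value is written as a plain `key=value` pair, no separate metrics pipeline is needed: the log line itself carries the number, and the query extracts it at read time.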