We saved $5k a month with a single Grafana query


I want to tell the story of how we shaved 300ms off every pod startup at Checkly and made every check run more efficient. If you don’t know, Checkly is a synthetic monitoring tool that lets teams continually monitor their APIs and sites and find problems faster. With some users sending millions of requests a day, that 300ms added up to massive overall compute savings. The story of how we saved that time takes us from Application Performance Monitoring (APM), through low-level instrumentation and ‘one weird trick’ that saves tons of startup time, all the way to a coding practice that most would call an anti-pattern but that nonetheless improved performance significantly.

Measuring the time it took for the pod to be ready wasn’t done with some cool Kubernetes call or ECS magic. Instead, it’s just three lines of application code within our check runner. With the Grafana Loki logfmt parser, I could use real LogQL expressions (similar to PromQL) to query the unstructured logs and treat them as a metric. It’s truly outstanding how changing a few version numbers in a package.json could save thousands of dollars every month, but the real story is how a combination of observability techniques can reveal root causes in surprising places.
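
To make the idea of logging startup time and querying it as a metric concrete, here is a minimal sketch in TypeScript. It is not Checkly’s actual code: the names `formatLogfmt`, `logReady`, and `startup_ms`, the log message, and the `app="check-runner"` label in the query are all assumptions made for illustration.

```typescript
// Hypothetical sketch: every name here is illustrative, not taken from the check runner.

// Serialize a flat object as a logfmt line: key=value pairs separated by spaces.
function formatLogfmt(fields: Record<string, string | number>): string {
  return Object.entries(fields)
    .map(([key, value]) => {
      const text = String(value);
      // Quote values containing whitespace, quotes, or '=' so a logfmt parser keeps them whole.
      return /[\s"=]/.test(text)
        ? `${key}="${text.replace(/"/g, '\\"')}"`
        : `${key}=${text}`;
    })
    .join(" ");
}

// Captured as early as possible in the process.
const processStart: number = Date.now();

// Call once the runner can accept work; emits the elapsed startup time in milliseconds.
function logReady(now: number = Date.now()): string {
  const line = formatLogfmt({
    level: "info",
    msg: "runner ready",
    startup_ms: now - processStart,
  });
  console.log(line);
  return line;
}

// A LogQL query that could turn those log lines into a metric.
// The stream selector {app="check-runner"} is an assumed label, not a real one.
const exampleQuery: string =
  'avg_over_time({app="check-runner"} | logfmt | unwrap startup_ms [5m])';
```

Calling `logReady()` at the end of initialization would print a line such as `level=info msg="runner ready" startup_ms=312`, with the number depending on the actual startup time. The query then parses each line with `logfmt`, extracts `startup_ms` as a numeric sample with `unwrap`, and averages it over five-minute windows, which is one way to chart startup time without adding a separate metrics pipeline.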
