Reliable Insights

A blog on monitoring, scale and operational Sanity

October 30, 2017

Running into burning buildings because the fire alarm stopped

At what point should you consider an alert resolved?

Published by Brian Brazil in Posts

Tags: alerting, best practices, prometheus

September 25, 2017

Percentages go from 0 to 100, Ratios go from 0 to 1

While doing research for implementing exporters, I've noticed some confusion around ratios and percentages that I'd like to clear up.

Published by Brian Brazil in Posts

Tags: best practices, exporters, prometheus

September 11, 2017

It’s easy to convert Pull to Push

If you have to choose one of push or pull in your core, which should it be?

Published by Brian Brazil in Posts

Tags: best practices, design, prometheus, push

September 4, 2017

Functions to Avoid

As PromQL has evolved, there are some functions that should no longer be used.

Published by Brian Brazil in Posts

Tags: best practices, prometheus, promql

August 28, 2017

Avoid irate() in alerts

While the irate() function is useful for granular graphs, it is not suitable for alerting.

Published by Brian Brazil in Posts

Tags: alerting, best practices, prometheus, promql

Existential issues with metrics

The Prometheus instrumentation best practices say to "Avoid missing metrics". Let's look at why, and how to deal with it.

Published by Brian Brazil in Posts

Tags: best practices, java, prometheus, promql

When should you unit test instrumentation?

Should you unit test every bit of instrumentation you add? Not always.

Published by Brian Brazil in Posts

Tags: best practices, prometheus, testing

What’s in a name?

You may have noticed that most PromQL functions and operators remove the metric name in their result. Let's look at why.

Published by Brian Brazil in Posts

Tags: best practices, prometheus, promql, relabelling

Push needs Service Discovery

It's often claimed that an advantage of push-based monitoring systems is that, compared to pull-based systems like Prometheus, they don't need service discovery. This isn't true, and I'm going to explain why.

Published by Brian Brazil in Posts

Tags: best practices, design, prometheus, push, service discovery

February 27, 2017

Label Lookups and the Child

The Prometheus client library guidelines recommend having a Child be returned via labels(). Why?

Published by Brian Brazil in Posts

Tags: best practices, client libraries, design, java, prometheus, python


