Reliable Insights

A blog on monitoring, scale and operational Sanity

Alerting on crash loops with Prometheus

If your applications are restarting regularly, whether due to segfaults or OOMs, it'd be nice to know.

Published by Brian Brazil in Posts

Tags: alerting, prometheus, promql

New Features in Prometheus 2.2.0

Prometheus 2.2.0 is now out, following on from 2.1.0 back in January with several fixes and improvements.

Published by Brian Brazil in Posts

Tags: prometheus, releases

Using sample_limit to avoid overload

Worried that your application metrics might suddenly explode in cardinality? sample_limit can save you.

Published by Brian Brazil in Posts

Tags: prometheus, reliability

February 26, 2018

Dude, where’s my exporter?

So you have just discovered Prometheus and want to try it out, or use it to replace your old monitoring system, but you have run into a part of your stack that you cannot instrument with a client library and for which there is no officially supported exporter. What do you do?

Published by Conor Broderick in Posts

Tags: best practices, exporters, prometheus

February 19, 2018

Common pitfalls when using the Pushgateway

Ephemeral jobs are often not around long enough for Prometheus to scrape their metrics. The Pushgateway was developed to remedy this: such jobs push their metrics to a metrics cache, which Prometheus can then scrape long after the original jobs have gone away. This blogpost discusses some of the common pitfalls users fall into when adding the Pushgateway to their monitoring stack.

Published by Conor Broderick in Posts

Tags: best practices, prometheus, pushgateway

February 12, 2018

Alerting on gauges in Prometheus 2.0

One of the major changes introduced in Prometheus 2.0 was staleness handling. Previously, for instant vectors, Prometheus would return a point from up to 5 minutes in the past, which caused a number of issues.

Published by Conor Broderick in Posts

Tags: alerting, prometheus, promql

February 5, 2018

What percentage of time is my service down for?

Have you ever wondered what percentage of time a given service or application spends up or down?

Published by Conor Broderick in Posts

Tags: blackbox_exporter, prometheus, promql

January 29, 2018

Adding Basic Auth to Prometheus with Apache

Having previously discussed why the Prometheus project does not support SSL and user authentication out of the box, and detailed how to add basic authentication with Nginx, we will now demonstrate how to do the same with Apache.

Published by Conor Broderick in Posts

Tags: apache, auth, prometheus, security

January 22, 2018

New Features in Prometheus 2.1.0

Prometheus 2.1.0 is now out, following on from 2.0.0 last month with several fixes and improvements.

Published by Brian Brazil in Posts

Tags: prometheus, releases

January 15, 2018

Instrumenting a Ruby on Rails Application with Prometheus

In this blogpost we'll walk you through a quick 'hello world' example of instrumenting a Rails application with the Prometheus Ruby client.

Published by Conor Broderick in Posts

Tags: client, instrumentation, prometheus, ruby
