Reliable Insights – Page 29 – Robust Perception

October 17, 2015

Irate graphs are better graphs

Prometheus 0.16.1 was just released, and with it comes my addition of the irate function. This offers more responsive graphs and higher resolution dashboards.

Published by Brian Brazil in Posts

Tags: graphing, prometheus, promql
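
As a rough illustration of why irate makes graphs more responsive: rate averages a counter's increase over the whole range, while irate looks only at the last two samples in it. The sketch below is a simplification in plain Python, not Prometheus's implementation. It ignores counter resets and extrapolation, and the sample data and function names are made up for the example.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def window_rate(samples: List[Sample]) -> float:
    """Average per-second increase between the first and last sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def instant_rate(samples: List[Sample]) -> float:
    """Per-second increase between the last two samples only."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical counter scraped every 15s: flat, then a burst in the last interval.
samples = [(0.0, 100.0), (15.0, 100.0), (30.0, 100.0), (45.0, 100.0), (60.0, 400.0)]

print(window_rate(samples))   # averaged over the whole 60s window
print(instant_rate(samples))  # reflects only the most recent 15s
```

The burst is diluted in the windowed figure but shows up in full in the instant one, which is the trade-off between smoothness and responsiveness.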

October 11, 2015

New Features in Prometheus 0.16.0

Prometheus 0.16.0 has been released with a whopping 312 commits and 89 changes by 18 contributors since 0.15.1. That's a lot to swallow, so let's take a look at the main changes and improvements.

Published by Brian Brazil in Posts

Tags: prometheus, promql, releases, service discovery

October 9, 2015

Monitoring Batch Jobs in Python

Prometheus monitoring is usually of long-lived daemons, but what if you've a batch job that you want to monitor?

Published by Brian Brazil in Posts

Tags: instrumentation, prometheus, pushgateway, python
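
One way a batch job can be monitored is by pushing its final metrics to a Pushgateway, which Prometheus then scrapes. The sketch below uses only the Python standard library rather than a Prometheus client library. The metric names, the job name, the gateway address `http://localhost:9091`, and the `/metrics/job/<job>` path (as used by current Pushgateway releases) are assumptions for illustration.

```python
import urllib.request


def render_metrics(duration_seconds: float, finished_at: float) -> bytes:
    """Render two gauges in the Prometheus text exposition format."""
    lines = [
        "# TYPE my_batch_job_duration_seconds gauge",
        f"my_batch_job_duration_seconds {duration_seconds}",
        "# TYPE my_batch_job_last_success_unixtime gauge",
        f"my_batch_job_last_success_unixtime {finished_at}",
    ]
    return ("\n".join(lines) + "\n").encode("utf-8")


def push(payload: bytes,
         gateway: str = "http://localhost:9091",  # assumed address
         job: str = "my_batch_job") -> None:      # hypothetical job name
    """PUT the metrics to the Pushgateway, replacing this job's previous values."""
    request = urllib.request.Request(
        f"{gateway}/metrics/job/{job}", data=payload, method="PUT"
    )
    urllib.request.urlopen(request, timeout=10).close()
```

At the end of a successful run, the job would time itself and call something like `push(render_metrics(end - start, end))`. Recording the time of last success lets you alert when the job has not completed recently.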

October 8, 2015

Monitoring: Not Just For Outages

It's common to think of monitoring as something just to alert you when things are going wrong. At Robust Perception we believe in Inclusive Monitoring, where all aspects of systems are monitored and available to provide insight and drive decisions.

Published by Brian Brazil in Posts

Tags: best practices, estimation, inclusive monitoring, scaling

September 28, 2015

Healthchecking is Not Transitive

Systems such as Consul perform healthchecking of local services and expose this information to other machines within the cluster. Does this mean that the service will work when you try to talk to it?

Published by Brian Brazil in Posts

Tags: best practices, consul, healthchecking, reliability, rpc

September 21, 2015

Audio Alerting with Prometheus

Prometheus offers integrations with systems like PagerDuty, Email and Hipchat for alert notifications - but what if you want to do something that's not supported out of the box? The Alertmanager's generic webhook has got you covered.

Published by Brian Brazil in Posts

Tags: alerting, alertmanager, blackbox_exporter, generic webhook, prometheus, python, video

September 16, 2015

Dropping metrics at scrape time with Prometheus

It's easy to get carried away by the power of labels with Prometheus. In the extreme this can overload your Prometheus server, such as if you create a time series for each of hundreds of thousands of users. Thankfully there's a way to deal with this without having to turn off monitoring or deploy a new version of your code.

Published by Brian Brazil in Posts

Tags: prometheus, promql, relabelling, reliability

September 14, 2015

Do you know your peak-to-mean ratio?

Traffic from users to your servers isn't a steady stream; it waxes and wanes over the day and week. The peak-to-mean ratio is your primary tool to avoid outages or unnecessary costs due to this.

Published by Brian Brazil in Posts

Tags: best practices, capacity, estimation, provisioning, reliability
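
As a small worked example, the peak-to-mean ratio is the highest traffic level divided by the average level over the same period. The traffic figures below are invented for illustration, and `peak_to_mean` is a hypothetical helper.

```python
from statistics import mean
from typing import Sequence


def peak_to_mean(qps: Sequence[float]) -> float:
    """Ratio of the busiest observation to the average observation."""
    return max(qps) / mean(qps)


# Hypothetical queries-per-second readings taken across one day.
hourly_qps = [20, 10, 10, 20, 60, 120, 160, 200, 160, 120, 60, 20]

ratio = peak_to_mean(hourly_qps)
print(ratio)

# If all you know is an average of 80 qps, the ratio estimates the peak to provision for.
print(80 * ratio)
```

Provisioning for the mean alone would leave this hypothetical service well short at its daily peak, while provisioning far beyond the peak wastes money.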

September 8, 2015

Machine Monitoring with Prometheus Debs

CPU, RAM, disk and network usage are basic machine metrics you should be monitoring. This is easy with Prometheus and the later releases of Debian.

Published by Brian Brazil in Posts

Tags: debs, node exporter, prometheus

September 7, 2015

Get alerted before your SSL certificates expire

The most common way to learn about the expiry date of your website's SSL certificate is after it has expired. The blackbox exporter combined with Prometheus can let you know well in advance, letting you renew your certificate before users complain.

Published by Brian Brazil in Posts

Tags: alerting, blackbox_exporter, prometheus, reliability
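
To make the underlying check concrete, here is a sketch in plain Python of measuring how long a certificate has left. This is not the blackbox exporter or a Prometheus alert, only the same idea using the standard library. The function names and the 30-day threshold are assumptions for the example.

```python
import socket
import ssl
import time


def fetch_not_after(host: str, port: int = 443) -> str:
    """Return the certificate's notAfter field, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: float) -> float:
    """Days between 'now' (unix seconds) and the certificate's expiry."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def expires_soon(not_after: str, now: float, threshold_days: float = 30) -> bool:
    """True when fewer than threshold_days remain (assumed threshold)."""
    return days_until_expiry(not_after, now) < threshold_days
```

A call such as `expires_soon(fetch_not_after("example.com"), time.time())` would run the check against a live site. Doing this continuously through monitoring, rather than by hand, is what gives you the advance warning.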