A blog on monitoring, scale and operational Sanity
Prometheus 0.16.1 was just released, and with it comes my addition of the irate
function. This offers more responsive graphs and higher-resolution dashboards.
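As a rough illustration of why an instant rate is more responsive, the sketch below computes a per-second rate from only the two most recent samples of a counter. It is not Prometheus's implementation; the name `instant_rate`, the sample layout and the reset handling (treating a drop in value as a restart from zero) are assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical sample layout: (unix timestamp in seconds, counter value).
Sample = Tuple[float, float]


def instant_rate(samples: Sequence[Sample]) -> Optional[float]:
    """Per-second rate of increase based on the last two samples only."""
    if len(samples) < 2:
        return None  # Two points are needed to compute a rate.
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    elapsed = t_last - t_prev
    if elapsed <= 0:
        return None  # Out-of-order or duplicate timestamps.
    # Assumption: a counter that went down was reset to zero,
    # so the increase since the reset is the last value itself.
    increase = v_last - v_prev if v_last >= v_prev else v_last
    return increase / elapsed
```

Because only the final pair of samples matters, a sudden change shows up immediately rather than being averaged over the whole range.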
Prometheus monitoring is usually aimed at long-lived daemons, but what if you have a batch job that you want to monitor?
It's common to think of monitoring as something just to alert you when things are going wrong. At Robust Perception we believe in Inclusive Monitoring, where all aspects of systems are monitored and available to provide insight and drive decisions.
Systems such as Consul perform healthchecking of local services and expose this information to other machines within the cluster. Does this mean that the service will work when you try to talk to it?
Prometheus offers integrations with systems like PagerDuty, Email and Hipchat for alert notifications - but what if you want to do something that's not supported out of the box? The Alertmanager's generic web hook has got you covered.
It's easy to get carried away by the power of labels with Prometheus. In the extreme this can overload your Prometheus server, such as if you create a time series for each of hundreds of thousands of users. Thankfully there's a way to deal with this without having to turn off monitoring or deploy a new version of your code.
Traffic from users to your servers isn't a steady stream; it waxes and wanes over the day and week. The peak-to-mean ratio is your primary tool to avoid outages or unnecessary costs due to this.
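The ratio itself is simple arithmetic: the highest observed traffic divided by the average over the same period. A minimal sketch, assuming traffic is available as a list of queries-per-second samples (the function name and input shape are made up for this example):

```python
from statistics import mean
from typing import Sequence


def peak_to_mean(qps_samples: Sequence[float]) -> float:
    """Ratio of the highest traffic sample to the average traffic."""
    if not qps_samples:
        raise ValueError("need at least one traffic sample")
    average = mean(qps_samples)
    if average == 0:
        raise ValueError("mean traffic is zero, ratio is undefined")
    return max(qps_samples) / average
```

For samples of 50, 100 and 150 queries per second the mean is 100 and the peak is 150, giving a ratio of 1.5. If you only know your average load, multiplying it by this ratio gives an estimate of the peak you need capacity for.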
CPU, RAM, disk and network usage are basic machine metrics you should be monitoring. This is easy with Prometheus and the later releases of Debian.
The most common way to learn about the expiry date of your website's SSL certificate is after it has expired. The blackbox exporter combined with Prometheus can let you know well in advance, letting you renew your certificate before users complain.