How would you check that an HTTP endpoint is returning a 204?
A blog on monitoring, scale and operational Sanity
Trying to improve alerting piecemeal can be difficult.
Alert thresholds can be surprisingly tricky to get right.
For online serving systems it's fairly well known that you should look for request rate, errors and duration. What about offline processing pipelines though?
Having to reconstruct how far a failed cron job had gotten and what exact parameters it was run with can be error-prone and time-consuming. There is a better way.
Users are often confused about why resolved notifications don't contain updated annotation values. Let's dig into why.
The labels of an alert are its identity, so you have to be a little careful about what you put in there.
In the previous post we looked at testing rules. You can also test alerts.
It's easy to check if HTTP and HTTPS endpoints are working with the Blackbox Exporter.
In a previous post we looked at dealing with reaching the open file limit. How about alerting before it happens?