In the previous post we looked at handling the case where all the targets for a job had disappeared. What if you want to alert on specific metrics disappearing from one target?
It's best to avoid metrics that appear and disappear, but it can happen that certain subsystems of a target don't always return all the metrics they should. You can detect this situation by noticing that the up metric exists, but the metric in question does not. In addition, you will want to check that up is 1, so that the alert doesn't fire spuriously when the target is down. If you already have down alerts for the job, there's no need to spam yourself with additional ones about missing metrics too.
The alert would look something like:
    groups:
    - name: example
      rules:
      - alert: MyJobMissingMyMetric
        expr: up{job="myjob"} == 1 unless my_metric
        for: 10m
This uses the unless operator, which returns each series on the left-hand side unless there's a series with matching labels on the right-hand side.
As with other binary operators, you can use ignoring if your metric has instrumentation labels that you wish to ignore. For example, unless ignoring(method) would be appropriate if my_metric had a method label.
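As a minimal sketch of that variant, assuming my_metric carries only the target labels plus a method label, the rule might look like this. The alert name and the 10m duration are carried over from the example above; the labels on your own metric may differ.

```yaml
groups:
- name: example
  rules:
  - alert: MyJobMissingMyMetric
    # Assumption: my_metric has a "method" instrumentation label that up lacks.
    # ignoring(method) drops that label when matching the two sides.
    expr: up{job="myjob"} == 1 unless ignoring(method) my_metric
    for: 10m
```

If the metric has several instrumentation labels, it may be simpler to list the labels to match on instead, for example unless on(job, instance) my_metric.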
Want advice on how to avoid missing metrics in the first place? Contact us.