We previously looked at the counter. How does the Prometheus gauge work?
A blog on monitoring, scale and operational Sanity
We previously looked at finding your biggest metrics, but that involves an expensive query. A new feature in Prometheus 1.3 offers another approach.
If you try and do max_over_time(rate(my_counter_total[5m])[1h]) or predict_linear(rate(my_counter_total[5m])[1d], 3600) in Prometheus, it won't work. How can you combine these functions?
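One common way around this is to record the inner rate as its own time series and then apply the outer function to that series. The following is a minimal sketch, assuming a recording rule that stores rate(my_counter_total[5m]) under the hypothetical name job:my_counter:rate5m. The rule name is an illustration, not something from the original post.

```promql
# Assumed recording rule (hypothetical name), evaluated regularly by Prometheus:
#   job:my_counter:rate5m = rate(my_counter_total[5m])

# The recorded series is an ordinary series, so a range selector can be applied to it.
max_over_time(job:my_counter:rate5m[1h])

# The same recorded series can feed predict_linear over a day of history.
predict_linear(job:my_counter:rate5m[1d], 3600)
```

On Prometheus 2.7 and later, subquery syntax should allow the same thing without a rule, for example max_over_time(rate(my_counter_total[5m])[1h:]), at the cost of re-evaluating the inner expression on every query.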
It can seem like a good idea to use recording rules to make the content of a time series more explicit, particularly for those not used to labels. However, this usually leads to confusing names and loses the benefits of labels.
I've previously mentioned that you shouldn't have the version of your software as either a target label or a label exposed on all metrics of your server, as it'll make using the metrics more challenging. What should you do instead?
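A widely used pattern here is a separate info-style metric with a constant value of 1 that carries the version as a label, joined onto other metrics only when needed. The sketch below assumes a hypothetical metric my_app_build_info with a version label, and that both metrics share the instance and job labels.

```promql
# Assumed exposition from the application (hypothetical metric name and value):
#   my_app_build_info{version="1.2.3"} 1

# Multiplying by 1 leaves the rate unchanged; group_left(version) copies the
# version label from the info metric onto each result.
  rate(my_counter_total[5m])
* on (instance, job) group_left (version)
  my_app_build_info
```

Because the version lives on only one series per target, an upgrade changes a single series rather than every metric the server exposes.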
There's a common misunderstanding when dealing with Prometheus counters, and that is how to apply aggregation and other operations when using the rate and other counter-only functions.
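The usual rule is to take the rate first and aggregate afterwards, so that rate sees each raw counter and can handle its resets. A minimal sketch, reusing the my_counter_total name from above and assuming an instance label to aggregate away:

```promql
# Take the per-series rate first, then sum across instances.
sum without (instance) (rate(my_counter_total[5m]))

# Summing the raw counters first would hide individual counter resets from rate,
# so the order should not be reversed.
```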
When you've broken a metric out into labels, a common need is to tell what proportion each label represents of the total. The group_left modifier of Prometheus is the key.
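As a rough illustration, the query below divides each per-label rate by the total across that label. It assumes a hypothetical counter http_requests_total broken out by a handler label and scraped from several instances.

```promql
# Left side: request rate per handler. Right side: total request rate.
# ignoring (handler) matches on the remaining labels, and group_left allows
# many handler series on the left to match the single total on the right.
  sum without (instance) (rate(http_requests_total[5m]))
/ ignoring (handler) group_left
  sum without (instance, handler) (rate(http_requests_total[5m]))
```

Each result is a fraction between 0 and 1, and the fractions for all handlers of a given job sum to 1.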