Prometheus 2.12.0 is now out, following on from 2.11.0 with many fixes and improvements.
A blog on monitoring, scale and operational Sanity
Choosing what range to use with the rate function can be a bit subtle.
PromQL is superb for metrics alerting and graphing needs, but for heavier statistical work there are better options.
On a regular basis a potential Prometheus user says they need a different architecture to make things reliable or scalable. Let's look at that.
Graphs from Prometheus use the query_range endpoint, and there's a fair amount of confusion about it, with many assuming it's more magic than it actually is.
For online serving systems it's fairly well known that you should look for request rate, errors and duration. What about offline processing pipelines though?
Having to maintain dashboards for every Prometheus server you have would be a bit annoying. Thankfully Grafana has a feature for this.
The machine knows its own name. Couldn't Prometheus use it?