There's more to selecting a piece of software than headline numbers. Read more
A blog on monitoring, scale and operational Sanity
Choosing what range to use with the rate function can be a bit subtle.
Graphs from Prometheus use the query_range endpoint, and there's a non-trivial amount of confusion about it: many assume it's more magic than it actually is.
For online serving systems it's fairly well known that you should look for request rate, errors and duration. What about offline processing pipelines though?
Using PromQL you can combine metrics for analysis.
How should a monitoring system deal with metrics no longer being there?
The node exporter and tools like iostat and sar use the same core data, but how do they relate to each other?