A blog on monitoring, scale and operational sanity
Services are not distinguished by their metric names in Prometheus.
Scraping targets across datacenters will make things better, right?
How should you design your Alertmanager routes for flexibility and growth?
How can you view older data, while keeping your monitoring reliable?
Prometheus performance almost always comes down to one thing: label cardinality.
On a regular basis a potential Prometheus user says they need a different architecture to make things reliable or scalable. Let's look at that.
For online serving systems it's fairly well known that you should look for request rate, errors and duration. What about offline processing pipelines, though?
Having to reconstruct how far a failed cron job had gotten and what exact parameters it was run with can be error-prone and time-consuming. There is a better way.
There's no way that sharing metrics with your users or customers can go wrong. Right?
Data is not the same as information.