Knowing which instances of your services and which machines in your fleet have stopped responding is a common requirement. Whether the aim is to get someone to investigate or to drive automation, in this post I'll look at how you can detect this with Prometheus.