Reliable Insights

A blog on monitoring, scale and operational Sanity

January 5, 2016

Instrumenting Java with Prometheus

When getting started with a new library, it helps to have an example to work from. Let's look at a simple example of using Prometheus instrumentation in Java.

Read more

December 30, 2015

Understanding Machine CPU usage

High CPU load is a common cause of issues. Let's look at how to dig into it with Prometheus and the Node exporter.

Read more

December 21, 2015

You look good, have you lost machines?

Whether you're on bare metal or using a cloud provider, there's a question you should always be able to answer: what machines do I have, and what is meant to be running on them?

Read more

December 16, 2015

Which are my biggest metrics?

As your Prometheus usage grows and the server starts to get loaded, it's useful to know which metrics are using the most resources so that you can re-evaluate their utility.

Read more

December 13, 2015

Reloading Prometheus’ Configuration

A common question from new users is whether they need to restart Prometheus every time they change the configuration. The good news is that you don't, so your monitoring continues uninterrupted as your system changes.

Read more

December 11, 2015

It’s overloaded? Try harder!

Failed requests are a fact of life: network weirdness and machine failures are inevitable. It can be tempting to simply retry the request when this happens, but this may do more harm than good.

Read more

December 3, 2015

Exporting to Graphite with the Prometheus Python Client

Prometheus doesn't try to lock you into its ecosystem. In fact, it makes it straightforward to get data both in and out. This reduces operational overhead and allows for smoother transitions between monitoring systems.

Read more

December 1, 2015

Automatically monitoring EC2 Instances

Having to manually update a list of machines in a configuration file gets annoying after a while. One of the features of Prometheus is service discovery, allowing you to automatically discover and monitor your EC2 instances!

Read more

November 28, 2015

Do you know what software you’re running?

When getting something working for the first time, it's easy to get caught up in Docker or Vagrant. Before you run it in production with full access and user data, do you know what code you're running?

Read more

November 23, 2015

Monitoring directory sizes with the Textfile Collector

The node exporter includes many metrics out of the box, but it can't possibly cover all use cases. That's where the textfile collector comes in, allowing you to extend machine instrumentation for your use case.

Read more
