Reliable Insights

A blog on monitoring, scale and operational Sanity

January 8, 2018

Measuring the performance impact of Meltdown/Spectre with Prometheus

The world of infosec is alarmed right now over the recent security vulnerabilities disclosed by Google on Wednesday that affect Intel, AMD, and ARM chips.
The now infamous Meltdown and Spectre bugs allow sensitive information, including passwords and private keys, to be read from a system's memory.

Thankfully, fixes are being swiftly rolled out to patch these issues. However, they come at a performance cost, which we will use Prometheus to explore in this blogpost.

Read more

January 1, 2018

Rule groups for hierarchical aggregation

Prometheus 2.0 brought with it rule groups, making hierarchical aggregation easier than ever.

Read more

December 25, 2017

Keep It Simple scrape_interval-id

How many scrape intervals should you have in a Prometheus?

Read more

December 18, 2017

What’s the difference between group_interval, group_wait, and repeat_interval?

In this blogpost we try to clear up some confusion by outlining the key differences between three commonly confused alerting configuration options: group_interval, group_wait, and repeat_interval.

Read more

December 11, 2017

Why are Prometheus histograms cumulative?

Have you ever wondered why the buckets in histograms are not just counters of events that fall into each bucket?

Read more

December 4, 2017

Using time series as alert thresholds

Usually alert thresholds are hardcoded in the alert. In more sophisticated setups, it would be useful for the threshold to be parameterised based on another time series.

Read more

November 27, 2017

Black(box) Friday

While eager consumers flock to retailers both online and in-store for big savings on Black Friday, the infamous day after Thanksgiving, one must wonder about the cost incurred by companies accommodating both the crowds swarming their store floors and the increase in traffic from online shoppers.

In this blogpost we used Prometheus and the Blackbox exporter to observe the increase in latency experienced by some of the top online retailers in the US and UK.

Read more

November 20, 2017

Taking snapshots of Prometheus data

Prometheus 2.1 added an API endpoint to take snapshots. Let's see how to use it.

Read more

November 13, 2017

Are increasing timestamps Counters or Gauges?

Every now and then someone asks what metric type an increasing timestamp should be. Let's take a look.

Read more

November 8, 2017

New Features in Prometheus 2.0.0

Prometheus 2.0.0 is now out, following on from 1.8.0 last month. This brings significant improvements.

Read more
