A blog on monitoring, scale and operational Sanity
When you've a complicated manual process that you want to improve, your first instinct as a developer might be to jump in and start coding. Hold off a bit: the first step is to document.
When running a production system there's an endless stream of issues that have the potential to cause you significant hassle. How should you deal with this?
Designing and building Prometheus involves hundreds of technical decisions. Every one of them matters in building a sustainable, consistent ecosystem. Today, let's look at one small decision that the Prometheus developers made in Consul service discovery.
Whether you're on bare metal or using a cloud provider, there's a question you should always be able to answer. What machines do I have, and what is meant to be running on them?
Failed requests are a fact of life: network weirdness and machine failures are inevitable. It can be tempting to simply retry the request when this happens, but this may cause more harm than good.
When getting something working for the first time, it's easy to get caught up in Docker or Vagrant. Before you run it in production with full access and user data, do you know what code you're running?
When starting out it's easy to think that you need Docker, Kubernetes, Microservices, Continuous Deployment and all the other trending topics on Hacker News/Reddit/Lobsters. What do you really need?
This week Microsoft removed unlimited storage from their OneDrive offering because, surprise surprise, people were using it as unlimited storage. Does your product have features that cost you time and money without your users paying accordingly?
Your service's traffic is steadily growing, and latency has increased a bit, but it's within reason. One day you launch a new customer and latency jumps through the roof, causing an outage. What happened? You hit the knee.
I enjoy cooking and regularly make scrumptious meals for myself. Does this mean that I'm capable of running a busy kitchen? Of course not! So why assume that all software engineers can automatically run production services?