While not a problem specific to Prometheus, being affected by the open files ulimit is something you're likely to run into at some point.
Ulimits are an old Unix feature for limiting the resources a user can consume, such as processes, CPU time, and various types of memory. You can view your shell's current ulimits with ulimit -a:
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63796
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 95
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63796
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
The one of interest here is open files, which is 1024 on my desktop. For applications instrumented with Prometheus, there are metrics for both the limit and the current usage:
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 65
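To make these two gauges easier to reason about, here is a minimal Python sketch that reads them from text-format metrics and computes how much of the limit is in use. The function name fd_usage and the SAMPLE text are illustrative only, and the parsing assumes the metrics appear without labels, as in the output above. The same ratio can be expressed in PromQL as process_open_fds / process_max_fds.

```python
def fd_usage(exposition: str) -> float:
    """Return the open/max file descriptor ratio from text-format metrics."""
    values = {}
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and the HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] in ("process_open_fds", "process_max_fds"):
            values[parts[0]] = float(parts[1])
    return values["process_open_fds"] / values["process_max_fds"]


# Sample taken from the metrics shown above.
SAMPLE = """\
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 65
"""

print(f"{fd_usage(SAMPLE):.1%} of file descriptors in use")
```

A ratio creeping towards 1 is the signal that the process is about to run out of file descriptors.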
And Prometheus itself also logs the limits at startup (it's the soft limit that matters):
level=info ts=... caller=main.go:225 fd_limits="(soft=1024, hard=1048576)"
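For comparison, a process can read its own soft and hard limits directly. The sketch below uses Python's standard resource module on a Unix-like system. It is not how Prometheus does it, and the helper names nofile_limits and describe are made up for illustration.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-files limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(value):
    """Render a limit value, showing the 'no limit' sentinel as text."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


soft, hard = nofile_limits()
# The soft limit is the one the kernel enforces; the hard limit is the
# ceiling up to which an unprivileged process may raise the soft limit.
print(f"soft={describe(soft)}, hard={describe(hard)}")
```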
For other currently running processes you can use prlimit or look in /proc/PID/limits.
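As a rough sketch of the /proc approach, the following Python pulls the "Max open files" row out of a limits file on Linux. The function names parse_open_files and open_file_limits are hypothetical. Values are returned as strings because a limit may read "unlimited".

```python
from pathlib import Path


def parse_open_files(limits_text: str):
    """Extract (soft, hard) for 'Max open files' from /proc/PID/limits text."""
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # The remaining columns are: soft limit, hard limit, units.
            soft, hard = line[len(prefix):].split()[:2]
            return soft, hard
    raise ValueError("no 'Max open files' row found")


def open_file_limits(pid="self"):
    """Read the open-files limits of a running process (Linux only)."""
    return parse_open_files(Path(f"/proc/{pid}/limits").read_text())
```

Calling open_file_limits() with no argument inspects the calling process. Passing a PID inspects another process, subject to the usual permissions on /proc.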
A limit of 1024 per process is something I'm unlikely to run into in day-to-day desktop usage, but a server like Prometheus may have sockets open to hundreds of targets, HTTP connections to service discovery, graph requests coming in, and data files open, which between them could hit this limit. When a process runs out of file descriptors it tends not to end well, and Prometheus is not unusual in this regard. It and other Go programs will get a "too many open files" error when this happens.
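The failure is easy to reproduce safely. The Python sketch below (not Go, and unrelated to Prometheus's own code) temporarily lowers the soft limit of the current process, opens /dev/null until the kernel refuses, and then restores the limit. The function name exhaust_fds and the value 64 are arbitrary choices for illustration. The errno it sees, EMFILE, is the one behind the "too many open files" message.

```python
import errno
import os
import resource


def exhaust_fds(low=64):
    """Lower the soft open-files limit, then open files until it is hit.

    Returns the errno of the failure, or None if no failure occurred.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Only ever lower the limit for this experiment, never raise it.
    if soft != resource.RLIM_INFINITY:
        low = min(low, soft)
    resource.setrlimit(resource.RLIMIT_NOFILE, (low, hard))
    fds = []
    try:
        # With a limit of `low`, at most `low` descriptors can exist in
        # total, so `low + 1` opens are guaranteed to hit the limit.
        for _ in range(low + 1):
            fds.append(os.open(os.devnull, os.O_RDONLY))
        return None
    except OSError as exc:
        return exc.errno
    finally:
        for fd in fds:
            os.close(fd)
        # Restore the original soft limit.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


code = exhaust_fds()
if code is not None:
    print(errno.errorcode[code], "-", os.strerror(code))
```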
So how do you increase the limit to avoid this? The first thing to be aware of is that increasing a ulimit beyond the hard limit requires elevated privileges, and thus changing ulimits permanently usually involves changing files in /etc.
On an Ubuntu system the limits usually come from /etc/security/limits.conf where you can set the nofile limit across users or groups. If you're using systemd then /etc/systemd/system.conf has a DefaultLimitNOFILE you can specify across all units, or you can set it on a per-unit basis with LimitNOFILE in the Service section. In an emergency, you can also use prlimit to change the limits for a running process.
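The emergency route can also be scripted. This is a minimal sketch assuming Linux and Python's resource.prlimit, which wraps the same system call as the prlimit command. The function name raise_soft_nofile is hypothetical. It only raises the soft limit up to the existing hard limit. Going past the hard limit, or changing another user's process, needs elevated privileges.

```python
import os
import resource


def raise_soft_nofile(pid, wanted):
    """Raise a running process's soft open-files limit, capped at its hard limit.

    Returns the (soft, hard) pair in effect afterwards. Linux only.
    """
    soft, hard = resource.prlimit(pid, resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    # Leave the process alone if its limit is already high enough.
    if soft != resource.RLIM_INFINITY and wanted > soft:
        resource.prlimit(pid, resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.prlimit(pid, resource.RLIMIT_NOFILE)


# Asking for 0 never exceeds the current soft limit, so this call only
# reports the limits of the current process without changing them.
print(raise_soft_nofile(os.getpid(), 0))
```

A change made this way lasts only as long as the process does, so the permanent configuration still needs fixing.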
Something to watch out for is that you can end up with different ulimits if you restart a process from an SSH session versus having it started at boot. Accordingly, it's wise to always double-check what your ulimits actually are to ensure your desired configuration has been applied.
As to what to set the file ulimit to, I'd suggest something large like a million that you're unlikely to ever hit. File descriptors are not a scarce resource.