The node exporter exposes the various hardware monitoring metrics of Linux, including temperature, fans, and voltages.
You've probably come across the sensors
command from lm-sensors which on my desktop produces output like:
coretemp-isa-0000 Adapter: ISA adapter Package id 0: +46.0°C (high = +85.0°C, crit = +105.0°C) Core 0: +39.0°C (high = +85.0°C, crit = +105.0°C) Core 1: +42.0°C (high = +85.0°C, crit = +105.0°C) Core 2: +47.0°C (high = +85.0°C, crit = +105.0°C) Core 3: +44.0°C (high = +85.0°C, crit = +105.0°C) acpitz-virtual-0 Adapter: Virtual device temp1: +27.8°C (crit = +106.0°C) temp2: +29.8°C (crit = +106.0°C) asus-isa-0000 Adapter: ISA adapter cpu_fan: 0 RPM
The node exporter has the same information under node_hwmon_*
. This machine only happens to have temperature and a (non-reporting) fan, so let's look at those first.
Temperature is the main metric and is in node_hwmon_temp_celsius
:
# HELP node_hwmon_temp_celsius Hardware monitor for temperature (input) # TYPE node_hwmon_temp_celsius gauge node_hwmon_temp_celsius{chip="acpitz",sensor="temp1"} 27.8 node_hwmon_temp_celsius{chip="acpitz",sensor="temp2"} 29.8 node_hwmon_temp_celsius{chip="platform_coretemp_0",sensor="temp1"} 47 node_hwmon_temp_celsius{chip="platform_coretemp_0",sensor="temp2"} 40 node_hwmon_temp_celsius{chip="platform_coretemp_0",sensor="temp3"} 42 node_hwmon_temp_celsius{chip="platform_coretemp_0",sensor="temp4"} 47 node_hwmon_temp_celsius{chip="platform_coretemp_0",sensor="temp5"} 42
There's also node_hwmon_temp_crit_celsius
and node_hwmon_temp_crit_alarm_celsius
for alerting, however in a Prometheus setup you'd usually have thresholds in your alerting rules.
Those names aren't quite what sensors
is showing. While lm-sensors does some tweaking of names in the background, the same raw information is in another two metrics from the node exporter:
# HELP node_hwmon_sensor_label Label for given chip and sensor # TYPE node_hwmon_sensor_label gauge node_hwmon_sensor_label{chip="platform_coretemp_0",label="core_0",sensor="temp2"} 1 node_hwmon_sensor_label{chip="platform_coretemp_0",label="core_1",sensor="temp3"} 1 node_hwmon_sensor_label{chip="platform_coretemp_0",label="core_2",sensor="temp4"} 1 node_hwmon_sensor_label{chip="platform_coretemp_0",label="core_3",sensor="temp5"} 1 node_hwmon_sensor_label{chip="platform_coretemp_0",label="package_id_0",sensor="temp1"} 1 node_hwmon_sensor_label{chip="platform_eeepc_wmi",label="cpu_fan",sensor="fan1"} 1# HELP node_hwmon_chip_names Annotation metric for human-readable chip names # TYPE node_hwmon_chip_names gauge node_hwmon_chip_names{chip="acpitz",chip_name="acpitz"} 1 node_hwmon_chip_names{chip="platform_coretemp_0",chip_name="coretemp"} 1 node_hwmon_chip_names{chip="platform_eeepc_wmi",chip_name="asus"} 1
Using group_left
you can join the label
and chip_name
labels in PromQL, though beware that they may be missing in some cases so you need to be resilient to that.
For fans there's node_hwmon_fan_rpm
for the speed, and node_hwmon_pwm_enable
to indicate if pulse-width modulation (a way of controlling fan speed) is enabled.
Voltage will be under node_hwmon_in_volts
and node_hwmon_cpu_volts
. Current is node_hwmon_curr_amps
.
Power usage can be reported directly by the hardware, in which case it'll be node_hwmon_energy_watt
. It can also an energy counter node_hwmon_energy_joule_total
, and you'll get power in Watts (which are joules per second) if you use PromQL's rate()
on it.
The humidity ratio can be found in node_hwmon_humidity
.
node_hwmon_fault
and node_hwmon_alarm
can indicate hardware issues, and node_hwmon_beep_enabled
if beeping is enabled.
As these metrics are coming from a wide range of different hardware and associated software, the metrics available will vary across machines. For example some metrics may have a input
in the name, or have additional metrics indicating the minimum and maximum values over some time period. Support for more metrics is likely to be added to the node exporter over time too.
Want to know if something other than LP0 is on fire? Contact us.
No comments.