Skip to main content

check-sensors

Overview

Monitors hardware health by detecting fan and power supply errors through IPMI (Intelligent Platform Management Interface) sensors.

Command-Line Options

OptionTypeDefaultDescription
--timeoutInteger300Command execution timeout in seconds
--sinkStringdo_nothingTelemetry sink destination
--sink-optsMultiple-Sink-specific configuration
--verbose-outFlagFalseDisplay detailed output
--log-levelChoiceINFODEBUG, INFO, WARNING, ERROR, CRITICAL
--log-folderString/var/log/fb-monitoringLog directory
--heterogeneous-cluster-v1FlagFalseEnable heterogeneous cluster support

Exit Conditions

Exit CodeDescription
OK (0)Feature flag disabled (killswitch active)
OK (0)No sensor errors detected
WARN (1)PSU redundancy issue detected OR ipmi-sensors command failed
CRITICAL (2)Fan error detected OR PSU error detected
UNKNOWN (3)Command execution error before parsing

Usage Examples

Basic Sensor Check

health_checks check-sensors [CLUSTER] app

With Telemetry

health_checks check-sensors \
--sink otel \
--sink-opts "log_resource_attributes={'attr_1': 'value1'}" \
[CLUSTER] \
app