cluster-availability

Overview

Monitors the percentage of cluster nodes in DOWN or DRAIN states and compares it against the warning and critical thresholds. Reports a cluster-wide health status for availability monitoring.
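
As a rough illustration of the metric described above, the following Python sketch computes the share of nodes that are DOWN or DRAIN. It is not the check's actual implementation. The function name `unavailable_percentage`, the assumption that node states arrive as plain strings, and the substring matching (so that values such as `idle+drain` count as unavailable) are all hypothetical.

```python
from typing import Iterable

# Assumption: a node counts as unavailable when its state string contains
# one of these markers (case-insensitive). The real check may parse states
# differently.
UNAVAILABLE_MARKERS = ("DOWN", "DRAIN")


def unavailable_percentage(states: Iterable[str]) -> float:
    """Return the percentage (0-100) of nodes in a DOWN or DRAIN state."""
    state_list = list(states)
    if not state_list:
        # No nodes reported: treat as 0% unavailable to avoid dividing by zero.
        return 0.0
    bad = sum(
        1
        for state in state_list
        if any(marker in state.upper() for marker in UNAVAILABLE_MARKERS)
    )
    return 100.0 * bad / len(state_list)
```

With four nodes of which one is DOWN and one is DRAIN, this sketch yields 50.0, which would exceed both default thresholds.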

Command-Line Options

| Option | Type | Default | Description |
|---|---|---|---|
| --critical_threshold | Integer (0-100) | 25 | Percentage of unavailable nodes for CRITICAL status |
| --warning_threshold | Integer (0-100) | 15 | Percentage of unavailable nodes for WARN status |
| --timeout | Integer | 300 | Command execution timeout in seconds |
| --sink | String | do_nothing | Telemetry sink destination |
| --sink-opts | Multiple | - | Sink-specific configuration |
| --verbose-out | Flag | False | Display detailed output |
| --log-level | Choice | INFO | DEBUG, INFO, WARNING, ERROR, CRITICAL |
| --log-folder | String | /var/log/fb-monitoring | Log directory |
| --heterogeneous-cluster-v1 | Flag | False | Enable heterogeneous cluster support |

Exit Conditions

| Exit Code | Condition |
|---|---|
| OK (0) | Feature flag disabled (killswitch active) |
| OK (0) | Bad node percentage <= warning_threshold |
| WARN (1) | Bad node percentage > warning_threshold OR command failed |
| CRITICAL (2) | Bad node percentage > critical_threshold |
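
The exit conditions above can be read as a small decision function. The Python sketch below is one possible interpretation, not the tool's source. It assumes the killswitch is evaluated first, that a failed command maps to WARN before any percentage is considered, and that the critical threshold is checked before the warning threshold, so a value above both reports CRITICAL. The names `ExitCode` and `classify` are hypothetical.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    OK = 0
    WARN = 1
    CRITICAL = 2


def classify(
    bad_pct: float,
    warning_threshold: int = 15,   # documented default for --warning_threshold
    critical_threshold: int = 25,  # documented default for --critical_threshold
    command_failed: bool = False,
    feature_enabled: bool = True,
) -> ExitCode:
    """Map a bad-node percentage to an exit code (assumed evaluation order)."""
    if not feature_enabled:
        # Killswitch active: the check reports OK without evaluating anything.
        return ExitCode.OK
    if command_failed:
        # A failed command is reported as WARN, per the table.
        return ExitCode.WARN
    if bad_pct > critical_threshold:
        return ExitCode.CRITICAL
    if bad_pct > warning_threshold:
        return ExitCode.WARN
    return ExitCode.OK
```

Under these assumptions, a percentage exactly equal to a threshold does not trip it: 15 is OK and 25 is WARN with the defaults, because the table uses strict `>` comparisons.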

Usage Examples

cluster-availability - Default Thresholds

health_checks check-service cluster-availability \
[CLUSTER] \
app

cluster-availability - Custom Thresholds

health_checks check-service cluster-availability \
--critical_threshold 30 \
--warning_threshold 20 \
[CLUSTER] \
app

cluster-availability - With Telemetry

health_checks check-service cluster-availability \
--sink otel \
--sink-opts "log_resource_attributes={'attr_1': 'value1'}" \
[CLUSTER] \
app

cluster-availability - Debug Mode

health_checks check-service cluster-availability \
--log-level DEBUG \
--verbose-out \
[CLUSTER] \
app
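
For callers that script the check, the invocations above could be wrapped as follows. This is a minimal sketch that only reuses flags and positional arguments shown on this page. The helper names `build_command` and `run_check` are hypothetical, the trailing `app` argument is copied verbatim from the examples, and mapping any other return code to `UNKNOWN` is an assumption.

```python
import subprocess
from typing import List, Optional

# Exit codes as documented in the Exit Conditions table.
STATUS_BY_CODE = {0: "OK", 1: "WARN", 2: "CRITICAL"}


def build_command(
    cluster: str,
    warning: Optional[int] = None,
    critical: Optional[int] = None,
    debug: bool = False,
) -> List[str]:
    """Assemble the argv list in the same order as the usage examples."""
    cmd = ["health_checks", "check-service", "cluster-availability"]
    if critical is not None:
        cmd += ["--critical_threshold", str(critical)]
    if warning is not None:
        cmd += ["--warning_threshold", str(warning)]
    if debug:
        cmd += ["--log-level", "DEBUG", "--verbose-out"]
    # Positional arguments: the cluster name, then "app" as in the examples.
    cmd += [cluster, "app"]
    return cmd


def run_check(cluster: str, **kwargs) -> str:
    """Run the check and translate its exit code into a status label."""
    # check=False: non-zero exit codes are expected results, not errors.
    result = subprocess.run(build_command(cluster, **kwargs), check=False)
    return STATUS_BY_CODE.get(result.returncode, "UNKNOWN")
```

Passing the arguments as a list avoids shell quoting issues, which matters once options such as `--sink-opts` carry quoted values.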