Skip to main content

check-dstate

Overview

Detects processes stuck in uninterruptible sleep (D-state) that exceed a specified age threshold. Identifies processes blocked on I/O operations or kernel calls that may indicate system issues.

D-state processes are typically waiting on disk I/O, network operations, or other kernel resources. Prolonged D-state indicates potential hardware or driver issues.

Dependencies

  • strace: System call tracer for process inspection

Package Installation

# RHEL/CentOS
yum install procps-ng strace

# Ubuntu/Debian
apt-get install procps strace

Command-Line Options

OptionTypeDefaultDescription
--elapsedInteger300Minimum process age (seconds) to trigger failure
--process-nameString-Filter by process name (regex supported); can specify multiple
--timeoutInteger300Command execution timeout in seconds
--sinkStringdo_nothingTelemetry sink destination
--sink-optsMultiple-Sink-specific configuration
--verbose-outFlagFalseDisplay detailed output
--log-levelChoiceINFODEBUG, INFO, WARNING, ERROR, CRITICAL
--log-folderString/var/log/fb-monitoringLog directory
--heterogeneous-cluster-v1FlagFalseEnable heterogeneous cluster support

Exit Conditions

Exit CodeCondition
OK (0)Feature flag disabled (killswitch active)
OK (0)No D-state processes exceed age threshold
WARN (1)Exception during process check execution
CRITICAL (2)One or more processes stuck in D-state

Usage Examples

Basic Check (Default 5 Minutes)

health_checks check-process check-dstate [CLUSTER] app

Custom Age Threshold (10 Minutes)

health_checks check-process check-dstate \
--elapsed 600
[CLUSTER] \
app

Filter Specific Process

health_checks check-process check-dstate \
--process-name "nfs.*" \
--elapsed 180 \
[CLUSTER] \
app

Multiple Process Filters

health_checks check-process check-dstate \
--process-name "nfs.*" \
--process-name "mount.*" \
--process-name "umount.*" \
--elapsed 300 \
[CLUSTER] \
app

With Telemetry

health_checks check-process check-dstate \
--sink otel \
--sink-opts "log_resource_attributes={'attr_1': 'value1'}" \
--verbose-out \
[CLUSTER] \
prolog

Debug Mode

health_checks -dstate \
--log-level DEBUG \
--verbose-out
[CLUSTER] \
app