check-running-process

Overview

Validates that one or more specified processes are currently running on the system. The check uses the ps command to detect active processes and automatically filters out grep processes and the check's own process (self-detection), so that neither is counted as a match.
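The detection described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the function names `matching_lines` and `is_running`, the `ps -A -o pid,args` invocation, and the substring match on the command line are all assumptions made for the example.

```python
import os
import subprocess


def matching_lines(ps_output: str, name: str, self_pid: int) -> list[str]:
    """Return ps lines that mention `name`, skipping grep and our own process.

    Hypothetical helper: expects `ps -A -o pid,args` style output.
    """
    matches = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Skip the header row and any line without "<pid> <args>".
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, args = int(parts[0]), parts[1]
        if pid == self_pid:
            continue  # self-detection: our own command line contains the name
        if args.split()[0].endswith("grep"):
            continue  # a grep for the name is not the process itself
        if name in args:
            matches.append(line)
    return matches


def is_running(name: str, timeout: int = 300) -> bool:
    """Run ps (assumed flags) and report whether `name` appears to be running."""
    out = subprocess.run(
        ["ps", "-A", "-o", "pid,args"],
        capture_output=True,
        text=True,
        timeout=timeout,  # mirrors the documented --timeout default of 300 s
        check=True,
    ).stdout
    return bool(matching_lines(out, name, os.getpid()))
```

Keeping the parsing in a pure function separate from the `ps` call makes the filtering rules easy to test against captured output.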

Command-Line Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| --process-name / -p | String | Required | Process name to verify (repeatable) |
| --timeout | Integer | 300 | Command execution timeout in seconds |
| --sink | String | do_nothing | Telemetry sink destination |
| --sink-opts | Multiple | - | Sink-specific configuration |
| --verbose-out | Flag | False | Display detailed output |
| --log-level | Choice | INFO | DEBUG, INFO, WARNING, ERROR, CRITICAL |
| --log-folder | String | /var/log/fb-monitoring | Log directory |
| --heterogeneous-cluster-v1 | Flag | False | Enable heterogeneous cluster support |

Exit Conditions

| Exit Code | Condition |
| --- | --- |
| OK (0) | Feature flag disabled (killswitch active) |
| OK (0) | All processes are running |
| WARN (1) | Command execution failed |
| CRITICAL (2) | Process not running |

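Multi-Process: When several process names are given, each is checked individually and the overall exit code is the maximum (worst) of the individual results. For example, if one process is missing (CRITICAL) and the others are running (OK), the check exits with CRITICAL (2).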
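The "worst result wins" rule can be illustrated with a short sketch. The `ExitCode` enum and `overall_exit_code` function are hypothetical names chosen for this example; only the numeric values (0, 1, 2) come from the table above. Returning OK for an empty list is also an assumption, since --process-name is required and that case should not occur in practice.

```python
from enum import IntEnum
from typing import Iterable


class ExitCode(IntEnum):
    """Exit codes from the table above; ordering doubles as severity."""
    OK = 0
    WARN = 1
    CRITICAL = 2


def overall_exit_code(results: Iterable[ExitCode]) -> ExitCode:
    """Combine per-process results by taking the maximum (worst) code."""
    # default=OK covers an empty input, which is an assumption of this sketch.
    return max(results, default=ExitCode.OK)


if __name__ == "__main__":
    per_process = {
        "dcgmi": ExitCode.OK,
        "nvidia-smi": ExitCode.OK,
        "slurmd": ExitCode.CRITICAL,  # not running
    }
    worst = overall_exit_code(per_process.values())
    print(f"{worst.name} ({worst.value})")
```

Because the codes are ordered by severity, a plain `max` is enough; no separate priority table is needed.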

Usage Examples

check-running-process - Single Process

health_checks check-process check-running-process \
--process-name nvidia-smi \
--sink stdout \
[CLUSTER] \
app

check-running-process - Multiple Processes

health_checks check-process check-running-process \
--process-name dcgmi \
--process-name nvidia-smi \
--process-name slurmd \
--sink otel \
--sink-opts "log_resource_attributes={'attr_1': 'value1'}" \
[CLUSTER] \
app

check-running-process - Custom Timeout

health_checks check-process check-running-process \
--process-name monitoring-agent \
--timeout 60 \
--sink stdout \
[CLUSTER] \
app

check-running-process - Debug Mode

health_checks check-process check-running-process \
--process-name myapp \
--log-level DEBUG \
--verbose-out \
--sink file --sink-opts filepath=/var/log/process_check.json \
[CLUSTER] \
app