Skip to main content

check-syslogs

System log analysis tool that detects hardware and network errors by searching for critical patterns in dmesg and syslog files.

Available Health Checks

CheckPurposeKey Feature
link-flapsNetwork link stabilityDetect InfiniBand and Ethernet link flaps
xidGPU error detectionIdentify NVIDIA XID hardware errors
io-errorsStorage health validationDetect NVMe I/O errors

Quick Start

# Link flap check
health_checks check-syslogs link-flaps [CLUSTER] app

# XID error check
health_checks check-syslogs xid [CLUSTER] app

# I/O error check
health_checks check-syslogs io-errors [CLUSTER] app