Skip to main content

check-syslogs

System log analysis tool that detects hardware and network errors by searching for critical patterns in dmesg and syslog files.

Available Health Checks

CheckPurposeKey Feature
link-flapsNetwork link stabilityDetect InfiniBand and Ethernet link flaps
xidGPU error detectionIdentify NVIDIA XID hardware errors
io-errorsStorage health validationDetect NVMe I/O errors
mceCPU/memory error detectionDetect Machine Check Exceptions
pcie-aerPCIe bus error detectionDetect PCIe Advanced Error Reporting errors

Quick Start

# Link flap check
health_checks check-syslogs link-flaps [CLUSTER] app

# XID error check
health_checks check-syslogs xid [CLUSTER] app

# I/O error check
health_checks check-syslogs io-errors [CLUSTER] app

# MCE error check
health_checks check-syslogs mce [CLUSTER] app

# PCIe AER error check
health_checks check-syslogs pcie-aer [CLUSTER] app