check-syslogs
System log analysis tool that detects hardware and network errors by searching for critical patterns in dmesg and syslog files.
Available Health Checks
| Check | Purpose | Key Feature |
|---|---|---|
| link-flaps | Network link stability | Detect InfiniBand and Ethernet link flaps |
| xid | GPU error detection | Identify NVIDIA XID hardware errors |
| io-errors | Storage health validation | Detect NVMe I/O errors |
Quick Start
# Link flap check
health_checks check-syslogs link-flaps [CLUSTER] app
# XID error check
health_checks check-syslogs xid [CLUSTER] app
# I/O error check
health_checks check-syslogs io-errors [CLUSTER] app