📄️ Getting Started
GCM Health Checks is a Python CLI with a suite of Health Checks
📄️ Kubernetes Deployment
GCM Health Checks can be deployed on Kubernetes GPU clusters using Node Problem Detector (NPD) as a DaemonSet. NPD runs each GCM health check as a subprocess at a configurable interval and reports results as Kubernetes node conditions.
🗃️ Health Checks
22 items
📄️ Telemetry Types
GCM supports two types of telemetry
📄️ Adding New Health Check
GCM Health Checks are designed to be easily extensible. Each check follows the same patterns, so adding a new one is mostly about copying the right structure and plugging in your logic. This guide walks through each step.
🗃️ Exporters
6 items
📄️ Deep dive into Health Checks code
All Health Checks share some boilerplate code. In this section we'll go through some of the Health Checks code and annotate each piece with comments explaining what it does.
📄️ Adding New Exporter
GCM supports multiple exporters, each responsible for exporting data to a different destination. To add a new exporter, you'll need to:
📄️ Contributing
Check out the GCM Health Checks contributing guide here.