Adding a New Collector
GCM is designed to be easily extensible. To monitor something new with GCM, you'll need to:
- Add a new CLI command.
See main.add_command in gcm.py; a sketch of the pattern follows.
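For illustration only, the registration might look like the sketch below. This assumes GCM's CLI is built on Click (which the main.add_command call suggests, but is not confirmed here); the my_collection command name is made up.

```python
# Hypothetical sketch of the main.add_command pattern, assuming a Click-based CLI.
import click


@click.group()
def main() -> None:
    """Simplified stand-in for the GCM command line entry point in gcm.py."""


@click.command()
def my_collection() -> None:
    """Hypothetical new collector subcommand."""
    click.echo("collecting...")


# Register the new subcommand so `gcm my_collection ...` works.
main.add_command(my_collection, name="my_collection")

if __name__ == "__main__":
    main()
```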
- Copy/paste existing code to get the Python scaffolding that your CLI command runs.
This step requires you to create a new file under monitoring/cli, then define the base structure so your command gets all of the CLI options GCM offers. In practice this means copying most of the def main function; see https://github.com/facebookresearch/gcm/blob/main/gcm/monitoring/cli/sacctmgr_qos.py#L144-L212. A rough skeleton is sketched below.
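As a rough illustration, the copied skeleton could end up looking like this. It is patterned on the linked file rather than quoted from it: my_collection is a placeholder name, and the options simply mirror the flags used in the verification step below.

```python
# monitoring/cli/my_collection.py -- hypothetical skeleton modeled on
# sacctmgr_qos.py; names and options here are illustrative, not the real API.
import logging

import click


@click.command()
@click.option("--sink", default="stdout", help="Where to export collected data.")
@click.option("--once", is_flag=True, help="Run one collection pass and exit.")
@click.option("--log-level", default="INFO", help="Logging verbosity.")
def main(sink: str, once: bool, log_level: str) -> None:
    """Collect <your data> and export it to the configured sink."""
    logging.basicConfig(level=getattr(logging, log_level.upper(), logging.INFO))
    logging.debug("sink=%s once=%s", sink, once)
    # Next step: build data_collection_tasks and hand them to
    # run_data_collection_loop (see the sketch in the following step).


if __name__ == "__main__":
    main()
```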
- Add a call to run_data_collection_loop.
The code you copied in step 2 already calls run_data_collection_loop; edit that call so the new collection receives the right parameters.
data_collection_tasks is the relevant argument; it receives a list of tuples, each containing:
- a generator of dataclasses (1).
- a sink type (2).
Each tuple defines a single collection, so passing multiple tuples runs multiple collections sequentially; a sketch of the edited call follows this list.
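Concretely, the edited call might take a shape like the sketch below. The real signature of run_data_collection_loop lives in the code you copied in step 2; the stub here only stands in so the example runs, and the "counters" sink type string is invented.

```python
# Hypothetical shape of the data_collection_tasks plumbing.
from dataclasses import dataclass
from typing import Iterator


@dataclass
class QueueSample:  # stand-in for a schema defined under gcm/schemas
    queue: str
    pending_jobs: int


def queue_samples() -> Iterator[QueueSample]:
    # A generator: one sample at a time, nothing buffered in memory.
    for queue in ("batch", "debug"):
        yield QueueSample(queue=queue, pending_jobs=0)


def run_data_collection_loop(data_collection_tasks):
    """Stub standing in for GCM's real loop, just to make the sketch runnable."""
    for generator, sink_type in data_collection_tasks:
        for sample in generator:
            print(sink_type, sample)


# Each (generator, sink_type) tuple is one collection; multiple tuples
# run sequentially.
run_data_collection_loop(
    data_collection_tasks=[(queue_samples(), "counters")],
)
```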
A few things to keep in mind as you implement the generator of dataclasses (1):
- A generator scales well: it yields one item at a time, so you never load all the data into memory the way a fully materialized list would.
- Create the required schemas under gcm/schemas.
The sink type (2) tells the exporter what type of data you're producing; see Telemetry types supported by GCM. The convention is that a generator (1) produces only one of the supported types. A combined sketch of (1) and (2) follows.
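For instance, a hypothetical schema module and its matching generator might be split like this; the file path, field names, and telemetry type are all invented for illustration.

```python
# gcm/schemas/node_health.py -- hypothetical schema module.
from dataclasses import dataclass


@dataclass
class NodeHealth:
    hostname: str
    healthy: bool


# Elsewhere, the matching generator. It yields lazily, so a large cluster
# never has to be materialized as one big in-memory list, and by convention
# it emits only this one schema (one telemetry type per generator).
def node_health_samples():
    for host in ("node000", "node001"):  # in reality: query your data source
        yield NodeHealth(hostname=host, healthy=True)
```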
You can run the GCM CLI to confirm that this step works:

```
$ gcm <your_collection_name> --help
...
$ gcm <your_collection_name> --sink=stdout --once --log-level=DEBUG
...
```
- Update configuration files/daemons to trigger the new collection to run.
Now all that's left is to deploy the service. This may involve config changes, building binaries, or creating new daemons/pods/containers; a hypothetical example for the daemon case follows.
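As one concrete possibility (assuming a plain systemd deployment rather than pods or containers), a unit file could look like the sketch below. Everything here is hypothetical except the gcm invocation, which mirrors the CLI shown above.

```ini
# /etc/systemd/system/gcm-my-collection.service -- hypothetical unit file.
[Unit]
Description=GCM collector for <your_collection_name>
After=network-online.target

[Service]
# The collection loop runs forever, so no --once here; pick the sink
# your deployment actually uses.
ExecStart=/usr/local/bin/gcm <your_collection_name> --sink=stdout --log-level=INFO
Restart=on-failure

[Install]
WantedBy=multi-user.target
```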
For FB-internal deployment, see here.