sacct_publish

Overview

Transforms sacct output in parsable2 format and publishes it to the configured sink. It takes raw sacct command output, parses the pipe-delimited (parsable2) format, converts timestamps to timezone-aware ISO 8601 strings, enriches each record with cluster metadata, and publishes the records to the sink in batches with retry logic.
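
As an illustration of the parsing step only, the sketch below turns parsable2 text (a header line followed by delimiter-separated rows) into one dict per job. The function name `parse_parsable2` and the sample data are hypothetical and are not taken from the tool's source; the real implementation may differ.

```python
from typing import Dict, Iterator, List


def parse_parsable2(text: str, delimiter: str = "|") -> Iterator[Dict[str, str]]:
    """Yield one dict per data row, keyed by the header fields.

    Hypothetical helper for illustration; not the tool's actual parser.
    """
    lines = [line for line in text.splitlines() if line]
    if not lines:
        return
    header: List[str] = lines[0].split(delimiter)
    for line in lines[1:]:
        values = line.split(delimiter)
        # A row whose field count differs from the header cannot be mapped safely.
        if len(values) != len(header):
            raise ValueError(
                f"expected {len(header)} fields, got {len(values)}: {line!r}"
            )
        yield dict(zip(header, values))


sample = "JobID|JobName|State\n123|train|COMPLETED\n124|eval|FAILED\n"
rows = list(parse_parsable2(sample))
print(rows[0]["State"])  # COMPLETED
```

All values stay strings at this stage; any conversion to the types listed in the schema would be a separate step.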

Data Type: DataType.LOG
Schema: Sacct

Execution Scope

Invoked as a pipeline component (typically by sacct_backfill or in shell pipelines).

Output Schema

Sacct

Published with DataType.LOG:

{
    # SLURM Fields (from sacct -o all)
    "JobID": str,                            # Job identifier
    "JobIDRaw": str,                         # Raw job ID
    "JobName": str,                          # User-specified job name
    "User": str,                             # Job owner
    "Account": str,                          # SLURM account
    "Partition": str,                        # Job partition
    "State": str,                            # Job state (COMPLETED, FAILED, etc.)
    "ExitCode": str,                         # Exit code
    "AllocCPUs": int,                        # Allocated CPUs
    "ReqMem": str,                           # Requested memory
    "AllocNodes": int,                       # Allocated nodes
    # ... plus 40+ additional sacct fields

    # Timestamps (converted to ISO8601 with timezone)
    "Start": str,                            # Job start time
    "End": str,                              # Job end time
    "Submit": str,                           # Submission time
    "Eligible": str,                         # Eligible time

    # Enriched Fields
    "cluster": str,                          # Cluster name
    "derived_cluster": str,                  # Sub-cluster (same as cluster unless `--heterogeneous-cluster-v1` is set)
    "time": int,                             # Collection timestamp (Unix epoch)
    "end_ds": str,                           # Date string in Pacific Time (YYYY-MM-DD)
}

Note: The exact set of fields depends on the SLURM version. See the sacct documentation for details.
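
To make the enriched fields concrete, here is a minimal sketch of how they could be attached to a parsed record. It assumes that `end_ds` is derived from the job's `End` timestamp and that `End` is already a timezone-aware ISO 8601 string; both are assumptions, as are the function name `enrich` and the cluster name used in the example.

```python
import time
from datetime import datetime
from typing import Dict, Optional
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def enrich(record: Dict[str, str], cluster: str, now: Optional[int] = None) -> dict:
    """Return a copy of `record` with the enriched fields added (illustrative only)."""
    enriched: dict = dict(record)
    enriched["cluster"] = cluster
    # Assumption: without --heterogeneous-cluster-v1 the sub-cluster equals the cluster.
    enriched["derived_cluster"] = cluster
    # Collection timestamp as Unix epoch seconds.
    enriched["time"] = int(time.time()) if now is None else now
    # Assumption: end_ds is the Pacific-time calendar date of the End timestamp.
    end = datetime.fromisoformat(record["End"])
    enriched["end_ds"] = end.astimezone(PACIFIC).strftime("%Y-%m-%d")
    return enriched


job = {"JobID": "123", "End": "2024-01-02T03:30:00+00:00"}
print(enrich(job, cluster="example-cluster", now=1704166200)["end_ds"])  # 2024-01-01
```

The example shows why the Pacific conversion matters: a job that ends at 03:30 UTC on January 2 falls on January 1 in Pacific Time.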

Command-Line Options

Option	Type	Default	Description
`SACCT_OUTPUT`	File/stdin	stdin	File containing sacct output; read from stdin when omitted
`--cluster`	String	Auto-detected	Cluster name for metadata enrichment
`--sink`	String	Required	Sink destination, see Exporters
`--sink-opts`	Multiple	-	Sink-specific options
`--log-level`	Choice	INFO	DEBUG, INFO, WARNING, ERROR, CRITICAL
`--log-folder`	String	`/var/log/fb-monitoring`	Log directory
`--stdout`	Flag	False	Display metrics to stdout in addition to logs
`--heterogeneous-cluster-v1`	Flag	False	Enable per-partition metrics for heterogeneous clusters
`--interval`	Integer	300	Seconds between collection cycles (5 minutes)
`--once`	Flag	False	Run once and exit (no continuous monitoring)
`--retries`	Integer	Shared default	Retry attempts on sink failures
`--dry-run`	Flag	False	Print to stdout instead of publishing to sink
`--chunk-size`	Integer	Shared default	Maximum size in bytes of each chunk written to the sink
`--delimiter`	String	`|#`	Field delimiter. It must not occur inside any field value; user-supplied fields such as the job name can otherwise break parsing.
`--sacct-timezone`	String	System timezone	Timezone of sacct timestamps
`--sacct-output-io-errors`	Choice	`strict`	How to handle UTF-8 decoding errors: `strict`, `ignore`, or `replace`. User-defined sacct fields (e.g. comment, job name) are occasionally not valid UTF-8, so this option selects what happens in that case.
`--ignore-line-errors`	Flag	False	Skip lines with field count mismatches
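
The interplay of `--chunk-size` and `--retries` can be pictured with the sketch below. It is not the tool's implementation: the helper names `chunk_by_bytes` and `publish_with_retries`, the JSON serialization used to measure record size, and the choice of `OSError` as the retryable failure are all assumptions made for illustration.

```python
import json
import time
from typing import Callable, Iterable, Iterator, List


def chunk_by_bytes(records: Iterable[dict], max_bytes: int) -> Iterator[List[dict]]:
    """Group records so each chunk's serialized size stays within max_bytes.

    A single record larger than max_bytes is still emitted, alone in its chunk.
    """
    chunk: List[dict] = []
    size = 0
    for record in records:
        encoded_len = len(json.dumps(record).encode("utf-8"))
        if chunk and size + encoded_len > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(record)
        size += encoded_len
    if chunk:
        yield chunk


def publish_with_retries(
    chunk: List[dict],
    write: Callable[[List[dict]], None],
    retries: int,
    delay_s: float = 0.0,
) -> None:
    """Call write(chunk), retrying up to `retries` more times on OSError."""
    for attempt in range(retries + 1):
        try:
            write(chunk)
            return
        except OSError:
            if attempt == retries:
                raise
            time.sleep(delay_s)


records = [{"JobID": str(i)} for i in range(5)]  # each serializes to 14 bytes
chunks = list(chunk_by_bytes(records, max_bytes=30))
print([len(c) for c in chunks])  # [2, 2, 1]
```

Retrying per chunk rather than per run means a transient sink failure only re-sends the chunk that failed.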

Usage Examples

From stdin (piped from sacct)

sacct -P -o all -S 2024-01-01 -E 2024-01-02 | \
  gcm sacct_publish --sink stdout

From file

gcm sacct_publish sacct_output.txt \
  --sink graph_api \
  --sink-opts scribe_category=slurm_jobs

With timezone handling

# sacct output is in UTC
gcm sacct_publish sacct_output.txt \
  --sacct-timezone UTC \
  --sink file \
  --sink-opts filepath=output.json
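
For intuition about what `--sacct-timezone` controls, the sketch below attaches a timezone to a naive sacct timestamp and renders it as ISO 8601. The helper name `to_aware_iso8601` is hypothetical, and the sketch assumes timestamps in sacct's default `YYYY-MM-DDTHH:MM:SS` layout; placeholder values such as `Unknown` are not handled here.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def to_aware_iso8601(raw: str, tz_name: str) -> str:
    """Interpret a naive sacct timestamp in tz_name and return ISO 8601 with offset."""
    naive = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S")
    return naive.replace(tzinfo=ZoneInfo(tz_name)).isoformat()


print(to_aware_iso8601("2024-01-01T12:00:00", "UTC"))
# 2024-01-01T12:00:00+00:00
print(to_aware_iso8601("2024-01-01T12:00:00", "America/Los_Angeles"))
# 2024-01-01T12:00:00-08:00
```

The same wall-clock string maps to two different instants depending on the timezone, which is why the option should match the timezone in which sacct produced its output.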

Ignore invalid lines

# Skip lines with malformed data
gcm sacct_publish sacct.txt \
  --ignore-line-errors \
  --sacct-output-io-errors replace \
  --sink stdout
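
The two options in this example address different failure modes: invalid bytes and rows with the wrong number of fields. The sketch below mimics both using Python's standard `bytes.decode` error handlers, whose names match the option's choices. The function `decode_and_filter` and the sample bytes are hypothetical, and the tool's actual behavior may differ in detail.

```python
from typing import List


def decode_and_filter(
    raw: bytes,
    errors: str = "strict",
    ignore_line_errors: bool = False,
    delimiter: str = "|",
) -> List[List[str]]:
    """Decode sacct bytes, then keep rows whose field count matches the header."""
    # errors is passed straight to bytes.decode: "strict", "ignore", or "replace".
    text = raw.decode("utf-8", errors=errors)
    lines = text.splitlines()
    header = lines[0].split(delimiter)
    rows: List[List[str]] = []
    for line in lines[1:]:
        fields = line.split(delimiter)
        if len(fields) != len(header):
            if ignore_line_errors:
                continue  # skip the malformed row instead of failing
            raise ValueError(f"field count mismatch: {line!r}")
        rows.append(fields)
    return rows


# Row 2 contains an invalid UTF-8 byte; row 3 has one field too many.
raw = b"JobID|JobName\n1|ok\n2|bad\xff\n3|extra|field\n"
print(decode_and_filter(raw, errors="replace", ignore_line_errors=True))
# [['1', 'ok'], ['2', 'bad\ufffd']]
```

With `replace`, the invalid byte becomes the Unicode replacement character (U+FFFD), so the row is kept; with `ignore` the byte is dropped; with `strict` decoding raises `UnicodeDecodeError`.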

With sacct_backfill

gcm sacct_backfill new \
  --publish-cmd "gcm sacct_publish --sink graph_api --sink-opts scribe_category=jobs"
