spdl.pipeline.ProcessGroupResourceUsage
- class ProcessGroupResourceUsage(pid: int, pgid: int, cpu_percent: float | None = None, rss_bytes: int | None = None, pss_bytes: int | None = None, private_bytes: int | None = None, disk_read_bytes: int | None = None, disk_write_bytes: int | None = None, num_procs: int | None = None, net_rx_bytes: int | None = None, net_tx_bytes: int | None = None)
Snapshot of resource usage across all processes in the process group.
Collected periodically by ProcessGroupStatsMonitor and passed to the user-provided callback.
Memory metrics: three complementary views are provided.
RSS (Resident Set Size, from /proc/[pid]/stat): Total physical pages mapped by each process. Shared pages (shared libraries, CUDA context, etc.) are counted once per process that maps them, so summing RSS across a process group overcounts actual physical memory when pages are shared.
PSS (Proportional Set Size, from /proc/[pid]/smaps_rollup): Each shared page is divided equally among all processes that map it. Summing PSS across a process group gives the most accurate estimate of actual physical memory consumption without double-counting.
Private bytes (Private_Clean + Private_Dirty from /proc/[pid]/smaps_rollup): Only pages exclusive to each process, that is, memory that would be freed if the process exited. This undercounts total usage because it excludes shared memory entirely, but isolates per-process allocations (model weights, activations, buffers).
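To make the PSS and Private definitions concrete, here is a minimal sketch of how the two values could be read from the text of one /proc/[pid]/smaps_rollup file. The function name parse_smaps_rollup and the sample text are illustrative assumptions and are not part of spdl; only the field names (Pss, Private_Clean, Private_Dirty) and the kB unit (1024 bytes) come from the Linux procfs format.

```python
# Hypothetical helper, not spdl's implementation.
SAMPLE = """\
55d0c9a00000-7ffd3a1f2000 ---p 00000000 00:00 0                  [rollup]
Rss:                5248 kB
Pss:                1544 kB
Pss_Anon:           1076 kB
Shared_Clean:       3960 kB
Shared_Dirty:          0 kB
Private_Clean:       212 kB
Private_Dirty:      1076 kB
"""


def parse_smaps_rollup(text: str) -> dict:
    """Return PSS and private memory, in bytes, for one process."""
    wanted = {"Pss", "Private_Clean", "Private_Dirty"}
    kib = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        # Skip the header line and every field we do not need
        # (for example "Pss_Anon", which is a different key from "Pss").
        if not sep or key not in wanted:
            continue
        # Lines look like "Pss:    1544 kB"; the kernel's kB is 1024 bytes.
        kib[key] = int(rest.split()[0])
    private_kib = kib.get("Private_Clean", 0) + kib.get("Private_Dirty", 0)
    return {
        "pss_bytes": kib.get("Pss", 0) * 1024,
        "private_bytes": private_kib * 1024,
    }


per_process = parse_smaps_rollup(SAMPLE)
```

Summing the per-process results over every PID in the group would give group-level totals of the kind this class reports as pss_bytes and private_bytes.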
The difference RSS − Private approximates each process’s shared memory contribution. Reading smaps_rollup is more expensive than reading stat (the kernel walks page tables), but it is a single-file read per process, so the overhead is modest.
Which metric to use:
Use PSS as the primary metric for comparing memory across configuration changes; it reflects the true physical memory cost of the process group without double-counting.
Use Private to isolate per-process allocations (model weights, activations, buffers) from shared overhead.
Use RSS as an upper-bound sanity check. When num_procs == 1 and no pages are shared with processes outside the group, RSS equals PSS.
PSS − Private can be derived in queries to see how much shared memory is attributed to this group.
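The derived quantities mentioned above can be sketched as a small helper that a callback might apply to each snapshot. This is only an illustration: derived_memory is a hypothetical name, the SimpleNamespace object is a stand-in for a real ProcessGroupResourceUsage instance, and the byte counts are made up. The one detail taken from the class signature is that every memory field may be None.

```python
from types import SimpleNamespace
from typing import Optional

MIB = 1024 * 1024


def _diff(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """Subtract two optional byte counts; None if either is missing."""
    return None if a is None or b is None else a - b


def derived_memory(usage) -> dict:
    """Compute derived memory views from rss_bytes, pss_bytes, private_bytes."""
    return {
        # Shared memory attributed to this group (PSS - Private).
        "shared_attributed_bytes": _diff(usage.pss_bytes, usage.private_bytes),
        # Approximate shared contribution as seen by RSS (RSS - Private).
        "rss_minus_private_bytes": _diff(usage.rss_bytes, usage.private_bytes),
        # How far summed RSS exceeds the PSS estimate (RSS - PSS).
        "rss_over_pss_bytes": _diff(usage.rss_bytes, usage.pss_bytes),
    }


# Stand-in snapshot with invented values, for illustration only.
snapshot = SimpleNamespace(
    rss_bytes=900 * MIB, pss_bytes=600 * MIB, private_bytes=500 * MIB
)
derived = derived_memory(snapshot)

# A snapshot taken where smaps_rollup is unavailable has no PSS or Private.
partial = derived_memory(
    SimpleNamespace(rss_bytes=900 * MIB, pss_bytes=None, private_bytes=None)
)
```

Returning None instead of raising keeps the helper usable on snapshots where only RSS is available.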
Attributes
cpu_percent: CPU utilization as a percentage of a single core over the last interval.
disk_read_bytes: Total bytes read from storage across the process group.
disk_write_bytes: Total bytes written to storage across the process group.
net_rx_bytes: Total network bytes received (excluding loopback).
net_tx_bytes: Total network bytes transmitted (excluding loopback).
num_procs: Number of processes in the process group.
private_bytes: Total private memory (Private_Clean + Private_Dirty) in bytes.
pss_bytes: Total proportional set size in bytes across the process group.
rss_bytes: Total resident set size in bytes across the process group.
pid: PID of the monitoring process.
pgid: Process group ID being monitored.
- cpu_percent: float | None = None
CPU utilization as a percentage of a single core over the last interval.
Computed as delta_cpu_usec / delta_wall_usec * 100. A value of 200.0 means two cores were fully utilized. None on the first snapshot (no previous value to diff against).
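As a worked illustration of the formula, the following sketch computes the percentage from two consecutive samples. The function and parameter names are hypothetical, and how the monitor actually stores the previous sample is not described here. The guard against a non-positive wall-clock delta is an added assumption.

```python
from typing import Optional


def cpu_percent(
    prev_cpu_usec: Optional[int],
    prev_wall_usec: Optional[int],
    cpu_usec: int,
    wall_usec: int,
) -> Optional[float]:
    """Percentage of a single core used between two samples."""
    # First snapshot: there is no previous sample to diff against.
    if prev_cpu_usec is None or prev_wall_usec is None:
        return None
    delta_wall_usec = wall_usec - prev_wall_usec
    # Assumption: treat a zero or negative interval as "no data".
    if delta_wall_usec <= 0:
        return None
    delta_cpu_usec = cpu_usec - prev_cpu_usec
    return delta_cpu_usec / delta_wall_usec * 100


# Two CPU-seconds consumed during one wall-clock second.
two_cores = cpu_percent(1_000_000, 0, 3_000_000, 1_000_000)
first = cpu_percent(None, None, 3_000_000, 1_000_000)
```

Because the CPU time is summed over the whole process group, the result can exceed 100 on multi-core machines.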
- private_bytes: int | None = None
Total private memory (Private_Clean + Private_Dirty) in bytes.
Only pages exclusive to each process. None when smaps_rollup is unavailable.