Collecting Runtime Statistics

See also

For a complete example, see performance_analysis, which shows how to log runtime performance statistics to TensorBoard.

Note

If you are a Meta employee, please refer to this.

Performance optimization is a cycle of observation, analysis, and iterative refinement. Measuring performance is of the utmost importance, because it is how you find the bottleneck and fix it.

SPDL is designed so that you can collect runtime statistics and export them, then analyze them to determine the bottleneck.

In this section, we explain how you can export the statistics. (We will go over the details of how to analyze the statistics in the Optimization Guide.)

Pipeline collects two kinds of statistics: TaskPerfStats and QueuePerfStats.

TaskPerfStats carries information about the functions passed to Pipeline.pipe(), and it is collected by TaskStatsHook. QueuePerfStats carries information about the flow of data through the pipeline, and it is collected by StatsQueue.

The steps to export the stats are as follows.

  1. Subclass StatsQueue and TaskStatsHook, and override the interval_stats_callback method. †

  2. In the interval_stats_callback method, save the fields of the stats object it receives (TaskPerfStats for TaskStatsHook, QueuePerfStats for StatsQueue) to a location you can access later. ††

  3. For StatsQueue, provide the class object (not an instance) to the PipelineBuilder.build() method.

  4. For TaskStatsHook, create a factory function that takes the name of the stage function and returns a list of TaskHook instances to apply to that stage, then provide the factory function to the PipelineBuilder.build() method.

Note

  † When overriding the method, ensure that it does not hold the GIL, as this can degrade pipeline performance.
  †† The destination can be anywhere, such as a remote database or a local file.
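
The steps above can be sketched as follows. This is a minimal, self-contained illustration rather than the SPDL implementation. The base classes and stats dataclasses are simplified stand-ins so that the code runs without SPDL installed. Their constructor arguments and fields (name, elapsed, num_items, num_tasks, ave_time), the async form of the callback, and the log file location are all assumptions made for illustration. With SPDL, you would import StatsQueue and TaskStatsHook from the library instead of defining them.

```python
import asyncio
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


# ---- Stand-ins (NOT the SPDL API): simplified so the sketch is runnable ----
@dataclass
class QueuePerfStats:  # hypothetical fields
    elapsed: float
    num_items: int


@dataclass
class TaskPerfStats:  # hypothetical fields
    num_tasks: int
    ave_time: float


class StatsQueue:  # stand-in base class
    def __init__(self, name: str) -> None:
        self.name = name

    async def interval_stats_callback(self, stats: QueuePerfStats) -> None:
        pass


class TaskStatsHook:  # stand-in base class
    def __init__(self, name: str) -> None:
        self.name = name

    async def interval_stats_callback(self, stats: TaskPerfStats) -> None:
        pass


# ---- Steps 1 and 2: subclass, override the callback, save the fields ----
# Hypothetical destination: a JSON-lines file in a fresh temporary directory.
LOG_PATH = Path(tempfile.mkdtemp()) / "pipeline_stats.jsonl"


def _append(record: dict) -> None:
    """Append one record to the log file (blocking I/O)."""
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(record) + "\n")


class FileStatsQueue(StatsQueue):
    async def interval_stats_callback(self, stats: QueuePerfStats) -> None:
        # Run the blocking write in a worker thread so the event loop
        # is not stalled while the file is written.
        await asyncio.to_thread(_append, {"queue": self.name, **asdict(stats)})


class FileTaskStatsHook(TaskStatsHook):
    async def interval_stats_callback(self, stats: TaskPerfStats) -> None:
        await asyncio.to_thread(_append, {"task": self.name, **asdict(stats)})


# ---- Step 4: factory that maps a stage name to the hooks for that stage ----
def hook_factory(name: str) -> list[TaskStatsHook]:
    return [FileTaskStatsHook(name)]


# ---- Demo: invoke the callbacks by hand, as a pipeline would periodically ----
async def demo() -> list[dict]:
    queue = FileStatsQueue("decode_queue")  # a pipeline would instantiate the class itself
    (hook,) = hook_factory("decode")
    await queue.interval_stats_callback(QueuePerfStats(elapsed=1.0, num_items=8))
    await hook.interval_stats_callback(TaskPerfStats(num_tasks=8, ave_time=0.125))
    return [json.loads(line) for line in LOG_PATH.read_text().splitlines()]


records = asyncio.run(demo())
```

With the real library, FileStatsQueue (the class itself, per step 3) and hook_factory (per step 4) would be handed to PipelineBuilder.build(). Consult the API reference for the exact parameter names and for the constructor arguments that StatsQueue and TaskStatsHook expect.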