spdl.pipeline.run_pipeline_in_subprocess

run_pipeline_in_subprocess(config_or_builder: PipelineConfig[T], /, *, num_threads: int, max_failures: int | Fraction = -1, report_stats_interval: float = -1, queue_class: type[AsyncQueue] | None = None, task_hook_factory: Callable[[StageInfo], list[TaskHook]] | None = None, background_tasks: list[Callable[[], BackgroundTask]] | None = None, **kwargs: Any) Iterable[T][source]

Run the given Pipeline in a subprocess, and iterate on the result.

The returned Iterable supports multiple iterations. The subprocess is created once and reused — each call to iter() (or for ... in) builds a fresh Pipeline inside the same subprocess without spawning a new process. This avoids the overhead of subprocess creation (fork/spawn, initializer execution, and pickling) on every iteration.

For multi-epoch training, create the iterable once before the epoch loop and iterate it each epoch:

src = run_pipeline_in_subprocess(config, num_threads=4)
for epoch in range(num_epochs):
    for batch in src:
        train(batch)
Parameters:
Yields:

The results yielded from the pipeline.

See also