spdl.pipeline.run_pipeline_in_subinterpreter
- run_pipeline_in_subinterpreter(config: PipelineConfig[T], /, *, num_threads: int, max_failures: int | Fraction = -1, report_stats_interval: float = -1, queue_class: type[AsyncQueue] | None = None, task_hook_factory: Callable[[StageInfo], list[TaskHook]] | None = None, background_tasks: list[Callable[[], BackgroundTask]] | None = None, **kwargs: Any) → Iterable[T]
[Experimental] Run the given Pipeline in a subinterpreter, and iterate on the result.
The returned Iterable supports multiple iterations. The subinterpreter is created once and reused: each call to iter() (or each for ... in loop) builds a fresh Pipeline inside the same subinterpreter instead of creating a new one. This avoids the overhead of creating the subinterpreter and running the initializer on every iteration.

For multi-epoch training, create the iterable once before the epoch loop and iterate it each epoch:
    src = run_pipeline_in_subinterpreter(config, num_threads=4)
    for epoch in range(num_epochs):
        for batch in src:
            train(batch)
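The "create once, iterate many times" behaviour described above can be illustrated without SPDL. The following is a minimal pure-Python sketch, not the library's implementation: `ReusableIterable`, `setup_count`, and `build_count` are hypothetical names, the one-time setup stands in for subinterpreter creation, and running that setup eagerly in the constructor is an assumption made only for the illustration.

```python
from collections.abc import Callable, Iterator
from typing import Generic, TypeVar

T = TypeVar("T")


class ReusableIterable(Generic[T]):
    """Hypothetical stand-in: expensive setup once, fresh iterator per iter()."""

    def __init__(self, factory: Callable[[], Iterator[T]]) -> None:
        self.setup_count = 0  # how many times the expensive setup ran
        self.build_count = 0  # how many fresh iterators were built
        self._factory = factory
        self._setup()

    def _setup(self) -> None:
        # Stands in for creating the subinterpreter and running its initializer.
        self.setup_count += 1

    def __iter__(self) -> Iterator[T]:
        # Stands in for building a fresh Pipeline in the existing subinterpreter.
        self.build_count += 1
        return self._factory()


# Create the iterable once, then iterate it once per "epoch".
src = ReusableIterable(lambda: iter(range(3)))
epochs = [list(src) for _ in range(2)]
```

In this sketch both passes yield the full sequence, the setup runs once, and only the per-iteration build is repeated. A plain generator object, by contrast, would be exhausted after the first pass.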
- Parameters:
config – The definition of Pipeline.
num_threads – Passed to build_pipeline().
max_failures – Passed to build_pipeline().
report_stats_interval – Passed to build_pipeline().
queue_class – Passed to build_pipeline().
task_hook_factory – Passed to build_pipeline().
background_tasks – Passed to build_pipeline().
kwargs – Passed to iterate_in_subinterpreter().
- Yields:
The results yielded from the pipeline.
See also
iterate_in_subinterpreter() implements the logic for manipulating an iterable in a subinterpreter.
Parallelism and Performance for the context in which this function was created.