spdl.pipeline.iterate_in_subprocess¶
- iterate_in_subprocess(fn: Callable[[], Iterable[T]], *, buffer_size: int = 3, initializer: Callable[[], None] | Sequence[Callable[[], None]] | None = None, mp_context: str | None = None, timeout: float | None = None, daemon: bool = False) Iterable[T][source]¶
[Experimental] Run the given
iterablein a subprocess.The subprocess is created once and reused across iterations. The returned
Iterablesupports multiple iterations — each call toiter()(orfor ... in) instructs the worker to create a fresh iterator from the underlying iterable without spawning a new process. Because process creation involves overhead (fork/spawn, initializer execution, and pickling), reusing the same worker is more efficient than calling this function repeatedly.Note
fn()is called once in the subprocess to create the iterable. Each subsequentiter()call creates a fresh iterator by callingiter(iterable)on the same object. Iffn()returns a properIterable(a class with__iter__that creates a new iterator each time), re-iteration works as expected.However, if
fn()returns a generator (or any single-use iterator), re-iteration will silently yield no items. This is because a generator is its own iterator —iter(generator)returnsself— so once exhausted, callingiter()again returns the same exhausted object. The first iteration will work correctly, but all subsequent iterations will appear empty.- Parameters:
fn – Function that returns an iterator. Use
functools.partial()to pass arguments to the function.buffer_size – Maximum number of items to buffer in the queue.
initializer – Functions executed in the subprocess before iteration starts.
mp_context – Context to use for multiprocessing. If not specified, a default method is used.
timeout – Timeout for inactivity. If the generator function does not yield any item for this amount of time, the process is terminated.
daemon – Whether to run the process as a daemon. Use it only for debugging.
- Returns:
Iterator over the results of the generator function.
Note
The function and the values yielded by the iterator of generator must be picklable.
See also
run_pipeline_in_subprocess()for runinng aPipelinein a subprocessParallelism and Performance for the context in which this function was created.
Remote Iterable Protocol for implementation details