neuralbench.modules.DownstreamWrapper

pydantic model neuralbench.modules.DownstreamWrapper

Configuration for wrapping a (pretrained) model for downstream fine-tuning or linear probing.

This class provides a declarative way to configure how a pretrained model should be adapted for downstream tasks, including optional on-the-fly preprocessing, layer freezing, output aggregation, and adding a trainable probe on top of the model.

Parameters:

on_the_fly_preprocessor (OnTheFlyPreprocessor | None, optional) – On-the-fly preprocessing applied to the input before the model forward pass. Typically model-specific (e.g. QuantileAbsScaler for BIOT). Default is None.
channel_adapter_config (ChannelMerger | ChannelProjection | None, optional) – Configuration for a channel adapter that projects from arbitrary input channels to a fixed number of target channels. Supply a ChannelMerger for position-based spatial attention, or a ChannelProjection for a simple Conv1d(kernel_size=1) linear mixing. Default is None.
model_output_key (str | int | None, optional) – Key or index to extract from model output dictionary. If None, assumes the model returns a tensor directly. Default is None.
layers_to_freeze (list[str] | None, optional) – List of layer name patterns to freeze (set requires_grad=False). Cannot be used together with layers_to_unfreeze. Default is None.
layers_to_unfreeze (list[str] | Literal["last"] | None, optional) – List of layer name patterns to unfreeze (set requires_grad=True), while freezing all others. Cannot be used together with layers_to_freeze. If "last", unfreezes the last layer (nn.Module) of the model. Default is None.
strict_matching (bool, optional) – If True, when freezing/unfreezing layers, only the first part of the layer name (before the first dot) must match exactly. If False, any part of the layer name can match the patterns. Default is True.
aggregation ({"flatten", "mean", "first"} | int | None, optional) – Method to aggregate the model output. "flatten" flattens all dimensions except batch; "mean" averages over the temporal/sequence dimension (dim=1); "first" selects only the first timestep/token; an int n splits the output into n groups, averages each group, then concatenates the results; None performs no aggregation. Default is "flatten".
probe_config (Mlp | "linear" | None, optional) – Configuration for the probe layer added on top of the model. None uses an identity (no additional layer), e.g. if the model already ends in a linear layer of the right output size. "linear" adds a single linear layer. An Mlp instance adds a multi-layer perceptron with the specified configuration. Default is "linear".
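
The aggregation modes described above can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: nested lists shaped (batch, time, features) stand in for tensors, the helper names aggregate and mean_over_time are hypothetical, and the int case assumes that groups are contiguous slices of the temporal dimension whose length is divisible by n.

```python
def mean_over_time(steps):
    """Average a (time, features) nested list over the time axis."""
    n_steps = len(steps)
    return [sum(column) / n_steps for column in zip(*steps)]


def aggregate(batch, aggregation="flatten"):
    """Aggregate a (batch, time, features) nested list (hypothetical helper)."""
    if aggregation is None:
        return batch  # no aggregation
    if aggregation == "flatten":
        # Flatten every dimension except batch.
        return [[v for step in sample for v in step] for sample in batch]
    if aggregation == "mean":
        # Average over the temporal/sequence dimension (dim=1).
        return [mean_over_time(sample) for sample in batch]
    if aggregation == "first":
        # Keep only the first timestep/token.
        return [list(sample[0]) for sample in batch]
    if isinstance(aggregation, int) and aggregation > 0:
        # Assumption: split the time axis into n contiguous groups,
        # average each group, then concatenate the group means.
        out = []
        for sample in batch:
            size = len(sample) // aggregation
            pooled = []
            for g in range(aggregation):
                pooled.extend(mean_over_time(sample[g * size:(g + 1) * size]))
            out.append(pooled)
        return out
    raise ValueError(f"Unsupported aggregation: {aggregation!r}")


# One sample, four timesteps, two features.
batch = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]]
print(aggregate(batch, "flatten"))  # [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]]
print(aggregate(batch, "mean"))     # [[4.0, 5.0]]
print(aggregate(batch, "first"))    # [[1.0, 2.0]]
print(aggregate(batch, 2))          # [[2.0, 3.0, 6.0, 7.0]]
```

Note how the choice changes the feature size seen by the probe: "flatten" yields time × features values per sample, "mean" and "first" yield features, and an int n yields n × features.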

Fields:

aggregation (Literal['flatten', 'mean', 'first'] | int | None)
channel_adapter_config (neuraltrain.models.common.ChannelMerger | neuralbench.modules.ChannelProjection | None)
layers_to_freeze (list[str] | None)
layers_to_unfreeze (list[str] | Literal['last'] | None)
model_output_key (str | int | None)
on_the_fly_preprocessor (neuraltrain.models.preprocessor.OnTheFlyPreprocessor | None)
probe_config (neuraltrain.models.common.Mlp | Literal['linear'] | None)
strict_matching (bool)

field on_the_fly_preprocessor: OnTheFlyPreprocessor | None = None

field channel_adapter_config: ChannelMerger | ChannelProjection | None = None

field model_output_key: str | int | None = None

field layers_to_freeze: list[str] | None = None

field layers_to_unfreeze: list[str] | Literal['last'] | None = None

field strict_matching: bool = True

field aggregation: Literal['flatten', 'mean', 'first'] | int | None = 'flatten'

field probe_config: Mlp | Literal['linear'] | None = 'linear'
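
As a rough illustration of how layers_to_freeze, layers_to_unfreeze and strict_matching interact, the sketch below decides which parameter names stay trainable. It is an assumption-laden stand-in, not the library's code: the helpers matches and trainable_flags are hypothetical, non-strict matching is assumed to be a substring test, "last" is assumed to mean the top-level module of the final parameter, and parameters are assumed trainable when neither option is set.

```python
def matches(name, patterns, strict_matching=True):
    """Return True if a parameter name matches any pattern (hypothetical helper)."""
    if strict_matching:
        # Only the part before the first dot must match exactly.
        return name.split(".", 1)[0] in patterns
    # Assumption: non-strict matching accepts a pattern anywhere in the name.
    return any(pattern in name for pattern in patterns)


def trainable_flags(param_names, layers_to_freeze=None,
                    layers_to_unfreeze=None, strict_matching=True):
    """Map each parameter name to the requires_grad value it would receive."""
    if layers_to_freeze is not None and layers_to_unfreeze is not None:
        raise ValueError(
            "layers_to_freeze and layers_to_unfreeze cannot be used together"
        )
    if layers_to_unfreeze == "last":
        # Assumption: the "last layer" is the top-level module of the final parameter.
        last = param_names[-1].split(".", 1)[0]
        return {n: n.split(".", 1)[0] == last for n in param_names}
    if layers_to_unfreeze is not None:
        # Unfreeze the matches, freeze everything else.
        return {n: matches(n, layers_to_unfreeze, strict_matching) for n in param_names}
    if layers_to_freeze is not None:
        # Freeze the matches, leave everything else trainable.
        return {n: not matches(n, layers_to_freeze, strict_matching) for n in param_names}
    return {n: True for n in param_names}  # nothing configured


names = ["encoder.conv.weight", "encoder.conv.bias", "head.weight", "head.bias"]
frozen_encoder = trainable_flags(names, layers_to_freeze=["encoder"])
only_head = trainable_flags(names, layers_to_unfreeze="last")
strict_conv = trainable_flags(names, layers_to_freeze=["conv"])
loose_conv = trainable_flags(names, layers_to_freeze=["conv"], strict_matching=False)
```

Under these assumptions, the pattern "conv" freezes nothing with strict_matching=True (no top-level module is named conv) but freezes both encoder.conv parameters with strict_matching=False.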

property n_adapter_target_channels: int | None

Target channel count of the adapter, or None if no adapter is configured.
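
To make the idea of a target channel count concrete, here is a minimal sketch of what a Conv1d(kernel_size=1) channel projection computes: at every timestep, each target channel is a weighted sum of the input channels. The function pointwise_channel_mix and the example weights are hypothetical, and plain lists stand in for tensors; the actual ChannelProjection may differ in details such as bias handling.

```python
def pointwise_channel_mix(x, weight, bias=None):
    """Kernel-size-1 convolution over channels (hypothetical helper).

    x:      nested list shaped (in_channels, time)
    weight: nested list shaped (out_channels, in_channels)
    bias:   optional list of length out_channels
    """
    n_times = len(x[0])
    out = []
    for o, row in enumerate(weight):
        b = bias[o] if bias is not None else 0.0
        # Each output sample mixes all input channels at the same timestep.
        out.append([
            b + sum(w * x[c][t] for c, w in enumerate(row))
            for t in range(n_times)
        ])
    return out


# Three input channels, three time samples.
x = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [100.0, 200.0, 300.0]]
# Two target channels: copy channel 0, and average channels 1 and 2.
weight = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]

y = pointwise_channel_mix(x, weight)
print(len(y))  # 2 target channels
print(y)       # [[1.0, 2.0, 3.0], [55.0, 110.0, 165.0]]
```

The number of rows in weight plays the role of the adapter's target channel count: whatever the number of input channels, the wrapped model always receives that many.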

build(model: Module, dummy_batch: dict[str, Tensor | None], n_outputs: int, input_channel_names: list[str] | None = None) → DownstreamWrapperModel
