neuraltrain.models.conv_transformer.ConvTransformer
- class neuraltrain.models.conv_transformer.ConvTransformer(*, dim: int = 512, encoder_config: SimplerConv | SimpleConv, temporal_downsampling_config: TemporalDownsampling | None = None, conv_pos_emb_kernel_size: int | None = None, neuro_device_types: list[str] | None = None, add_cls_token: bool = False, pre_transformer_layer_norm: bool = False, transformer_config: TransformerEncoder | Conformer | None = None, output_avg_pool: bool = False, output_layer_dim: int | None = 0)
Convolutional encoder followed by optional temporal aggregation and a transformer.
- Parameters:
dim (int) – Internal token dimension.
encoder_config (neuraltrain.models.simplerconv.SimplerConv | neuraltrain.models.simpleconv.SimpleConv) – Configuration for the convolutional encoder.
temporal_downsampling_config (neuraltrain.models.common.TemporalDownsampling | None) – Configuration for the optional temporal downsampling module.
conv_pos_emb_kernel_size (int | None) – If provided, use convolutional positional embedding with this kernel size.
neuro_device_types (list[str] | None) – List of expected neuro device types that can be used to embed the device type in the transformer.
add_cls_token (bool) – If True, add a [CLS] token to the input of the transformer.
pre_transformer_layer_norm (bool) – If True, apply layer normalization before the transformer.
transformer_config (neuraltrain.models.transformer.TransformerEncoder | neuraltrain.models.conformer.Conformer | None) – Configuration for the transformer encoder.
output_avg_pool (bool) – If True, average the tokens output by the transformer.
output_layer_dim (int | None) – Set to 0 for no output layer, or to None to use the same dimension as the transformer. Note that both BENDR and wav2vec 2.0 use an output linear projection, although this is not mentioned in their respective papers.
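The convention described for output_layer_dim (0 disables the output layer, None reuses the transformer dimension, any other integer sets the size explicitly) can be illustrated with a minimal sketch. The helper resolve_output_dim below is hypothetical and is not part of neuraltrain; the example also assumes, for illustration only, that the transformer dimension equals the default dim of 512.

```python
from typing import Optional


def resolve_output_dim(
    output_layer_dim: Optional[int], transformer_dim: int
) -> Optional[int]:
    """Return the output projection size, or None when no output layer is used.

    Hypothetical helper: it only mirrors the documented convention and is
    not the library's actual implementation.
    """
    if output_layer_dim is None:
        # None: reuse the transformer dimension for the output layer.
        return transformer_dim
    if output_layer_dim == 0:
        # 0: no output layer at all.
        return None
    if output_layer_dim < 0:
        raise ValueError("output_layer_dim must be None or a non-negative int")
    # Any other positive integer: explicit output layer size.
    return output_layer_dim


# Assumption for illustration: the transformer dimension is 512.
for value in (0, None, 256):
    print(repr(value), "->", resolve_output_dim(value, transformer_dim=512))
```

Under this reading, the default output_layer_dim=0 in the signature corresponds to a model with no output layer.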