neuralset.extractors.base.LabelEncoder

pydantic model neuralset.extractors.base.LabelEncoder

Encode a given field from an event, e.g. to be used as a label.

Parameters:
  • event_types (str or tuple of str) – Type of event(s) to apply this extractor to.

  • event_field (str) – Field to encode from the event.

  • allow_missing (bool) – If True, allow missing events without raising errors.

  • treat_missing_as_separate_class (bool) – If True, treat missing events as a separate class: index -1, or, when return_one_hot=True, a one-hot vector whose last element is set to 1. Only relevant if allow_missing is True. Note: when using LabelEncoder for a multilabel classification task, set this to False so that missing labels are represented by a vector of all zeros.

  • return_one_hot (bool) – If True, return one-hot representation of the index. Otherwise, return an int in [0, n_unique_values - 1] (or the corresponding values provided in predefined_mapping, and -1 for missing events if treat_missing_as_separate_class=True).

  • predefined_mapping (dict, optional) – If provided, use this mapping from label to index instead of computing it from data. Values must be >= 0. If return_one_hot=True, these indices MUST be contiguous and start from 0.
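
The interaction between return_one_hot and treat_missing_as_separate_class described above can be sketched in plain Python. This is an illustration of the documented output conventions only, not neuralset's implementation: build_mapping and encode are hypothetical helpers, a missing event is modelled as None, lists stand in for tensors, and the choices to assign indices in sorted label order and to give the missing class one extra trailing slot in the one-hot vector are assumptions.

```python
from typing import Dict, Iterable, List, Optional, Union


def build_mapping(labels: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper. Assumption: indices follow sorted label order."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}


def encode(
    label: Optional[str],  # None stands in for a missing event
    mapping: Dict[str, int],
    return_one_hot: bool = False,
    treat_missing_as_separate_class: bool = False,
) -> Union[int, List[int]]:
    missing = label is None
    if not return_one_hot:
        if missing:
            if treat_missing_as_separate_class:
                return -1  # documented index for the missing class
            # This combination is not described in the text above.
            raise ValueError("no index defined for a missing event")
        return mapping[label]

    n_classes = max(mapping.values()) + 1
    # Assumption: the missing class occupies one extra, final slot.
    size = n_classes + 1 if treat_missing_as_separate_class else n_classes
    vector = [0] * size
    if missing:
        if treat_missing_as_separate_class:
            vector[-1] = 1  # last index set to 1
        return vector  # otherwise all zeros (the multilabel-friendly case)
    vector[mapping[label]] = 1
    return vector


mapping = build_mapping(["cat", "dog", "cat", "bird"])
index = encode("dog", mapping)
one_hot = encode("cat", mapping, return_one_hot=True)
missing_one_hot = encode(
    None, mapping, return_one_hot=True, treat_missing_as_separate_class=True
)
missing_zeros = encode(None, mapping, return_one_hot=True)
```

With treat_missing_as_separate_class=False, the missing event yields an all-zeros vector, which is why the note above recommends that setting for multilabel targets.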

Fields:
field treat_missing_as_separate_class: bool = False
field return_one_hot: bool = False
field predefined_mapping: dict[str, int] | None = None
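
The constraints stated for predefined_mapping (values >= 0, and contiguous indices starting from 0 when return_one_hot=True) can be checked before constructing the encoder. The function below is a hypothetical, standalone check written for illustration. It is not part of neuralset, and whether the library itself allows several labels to share one index is an assumption made here.

```python
from typing import Dict


def check_predefined_mapping(
    mapping: Dict[str, int], return_one_hot: bool = False
) -> Dict[str, int]:
    """Hypothetical validator mirroring the documented constraints."""
    # Assumption: several labels may share an index, so compare unique values.
    indices = sorted(set(mapping.values()))
    if any(i < 0 for i in indices):
        raise ValueError("predefined_mapping values must be >= 0")
    if return_one_hot and indices != list(range(len(indices))):
        raise ValueError(
            "with return_one_hot=True, indices must be contiguous and start from 0"
        )
    return mapping


# Valid in both modes: indices 0 and 1.
ok = check_predefined_mapping({"rest": 0, "task": 1}, return_one_hot=True)

# Valid only without one-hot output: index 1 is skipped.
sparse = check_predefined_mapping({"rest": 0, "task": 2}, return_one_hot=False)
```
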
prepare(obj: DataFrame | Sequence[Event] | Sequence[Segment]) -> None

Pre-compute and cache extractor data for a collection of events.

This method triggers _get_data on every matching event so that expensive computation (e.g. model inference) is done once and cached. It then calls the extractor on a single event to populate the output shape, which is needed when allow_missing=True.

Call prepare before using the extractor in a dataloader.

Parameters:

obj (DataFrame or sequence of Event or sequence of Segment) – The structure containing the events. When calling prepare on several objects, prefer passing a list of events or segments over a DataFrame to avoid redundant conversion overhead.
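
The prepare-then-use pattern described above (compute once per event, cache the result, then record the output shape so that missing events can be filled in later) can be illustrated with a toy class. Everything here is hypothetical: ToyCachedExtractor is not a neuralset class, events are plain strings, lists stand in for tensors, and returning zeros for a missing event is an assumption chosen for the sketch.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class ToyCachedExtractor:
    """Toy stand-in showing why prepare() should run before data loading."""

    def __init__(self) -> None:
        self._cache: Dict[str, List[float]] = {}
        self.n_computations = 0  # counts the "expensive" calls
        self.output_shape: Optional[Tuple[int, ...]] = None

    def _get_data(self, event: str) -> List[float]:
        # Compute once per distinct event, then serve from the cache.
        if event not in self._cache:
            self.n_computations += 1
            self._cache[event] = [float(len(event))]  # placeholder feature
        return self._cache[event]

    def prepare(self, events: Iterable[str]) -> None:
        events = list(events)
        for event in events:
            self._get_data(event)
        if events:
            # Probe a single event to learn the output shape.
            self.output_shape = (len(self._get_data(events[0])),)

    def __call__(self, event: Optional[str]) -> List[float]:
        if event is None:  # missing event
            if self.output_shape is None:
                raise RuntimeError("call prepare() first to set the output shape")
            return [0.0] * self.output_shape[0]  # assumption: zeros placeholder
        return self._get_data(event)


extractor = ToyCachedExtractor()
extractor.prepare(["a", "bb", "a"])
computations_after_prepare = extractor.n_computations
feature = extractor("bb")      # served from the cache
placeholder = extractor(None)  # needs the shape recorded by prepare()
```

In this sketch, calling the extractor on a missing event before prepare raises an error, because the placeholder's shape is not yet known.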

get_static(event: Event) -> Tensor

Return a single feature vector for the given event.

Override this method in subclasses to produce a static (non-temporal) embedding for one event. The returned tensor should have no time dimension; temporal wrapping is handled by BaseStatic automatically.

Parameters:

event (Event) – The event to extract a feature from.

Returns:

A tensor of shape (*feature_shape,) (no time axis).

Return type:

torch.Tensor

requirements: tp.ClassVar[tuple[str, ...]] = ()