neuraltrain.losses.losses.SigLipLoss¶
- class neuraltrain.losses.losses.SigLipLoss(norm_kind: str | None = 'y', temperature: bool = True, bias: bool = True, identical_candidates_threshold: float | None = 0.999, reduction: str = 'sum', reweigh_positives: bool = False)[source]¶
SigLIP contrastive loss.
Sigmoid loss for Language-Image Pretraining (SigLIP) from [1].
- Parameters:
  - norm_kind ({"x", "y", "xy"} or None) – How to normalize the estimates and/or candidates before computing their dot products.
    - 'x': normalize estimates only.
    - 'y': normalize candidates only (the approach originally used in brainmagick).
    - 'xy': normalize both estimates and candidates.
    - None: do not normalize.
  - temperature (bool) – If True, use a learnable temperature parameter initialized to ln(10).
  - bias (bool) – If True, use a learnable bias parameter initialized to -10 (since most pairs are negative).
  - identical_candidates_threshold (float or None) – If given, each estimate is matched not only to its own candidate but to all candidates whose cosine similarity to that candidate is greater than or equal to this threshold. Such high-similarity candidates are assumed to be duplicates. Intended for use only when the candidate generator is frozen.
  - reduction (str) – Reduction applied to the binary cross-entropy loss (forwarded to F.binary_cross_entropy_with_logits).
  - reweigh_positives (bool) – If True and identical_candidates_threshold is set, down-weight duplicate positive pairs so that only one copy contributes to the loss.
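The core computation can be sketched in plain NumPy (a minimal sketch assuming norm_kind='xy', a square batch, and fixed temperature/bias at their initial values; the actual module is a PyTorch nn.Module and uses F.binary_cross_entropy_with_logits):

```python
import numpy as np

def siglip_loss(estimate, candidate, log_temperature=np.log(10.0), bias=-10.0):
    """Minimal NumPy sketch of the SigLIP pairwise sigmoid loss.

    estimate, candidate: [B, C] embeddings. Unlike softmax-based
    CLIP/InfoNCE, every (i, j) pair is scored independently with a
    sigmoid, so no normalization over the batch is needed.
    """
    # L2-normalize both sides (the norm_kind='xy' setting).
    estimate = estimate / np.linalg.norm(estimate, axis=-1, keepdims=True)
    candidate = candidate / np.linalg.norm(candidate, axis=-1, keepdims=True)

    # Pairwise logits: cosine similarities scaled by exp(log_temperature),
    # shifted by the bias (initialized to -10 since most pairs are negative).
    logits = np.exp(log_temperature) * estimate @ candidate.T + bias

    # Labels: +1 on the diagonal (matched pairs), -1 everywhere else.
    labels = 2.0 * np.eye(len(estimate)) - 1.0

    # -log sigmoid(label * logit) = logaddexp(0, -label * logit),
    # summed over all pairs (reduction='sum').
    return np.sum(np.logaddexp(0.0, -labels * logits))
```

With temperature exp(ln 10) = 10 and bias -10, a perfectly matched pair (cosine similarity 1) lands at logit 0, while an orthogonal negative pair lands at logit -10 and contributes almost nothing to the loss.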
References
[1] Zhai, X., Mustafa, B., Kolesnikov, A., & Beyer, L. (2023). Sigmoid Loss for Language Image Pre-Training. ICCV 2023.
Note
Official jax implementation: https://github.com/google-research/big_vision/blob/474dd2ebde37268db4ea44decef14c7c1f6a0258/big_vision/trainers/proj/image_text/siglip.py
- forward(estimate: Tensor, candidate: Tensor) → Tensor[source]¶
Warning: estimate and candidate are not necessarily symmetric.
If estimate has shape [B, C] and candidate has shape [B', C] with B' >= B, the first B samples of candidate are the targets, while the remaining B' - B samples of candidate are used only as negatives.
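Under this shape convention, the [B, B'] label matrix pairs each estimate with the candidate at the same index and treats everything else, including the trailing B' - B candidates, as negatives. A sketch (siglip_labels is a hypothetical helper, not part of the API):

```python
import numpy as np

def siglip_labels(n_estimates, n_candidates):
    """Build the [B, B'] label matrix for an asymmetric batch.

    The first B candidates are the targets (+1 on the diagonal);
    the remaining B' - B candidates act purely as extra negatives (-1).
    """
    assert n_candidates >= n_estimates
    labels = -np.ones((n_estimates, n_candidates))
    idx = np.arange(n_estimates)
    labels[idx, idx] = 1.0
    return labels
```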