neuraltrain.models.diffusion_prior.DiffusionPrior

class neuraltrain.models.diffusion_prior.DiffusionPrior(*, depth: int = 6, dim_head: int = 64, prior_learned_query_mode: Literal['token', 'pos_emb', 'all_pos_emb'] = 'pos_emb', timesteps: int = 100, cond_drop_prob: float = 0.2, predict: Literal['x_start', 'v'] = 'x_start')[source]

Diffusion prior module adapted from MindEye [1].

Although the parameters text_embed and image_embed appear to refer specifically to text and image data, they can represent any pair of embeddings: text_embed is the conditioning input (x) to the diffusion prior, and image_embed is the target (y) that the prior learns to denoise.

Parameters:
  • depth (int) – Number of Transformer layers in the prior network.

  • dim_head (int) – Dimension per attention head.

  • prior_learned_query_mode ({"token", "pos_emb", "all_pos_emb"}) – How to handle learned queries for image tokens.

  • timesteps (int) – Number of diffusion denoising steps.

  • cond_drop_prob (float) – Dropout probability applied to the conditioning input for classifier-free guidance.

  • predict ({"x_start", "v"}) – Prediction target: "x_start" predicts the clean embedding directly; "v" uses the velocity parameterisation from Imagen.
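To illustrate the role of cond_drop_prob, the sketch below shows a generic implementation of conditioning dropout for classifier-free guidance: with some probability, each sample's conditioning vector is replaced by a "null" embedding during training, so that at sampling time the model can be queried both with and without conditioning. The function name, shapes, and use of NumPy are illustrative assumptions, not the module's internal API.

```python
import numpy as np

def drop_conditioning(cond, null_cond, drop_prob, rng):
    """Randomly replace per-sample conditioning vectors with a 'null'
    embedding, enabling classifier-free guidance at sampling time.

    cond:      (batch, dim) conditioning embeddings (e.g. text_embed)
    null_cond: (dim,) null embedding substituted when conditioning is dropped
    drop_prob: probability of dropping each sample's conditioning
    """
    keep = rng.random(cond.shape[0]) >= drop_prob  # (batch,) boolean mask
    return np.where(keep[:, None], cond, null_cond)

rng = np.random.default_rng(0)
cond = rng.normal(size=(8, 16))
null_cond = np.zeros(16)
mixed = drop_conditioning(cond, null_cond, drop_prob=0.2, rng=rng)
```

With drop_prob=0.0 every row is kept; with drop_prob=1.0 every row becomes the null embedding, which is how the unconditional branch of classifier-free guidance is trained.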
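For intuition about predict="v", the following self-contained numerical sketch demonstrates the velocity parameterisation under a variance-preserving schedule (alpha_t**2 + sigma_t**2 = 1): the network's target is v = alpha_t * eps - sigma_t * x_start, and a perfect v prediction lets the clean embedding be recovered from the noised one. This is a standalone illustration, not the module's internal code; the schedule values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x_start = rng.normal(size=4)   # clean target embedding (image_embed)
noise = rng.normal(size=4)     # Gaussian noise eps

# Variance-preserving schedule value at an arbitrary timestep:
# alpha_t**2 + sigma_t**2 == 1 by construction.
alpha_t, sigma_t = np.cos(0.4), np.sin(0.4)

x_t = alpha_t * x_start + sigma_t * noise  # noised embedding at time t
v = alpha_t * noise - sigma_t * x_start    # velocity training target

# Given a perfect v prediction, the clean embedding is recoverable:
# alpha_t * x_t - sigma_t * v
#   = (alpha_t**2 + sigma_t**2) * x_start = x_start.
x_rec = alpha_t * x_t - sigma_t * v
```

Predicting v rather than x_start interpolates between noise prediction and direct prediction across timesteps, which tends to stabilise training at high noise levels.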

References

[1] P. S. Scotti et al., "Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors," NeurIPS 2023.