Embeddings: Custom concepts for image generation

Encode custom visual concepts, styles, or subjects into specialized tokens for consistent generation.

Introduction

Embeddings (also called Textual Inversions) are specialized text tokens that encode complex visual concepts, styles, or subjects. Unlike LoRAs, which modify the model's weights, embeddings leave the model itself unchanged: they add learned tokens that the text encoder recognizes, each representing a specific visual idea.

An embedding creates a learned representation of a visual concept derived from training images. When applied, the model interprets it as an instruction to either include or avoid the associated visual concept, depending on whether it's used with a positive or negative weight.
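As a rough mental model (a toy sketch, not how any particular service implements it), the text encoder can be pictured as a lookup table from tokens to vectors. Adding an embedding extends that table with one new learned entry and leaves every existing entry untouched. The token name "<my-style>", the two-dimensional vectors, and the encode helper are all invented for illustration.

```python
# Toy token-to-vector table. Real text encoders use hundreds of dimensions;
# these values are made up purely for illustration.
base_table = {
    "a": [0.1, 0.3],
    "portrait": [0.7, 0.2],
    "in": [0.0, 0.1],
}


def encode(prompt: str, table: dict) -> list:
    """Look up one vector per whitespace-separated token."""
    return [table[token] for token in prompt.split()]


# A trained embedding contributes a new token with its own learned vector.
# Here the vector is hard-coded; in practice it is derived from training images.
learned_vector = [0.42, -0.9]
extended_table = {**base_table, "<my-style>": learned_vector}

vectors = encode("a portrait in <my-style>", extended_table)
print(len(vectors))   # 4: one vector per token
print(vectors[-1])    # [0.42, -0.9]: the learned vector stands in for the new token
```

The base table is never modified, which mirrors the point above: an embedding adds vocabulary rather than changing the model.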

Embeddings are useful when you need to generate consistent results across multiple runs, or when you need to capture concepts that are hard to express with plain text. Common use cases include repeatable generation of specific characters or subjects (ensuring facial features and outfits stay stable), encoding distinctive artistic styles that resist text description, and anchoring visual consistency across batch generations.

Embeddings are only supported by models that use the CLIP text encoder, specifically SD 1.5 and SDXL architectures. Newer models like FLUX and HiDream use different text encoders and do not support embeddings. Third-party models like Recraft and Ideogram also don't support them. If you're working with these architectures, consider using LoRAs or IP Adapters for similar functionality.

Figure: a generated image of a woman waving hello with distorted hands, shown without an embedding and with the 'Negative Hands' embedding.
A 'hand-fixing' embedding used with a negative weight can improve hand details without changing your overall image concept.

Request structure

Add embeddings through the embeddings array parameter. Each entry is an object with a model identifier and a weight, and the array can hold several embeddings at once.

[
  {
    // other parameters...
    "embeddings": [
      { "model": "civitai:118418@134583", "weight": 1.5 },
      { "model": "civitai:98259@539032", "weight": 0.8 }
    ]
  }
]

The weight parameter controls how strongly the embedding influences the generation, with a range from -4.0 to 4.0. Positive weights enhance or add the embedded concept to your generation, while negative weights suppress or remove that concept. Higher absolute values create stronger influence in either direction.
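The request shape and weight range described above can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper names make_embedding and build_request are hypothetical, and the other task parameters that a real request needs are deliberately left out.

```python
import json

# Weight range stated in the paragraph above.
WEIGHT_MIN, WEIGHT_MAX = -4.0, 4.0


def make_embedding(model: str, weight: float = 1.0) -> dict:
    """Build one entry for the embeddings array, rejecting out-of-range weights."""
    if not WEIGHT_MIN <= weight <= WEIGHT_MAX:
        raise ValueError(
            f"weight {weight} is outside [{WEIGHT_MIN}, {WEIGHT_MAX}]"
        )
    return {"model": model, "weight": weight}


def build_request(embeddings: list, **other_params) -> list:
    """Wrap the parameters in a single-element array, as in the example above."""
    task = dict(other_params)  # any other task parameters would be passed here
    task["embeddings"] = list(embeddings)
    return [task]


payload = build_request([
    make_embedding("civitai:118418@134583", 1.5),
    make_embedding("civitai:98259@539032", 0.8),
])
print(json.dumps(payload, indent=2))
```

Validating the weight before sending keeps an out-of-range value from reaching the API, where it would otherwise surface only as a request error.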

Embeddings vs. LoRAs

Both embeddings and LoRAs can influence style and content, but they work at different levels of the pipeline:

  • Embeddings operate in the text encoder. They add new vocabulary the model can understand, but they don't change how the model generates pixels. This makes them smaller (typically under 100 KB) and less disruptive, but also less powerful for large stylistic shifts.
  • LoRAs modify the generation model's weights directly. They can produce stronger visual changes and handle more complex adaptations, but they're larger and architecture-bound.

For maximum control, you can combine both in the same request. A style LoRA paired with a negative artifact-fixing embedding is a common production pattern.
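A combined request might look like the sketch below. Several things here are assumptions rather than facts from this page: the key used for LoRAs (written as "lora"), the shape of a LoRA entry, and the LoRA identifier, which is a visible placeholder. The embedding identifier is reused from the request example purely as a stand-in for an artifact-fixing embedding.

```python
import json

# Assumption: this page does not name the parameter that carries LoRAs.
# "lora" is a placeholder to verify against the API reference.
LORA_KEY = "lora"


def combined_task(lora_model: str, lora_weight: float,
                  embedding_model: str, embedding_weight: float) -> dict:
    """Pair one style LoRA with one embedding in a single task object."""
    return {
        # Assumed to mirror the embeddings entry shape (model + weight).
        LORA_KEY: [{"model": lora_model, "weight": lora_weight}],
        "embeddings": [{"model": embedding_model, "weight": embedding_weight}],
    }


task = combined_task(
    lora_model="<style-lora-id>",             # placeholder, not a real identifier
    lora_weight=0.8,
    embedding_model="civitai:118418@134583",  # stand-in taken from the example above
    embedding_weight=-1.0,                    # negative weight suppresses the concept
)
print(json.dumps([task], indent=2))
```

The LoRA carries the style while the negatively weighted embedding only suppresses artifacts, so the two do not compete for the same role.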

Tips

  1. Use negative embeddings for quality control. Negative embeddings trained on common artifacts (bad hands, blurry faces, watermarks) can improve output quality across any prompt without altering your creative intent.
  2. Keep weights moderate. Weights above 2.0 or below -2.0 can cause visible distortion. Start at 1.0 and adjust incrementally.
  3. Layer sparingly. Using more than two or three embeddings at once can cause them to interfere with each other. Prioritize the most impactful ones.
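Tips 2 and 3 can be turned into a small pre-flight check. The sketch below is one possible way to do it; the function name review_embeddings and the exact thresholds are illustrative choices based on the guidance above, not limits enforced by the API.

```python
MODERATE_LIMIT = 2.0  # tip 2: weights beyond +/-2.0 can cause visible distortion
MAX_EMBEDDINGS = 3    # tip 3: more than two or three embeddings may interfere


def review_embeddings(embeddings: list) -> list:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []
    if len(embeddings) > MAX_EMBEDDINGS:
        warnings.append(
            f"{len(embeddings)} embeddings requested; "
            f"more than {MAX_EMBEDDINGS} may interfere with each other"
        )
    for entry in embeddings:
        if abs(entry["weight"]) > MODERATE_LIMIT:
            warnings.append(
                f"{entry['model']}: weight {entry['weight']} "
                f"is beyond +/-{MODERATE_LIMIT}"
            )
    return warnings


warnings = review_embeddings([
    {"model": "civitai:118418@134583", "weight": 1.5},
    {"model": "civitai:98259@539032", "weight": -2.5},
])
for message in warnings:
    print(message)
# civitai:98259@539032: weight -2.5 is beyond +/-2.0
```

These are warnings rather than errors because weights up to the documented limits remain valid; the check only flags values that are more likely to distort the result.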