---
title: "IP Adapters: Reference-based image generation | Runware Docs"
url: https://runware.ai/docs/learn/ip-adapters
description: Use reference images to guide generation, enabling style transfer and visual consistency across outputs.
relatedDocuments:
  - https://runware.ai/docs/learn/image-to-image
  - https://runware.ai/docs/learn/controlnet
---
## Introduction

IP-Adapters (Image Prompt Adapters) enable **reference-based generation**, where one or more input images guide the visual output alongside your text prompt. Unlike standard image-to-image, which directly transforms the input, IP-Adapters **extract visual features from reference images** and inject them into the generation process to create entirely new content that inherits those features.

This makes IP-Adapters the right tool when you want to transfer a visual quality (a style, a color palette, a subject's appearance) into a new composition without being constrained by the reference image's layout or structure.

![Close-up of a realistic honeybee standing on a wooden surface](https://runware.ai/docs/assets/ipadapter-source.DZfSo2Mm_1UG7dq.jpg)

*Reference image*

> **Prompt**: Close-up of a realistic honeybee standing on a wooden surface

![Futuristic robot designed to resemble a bee, standing on a wooden surface](https://runware.ai/docs/assets/ipadapter-output.DBkCqjAN_11lg8W.jpg)

*Reference image + 'robot' prompt*

> **Prompt**: robot

## How it works

IP-Adapters work by passing your reference images through a **vision encoder** (typically a CLIP image encoder) that extracts high-level visual features: color distributions, textures, shapes, stylistic patterns, and subject characteristics. These features are then injected into the generation model's attention layers, where they influence the output alongside your text prompt.

The key difference from image-to-image is structural. Image-to-image uses the reference as a **pixel-level starting point**, preserving layout and composition while modifying details. IP-Adapters use the reference as a **feature-level conditioning signal**, preserving visual qualities while allowing entirely new compositions.

This means you can:

- Provide a photo of a product and generate it in a completely different scene.
- Use an artwork as a style reference and apply that aesthetic to any subject.
- Feed multiple references to blend visual characteristics from different sources.
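
The feature injection described in this section can be illustrated with a toy example. In the published IP-Adapter design, image features get their own cross-attention branch, and its output is added to the text branch, scaled by a weight. The sketch below assumes that design and uses tiny hand-written vectors in place of real encoder outputs. It also skips the learned projection matrices, and it is not a description of how any hosted model is implemented internally.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    probs = softmax(scores)
    dim = len(values[0])
    return [sum(p * v[i] for p, v in zip(probs, values)) for i in range(dim)]

def decoupled_cross_attention(query, text_kv, image_kv, weight):
    """Text branch plus a weighted image branch (toy version, no projections)."""
    text_out = attention(query, *text_kv)
    image_out = attention(query, *image_kv)
    return [t + weight * i for t, i in zip(text_out, image_out)]

# Toy inputs (assumptions): two text tokens and one image token, as (keys, values).
query = [1.0, 0.0]
text_kv = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
image_kv = ([[0.0, 1.0]], [[0.5, 0.5]])

for weight in (0.0, 0.5, 1.0):
    print(weight, decoupled_cross_attention(query, text_kv, image_kv, weight))
```

At weight 0.0 the result is the text-only attention output. Raising the weight pulls the result toward the image features without changing how the text branch attends.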

## Request structure

The `ipAdapters` parameter is an array of IP-Adapter configurations. Each configuration specifies a model, reference images, and influence settings.

```json
[
  {
    "taskType": "imageInference",
    "model": "civitai:101055@128078",
    "positivePrompt": "robot",
    "ipAdapters": [{
      "model": "runware:55@4",
      "guideImages": ["a1b2c3d4-e5f6-7890-abcd-ef1234567890"],
      "weight": 0.8
    }],
    "steps": 30,
    "width": 1024,
    "height": 1024
  }
]
```

The required fields are `model` (which IP-Adapter architecture to use) and `guideImages` (an array of image UUIDs, URLs, or base64 data). Optional parameters include:

- **`weight`**: Controls the strength of the reference image's influence (0.0 to 2.0). Higher values make the output look more like the reference.
- **`combineMethod`**: How multiple guide images are combined when you provide more than one.
- **`embedScaling`**: Controls how the visual embeddings are scaled during injection.
- **`weightType`**: Determines how weight is distributed across the model's attention layers.
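
If you assemble requests in code, a small helper can check the documented constraints before anything is sent. The following is a minimal sketch: `build_ip_adapter` and `build_inference_task` are hypothetical helper names, not part of any SDK, and the code only builds and prints the JSON payload shown above without contacting the API.

```python
import json

def build_ip_adapter(model, guide_images, weight=None):
    """Return one entry for the `ipAdapters` array (hypothetical helper)."""
    if not guide_images:
        raise ValueError("guideImages needs at least one image")
    adapter = {"model": model, "guideImages": list(guide_images)}
    if weight is not None:
        # The documented range for `weight` is 0.0 to 2.0.
        if not 0.0 <= weight <= 2.0:
            raise ValueError("weight must be between 0.0 and 2.0")
        adapter["weight"] = weight
    return adapter

def build_inference_task(base_model, prompt, adapters,
                         steps=30, width=1024, height=1024):
    """Return an imageInference task with the same fields as the example above."""
    return {
        "taskType": "imageInference",
        "model": base_model,
        "positivePrompt": prompt,
        "ipAdapters": list(adapters),
        "steps": steps,
        "width": width,
        "height": height,
    }

adapter = build_ip_adapter(
    "runware:55@4",
    ["a1b2c3d4-e5f6-7890-abcd-ef1234567890"],
    weight=0.8,
)
payload = [build_inference_task("civitai:101055@128078", "robot", [adapter])]
print(json.dumps(payload, indent=2))
```

Leaving `weight` unset omits the field entirely, so the request carries only the two required fields for that adapter.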

## IP Adapters vs. image-to-image vs. ControlNet

These three features all use reference images, but they serve different purposes:

| Feature | What it preserves | What changes | Best for |
| --- | --- | --- | --- |
| **Image-to-image** | Layout, composition, overall structure | Colors, textures, style, detail level | Restyling, upscaling, prompt-guided modifications |
| **ControlNet** | Specific structural elements (edges, depth, pose) | Everything else | Precise structural control, pose matching |
| **IP Adapters** | Visual style, subject features, color palette | Composition, layout, scene | Style transfer, subject consistency, visual references |

The choice depends on **what you want to keep** from the reference. If you want to keep the layout, use image-to-image. If you want to keep specific structural elements like edges or poses, use ControlNet. If you want to keep the visual "feel" while generating something new, use IP Adapters.

## FLUX Redux

FLUX Redux (`runware:105@1`) is an IP-Adapter model built specifically for the FLUX architecture. It enables **image variation generation**: given a reference image, it reproduces the image with variations, letting you refine existing images or create multiple alternatives from a single reference.

FLUX Redux is configured differently from standard IP Adapters:

- Use `runware:105@1` as the IP-Adapter model.
- Provide the input image in the `guideImage` parameter inside the `ipAdapters` object.
- Use a FLUX base model (typically `runware:101@1`) as the generation model. Other FLUX models work too.
- There is no `weight` parameter. The `positivePrompt` has no effect on the output.

```json
[
  {
    "taskType": "imageInference",
    "model": "runware:101@1",
    "positivePrompt": "__BLANK__",
    "ipAdapters": [{
      "guideImage": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "model": "runware:105@1"
    }],
    "steps": 30,
    "width": 1024,
    "height": 1024
  }
]
```

> [!NOTE]
> Use `__BLANK__` as your `positivePrompt` to generate pure variations without any text guidance. This tells the model to focus exclusively on the visual information from the reference image.
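
The same request can be built programmatically. This is a minimal sketch assuming the field names from the example above; `build_redux_task` is a hypothetical helper, and the code only prints the payload.

```python
import json

REDUX_MODEL = "runware:105@1"      # FLUX Redux IP-Adapter model
DEFAULT_FLUX_BASE = "runware:101@1"  # FLUX base model used in the example above

def build_redux_task(guide_image, base_model=DEFAULT_FLUX_BASE,
                     steps=30, width=1024, height=1024):
    """Return a FLUX Redux variation task (hypothetical helper).

    No `weight` is set because Redux has no weight parameter, and the
    prompt is `__BLANK__` so that no text guidance is supplied.
    """
    return {
        "taskType": "imageInference",
        "model": base_model,
        "positivePrompt": "__BLANK__",
        "ipAdapters": [{"guideImage": guide_image, "model": REDUX_MODEL}],
        "steps": steps,
        "width": width,
        "height": height,
    }

payload = [build_redux_task("a1b2c3d4-e5f6-7890-abcd-ef1234567890")]
print(json.dumps(payload, indent=2))
```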

## Tips

1. **Start with weight 0.5-0.7.** Full weight (1.0+) often overpowers the text prompt, making the output a near-copy of the reference. Lower weights give a balanced blend of reference style and prompt content.
2. **Use clear, well-lit reference images.** The vision encoder extracts features better from images with good contrast and clear subjects. Dark, noisy, or cluttered references produce weaker conditioning.
3. **Combine with text prompts for control.** IP Adapters work best when paired with a descriptive prompt: the prompt defines what to generate, and the reference defines how it should look. Without a prompt, the model tends to reproduce the reference too literally.
4. **Use multiple references for style blending.** You can pass multiple images in the `guideImages` array to blend characteristics from different sources. Use lower weights (0.3-0.5) when blending to prevent one reference from dominating.
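
One way to find a good weight is to generate the same prompt at several weights and compare the results. The sketch below prepares such a sweep for a two-reference blend. `weight_sweep` is a hypothetical helper, and the prompt and the second guide image URL are placeholders chosen for illustration. Each resulting task has the same shape as the request shown under "Request structure".

```python
import json

def weight_sweep(base_task, adapter, weights):
    """Return one task per weight, leaving the inputs unmodified (hypothetical helper)."""
    tasks = []
    for weight in weights:
        if not 0.0 <= weight <= 2.0:
            raise ValueError("weight must be between 0.0 and 2.0")
        task = dict(base_task)
        task["ipAdapters"] = [dict(adapter, weight=weight)]
        tasks.append(task)
    return tasks

base_task = {
    "taskType": "imageInference",
    "model": "civitai:101055@128078",
    "positivePrompt": "a lighthouse on a cliff at sunset",  # placeholder prompt
    "steps": 30,
    "width": 1024,
    "height": 1024,
}
adapter = {
    "model": "runware:55@4",
    "guideImages": [
        "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
        "https://example.com/style-reference.jpg",  # placeholder URL
    ],
}

# Lower weights, as suggested for blending several references.
tasks = weight_sweep(base_task, adapter, [0.3, 0.4, 0.5])
for task in tasks:
    print(json.dumps(task["ipAdapters"][0]))
```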