IP Adapters: Reference-based image generation

Use reference images to guide generation, enabling style transfer and visual consistency across outputs.

Introduction

IP-Adapters (Image Prompt Adapters) enable reference-based generation, where one or more input images guide the visual output alongside your text prompt. Unlike standard image-to-image, which directly transforms the input, IP-Adapters extract visual features from reference images and inject them into the generation process to create entirely new content that inherits those features.

This makes IP-Adapters the right tool when you want to transfer a visual quality (a style, a color palette, a subject's appearance) into a new composition without being constrained by the reference image's layout or structure.

Close-up of a realistic honeybee standing on a wooden surface — Reference image

Futuristic robot designed to resemble a bee, standing on a wooden surface — Reference image + 'robot' prompt

How it works

IP-Adapters work by passing your reference images through a vision encoder (typically a CLIP image encoder) that extracts high-level visual features: color distributions, textures, shapes, stylistic patterns, and subject characteristics. These features are then injected into the generation model's attention layers, where they influence the output alongside your text prompt.

The key difference from image-to-image is structural. Image-to-image uses the reference as a pixel-level starting point, preserving layout and composition while modifying details. IP-Adapters use the reference as a feature-level conditioning signal, preserving visual qualities while allowing entirely new compositions.

This means you can:

Provide a photo of a product and generate it in a completely different scene.
Use an artwork as a style reference and apply that aesthetic to any subject.
Feed multiple references to blend visual characteristics from different sources.
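To make the feature injection described above more concrete, here is a minimal sketch of one common formulation: the decoupled cross-attention from the original IP-Adapter design, where image features get their own attention branch and its output is scaled by the weight and added to the text branch. This is an assumption for illustration only. This page does not state which injection variant the hosted models use, and the function names and toy vectors below are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def decoupled_cross_attention(query, text_kv, image_kv, weight):
    """Text branch plus a weighted image branch (assumed formulation)."""
    text_out = attend(query, *text_kv)
    image_out = attend(query, *image_kv)
    return [t + weight * i for t, i in zip(text_out, image_out)]


# Toy example: one text token and one image token, each as (keys, values).
text_kv = ([[1.0, 0.0]], [[1.0, 2.0]])
image_kv = ([[0.0, 1.0]], [[10.0, 20.0]])
blended = decoupled_cross_attention([1.0, 0.0], text_kv, image_kv, weight=0.5)
print(blended)  # [6.0, 12.0]
```

With weight set to 0.0 the image branch drops out and only the text conditioning remains, which is why lowering the weight shifts control back to the prompt.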

Request structure

The ipAdapters parameter is an array of IP-Adapter configurations. Each configuration specifies a model, reference images, and influence settings.

import { createClient } from '@runware/sdk'

const client = await createClient({ apiKey: process.env.RUNWARE_API_KEY })
await client.connect()

const [result] = await client.run({
  model: 'civitai:101055@128078',
  positivePrompt: 'robot',
  ipAdapters: [
    {
      model: 'runware:55@4',
      guideImages: [
        'a1b2c3d4-e5f6-7890-abcd-ef1234567890'
      ],
      weight: 0.8
    }
  ],
  steps: 30,
  width: 1024,
  height: 1024
})

import asyncio
import os

from runware import Runware


async def main():
    async with Runware(api_key=os.environ["RUNWARE_API_KEY"]) as client:
        results = await client.run({
            "model": "civitai:101055@128078",
            "positivePrompt": "robot",
            "ipAdapters": [
                {
                    "model": "runware:55@4",
                    "guideImages": [
                        "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
                    ],
                    "weight": 0.8
                }
            ],
            "steps": 30,
            "width": 1024,
            "height": 1024
        })


asyncio.run(main())

curl https://api.runware.ai/v1 \
  -H "Authorization: Bearer $RUNWARE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '[
    {
      "taskType": "imageInference",
      "model": "civitai:101055@128078",
      "positivePrompt": "robot",
      "ipAdapters": [
        {
          "model": "runware:55@4",
          "guideImages": [
            "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
          ],
          "weight": 0.8
        }
      ],
      "steps": 30,
      "width": 1024,
      "height": 1024
    }
  ]'

runware run civitai:101055@128078 \
  positivePrompt=robot \
  ipAdapters.0.model=runware:55@4 \
  ipAdapters.0.guideImages.0=a1b2c3d4-e5f6-7890-abcd-ef1234567890 \
  ipAdapters.0.weight=0.8 \
  steps=30 \
  width=1024 \
  height=1024

{
  "taskType": "imageInference",
  "model": "civitai:101055@128078",
  "positivePrompt": "robot",
  "ipAdapters": [
    {
      "model": "runware:55@4",
      "guideImages": [
        "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
      ],
      "weight": 0.8
    }
  ],
  "steps": 30,
  "width": 1024,
  "height": 1024
}

The required fields are model (which IP-Adapter architecture to use) and guideImages (an array of image UUIDs, URLs, or base64 data). Optional parameters include:

weight: Controls the strength of the reference image's influence (0.0 to 2.0). Higher values make the output look more like the reference.
combineMethod: How multiple guide images are combined when you provide more than one.
embedScaling: Controls how the visual embeddings are scaled during injection.
weightType: Determines how weight is distributed across the model's attention layers.
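As a small illustration of the fields above, the following sketch builds one ipAdapters entry and rejects a weight outside the documented 0.0 to 2.0 range before a request is sent. The helper name build_ip_adapter is hypothetical and not part of any SDK; only model, guideImages, and weight are taken from this page.

```python
from typing import Optional


def build_ip_adapter(
    model: str,
    guide_images: list[str],
    weight: Optional[float] = None,
) -> dict:
    """Build one entry for the ipAdapters array (hypothetical helper)."""
    if not model:
        raise ValueError("model is required")
    if not guide_images:
        raise ValueError("guideImages needs at least one image")
    entry = {"model": model, "guideImages": list(guide_images)}
    if weight is not None:
        # Range taken from the weight description above.
        if not 0.0 <= weight <= 2.0:
            raise ValueError("weight must be between 0.0 and 2.0")
        entry["weight"] = weight
    return entry


adapter = build_ip_adapter(
    "runware:55@4",
    ["a1b2c3d4-e5f6-7890-abcd-ef1234567890"],
    weight=0.8,
)
```

The resulting dictionary can be placed directly in the ipAdapters array of a request like the ones shown earlier.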

IP Adapters vs. image-to-image vs. ControlNet

These three features all use reference images, but they serve different purposes:

Feature	What it preserves	What changes	Best for
Image-to-image	Layout, composition, overall structure	Colors, textures, style, detail level	Restyling, upscaling, prompt-guided modifications
ControlNet	Specific structural elements (edges, depth, pose)	Everything else	Precise structural control, pose matching
IP Adapters	Visual style, subject features, color palette	Composition, layout, scene	Style transfer, subject consistency, visual references

The choice depends on what you want to keep from the reference. If you want to keep the layout, use image-to-image. If you want to keep specific structural elements like edges or poses, use ControlNet. If you want to keep the visual "feel" while generating something new, use IP Adapters.

FLUX Redux

FLUX Redux (runware:105@1) is an IP-Adapter model built specifically for the FLUX architecture. It enables image variation generation: given a reference image, it reproduces the image with variations, letting you refine existing images or create multiple alternatives from a single reference.

FLUX Redux is configured differently from standard IP-Adapters:

Use runware:105@1 as the IP-Adapter model.
Provide the input image in the guideImage parameter inside the ipAdapters object.
Use a FLUX base model (typically runware:101@1) as the generation model. Other FLUX models work too.
There is no weight parameter. The positivePrompt has no effect on the output.

import { createClient } from '@runware/sdk'

const client = await createClient({ apiKey: process.env.RUNWARE_API_KEY })
await client.connect()

const [result] = await client.run({
  model: 'runware:101@1',
  positivePrompt: '__BLANK__',
  ipAdapters: [
    {
      guideImage: 'a1b2c3d4-e5f6-7890-abcd-ef1234567890',
      model: 'runware:105@1'
    }
  ],
  steps: 30,
  width: 1024,
  height: 1024
})

import asyncio
import os

from runware import Runware


async def main():
    async with Runware(api_key=os.environ["RUNWARE_API_KEY"]) as client:
        results = await client.run({
            "model": "runware:101@1",
            "positivePrompt": "__BLANK__",
            "ipAdapters": [
                {
                    "guideImage": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
                    "model": "runware:105@1"
                }
            ],
            "steps": 30,
            "width": 1024,
            "height": 1024
        })


asyncio.run(main())

curl https://api.runware.ai/v1 \
  -H "Authorization: Bearer $RUNWARE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '[
    {
      "taskType": "imageInference",
      "model": "runware:101@1",
      "positivePrompt": "__BLANK__",
      "ipAdapters": [
        {
          "guideImage": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
          "model": "runware:105@1"
        }
      ],
      "steps": 30,
      "width": 1024,
      "height": 1024
    }
  ]'

runware run runware:101@1 \
  positivePrompt=__BLANK__ \
  ipAdapters.0.guideImage=a1b2c3d4-e5f6-7890-abcd-ef1234567890 \
  ipAdapters.0.model=runware:105@1 \
  steps=30 \
  width=1024 \
  height=1024

{
  "taskType": "imageInference",
  "model": "runware:101@1",
  "positivePrompt": "__BLANK__",
  "ipAdapters": [
    {
      "guideImage": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "model": "runware:105@1"
    }
  ],
  "steps": 30,
  "width": 1024,
  "height": 1024
}

Use __BLANK__ as your positivePrompt to generate pure variations without any text guidance. This tells the model to focus exclusively on the visual information from the reference image.
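The FLUX Redux differences listed above can be captured in a small payload builder. This is a sketch only: build_redux_task is a hypothetical helper name, and its defaults simply mirror the example request on this page (a single guideImage value, no weight, and __BLANK__ as the prompt).

```python
def build_redux_task(guide_image: str, base_model: str = "runware:101@1") -> dict:
    """Build an imageInference task for FLUX Redux (hypothetical helper)."""
    return {
        "taskType": "imageInference",
        "model": base_model,  # a FLUX base model
        "positivePrompt": "__BLANK__",  # no text guidance
        "ipAdapters": [
            # Redux takes a single guideImage and no weight.
            {"guideImage": guide_image, "model": "runware:105@1"}
        ],
        "steps": 30,
        "width": 1024,
        "height": 1024,
    }


task = build_redux_task("a1b2c3d4-e5f6-7890-abcd-ef1234567890")
```

Keeping the Redux-specific shape in one place makes it harder to accidentally send guideImages or weight, which belong to the standard IP-Adapter configuration.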

Tips

Start with weight 0.5-0.7. A weight of 1.0 or higher often overpowers the text prompt, making the output a near-copy of the reference. Lower weights give a balanced blend of reference style and prompt content.
Use clear, well-lit reference images. The vision encoder extracts features better from images with good contrast and clear subjects. Dark, noisy, or cluttered references produce weaker conditioning.
Combine with text prompts for control. IP-Adapters work best when paired with a descriptive prompt: the prompt defines what to generate, and the reference defines how it should look. Without a prompt, the model tends to reproduce the reference too literally.
Use multiple references for style blending. You can pass multiple images in the guideImages array to blend characteristics from different sources. Use lower weights (0.3-0.5) when blending to prevent one reference from dominating.
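One way to apply the first tip is to prepare the same request at a few weights and compare the results side by side. The sketch below only builds the payloads; weight_sweep is a hypothetical helper, and sending each task is left to whichever client you use.

```python
import copy


def weight_sweep(task: dict, weights: list[float]) -> list[dict]:
    """Return one deep copy of task per weight, applied to every ipAdapters entry."""
    variants = []
    for weight in weights:
        variant = copy.deepcopy(task)  # leave the original task untouched
        for adapter in variant["ipAdapters"]:
            adapter["weight"] = weight
        variants.append(variant)
    return variants


base_task = {
    "taskType": "imageInference",
    "model": "civitai:101055@128078",
    "positivePrompt": "robot",
    "ipAdapters": [
        {
            "model": "runware:55@4",
            "guideImages": ["a1b2c3d4-e5f6-7890-abcd-ef1234567890"],
            "weight": 0.8,
        }
    ],
    "steps": 30,
    "width": 1024,
    "height": 1024,
}

variants = weight_sweep(base_task, [0.5, 0.6, 0.7])
```

The same helper works for blending: pass a task whose guideImages array holds several references and sweep the lower range, for example 0.3, 0.4, and 0.5.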