Steps: Controlling refinement in image generation

How the steps parameter controls the iterative refinement process, with architecture-specific ranges and speed vs quality tradeoffs.

Introduction

The steps parameter defines how many iterations the model performs during generation. Each step refines the output further, gradually turning noise into a coherent image. More steps generally mean more detail, but with diminishing returns and longer generation times.

This parameter appears across all generation tasks (text-to-image, image-to-image, inpainting, and outpainting) and its behavior is consistent: higher step counts produce more refined output at the cost of speed.

How steps work

In diffusion-based models, each step removes a portion of noise from the image. In flow-matching models (like FLUX and Z-Image), each step moves the output along a learned trajectory from noise to data. Regardless of the internal mechanism, the progression follows the same pattern:

A barely visible jellyfish shape in a sunlit canyon at 1 step. Generation time: 0.884s. The structure can barely be seen.

A rough jellyfish shape in a sunlit canyon at 5 steps. Generation time: 0.945s. Structure completed but overall very poor quality.

A jellyfish in a sunlit canyon at 10 steps, improving but with artifacts. Generation time: 1.631s. Better, but still with artifacts.

A jellyfish in a sunlit canyon at 15 steps, nearly complete. Generation time: 1.817s. It's almost there. This image can be used.

A detailed jellyfish in a sunlit canyon at 20 steps. Generation time: 2.140s. Small details are appearing and turning the image into a high-quality one.

A highly detailed jellyfish in a sunlit canyon at 50 steps with refined lighting. Generation time: 3.092s. Details such as lighting are better achieved at the cost of longer inference time.
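The generation times quoted for these sample images give a rough feel for how time grows with step count. The sketch below only does arithmetic on those six figures. The helper names are hypothetical, and because each figure is a single measurement that presumably includes fixed overhead (request handling, decoding), the results are indicative rather than a benchmark.

```python
# (steps, seconds) pairs copied from the sample images in this section.
TIMINGS = [(1, 0.884), (5, 0.945), (10, 1.631), (15, 1.817), (20, 2.140), (50, 3.092)]


def time_ratio(timings, from_steps, to_steps):
    """Return how many times longer `to_steps` took than `from_steps`."""
    lookup = dict(timings)
    return lookup[to_steps] / lookup[from_steps]


def marginal_seconds_per_step(timings):
    """Return (start_steps, end_steps, extra seconds per added step) per interval."""
    intervals = []
    for (s0, t0), (s1, t1) in zip(timings, timings[1:]):
        intervals.append((s0, s1, (t1 - t0) / (s1 - s0)))
    return intervals


if __name__ == "__main__":
    print(f"20 -> 50 steps: {time_ratio(TIMINGS, 20, 50):.2f}x the time")
    for s0, s1, cost in marginal_seconds_per_step(TIMINGS):
        print(f"{s0:>2} -> {s1:>2} steps: {cost:.3f}s per extra step")
```

On these numbers, going from 20 to 50 steps (2.5x the steps) takes about 1.44x the time, so generation time does not scale one-to-one with step count.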

The generation process follows these phases:

Early steps (1-5): Establish basic composition, rough shapes, and color distribution.
Middle steps (5-15): Form recognizable objects, define spatial relationships, and develop textures.
Later steps (15-30): Refine details, enhance coherence, and develop subtle lighting.
Final steps (30+): Polish fine details and smooth transitions, with increasingly subtle changes.

Request structure

The steps parameter is a number passed at the top level of your generation request.

import { createClient } from '@runware/sdk'

const client = await createClient({ apiKey: process.env.RUNWARE_API_KEY })
await client.connect()

const [result] = await client.run({
  model: 'civitai:101055@128078',
  positivePrompt: 'A giant jellyfish floating through a sunlit canyon',
  steps: 30,
  width: 1024,
  height: 1024
})

import asyncio
import os

from runware import Runware


async def main():
    async with Runware(api_key=os.environ["RUNWARE_API_KEY"]) as client:
        results = await client.run({
            "model": "civitai:101055@128078",
            "positivePrompt": "A giant jellyfish floating through a sunlit canyon",
            "steps": 30,
            "width": 1024,
            "height": 1024
        })


asyncio.run(main())

curl https://api.runware.ai/v1 \
  -H "Authorization: Bearer $RUNWARE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '[
    {
      "taskType": "imageInference",
      "model": "civitai:101055@128078",
      "positivePrompt": "A giant jellyfish floating through a sunlit canyon",
      "steps": 30,
      "width": 1024,
      "height": 1024
    }
  ]'

runware run civitai:101055@128078 \
  positivePrompt="A giant jellyfish floating through a sunlit canyon" \
  steps=30 \
  width=1024 \
  height=1024

{
  "taskType": "imageInference",
  "model": "civitai:101055@128078",
  "positivePrompt": "A giant jellyfish floating through a sunlit canyon",
  "steps": 30,
  "width": 1024,
  "height": 1024
}

Recommended ranges by architecture

Different model architectures have different sweet spots for step count:

Architecture	Recommended range	Max	Notes
SD 1.5	20-40	50	Trained at 512px. Benefits from higher steps for detail
SDXL	20-35	50	Good results at 25 steps with DPM++ 2M Karras
FLUX Dev	20-30	50	Efficient architecture, diminishing returns above 30
FLUX Schnell	4-8	50	Step-distilled for fast generation
Z-Image	20-30	50	Flow-matching architecture, similar behavior to FLUX
Z-Image Turbo	4-8	50	Speed-optimized variant for low step counts
SD 1.5 LCM	4-8	50	Latent Consistency Model, designed for very few steps
SDXL LCM	4-8	50	Same LCM optimization for SDXL
SDXL Lightning	4-8	50	Distilled for speed
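If you pick step counts programmatically, the table can be encoded as a small lookup. This is a minimal sketch: the values are copied from the table, but the dictionary keys, `StepRange`, and `clamp_steps` are hypothetical names for illustration and are not API identifiers.

```python
from typing import NamedTuple


class StepRange(NamedTuple):
    low: int      # lower end of the recommended range
    high: int     # upper end of the recommended range
    maximum: int  # maximum listed in the table


# Values copied from the table above; keys are illustrative labels only.
RECOMMENDED_STEPS = {
    "SD 1.5": StepRange(20, 40, 50),
    "SDXL": StepRange(20, 35, 50),
    "FLUX Dev": StepRange(20, 30, 50),
    "FLUX Schnell": StepRange(4, 8, 50),
    "Z-Image": StepRange(20, 30, 50),
    "Z-Image Turbo": StepRange(4, 8, 50),
    "SD 1.5 LCM": StepRange(4, 8, 50),
    "SDXL LCM": StepRange(4, 8, 50),
    "SDXL Lightning": StepRange(4, 8, 50),
}


def clamp_steps(architecture: str, requested: int) -> int:
    """Clamp a requested step count into the recommended range for an architecture."""
    step_range = RECOMMENDED_STEPS[architecture]
    return max(step_range.low, min(requested, step_range.high))
```

For example, `clamp_steps("FLUX Schnell", 30)` returns 8, while `clamp_steps("SD 1.5", 30)` leaves the value at 30.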

Model distillation

Some models are created through distillation, where a student model is trained to reproduce the output of a slower teacher model, often in far fewer sampling steps. Distilled architectures like LCM (Latent Consistency Model) or FLUX Schnell can produce high-quality images in 4-8 steps compared to the 20-30 steps their non-distilled counterparts need. This makes them ideal for real-time or batch applications where speed is critical, though they may occasionally trade some fine detail for this efficiency.

Steps in image-to-image tasks

In image-to-image, inpainting, and outpainting, the steps parameter interacts with strength. The model doesn't run all the steps. It starts partway through the denoising schedule based on the strength value.

For example, with steps: 40 and strength: 0.5, the model performs only the last 20 steps (roughly steps × strength). This means higher step counts become more important at lower strengths, since fewer actual steps are used for refinement.
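The relationship can be sketched as simple arithmetic. This assumes the number of executed steps is approximately steps × strength. The exact rounding used by a given backend is not stated here, so treat `effective_steps` and `steps_for_target` as hypothetical estimators rather than the service's actual logic.

```python
import math


def effective_steps(steps: int, strength: float) -> int:
    """Estimate how many denoising steps actually run in image-to-image."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Assumption: executed steps are about steps * strength, rounded to the nearest integer.
    return round(steps * strength)


def steps_for_target(target: int, strength: float) -> int:
    """Estimate the steps value needed to get `target` executed steps."""
    if not 0.0 < strength <= 1.0:
        raise ValueError("strength must be greater than 0 and at most 1")
    return math.ceil(target / strength)
```

With these estimators, `effective_steps(40, 0.5)` gives 20, matching the example above, and `steps_for_target(20, 0.5)` gives 40.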

Scheduler interaction

The choice of scheduler affects how much value you get from additional steps:

Fast-converging schedulers (Euler, UniPC, LCM): Reach good quality at 15-25 steps. Extra steps add little.
Detail-oriented schedulers (DPM++ 3M, DPM++ 2M SDE): Continue refining up to 40-50 steps.
Stochastic schedulers (Euler Ancestral, DPM++ SDE): Don't converge. More steps produce different results, not necessarily better ones.

Tips

Start at 25-30 steps for standard models. This hits the sweet spot for most SD 1.5, SDXL, and FLUX models. Only go higher if you see visible improvement.
Use 4-8 steps for distilled models. LCM, Lightning, Schnell, and similar architectures are designed for low step counts. Running them at 30 steps wastes compute.
Increase steps when using low strength. In image-to-image at strength: 0.3, a step count of 50 gives you ~15 actual refinement steps, while 20 would give you only ~6.
Match steps to your scheduler. A fast scheduler like Euler at 15 steps is great for prototyping. Switch to DPM++ 2M Karras at 30 steps for final renders.
Profile your use case. In batch processing, cutting from 30 to 20 steps removes a third of the denoising work, often with minimal quality loss. Test and find your acceptable threshold.
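One way to profile is to render the same prompt at several step counts and compare the results side by side. The sketch below only builds the request payloads and sends nothing. The field names mirror the request example in this document, while the `build_step_sweep` helper and the fixed `seed` field (used so that only the step count varies) are assumptions to check against the API reference.

```python
def build_step_sweep(model, prompt, step_counts, seed=1234, width=1024, height=1024):
    """Build one imageInference payload per step count, identical otherwise."""
    payloads = []
    for steps in step_counts:
        payloads.append({
            "taskType": "imageInference",
            "model": model,
            "positivePrompt": prompt,
            "steps": steps,
            "seed": seed,  # assumed parameter: keeps the starting noise the same across runs
            "width": width,
            "height": height,
        })
    return payloads


sweep = build_step_sweep(
    "civitai:101055@128078",
    "A giant jellyfish floating through a sunlit canyon",
    [10, 20, 30],
)
```

Each payload can then be submitted with whichever client you already use, and the outputs compared to find the lowest step count that still meets your quality bar.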