Dimensions: Image size and aspect ratio
How width and height affect generation quality, with architecture-specific limits and aspect ratio guidance.
Introduction
The width and height parameters define the canvas size of your generated image. Beyond resolution, these values determine the aspect ratio, which directly influences composition, framing, and how the model distributes visual elements across the output.
Aspect ratios
The ratio between width and height shapes how the model composes the scene:
- Square (1:1), 1024 × 1024: General-purpose. Works well for portraits, product shots, and social media.
- Portrait (taller than wide), 768 × 1024, 832 × 1216: Better results for character portraits and full-body shots. The vertical space encourages the model to compose along a vertical axis.
- Landscape (wider than tall), 1024 × 768, 1216 × 832: Ideal for scenic views, environments, and panoramic compositions.
- Ultra-wide, 1536 × 640, 1792 × 512: For banners, cinematic shots, and panoramas.
A floating island with waterfalls spilling into the sky, surrounded by colorful clouds and giant birds, under a vibrant sunset, ultra-wide cinematic shot
The model adapts its composition to fill the canvas. A landscape prompt in a portrait frame will stack elements vertically, while the same prompt in landscape will spread them horizontally. Choosing the right ratio before generating is often more effective than cropping afterwards.
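To pick a size for a ratio that is not in the list above, one option is to hold the pixel count near the model's trained area and snap each side to a safe multiple. The sketch below is illustrative only: `dimensions_for_ratio` is a hypothetical helper, not part of any API, and the default area of 1024 × 1024 and step of 64 are assumptions chosen to suit SDXL-class models.

```python
import math


def dimensions_for_ratio(ratio_w, ratio_h, target_pixels=1024 * 1024, multiple=64):
    """Return (width, height) close to target_pixels with roughly ratio_w:ratio_h."""
    ratio = ratio_w / ratio_h
    # Ideal, unrounded size for the requested area and ratio.
    ideal_height = math.sqrt(target_pixels / ratio)
    ideal_width = ideal_height * ratio
    # Snap each side to the nearest multiple, never going below one step.
    width = max(multiple, round(ideal_width / multiple) * multiple)
    height = max(multiple, round(ideal_height / multiple) * multiple)
    return width, height
```

For example, `dimensions_for_ratio(16, 9)` gives 1344 × 768. Because of the rounding, the result only approximates the requested ratio and can differ from the preset sizes listed above.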
Request structure
Width and height are passed at the top level of your generation request. Values must be multiples of 8 (or 16 for FLUX and Z-Image models; see the architecture limits table).
[
{
"taskType": "imageInference",
"model": "civitai:101055@128078",
"positivePrompt": "A floating island with waterfalls and colorful clouds",
"width": 1216,
"height": 832,
"steps": 30
}
]
Architecture limits
Each model architecture has specific dimension constraints. Generating within the trained resolution range produces the best results, since models learn visual patterns at specific scales.
| Architecture | Range | Multiple of | Trained at | Notes |
|---|---|---|---|---|
| SD 1.5 | 128–1024 | 8 | 512 × 512 | Best quality at 512. Higher resolutions can produce duplicated subjects |
| SDXL | 128–2048 | 8 | 1024 × 1024 | Optimized for 1024. Good quality up to 1536 |
| FLUX | 128–2048 | 16 | 1024 × 1024 | Flexible architecture, handles various resolutions well |
| Z-Image | 128–2048 | 16 | 1024 × 1024 | Flow-matching architecture from Alibaba |
Generating outside a model's trained resolution can produce artifacts. SD 1.5 at 1024 × 1024 often generates duplicated or merged subjects (two heads, repeated objects). If you need higher resolution output, generate at the model's native resolution and then upscale the result.
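The limits in the table can be checked on the client before a request is sent. This is a minimal sketch, not an official validator: `ArchLimits`, `LIMITS`, and `check_dimensions` are hypothetical names, the numbers are copied from the table above, and it assumes the range applies to each side independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchLimits:
    min_side: int
    max_side: int
    multiple: int
    trained: int


# Values copied from the architecture table; assumed to apply per side.
LIMITS = {
    "SD 1.5": ArchLimits(128, 1024, 8, 512),
    "SDXL": ArchLimits(128, 2048, 8, 1024),
    "FLUX": ArchLimits(128, 2048, 16, 1024),
    "Z-Image": ArchLimits(128, 2048, 16, 1024),
}


def check_dimensions(architecture, width, height):
    """Return a list of problems; an empty list means the size passes the table's limits."""
    limits = LIMITS[architecture]
    problems = []
    for name, value in (("width", width), ("height", height)):
        if not limits.min_side <= value <= limits.max_side:
            problems.append(
                f"{name} {value} is outside {limits.min_side}-{limits.max_side}"
            )
        if value % limits.multiple != 0:
            problems.append(f"{name} {value} is not a multiple of {limits.multiple}")
    return problems
```

A size such as 1216 × 832 passes for SDXL, while 1000 × 1024 is reported for FLUX because 1000 is not a multiple of 16. The check covers only range and step; it says nothing about quality at sizes far from the trained resolution.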
Dimensions in image-to-image
When using image-to-image or outpainting, you can set output dimensions that differ from the source image. How much the model has to recompose the scene depends on whether the aspect ratio changes:
- Same aspect ratio, different size (512 × 512 → 1024 × 1024): Uniform scaling. Generally produces good results since no recomposition is needed.
- Different aspect ratio (1024 × 1024 → 1536 × 768): The model must decide how to redistribute content, potentially stretching elements or generating new content to fill the expanded dimension. Use a higher strength value (0.8+) to give the model enough freedom to recompose naturally.
For controlled expansion in a specific direction, outpainting is often a better choice than changing dimensions in image-to-image, since it preserves the original content and only generates new content in the expanded area.
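The two image-to-image cases above can be told apart with a short check before a request is built. The sketch below is an assumption-laden illustration: `same_aspect_ratio` and `suggested_min_strength` are hypothetical helpers, and the 0.8 floor simply mirrors the guidance in the list above.

```python
from fractions import Fraction


def same_aspect_ratio(source, target):
    """True when two (width, height) pairs share exactly the same aspect ratio."""
    return Fraction(source[0], source[1]) == Fraction(target[0], target[1])


def suggested_min_strength(source, target):
    """Return 0.8 when the aspect ratio changes, else None (no specific floor)."""
    return None if same_aspect_ratio(source, target) else 0.8
```

Using `Fraction` keeps the comparison exact, so 512 × 512 → 1024 × 1024 counts as uniform scaling while 1024 × 1024 → 1536 × 768 is flagged as a ratio change.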
Tips
- Match your model's trained resolution. Generate at 512 for SD 1.5 and at 1024 for SDXL, FLUX, and Z-Image. Other sizes within the supported range work but cost quality.
- Generate small, upscale later. For high-resolution output, generating at native resolution and then upscaling is more efficient and gives higher quality than generating at the large size directly.
- Choose aspect ratio before generating. Cropping a square image into landscape loses content. Generating in the target ratio from the start lets the model compose for that shape.
- Use multiples of 64 for safe values. While the minimum multiple varies by architecture (8 or 16), multiples of 64 (512, 768, 1024, 1280) satisfy the step requirement of every architecture, as long as the value also falls within the model's supported range.
- Consider aspect ratio for the subject. Portraits work better in vertical frames, landscapes in horizontal. Let the content dictate the ratio rather than defaulting to square.
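The "generate small, upscale later" tip can be turned into a small planning step: given the final size you want, derive a generation size near the trained area and the factor needed to reach the target. This is a rough sketch under stated assumptions: `plan_generation` is a hypothetical helper, the step of 64 follows the tip above, and the returned factor is a raw ratio that an upscaler accepting only fixed factors would need rounded up.

```python
import math


def plan_generation(target_width, target_height, trained_side=1024, multiple=64):
    """Return (width, height, upscale_factor) for a desired final output size."""
    # Scale so the generated area matches trained_side squared, keeping the ratio.
    scale = trained_side / math.sqrt(target_width * target_height)
    if scale >= 1:
        # Target is already at or below the trained area: generate it directly.
        return target_width, target_height, 1.0
    width = max(multiple, round(target_width * scale / multiple) * multiple)
    height = max(multiple, round(target_height * scale / multiple) * multiple)
    # Factor needed so the upscaled image covers the target on both sides.
    factor = max(target_width / width, target_height / height)
    return width, height, factor
```

For a 2048 × 2048 target with an SDXL-class model this yields a 1024 × 1024 generation and a factor of 2.0. For SD 1.5, pass `trained_side=512`. Because each side is snapped to a multiple, the upscaled image may need a slight crop to match the target exactly.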