Ideogram 4.0
Ideogram 4.0 is Ideogram's most capable text-to-image model for design-heavy image generation. It is built for frontier text rendering across languages, structured prompt control through natural language or JSON, bounding-box layout control, transparent background generation, and high-fidelity 2K output. It is well suited to posters, branded graphics, packaging, product visuals, typography-led compositions, and other workflows where design precision matters as much as visual quality.
How to use Ideogram 4.0 for typography-heavy designs: rendering long and dense text, multilingual and handwritten scripts, descriptive and bbox-anchored layout, image-level and per-element color palettes, transparent backgrounds, aspect-ratio presets, and the three rendering-speed tiers.
Introduction
Text inside images is the historical weak spot of generative models. Most image models treat letters as visual texture rather than content. They recognise that text looks a certain way without understanding spelling or character order. Headlines come back garbled and brand names get warped, so any design that depends on legible copy ends up unusable.
Ideogram 4.0 is built for the opposite problem. It treats text as a first-class element with explicit content rather than visual decoration, which makes it reliable for designs where the words have to be exactly right. It handles dense small copy, multilingual scripts, handwritten lettering, and text that's been rotated or inverted as a deliberate design choice.
{ "high_level_description": "A contemporary modernist art exhibition poster for a fictional gallery, designed in Swiss-style typography with a single large hand-printed circular element.", "style_description": { "aesthetics": "Swiss-school exhibition poster design, total typographic discipline, single bold colour gesture.", "lighting": "Flat front-on poster lighting, no rendered shadow, cream paper texture visible.", "medium": "Hand-pulled silkscreen print on cream paper.", "art_style": "Modernist graphic design with Helvetica Neue Bold typography and a single hand-printed colour element.", "color_palette": ["#F2E9D6", "#C8442A", "#161616"] }, "compositional_deconstruction": { "background": "Clean cream off-white background filling the frame with a subtle paper texture.", "elements": [ { "type": "obj", "desc": "A large hand-printed circle in muted vermilion red occupying the right two-thirds of the poster, with a slightly irregular silkscreen edge texture." }, { "type": "text", "text": "FIELDS\nOF\nFOLD", "desc": "Exhibition title in massive black Helvetica Neue Bold, broken across three lines stacked left-aligned along the upper-left of the poster, tightly leaded." }, { "type": "text", "text": "PAINTINGS BY MIRA TANJA OKONKWO", "desc": "Subtitle in smaller spaced black sans-serif capitals directly below the title block." }, { "type": "text", "text": "OPENING SEPT 14 · RUNNING THROUGH NOV 2", "desc": "Date line in small black caps along the lower-left of the poster." }, { "type": "text", "text": "ARCANUM CONTEMPORARY · LISBON", "desc": "Gallery name and location in small black caps at the very bottom of the poster." } ] } }
This guide covers the text-rendering capabilities, then the design output features that depend on them: descriptive and bbox-anchored layout, image-level and per-element colour palettes, transparent-background output, the aspect-ratio presets, and the three rendering-speed tiers.
Text rendering
The four sub-sections below each isolate one capability. The same text element type drives all of them. What changes is what the model is being asked to render and how the surrounding desc describes it.
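Every capability below is driven by the same element list, so a quick structural lint before sending a prompt can catch slips such as a text element with no copy. Here is a minimal Python sketch. It assumes only the keys that appear in this guide's examples (type, text, desc, bbox, color_palette); the real request schema may accept more. lint_prompt is an illustrative name, not part of any Ideogram SDK.

```python
# Keys and element types seen in this guide's examples; the real schema may allow more.
KNOWN_ELEMENT_KEYS = {"type", "text", "desc", "bbox", "color_palette"}
KNOWN_TYPES = {"text", "obj"}


def lint_prompt(prompt: dict) -> list[str]:
    """Flag structural slips in a structured prompt, checking only fields used in this guide."""
    issues = []
    elements = prompt.get("compositional_deconstruction", {}).get("elements", [])
    for i, element in enumerate(elements):
        kind = element.get("type")
        if kind not in KNOWN_TYPES:
            issues.append(f"element {i}: type {kind!r} is not 'text' or 'obj'")
        if kind == "text" and not element.get("text"):
            # The text field is the literal copy; without it there is nothing to render.
            issues.append(f"element {i}: text element has no 'text' copy")
        if not element.get("desc"):
            issues.append(f"element {i}: missing 'desc' (position, weight, treatment)")
        unknown = sorted(set(element) - KNOWN_ELEMENT_KEYS)
        if unknown:
            issues.append(f"element {i}: keys not used in this guide: {unknown}")
    return issues
```

Run it on a prompt loaded with json.loads. An empty list means nothing in the element list looks wrong by these rules. Line breaks inside a text field go in as \n, as in the FIELDS\nOF\nFOLD title above.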
Dense small text
Most image models can render a headline. Almost none can render a paragraph. As soon as the copy gets long, the text degrades into shapes that look like writing but aren't actually words. Ideogram 4.0 handles paragraph-length copy and small annotations at sizes close to the limit of legibility, even when the layout is a wall of labels at different positions.
The star chart below carries more than fifteen distinct labels: five constellation names inside the disc, eleven small star annotations packed around its rim, the plate caption, and the publisher mark.
{ "high_level_description": "A vintage hand-engraved astronomical star chart from a 19th-century almanac, packed with constellation names and labelled stars. A plain working chart with no decorative figures, no allegorical illustrations, no cherubs, no cartouches.", "style_description": { "aesthetics": "Sober scientific cartography, almanac-page solemnity, plain working-chart restraint with no ornamental engravings beyond the chart itself.", "lighting": "Flat, even, no scene lighting effects.", "medium": "Pen-and-ink engraving printed on faded cream parchment paper.", "art_style": "19th-century hand-engraved astronomical plate with fine crosshatched linework and italic serif annotations. Strictly the chart and its labels and nothing else.", "color_palette": ["#EFE3C1", "#0A1426", "#6F5630", "#FFFFFF"] }, "compositional_deconstruction": { "background": "Faded cream parchment paper filling the entire frame with subtle aging. The page is plain everywhere except for the central chart. No decorative figures, no cartouches with classical figures, no engraved vignettes in the corners.", "elements": [ { "type": "obj", "desc": "A large perfectly circular sky chart in deep midnight blue filling the central two-thirds of the page, with hundreds of small white star points of varying brightness and thin grey constellation lines connecting the major stars across Lyra, Cygnus, Aquila, Hercules, and Draco." }, { "type": "text", "text": "LYRA", "desc": "Constellation name in medium-sized engraved serif italic capitals, clearly readable, slightly arched, positioned inside the dark blue disc near the upper-left." }, { "type": "text", "text": "CYGNUS", "desc": "Constellation name in medium-sized engraved serif italic capitals, clearly readable, positioned inside the disc near the lower-right." }, { "type": "text", "text": "AQUILA", "desc": "Constellation name in medium-sized engraved serif italic capitals, positioned inside the disc near the lower-centre." 
}, { "type": "text", "text": "HERCULES", "desc": "Constellation name in medium-sized engraved serif italic capitals, positioned inside the disc near the upper-right." }, { "type": "text", "text": "DRACO", "desc": "Constellation name in medium-sized engraved serif italic capitals, positioned inside the disc near the upper edge." }, { "type": "text", "text": "Vega · α Lyrae · Mag 0.03", "desc": "Small but clearly legible annotation in sepia italic serif on the cream parchment just outside the disc, with a leader line pointing to Vega." }, { "type": "text", "text": "Deneb · α Cygni · Mag 1.25", "desc": "Small but clearly legible annotation in sepia italic serif outside the disc, with a leader line pointing to Deneb." }, { "type": "text", "text": "Altair · α Aquilae · Mag 0.77", "desc": "Small but clearly legible annotation in sepia italic serif outside the disc, with a leader line pointing to Altair." }, { "type": "text", "text": "Albireo · β Cygni · Mag 3.18", "desc": "Small but clearly legible annotation in sepia italic serif outside the disc, with a leader line into lower Cygnus." }, { "type": "text", "text": "Sulafat · γ Lyrae · Mag 3.24", "desc": "Small but clearly legible annotation in sepia italic serif outside the disc, with a leader line into lower Lyra." }, { "type": "text", "text": "Sadr · γ Cygni · Mag 2.23", "desc": "Small but clearly legible annotation in sepia italic serif outside the disc, with a leader line into central Cygnus." }, { "type": "text", "text": "Tarazed · γ Aquilae · Mag 2.72", "desc": "Small but clearly legible annotation in sepia italic serif outside the disc, with a leader line near Altair." }, { "type": "text", "text": "Sheliak · β Lyrae · Mag 3.52", "desc": "Small but clearly legible annotation in sepia italic serif outside the disc, with a leader line into lower Lyra." 
}, { "type": "text", "text": "Rastaban · β Draconis · Mag 2.79", "desc": "Small but clearly legible annotation in sepia italic serif outside the disc, with a leader line into Draco." }, { "type": "text", "text": "Eltanin · γ Draconis · Mag 2.23", "desc": "Small but clearly legible annotation in sepia italic serif outside the disc, with a leader line into Draco." }, { "type": "text", "text": "Kornephoros · β Herculis · Mag 2.78", "desc": "Small but clearly legible annotation in sepia italic serif outside the disc, with a leader line into Hercules." }, { "type": "text", "text": "PLATE XVII — NORTHERN SUMMER SKY", "desc": "Large clearly legible caption beneath the circular chart in engraved serif capitals, centred, with a thin double-rule above and below." }, { "type": "text", "text": "Royal Almanack · MDCCCXC", "desc": "Publisher mark in clearly legible italic serif along the bottom edge of the page, centred." } ] } }
The five constellation names hold their italic engraved character inside the disc. The eleven star annotations packed around the rim stay legible at engraving-plate sizes, even with leader lines threading between them. The plate number and publisher mark at the bottom edge sit cleanly in their assigned positions.
Multilingual scripts
Text rendering in image models has historically meant Latin script. Anything else comes back as shapes that look approximately like the script in question but don't spell anything. Ideogram 4.0 handles the major non-Latin scripts with the same precision as Latin, which matters for international signage, multilingual packaging, and any brand work that has to cross markets.
{ "high_level_description": "A clean modern airport wayfinding sign on a brushed-aluminum panel, presenting baggage claim directions in four languages.", "style_description": { "aesthetics": "Contemporary airport wayfinding clarity, universal pictogram discipline, neutral institutional voice.", "lighting": "Soft cool overhead ambient terminal lighting, subtle reflection on the brushed aluminum.", "photo": "Documentary photograph on a 35mm lens at a slight angle, sharp focus across the sign.", "medium": "Photograph.", "color_palette": ["#C8CACC", "#FFFFFF", "#2A2D31"] }, "compositional_deconstruction": { "background": "Brushed aluminum sign panel with subtle vertical grain, mounted on a clean light wall.", "elements": [ { "type": "obj", "desc": "Standardized white pictogram of a suitcase on the left side of the sign, with a downward-pointing arrow in white directly above it." }, { "type": "text", "text": "Baggage Claim", "desc": "English label in bold dark grey Frutiger-style sans-serif, upper-right of the sign." }, { "type": "text", "text": "手荷物受取所", "desc": "Japanese label directly below the English line, same weight and size." }, { "type": "text", "text": "استلام الأمتعة", "desc": "Arabic label directly below the Japanese line, set right-to-left." }, { "type": "text", "text": "Recogida de equipajes", "desc": "Spanish label directly below the Arabic line, same weight as the others." } ] } }
Each line of the sign is a separate text element with its own content. The same baggage-claim instruction written four ways at equal weight is a fairly extreme test, and each script's character carries through to the final render.
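Because each line of the sign is its own text element, a multilingual lockup can be generated from a translation table. Here is a minimal Python sketch. It assumes the element shape used in the prompts above (type, text, desc); sign_elements and the labels dictionary are illustrative, not part of any Ideogram SDK. Serializing with ensure_ascii=False keeps the Japanese and Arabic characters readable in the payload. The escaped form is equally valid JSON.

```python
import json


def sign_elements(labels: dict[str, str]) -> list[dict]:
    """Hypothetical helper: one 'text' element per language, each stacked below the previous one."""
    elements = []
    previous = None
    for language, text in labels.items():
        # Position each line relative to the one before it, as the sign prompt does.
        placement = "upper-right of the sign" if previous is None else f"directly below the {previous} line"
        elements.append({
            "type": "text",
            "text": text,  # literal copy, rendered exactly as written
            "desc": f"{language} label, {placement}, same weight and size as the others.",
        })
        previous = language
    return elements


labels = {
    "English": "Baggage Claim",
    "Japanese": "手荷物受取所",
    "Arabic": "استلام الأمتعة",
    "Spanish": "Recogida de equipajes",
}
payload = json.dumps(
    {"compositional_deconstruction": {"elements": sign_elements(labels)}},
    ensure_ascii=False,
    indent=2,
)
print(payload)
```

A per-script note such as "set right-to-left" for Arabic would go into that element's desc, as in the sign prompt above.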
Handwritten and stylized lettering
Handwriting is harder than print because every glyph is an organic shape rather than a typographic instance. Most image models render "handwriting" as a fake script font with no actual hand quality. Ideogram 4.0 treats handwriting as an aesthetic to render, not a typeface to substitute, and renders dip-pen cursive, calligraphy, and sketched lettering as if a real hand made them.
The recipe page below uses handwriting where it would actually appear in 1924: a title, two section headings, an ingredient list, a method paragraph, and a signature, all in a single hand.
{ "high_level_description": "An open page from a 1920s leather-bound family cookbook, with a handwritten recipe in dip-pen cursive on aged cream paper, a small ink sketch of the finished dish in the upper-right corner, and the cook's signature at the bottom.", "style_description": { "aesthetics": "Warm family-album sensibility, gentle kitchen wear, dip-pen confidence, intimate domestic record.", "lighting": "Soft warm top-down light, very gentle shadow along the inner gutter.", "medium": "Photograph.", "photo": "Top-down archival photograph on a 50mm lens, flat to the page, sharp focus across the sheet.", "color_palette": ["#EFE2C3", "#5A3A1A", "#2F2A1A", "#9B7A45"] }, "compositional_deconstruction": { "background": "Aged cream cookbook page filling the frame, faint horizontal ruling barely visible, a small faded brown tea-stain ring near the lower-left, very gentle paper foxing throughout.", "elements": [ { "type": "text", "text": "Aunt Beatrice's Lemon Sponge", "desc": "Recipe title handwritten across the top of the page in large flowing dip-pen cursive with a confident downstroke on the capital A, dark sepia ink, slightly slanted right." }, { "type": "obj", "desc": "A small dip-pen ink sketch of a round frosted sponge cake on a doily, in the upper-right corner of the page, drawn in the same sepia ink as the writing." }, { "type": "text", "text": "Ingredients —", "desc": "Section heading in medium cursive, underlined with a single neat pen stroke, positioned just below the title on the left side." }, { "type": "text", "text": "4 eggs, 6 oz caster sugar, 6 oz self-raising flour, rind of two lemons, a knob of butter for the tin.", "desc": "Ingredient list handwritten in flowing cursive as one running line broken across two lines, sepia ink, beneath the Ingredients heading." }, { "type": "text", "text": "Method —", "desc": "Second section heading in matching medium cursive, also underlined, positioned below the ingredient list." 
}, { "type": "text", "text": "Whisk eggs and sugar until pale and thick. Fold in flour and rind gently. Bake at a moderate oven for twenty-five minutes until springing back to the touch.", "desc": "Method paragraph handwritten in matching cursive across three lines, sepia ink, beneath the Method heading." }, { "type": "text", "text": "— Beatrice Holloway, Easter 1924", "desc": "Signature in slightly larger flowing cursive at the lower-right of the page, finished with a long elegant pen flourish underneath." } ] } }
The slight forward slant and the variation in stroke weight read as a real hand. The flourish under Beatrice's signature is a small piece of personality the model added because the structure asked for it.
Inverted and rotated text
Text doesn't always sit horizontally on the canvas. Coins, seals, stamps, and curved type lockups all require letters that follow a curve, and a good portion of design work asks for text rotated or inverted as a deliberate choice. Most image models handle this badly, producing horizontal text visually distorted to fit a shape, rather than letters that actually fit the curve.
The wax seal below has two arcs of text following its rim. The top arc reads upright. The bottom arc is rotated to follow the curve, the way real engraved seals are designed.
{ "high_level_description": "A close-up macro photograph of an antique wax seal on parchment. THE DEFINING FEATURE OF THIS SEAL IS ITS INVERTED BOTTOM TEXT: the motto along the top rim reads upright, but the institutional name along the bottom rim is engraved upside-down, just like the reverse of a coin or a classical Vatican Republic seal.", "style_description": { "aesthetics": "Notarial gravitas, antique republican formality. The seal follows the classical 'circumscription' convention where each rim letter's top faces outward, which inverts the bottom arc relative to the top.", "lighting": "Soft warm directional light from upper right, raking across the embossed relief.", "photo": "Macro product photograph on a 100mm lens at f/4, sharp on the embossed text along both rim arcs.", "medium": "Photograph.", "color_palette": ["#E5D5B8", "#7A1A1A", "#A02828", "#241914"] }, "compositional_deconstruction": { "background": "Aged cream parchment paper filling the frame with subtle fiber texture.", "elements": [ { "type": "obj", "desc": "Circular crimson wax seal centered in the frame, irregular edge from the natural spread of melted wax, raised relief from pressed embossing." }, { "type": "text", "text": "VINCULUM ET LIBERTAS", "desc": "Latin motto in serif capitals arcing along the TOP HALF of the seal's rim. Letters oriented UPRIGHT, reading correctly when the seal lies flat. Each letter's top points outward toward the top edge." }, { "type": "text", "text": "RESPVBLICA AQVITANIAE", "desc": "Republic name arcing along the BOTTOM HALF of the seal's rim. CRITICAL: each letter's top points OUTWARD toward the bottom edge, which means each letter is PHYSICALLY UPSIDE-DOWN as viewed face-on. The viewer must rotate the seal 180 degrees to read this motto. DO NOT render this bottom arc upright like the top arc." }, { "type": "obj", "desc": "Small heraldic emblem of a stylized oak branch and key crossed over a central shield, embossed in the center of the seal." 
}, { "type": "text", "text": "MMXII", "desc": "Roman numeral year in small serif capitals directly beneath the central emblem, oriented UPRIGHT." } ] } }
The motto along the top arcs upright. The institutional name along the bottom arcs with each glyph rotated to follow the curve, so the text reads correctly to someone tilting the seal. This is the typographic discipline an industrial-design pipeline expects, and what most generators flatten into horizontal type that's been bent into a shape.
Layout control
Layout in a structured prompt has two levers: descriptive positioning in each element's desc field, and explicit bbox coordinates that pin an element to a region of the canvas.
Descriptive positioning is the lighter touch. The model reads phrases like "centered along the top", "in the lower-right corner", or "directly beneath the title block" and places elements accordingly. It works well when the layout has clear hierarchy and the model has enough room to make small decisions.
bbox is the heavier touch. It's an array of four integers, [y_min, x_min, y_max, x_max], in 0–1000 normalised coordinates with the origin at the top-left. The model honours the box through its shared positional embedding, so the element lands inside the named region rather than approximately near it.
The bbox order is row-first (y, x rather than x, y). Designers normally think in (x, y). Build the bbox as [top, left, bottom, right] to keep the order straight. Values must be integers, in [0, 1000], with y_min ≤ y_max and x_min ≤ x_max.
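Designers usually measure a box in pixels as (x, y, width, height), so the easiest way to avoid the row-first trap is to convert in code. Here is a minimal Python sketch of that conversion and of the validity rules above (integers, [0, 1000], y_min ≤ y_max, x_min ≤ x_max). PixelRect, to_bbox, and validate_bbox are illustrative names. Rounding to the nearest integer is this sketch's choice, not a documented requirement.

```python
from typing import NamedTuple


class PixelRect(NamedTuple):
    """A rectangle as designers sketch it: top-left corner (x, y) plus size, in pixels."""
    x: int
    y: int
    width: int
    height: int


def to_bbox(rect: PixelRect, canvas_w: int, canvas_h: int) -> list[int]:
    """Convert a pixel rectangle to [y_min, x_min, y_max, x_max] in 0-1000 space (row-first)."""
    def norm(value: int, extent: int) -> int:
        # Scale to 0-1000, round to the nearest integer, and clamp to the valid range.
        return max(0, min(1000, round(value * 1000 / extent)))

    top = norm(rect.y, canvas_h)
    left = norm(rect.x, canvas_w)
    bottom = norm(rect.y + rect.height, canvas_h)
    right = norm(rect.x + rect.width, canvas_w)
    return [top, left, bottom, right]


def validate_bbox(bbox: list[int]) -> None:
    """Raise ValueError if bbox breaks the rules: four integers in [0, 1000], min <= max."""
    if len(bbox) != 4 or not all(isinstance(v, int) and not isinstance(v, bool) for v in bbox):
        raise ValueError("bbox must be four integers")
    y_min, x_min, y_max, x_max = bbox
    if not all(0 <= v <= 1000 for v in bbox):
        raise ValueError("bbox values must lie in [0, 1000]")
    if y_min > y_max or x_min > x_max:
        raise ValueError("need y_min <= y_max and x_min <= x_max")
```

For example, a headline box 512 × 160 pixels with its top-left corner at (256, 80) on a 2048 × 2048 canvas converts to [39, 125, 117, 375].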
The concert ticket below is generated with explicit bbox coordinates on every element. The ASCII diagram on the left is a separate, illustrative pass through Nano Banana to sketch roughly where each element lands. The photograph on the right is what Ideogram rendered from the actual bbox coordinates.
{ "compositional_deconstruction": { "elements": [ { "type": "text", "bbox": [40, 220, 110, 480], "text": "ADMIT ONE", "desc": "Bold black sans-serif label in centered capitals." }, { "type": "text", "bbox": [140, 60, 240, 660], "text": "THE FILLMORE WEST", "desc": "Venue name in large condensed black serif capitals." }, { "type": "text", "bbox": [270, 220, 320, 480], "text": "AN EVENING WITH", "desc": "Small italic header in dark sepia ink." }, { "type": "text", "bbox": [360, 40, 540, 680], "text": "JONAS HARWELL TRIO", "desc": "Performer name in large black slab-serif capitals." }, { "type": "text", "bbox": [580, 100, 640, 600], "text": "FRIDAY · APRIL 12 · 8:00 PM", "desc": "Date and time line in medium-weight black sans-serif." }, { "type": "obj", "bbox": [0, 700, 1000, 720], "desc": "Thin dotted vertical perforation line running top to bottom." }, { "type": "text", "bbox": [40, 770, 100, 950], "text": "ADMIT ONE", "desc": "Small bold black sans-serif label centred at the top of the right stub, duplicating the main-area header so the stub still reads after the ticket is torn." }, { "type": "text", "bbox": [220, 750, 300, 970], "text": "NO. 0274", "desc": "Ticket number in small ink-blot black serif, centred on the stub." }, { "type": "text", "bbox": [400, 750, 480, 970], "text": "ROW G · SEAT 14", "desc": "Seat assignment in small condensed sans-serif, centred on the stub." }, { "type": "text", "bbox": [700, 770, 790, 940], "text": "$4.50", "desc": "Ticket price in small bold sans-serif, centred on the lower portion of the stub." } ] } }
ADMIT ONE sits at the top because its bbox is [40, 220, 110, 480]: top edge near y=40, centred horizontally within the main ticket area (x=0 to 700) between x=220 and x=480. The venue name fills the upper title block from [140, 60, 240, 660]. The performer name dominates the middle band via [360, 40, 540, 680]. The perforation obj is a tall narrow rectangle running full-height at [0, 700, 1000, 720]. Each right-stub element carves out its own small box in the zone from x=750 rightwards. A duplicate ADMIT ONE sits at the top of the stub, followed by the ticket number, the seat assignment, and the price in the lower portion, so the stub reads as four distinct rows. The elements land inside the named rectangles rather than approximately near them, and the perforation cleanly separates the stub from the main ticket area without the model having to negotiate where it should sit.
You can mix the two approaches. Pin the elements whose position is non-negotiable with bbox, and let the rest fall through descriptive positioning. Inside a single element, both fields can coexist: bbox declares the rectangle, and desc still carries the style and treatment notes.
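When several elements are pinned with bbox, a quick sanity check before sending the prompt is to confirm that no two boxes collide and that each element sits on the intended side of a divider. Here is a minimal Python sketch that uses the ticket's coordinates verbatim. The helper names are illustrative. Boxes that only touch at an edge are not counted as overlapping.

```python
# bboxes copied from the ticket prompt above, row-first: [y_min, x_min, y_max, x_max]
TICKET = {
    "ADMIT ONE (main)": [40, 220, 110, 480],
    "THE FILLMORE WEST": [140, 60, 240, 660],
    "AN EVENING WITH": [270, 220, 320, 480],
    "JONAS HARWELL TRIO": [360, 40, 540, 680],
    "FRIDAY · APRIL 12 · 8:00 PM": [580, 100, 640, 600],
    "perforation": [0, 700, 1000, 720],
    "ADMIT ONE (stub)": [40, 770, 100, 950],
    "NO. 0274": [220, 750, 300, 970],
    "ROW G · SEAT 14": [400, 750, 480, 970],
    "$4.50": [700, 770, 790, 940],
}


def overlaps(a: list[int], b: list[int]) -> bool:
    """True if two row-first boxes share interior area (touching edges do not count)."""
    ay0, ax0, ay1, ax1 = a
    by0, bx0, by1, bx1 = b
    return ay0 < by1 and by0 < ay1 and ax0 < bx1 and bx0 < ax1


names = list(TICKET)
collisions = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
              if overlaps(TICKET[a], TICKET[b])]

# Sort every non-divider element into the main area or the stub, relative to the perforation.
perf_left, perf_right = TICKET["perforation"][1], TICKET["perforation"][3]
stub = [n for n in names if TICKET[n][1] > perf_right]
main = [n for n in names if n != "perforation" and TICKET[n][3] < perf_left]
print("collisions:", collisions)
print("stub:", stub)
```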
Colour palette control
Colour conditioning in the structured prompt is explicit. Instead of describing colours in language ("warm sunset tones with terracotta and cream"), you list hex values the model treats as the colours to favour in the composition.
There are two places color_palette can appear in the JSON:
- Inside style_description, at the image level. Up to 16 colours. The global palette for the entire image.
- Inside an individual element, at the per-element level. Up to 5 colours per element. Targeted conditioning for one specific object or piece of text.
Both fields take an array of uppercase #RRGGBB hex strings. Shorthand #RGB and #RRGGBBAA formats are rejected by the verifier.
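Because the verifier rejects anything other than uppercase #RRGGBB, normalising and checking palettes before sending a request can save a round trip. Here is a minimal Python sketch of the rules stated above: up to 16 colours at the image level, up to 5 per element, uppercase #RRGGBB only. The function names are illustrative. Expanding #RGB shorthand is a convenience of this sketch, not something the API does for you.

```python
import re

HEX = re.compile(r"#[0-9A-F]{6}")  # uppercase #RRGGBB only
IMAGE_LEVEL_MAX = 16               # style_description.color_palette
ELEMENT_LEVEL_MAX = 5              # color_palette inside a single element


def normalise(color: str) -> str:
    """Uppercase the hex digits and expand #RGB shorthand to #RRGGBB (8-digit values stay invalid)."""
    body = color.lstrip("#")
    if len(body) == 3:
        body = "".join(ch * 2 for ch in body)
    return "#" + body.upper()


def check_palette(colors: list[str], limit: int) -> list[str]:
    """Return a list of problems; an empty list means the palette follows the stated rules."""
    problems = []
    if len(colors) > limit:
        problems.append(f"{len(colors)} colours exceeds the limit of {limit}")
    for c in colors:
        if not HEX.fullmatch(c):
            problems.append(f"{c!r} is not an uppercase #RRGGBB string")
    return problems
```

For example, check_palette([normalise(c) for c in ["#fa0", "#c44536"]], ELEMENT_LEVEL_MAX) passes, while a raw "#c44536" or "#C44536FF" is flagged.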
The two posters below come from the same structured prompt, distinguished only by a different style_description.color_palette and a small change to the background description (a sunset sky versus a cool pre-dawn sky).
{ "style_description": { "color_palette": ["#C44536", "#F2B68C", "#2E1F1B", "#F4E4C1"] }, "compositional_deconstruction": { "background": "Stylized craggy mountain ridge under a sunset sky, geometric tree-line band crossing the lower third." } }
{ "style_description": { "color_palette": ["#1A3A4A", "#6FA8B5", "#0E1F2A", "#E8EEEF"] }, "compositional_deconstruction": { "background": "Stylized craggy mountain ridge at pre-dawn under a cool blue sky, geometric tree-line band crossing the lower third." } }
Both posters carry the same scene description: a mountain ridge under a wide sky, a lone pine, a sun on the horizon, the title block above, the imprint along the bottom. The font weights, the exact silhouette of the pine, the mountain peaks, and the foreground will vary between any two runs of an image model. What does not vary is the mood, and the mood is what the palette controls. The model treats the array as a target conditioning signal, not as a hint to interpret in language.
Per-element color_palette gives one element its own conditioning channel. A text element with its own palette can hold a brand colour that the rest of the scene doesn't have. An obj element with its own palette can carry a product colour without bleeding into the background. Up to 5 colours per element.
Transparent backgrounds
Design pipelines rarely use generated images as the final composition. They need elements that drop into a layered file: logos, icons, monograms, ornaments, badges. Ideogram 4.0 can produce these with a transparent background when you ask for one explicitly in the prompt's background field and request outputFormat: "PNG" so the alpha channel survives.
{ "high_level_description": "An elegant typographic monogram emblem featuring interlocking serif letters M and T in a vintage engraved style, on a fully transparent background.", "style_description": { "aesthetics": "Vintage engraved monogram restraint, jewellery-stamp gravitas, no rendered scene.", "lighting": "Flat with no rendered light or shadow.", "medium": "Engraved emblem with gold-foil accents.", "art_style": "Vintage wood-engraved monogram with thin line work and ornamental flourishes.", "color_palette": ["#172A4A", "#C49B3B"] }, "compositional_deconstruction": { "background": "Pure transparent background with no scene elements. No paper texture, no shadow, no gradient.", "elements": [ { "type": "obj", "desc": "Interlocking serif capital letters M and T in deep navy blue with subtle gold-leaf detail on the serifs, occupying most of the frame." }, { "type": "obj", "desc": "Thin gold-leaf decorative double-rule frame in an elongated oval shape surrounding the monogram, with small ornamental flourishes at top and bottom." } ] } }
The monogram above is delivered as a standalone asset with no scene behind it. Drop it into a card design or a letterhead template without having to mask anything by hand.
Transparency only survives in formats that carry an alpha channel. Always set outputFormat: "PNG" when you want the background to come through transparent.
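A JPEG silently loses transparency, so it is worth checking the returned file before it enters a layered template. Here is a minimal stdlib-only Python sketch that reads the PNG header. Colour types 4 and 6 store an alpha channel, and a tRNS chunk adds transparency to the other colour types. The check confirms the file can carry transparency, not that the background actually came out transparent. png_has_alpha is an illustrative name.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_alpha(data: bytes) -> bool:
    """True if the PNG can carry transparency: an alpha colour type (4 or 6) or a tRNS chunk."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC (not checked here).
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"IHDR" and body[9] in (4, 6):
            return True  # byte 9 of IHDR is the colour type: 4 = greyscale + alpha, 6 = RGBA
        if chunk_type == b"tRNS":
            return True  # transparency for greyscale, RGB, or palette images
        if chunk_type == b"IEND":
            break
        pos += 12 + length  # skip length, type, data, and CRC
    return False
```

Usage would look like png_has_alpha(open("monogram.png", "rb").read()), where the file name is hypothetical.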
Aspect ratios
Every output targets roughly 4 million pixels (between about 3.7 and 4.2 million across the presets below). At 1:1 that's 2048 × 2048. Wider or taller ratios spread a similar pixel budget across a different shape rather than adding pixels, and the API accepts only the predefined presets. The common ones for design work:
| Ratio | Dimensions | Typical use |
|---|---|---|
| 1:1 | 2048 × 2048 | Social squares, album covers, packshots |
| 16:9 | 2560 × 1440 | Landscape banners, video stills |
| 9:16 | 1440 × 2560 | Vertical video, story posts |
| 3:2 | 2496 × 1664 | 35mm photo proportions, posters |
| 2:3 | 1664 × 2496 | Portrait posters, book covers |
| 4:5 | 1792 × 2240 | Editorial portrait, Instagram portrait |
| 5:4 | 2240 × 1792 | Specimen cards, museum labels |
| 8:5 | 2560 × 1600 | Widescreen banners |
There are 23 presets in total, including extreme aspect ratios like 22:9, 23:9, 8:3, 12:5, and the very long 3:1 and 1:3 for ultra-wide banners or pillar formats. The full list lives in the model's request schema. Sending a width and height outside the presets is rejected by the API. Pick the closest preset to your target output.
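Once the presets are in a table, picking the nearest one is a single comparison. Here is a minimal Python sketch. It covers only the eight presets listed above; the other 15 are in the request schema and would need to be added. It compares ratios on a log scale so that a ratio and its reciprocal are treated symmetrically. The function returns the ratio label from the table. How the request actually expects the preset to be expressed is not stated here, so check the schema.

```python
import math

# Only the eight presets listed in the table above; the full set of 23 is in the request schema.
PRESETS = {
    "1:1": (2048, 2048),
    "16:9": (2560, 1440),
    "9:16": (1440, 2560),
    "3:2": (2496, 1664),
    "2:3": (1664, 2496),
    "4:5": (1792, 2240),
    "5:4": (2240, 1792),
    "8:5": (2560, 1600),
}


def closest_preset(target_w: int, target_h: int) -> str:
    """Return the preset label whose ratio is nearest the target's, measured on a log scale
    so that a ratio r and its reciprocal 1/r are equally far from square."""
    target = math.log(target_w / target_h)
    return min(PRESETS, key=lambda name: abs(math.log(PRESETS[name][0] / PRESETS[name][1]) - target))


for label, (w, h) in PRESETS.items():
    # Pixel budget per preset, to show how close each one stays to ~4 MP.
    print(f"{label:>5}  {w} x {h}  {w * h / 1e6:.2f} MP")
```

For a 1080 × 1350 Instagram portrait this returns "4:5", which generates at 1792 × 2240 and scales down without cropping.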
Quality tiers
Three rendering-speed tiers determine how much compute the model spends per generation:
- TURBO is the fastest tier. Use it for the first iterations of an idea, low-stakes content, and anything where the next pass matters more than this one.
- DEFAULT is the middle tier and the right choice for most production work.
- QUALITY is the slowest tier. Use it for final delivery, typography-dense compositions, and hero assets.
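The tier is a single enumerated value. This guide doesn't give the request field that carries it, so the sketch below only models the three values and encodes the rule of thumb above. The Tier enum and pick_tier helper are hypothetical. Letting an exploratory pass override dense typography is this sketch's own choice: iterate fast, then rerun at QUALITY for delivery.

```python
from enum import Enum


class Tier(str, Enum):
    """The three rendering-speed tiers named in this guide."""
    TURBO = "TURBO"      # fastest: first iterations, low-stakes content
    DEFAULT = "DEFAULT"  # middle: most production work
    QUALITY = "QUALITY"  # slowest: final delivery, dense typography, hero assets


def pick_tier(*, exploratory: bool, final_delivery: bool, dense_text: bool) -> Tier:
    """Hypothetical helper encoding the guide's advice; the inputs are illustrative, not an API."""
    if exploratory:
        return Tier.TURBO    # the next pass matters more than this one
    if final_delivery or dense_text:
        return Tier.QUALITY  # extra compute pays off on small text and hero assets
    return Tier.DEFAULT
```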
The same prompt at each tier produces visibly different output. The differences are most pronounced in fine text and material rendering, the kind of work where the extra compute earns its price.
The vintage watchmaker's storefront window below comes from the same structured prompt at each tier, a photoreal scene with hand-painted gold-leaf lettering on dark green glass.
TURBO is fast enough for rapid iteration. The proprietor line, the establishment date, and the brush-stroke detail on the gold leaf tend to drift at this tier. DEFAULT recovers most of the small-text crispness and the reflected street behind the glass. QUALITY holds the condensed capitals sharply through to the corner lettering and renders the gold-leaf grain and the glass reflections with finer detail.
Tips
- Reach for Ideogram when the words have to be right. Other text-to-image models have caught up on subject rendering and style. Copy fidelity is still the differentiator. Brand work, packaging, posters, anywhere the words are the brief.
- Quote the literal copy inside text fields. The text field is exactly what gets rendered. Apostrophes, accents, and special characters are all reproduced as written. If you want a curly apostrophe in "Dr. Faukland's", write a curly apostrophe.
- Describe the position, weight, and treatment in desc. "Bold black serif capitals across the top", "small italic underneath", "lower-right corner". The desc is where you spend the typographic discipline a sentence prompt can't carry.
- Use bbox when descriptive positioning isn't tight enough. [y_min, x_min, y_max, x_max] in 0-1000 normalised coordinates, row-first ordering, written as [top, left, bottom, right]. Pin the elements that must land exactly, leave the rest descriptive.
- Use color_palette for directed colour. Describing colours in prose is interpretation. The color_palette array is a conditioning signal. Up to 16 colours image-level, up to 5 per element, uppercase #RRGGBB.
- Use PNG output for any asset that needs transparency. JPG has no alpha channel, so it flattens transparency onto a white or black background. The outputFormat: "PNG" setting is one line of payload.
- Pick the aspect ratio that matches the final canvas, not the closest one to your composition. Resizing or cropping a 16:9 to fit a 4:5 Instagram portrait loses the framing the model composed for. Generate at the target ratio.
- Use QUALITY tier for hero assets and typography-dense work. The extra compute pays off where small text precision matters. For thumbnails and exploratory iterations, TURBO is enough.