MODEL ID: google:4@3

live

Nano Banana 2

by Google, February 26, 2026

Nano Banana 2 (officially known as Gemini 3.1 Flash Image) is Google’s upgraded AI image generation and editing model that brings advanced visual creation capabilities to a broad audience. It generates detailed, expressive images from text and image prompts with sharp details, richer lighting, and improved adherence to complex instructions. Nano Banana 2 also supports multi-object and multi-character consistency, accurate text rendering within images, and flexible resolution control up to 4K. It is now integrated across Google’s AI platforms including the Gemini app, Search AI Mode, and other Gemini-powered services.

Prompting Nano Banana 2

How to write prompts for Nano Banana 2: detailed scene descriptions, structured layering, legible text rendering, and thinking-level control.

Introduction

Nano Banana 2 reads a prompt as a set of instructions to satisfy rather than a mood to approximate. It tracks which objects appear and where they sit, and it renders legible text inside the image, which most diffusion models can't do reliably. Both behaviors reward prompts that are specific about content rather than vague about vibe.

A detailed cyberpunk ramen bar at night with neon signs, a chef in steam, patrons at the counter, and a holographic menu

Every element named in that prompt is in the frame: the neon signs, the chef in steam, the patrons at the counter, the holographic menu. This guide covers how to write prompts that get that result, how to structure them, how to put readable text in an image, and how the thinking setting changes the model's effort on hard prompts.

Describing the whole scene

Nano Banana 2 holds a long list of details at once, so a prompt that names every object and its position usually lands more accurately than a short, suggestive one. Describe the scene the way you'd brief a set designer: what's on the bench, what's to the left, what frames the edges.

A cluttered greenhouse potting bench with terracotta pots, a brass watering can, an open journal, string lights, and ferns

The watering can lands on the left, the journal in the center, the lights overhead. Naming positions, not just objects, is what gives you that control. When a detail's placement matters, say where it goes.

Structuring a prompt

For complex scenes, organizing the prompt in layers keeps the model from dropping details. Move from the subject outward to the environment, then the technical and atmospheric cues.

A white-and-red striped lighthouse on a rocky cliff edge, stormy ocean below with waves crashing against the rocks, viewed from a low angle looking up, weathered stone base and a glowing lamp room at the top, dramatic overcast light breaking through the clouds, shot wide at 24mm, lonely and dramatic atmosphere

Layers, in order: Subject, Environment, Framing, Details, Lighting, Camera, Mood

A red-and-white striped lighthouse on a cliff above a stormy sea, shot from a low angle under breaking clouds

You don't need every layer on every prompt. A portrait might only need subject, lighting, and camera. The layers are a checklist for what you could specify, not a template you have to fill.
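The layered approach above can be sketched as a small helper that joins whichever layers you supply in a fixed order. This is only an illustration: `build_layered_prompt` and `LAYER_ORDER` are hypothetical names, not part of any SDK, and the portrait's lighting and camera text is invented for the example.

```python
from typing import Optional

# Layer order follows the guide: subject first, then outward to the
# environment, then the technical and atmospheric cues.
LAYER_ORDER = ("subject", "environment", "framing", "details", "lighting", "camera", "mood")


def build_layered_prompt(**layers: Optional[str]) -> str:
    """Join the supplied layers in LAYER_ORDER, skipping missing or blank ones."""
    unknown = set(layers) - set(LAYER_ORDER)
    if unknown:
        raise ValueError(f"Unknown layer(s): {sorted(unknown)}")
    parts = []
    for name in LAYER_ORDER:
        text = layers.get(name)
        if text and text.strip():
            parts.append(text.strip())
    return ", ".join(parts)


# All seven layers: reproduces the lighthouse prompt from this section.
lighthouse_prompt = build_layered_prompt(
    subject="A white-and-red striped lighthouse on a rocky cliff edge",
    environment="stormy ocean below with waves crashing against the rocks",
    framing="viewed from a low angle looking up",
    details="weathered stone base and a glowing lamp room at the top",
    lighting="dramatic overcast light breaking through the clouds",
    camera="shot wide at 24mm",
    mood="lonely and dramatic atmosphere",
)

# A portrait that only uses three layers (illustrative wording).
portrait_prompt = build_layered_prompt(
    subject="A barista at a cafe counter",
    lighting="soft window light",
    camera="85mm, shallow depth of field",
)

print(lighthouse_prompt)
print(portrait_prompt)
```

Keeping the order in one tuple means a prompt with three layers and a prompt with seven are assembled the same way, which matches the idea of a checklist rather than a template.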

Camera and lens language

Nano Banana 2 responds to the vocabulary of photography. Naming a lens or a camera angle controls framing and depth in a way plain adjectives like "close-up" can't. Here is one subject shot two ways, changing only the lens:

A barista at a cafe counter in tight head-and-shoulders framing with a blurred background — 85mm, tight framing

The same barista and cafe counter shown wide, the whole interior visible with deep focus — 24mm wide-angle

Same barista, same counter. The 85mm lens compresses the scene and throws the background out of focus, isolating the subject. The 24mm lens pulls the whole room in with deep focus and stretched perspective. The words did the work, not a different setup.

Terms worth keeping in your vocabulary:

Focal length: 24mm (wide), 50mm (natural), 85mm (portrait), 200mm (telephoto).
Shot distance: wide shot, medium shot, close-up, macro.
Angle: low-angle, eye-level, overhead, Dutch angle.
Depth: shallow depth of field, deep focus, bokeh.
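To compare lenses the way the barista example does, it helps to hold the subject text constant and vary only the camera cues. The sketch below does that with a hypothetical helper, `lens_variants`; the exact cue wording is an assumption, chosen from the vocabulary listed above.

```python
def lens_variants(subject: str, lens_cues: dict[str, str]) -> dict[str, str]:
    """Return one prompt per lens, keeping the subject text identical."""
    return {
        lens: f"{subject}, shot at {lens}, {cue}"
        for lens, cue in lens_cues.items()
    }


variants = lens_variants(
    "A barista at a cafe counter",
    {
        # Portrait lens: isolate the subject.
        "85mm": "tight head-and-shoulders framing, shallow depth of field",
        # Wide lens: pull the whole room in.
        "24mm": "wide shot of the whole interior, deep focus",
    },
)

for lens, prompt in variants.items():
    print(f"{lens}: {prompt}")
```

Because only the lens cues change between the two prompts, any difference in the results can be attributed to the camera language rather than to a reworded subject.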

Negative prompts

Nano Banana 2 has no negativePrompt parameter, but it follows plain-language instructions, so you can write the negative directly into the prompt as a "Negative prompt:" clause and the model obeys it. Here is the same scene with and without that clause:

The Trevi Fountain in Rome busy with tourists on a sunny afternoon — Plain prompt

The Trevi Fountain in Rome with no people on a sunny afternoon — With a negative prompt clause

The only difference between the two prompts is the trailing clause "Negative prompt: people, crowds, tourists". The model reads it as an instruction and clears the crowd, even though there's no negativePrompt field to put it in. List what you don't want after "Negative prompt:", the same way you would in a model that has a dedicated field.
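Since the negative lives inside the prompt text, appending it is plain string handling. A minimal sketch, assuming a hypothetical helper named `with_negative_clause`; the base prompt wording is illustrative, and only the "Negative prompt:" clause itself comes from the example above.

```python
def with_negative_clause(prompt: str, unwanted: list[str]) -> str:
    """Append a plain-language "Negative prompt:" clause listing unwanted elements."""
    items = [item.strip() for item in unwanted if item.strip()]
    if not items:
        # Nothing to exclude, so leave the prompt untouched.
        return prompt
    base = prompt.rstrip()
    # End the scene description with a period so the clause reads as its own sentence.
    if not base.endswith((".", "!", "?")):
        base += "."
    return f"{base} Negative prompt: {', '.join(items)}"


fountain_prompt = with_negative_clause(
    "The Trevi Fountain in Rome on a sunny afternoon",
    ["people", "crowds", "tourists"],
)
print(fountain_prompt)
```

The result goes into positivePrompt as a single string, because there is no separate field for the negative part.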

Rendering text in images

Most image models treat text as texture and produce garbled lettering. Nano Banana 2 renders readable text, which makes it usable for signage, posters, and packaging. The rule is simple: put the exact words in quotation marks so the model treats them as literal content instead of scene description.

A bakery storefront with a wooden hanging sign reading GOLDEN CRUST and a window decal reading FRESH DAILY — Storefront signage

A vintage travel poster reading WANDER NORTH over a stylized mountain range and aurora in indigo and gold — Poster title

A sidewalk chalkboard reading TODAY'S SPECIAL and Maple Latte $5 with chalk coffee-cup doodles — Handwritten chalk

The quoted strings come through across three different treatments: engraved gold serif, condensed poster type, and handwritten chalk. Keep the text short. A few words on a sign render reliably. Long paragraphs are where accuracy starts to slip, so for dense copy, generate the text elements separately and compose them.
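The two rules above, quote the exact words and keep them short, can be enforced before a prompt is sent. In the sketch below, `quoted` is a hypothetical helper and the six-word cap is an assumed stand-in for "a few words", not a documented limit of the model.

```python
MAX_WORDS = 6  # assumption: an arbitrary cutoff for "a few words", not a model limit


def quoted(text: str, max_words: int = MAX_WORDS) -> str:
    """Wrap text in double quotes so it reads as literal content, rejecting long strings."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("text to render must not be empty")
    if '"' in cleaned:
        raise ValueError("text to render must not contain double quotes")
    if len(cleaned.split()) > max_words:
        raise ValueError(
            f"text has more than {max_words} words; generate it separately and compose it"
        )
    return f'"{cleaned}"'


sign_prompt = (
    f"A bakery storefront with a wooden hanging sign reading {quoted('GOLDEN CRUST')} "
    f"and a window decal reading {quoted('FRESH DAILY')}"
)
print(sign_prompt)
```

Raising an error for long strings is one possible design; a gentler version could log a warning and let the prompt through.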

Controlling reasoning with thinking levels

The settings.thinking parameter sets how much the model reasons before it renders, with two levels. MINIMAL is the fast default. HIGH spends more compute reconciling a prompt's requirements before it draws, which pays off when a prompt piles on many simultaneous constraints that a single fast pass tends to drop or misplace.

The prompt below is deliberately dense: a three-floor cutaway with specific rooms, colors, objects, and counts, each in a set place. The same prompt, run at each level:

An isometric cutaway of a three-floor dollhouse with furnished rooms — thinking: MINIMAL

Compare how many of the listed elements land in each. HIGH puts the extra time into reconciling every constraint before it draws, so it's the level to reach for when a prompt is this dense, at the cost of slower generation.

import { createClient } from '@runware/sdk'

const client = await createClient({ apiKey: process.env.RUNWARE_API_KEY })
await client.connect()

const [result] = await client.run({
  model: 'google:4@3',
  positivePrompt: 'A detailed isometric cutaway of a three-story dollhouse with specific furniture, colors, and counts in each room',
  width: 1024,
  height: 1024,
  settings: {
    thinking: 'HIGH'
  }
})

import asyncio
import os

from runware import Runware


async def main():
    async with Runware(api_key=os.environ["RUNWARE_API_KEY"]) as client:
        results = await client.run({
            "model": "google:4@3",
            "positivePrompt": "A detailed isometric cutaway of a three-story dollhouse with specific furniture, colors, and counts in each room",
            "width": 1024,
            "height": 1024,
            "settings": {
                "thinking": "HIGH"
            }
        })


asyncio.run(main())

curl https://api.runware.ai/v1 \
  -H "Authorization: Bearer $RUNWARE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '[
    {
      "taskType": "imageInference",
      "taskUUID": "d4e5f6a7-b8c9-0123-def0-456789012345",
      "model": "google:4@3",
      "positivePrompt": "A detailed isometric cutaway of a three-story dollhouse with specific furniture, colors, and counts in each room",
      "width": 1024,
      "height": 1024,
      "settings": {
        "thinking": "HIGH"
      }
    }
  ]'

runware run google:4@3 \
  positivePrompt="A detailed isometric cutaway of a three-story dollhouse with specific furniture, colors, and counts in each room" \
  width=1024 \
  height=1024 \
  settings.thinking=HIGH

{
  "taskType": "imageInference",
  "taskUUID": "d4e5f6a7-b8c9-0123-def0-456789012345",
  "model": "google:4@3",
  "positivePrompt": "A detailed isometric cutaway of a three-story dollhouse with specific furniture, colors, and counts in each room",
  "width": 1024,
  "height": 1024,
  "settings": {
    "thinking": "HIGH"
  }
}
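The request body shown above can also be built programmatically. The sketch below uses only the Python standard library and mirrors the fields from the JSON example; `build_task` is a hypothetical helper, and generating a fresh taskUUID for each task (rather than reusing the fixed one in the examples) is an assumption about how you would want to use it.

```python
import json
import uuid

# The two thinking levels described in this guide.
THINKING_LEVELS = ("MINIMAL", "HIGH")


def build_task(prompt: str, thinking: str = "MINIMAL",
               width: int = 1024, height: int = 1024) -> dict:
    """Build an imageInference task dict shaped like the JSON example above."""
    if thinking not in THINKING_LEVELS:
        raise ValueError(f"thinking must be one of {THINKING_LEVELS}, got {thinking!r}")
    return {
        "taskType": "imageInference",
        "taskUUID": str(uuid.uuid4()),  # assumption: a new UUID per task
        "model": "google:4@3",
        "positivePrompt": prompt,
        "width": width,
        "height": height,
        "settings": {"thinking": thinking},
    }


task = build_task(
    "A detailed isometric cutaway of a three-story dollhouse with specific "
    "furniture, colors, and counts in each room",
    thinking="HIGH",
)

# The cURL example sends a JSON array of tasks, so wrap the task in a list.
body = json.dumps([task], indent=2)
print(body)
```

Validating the thinking value locally catches a typo such as "high" before the request leaves your machine.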

Tips

Name positions, not just objects. "A watering can on the left, a journal in the center" gives the model a layout to follow. A list of objects with no placement leaves the arrangement to chance.
Quote text you want rendered. Wrap exact words in quotation marks so the model treats them as literal content. Everything outside the quotes is read as scene description.
Keep rendered text short. A few words on a sign or poster render reliably. For long copy, generate the text element on its own and compose it into the layout afterward.
Layer complex prompts. Work from subject to environment to lighting and camera. A structured prompt drops fewer details than the same information in one run-on sentence.
Raise thinking only for genuinely complex prompts. Most prompts, including everyday counts and layouts, are fine at MINIMAL. Reach for HIGH when a prompt stacks many constraints at once, and accept the extra latency.