Nano Banana 2

Gemini 3.1 Flash Image: fast, high-quality AI image generation and editing

Nano Banana 2 (officially known as Gemini 3.1 Flash Image) is Google’s upgraded AI image generation and editing model that brings advanced visual creation capabilities to a broad audience. It generates detailed, expressive images from text and image prompts with sharp details, richer lighting, and improved adherence to complex instructions. Nano Banana 2 also supports multi-object and multi-character consistency, accurate text rendering within images, and flexible resolution control up to 4K. It is now integrated across Google’s AI platforms including the Gemini app, Search AI Mode, and other Gemini-powered services.

Google
Commercial use
Text to Image, Image to Image, Image Editing
Pricing starts at $0.04657 per 512x512 image. Each input image adds $0.00028, and grounded search adds $0.014.

Average savings vs typical market rates

  • 512x512: $0.04657 (save ~22%)
  • 1K: $0.06895 (save ~13%)
  • 2K: $0.10255 (save ~14%)
  • 4K: $0.15295 (save ~4%)
  • + input image: $0.00028
  • + grounded search: $0.014 (save ~6%)
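
To make the price list easier to apply, here is a minimal Python sketch that estimates the cost of a batch. The prices are copied from the list above, with the grounded-search surcharge taken as $0.014. The function name `estimate_cost` and its parameters are illustrative, and the sketch assumes that fees simply add up and that the grounded-search surcharge applies once per generated image; check the documentation for the actual billing rules.

```python
from decimal import Decimal

# Per-image base prices in USD, copied from the price list above.
BASE_PRICE = {
    "512x512": Decimal("0.04657"),
    "1K": Decimal("0.06895"),
    "2K": Decimal("0.10255"),
    "4K": Decimal("0.15295"),
}
INPUT_IMAGE_FEE = Decimal("0.00028")    # added for every input image
GROUNDED_SEARCH_FEE = Decimal("0.014")  # assumption: charged once per generation


def estimate_cost(resolution: str, input_images: int = 0,
                  grounded_search: bool = False, count: int = 1) -> Decimal:
    """Estimate the USD cost of `count` generations at one resolution tier."""
    if resolution not in BASE_PRICE:
        raise ValueError(f"unknown resolution tier: {resolution!r}")
    if input_images < 0 or count < 0:
        raise ValueError("input_images and count must be non-negative")
    per_image = BASE_PRICE[resolution] + INPUT_IMAGE_FEE * input_images
    if grounded_search:
        per_image += GROUNDED_SEARCH_FEE
    return per_image * count


if __name__ == "__main__":
    # One 1K image edited from two input images, with grounded search.
    print(estimate_cost("1K", input_images=2, grounded_search=True))
    # A batch of one hundred 512x512 images from text only.
    print(estimate_cost("512x512", count=100))
```

`Decimal` is used instead of `float` so that fractions of a cent add up exactly over large batches.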

README

Overview

Nano Banana 2 is a high-performance image generation and editing model built for structured, production-grade workflows. It produces high-fidelity images with stronger prompt adherence, improved spatial reasoning, and more reliable layout control compared to earlier Nano Banana releases.

Version 2 introduces clearer text rendering, better material realism, and improved handling of complex multi-subject scenes. It is designed for practical generation tasks such as product mockups, UI concepts, structured layouts, and photorealistic compositions.

How it Works

Prompt Interpretation

Nano Banana 2 parses prompts with improved semantic and spatial reasoning. It better understands how objects relate to each other in physical space, allowing for more reliable handling of reflections, transparency, perspective, and structured compositions.

Clear, specific prompts generally produce more consistent and controllable results.

Image Generation

The model generates high-resolution images with detailed material rendering and lighting consistency. It supports both realistic and stylized outputs, maintaining composition stability even in dense scenes.

Text inside images is rendered more sharply and with stronger layout discipline than earlier versions.

Scene Logic and Real-World Grounding

Nano Banana 2 demonstrates improved real-world reasoning. Transparent objects refract correctly, reflections behave more naturally, and multi-subject interactions maintain coherent perspective. This makes it more reliable for scenes involving physical logic or structured layouts.

Key Features

  • Stronger Prompt Adherence
    Handles complex and structured requests with improved scene logic.
  • SOTA Text Rendering
    Produces sharper, more usable in-image typography for posters, UI, packaging, and infographics.
  • Material and Lighting Realism
    Improved handling of reflections, transparency, layered compositions, and surface detail.
  • Multi-Subject Stability
    Maintains perspective and consistency in dense or interactive scenes.
  • High-Resolution Output
    Supports outputs up to 4K resolution.
  • Improved Speed and Efficiency
    Faster generation with improved quality-to-cost performance.

How to Use

  1. Provide a detailed text prompt describing the scene or layout.
  2. Specify any structured elements such as text content, composition, or materials.
  3. Run the generation.
  4. Refine the prompt if adjustments to layout, lighting, or structure are needed.

For structured layouts or typography-heavy scenes, explicitly define text hierarchy and placement for best results.

Example prompt:
“A modern event poster titled ‘Future Systems Summit 2026’ with clear typography hierarchy, date and location information, minimalist layout, sharp readable text, and structured composition.”
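
One way to keep text hierarchy explicit is to assemble the prompt from structured fields rather than writing it freehand. The sketch below is a hypothetical helper, not part of any SDK: the name `build_poster_prompt`, its parameters, and the sentence template are all assumptions, and the date and city in the usage example are made-up sample values.

```python
def build_poster_prompt(title: str, secondary_text: list[str],
                        layout_notes: list[str]) -> str:
    """Compose a poster prompt that states the text hierarchy before the style."""
    title = title.strip()
    if not title:
        raise ValueError("title must not be empty")
    # Drop blank entries so the prompt never contains empty quoted strings.
    secondary = [s.strip() for s in secondary_text if s.strip()]
    notes = [n.strip() for n in layout_notes if n.strip()]

    prompt = f'A modern event poster with the title "{title}" as the largest text'
    if secondary:
        # Quote each line so it is clear which words should appear in the image.
        quoted = ", ".join(f'"{s}"' for s in secondary)
        prompt += f", followed by smaller lines reading {quoted}"
    prompt += "."
    if notes:
        prompt += " Style: " + ", ".join(notes) + "."
    return prompt


if __name__ == "__main__":
    print(build_poster_prompt(
        "Future Systems Summit 2026",
        ["14 May 2026", "Lisbon"],  # made-up sample date and location
        ["minimalist layout", "sharp readable text"],
    ))
```

Quoting the exact strings and naming which one is largest gives the model the hierarchy and placement cues recommended above, and changing one field at a time makes refinements easier to compare.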

Documentation

You can find full usage details, parameters, and examples here:
https://runware.ai/docs/providers/google#nano-banana-2-gemini-31-flash-image
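
As a rough illustration of what a generation request might look like, the Python sketch below builds a JSON task payload without sending it. Every field name here (`taskType`, `taskUUID`, `model`, `positivePrompt`, `width`, `height`, `numberResults`) is an assumption about the request format, and `MODEL_ID` is a placeholder rather than a real identifier; confirm the actual parameters, supported dimensions, and model identifier in the documentation linked above.

```python
import json
import uuid

# Placeholder only: look up the real model identifier in the documentation.
MODEL_ID = "<model-id-from-the-docs>"


def build_image_task(prompt: str, width: int = 1024, height: int = 1024) -> dict:
    """Build one image-generation task object (field names are assumptions)."""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("prompt must not be empty")
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return {
        "taskType": "imageInference",    # assumed task type name
        "taskUUID": str(uuid.uuid4()),   # client-generated ID to match responses
        "model": MODEL_ID,
        "positivePrompt": prompt,
        "width": width,
        "height": height,
        "numberResults": 1,
    }


if __name__ == "__main__":
    task = build_image_task(
        "A minimalist product mockup of a glass bottle on a marble surface"
    )
    # The request body is assumed to be a JSON array of task objects.
    print(json.dumps([task], indent=2))
```

Keeping payload construction separate from the network call makes it easy to log or review a request before any cost is incurred.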

More models from this creator

Nano Banana Pro is a Gemini 3 Pro Image Preview model for controlled visual creation. It improves reasoning over lighting and camera angle. It supports high resolution output and multi image blending for production ready design workflows and creative tools.

Google Veo 3.1 is a cinematic video generation model for developers. It turns text prompts or reference images into high fidelity scenes with richer native audio, better prompt adherence, and granular shot control. Use it for story driven clips with smoother motion and consistent style.

Google Veo 3.1 Fast is a high speed variant of Veo 3.1 for rapid creative iteration. It supports text prompts, image prompts, and reference images. It targets low latency workflows while keeping cinematic quality for short form and multi shot video generation with native audio.

Gemini Flash Image 2.5, commonly known as Nano Banana, generates and edits images from rich prompts and multi image inputs. It maintains character identity across frames. It supports targeted edits and completions that use strong world knowledge. Ideal for visual apps that need speed and control.

Google Veo 3 Fast is an optimized video generation model for rapid iteration and lower cost. It creates short clips from text or images with native audio that includes dialogue, sound effects and music. It keeps realistic motion, strong physics and reliable prompt control.

Imagen 4 Ultra is Google's highest quality text to image model. It focuses on photorealism, sharp details, and accurate text rendering. It targets production workloads that need strict prompt adherence, optional higher resolution output, and fast generation through the Gemini API.

Imagen 4 Fast is a latency optimized text to image model in the Imagen 4 family. It targets interactive apps and high volume pipelines. It keeps strong Imagen 4 visual quality while cutting generation time, so teams can iterate faster and reduce serving costs in production.

Google Veo 3 is a state of the art generative video model with native audio. It supports text prompts and image prompts, produces short HD clips with dialogue, sound effects and music, and delivers realistic motion with strong prompt adherence for cinematic video generation.

Imagen 4 Preview is Google's next generation text to image model for developers. It supports 2K resolution with improved detail rendering and robust typography control. Use it to generate photorealistic or stylized assets for product shots, slides, marketing visuals, and prototypes.

Imagen 3 is Google’s high quality text to image model. It produces detailed, photorealistic images with improved lighting and fewer artifacts. It offers strong prompt adherence, better text rendering, and supports editing workflows through the Gemini API and Vertex AI.

Google Veo 2 is a text to video model that produces high resolution clips with strong control over camera movement, composition, and scene dynamics. It supports cinematic framing, object aware motion, extended durations, and up to 4K outputs for production grade workflows.

Imagen 3 Fast is a streamlined text to image model that targets low latency use cases. It delivers bright images with strong contrast and improved prompt adherence. Ideal for apps that need fast image generation inside Vertex AI and Firebase with stable, predictable performance.