
How to Use Reference Images in Nano Banana (Subject Identity Guide)
A guide to Nano Banana reference images grounded in Google's official documentation: how many you can upload per model, the character-vs-object split, and Google DeepMind's documented best practices.
Reference images are how you tell Nano Banana who and what to render: the same character across a 12-shot storyboard, the same product across a campaign, the same brand mark across language variants. The model accepts them. The question is how many, of what kind, and how to phrase the prompt around them.
This guide sticks to what Google has officially documented, including the per-variant reference limits, the character-vs-object split, the prompting patterns Google recommends, and the failure modes the same docs acknowledge. Inferred numbers are flagged separately.
TL;DR
- Nano Banana Pro (Gemini 3 Pro Image) accepts up to 5 character references plus 6 object references per workflow (source). Google DeepMind also frames this as "up to 5 people" and "fidelity of up to 14 objects" overall (source).
- Nano Banana 2 / Gemini 3.1 Flash Image accepts up to 4 character references plus 10 object references, totalling the same 14-image envelope (source).
- Reference images only work as well as the prompt directing them. Google's own developer blog tells you to use phrasing like "Take the [element from image 1] and place it with the [element from image 2]" (source).
- Google explicitly acknowledges that "maintaining absolute consistency of character features across multiple images sometimes needs refinement through follow-up prompts", so identity drift across long sessions is a documented limitation, not a bug (source).
Why reference images matter
Text prompts describe a type of person. Reference images pin down a specific one. The difference matters in three scenarios:
- Sequential narrative. A storyboard, comic page, or ad series needs the same protagonist across multiple frames. A pure text prompt produces a different face every render.
- Brand assets. A logo, mascot, or product hero shot needs to match an existing reference exactly. Drift breaks the brand.
- Multi-subject composition. A team photo or cast lineup needs every individual recognizable, not the average of "five smiling people."
Google DeepMind makes the same point on the Gemini 3 Pro Image product page: "[placing] your cast in fresh scenes with new outfits, or [blending] multiple reference images to build complex compositions that retain chosen details."
How many references each variant supports
This is the table Google publishes in the Gemini API image generation docs:
| Model | Character references | Object references | Total envelope |
|---|---|---|---|
| Gemini 3 Pro Image (Nano Banana Pro) | Up to 5 | Up to 6 | Up to 14 reference images overall |
| Gemini 3.1 Flash Image (Nano Banana 2) | Up to 4 | Up to 10 | Up to 14 reference images overall |
Two nuances:
- Per-category counts don't necessarily sum to 14 (for Gemini 3 Pro Image, 5 + 6 = 11). The 14 is the upper envelope the API accepts, not a strict sum, and Google phrases the per-category caps as "up to" thresholds within that envelope.
- The original Gemini 2.5 Flash Image (Nano Banana) also supports multi-image input, but Google publishes per-category counts only for the Gemini 3 family. The model page describes it as "best for high-volume generation, conversational image editing, and low-latency creative workflows."
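The published limits lend themselves to a pre-flight check before you assemble a request. The sketch below is illustrative only: the model identifiers and the interpretation of the 14-image envelope are taken from the table above, and the check itself is not part of Google's API or SDK.

```python
# Per-category caps and the overall 14-image envelope, as published in the
# Gemini API image generation docs (see the table above). Model keys here
# are informal labels, not guaranteed to match API model IDs exactly.
LIMITS = {
    "gemini-3-pro-image": {"characters": 5, "objects": 6, "total": 14},
    "gemini-3.1-flash-image": {"characters": 4, "objects": 10, "total": 14},
}

def check_reference_counts(model: str, characters: int, objects: int) -> list[str]:
    """Return a list of limit violations; an empty list means within documented caps."""
    caps = LIMITS[model]
    problems = []
    if characters > caps["characters"]:
        problems.append(f"{characters} character refs exceeds cap of {caps['characters']}")
    if objects > caps["objects"]:
        problems.append(f"{objects} object refs exceeds cap of {caps['objects']}")
    if characters + objects > caps["total"]:
        problems.append(f"{characters + objects} total refs exceeds envelope of {caps['total']}")
    return problems
```

Running it with 5 characters and 6 objects against the Pro limits returns an empty list; a sixth character reference would be flagged before the request is ever sent.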
For Nano Banana Pro, Google DeepMind's product page is the strongest single quote on character consistency: "Maintain the consistency and resemblance of up to five characters" with "the fidelity of up to fourteen objects in a single workflow." (See the Subject consistency section on the DeepMind page.)
Tagging conventions in prompts
A reference image without a prompt is just a hint. The model has to be told what to do with it. Google's developer blog gives two specific phrasings worth memorizing:
Multi-image composition. When combining references, be explicit:
"Take the [element from image 1] and place it with/on the [element from image 2]."
Targeted edits to one reference. For inpainting against an existing image:
"change only the [specific element] to [new element/description]. Keep everything else in the image exactly the same, preserving the original style, lighting, and composition."
Both quotes are from Google's official prompting guide for Gemini 2.5 Flash Image and apply unchanged to the Gemini 3 family.
The pattern is explicit reference indexing. Don't write "make a scene with these people in a coffee shop." Write "place the woman from image 1 and the man from image 2 sitting at a window table in a coffee shop, the espresso machine from image 3 in the background." The more your prompt mirrors the structure of your reference set, the less the model has to guess. For brand work, mirror the asset names from your brief inside the prompt so the model has consistent grounding across follow-up edits.
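The explicit-indexing pattern can be generated mechanically from an ordered reference list. The helper below is a sketch, not part of any Google SDK; it simply emits the "the [element] from image N" phrasing Google's prompting guide recommends, in upload order.

```python
def build_composition_prompt(elements: list[str], scene: str) -> str:
    """Build an explicitly indexed composition prompt: every reference is
    named together with its image number, in the order it was uploaded."""
    tagged = [f"the {element} from image {i}" for i, element in enumerate(elements, start=1)]
    if len(tagged) > 1:
        subjects = ", ".join(tagged[:-1]) + " and " + tagged[-1]
    else:
        subjects = tagged[0]
    return f"Place {subjects} {scene}."
```

For example, `build_composition_prompt(["woman", "man"], "sitting at a window table in a coffee shop")` yields "Place the woman from image 1 and the man from image 2 sitting at a window table in a coffee shop." Keeping the index order identical to the upload order is what gives the model consistent grounding across follow-up edits.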
Reference image quality: what Google's docs imply
Google's Vertex AI best practices doc and the developer blog stop short of publishing a strict quality checklist. The strongest signals across the official material:
- Match the aspect ratio you want out. From Google's prompting blog: when editing, "the model generally preserves the input image's aspect ratio. If you upload multiple images with different aspect ratios, the model will adopt the aspect ratio of the last image provided." The same blog adds that "if you need a specific ratio for a new image and prompting doesn't produce it, the best practice is to provide a reference image with the correct dimensions."
- Use the API upload path that fits the file size. Google's docs distinguish inline image data (total request payloads under 20 MB) from the File API (larger files or images reused across multiple requests). Campaign-scale work that calls the same reference repeatedly should use the File API.
- Acknowledged drift limitation. The prompting blog states plainly: "if you notice a character's features begin to drift after many iterative edits, you can restart a new conversation with a detailed description to retain consistency." Long edit chains will degrade identity; the fix is a fresh conversation, not more in-session edits.
What Google has not officially published, but is widely repeated, is that reference images should be high-resolution, well-lit, and free of heavy occlusion. That is plausible and consistent with how diffusion models behave, but the precise thresholds (e.g., "1024×1024 or larger") are not in Google's docs. Treat those as community heuristics.
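The inline-data-versus-File-API split above reduces to a simple size-and-reuse decision. The function below is a workflow sketch under the documented 20 MB rule, not an official API: note that the real limit covers the total request payload (prompt included), while this check sums only the image bytes.

```python
# Documented cap for inline image data: total request payload under 20 MB.
# Summing only image sizes is a simplification; the real limit includes
# the prompt text and the rest of the request body.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024

def choose_upload_path(image_sizes_bytes: list[int], reused_across_requests: bool) -> str:
    """Pick between inline image data and the File API, following the
    documented split: inline for small one-off payloads, File API for
    larger files or references reused across multiple requests."""
    if reused_across_requests:
        return "file_api"  # campaign-scale reuse of the same reference
    if sum(image_sizes_bytes) >= INLINE_LIMIT_BYTES:
        return "file_api"  # payload too large for inline data
    return "inline"
```

A single 2 MB hero shot used once goes inline; the same shot referenced across a 12-frame storyboard belongs in the File API so it is uploaded once and reused.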
Common pitfalls: what is documented vs. what is folklore
Two pitfalls are documented by Google itself; two more are commonly observed by practitioners but not formally specified.
Documented by Google:
- Character drift across long edit sessions. Already quoted above. The prompting blog tells you to restart the conversation when this happens.
- Aspect ratio confusion with mixed-size references. Google's prompting blog notes that the last image's aspect ratio wins. Mixing portrait and landscape references without thinking about order will produce an output you didn't expect.
Community-observed (reported in the developer forum, not in Google's official docs):
- Identity preservation breakdown when upscaling multi-person images. Discussed in the Gemini API developer forum; Google has not published a formal acknowledgment, so treat it as a known field issue, not an officially confirmed limitation.
- Too many references can confuse the model. A widely repeated practitioner observation: supplying close to the maximum 14 references, especially when several characters look alike, increases the chance the model blends features instead of keeping them distinct. Google's docs don't quantify this, so treat it as a workflow heuristic: start with the minimum number of references that make your intent clear.
A practical workflow
A workflow that respects what Google has actually documented:
- Write the text prompt first. References lock in identity; they don't substitute for direction.
- Pick the model. Native 4K hero render with up to 5 distinct people → Nano Banana Pro (/nano-banana-pro). Fast iteration with up to 4 characters and up to 10 product references → Nano Banana 2 (/nano-banana-2). High-volume low-stakes work → original Nano Banana (/nano-banana).
- Upload references in priority order. First image is the strongest anchor; the last image dictates the output aspect ratio.
- Tag them in the prompt. Use Google's phrasing: "Take the [element from image 1] and place it with the [element from image 2]."
- Generate, then iterate with targeted edits. Follow-up phrasing: "change only the [specific element] … keep everything else exactly the same."
- If identity drifts, restart. Open a new conversation with a detailed text description, then re-attach the same references. Google's own documented workaround.
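Step 3's ordering rule (the last reference dictates the output aspect ratio) can be enforced programmatically. The helper below is a hypothetical sketch: the dict shape for a reference is an assumption made for illustration, not an SDK structure.

```python
def order_references(refs: list[dict], target_ratio: str) -> list[dict]:
    """Reorder references so one matching the target aspect ratio comes last,
    since the documented behavior is that the output adopts the aspect ratio
    of the last image provided. Each ref is assumed to be a dict like
    {"name": "hero", "ratio": "16:9"} (an illustrative shape, not an API type)."""
    # find the last reference whose ratio matches the deliverable
    idx = next(
        (i for i in range(len(refs) - 1, -1, -1) if refs[i]["ratio"] == target_ratio),
        None,
    )
    if idx is None:
        raise ValueError(
            f"no reference has the target ratio {target_ratio}; "
            "add one, or the output ratio may not match the deliverable"
        )
    # preserve the relative order of everything else; move the match to the end
    return refs[:idx] + refs[idx + 1:] + [refs[idx]]
```

Given a 1:1 logo and a 16:9 hero shot with a 16:9 deliverable, this puts the hero shot last so the output inherits the intended ratio.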
Frequently asked questions
How many reference images can I upload to Nano Banana Pro?
Up to 14 total per workflow. The Gemini API docs state Gemini 3 Pro Image accepts up to 5 character images and up to 6 object images. Google DeepMind's product page summarizes this as "up to five characters" and "fidelity of up to fourteen objects in a single workflow."
How many reference images does Nano Banana 2 accept?
Same 14-image envelope but a different split: up to 4 character references and up to 10 object references, per the Gemini API documentation. Note that the surfaces differ: the Gemini app allows up to 5 characters, while the Developer API specifies 4.
Does the original Nano Banana support reference images?
Yes. Gemini 2.5 Flash Image accepts multimodal input including reference images per the model docs. Google has not published the same per-category character/object split for it that it publishes for the Gemini 3 family.
How do I write a prompt that references uploaded images?
Use explicit indexing: "Take the [element from image 1] and place it with the [element from image 2]." That phrasing is from Google's prompting guide. For edits to a single reference, use "change only the [specific element] to [new element]. Keep everything else in the image exactly the same."
My character's face changed after several edits. Is that a bug?
No, it is documented behavior. Google's prompting blog states: "If you notice a character's features begin to drift after many iterative edits, you can restart a new conversation with a detailed description to retain consistency." Restart the conversation rather than continuing to edit.
What aspect ratio will the output be?
The output adopts the aspect ratio of the last image provided when multiple references are supplied. If your deliverable is 16:9, ensure the last upload is 16:9. If a specific ratio is needed for a brand-new image and prompting doesn't produce it, supplying a same-ratio reference is Google's documented workaround.
Should reference images be high resolution?
Google's official docs do not publish a minimum resolution. Common practitioner advice is to use clear, well-lit, unoccluded references, which is a community heuristic, not a Google guarantee. What is documented is the upload-path split: under 20 MB total payload uses inline image data; larger or reused files should use the File API.
Try Nano Banana with reference images
The fastest way to test reference behavior on your own assets is to run a workflow end-to-end. The studio at gptimg.co/nano-banana-pro accepts references, prompts, and generation without a Google Cloud project. Cheaper Flash tier: /nano-banana-2. Baseline model: /nano-banana. For OpenAI head-to-head, see Nano Banana Pro vs GPT Image.
Sources
- Nano Banana Pro: Gemini 3 Pro Image model from Google DeepMind (official launch announcement)
- Gemini 3 Pro Image (Nano Banana Pro) (Google DeepMind product page, character/object consistency figures)
- Nano Banana 2: Combining Pro capabilities with lightning-fast speed (Nano Banana 2 announcement)
- Nano Banana image generation (official Gemini API documentation including per-model reference image limits)
- Gemini 2.5 Flash Image (Nano Banana) (original Nano Banana model documentation)
- How to prompt Gemini 2.5 Flash Image Generation for the best results (Google Developers Blog prompting guide on drift, aspect ratio, multi-image phrasing)
- Tips for getting the best image generation and editing in the Gemini app (Google blog prompting tips)
- Gemini image generation best practices (Vertex AI documentation)
- Identity Preservation Breakdown in Multi-Person Image Generation (Gemini API developer forum discussion, community-observed limitation)
Last reviewed against source pages: 2026-04-18. Limits, splits, and recommendations evolve; verify in the linked official sources before locking a production workflow.