
How to Prompt Nano Banana for Photorealistic Faces and Hands
Documented Nano Banana prompt techniques for photorealistic faces and hands — anatomy descriptors, multi-image references, lighting, and when to use Pro.
Faces and hands are where AI failure is hardest to hide. A wrong nostril shape reads as "off." A sixth finger reads as broken. Google DeepMind's Nano Banana family covers three variants: the original Nano Banana (Gemini 2.5 Flash Image), the Nano Banana 2 refresh, and Nano Banana Pro (Gemini 3 Pro Image). All three have narrowed the gap with real photography, but none is immune, and each rewards prompts that respect the historical failure modes.
This guide stays inside what Google has officially documented and what reputable third-party sources have written about generative AI anatomy.
Why faces and hands are hard
These weaknesses are not a bug in any one model. They follow from how diffusion-based image models learn from 2D data.
Hands are 2D projections of a 3D object the model never sees in 3D. Prof. Peter Bentley of UCL told BBC Science Focus: "These are 2D image generators that have absolutely no concept of the three-dimensional geometry of something like a hand." Asked for interlaced fingers, the model returned "two wrists and a ball of fingers" (source).
Hands are under-represented in training data. Britannica cites Stability AI: "within AI datasets, human images display hands less visibly than they do faces" (source). One 2026 explainer reports that under 1% of LAION-5B captions contain explicit hand-related descriptors (GensGPT).
The pixel budget for a hand is small. At 1024×1024, a hand often occupies a 100×150-pixel region, which leaves few pixels for five fingers, fingernails, and joint articulation (zsky.ai). Faces get more pixels, which is part of why faces crossed the photorealism threshold first.
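To make that budget concrete, here is the arithmetic for the zsky.ai figure (a 100×150-pixel hand in a 1024×1024 frame). The even five-way split per finger is a simplifying assumption for illustration; in a real render much of that region goes to the palm and background, so each finger gets less.

```python
# Worked arithmetic for the pixel-budget point: how much of a 1024x1024
# frame a 100x150-pixel hand occupies. The per-finger split is a crude
# upper bound that pretends the whole region belongs to the fingers.

def region_fraction(region_w: int, region_h: int, frame_w: int, frame_h: int) -> float:
    """Return the share of the frame's pixels covered by a rectangular region."""
    return (region_w * region_h) / (frame_w * frame_h)

hand_px = 100 * 150                 # 15,000 pixels for the whole hand region
share = region_fraction(100, 150, 1024, 1024)
per_finger = hand_px / 5            # upper bound: ignores palm, nails, background

print(f"hand covers {share:.2%} of the frame")          # → hand covers 1.43% of the frame
print(f"at most about {per_finger:.0f} pixels per finger")  # → at most about 3000 pixels per finger
```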
Diffusion models do not count. There is "no built-in counting mechanism — the model approximates the distribution of finger-like shapes it learned during training, and that distribution sometimes peaks at four, six, or seven rather than five" (zsky.ai).
Faces have their own problems. The uncanny valley punishes microscopic asymmetries because the human visual cortex is uniquely tuned to faces. Google acknowledges this: the official Gemini app prompting guide states that "not every image Gemini generates will be perfect — it can still struggle with small faces, accurate spelling, and fine details in images." The prompt patterns below are designed to work around that limitation.
What Google documents about each Nano Banana variant
| Model | Release | What Google documents about people |
|---|---|---|
| Nano Banana (Gemini 2.5 Flash Image) | 2025 | Prompt-based identity preservation, multi-image references |
| Nano Banana 2 (Gemini 3.1 Flash Image) | 2026 | Up to 5 characters consistent; up to 14 objects; 2K/4K upscaling |
| Nano Banana Pro (Gemini 3 Pro Image) | Nov 20, 2025 | Up to 5 characters; native 1K/2K/4K; lowest text-rendering error rates |
Sources: Google DeepMind's Nano Banana Pro page and Nano Banana 2 page.
A few of those claims directly inform face-and-hand prompting:
- "Realistic images of landscapes, plants, people and animals with true-to-life details" is Google's photorealism claim. Note what is not claimed: no benchmark for "anatomically perfect hands."
- "Up to five characters" is the explicit ceiling. Above that, Google does not claim consistency.
- "Richer anatomical and surface-level fidelity such as wrinkles, eye color nuance" is third-party language quoted on Google's Nano Banana 2 page, which puts it in claimed-vs-confirmed territory.
- "Small faces" remain a documented weak spot across the family.
The honest reading: Nano Banana 2 and Pro are better at faces than the original Flash, and Pro's native 4K gives hands more pixels. None of the three claims to have solved the finger-counting problem.
Prompt patterns that respect the documented limits
The rest of this guide is prompt patterns. None are exotic. They are the techniques that recur across reputable Nano Banana writeups and that line up with Google's own structure recommendations from the Gemini blog.
Pattern 1: Use Google's recommended prompt structure
Google's official guidance breaks a prompt into Subject, Composition, Action & Location, and Style. Their example, a "stoic robot barista with glowing blue optics", sets up a specific character that follow-up prompts can place in new scenes. For a human portrait:
Subject: a 35-year-old woman with shoulder-length auburn hair, green eyes, light freckles across the nose bridge. Composition: medium close-up, head and shoulders. Action & Location: standing in a north-facing studio with soft window light from camera left. Style: photorealistic, 85mm lens, f/2.8, natural skin texture with visible pores.
"Specific" beats "evocative." When a detail is left unspecified, the model tends to fill it with an average of its training distribution.
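If you generate many portraits, the four-part structure can be kept as a small template so no part is silently left blank. This is a minimal sketch; the `PromptSpec` class and its field names are hypothetical conveniences, not part of any Google SDK, and the rendered string is simply what you would paste into the Gemini app or send as the text part of an API request.

```python
# A minimal sketch of Google's four-part prompt structure
# (Subject, Composition, Action & Location, Style) as a reusable template.
# PromptSpec and its fields are hypothetical, not part of any SDK.
from dataclasses import dataclass

@dataclass
class PromptSpec:
    subject: str
    composition: str
    action_location: str
    style: str

    def render(self) -> str:
        parts = [
            ("Subject", self.subject),
            ("Composition", self.composition),
            ("Action & Location", self.action_location),
            ("Style", self.style),
        ]
        # Refuse to render if any part is empty: unspecified parts are
        # exactly where the model falls back to generic defaults.
        missing = [label for label, text in parts if not text.strip()]
        if missing:
            raise ValueError(f"unspecified parts: {', '.join(missing)}")
        # Keep the labels so each part stays easy to audit and edit.
        return " ".join(f"{label}: {text.strip().rstrip('.')}." for label, text in parts)

portrait = PromptSpec(
    subject="a 35-year-old woman with shoulder-length auburn hair, green eyes, "
            "light freckles across the nose bridge",
    composition="medium close-up, head and shoulders",
    action_location="standing in a north-facing studio with soft window light from camera left",
    style="photorealistic, 85mm lens, f/2.8, natural skin texture with visible pores",
)
print(portrait.render())
```

The labels are kept in the output on purpose: they mirror the example prompt above, and they make it obvious which part to change when a follow-up render drifts.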
Pattern 2: Add anatomy descriptors explicitly
Generic prompts give the model latitude exactly where you do not want latitude. Multiple Nano Banana writeups recommend explicit anatomy modifiers (Leonardo.Ai, Fotor, CyberLink):
- Faces: name eye color, hair texture, skin tone, jawline, age range, distinguishing marks. Add catchlight direction.
- Skin: ask for "natural skin texture," "visible pores," "no airbrushing" to counter the bias toward retouched training images.
- Hands: state finger count and pose. "Both hands visible, five fingers each, relaxed at sides" makes the model commit to a configuration.
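One way to apply this checklist before spending a render is a quick "lint" that flags anatomy details a prompt leaves open. The sketch below is hypothetical: the check names come from the list above, but the regular expressions are crude heuristics chosen for illustration and say nothing about how Nano Banana actually parses prompts.

```python
# Hypothetical prompt lint: flags anatomy descriptors from Pattern 2 that a
# prompt leaves unspecified. Regex matching is a rough heuristic only.
import re

FACE_CHECKS = {
    "age range": r"\b\d{1,2}-year-old\b|\bin (?:her|his|their) \d0s\b",
    "eye color": r"\beyes\b",
    "skin texture": r"skin texture|visible pores|no airbrushing",
    "catchlights": r"catchlights?",
}

def missing_descriptors(prompt: str) -> list[str]:
    text = prompt.lower()
    missing = [name for name, pattern in FACE_CHECKS.items()
               if not re.search(pattern, text)]
    # Only demand an explicit finger count when hands appear in the prompt.
    if re.search(r"\bhands?\b", text) and "five fingers" not in text:
        missing.append("finger count")
    return missing

print(missing_descriptors("a portrait of a woman"))
print(missing_descriptors(
    "a 35-year-old man, brown eyes, visible pores, natural catchlights, right hand on a railing"
))
```

The first call flags every face check; the second passes the face checks but flags the missing finger count, because a hand is mentioned without committing to "five fingers."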
Pattern 3: Lighting that flatters
Lighting is one of the biggest levers for face realism. Documented patterns:
- Soft directional light: "north-facing window light," "softbox at 45 degrees camera-left," "overcast daylight." Soft light hides the micro-asymmetries that trigger uncanny-valley reactions.
- Golden hour: warm, low-angle light flatters skin and creates natural rim light.
- Catchlights: ask explicitly for "natural catchlights in the eyes." CyberLink's guide calls this out as a portrait-specific addition.
- Hard top-down light: avoid it unless you want a moody, high-contrast look, since it exaggerates the under-eye and nose-bridge areas where AI artifacts tend to appear.
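The lighting phrases above can be kept as named presets and appended to any portrait prompt, with catchlights on by default. This is a sketch; the preset names and the `with_lighting` helper are hypothetical, and the phrase wording is lifted from the list rather than from any Google-provided vocabulary.

```python
# Lighting phrases from Pattern 3 collected into named presets.
# Preset names and with_lighting() are hypothetical conveniences.

LIGHTING_PRESETS = {
    "window": "soft north-facing window light",
    "softbox": "softbox at 45 degrees camera-left",
    "overcast": "overcast daylight",
    "golden_hour": "warm, low-angle golden-hour light with natural rim light",
}

def with_lighting(prompt: str, preset: str, catchlights: bool = True) -> str:
    """Append a lighting preset (and optionally catchlights) to a prompt."""
    if preset not in LIGHTING_PRESETS:
        raise ValueError(f"unknown preset {preset!r}; choose from {sorted(LIGHTING_PRESETS)}")
    # Drop any trailing period so the clauses join into one sentence.
    clauses = [prompt.rstrip(". "), LIGHTING_PRESETS[preset]]
    if catchlights:
        clauses.append("natural catchlights in the eyes")
    return ", ".join(clauses) + "."

print(with_lighting("Photorealistic headshot of a 40-year-old man.", "softbox"))
```

Hard top-down light is deliberately left out of the presets, matching the advice to reach for it only when a moody look is the goal.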
Pattern 4: Specific tactics for hands
Hands need their own paragraph in the prompt:
- Specify the gesture. "Hands in pockets," "fingers interlaced in lap," "right hand gripping the mug handle." Multiple fix-AI-hands guides converge on this advice (zsky.ai, PromptsEra).
- Prefer simple poses. Open palm, hands at sides, hands resting on a surface. Avoid interlaced fingers and small detailed objects.
- Use objects as scaffolding. A railing, mug, or steering wheel constrains finger positions.
- Crop hands out of frame when they are not the subject. A professional headshot has no obligation to show hands.
- Add "natural hand anatomy, five fingers, correct proportions." Negative modifiers like "no extra fingers, no fused fingers" are a documented pattern across the AI-hand-fix literature.
- Generate at higher resolution when hands matter. At the same framing, Pro's native 4K has 4x the linear resolution of a 1K render, which gives the hand region roughly 16x the pixels and directly addresses the "100×150 pixel hand" problem.
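The resolution advice comes down to simple scaling: doubling each side quadruples the pixel count. The sketch below assumes square 1024, 2048, and 4096-pixel outputs and an unchanged composition, so the hand keeps the same share of the frame; real aspect ratios and framing will shift the exact numbers.

```python
# How a hand that spans 100x150 pixels at 1K scales when the same
# composition is rendered at 2K or 4K. Assumes square outputs and
# identical framing, so the hand keeps the same share of the frame.

RESOLUTIONS = {"1K": 1024, "2K": 2048, "4K": 4096}

def scaled_region(w_1k: int, h_1k: int, tier: str) -> tuple[int, int, int]:
    """Return (width, height, pixel_count) of a 1K region re-rendered at `tier`."""
    factor = RESOLUTIONS[tier] // RESOLUTIONS["1K"]   # linear scale: 1, 2 or 4
    w, h = w_1k * factor, h_1k * factor
    return w, h, w * h

base = scaled_region(100, 150, "1K")[2]
for tier in RESOLUTIONS:
    w, h, px = scaled_region(100, 150, tier)
    print(f"{tier}: {w}x{h} hand = {px:,} px ({px // base}x the 1K count)")
# 1K: 100x150 hand = 15,000 px (1x the 1K count)
# 2K: 200x300 hand = 60,000 px (4x the 1K count)
# 4K: 400x600 hand = 240,000 px (16x the 1K count)
```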
Pattern 5: Multi-image references for face consistency
The most reliable way to keep a face consistent across renders is to attach one or more reference images and re-state the identity constraints in each follow-up prompt. The Gemini Tips page frames it as "establishing a clearly defined character with specific details in the first prompt" so that follow-ups can place "that same character in entirely new contexts."
A prompt-engineering writeup on identity preservation gives the explicit checklist: use an "identity header" that locks facial geometry, hair length and color, and unique markers; attach the same reference image in each edit; restate the constraints rather than relying on conversational memory (Sider.ai).
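The checklist translates naturally into code: build the identity header once, then prepend it verbatim to every edit instead of trusting conversational memory. This is a hypothetical sketch; the function names, field labels, and the "Keep the same person as the reference image" wording are illustrative assumptions, and the reference image itself still has to be attached alongside each prompt in the Gemini app or your API client.

```python
# Sketch of the identity-header checklist: lock facial geometry, hair and
# unique markers in one fixed header, then restate it on every edit.
# Function names and wording are hypothetical, not a documented format.

def identity_header(geometry: str, hair: str, markers: list[str]) -> str:
    marker_text = ", ".join(markers) if markers else "none"
    return (
        "Keep the same person as the reference image. "
        f"Facial geometry: {geometry}. Hair: {hair}. Unique markers: {marker_text}."
    )

def edit_prompt(header: str, change: str) -> str:
    # The header comes first and is repeated unchanged on every turn;
    # only the requested change varies.
    return f"{header} Change only this: {change}."

header = identity_header(
    geometry="oval face, strong jawline, wide-set green eyes",
    hair="shoulder-length auburn, loose waves",
    markers=["light freckles across the nose bridge", "small scar above left eyebrow"],
)
prompts = [
    edit_prompt(header, change)
    for change in ("place her on a rainy city street at night", "switch to a navy blazer")
]
for p in prompts:
    print(p)
```

Keeping the header as a single string, rather than rewording it per edit, is the point: small wording changes are one of the documented triggers for facial drift.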
Identity preservation is not a hard lock. Media.io notes that "small changes in lighting, styling, background, camera angle, or prompt wording can cause Gemini to reinterpret facial details" (source). Treat consistency as a strong tendency that responds to constraint reinforcement, not a guarantee.
Pattern 6: Iterate, then inpaint
Some renders will still have a six-fingered hand or an off-axis eye. The standard repair is inpainting: mask the broken region and regenerate just that area. Across the AI-hands literature, inpainting is "the most practical, reliable method for fixing AI hands in images that are otherwise perfect" (zsky.ai). Nano Banana's edit workflow handles this inside the same chat: send the generated image plus "fix the right hand, five fingers, holding the mug handle" and the model patches it in place.
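A corrective edit prompt follows the same shape every time: name the region, state the correct anatomy, and re-anchor the pose. The sketch below maps a few common defects to fix phrases; the defect names, the "off-axis eye" wording, and the closing "keep everything else unchanged" instruction are illustrative assumptions, not documented Nano Banana syntax.

```python
# Hypothetical helper that turns a spotted defect into a corrective edit
# prompt, in the style of the "fix the right hand" example above.

FIXES = {
    "extra finger": "five fingers, natural hand anatomy, correct proportions",
    "fused fingers": "five separate fingers, no fused fingers",
    "off-axis eye": "both eyes aligned and looking in the same direction",
}

def repair_prompt(region: str, defect: str, context: str = "") -> str:
    """Build a targeted fix for one region; context re-anchors the pose."""
    parts = [f"Fix the {region}", FIXES[defect]]
    if context:
        parts.append(context)
    return ", ".join(parts) + ". Keep everything else in the image unchanged."

print(repair_prompt("right hand", "extra finger", "holding the mug handle"))
```

Including the pose context matters for hands: "holding the mug handle" gives the model an object to scaffold the fingers around, as in Pattern 4.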
When to step up to Nano Banana Pro
For low-stakes social posts and concept exploration, the original Nano Banana and Nano Banana 2 are usually enough. The reasons to move to Pro for a face or hand render:
- Print or large-format output. Pro renders natively at 1K, 2K, or 4K (source). At the same framing, 4K gives any small region roughly 16x the pixels of 1K (4x along each axis).
- Multi-character scenes near the documented ceiling. Pro and Nano Banana 2 both claim consistency for up to five characters. A four-person family portrait sits near that ceiling.
- In-image text on the same render as a portrait. Pro is the variant Google singles out for "the lowest error rates (mostly under 10%)" on single-line text rendering, useful for nameplates, magazine cover headlines, or poster captions layered over a portrait.
- High-stakes campaign work. The premium per-image price (~$0.24 at 4K) is small compared to a misfired shoot.
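The four reasons above can be written as a rough decision rule that also explains itself. This is a sketch under stated assumptions: treating four or more characters as "near the ceiling" mirrors the family-portrait example and is a judgment call rather than a Google rule, and the variant names are plain labels, not model IDs.

```python
# Rough decision rule for the reasons to step up to Pro listed above.
# Thresholds are judgment calls except the documented five-character ceiling.

def pick_variant(print_output: bool, characters: int, text_in_image: bool,
                 high_stakes: bool) -> tuple[str, list[str]]:
    reasons = []
    if print_output:
        reasons.append("print or large-format output (native 4K)")
    if characters >= 4:
        reasons.append("multi-character scene near the five-character ceiling")
    if text_in_image:
        reasons.append("in-image text alongside the portrait")
    if high_stakes:
        reasons.append("high-stakes campaign work")
    if characters > 5:
        # Google claims consistency only up to five characters on any variant.
        reasons.append("warning: above the documented five-character ceiling")
    variant = "Nano Banana Pro" if reasons else "Nano Banana or Nano Banana 2"
    return variant, reasons

print(pick_variant(print_output=False, characters=1, text_in_image=False, high_stakes=False))
print(pick_variant(print_output=True, characters=1, text_in_image=True, high_stakes=False))
```

Returning the reasons alongside the choice keeps the decision auditable when someone later asks why a draft was billed at Pro prices.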
For drafts, mood-boards, and social, the cheaper variants do most of the work. See the Nano Banana Pro vs GPT Image comparison for the cost-per-pixel break-even logic.
Claimed vs confirmed
| Claim | Status |
|---|---|
| Native 4K output on Pro | Confirmed by Google |
| Up to 5 characters consistent | Confirmed by Google for Pro and Nano Banana 2 |
| "Lowest error rates (mostly under 10%)" on text | Google's own benchmark, not independently audited |
| "Richer anatomical fidelity… wrinkles, eye color nuance" | Third-party quote amplified on Google's Nano Banana 2 page |
| "Small faces and fine details" remain weak | Confirmed by Google |
| "AI cannot count fingers" as a category limitation | Documented by multiple third-party sources |
| "X% of hands generated correctly" | No model maker publishes this number |
No model maker publishes a "hands rendered correctly" percentage, and any blog quoting one without a methodology should be treated with caution. The documented limitation exists, the prompt patterns above help, and failures appear to have become less frequent with each generation, although no published figure confirms by how much.
Frequently asked questions
Does Nano Banana Pro render hands better than the original Nano Banana?
Google does not publish a hands-specific benchmark. What is documented: Pro renders at higher native resolution (up to 4096×4096) and Google attributes "richer anatomical and surface-level fidelity" to the Gemini 3 image stack via a third-party quote on the Nano Banana 2 page. More pixels plus stronger anatomy detail is a reasonable expectation; perfect hands are not a guarantee.
What is the single most important prompt change for face realism?
Specificity. Replace "a portrait of a woman" with a sentence that names age range, hair, eyes, skin texture, lighting direction, and lens. Google's Subject/Composition/Action/Style structure (source) is the suggested scaffold.
How do I keep the same face across multiple renders?
Attach a reference image, re-state identity constraints in every follow-up prompt, and keep lighting and camera framing similar. Be aware of the documented caveat that Gemini does not "lock" identity the way specialist tools do (Media.io).
What if a hand still has six fingers?
Inpaint the hand region with a corrective prompt: "right hand, five fingers, fingers slightly curled, gripping the mug handle." Inpainting is the consistently recommended fix for an otherwise good image (zsky.ai).
Are there poses to avoid?
Yes. Tightly interlaced fingers, both hands holding small detailed objects, and crowd scenes with many small hands all stack the geometry problem. Open palms, hands in pockets, hands resting on surfaces, and hand-on-object scaffolding are the documented easy cases.
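If you script prompt generation, the easy and risky cases from this answer can serve as a pre-flight check on the pose you are about to request. The cue lists below are a hypothetical simplification built from this FAQ; substring matching will miss many phrasings, so treat "unknown" as "review by hand."

```python
# Hypothetical pose check built from the FAQ answer: documented easy cases
# versus poses that stack the geometry problem. Substring matching is a
# simplification for illustration.

EASY_POSES = ("open palm", "hands in pockets", "resting on", "at sides",
              "gripping", "holding the")
RISKY_POSES = ("interlaced fingers", "fingers interlaced",
               "small detailed object", "crowd")

def pose_risk(pose: str) -> str:
    text = pose.lower()
    # Check risky cues first: "holding a small detailed object" must not
    # be classified as easy just because it also mentions holding.
    if any(cue in text for cue in RISKY_POSES):
        return "risky"
    if any(cue in text for cue in EASY_POSES):
        return "easy"
    return "unknown"

for pose in ("fingers interlaced in lap", "Hands in pockets, relaxed", "waving at the camera"):
    print(pose, "->", pose_risk(pose))
```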
Where can I try Nano Banana?
The Gemini app, Google AI Studio, and the Gemini API. You can also use the Nano Banana, Nano Banana 2, and Nano Banana Pro studios on gptimg.co. Credit packs are shared across models, so switching between Flash and Pro is one dropdown change.
Sources
- Tips for getting the best image generation and editing in the Gemini app, official Google blog
- Gemini 3 Pro Image (Nano Banana Pro), Google DeepMind product page
- Gemini 3.1 Flash Image (Nano Banana 2), Google DeepMind product page
- Why AI-generated hands are the stuff of nightmares, BBC Science Focus, interview with Prof. Peter Bentley, UCL
- Why does AI art screw up hands and fingers?, Britannica
- AI Hands Fix: 9 Techniques to Get Perfect Fingers Every Time (2026), zsky.ai
- How to Fix AI Hands: Complete Guide (2026), zsky.ai
- AI Hands, Anatomy & Body Fixes (2026 Expert Guide), GensGPT
- How to Fix Bad Hands in Stable Diffusion (2026), PromptsEra
- Nano Banana Prompt Guide, Leonardo.Ai
- 40+ Nano Banana Pro Prompts for Gemini, Fotor
- 56 Best Nano Banana Prompts, CyberLink
- How to Make Gemini Not Change Your Face, Media.io
- How to Write Gemini Prompts That Keep Subject Identity Consistent Across Edits, Sider.ai
Last reviewed against source pages: 2026-04-18. Model capabilities and prompt behavior change with each release; confirm in the linked sources before acting on the techniques above.