
How to Write JSON Prompts for Nano Banana Pro
JSON-style prompts for Nano Banana Pro: what's documented by Google, what's community convention, common keys, and when JSON helps versus hurts.
JSON prompts for Nano Banana Pro (Gemini 3 Pro Image, model id gemini-3-pro-image-preview) have become a popular convention since the model launched in November 2025. JSON prompt generators, GitHub schemas, LinkedIn threads, and YouTube tutorials all promise "100% accuracy" or "92% precision" if you swap your descriptive paragraph for a structured JSON object.
The honest version is more interesting. Google itself does not document a JSON prompt format. The official guides recommend descriptive natural-language paragraphs. The community built JSON conventions on top of that, and the evidence on whether JSON actually changes outputs is mixed. This guide splits what Google has confirmed from what the community has proposed, lists the JSON keys you will see most often, and shows when reaching for JSON pays off.
TL;DR
- JSON prompting is a community convention, not an officially supported API format. Google's Nano Banana prompt guide recommends narrative paragraphs around five components: style, subject, setting, action, composition.
- The model parses natural language. Whatever JSON you send is converted to tokens like any other text. There is no JSON parser on the image side that maps `"lighting": "..."` to a specific control.
- JSON's real benefit is on the human side: reproducibility, batch generation, programmatic templating, and version control of prompt fields.
- Independent tests are mixed. Chase Jarvis's controlled comparison found JSON and natural-language outputs "essentially the same"; community guides report subjective accuracy gains.
- Don't confuse two different "JSON" features: Gemini's structured output via `responseSchema` constrains the model's response to a JSON shape. That is a real, documented API feature, but it is for text responses, not for shaping an image prompt.
Why people reach for JSON prompts
Three reasons keep coming up across community write-ups:
- Clarity for the human writer. A flat paragraph mixes adjectives across subject, lighting, lens, and mood. JSON forces each adjective into a labelled slot, so you notice when you forgot to specify the lens.
- Reproducibility. Change a single field (say `"camera.lens": "85mm"` to `"24mm"`) and re-run with everything else fixed. Same reason designers keep design tokens in JSON.
- Batch generation. Template the JSON, fill `subject.product_name` from a CSV, and serialize into the `contents` field of the Gemini API call. The model still receives a string, but the producer side has a clean fan-out.
None of these require the model to "understand JSON." They are organizational benefits for the prompt author.
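The batch pattern above can be sketched in a few lines. This is a hypothetical fan-out, not an official SDK feature: the template shape and field names (`subject.product_name`, etc.) follow the community convention, and the parsed CSV rows are stand-ins.

```typescript
// Hypothetical batch fan-out: one JSON prompt template, one variable per row.
// The field names are community convention, not an official Google schema.
type PromptTemplate = {
  style: string;
  subject: { type: string; product_name: string };
  setting: string;
};

const template: PromptTemplate = {
  style: "studio product photograph",
  subject: { type: "product", product_name: "" },
  setting: "seamless white backdrop, soft overhead light",
};

// Rows as they might come out of a parsed CSV.
const rows = [{ product_name: "ceramic mug" }, { product_name: "walnut desk tray" }];

// Each prompt is just a string by the time it reaches the API's `contents` field.
const prompts: string[] = rows.map((row) =>
  JSON.stringify(
    { ...template, subject: { ...template.subject, product_name: row.product_name } },
    null,
    2,
  ),
);
```

Each entry in `prompts` would be sent as ordinary text; the model never sees anything but the serialized string.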
What Google actually documents
Google DeepMind's Nano Banana prompt guide and the Google AI for Developers image generation docs both push descriptive prose, not structured fields. The recommended structure is:
| Component | What it covers |
|---|---|
| Style | Photograph, illustration, watercolour, 3D render, etc. |
| Subject | Character or object — appearance, clothing, expression |
| Setting | Location, environment, era |
| Action | What the subject is doing in the frame |
| Composition | Shot type, angle, framing |
The Google AI for Developers docs explicitly say: "Describe the scene, don't just list keywords. The model's core strength is its deep language understanding. A narrative, descriptive paragraph will almost always produce a better, more coherent image than a simple list of disconnected words." That guidance does not change for Nano Banana Pro. The Gemini 3 Pro Image Preview model page refers back to the same image-generation guide.
The Google Developers Blog post on prompting Gemini 2.5 Flash Image repeats the same advice with templates like "A photorealistic [shot type] of [subject], [action or expression], set in [environment]...", which is flowing prose with placeholders, not JSON.
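Google's placeholder template can be filled mechanically without any JSON in the prompt at all. A minimal sketch — the `fill` helper is ours, not Google's; only the bracketed-slot template shape comes from the blog post:

```typescript
// Fill a prose template like Google's
// "A photorealistic [shot type] of [subject], [action or expression], set in [environment]."
// The output is a plain narrative sentence, which is what Google's docs recommend sending.
const narrativeTemplate =
  "A photorealistic [shot type] of [subject], [action or expression], set in [environment].";

function fill(template: string, slots: Record<string, string>): string {
  // Replace each [slot name] with its value; leave unknown slots untouched.
  return template.replace(/\[([^\]]+)\]/g, (_match, name) => slots[name] ?? `[${name}]`);
}

const prompt = fill(narrativeTemplate, {
  "shot type": "close-up portrait",
  subject: "an elderly clockmaker",
  "action or expression": "inspecting a pocket watch with a loupe",
  environment: "a cluttered workshop at dusk",
});
```

You get the organizational benefit (named slots, easy batching) while the model still receives flowing prose.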
There is no Google-published JSON schema for image prompts and no documented prompt.lighting field that the API parses. Anyone telling you otherwise is selling you a community pattern, not an API contract.
The "JSON" feature Google does document
To avoid a common mix-up: Gemini's API does support structured output via responseSchema. That feature constrains what the model returns to a JSON object you define. It is widely used to extract fields from images, generate captions in a fixed shape, or pipe model outputs into downstream code. It is not the same as putting JSON in your image-generation prompt. It controls the response side rather than the prompt side, and does not apply to the image bytes themselves.
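For contrast, here is roughly what a structured-output request body looks like, sketched as a plain object. The field names (`generationConfig.responseMimeType`, `generationConfig.responseSchema`) follow the public Gemini REST docs as we understand them — confirm against the current reference before relying on this shape:

```typescript
// Structured output constrains the *text response* — e.g. extracting fields
// from an input image. Note the schema lives in generationConfig, not in the
// prompt itself: this is unrelated to JSON-formatted image prompts.
const requestBody = {
  contents: [{ parts: [{ text: "Describe the main object in this image." }] }],
  generationConfig: {
    responseMimeType: "application/json",
    responseSchema: {
      type: "OBJECT",
      properties: {
        object_name: { type: "STRING" },
        dominant_color: { type: "STRING" },
      },
      required: ["object_name"],
    },
  },
};

// This is what would be POSTed to the generateContent endpoint.
const serialized = JSON.stringify(requestBody);
```

The prompt text inside `contents` is still natural language; only the response is constrained.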
Common JSON keys observed in the community
Even though no schema is official, a handful of top-level keys recur across community proposals. The clearest catalogue is alexewerlof's GitHub gist, positioned as a "structured prompting schema for high-fidelity image generation." Marketing-focused write-ups like the Atlabs guide and the aiformarketings guide converge on similar fields.
Across those community sources, the keys you will see most often are:
| Key | Typical contents | Equivalent in Google's narrative model |
|---|---|---|
| `subject` | Character or object: type, description, clothing, expression, pose | Subject |
| `style` | Photograph, illustration, painting, render | Style |
| `setting` / `scene` | Location, environment, time of day | Setting |
| `lighting` | Source, direction, quality (golden hour, softbox, neon, etc.) | Folded into Style/Setting |
| `camera` | Lens focal length, aperture, ISO, film stock | Folded into Style |
| `composition` | Shot type, angle, framing, focus point | Composition |
| `palette` | Colour scheme or dominant colours | Folded into Style |
| `mood` | Emotional tone (serene, ominous, joyful) | Folded into Style |
| `text_rendering` | In-image text content and typography | Covered separately by Google's text-in-images guidance |
| `negative` / `prohibitions` | Things to avoid | Not officially supported; Nano Banana has no documented negative-prompt field |
Two cautions:
- The `negative` field is the leakiest. Google's docs do not document a negative-prompt mechanism for Nano Banana. The model reads the list as descriptive text like everything else, which sometimes biases the output toward the words you wanted to avoid.
- `camera` parameters like `"aperture": "f/2.0"` are interpreted as descriptive text. The model isn't simulating an aperture; it has learned that "shot at f/2.0" correlates with shallow depth of field in its training data. The effect is real but statistical, not optical.
A worked example: JSON and its plain-English twin
Here is the same prompt in both formats. Both produce comparable results from Nano Banana Pro because, under the hood, both end up as a string of tokens passed to the same model.
JSON form (community convention):
```json
{
  "style": "editorial photograph",
  "subject": {
    "type": "woman, mid-30s",
    "clothing": "charcoal wool coat, cream silk scarf",
    "expression": "calm, looking off-camera"
  },
  "setting": "rain-slicked Tokyo side street at dusk, neon signs in the background",
  "lighting": "ambient neon and a cool key light from camera-left",
  "camera": {
    "lens": "85mm",
    "aperture": "f/1.8"
  },
  "composition": "medium shot, subject on the left third, vertical 9:16 framing",
  "palette": "teal, magenta, deep blue with warm skin tones",
  "mood": "contemplative, cinematic"
}
```

Plain-English equivalent (the form Google's guide recommends):
An editorial photograph of a calm woman in her mid-30s, wearing a charcoal wool coat and a cream silk scarf, looking off-camera. She stands on a rain-slicked Tokyo side street at dusk with neon signs glowing in the background. Ambient neon mixes with a cool key light from camera-left. Shot at 85mm, f/1.8, medium shot framed vertically (9:16) with the subject placed on the left third. Teal, magenta, and deep blue dominate the palette against warm skin tones. The mood is contemplative and cinematic.
Both feed Nano Banana Pro the same information: wardrobe, location, lighting setup, lens, framing, palette, mood. The JSON version is easier to mutate programmatically; the prose version is easier to read aloud and matches Google's documented guidance. Both are valid. Neither has a magic accuracy multiplier.
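Because both forms carry the same information, converting between them is mechanical. A toy flattener, assuming the community-style keys used above (nothing here is an official mapping, and real prose would read better than this labelled output):

```typescript
// Walk a community-style prompt object and emit labelled prose fragments.
// The model treats either form as plain text; this just makes the
// equivalence between JSON and prose concrete.
function toProse(value: unknown, label?: string): string {
  if (typeof value === "string") return label ? `${label}: ${value}` : value;
  if (value && typeof value === "object") {
    return Object.entries(value as Record<string, unknown>)
      .map(([k, v]) => toProse(v, label ? `${label} ${k}` : k))
      .join(". ");
  }
  return "";
}

const prose = toProse({
  style: "editorial photograph",
  camera: { lens: "85mm", aperture: "f/1.8" },
  mood: "contemplative, cinematic",
});
// Nested keys flatten into labelled phrases, e.g. "camera lens: 85mm".
```

A real pipeline would polish the fragments into full sentences, but the information content is identical either way.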
When JSON helps and when it hurts
Helpful when:
- You are running batches. Hundreds of product shots, characters, or social posts where you want one variable to change per row. JSON in your codebase, serialized into the prompt at send time, is the right tool.
- You are versioning prompts in git. Diffs on a JSON object are readable; diffs on a paragraph rewrite are noise.
- You are working with a team. Designers and engineers can edit named fields without touching each other's prose.
- You build downstream tooling. A prompt builder UI, a CMS-driven generator, or a CLI tool maps cleanly to a JSON schema.
Hurts (or at best, neutral) when:
- You are running a one-off creative prompt. The friction of writing JSON for a single image rarely earns its keep. Prose is faster and reads more naturally to the model.
- You believe JSON gives stricter control. Chase Jarvis's controlled comparison extracted an image's description as JSON, translated it to natural language, and ran both versions through Nano Banana. He found "these images all look essentially the same, with the expected random variation." His take: JSON is "a placebo" that adds non-visual tokens (brackets, quotes, commas) and competes for the model's attention budget.
- You over-specify. Twenty nested fields where five would do dilute the prompt. Models have finite context budgets; spending them on `"meta.guidance_scale": 7.5` (which Nano Banana does not expose anyway) is wasted ink.
- You rely on a `negative` key. As above, there is no documented negative-prompt API for Nano Banana. Your `"negative": ["blurry", "extra fingers"]` becomes literal text in the prompt, and may bias the output toward exactly what you wanted to avoid.
The mental model worth holding: JSON is a human-side organizational tool, not a strict API. The model parses natural language. JSON helps you write natural language more methodically.
Tooling: how to use JSON prompts in production
The pattern in most community implementations:
- Author the JSON in code. Keep prompt templates as `.json` files or typed TypeScript objects.
- Serialize before sending. `JSON.stringify(promptObject, null, 2)`. The indentation is cheap tokens and helps if you ever paste into Google AI Studio for debugging.
- Optional: prepend a natural-language summary. A `user_intent` or `summary` field written as a plain sentence ("An editorial portrait of a woman on a Tokyo street at dusk") gives the model the gist immediately, with the JSON fields as elaboration.
- Send as the `contents` field. The Gemini API expects text in `contents`. There is no "JSON mode" for image prompts; your serialized JSON is just a string.
- Log the JSON and the output. Because the JSON is structured, you can index it later and find every prompt with `"camera.lens": "85mm"`.
Third-party tools like JSON Prompt Generator let non-developers author JSON visually, then copy the serialized output into Google AI Studio. They are front-ends for the same convention, not official Google products. For team workflows, build a small template layer: a JSON Schema for your team's prompt shape, validation that fails the build on missing fields, and a serializer that posts to the API.
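The template layer can be tiny. A sketch of the build-time validation step, with the required-keys list chosen arbitrarily for illustration (any real team would pick its own):

```typescript
// Fail fast when a prompt template is missing fields the team agreed on.
// The key list is a team convention, not a Google requirement.
const REQUIRED_KEYS = ["subject", "setting", "lighting", "composition"] as const;

function missingKeys(prompt: Record<string, unknown>): string[] {
  return REQUIRED_KEYS.filter((k) => !(k in prompt) || prompt[k] === "");
}

// A draft template that forgot to specify lighting.
const draft = { subject: "ceramic mug", setting: "white backdrop", composition: "top-down" };
const missing = missingKeys(draft);
// A CI check would reject this template until `lighting` is filled in.
```

Wiring `missingKeys` into CI means a malformed template fails the build instead of silently producing a vaguer image.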
Practical advice if you are starting today
- Don't switch one-off prompts to JSON. For ad-hoc creative work in the Gemini app, prose matches what Google documents and reads more naturally.
- Reach for JSON when templating. Anything you would put in a spreadsheet, put in JSON.
- Treat community schemas as starting points, not specs. Pick four or five keys you actually need (subject, setting, lighting, camera, composition is a common minimum) and ignore the rest.
- Test both formats on your own brief. Chase Jarvis's result is one test. Run your own A/B and default to whichever is easier to maintain.
- Don't oversell JSON to your team. "92% accuracy" sets expectations the model will not meet. The honest pitch is "easier to template, version, and batch."
The original Nano Banana (Gemini 2.5 Flash Image) and the current Nano Banana Pro both behave the same way: they parse natural language. JSON is a wrapper for your benefit, not theirs.
Frequently asked questions
Does Google officially support JSON prompts for Nano Banana Pro?
No. The official Nano Banana prompt guide and the Google AI for Developers image-generation docs recommend descriptive natural-language paragraphs covering style, subject, setting, action, and composition. JSON prompts are a community convention layered on top.
Will a JSON prompt produce a better image than the same prompt in prose?
The evidence is mixed. Chase Jarvis's controlled test found JSON and natural-language outputs essentially indistinguishable. Marketing-focused community write-ups claim accuracy gains in the 60-92% range, but those numbers are not from controlled studies and the methodologies are usually not published. Your best bet: A/B test both formats on your own brief.
What about responseSchema, isn't that JSON for Gemini?
That is a different feature. Structured output via responseSchema constrains the model's response to a JSON shape, useful for extracting structured data from images, generating captions, or piping outputs into code. It does not apply to image generation, where the output is a PNG, not a JSON object.
Does Nano Banana support negative prompts via a negative JSON key?
No documented support. Google's image-generation docs do not list a negative-prompt mechanism. If you put "negative": ["blurry"] in your JSON, the model reads it as descriptive text, which sometimes biases the output toward the words you wanted to avoid. Phrase what you want positively instead ("sharp, in-focus subject").
Are there official JSON schemas I should follow?
No official schemas. Community proposals like alexewerlof's gist and the pauhu/gemini-image-prompting-handbook repository are the most-cited starting points, but none are endorsed by Google. Pick the keys that map to your team's workflow and keep the schema as small as possible.
Where can I try JSON prompts on Nano Banana Pro?
The fastest way is Google AI Studio. Paste your serialized JSON into the prompt box and run. For production work, the Gemini API accepts the same string in its contents field. The studio at gptimg.co wraps the model in a browser UI with a free trial quota, useful for comparing prose vs JSON side by side without writing API code.
Sources
- Nano Banana prompt guide (Google DeepMind, official prompting guidance)
- Nano Banana image generation (Google AI for Developers)
- Gemini 3 Pro Image Preview model page (Google AI for Developers)
- How to prompt Gemini 2.5 Flash Image (Google Developers Blog)
- Structured output (`responseSchema`) (Google AI for Developers)
- Does JSON Prompting Actually Work? Tested with Nano Banana (Chase Jarvis)
- Nano Banana structured JSON prompt schema (alexewerlof, community proposal)
- Nano Banana Pro JSON Prompting Guide (Atlabs AI, community guide)
- Nano Banana JSON Prompt Format (aiformarketings, community guide)
- gemini-image-prompting-handbook (pauhu, open-source community schema)
Last reviewed against source pages: 2026-04-18. Google's documentation may add or change recommendations; confirm in the linked sources before standardizing on any pattern.