GPT Image 2: OpenAI's first image model that reasons before it renders
GPT Image 2 (gpt-image-2) launched April 2026 as OpenAI's most capable image generation model. It adds native reasoning, near-perfect text rendering, 4K output, multi-image batching, and token-based pricing. Here is what actually changed and how to decide when to use it.
OpenAI launched ChatGPT Images 2.0 on April 21, 2026, and GPT Image 2 is the API-side model behind that same release. That date matters because image tools have been moving fast enough that advice from even a few months ago can point you at the wrong endpoint, the wrong price, or a model that is already on its way out.
The short version: GPT Image 2 is the OpenAI image model to look at first when your output has to survive contact with real copy, real layout constraints, or real editing feedback. It goes beyond prettier image generation and tackles the parts of the work that used to waste the most time: misspelled poster text, broken UI labels, inconsistent reference objects, awkward edits, and resolution ceilings that forced a second pass elsewhere.
I would still treat it like a production tool, not magic. It has sharper control, but it still benefits from clear art direction, reference images when identity matters, and a quick review pass before anything ships.
What changed with the April release
The consumer name is ChatGPT Images 2.0. OpenAI's ChatGPT release notes say it is available on all ChatGPT plans, while images with thinking are available on paid plans when you select Thinking or Pro models. In plain English, everyone gets the new image experience, but the slower planning mode is gated behind paid ChatGPT access.
On the developer side, GPT Image 2 is listed as OpenAI's current high-end image generation and editing model. In the API, the name you pass is gpt-image-2. OpenAI also exposes a versionless alias that automatically tracks future upgrades, which is useful for product surfaces where you want improvements without changing application code every time OpenAI refreshes the underlying image model.
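To make the naming concrete, here is a minimal generation sketch. It assumes gpt-image-2 keeps the Images API shape of earlier GPT Image models; the prompt and filename are made up for illustration, and the alias comment refers to the versionless tracking name described above, not a literal identifier.

```python
# A minimal sketch, assuming gpt-image-2 keeps the Images API shape of
# earlier GPT Image models. Requires the openai Python package and an API key.
import base64

from openai import OpenAI

client = OpenAI()

result = client.images.generate(
    model="gpt-image-2",  # or the versionless alias if you want auto-upgrades
    prompt="A concert poster with the headline 'MIDNIGHT ORCHARD' in bold serif type",
    size="1024x1536",
    quality="high",
)

# GPT Image models return base64-encoded image data rather than URLs
with open("poster.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```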
The most visible improvement is text. Older image models could create beautiful posters and then ruin them with one bad word. GPT Image 2 is much better at typography, labels, multilingual text, diagrams, and interface-like layouts. OpenAI's own launch examples lean hard into that: posters, handwritten pages, educational diagrams, comic pages, multilingual typography, travel brochures, and desktop-style scenes with small interface details.
That choice of examples is telling. The pitch is not "more cinematic fantasy art." It is "usable visual work with less cleanup."
The practical difference
The places where GPT Image 2 changes day-to-day work are fairly concrete.
If you make social ads, you can put the actual headline into the prompt and expect fewer typo-driven reruns. If you make product mockups, packaging comps, UI concepts, slide graphics, thumbnails, or marketplace images, the model is less likely to treat text as decorative texture. If you use image editing, the API supports edits from source images, reference images, and masks, so you can ask for a localized change without starting from a blank canvas.
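For the editing case, here is a sketch of a localized change. It assumes the edits endpoint works the way it did for earlier GPT Image models: a source image, an optional mask whose transparent pixels mark the region to change, and a prompt. The filenames and prompt text are placeholders.

```python
# A sketch of a masked, localized edit; parameter shape is assumed from
# earlier GPT Image models, so verify against the current API reference.
from openai import OpenAI

client = OpenAI()

result = client.images.edit(
    model="gpt-image-2",
    image=open("product-shot.png", "rb"),  # source image to preserve
    mask=open("label-mask.png", "rb"),     # transparent where the edit should land
    prompt="Replace the label text with 'COLD BREW' in the same typeface and color",
)
```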
The model also supports flexible sizes. OpenAI's image guide says GPT Image 2 accepts sizes that stay within a few boundaries: the longest edge can be up to 3840 pixels, both dimensions must be multiples of 16, the long side cannot be more than three times the short side, and the total pixel count must sit between 655,360 and 8,294,400 pixels. Common choices like 1024 by 1024, 1536 by 1024, 1024 by 1536, and 3840 by 2160 all fit within those constraints.
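Those four rules are easy to encode, which is worth doing if users pick dimensions freely: rejecting a bad size locally is cheaper than a failed API call. The numeric limits below come straight from the guide quoted above; the helper function itself is my own.

```python
# Validate a requested size against the documented GPT Image 2 constraints.
def is_valid_gpt_image_2_size(width: int, height: int) -> bool:
    long_side, short_side = max(width, height), min(width, height)
    return (
        long_side <= 3840                             # longest edge cap
        and width % 16 == 0 and height % 16 == 0      # multiples of 16
        and long_side <= 3 * short_side               # aspect ratio at most 3:1
        and 655_360 <= width * height <= 8_294_400    # total pixel bounds
    )

assert is_valid_gpt_image_2_size(1024, 1024)
assert is_valid_gpt_image_2_size(3840, 2160)
assert not is_valid_gpt_image_2_size(4096, 1024)  # long edge too large
```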
Quality is also straightforward. You can ask for low, medium, high, or let the system choose automatically. Low is for drafts and fast iteration. Medium is the sensible default for most publishable web images. High is where I would start for product comps, ads, editorial assets, and anything with small text.
Output format still matters. PNG is the default, but JPEG and WebP are available, with configurable compression. OpenAI says JPEG is faster than PNG, which lines up with the usual production tradeoff: use JPEG or WebP when latency and file size matter, and use PNG when you need a clean master.
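In code, format and quality are just two more knobs on the same call. The output_format and output_compression parameters existed on earlier GPT Image models; I am assuming they carry over here, so check the current reference before relying on them.

```python
# A sketch of the quality and format knobs, assuming parameter names
# carry over from earlier GPT Image models.
from openai import OpenAI

client = OpenAI()

result = client.images.generate(
    model="gpt-image-2",
    prompt="A clean hero image of a ceramic mug on a wooden desk",
    quality="medium",        # low | medium | high | auto
    output_format="webp",    # png is the default; jpeg and webp are faster
    output_compression=80,   # 0-100, applies to jpeg and webp only
)
```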
The part that did not improve
Transparent backgrounds are the easy trap. GPT Image 2 does not currently support them. OpenAI's guide says requests for a transparent background are not supported for this model.
That is not a minor detail if you are making product cutouts, stickers, icons, compositing layers, or ecommerce assets that need alpha. In those cases, you either choose another image model that supports transparency or generate the image first and remove the background afterward. GPT Image 2 may still be the right model for the main render, but it is not the whole workflow for alpha-ready assets.
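One possible shape for that two-step workflow: render with GPT Image 2, then strip the background locally. The sketch below uses rembg, a third-party background-removal tool that is not part of OpenAI's API; cutout quality varies, so review each result.

```python
# Workaround sketch for alpha-ready assets: generate first, cut out after.
# rembg is a third-party library (pip install rembg), not an OpenAI feature.
from PIL import Image
from rembg import remove

opaque = Image.open("product-render.png")  # GPT Image 2 output, no alpha
cutout = remove(opaque)                    # returns an RGBA image
cutout.save("product-cutout.png")
```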
The other limitation is consistency. OpenAI says GPT image models can still struggle with recurring characters, brand elements, exact composition placement, and precise text placement in some cases. I would not promise a client that a generated mascot will remain identical across twenty images unless you are using strong references, narrowing the prompt, and reviewing each result.
Better is not the same as deterministic.
Pricing in plain terms
GPT Image 2 uses token-based pricing. That sounds abstract, but the billing pieces are simple enough once you split them up.
For standard API usage, OpenAI lists image input at $8 per million tokens, cached image input at $2 per million tokens, and image output at $30 per million tokens. Text input is $5 per million tokens, with cached text input at $1.25 per million tokens. Batch usage cuts every one of those rates in half: $4 for image input, $1 for cached image input, $15 for image output, $2.50 for text input, and $0.625 for cached text input.
For people planning budgets, the per-image estimates are more useful. OpenAI's image guide lists a 1024 by 1024 GPT Image 2 image at about $0.006 on low quality, $0.053 on medium quality, and $0.211 on high quality before any meaningful prompt or reference-image input cost. At 1024 by 1536 or 1536 by 1024, the same guide lists low at about $0.005, medium at about $0.041, and high at about $0.165.
For larger OpenAI-style 4K generation, expect the price to move up. A high-quality 3840 by 2160 render works out to roughly $0.41 per image. Treat that as a planning number, not a universal bill, because edits with many references can add input-token cost. OpenAI also notes that GPT Image 2 processes image inputs at high fidelity automatically, so reference-heavy editing workflows can cost more than a simple text prompt.
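If you want to sanity-check those per-image numbers, the arithmetic is just tokens times rate. In the sketch below, the rates are the ones quoted above, but the 7,000-token output count is an illustrative assumption, chosen because it happens to reproduce the roughly $0.21 high-quality square estimate; it is not a documented figure.

```python
# Back-of-envelope cost math from per-million-token rates. Real bills depend
# on actual token counts, which vary by size, quality, and reference inputs.
RATES = {  # dollars per million tokens
    "standard": {"text_in": 5.00, "image_in": 8.00, "image_out": 30.00},
    "batch":    {"text_in": 2.50, "image_in": 4.00, "image_out": 15.00},
}

def estimate_cost(text_in: int, image_in: int, image_out: int,
                  tier: str = "standard") -> float:
    r = RATES[tier]
    return (text_in * r["text_in"]
            + image_in * r["image_in"]
            + image_out * r["image_out"]) / 1_000_000

# An assumed prompt-only generation: 200 text tokens in, ~7,000 image tokens out
print(f"${estimate_cost(200, 0, 7_000):.3f}")           # standard rate
print(f"${estimate_cost(200, 0, 7_000, 'batch'):.3f}")  # half price via batch
```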
The pattern is clear enough: use low quality to explore, medium for most web assets, high for final text-heavy or detail-heavy work, and batch when you can wait.
DALL-E 3 is now legacy
OpenAI's deprecations page says DALL-E 2 and DALL-E 3 were marked for removal from the API on May 12, 2026. The official replacement listed there is GPT Image 1 or GPT Image 1 Mini, because that deprecation notice predates the GPT Image 2 launch.
That timeline should still change how you think about new work. If you are starting a fresh image workflow in 2026, DALL-E 3 should not be the default. It may still matter for old integrations, old prompts, or cost comparisons, but it is no longer the place to build forward.
The migration question is not simply "which model makes nicer pictures?" GPT Image 2 also changes the editing surface, size range, pricing model, and text reliability. A DALL-E 3 prompt that was tuned around avoiding text may become simpler. A pipeline that assumed a fixed per-image price needs a real estimate. A tool that generated one image at a time may now benefit from a different batching or editing pattern.
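A before-and-after sketch makes the surface-level differences concrete. The DALL-E 3 side uses its documented parameters; the GPT Image 2 side assumes it keeps the shape of earlier GPT Image models, which changed the quality enum and returned base64 data rather than URLs.

```python
# Migration sketch: the same logical call, moved from DALL-E 3 to GPT Image 2.
from openai import OpenAI

client = OpenAI()

# Before: DALL-E 3 (quality is standard/hd; prompts were often tuned to avoid text)
old = client.images.generate(
    model="dall-e-3",
    prompt="A travel poster for Lisbon, no text",
    size="1024x1024",
    quality="hd",
)

# After: GPT Image 2 (quality is low/medium/high/auto, output is base64, and
# the headline can go straight into the prompt)
new = client.images.generate(
    model="gpt-image-2",
    prompt="A travel poster for Lisbon with the headline 'LISBOA' in art deco type",
    size="1024x1536",
    quality="high",
)
```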
API choice: simple calls or conversational editing
OpenAI now gives you two practical paths for image generation.
The Image API is the direct route. Use it when you want one prompt to produce or edit one set of images. It covers generation and edits, and it is the easier mental model for most product integrations.
The Responses API is the better fit when the image sits inside a broader conversation. It supports multi-turn editing, so the user can generate an image, ask for a change, preserve context, and keep moving without rebuilding the whole request by hand. That is useful for design tools, creative assistants, and any workflow where the first result is treated as a draft rather than the final asset.
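Here is what that multi-turn loop can look like, assuming the image_generation tool pattern and previous_response_id chaining from earlier GPT Image models carry over. The mainline model name is a placeholder; check the current docs for which models can drive the image tool.

```python
# Multi-turn editing sketch via the Responses API; tool pattern assumed
# from earlier GPT Image models, model name is a placeholder.
from openai import OpenAI

client = OpenAI()

first = client.responses.create(
    model="gpt-5",  # placeholder for whichever reasoning model hosts the tool
    input="Generate a minimalist logo for a coffee shop called 'Driftwood'.",
    tools=[{"type": "image_generation"}],
)

# Follow-up edit that keeps the conversation's context instead of
# rebuilding the whole request by hand
second = client.responses.create(
    model="gpt-5",
    previous_response_id=first.id,
    input="Keep the mark, but make the wordmark a warm brown.",
    tools=[{"type": "image_generation"}],
)
```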
There is no moral win in choosing the more elaborate API. If your app has a simple generate button, use the simpler path. If the user is talking through edits, use the conversational path.
Where I would use GPT Image 2 first
I would start with GPT Image 2 for ads, posters, thumbnails with short headlines, multilingual graphics, infographics, UI mockups, book or album concepts, packaging studies, and product scenes where reference-image fidelity matters. It is also a strong default for image-to-image work where the prompt has to preserve the main subject but change a specific region or styling direction.
I would pause before using it for transparent-background assets, extremely strict brand reproduction, or long series work where every character detail has to match perfectly. It can help there, but the review burden is still real.
For quick comparisons, the most annoying part is usually not the prompt. It is switching between products, billing systems, output rules, and download flows just to see which model handles your image best. Z.Tools' AI Image Generator keeps that testing loop in one place, so you can compare GPT Image 2 with other image models before committing a workflow around one provider.
