LTX Video 2.3: the open-source video model finally fast enough for production iteration

Lightricks LTX Video 2.3 is open, fast, and practical enough to build around. Here is what changed, how it compares with closed video APIs, and when to use it.


LTX-2.3 is the Lightricks video model release that makes the LTX story much easier to explain: fast enough to test, open enough to build on, and practical enough for people who need more than a one-off demo clip.

The earlier LTX-2 launch already carried the big headline. In the October 2025 announcement, Lightricks described LTX-2 as an open-source foundation model for synchronized video and audio, with native 4K output, up to 50 frames per second, and clips up to 10 seconds. That was an ambitious pitch, because video models usually force a choice between quality, control, and cost.

LTX-2.3 is the more interesting release for day-to-day work. Lightricks frames it as a sharper, cleaner, more controllable version of the LTX-2 family, with native portrait video, better image-to-video behavior, cleaner audio, and stronger prompt following. That sounds like a routine version bump until you look at what actually annoys people when they generate video: frozen subjects, prompts that lose timing, faces that soften mid-shot, vertical clips that look like cropped landscape, and audio that feels glued on after the fact.

what changed in LTX-2.3

The most useful improvement is not a single spec. It is that LTX-2.3 cleans up several weak spots at once.

Lightricks says the model uses a rebuilt latent space and an updated VAE trained on better data. In plain terms, it should hold fine details more reliably: hair, fabric texture, edge detail, small text, product surfaces, and faces that sit somewhere between close-up and background. Those are the details that make AI video feel either usable or disposable.

Prompt adherence also moved forward. Lightricks describes a much larger text connector, and the company specifically calls out complex prompts with multiple subjects, spatial relationships, style notes, timing, motion, and expression. I still would not treat any video model as a precise director, but LTX-2.3 gives you more reason to write like a director: say when the camera moves, what the subject does, where the scene changes, and what should stay visually consistent.
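To make "write like a director" concrete, here is the kind of prompt that plays to those strengths. The scene itself is invented for illustration; the structure is the point: camera move, subject action, timing beats, and the elements that must stay consistent are all stated explicitly.

```python
# A director-style prompt for a text-to-video request. The scene is made up;
# what matters is that camera, action, timing, and continuity are explicit.
prompt = (
    "A barista in a sunlit cafe steams milk at the counter. "
    "Camera: slow dolly-in from waist height, ending in a medium close-up. "
    "At 3 seconds she looks up and smiles; at 6 seconds she slides the cup "
    "across the counter. Keep her green apron and the neon 'OPEN' sign in "
    "frame and visually consistent throughout. Warm tungsten light, shallow "
    "depth of field."
)
```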

The image-to-video upgrade matters just as much. Lightricks says LTX-2.3 reduces frozen starts and the lazy slow-pan effect that can make animated stills feel like slideshow motion. That is the difference between "the camera drifted across my image" and "the subject actually moved."

Native portrait support is another practical fix. LTX-2.3 can generate vertical video up to 1080 by 1920, trained on portrait-orientation data rather than cropped from landscape. That matters for social clips, mobile product demos, creator ads, and anything where 9:16 is the real canvas.

fast versus pro is the workflow

The LTX-2.3 family is split into Fast and Pro variants, and the distinction is refreshingly honest. Fast is for trying ideas. Pro is for shots you actually care about.

Lightricks says Fast supports clips up to 20 seconds and is designed for iteration, batch generation, and short-form work. Pro is capped at 10 seconds and is aimed at higher fidelity, smoother motion, truer color, and final deliverables.

That is exactly how I would use it. Start with Fast when the prompt is still under negotiation. Test the subject, camera path, framing, first frame, and rough pacing. Then move the winning idea into Pro only after the clip has earned the extra spend.
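In code, that workflow is a cheap loop followed by one expensive call. Here is a minimal sketch; `generate()` is a hypothetical stand-in for whatever client or SDK you actually call, and only the Fast/Pro split and the 20-second/10-second caps come from the release notes.

```python
# Sketch of the Fast-then-Pro loop described above. generate() is a
# hypothetical placeholder, not a real LTX SDK call.

def generate(model: str, prompt: str, seconds: int) -> str:
    """Placeholder: submit a job, wait for it, return a video path or URL."""
    raise NotImplementedError("wire this up to your LTX client of choice")

def iterate_then_finalize(prompt_variants: list[str], pick_winner) -> str:
    # 1. Explore cheaply: batch the candidate prompts through Fast.
    drafts = {p: generate("ltx-2.3-fast", p, seconds=8) for p in prompt_variants}
    # 2. Review the drafts; pick_winner is your eyeball (or scoring) step,
    #    mapping {prompt: draft} to the prompt worth real spend.
    winner = pick_winner(drafts)
    # 3. Spend once: re-render only the winner on Pro (capped at 10 seconds).
    return generate("ltx-2.3-pro", winner, seconds=10)
```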

This is also where LTX-2.3 feels different from models that only sell a polished endpoint. The open-source documentation describes LTX-2.3 as a DiT-based audio-video model with open weights, local execution, LoRA-based customization, camera-aware motion logic, and multimodal control through text, image, video, audio, and depth. You can use it through a hosted workflow, but the model is not trapped there.
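For the local route, earlier LTX-Video releases shipped a diffusers integration (`LTXPipeline`), and a minimal run looks like the sketch below. Whether the 2.3 weights land under the same interface is an assumption on my part; the repo id shown is the earlier release, used as a placeholder.

```python
# Local text-to-video via the diffusers LTXPipeline used by earlier
# LTX-Video releases. The 2.3 checkpoint is an assumption: swap in the
# real repo id from the Lightricks Hugging Face org once published.
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video",  # placeholder: earlier release, not 2.3
    torch_dtype=torch.bfloat16,
).to("cuda")

frames = pipe(
    prompt="A red kite climbing over a windy beach, handheld camera",
    width=704,                # width and height must be divisible by 32
    height=480,
    num_frames=121,           # (num_frames - 1) must be divisible by 8
    num_inference_steps=40,
).frames[0]

export_to_video(frames, "kite.mp4", fps=24)
```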

pricing is simple enough to reason about

The current LTX API pricing page bills text-to-video and image-to-video by generated seconds, with higher resolution and Pro generation costing more.

For LTX-2.3 Fast, 1080p generation runs around six cents per second, and each resolution step doubles the rate: roughly twelve cents at 1440p and twenty-four cents at 4K. LTX-2.3 Pro follows the same doubling from a higher base: around eight cents per second at 1080p, sixteen at 1440p, and thirty-two at 4K.

So a 10-second 1080p Pro clip lands around eighty cents before any platform markup or workflow-specific credit rules. A 20-second 1080p Fast clip lands around one dollar and twenty cents. The useful part is not that this is "cheap" in the abstract. It is that the price maps cleanly to a creative decision: longer and faster for exploration, shorter and more detailed for final shots.

Audio-to-video, retake, and extension features are priced separately in the official API docs, currently around ten cents per second at 1080p for the Pro routes. For most people working in the hosted interface, that detail matters less than the principle: audio-led or edit-specific operations have their own economics, so do not treat every second of video as the same kind of second.
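Because everything here is billed per generated second, the budgeting math is one multiplication. A small sanity-check calculator using the approximate rates quoted in this section; the numbers are the article's, so check the live pricing page before relying on them.

```python
# Approximate per-second USD rates as quoted above. Verify against the
# live LTX API pricing page before budgeting anything real; audio-to-video,
# retake, and extension routes bill separately and are not included here.
RATES = {
    ("fast", "1080p"): 0.06, ("fast", "1440p"): 0.12, ("fast", "4k"): 0.24,
    ("pro",  "1080p"): 0.08, ("pro",  "1440p"): 0.16, ("pro",  "4k"): 0.32,
}

def clip_cost(variant: str, resolution: str, seconds: float) -> float:
    """Cost of one text- or image-to-video generation, before any markup."""
    return RATES[(variant, resolution)] * seconds

assert round(clip_cost("pro", "1080p", 10), 2) == 0.80   # the 10s Pro example
assert round(clip_cost("fast", "1080p", 20), 2) == 1.20  # the 20s Fast example
```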
