
how to get the most out of the best AI video model EVER

Chase argues that Cling 3.0 represents a genuine leap forward in AI video generation — specifically because it handles multi-shot video, text rendering, and emotional expression better than anything else on the market. The model excels at cutting between multiple shots within a single generation, giving creators unprecedented control over how scenes unfold.

What's Actually Different About Cling 3.0

The core innovation here is what Chase calls "multi-shots." Rather than generating one continuous shot, users can now program up to three distinct cuts into a single generation. Each shot's duration can be adjusted independently by dragging, with a maximum of 15 seconds per shot.

This matters because previous models forced creators to generate clips separately and stitch them together afterward. Multi-shots eliminate that extra step entirely.
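The constraints described above can be sketched as a small data model. This is purely illustrative: the `Shot` class, the limits, and the validator are hypothetical, based only on the constraints stated in this article (up to three cuts per generation, at most 15 seconds per shot), and are not part of any actual Cling 3.0 or Higgsfield API.

```python
from dataclasses import dataclass

# Limits taken from the description above; names are illustrative.
MAX_SHOTS = 3          # up to three distinct cuts per generation
MAX_SHOT_SECONDS = 15  # stated per-shot duration cap

@dataclass
class Shot:
    description: str  # what happens in this cut
    seconds: float    # duration, set by dragging in the UI

def validate_shots(shots: list[Shot]) -> None:
    """Reject shot lists that exceed the stated generation limits."""
    if not 1 <= len(shots) <= MAX_SHOTS:
        raise ValueError(f"expected 1-{MAX_SHOTS} shots, got {len(shots)}")
    for i, shot in enumerate(shots, start=1):
        if not 0 < shot.seconds <= MAX_SHOT_SECONDS:
            raise ValueError(f"shot {i}: duration must be in (0, {MAX_SHOT_SECONDS}] seconds")

# Example: a three-cut sequence that stays within the limits.
validate_shots([
    Shot("wide establishing shot of the alley", 5),
    Shot("cut to close-up on her face", 4),
    Shot("cut to over-the-shoulder as she turns away", 6),
])
```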

The second major addition is something called Elements. Think of it as reference images, but for video generation. Users can upload 360-degree views of characters — side angles, front-facing poses, back views — giving the AI a complete picture of how subjects appear from different perspectives. This dramatically improves consistency across multi-shot sequences.

Chase demonstrates by creating an Element: a woman with brown hair described in simple terms. Once uploaded, this Element can be referenced in prompts using "@" syntax or through the Elements menu.

The Six Things Every Prompt Needs

The real value Chase provides is a prompting framework he developed for Cling 3.0 users. The model responds best when prompts include exactly six components: camera, scene, subject, action, audio, and style.

This matters because AI video generation defaults to average quality when given vague instructions. Using precise terminology — like "low angle tracking shot using a 24mm anamorphic lens with slow dolly push-in" — produces dramatically better results than casual, conversational descriptions.

The vocabulary matters enormously. These are the terms the model saw during training, and they function like a film director's nomenclature on set.
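As a concrete illustration, the six-component framework can be treated as a simple template. The component names come from Chase's framework, but the `build_prompt` helper and the example wording below are assumptions for illustration only, not part of Cling 3.0 or Higgsfield.

```python
# The six components Chase's framework requires; the order is illustrative.
COMPONENTS = ("camera", "scene", "subject", "action", "audio", "style")

def build_prompt(**parts: str) -> str:
    """Assemble a prompt string, refusing to proceed if any component is missing."""
    missing = [name for name in COMPONENTS if not parts.get(name)]
    if missing:
        raise ValueError(f"missing prompt components: {missing}")
    return " ".join(f"{name.capitalize()}: {parts[name]}." for name in COMPONENTS)

# Example prompt using the precise, cinematic vocabulary the article recommends.
prompt = build_prompt(
    camera="low angle tracking shot, 24mm anamorphic lens, slow dolly push-in",
    scene="rain-slicked neon alley at night",
    subject="a woman with brown hair in a long coat",
    action="she turns toward the camera and exhales slowly",
    audio="distant traffic, soft rain, a low synth drone",
    style="moody cinematic look, shallow depth of field",
)
print(prompt)
```

The point of the hard failure on missing components is the same as Chase's warning: leave a component out and the model fills the gap with the average of its training data.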

Chase recommends that users sign up for Shotdeck.com, a free database of cinematic scenes from major films. Users can search specific movies — he demonstrates with Dune 2 — and extract technical details: shot type, lens size, composition, lighting, camera movement, and film stock. These technical specifics can then be fed directly into Cling 3.0 prompts.
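The ShotDeck workflow can be sketched the same way: transcribe a reference shot's technical details into a small record, then join them into the camera portion of a prompt. The field names and values below are illustrative placeholders based on the categories Chase lists (shot type, lens, composition, lighting, camera movement, film stock) — they are not ShotDeck's actual schema.

```python
# Hypothetical record of details transcribed from a ShotDeck entry.
shot_reference = {
    "shot_type": "extreme wide shot",
    "lens": "50mm spherical lens",
    "movement": "static camera on tripod",
    "composition": "subject centered with vast negative space",
    "lighting": "hard top light through haze",
    "film_stock": "digital capture with a desaturated warm grade",
}

def camera_clause(ref: dict[str, str]) -> str:
    """Join the transcribed details into one camera clause for a prompt."""
    order = ("shot_type", "lens", "movement", "composition", "lighting", "film_stock")
    return ", ".join(ref[key] for key in order if key in ref)

print(camera_clause(shot_reference))
```

The resulting clause can then be dropped into the "camera" slot of a six-component prompt.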

Where the Model Still Struggles

Two significant limitations deserve attention when using Cling 3.0.

First, Elements technology is still maturing. Overloading prompts with too many Elements alongside multiple shot changes sometimes causes the model to ignore hard cut instructions — collapsing separate shots into one long clip and producing unexpected audio artifacts.

Second, generation speed remains slower than competing models like Veo 3.1 Fast. Creators producing longer videos requiring iterative refinement should factor this limitation into their workflows.

As Chase puts it: "If we don't explicitly tell it these things, then it's just going to default to the mean, which is going to give you a mediocre output."

Chase notes that the model's true strength emerges when given minimal constraints — allowing natural generation without starting images or heavy Element references. The resulting videos demonstrate remarkable emotional depth and facial expression quality that competitors haven't matched.

Bottom Line

Chase's core argument holds: Cling 3.0 genuinely represents the current peak of AI video generation, particularly for creators who want cinematic control through multi-shot sequences and precise prompting. His six-element framework provides a practical methodology for achieving those results — and Shotdeck offers a legitimate learning resource for building that vocabulary.

The vulnerability is practical rather than theoretical: the model remains expensive to run at scale, slower than alternatives, and Element-based workflows still require experimentation. Users should start small with prompts before adding complexity.

Cling 3.0 is here, and it's good. No, I mean really good. [music] And it doesn't just excel in action. It's a huge leap forward for multi-shots, text, and emotion.

[laughter] And when you put that all together, we have a model that is a step above everything else on the market. So, in today's video, I'm going to give you the best practices for getting the most out of Cling 3.0 so you can start putting this beast to work. So, for today's video, I'm going to be demoing all this inside of Higgsfield. Higgsfield is just a one-stop shop for a bunch of different AI content creation tools, whether that's image generators like Nano Banana Pro or things like Cling 3.0.

I'll put a link down below for Higgsfield and also have a discount code for you guys. So, Cling 3.0, like I mentioned in the intro, this truly is a step above everything else we have seen so far. So, what we're going to do now is I'm going to break down some of the key features very quickly, and then we're going to dive into sort of best practices and things like prompt techniques so you can start using this yourself and actually get some really high quality outputs. So, let's zoom in on the left-hand side and take a look at these prompts, because the effectiveness of Cling 3.0 when it comes to multi-shots is where it really shines.

I think multi-shots really came onto the scene with Sora 2 originally. When I talk about multi-shots, I mean it's not just one continuous shot. We can kind of go to different cuts all within the same generation. And Cling 3 really breaks this down well, because we have the ability to set multi-shots and then specifically break it out shot by shot by shot, right?

Right. We also have the ability to change the duration of each of those shots. So I can change the duration here just by dragging it. And we get a max of 15 seconds.

On top of that, we have the ability to still control the scenes using keyframes. So I can do a first frame as well as an end frame, which again gives us a ton of control. But the other big thing they've added here is Elements. Now think of ...