Most AI video creators are essentially gambling with their content—rolling the dice on inconsistent results despite increasingly powerful tools. Chase H argues this isn't a technology problem; it's a system problem. Without a framework for guiding AI generations, even the best models produce unpredictable, often disappointing outputs. His solution: a five-step process that transforms AI creation from random experimentation into controlled, repeatable production.
The Framework Problem
The core issue is straightforward: AI video models have become remarkably capable, but most creators lack any structured approach to using them. Without an established system, every generation remains a lottery—random outcomes, inconsistent characters, unpredictable visual quality. The result? Wasted time and money on outputs that feel generic or broken.
Chase H's five-step framework addresses this directly: storyboarding, foundation image creation, key frame generation, video generation, and editing. Each stage builds toward predictable, high-quality output.
Step One: Storyboarding
The first step establishes everything that follows. This requires using a prompt template combined with an AI assistant like Claude or ChatGPT to generate five essential elements:
- The overarching concept or narrative
- Visual references and tonal influences (Chase H uses films like The Revenant as benchmarks)
- Setting details
- Character definitions
- A shot list broken into individual scenes
The process involves back-and-forth dialogue with AI, describing intent in plain language until the vision crystallizes. For visual reference guidance, he recommends ShotDeck—a free repository providing technical information about cinematic shots from different films: lens types, lighting approaches, camera specifications.
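The five storyboard elements above can be pictured as one structured artifact that the AI dialogue gradually fills in. The sketch below is a minimal illustration of that structure in Python; the field names and example values are assumptions for illustration, not Chase H's actual template.

```python
# Hypothetical sketch of the storyboard Step One might produce.
# Field names and sample values are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class Shot:
    scene: int
    description: str
    reference: str  # e.g. a ShotDeck still, noted for its lens and lighting


@dataclass
class Storyboard:
    concept: str          # overarching narrative
    influences: list      # visual/tonal benchmarks, e.g. The Revenant
    setting: str
    characters: list
    shots: list = field(default_factory=list)  # the shot list, scene by scene


board = Storyboard(
    concept="Lone trapper survives a winter wilderness",
    influences=["The Revenant"],
    setting="Snowbound forest, 1820s frontier",
    characters=["The Trapper"],
)
board.shots.append(
    Shot(1, "Trapper wakes at a dying campfire",
         "ShotDeck: campfire scene, low-key firelight")
)
print(len(board.shots))
```

In practice these fields are refined conversationally with Claude or ChatGPT rather than typed out as code; the point is that all five elements exist as concrete, reusable data before any image is generated.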
Step Two: Foundation Image Creation
This step is considered the most critical for consistent results. Using Midjourney (which he describes as a one-stop shop), creators generate their main characters with specific attention to:
- Shot type: medium shot balances detail without losing facial consistency
- Technical specifications drawn from cinematic sources (e.g., an Alexa 65 camera with Ultra Vista lenses)
- Reference imagery that will maintain character consistency across all future scenes
The process involves feeding AI the visual reference and explaining the character vision, then generating iterations until finding the preferred version. This foundation image becomes the reference point for every subsequent scene involving that character.
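Assembling those specifications into a prompt can be sketched as a simple template. This is an illustrative assumption about how the pieces combine, not Chase H's exact prompt wording or Midjourney syntax:

```python
# Illustrative sketch: building a foundation-image prompt from the
# storyboard's technical specs. The template wording is an assumption.
def foundation_prompt(character, shot_type, camera, lens, reference):
    return (f"{shot_type} of {character}, "
            f"shot on {camera} with {lens}, "
            f"in the style of {reference}")


prompt = foundation_prompt(
    character="a weathered 1820s fur trapper",
    shot_type="medium shot",  # balances detail and facial consistency
    camera="Alexa 65",
    lens="Ultra Vista anamorphic lenses",
    reference="The Revenant",
)
print(prompt)
```

The same template, reused with the same character and camera values, is what keeps the foundation image consistent across every scene that follows.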
Step Three: Key Frame Generation
This step combines the foundation image with the shot list to create a starting frame for each scene. The key insight: video generation tools work best when you provide an exact starting frame rather than just describing the action.
AI generates prompts for these key frames, sometimes requiring both start and end frames for additional control. Using ShotDeck references again—searching specific scenes like campfire lighting—provides the visual language needed to guide AI toward desired outputs.
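The bookkeeping here can be sketched as a small plan that records which frames each scene needs, with an end frame added only where extra control is wanted. The structure and names below are illustrative assumptions:

```python
# Sketch: map each scene to the key frames it needs before video
# generation. Structure and field names are illustrative assumptions.
def keyframe_plan(shot_list, needs_end_frame=()):
    """Return a per-scene plan of required key frames."""
    plan = {}
    for scene, description in shot_list:
        frames = ["start"]
        if scene in needs_end_frame:
            frames.append("end")  # start + end frames give tighter control
        plan[scene] = {"description": description, "frames": frames}
    return plan


shots = [(1, "Trapper wakes at campfire"),
         (2, "Trapper crosses frozen river")]
plan = keyframe_plan(shots, needs_end_frame={2})
print(plan[2]["frames"])
```

Each entry then gets its own ShotDeck-guided prompt (campfire lighting, river crossing, and so on) before moving to video generation.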
Step Four: Video Generation
With proper foundation work completed, video generation becomes one of the easier steps. The process uses models supporting first-frame-to-video functionality (like Kling 3.0), where creators simply bring their reference image and let AI generate the motion.
Prompting for video is actually simpler than prompting for images; the AI generates these prompts automatically from templates. The key is including all the reference images and descriptions built in the previous stages.
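Conceptually, a first-frame-to-video request bundles the key frame, the motion prompt, and the accumulated references. The payload below is a hypothetical sketch of that idea; it is NOT Kling's real API, and every field name is an assumption:

```python
# Hypothetical first-frame-to-video request. NOT a real vendor API;
# all field names are assumptions made for illustration.
def build_video_request(first_frame, motion_prompt, references):
    return {
        "mode": "first_frame_to_video",
        "first_frame": first_frame,       # key frame from Step Three
        "prompt": motion_prompt,          # auto-generated from templates
        "reference_images": references,   # foundation image, ShotDeck stills
    }


req = build_video_request(
    first_frame="scene1_keyframe.png",
    motion_prompt="slow push-in as the trapper stirs the embers",
    references=["trapper_foundation.png"],
)
print(req["mode"])
```

The design point is that by this stage nothing new is being invented: the model only animates assets and descriptions the earlier steps already locked down.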
Critics might note that relying heavily on visual references risks mimicking rather than creating original content—a valid concern for artists seeking distinctive voices. Additionally, the framework assumes access to premium AI tools and platforms that may not be available to all creators.
"Until you're able to get some level of control and consistency with your AI generations, you're just going to be wasting your time and your money."
Bottom Line
Chase H's framework is genuinely useful for creators frustrated by inconsistent AI outputs. The five-step process provides the structure most users lack. His strongest argument—that systems beat randomness—applies broadly beyond just AI video. The vulnerability: relying on cinematic references may produce derivative work, and the tools referenced require subscriptions that not all creators can afford. For those willing to invest in the system, the results should be significantly more predictable than random generation.