Mind-of-Director: Multi-modal Agent-Driven Film Previsualization via Collaborative Decision-Making

Shufeng Nan, Mengtian Li*, Sixiao Zheng, Yuwei Lu, Han Zhang, Yanwei Fu*
Traditional Previz vs. Mind-of-Director Framework. Traditional previz requires iterative collaboration across multiple departments (N denotes the number of iterations, typically N ≫ 1) and involves script writing, 2D storyboarding, 3D scene construction, character blocking, animatic production, and camera planning. In contrast, Mind-of-Director automates this pipeline in a single pass (N = 1): multi-modal agents make decisions collaboratively in real time to generate high-quality, semantically aligned, and visually coherent previz sequences directly from an idea. This lets a single creator prototype cinematic scenes inside a game engine with minimal manual effort.

Abstract

We present Mind-of-Director, a multi-modal, agent-driven framework for film previsualization (previz) that models the collaborative decision-making process of a film production team. Given a high-level creative idea, Mind-of-Director orchestrates multiple specialized agents to collectively produce coherent film previz sequences within a game engine. The framework consists of four cooperative modules: Script Development, where agents draft and refine the screenplay through iterative dialogue; Virtual Scene Design, which transforms text into semantically aligned 3D environments; Character Behaviour Control, which determines character blocking and motion based on narrative intent; and Camera Planning, which optimizes framing, movement, and composition for cinematic camera effects. A real-time visual editing system built in the game engine further enables interactive inspection and synchronized timeline adjustment across scenes, behaviours, and cameras. Extensive experiments and human evaluations show that Mind-of-Director generates high-quality, semantically grounded film previz sequences in approximately 25 minutes per idea, demonstrating the effectiveness of agent collaboration for both automated prototyping and human-in-the-loop filmmaking.

Video Demonstrations

Method Pipeline

Overview of the Mind-of-Director Framework. Given a high-level idea, our multi-modal agent-driven framework simulates a collaborative decision-making workflow through four modules: (1) Script Development refines the screenplay via a Discuss–Revise–Judge process; (2) Virtual Scene Design builds consistent 3D environments using 2D-guided and rule-based generation; (3) Character Behaviour Control optimizes character blocking and motion through agent feedback; (4) Camera Planning selects and validates cinematic shots via a Debate–Judge–Validation loop. All modules are integrated in Unity for real-time visualization and iterative refinement.
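The Discuss–Revise–Judge process named in the overview can be illustrated with a minimal sketch. This is not the authors' implementation: the function `discuss_revise_judge`, the `threshold` and `max_rounds` parameters, the rule of keeping a revision only when its score improves, and the toy critic, reviser, and judge are all assumptions made for illustration. In a real system each callable would presumably wrap a call to a language model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical agent interfaces (assumptions, not from the paper).
Critic = Callable[[str], str]               # draft -> critique ("" if none)
Reviser = Callable[[str, List[str]], str]   # draft, critiques -> new draft
Judge = Callable[[str], float]              # draft -> score in [0, 1]


@dataclass
class RefineResult:
    draft: str
    score: float
    rounds: int


def discuss_revise_judge(idea: str, critics: List[Critic], revise: Reviser,
                         judge: Judge, threshold: float = 0.8,
                         max_rounds: int = 3) -> RefineResult:
    """Refine a draft until the judge's score reaches the threshold
    or the round budget is exhausted."""
    draft, score, rounds = idea, judge(idea), 0
    while score < threshold and rounds < max_rounds:
        critiques = [critic(draft) for critic in critics]  # Discuss
        candidate = revise(draft, critiques)               # Revise
        candidate_score = judge(candidate)                 # Judge
        rounds += 1
        # Assumed acceptance rule: keep the revision only if it scores higher.
        if candidate_score > score:
            draft, score = candidate, candidate_score
    return RefineResult(draft, score, rounds)


# Toy stand-ins for the agents: each critic checks for one story element.
REQUIRED = ("setting", "conflict", "resolution")


def toy_critic(keyword: str) -> Critic:
    def critic(draft: str) -> str:
        return "" if keyword in draft else "add " + keyword
    return critic


def toy_revise(draft: str, critiques: List[str]) -> str:
    # Address only the first open critique per round.
    requests = [c[len("add "):] for c in critiques if c]
    if not requests:
        return draft
    return draft + " | " + requests[0]


def toy_judge(draft: str) -> float:
    return sum(k in draft for k in REQUIRED) / len(REQUIRED)


result = discuss_revise_judge(
    "A heist film",
    [toy_critic(k) for k in REQUIRED],
    toy_revise,
    toy_judge,
    threshold=1.0,
    max_rounds=5,
)
print(result)
```

With these toy agents the loop needs one round per missing story element. The same control structure could, under similar assumptions, be adapted to the Debate–Judge–Validation loop in Camera Planning by adding a validation check before a candidate is accepted.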

Discussion

These results highlight the value of collaborative decision-making among multi-modal agents. Rather than relying on isolated model outputs, our system mirrors the distributed cognition of a real film crew, where creativity emerges through negotiation, critique, and consensus. Beyond numerical improvements, this work reframes how AI participates in creative workflows: Mind-of-Director moves AI from a passive generator to an active collaborator capable of multi-role reasoning across narrative, spatial, and cinematic domains. This suggests that creative intelligence can arise from structured inter-agent dialogue, with broader implications for computational creativity. Moreover, the framework should generalize to adjacent domains such as interactive storytelling, game design, and AR/VR prototyping. By grounding AI-driven creativity in collaborative reasoning, the system bridges automation with artistic control and expands the design space for rapid content creation.

Nevertheless, the current system handles only moderately complex scenes with a limited number of characters, and its narratives remain primarily human-centric. Future work will explore richer environments, multi-character interactions, and adaptive learning with real-time engine feedback, as steps toward more autonomous and scalable creative systems.

Conclusion

We present Mind-of-Director, a multi-modal agent-driven framework that automates film previz through collaborative decision-making. Inspired by real-world film crews, the framework consists of four stages: Script Development, Virtual Scene Design, Character Behaviour Control, and Camera Planning. By integrating these stages, Mind-of-Director effectively transforms a high-level creative idea into an editable film previz sequence, offering valuable insights and references for the creative process.
