Moodio — All‑in‑One Video Agent
A web‑based agent that helps anyone brainstorm, generate and edit targeted AI video outputs — with a behavior‑driven data flywheel improving the model with every session. Work in progress — beta launched March 2026.
This product didn't go through traditional handoff. I designed every flow in Figma and vibe‑coded each screen shoulder‑to‑shoulder with engineering in Cursor + Claude — taking ideas from sketch to production in the same afternoon. The result: design and code stayed one conversation, not two artifacts.
Where professional craft compounds.
Turn every project into an AI asset. Accelerate professional film creation.
One agent that runs your studio's whole workflow. Built by a CMU foundation-model research team — working on expert data curation, agent evaluation, and data-flywheel post-training. Top-conference research adopted by 100+ AI labs including Google DeepMind, ByteDance, and xAI.
Three Core Capabilities
One agent for the whole studio — built on three complementary pillars.
- Industry-leading visual retrieval. Search across millions of frames with cinematic precision — outperforms both Google Gemini Embedding 2 and Alibaba Qwen-3-VL Embedding on Moodio's benchmarks.
- AI workflow spanning the full film pipeline. From brief to mood, shot list to cut — one chat surface that switches between retrieval, generation, and inline edit without leaving the canvas.
- Personalized agent shaped by your team. A data flywheel built from real user interactions — the agent learns each studio's taste, references, and conventions session by session.
Moodio is a research-driven company.
Our research has become a benchmark in video understanding and cinematic generation evaluation — published at CVPR, NeurIPS, and ECCV, adopted by frontier labs at Google DeepMind, ByteDance, xAI, Kling, Midjourney and more.
CameraBench
Understanding camera motion in any video. Adopted by Google DeepMind, xAI, Kling, and frontier video labs.
Read paper →
CHAI
Building precise video language with human-AI oversight. The framework behind Moodio's caption quality and retrieval precision.
Read paper →
VQAScore
State-of-the-art metric for evaluating text-to-visual generation. Google DeepMind named it the strongest replacement for CLIPScore.
Read paper →
GenAI-Bench
Benchmark for compositional text-to-visual generation. Uniquely adopted in the official Google Imagen-4 technical report.
Read paper →
Moodio Retrieval
The first to bring visual reference into the video generation workflow. On Moodio's benchmarks, its retrieval outperforms both Google Gemini Embedding 2 and Alibaba Qwen-3-VL Embedding.
Read paper →
Moodio Agent
First end-to-end video production agent with reward learning from real user interactions. The data flywheel behind studio-grade workflows.
Read paper →
Frontier labs and studios use our datasets and models.
Two numbers tell the story.
our research
In use at