DataPizza · B2C · AI Agent · 2025 — Now

Moodio — All‑in‑One Video Agent

A web‑based agent that helps anyone brainstorm, generate, and edit targeted AI video, with a behavior‑driven data flywheel that improves the model with every session. Work in progress; beta launched March 2026.

Role: Solo Designer + Vibe‑Code Co‑Builder
Timeline: Jan 2025 — Now
Team: 1 designer (me) · 6 research & engineering · 3 product & data · 5 filmmakers · 3 advisors
Tools: Figma · Cursor · Claude · Next.js · LLM APIs

Build Method · Vibe‑Coded with Engineering

This product didn't go through a traditional handoff. I designed every flow in Figma and vibe‑coded each screen shoulder‑to‑shoulder with engineering in Cursor + Claude, taking ideas from sketch to production in the same afternoon. The result: design and code stayed one conversation, not two artifacts.

Section 01

Where professional craft compounds.

Turn every project into an AI asset. Accelerate professional film creation.

One agent that runs your studio's whole workflow. Built by a CMU foundation-model research team working on expert data curation, agent evaluation, and data-flywheel post-training. The team's top-conference research has been adopted by 100+ AI labs, including Google DeepMind, ByteDance, and xAI.

Section 02

Three Core Capabilities

One agent for the whole studio — built on three complementary pillars.

Industry-leading visual retrieval. Search across millions of frames with cinematic precision — outperforms both Google Gemini Embedding 2 and Alibaba Qwen-3-VL Embedding on Moodio's benchmarks.
AI workflow spanning the full film pipeline. From brief to mood, shot list to cut — one chat surface that switches between retrieval, generation, and inline edit without leaving the canvas.
Personalized agent shaped by your team. A data flywheel built from real user interactions — the agent learns each studio's taste, references, and conventions session by session.

Section 03

Moodio is a research-driven company.

Our research has become a benchmark in video understanding and cinematic generation evaluation. It is published at CVPR, NeurIPS, and ECCV, and adopted by frontier labs including Google DeepMind, ByteDance, xAI, Kling, Midjourney, and more.

NeurIPS 2025

Spotlight · Top 3%

CameraBench

Understanding camera motion in any video. Adopted by Google DeepMind, xAI, Kling, and other frontier video labs.

CVPR 2026

Highlight · Top 3%

CHAI

Building precise video language with human-AI oversight. The framework behind Moodio's caption quality and retrieval precision.

ECCV 2024

Adopted in Google Imagen-3

VQAScore

State-of-the-art metric for evaluating text-to-visual generation. Google DeepMind named it the strongest replacement for CLIPScore.

CVPR 2024

Best Paper · SynData Workshop

GenAI-Bench

Benchmark for compositional text-to-visual generation. Uniquely adopted in the official Google Imagen-4 technical report.

Whitepaper

Cinematic Video Search · SOTA

Moodio Retrieval

The first to bring visual reference into the video generation workflow. On Moodio's benchmarks, its retrieval outperforms both Google Gemini Embedding 2 and Alibaba Qwen-3-VL Embedding.

Whitepaper · Forthcoming

Agent Post-Training

Moodio Agent

The first end-to-end video production agent with reward learning from real user interactions. The data flywheel behind studio-grade workflows.

Section 04

Frontier labs and studios use our datasets and models.

Two numbers tell the story.

2M+ Hugging Face downloads

100+ industry labs using our research

In use at

Google DeepMind · ByteDance · xAI · Kling · Midjourney · Runway · Luma · Adobe · Tencent · Microsoft · SenseTime · Xiaomi
