Not Really started as "Claude Code, but for media." It evolved into something else — a full agentic tooling and experience layer designed specifically for creative media work.
It is not Claude Code wrangled into doing media. It is whatever it needs to be, FOR media.
All humans are terrible prompters — you don't know what you want until you see something, then ideas start flowing. But humans are great preference machines. Shown options, you know what you prefer.
The goal is not to make a better prompt box. It's to build a system where the human steers through preference signals, and the system learns to anticipate. Show them things — search results, reference images, style explorations — and let their reactions reveal what words can't. The interface is a shared workspace — a canvas — where humans and AI agents communicate through lightweight signals and build creative artifacts together.
Coding agents work because: (1) programming languages are rigorous and Markovian, (2) verification tools — compilers, test runners, linters — let the agent check its own work. The agent loop closes: generate code, verify, fix, verify again.
Media has neither property. It's subjective and has no "compiler." The agent generates an image and has no idea if it's good. The loop is open.
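The difference between the two loops can be sketched concretely. Below is a toy, hypothetical model of the closed coding-agent loop — generate, verify, fix, verify again — where the verifier stands in for a compiler or test runner. All names here are illustrative, not a real agent API; "code" is a number and "correct" means divisible by 7, just to make the loop runnable.

```python
def closed_loop(generate, verify, fix, max_iters=5):
    """Generate -> verify -> fix -> verify: stop once verification passes.
    The loop only closes because `verify` returns machine-readable errors."""
    artifact = generate()
    for _ in range(max_iters):
        ok, errors = verify(artifact)
        if ok:
            return artifact
        artifact = fix(artifact, errors)  # the agent reasons over the errors
    return artifact

# Toy stand-ins: the "program" is an integer, "passing tests" means x % 7 == 0.
result = closed_loop(
    generate=lambda: 10,
    verify=lambda x: (x % 7 == 0, f"{x} % 7 = {x % 7}"),
    fix=lambda x, err: x + 1,
)
# A media agent has no `verify`: the loop would exit after one blind generation.
```

The media case is this same loop with `verify` missing — which is exactly the gap the rest of this document is about filling.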
Claude Code for media requires filling four gaps that coding agents get for free:
For coding, correctness is objective — tests pass or fail. For media, "correctness" lives in the human's head. Turn-based chat forces full articulation upfront (impossible for creative work) and delays feedback (too costly).
The system must make feedback early, frequent, and low-friction. A 👍 on an image is worth a thousand words of prompt engineering. The experience should also be fun — unlike in coding, the creative process matters as much as the output, because exploration itself inspires and leads to novel creation.
Design principles:
Coding agents verify output via compiler, linter, type-checker, test suite — deterministic, immediate, machine-readable. Media agents are blind after generating. Without non-AI verification, the AI part doesn't work well.
The Actor-Critic pattern: a generating agent (Actor) paired with an evaluating VLM (Critic). The Critic provides natural language feedback — "background too cluttered, subject off-center, palette doesn't match" — not just scalar scores. NL critique lets the agent reason about what to fix, like reading compiler errors. In practice: fast scalar metrics for hard checks (alignment, technical quality) + VLM critic for soft checks (aesthetics, composition, vibe).
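A minimal sketch of the Actor-Critic loop described above, with hard checks and a VLM critic combined. Everything here is a hypothetical interface — `actor`, `hard_checks`, and `vlm_critic` are placeholders you would wire to a real generator, scalar metrics, and a vision-language model.

```python
from dataclasses import dataclass

@dataclass
class Critique:
    passed_hard_checks: bool  # fast scalar metrics: alignment, technical quality
    notes: str                # natural-language soft feedback from the VLM critic

def actor_critic_loop(actor, hard_checks, vlm_critic, brief, max_rounds=3):
    """Actor generates; Critic returns NL feedback the Actor can reason about,
    much like a compiler error. Stops when hard checks pass and notes are empty."""
    image = actor(brief)
    for _ in range(max_rounds):
        critique = Critique(hard_checks(image), vlm_critic(image, brief))
        if critique.passed_hard_checks and not critique.notes:
            break
        image = actor(brief, feedback=critique.notes)
    return image
```

The key design choice is that the Critic's output feeds back as natural language, not a scalar — the Actor gets something it can act on ("subject off-center"), the same way a coding agent acts on a compiler error rather than a pass/fail bit.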
Evaluation axes (with code analogs):
Git makes the coding agent loop safe: every change is reversible, diffable, branchable, mergeable. The agent can try things aggressively. Media has no equivalent.
- git revert is free; destructive pixel overwrites make the agent brittle. Non-destructive editing (layers, masks, parameter snapshots) makes experimentation safe.
- git diff shows what changed; media needs visual/audio diffs, overlay toggles, and difference maps.
- Persistent taste context is the media equivalent of .editorconfig, claude.md, and README — the non-code context that coding agents write down, commit, and know to look for.
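The non-destructive ideas above can be sketched as an edit stack: operations are stored as parameterized steps over an untouched source, never baked into the pixels, so revert is a pop and a difference map is a comparison of two renders. This is a toy model (pixels as a flat list of numbers), not a real imaging pipeline.

```python
class EditStack:
    """Non-destructive editing: edits replay over the source like git commits."""
    def __init__(self, source):
        self.source = source  # original pixels, never mutated
        self.ops = []         # (name, fn) layers, in order of application

    def apply(self, name, fn):
        self.ops.append((name, fn))

    def revert(self):         # media analog of `git revert`: drop the last op
        return self.ops.pop()

    def render(self, upto=None):  # replay ops over a fresh copy of the source
        img = list(self.source)
        for _, fn in self.ops[:upto]:
            img = fn(img)
        return img

def diff(a, b):               # crude difference map: per-pixel delta
    return [y - x for x, y in zip(a, b)]
```

Usage: `diff(stack.render(0), stack.render())` is the media analog of `git diff` between the source and the current state, and `stack.revert()` restores the previous render without ever having destroyed pixels.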
Without preference memory, every session starts cold. The human re-explains "warm tones," "less saturated," "more negative space" every time — like a coding agent forgetting your project uses TypeScript.
What to learn:
How to build it:
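One minimal sketch of preference memory, under assumed mechanics: lightweight reactions (👍/👎) are recorded as tagged events in an append-only log, then aggregated into a profile that is injected as context at session start — the media analog of a committed .editorconfig or claude.md. The event schema and scoring are hypothetical placeholders.

```python
from collections import Counter

class PreferenceMemory:
    """Append-only reaction log plus a summarized taste profile."""
    def __init__(self):
        self.events = []  # (signal, tags), e.g. ("👍", ["warm tones"])

    def record(self, signal, tags):
        self.events.append((signal, tags))

    def profile(self, top_k=3):
        """Aggregate reactions into the traits to preload next session,
        so the human never re-explains 'warm tones' from scratch."""
        score = Counter()
        for signal, tags in self.events:
            weight = 1 if signal == "👍" else -1
            for tag in tags:
                score[tag] += weight
        return [tag for tag, s in score.most_common(top_k) if s > 0]
```

The point of the sketch: a 👍 is cheap for the human, but accumulated and summarized it becomes standing context — the system starts warm instead of cold.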
A multi-agent AI creative workspace built on a generic engine + configurable team blueprints (hatsets):
The output isn't one image — it's an organized creative artifact: a book of pages, each a canvas of grouped, connected, reacted-to widgets.
Is this Claude Code?
Is this Figma with a prompt box?
Is this a wrapper around better image models?
Is this another AI art generator?
Is this a chatbot that makes pictures?
