Not Really Roadmap
🚶 22 done, 16 to go
Phase 0: Foundation [Done]
Multi-Agent Orchestration [Done]

Master/specialist/critic architecture with send_message delegation and dynamic agent registry

Agent Lifecycle (Serverless) [Done]

Long-running AI agents within Vercel's 30s function limit: session continuity, crash recovery, and zombie defense without persistent VMs
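
One common way to get crash recovery and zombie defense without a persistent VM is a lease guarded by compare-and-swap (CAS) plus a heartbeat. The sketch below is illustrative only: `SessionLease`, `LeaseStore`, `tryClaim`, `heartbeat`, and the 15-second staleness threshold are hypothetical names and values, and the in-memory store stands in for whatever database the project actually uses.

```typescript
// Hypothetical session record; field names are assumptions for illustration.
interface SessionLease {
  ownerId: string | null; // invocation currently driving the agent
  heartbeatAt: number;    // ms timestamp of the owner's last heartbeat
  version: number;        // bumped on every successful write (the CAS token)
}

// In-memory stand-in for a store that supports compare-and-swap.
class LeaseStore {
  private lease: SessionLease = { ownerId: null, heartbeatAt: 0, version: 0 };

  read(): SessionLease {
    return { ...this.lease };
  }

  // Write only if nobody else has written since `expectedVersion` was read.
  compareAndSwap(expectedVersion: number, ownerId: string, heartbeatAt: number): boolean {
    if (this.lease.version !== expectedVersion) return false;
    this.lease = { ownerId, heartbeatAt, version: expectedVersion + 1 };
    return true;
  }
}

const STALE_AFTER_MS = 15000; // assumed threshold, not taken from the roadmap

// Take over a session whose owner is missing or has stopped heartbeating.
function tryClaim(store: LeaseStore, me: string, now: number): boolean {
  const current = store.read();
  const stale = current.ownerId === null || now - current.heartbeatAt > STALE_AFTER_MS;
  if (!stale) return false;
  return store.compareAndSwap(current.version, me, now);
}

// Refresh the lease. A false result means this invocation lost ownership
// (it is a zombie) and should stop doing work.
function heartbeat(store: LeaseStore, me: string, now: number): boolean {
  const current = store.read();
  if (current.ownerId !== me) return false;
  return store.compareAndSwap(current.version, me, now);
}
```

An invocation that resumes after a long pause finds that `heartbeat` returns false once another invocation has claimed the session, which is the signal to exit instead of writing stale results.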

Async Execution Engine [Done]

Non-blocking tool execution — agents fire background jobs and react to results without polling

Canvas + Renderers [Done]

Four view modes: Flow, Menu, Boring, TUI

Widget System [Done]

Images, groups, choices, questions, comments, layouts

Reaction System [Done]

User reactions (thumbs, heart) + agent score emojis with evaluation messages

Real-Time Updates [Done]

SSE events, Redis pub/sub, durable event bus, multi-window sync with echo suppression
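
Echo suppression in multi-window sync usually means a window ignores broadcast events that it originated itself, because it has already applied them locally. A minimal sketch of that idea follows; `SyncEvent`, `WindowClient`, and the `originId` field are hypothetical names, and the transport (SSE, Redis pub/sub) is left out.

```typescript
// Hypothetical event envelope; `originId` identifies the window that emitted it.
interface SyncEvent {
  originId: string;
  type: string;
  payload: unknown;
}

class WindowClient {
  readonly windowId: string;
  readonly applied: SyncEvent[] = [];

  constructor(windowId: string) {
    this.windowId = windowId;
  }

  // Apply a local change immediately and return the event to broadcast.
  emit(type: string, payload: unknown): SyncEvent {
    const event: SyncEvent = { originId: this.windowId, type, payload };
    this.applied.push(event);
    return event;
  }

  // Handle an event from the shared stream. Events this window emitted
  // are dropped so the change is not applied twice.
  receive(event: SyncEvent): boolean {
    if (event.originId === this.windowId) return false;
    this.applied.push(event);
    return true;
  }
}
```

Every window, including the sender, receives the broadcast; only the other windows apply it.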

Whiteboard [Done]

Persistent agent memory — StrReplace editing, event-driven, pre-loaded in system prompt
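
StrReplace-style editing is commonly implemented as an exact-match replacement that refuses to act when the target text is missing or ambiguous. The sketch below assumes that behavior; the function name `strReplace` and the uniqueness rule are assumptions, not the project's actual tool definition.

```typescript
// Replace exactly one occurrence of `oldStr` in `document` with `newStr`.
// Throws if `oldStr` is empty, absent, or matches more than once, so an
// agent cannot silently edit the wrong part of its whiteboard.
function strReplace(document: string, oldStr: string, newStr: string): string {
  if (oldStr.length === 0) {
    throw new Error("oldStr must not be empty");
  }
  const first = document.indexOf(oldStr);
  if (first === -1) {
    throw new Error("oldStr not found");
  }
  if (document.indexOf(oldStr, first + 1) !== -1) {
    throw new Error("oldStr matches more than once");
  }
  return document.slice(0, first) + newStr + document.slice(first + oldStr.length);
}
```

Requiring a unique match pushes the agent to quote enough surrounding context to identify the span it wants to change.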

Researcher Agent [Done]

Web image search, vision filtering, blob storage pipeline

Pinboard [Done]

Up to 8 pinned media items as persistent visual references — all agents see pins in system prompt

Voice Input [Done]

Right-click to record: real-time speech-to-text via Soniox, posted as a comment for hands-free direction

Layout Widget System [Done]

Proportional row-major panel grids with template presets, create_layout tool, download compositing via sharp
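
A proportional row-major grid can be resolved into pixel rectangles by splitting the canvas height across rows by weight, then splitting each row's width across its columns by weight. The sketch below shows that arithmetic only; `RowSpec`, `Rect`, and `layoutPanels` are hypothetical names and do not reflect the real `create_layout` schema.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical row description: relative height plus relative column widths.
interface RowSpec {
  height: number;
  columns: number[];
}

// Resolve weights into rectangles, emitted in row-major order
// (left to right within a row, rows top to bottom).
function layoutPanels(rows: RowSpec[], canvasWidth: number, canvasHeight: number): Rect[] {
  const totalRowWeight = rows.reduce((sum, row) => sum + row.height, 0);
  const rects: Rect[] = [];
  let y = 0;
  for (const row of rows) {
    const rowHeight = (row.height / totalRowWeight) * canvasHeight;
    const totalColWeight = row.columns.reduce((sum, w) => sum + w, 0);
    let x = 0;
    for (const weight of row.columns) {
      const panelWidth = (weight / totalColWeight) * canvasWidth;
      rects.push({ x, y, width: panelWidth, height: rowHeight });
      x += panelWidth;
    }
    y += rowHeight;
  }
  return rects;
}
```

For example, one full-width row above a row of two equal panels on an 800 by 600 canvas yields three rectangles: a 800 by 300 panel at the top and two 400 by 300 panels below it. The same rectangles could serve as placement offsets when compositing a download.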

Phase 1: Engine (Make It Good)
1.1 Hatset System [Done]

YAML-based team blueprints. Dynamic agent registry, Hat Shop UI, per-hatset specialist roster and messaging topology.

1.2 Multi-Model Image Generation [Done]

Model registry with pluggable adapters. Hatsets configure which model each specialist uses.

1.3 Communication Diet [Done]

Reduce token waste and duplicate work across agents for a snappier experience.

1.4 Skill System [Done]

Reusable instruction modules for agents. Shared knowledge extracted into skills/*.md files, loaded via preload or on-demand read_skill tool.

1.5 Multi-Model Next [Planned]

Per-call model selection and additional model tiers for progressive fidelity workflows.

1.6 Eval Harness [Planned]

Structured evaluation pipeline — the Media Compiler. VLM-based critic with prompt alignment, technical QA, style checks.

1.7 Ambient Master [Planned]

Remove the wake button. Debounced auto-wake with event accumulation and threshold-based urgency.
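
Since this item is still planned, the following is only one possible shape for debounced auto-wake: events accumulate, the master wakes once the stream has been quiet for a debounce window, and it wakes early if accumulated urgency crosses a threshold. `WakeAccumulator`, `PendingEvent`, and the numeric urgency score are all hypothetical.

```typescript
// Hypothetical accumulated event; `urgency` is an assumed numeric score.
interface PendingEvent {
  kind: string;
  urgency: number;
  at: number; // ms timestamp
}

class WakeAccumulator {
  private pending: PendingEvent[] = [];
  private lastEventAt = 0;
  private readonly debounceMs: number;
  private readonly urgencyThreshold: number;

  constructor(debounceMs: number, urgencyThreshold: number) {
    this.debounceMs = debounceMs;
    this.urgencyThreshold = urgencyThreshold;
  }

  push(event: PendingEvent): void {
    this.pending.push(event);
    this.lastEventAt = event.at;
  }

  // Returns the batch to hand to the master if it should wake at `now`,
  // otherwise null. Waking clears the accumulated events.
  poll(now: number): PendingEvent[] | null {
    if (this.pending.length === 0) return null;
    const totalUrgency = this.pending.reduce((sum, e) => sum + e.urgency, 0);
    const quiet = now - this.lastEventAt >= this.debounceMs;
    const urgent = totalUrgency >= this.urgencyThreshold;
    if (!quiet && !urgent) return null;
    const batch = this.pending;
    this.pending = [];
    return batch;
  }
}
```

Passing `now` in explicitly keeps the logic free of timers, so the same class could be driven by a client-side interval or by a serverless invocation.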

1.8 Semantic Layout [Planned]

Zone-based canvas auto-placement: hero, flow, sidebar zones with automatic packing.

1.9 Communication Diet Next [Planned]

Further agent communication optimizations.

Phase 2: Platform (Make It a Product)
2.1 Cross-Page Intelligence [Planned]

Master context includes page summaries, pulls in relevant material from other pages, and supports contextual questioning.

2.2 Hat Shop [Active]

Browsable hatset marketplace. Browse, search, preview, fork, and customize community-submitted hatsets.

2.3 Analytics System [Done]

Observability logging, dashboard, and CLI tools. Separate from book_events — tracks LLM token usage, tool performance, visitor behavior, and costs.

2.4 Hat Maker [Planned]

Distill session patterns into new hatsets. Analyze book history, propose configurations, publish to Hat Shop.

2.5 Preference System [Planned]

Cross-book preference memory, per-user profiles, drift detection, user-editable preference cards.

Phase 3: Scale (Make It Real)
3.1 Login & User System [Done]

Clerk-based auth with user handles, book ownership (public/private), and editor permissions.

3.2 Multi-User Collaboration [Done]

Per-user identity on all content, live presence, agents passively see user handles.

3.3 Asset Library [Planned]

Cross-book media reuse. Personal asset library with curated asset packs.

3.4 Media Type Expansion [Planned]

Video (Sora, Runway), audio (MusicGen, ElevenLabs), text/copy. Each new media type ships as a hatset.

Phase 4: Applications (Same Engine, Different Outputs)
🎨 Creative Studio [Done]

General-purpose image exploration — brainstorm visual directions, iterate on style, build a cohesive vision.

📚 Comic Book Studio [Active]

Plan and produce illustrated comic pages with consistent characters, panel layouts, and a wireframe-to-render pipeline.

📸 Photo Playground [Done]

Upload a photo, get placed into fun scenes — identity-conditioned generation with automatic critic evaluation.

🖼️ Monet [Done]

Solo AI image studio — one master agent handles everything with conversational output style.

📢 Brand Ad Studio [Planned]

Brand-consistent ad creatives across platforms — structured briefs, palette extraction, multi-format output.

🎵 Album Art [Planned]

Cohesive music visual identity — cover, singles, social banners with mood-driven exploration.

🧒 Children's Book [Planned]

Write a story, get it illustrated page by page with consistent characters and art style.

🏠 Interior Design [Planned]

Upload room photos, get them reimagined in different styles — i2i with style transfer and A/B comparison.

❓ Open Questions
Agent Lifecycle [Resolved]

Token-budget sliding window (maxHistoryTokens per agent), server tool pruning, time-based loop guard (270s), bounce refresh for session continuation, stale recovery with CAS + heartbeat + zombie defense.
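
The token-budget sliding window can be pictured as keeping the newest messages that fit under `maxHistoryTokens` and dropping everything older. The sketch below assumes a crude four-characters-per-token estimate; `Message`, `estimateTokens`, and `trimHistory` are hypothetical names, and a real implementation would use the model's tokenizer.

```typescript
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Rough estimate (about 4 characters per token). This is an assumption
// for illustration, not an exact count.
function estimateTokens(message: Message): number {
  return Math.ceil(message.content.length / 4);
}

// Keep the most recent contiguous run of messages whose estimated cost
// fits within the budget, preserving chronological order.
function trimHistory(history: Message[], maxHistoryTokens: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (const message of [...history].reverse()) {
    const cost = estimateTokens(message);
    if (used + cost > maxHistoryTokens) break;
    kept.push(message);
    used += cost;
  }
  return kept.reverse();
}
```

Stopping at the first message that does not fit, instead of skipping it and continuing, keeps the retained history contiguous so the agent never sees a conversation with holes in the middle.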

Hat Composition [Resolved]

Composition already exists at multiple levels: tools compose into agents, agents compose into hatsets, skills compose into agents. No need for hatset-level composition — the existing layers provide sufficient modularity.

Implicit Signals

Hovers, dwell time, skips — richer preference data than explicit reactions. Invest after explicit signal pipeline is proven.

The Journey Problem

Every artifact created during exploration has value. The book IS the journey, and it needs a presentation/sharing layer.

Are we ever done?
Not Really