Master/specialist/critic architecture with send_message delegation and dynamic agent registry
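A minimal sketch of what send_message delegation over a dynamic registry could look like; the registry shape, message envelope, and names below are assumptions, not the real API.

```ts
// Hypothetical agent registry + send_message tool; all names are illustrative.
type AgentRole = "master" | "specialist" | "critic";

interface AgentEntry {
  id: string;
  role: AgentRole;
  inbox: (msg: { from: string; content: string }) => Promise<void>;
}

const agents = new Map<string, AgentEntry>(); // populated per hatset at runtime

export async function sendMessage(from: string, to: string, content: string) {
  const target = agents.get(to);
  if (!target) throw new Error(`unknown agent: ${to}`); // registry is dynamic
  await target.inbox({ from, content });
}
```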
Long-running AI agents within Vercel's 30s function limit: session continuity, crash recovery, and zombie defense without persistent VMs
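A minimal sketch of the continuation ("bounce") pattern under that limit, assuming a hypothetical runOneStep, sessionStore, and /api/agent/continue endpoint:

```ts
// Sketch of bounce-style session continuation under a 30s serverless cap.
// All names here are illustrative assumptions, not the project's real API.
type StepResult = { done: boolean };

declare function runOneStep(sessionId: string): Promise<StepResult>; // one LLM/tool turn
declare const sessionStore: { save(id: string): Promise<void> };

const LOOP_BUDGET_MS = 25_000; // leave headroom under the 30s cap

export async function agentLoop(sessionId: string): Promise<void> {
  const startedAt = Date.now();
  while (Date.now() - startedAt < LOOP_BUDGET_MS) {
    const { done } = await runOneStep(sessionId);
    if (done) return; // agent finished within this invocation
  }
  // Out of budget: checkpoint, then re-invoke in a fresh function.
  await sessionStore.save(sessionId);
  await fetch(`${process.env.BASE_URL}/api/agent/continue`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ sessionId }),
  });
}
```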
Non-blocking tool execution — agents fire background jobs and react to results without polling
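Sketch of the fire-and-forget shape this implies, with a generic job queue and event bus standing in for the real infrastructure:

```ts
// Hypothetical non-blocking tool: returns immediately with a job id, and the
// worker publishes an event when the result lands. enqueue/publish are stand-ins.
declare const jobs: { enqueue(kind: string, payload: unknown): Promise<string> };
declare const bus: { publish(event: { type: string; jobId: string }): Promise<void> };

// Tool handler returns at once; the agent keeps working.
export async function generateImageTool(prompt: string) {
  const jobId = await jobs.enqueue("generate_image", { prompt });
  return { status: "started", jobId }; // no polling: completion arrives as an event
}

// Worker side: publishing the result re-wakes the agent via the event bus.
export async function onJobComplete(jobId: string) {
  await bus.publish({ type: "tool_result_ready", jobId });
}
```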
Four view modes: Flow, Menu, Boring, TUI
Images, groups, choices, questions, comments, layouts
User reactions (thumbs, heart) + agent score emojis with evaluation messages
SSE events, Redis pub/sub, durable event bus, multi-window sync with echo suppression
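A sketch of how the SSE leg could fan out Redis pub/sub with echo suppression; the channel name, clientId query param, and originClientId field are assumptions:

```ts
// Sketch of an SSE route streaming Redis pub/sub events to a window, skipping
// events that window itself produced (echo suppression).
import Redis from "ioredis";

export async function GET(req: Request) {
  const clientId = new URL(req.url).searchParams.get("clientId");
  const sub = new Redis(process.env.REDIS_URL!);
  await sub.subscribe("book:events"); // channel name is an assumption
  const encoder = new TextEncoder();

  const stream = new ReadableStream<Uint8Array>({
    start(controller) {
      sub.on("message", (_channel, raw) => {
        // Echo suppression: drop events that originated from this window.
        if (JSON.parse(raw).originClientId === clientId) return;
        controller.enqueue(encoder.encode(`data: ${raw}\n\n`));
      });
    },
    cancel() {
      sub.disconnect();
    },
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    },
  });
}
```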
Persistent agent memory — StrReplace editing, event-driven, pre-loaded in system prompt
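A sketch of the StrReplace edit primitive, guarded against ambiguous matches; the memory store API is assumed:

```ts
// Sketch of StrReplace-style memory editing: exact-match replacement on the
// agent's memory doc, rejecting missing or non-unique targets.
declare const memoryStore: {
  read(agentId: string): Promise<string>;
  write(agentId: string, text: string): Promise<void>;
};

export async function strReplaceMemory(agentId: string, oldStr: string, newStr: string) {
  const doc = await memoryStore.read(agentId);
  const first = doc.indexOf(oldStr);
  if (first === -1) throw new Error("old_str not found in memory");
  if (doc.indexOf(oldStr, first + 1) !== -1) throw new Error("old_str is not unique");
  await memoryStore.write(
    agentId,
    doc.slice(0, first) + newStr + doc.slice(first + oldStr.length),
  );
}
```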
Web image search, vision filtering, blob storage pipeline
Up to 8 pinned media items as persistent visual references — all agents see pins in system prompt
Right-click record — real-time speech-to-text via Soniox, posted as a comment for hands-free direction
Proportional row-major panel grids with template presets, create_layout tool, download compositing via sharp
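A sketch of the download compositing step with sharp, assuming a layout of rows of weighted cells; dimensions and shapes are illustrative:

```ts
// Sketch: resize each panel to its proportional cell, then composite the grid
// onto a blank canvas with sharp. Layout shape is an assumption.
import sharp from "sharp";

type Cell = { image: Buffer; weight: number };

export async function compositeGrid(rows: Cell[][], width = 1600, rowHeight = 500) {
  const overlays: sharp.OverlayOptions[] = [];
  for (let r = 0; r < rows.length; r++) {
    const total = rows[r].reduce((s, c) => s + c.weight, 0);
    let x = 0;
    for (const cell of rows[r]) {
      const w = Math.round((cell.weight / total) * width); // proportional column
      overlays.push({
        input: await sharp(cell.image).resize(w, rowHeight, { fit: "cover" }).toBuffer(),
        left: x,
        top: r * rowHeight, // row-major placement
      });
      x += w;
    }
  }
  return sharp({
    create: { width, height: rows.length * rowHeight, channels: 4, background: "#fff" },
  })
    .composite(overlays)
    .png()
    .toBuffer();
}
```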
YAML-based team blueprints. Dynamic agent registry, Hat Shop UI, per-hatset specialist roster and messaging topology.
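A hypothetical blueprint to illustrate the idea; field names and model ids are invented, not the real schema:

```ts
// Illustrative hatset blueprint, parsed with the `yaml` package.
import { parse } from "yaml";

const blueprint = parse(`
name: comic-studio
master:
  model: gpt-4.1
specialists:
  - id: penciler
    model: gemini-flash
    skills: [panel-layout, character-consistency]
  - id: critic
    model: claude-sonnet
topology:                      # who may send_message to whom
  master: [penciler, critic]
  critic: [master]
`);
```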
Model registry with pluggable adapters. Hatsets configure which model each specialist uses.
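A minimal sketch of the registry/adapter split; the adapter interface is an assumption:

```ts
// Hypothetical pluggable model registry: each provider ships an adapter, and
// hatset config decides which id each specialist resolves to.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface ModelAdapter {
  id: string;
  complete(messages: ChatMessage[]): Promise<string>;
}

const models = new Map<string, ModelAdapter>();

export function registerModel(adapter: ModelAdapter) {
  models.set(adapter.id, adapter);
}

export function resolveModel(id: string): ModelAdapter {
  const adapter = models.get(id);
  if (!adapter) throw new Error(`unknown model: ${id}`);
  return adapter;
}
```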
Reduce token waste and duplicate work across agents for a snappier experience.
Reusable instruction modules for agents. Shared knowledge extracted into skills/*.md files, loaded via preload or on-demand read_skill tool.
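Sketch of the on-demand path; the skills/ directory follows the line above, everything else is assumed. Preload is then just reading the same files at prompt-build time:

```ts
// Sketch of a read_skill tool loading skills/<name>.md on demand.
import { readFile } from "node:fs/promises";
import path from "node:path";

export async function readSkill(name: string): Promise<string> {
  // Guard against path traversal in tool input.
  if (!/^[a-z0-9-]+$/.test(name)) throw new Error("invalid skill name");
  return readFile(path.join(process.cwd(), "skills", `${name}.md`), "utf8");
}
```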
Per-call model selection and additional model tiers for progressive fidelity workflows.
Structured evaluation pipeline — the Media Compiler. VLM-based critic with prompt alignment, technical QA, style checks.
Remove the wake button. Debounced auto-wake with event accumulation and threshold-based urgency.
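One way the debounce-plus-urgency logic could work; weights, threshold, and window are illustrative:

```ts
// Sketch of debounced auto-wake: events accumulate per book, and the agent
// wakes when the debounce window lapses or urgency crosses a threshold.
declare function wakeAgent(bookId: string, events: BookEvent[]): Promise<void>;

type BookEvent = { type: string };
const URGENCY: Record<string, number> = { comment: 3, pin: 2, reaction: 1 };
const THRESHOLD = 5;
const DEBOUNCE_MS = 4000;

const pending = new Map<string, { events: BookEvent[]; timer?: NodeJS.Timeout }>();

export function onBookEvent(bookId: string, event: BookEvent) {
  const entry = pending.get(bookId) ?? { events: [] };
  entry.events.push(event);
  const urgency = entry.events.reduce((s, e) => s + (URGENCY[e.type] ?? 1), 0);

  clearTimeout(entry.timer);
  if (urgency >= THRESHOLD) {
    flush(bookId, entry); // urgent: wake immediately
  } else {
    entry.timer = setTimeout(() => flush(bookId, entry), DEBOUNCE_MS);
    pending.set(bookId, entry);
  }
}

function flush(bookId: string, entry: { events: BookEvent[] }) {
  pending.delete(bookId);
  void wakeAgent(bookId, entry.events);
}
```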
Zone-based canvas auto-placement: hero, flow, sidebar zones with automatic packing.
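A toy sketch of top-down packing per zone; the zone rects and the policy are assumptions:

```ts
// Sketch: each zone is a column that packs new items top-down.
type Zone = { name: "hero" | "flow" | "sidebar"; x: number; width: number; cursorY: number };

const zones: Zone[] = [
  { name: "hero", x: 0, width: 800, cursorY: 0 },
  { name: "flow", x: 820, width: 500, cursorY: 0 },
  { name: "sidebar", x: 1340, width: 260, cursorY: 0 },
];

export function place(zoneName: Zone["name"], height: number, gap = 16) {
  const zone = zones.find((z) => z.name === zoneName)!;
  const pos = { x: zone.x, y: zone.cursorY, width: zone.width, height };
  zone.cursorY += height + gap; // simple top-down packing
  return pos;
}
```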
Further agent communication optimizations.
Master context includes page summaries, pulls relevant material across pages, and asks contextually grounded questions.
Hatset marketplace: browse, search, preview, fork, and customize community-submitted hatsets.
Observability logging, dashboard, and CLI tools. Separate from book_events — tracks LLM token usage, tool performance, visitor behavior, and costs.
Distill session patterns into new hatsets. Analyze book history, propose configurations, publish to Hat Shop.
Cross-book preference memory, per-user profiles, drift detection, user-editable preference cards.
Clerk-based auth with user handles, book ownership (public/private), and editor permissions.
Per-user identity on all content, live presence, agents passively see user handles.
Cross-book media reuse. Personal asset library with curated asset packs.
Video (Sora, Runway), audio (MusicGen, ElevenLabs), text/copy. Each new media type ships as a hatset.
General-purpose image exploration — brainstorm visual directions, iterate on style, build a cohesive vision.
Plan and produce illustrated comic pages with consistent characters, panel layouts, and wireframe-to-render pipeline.
Upload a photo, get placed into fun scenes — identity-conditioned generation with automatic critic evaluation.
Solo AI image studio — one master agent handles everything with conversational output style.
Brand-consistent ad creatives across platforms — structured briefs, palette extraction, multi-format output.
Cohesive music visual identity — cover, singles, social banners with mood-driven exploration.
Write a story, get it illustrated page by page with consistent characters and art style.
Upload room photos, get them reimagined in different styles — i2i with style transfer and A/B comparison.
Token-budget sliding window (maxHistoryTokens per agent), server tool pruning, time-based loop guard (270s), bounce refresh for session continuation, stale recovery with CAS + heartbeat + zombie defense.
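A sketch of the token-budget trim; countTokens stands in for whatever tokenizer is actually used, and keeping history[0] as the system prompt is an assumption:

```ts
// Sketch of a sliding window over agent history: drop oldest turns until the
// conversation fits maxHistoryTokens, preserving the system prompt.
declare function countTokens(text: string): number;

type Msg = { role: "system" | "user" | "assistant" | "tool"; content: string };

export function trimHistory(history: Msg[], maxHistoryTokens: number): Msg[] {
  const [system, ...rest] = history;
  let total = rest.reduce((s, m) => s + countTokens(m.content), 0);
  while (rest.length > 1 && total > maxHistoryTokens) {
    total -= countTokens(rest.shift()!.content); // drop oldest turn first
  }
  return [system, ...rest];
}
```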
Composition already exists at multiple levels: tools compose into agents, agents compose into hatsets, skills compose into agents. No need for hatset-level composition — the existing layers provide sufficient modularity.
Hovers, dwell time, skips: richer preference data than explicit reactions. Invest after the explicit-signal pipeline is proven.
Every artifact created during exploration has value. The book IS the journey, so it needs a presentation/sharing layer.
