Product roadmap
Where Clip Foundry is heading: from a faceless short engine to a full production platform. We're at v1; here's the path to v5.
- v1 (You are here)
Faceless engine
One API call (or MCP) turns a script into a ready-to-post short.
- Script → scenes, AI voiceover, visuals, beat-synced render
- 9 styles, multi-voice TTS, 9:16 / 1:1 / 16:9
- REST + SSE + native MCP server, signed asset delivery
- Resumable jobs, token billing, cost-safe pricing
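As a rough illustration of the SSE side of the interface, the sketch below parses a standard `text/event-stream` feed, where `event:` and `data:` fields are terminated by a blank line. The event names `progress` and `done` and their payloads are assumptions made up for this example; the actual Clip Foundry event schema is not described here.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield one dict per server-sent event from an iterable of text lines."""
    event: dict[str, str] = {}
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; skip it if no data arrived.
            if data:
                event["data"] = "\n".join(data)
                yield event
            event, data = {}, []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "data":
            data.append(value)  # multiple data lines are joined with "\n"
        elif field in ("event", "id"):
            event[field] = value


# Hypothetical stream for a render job (event names are illustrative only).
sample = [
    "event: progress\n",
    'data: {"pct": 40}\n',
    "\n",
    ": keep-alive\n",
    "\n",
    "event: done\n",
    "data: line1\n",
    "data: line2\n",
    "\n",
]
events = list(parse_sse(sample))
```

In practice the lines would come from a streaming HTTP response rather than a list, but the framing logic stays the same.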
- v1.5 (Up next)
Production quality core
Make the output repeatably good, not just fast, and editable scene by scene.
- Scene consistency engine: shared visual DNA across scenes
- Quality tiers: draft → standard → premium → cinematic
- Asset re-ranking: best of several candidates per scene
- Scene-level rerender (image / voice / pacing): fix one scene, not the whole film
- Premium audio: voice profiles, mastering, advanced captions
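The asset re-ranking item above amounts to a best-of-N selection per scene. A minimal sketch, assuming each candidate already carries a numeric quality score (the `Candidate` type, the `score` field and the function names are hypothetical, and how scores would be produced is not specified here):

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    asset_id: str
    score: float  # hypothetical quality score, higher is better


def pick_best(candidates: Sequence[Candidate]) -> Candidate:
    """Return the highest-scoring candidate; the first one wins on ties."""
    if not candidates:
        raise ValueError("need at least one candidate per scene")
    return max(candidates, key=lambda c: c.score)


def rerank_scenes(scenes: dict[str, list[Candidate]]) -> dict[str, str]:
    """Map each scene id to the asset id of its best candidate."""
    return {scene_id: pick_best(cands).asset_id for scene_id, cands in scenes.items()}


chosen = rerank_scenes({
    "scene-1": [Candidate("a1", 0.62), Candidate("a2", 0.91), Candidate("a3", 0.40)],
    "scene-2": [Candidate("b1", 0.75), Candidate("b2", 0.75)],
})
```

Because each scene is ranked independently, the same selection can be rerun for a single scene, which fits the scene-level rerender idea in the same list.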
- v2 (Planned)
Story engine & control
From topic → video to a steerable story engine with deliberate structure.
- Inputs: topic / URL / document / outline → script
- Story templates (hook → reveal → payoff, myth vs fact, timeline, …)
- Editable scene planner + retention-aware hook optimizer
- Expanded visual style families with their own grammar
- v3 (Planned)
Brand kits & mid-form
Your own assets & branding, and the jump from shorts to multi-minute explainers.
- Brand kits: logo, colors, fonts, intro/outro, CTA, watermark
- Asset library + per-scene overrides (mix AI with your footage)
- Extended shorts: 30 / 45 / 60 / 90 / 120s with pacing curves
- Chapter engine for 3–10 min explainers
- v4 (Planned)
Production platform
project → chapters → scenes → assets → renders, fully API-first and editable.
- Projects / chapters / scenes / brand-kits as first-class API resources
- Scene & chapter editor, revisions, branching
- Timeline patch API: patch a render without a full rerender
- Job graph, partial retries, cost estimate before run
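One way to picture the job graph with partial retries: when a job fails, only that job and everything downstream of it need to run again. The sketch below uses Python's standard `graphlib` module. The job names and the shape of the dependency map are assumptions for illustration, not the platform's actual job model.

```python
from graphlib import TopologicalSorter


def jobs_to_rerun(deps: dict[str, set[str]], failed: set[str]) -> list[str]:
    """Return failed jobs plus all their dependents, in a valid execution order.

    `deps` maps each job to the set of jobs it depends on.
    """
    order = list(TopologicalSorter(deps).static_order())
    dirty = set(failed)
    for job in order:
        # A job is dirty if any of its dependencies must be rerun.
        if deps.get(job, set()) & dirty:
            dirty.add(job)
    return [job for job in order if job in dirty]


# Hypothetical graph for a two-scene short.
graph = {
    "script": set(),
    "voice_1": {"script"},
    "image_1": {"script"},
    "image_2": {"script"},
    "render": {"voice_1", "image_1", "image_2"},
}
retry_plan = jobs_to_rerun(graph, failed={"image_2"})
```

Here a failed `image_2` leads to rerunning `image_2` and then `render`, while `script`, `voice_1` and `image_1` are reused. A cost estimate before the run could be derived from the same plan by summing a per-job cost.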
- v5 (Planned)
Long-form & feedback loop
10–20 min+ video, one-click publishing, and optimization that learns from results.
- Long-form engine: 10–20 min and beyond
- Publish connectors: YouTube, TikTok, Instagram Reels
- Retention / hook / style / voice analytics ingestion
- Auto-optimization: suggest better hook, style, pacing, voice
Directional, not a dated commitment: priorities shift with what users need most.