This document outlines the strategic vision for the CAPTURE project, a unified and enhanced media production workflow for the PHOTON platform. The project's goal is to consolidate the strengths of existing tools (voce, .photon/tools, melt), integrate modern capabilities, and create a flexible, powerful, and efficient system for capturing, processing, and producing high-quality audio and video content.
Our current ecosystem consists of several key components, each with its own strengths and weaknesses:
- .photon/tools (legacy scripts):
  - Strengths: Powerful, battle-tested melt-based workflow (video_melt.sh) for automated editing from LosslessCut EDL files. Sophisticated overlay and text-to-speech generation.
  - Weaknesses: Brittle, shell-based implementation. Tightly coupled with a now-outdated melt installation.
- voce project:
  - Strengths: Modern Python-based implementation. Robust GStreamer-based capture for screen, microphone, and system audio. Good foundation for modular design.
  - Weaknesses: Created as a workaround for the broken melt dependency, so it lacks the powerful post-production automation of the original scripts.
- melt framework:
  - Strengths: A powerful and scriptable multimedia framework, ideal for automated, non-linear editing. The cornerstone of our original, efficient workflow.
  - Weaknesses: Historically, it has been difficult to install and maintain, leading to the creation of voce.
The CAPTURE project will be a modular, Python-based toolkit that provides a complete, end-to-end solution for media production. It will be designed as a library of workflows, allowing for flexible and extensible production pipelines.
The core principles of the CAPTURE project are:
- Integration: Seamlessly combine the best features of voce (GStreamer capture), the legacy scripts (melt automation), and new tools (Glaxnimate, Frei0r).
- Modularity: Structure the project as a collection of independent but interoperable modules (e.g., capture, audio, editing, effects).
- Flexibility: Use a configuration-driven approach (e.g., YAML files) to define and manage different production workflows.
- Efficiency: Automate as much of the production process as possible, from capture to final render.
The CAPTURE project will be organized around the following core capabilities:
- Multi-source Recording: Leverage GStreamer for simultaneous capture of:
  - Screen (full screen, specific window, or region).
  - Microphone audio.
  - System audio.
  - Browser-based sources (e.g., using WebRTC or other browser automation tools).
- Device Management: A robust system for detecting and selecting audio and video devices.
- Real-time Monitoring: Provide real-time feedback on audio levels and capture status.
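As a rough illustration of multi-source recording, the sketch below assembles a `gst-launch-1.0` command line that muxes one screen branch and up to two audio branches into a single Matroska file. It is not taken from voce. The function name `build_capture_args`, its parameters, and the choice of elements (`ximagesrc`, `pulsesrc`, `x264enc`, `opusenc`, `matroskamux`) are assumptions for an X11 desktop with PulseAudio; other setups would need different source elements.

```python
def build_capture_args(output_path, mic_device=None, system_monitor=None, fps=30):
    """Return an argv list for gst-launch-1.0 (hypothetical helper, not voce code).

    Assumes output_path contains no spaces, because gst-launch re-joins its
    arguments into one pipeline description.
    """
    args = [
        "gst-launch-1.0", "-e",  # -e: send EOS on Ctrl+C so the file is finalized
        # Muxer and sink are declared first; every branch links to "mux."
        "matroskamux", "name=mux", "!", "filesink", f"location={output_path}",
        # Video branch: X11 screen grab, converted and encoded as H.264.
        "ximagesrc", "use-damage=false", "!",
        f"video/x-raw,framerate={fps}/1", "!",
        "videoconvert", "!", "x264enc", "tune=zerolatency", "!",
        "queue", "!", "mux.",
    ]
    # Audio branches: the microphone, plus a PulseAudio monitor source for
    # system audio. Each becomes its own audio track in the output file.
    for device in (mic_device, system_monitor):
        if device:
            args += [
                "pulsesrc", f"device={device}", "!",
                "audioconvert", "!", "audioresample", "!", "opusenc", "!",
                "queue", "!", "mux.",
            ]
    return args
```

The returned list can be handed to `subprocess.Popen`; keeping the builder free of side effects makes it easy to inspect or log the pipeline before starting a recording.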
- Automated Rough Cuts:
  - Silence Detection: Integrate the silence detection scripts from voce/silence to automatically remove dead air.
  - EDL-based Assembly: Fully reintegrate and enhance the video_melt.sh workflow, using EDLs from LosslessCut to drive melt for automated assembly.
- Transcription: Integrate speech-to-text capabilities to generate transcripts and subtitles.
- Asset Management: A system for organizing and managing captured media, generated assets, and project files.
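The silence-removal step can be sketched as follows. How the voce/silence scripts detect silence is not described here, so this example assumes the log format printed by FFmpeg's `silencedetect` audio filter (lines containing `silence_start:` and `silence_end:`). The function names `parse_silences` and `keep_segments` and the `min_length` parameter are hypothetical.

```python
import re

# Matches the timestamps in silencedetect log lines. Such a log could come from:
#   ffmpeg -i input.mkv -af silencedetect=noise=-30dB:d=0.5 -f null -
_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silences(log_text):
    """Extract (start, end) silence intervals, in seconds, from log text."""
    silences, start = [], None
    for line in log_text.splitlines():
        match = _START.search(line)
        if match:
            start = float(match.group(1))
            continue
        match = _END.search(line)
        if match and start is not None:
            silences.append((start, float(match.group(1))))
            start = None
    return silences


def keep_segments(silences, duration, min_length=0.2):
    """Invert silence intervals into the (start, end) segments worth keeping."""
    segments, cursor = [], 0.0
    for s_start, s_end in sorted(silences):
        s_start = max(s_start, 0.0)  # clamp slightly negative start times
        if s_start - cursor >= min_length:
            segments.append((cursor, s_start))
        cursor = max(cursor, s_end)
    if duration - cursor >= min_length:
        segments.append((cursor, duration))
    return segments
```

The resulting segment list has the same shape as a cut list, so it could feed the same assembly step that consumes LosslessCut EDLs.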
- melt Integration: Re-establish melt as the core engine for automated editing, transitions, and effects.
- Glaxnimate & Frei0r:
  - Glaxnimate: Integrate Shotcut's new Glaxnimate support for programmatic creation of complex 2D animations and overlays.
  - Frei0r: Leverage the Frei0r effects library for a wide range of filters and transitions that can be scripted with melt.
- Text-to-Speech & Audio Processing: Enhance the existing speak_wav functionality and provide a library of audio processing workflows (e.g., noise reduction, equalization, compression).
- Overlay Engine: Rebuild the PHP-based overlay system in Python, providing a more robust and flexible solution for generating titles, captions, and other graphics.
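To make the EDL-driven use of melt concrete, here is a minimal sketch that turns a LosslessCut CSV export into a `melt` command line. The actual logic of video_melt.sh is not shown in this document, so this is one plausible shape for it, not a port. It assumes each CSV row is `start,end,label` with times in seconds, a constant frame rate, and a single source file; `read_segments` and `build_melt_args` are hypothetical names.

```python
import csv
import io


def read_segments(csv_text):
    """Parse 'start,end,label' rows (seconds) into (start, end) float pairs."""
    segments = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            continue  # this sketch skips blank and open-ended rows
        segments.append((float(row[0]), float(row[1])))
    return segments


def build_melt_args(source, segments, output, fps=25.0):
    """Return a melt argv list that plays each segment back to back."""
    args = ["melt"]
    for start, end in segments:
        first = round(start * fps)
        # melt treats the out point as inclusive, so stop one frame early.
        last = max(first, round(end * fps) - 1)
        args += [source, f"in={first}", f"out={last}"]
    # The avformat consumer renders to a file instead of opening a preview.
    args += ["-consumer", f"avformat:{output}"]
    return args
```

Returning an argument list rather than a shell string sidesteps the quoting problems that made the shell implementation brittle, and the list can be passed directly to `subprocess.run`.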
- Workflow Library: Create a library of pre-defined production workflows (e.g., "screencast," "presentation," "interview").
- YAML Configuration: Allow users to define and customize workflows using simple YAML files.
- CLI Interface: A powerful and intuitive command-line interface for executing workflows and managing the production pipeline.
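One way the workflow library and CLI could fit together is sketched below. The schema (`name`, `sources`, `steps`), the step names, and the `run` and `list` subcommands are all assumptions made for illustration. The dictionary stands in for what a YAML parser would return for a file such as workflows/screencast.yaml, so the example needs only the standard library.

```python
import argparse
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """In-memory form of one workflow definition (hypothetical schema)."""
    name: str
    sources: list = field(default_factory=list)
    steps: list = field(default_factory=list)

    @classmethod
    def from_dict(cls, data):
        missing = {"name", "steps"} - set(data)
        if missing:
            raise ValueError(f"workflow is missing keys: {sorted(missing)}")
        return cls(data["name"], list(data.get("sources", [])), list(data["steps"]))


# Stand-in for a parsed YAML file; keys and values are illustrative only.
SCREENCAST = {
    "name": "screencast",
    "sources": ["screen", "microphone"],
    "steps": ["capture", "remove_silence", "assemble", "render"],
}
WORKFLOWS = {"screencast": Workflow.from_dict(SCREENCAST)}


def build_parser():
    parser = argparse.ArgumentParser(prog="capture")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("list", help="list available workflows")
    run = sub.add_parser("run", help="plan the steps of a named workflow")
    run.add_argument("workflow", choices=sorted(WORKFLOWS))
    return parser


def main(argv=None):
    """Return the lines the CLI would print; real step execution is omitted."""
    args = build_parser().parse_args(argv)
    if args.command == "list":
        return sorted(WORKFLOWS)
    workflow = WORKFLOWS[args.workflow]
    return [f"{workflow.name}: {step}" for step in workflow.steps]
```

Calling `main(["run", "screencast"])` returns the planned steps in order. Validating the definition in `from_dict` means a malformed workflow file fails before any capture starts.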
- Language: Python will be the primary language for the project, providing a robust and flexible foundation.
- Core Libraries:
  - GStreamer: For all capture-related tasks.
  - melt: As a subprocess for all automated editing and rendering.
  - FFmpeg: For transcoding, audio processing, and other media manipulation tasks.
- Project Structure:
  - src/capture/: The main Python source code.
    - capture/: GStreamer-based capture modules.
    - audio/: Audio processing and TTS modules.
    - editing/: melt and EDL processing modules.
    - effects/: Glaxnimate, Frei0r, and overlay modules.
    - workflows/: The workflow management system.
  - workflows/: A directory of YAML files defining different production workflows.
  - templates/: Templates for MLT XML, subtitles, and other generated files.
- Establish the Foundation: Create the new capture project with the proposed modular structure.
- Migrate voce: Port the GStreamer capture and audio processing capabilities from voce into the new capture project.
- Reintegrate melt: Re-implement the video_melt.sh logic in Python, creating a robust and flexible EDL-driven editing engine.
- Integrate New Capabilities: Begin integrating Glaxnimate, Frei0r, and other new tools into the workflow.
- Deprecate Old Tools: Once the capture project is feature-complete, voce and the legacy .photon/tools scripts can be archived.
- Web UI: A web-based interface for managing workflows and monitoring the production process.
- AI Integration: Explore AI-powered tools for automated video editing, content analysis, and other advanced capabilities.
- Cloud Integration: Support for cloud-based storage and rendering.