Zhenxing Mi, Yuxin Wang, Dan Xu
The Hong Kong University of Science and Technology (HKUST)
(Teaser video: cut_1.mp4)
One4D is a unified framework for 4D generation and reconstruction. Through Unified Masked Conditioning (UMC), it seamlessly transitions between 4D generation from a single image, 4D reconstruction from a full video, mixed generation and reconstruction from sparse frames, and 4D generation from a text prompt. Its Decoupled LoRA Control (DLC) employs two modality-specific LoRA adapters that form decoupled computation branches for RGB frames and pointmaps, connected by lightweight, zero-initialized control links that gradually learn mutual pixel-level consistency. Together, these designs enable One4D to produce high-quality RGB frames and accurate pointmaps across both generation and reconstruction tasks.
Figure 1: The One4D Unified Framework architecture.
- Unified Masked Conditioning (UMC): Enables seamless transitions between 4D generation from a single image, 4D reconstruction from a full video, mixed generation and reconstruction from sparse frames, and 4D generation from a text prompt, all within a single unified model (see the sketch after this list).
- Decoupled LoRA Control (DLC): Decouples the RGB and XYZ (pointmap) computation branches to minimize cross-modal interference while maintaining pixel-wise cross-modal control (see the sketch after Figure 2).
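As a rough illustration of UMC, the conditioning can be pictured as masking out unobserved frames and concatenating the mask itself as an extra channel, so that all four tasks differ only in which frames are marked as observed. The sketch below is a hypothetical layout under that assumption; the function name, channel ordering, and mask encoding are illustrative, not the released implementation.

```python
import torch

def build_umc_condition(frames: torch.Tensor, known: list) -> torch.Tensor:
    """Hypothetical sketch of Unified Masked Conditioning (UMC).

    frames: (T, C, H, W) RGB video clip.
    known:  indices of observed frames -- all of them for full-video
            reconstruction, [0] for image-to-4D, a sparse subset for
            mixed generation/reconstruction, and [] for text-to-4D.
    """
    T, C, H, W = frames.shape
    mask = torch.zeros(T, 1, H, W)
    mask[known] = 1.0                      # mark observed frames
    cond = frames * mask                   # zero out unobserved frames
    return torch.cat([cond, mask], dim=1)  # (T, C+1, H, W) conditioning

# The four tasks differ only in the mask:
# video = torch.rand(16, 3, 64, 64)
# build_umc_condition(video, list(range(16)))  # full-video reconstruction
# build_umc_condition(video, [0])              # image-to-4D generation
# build_umc_condition(video, [0, 7, 15])       # mixed sparse-frame input
# build_umc_condition(video, [])               # text-to-4D generation
```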
Figure 2: Comparison of Decoupled LoRA Control against other architectures.
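To make the DLC idea concrete, here is a minimal PyTorch sketch of how two modality-specific LoRA branches with zero-initialized control links could wrap one frozen linear layer of the video backbone. The class and parameter names, the rank, and the exact placement of the links are assumptions for illustration; the actual layer design in the paper may differ.

```python
import torch
import torch.nn as nn

class DecoupledLoRAControl(nn.Module):
    """Hypothetical sketch: two modality-specific LoRA branches around one
    frozen linear layer, joined by zero-initialized control links."""

    def __init__(self, base: nn.Linear, rank: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # pretrained backbone stays frozen
        d_in, d_out = base.in_features, base.out_features
        # Decoupled LoRA adapters, one per modality.
        self.rgb_down = nn.Linear(d_in, rank, bias=False)
        self.rgb_up = nn.Linear(rank, d_out, bias=False)
        self.xyz_down = nn.Linear(d_in, rank, bias=False)
        self.xyz_up = nn.Linear(rank, d_out, bias=False)
        # Control links between the branches, zero-initialized so the
        # module reduces to the pretrained layer at the start of training.
        self.rgb_to_xyz = nn.Linear(d_out, d_out, bias=False)
        self.xyz_to_rgb = nn.Linear(d_out, d_out, bias=False)
        for layer in (self.rgb_up, self.xyz_up,
                      self.rgb_to_xyz, self.xyz_to_rgb):
            nn.init.zeros_(layer.weight)

    def forward(self, h_rgb: torch.Tensor, h_xyz: torch.Tensor):
        rgb = self.base(h_rgb) + self.rgb_up(self.rgb_down(h_rgb))
        xyz = self.base(h_xyz) + self.xyz_up(self.xyz_down(h_xyz))
        # Lightweight links gradually learn cross-modal consistency.
        return rgb + self.xyz_to_rgb(xyz), xyz + self.rgb_to_xyz(rgb)

# layer = DecoupledLoRAControl(nn.Linear(64, 64))
# out_rgb, out_xyz = layer(torch.rand(2, 64), torch.rand(2, 64))
```

At initialization the links contribute nothing, so each modality starts from the pretrained backbone undisturbed; during fine-tuning they gradually open up to exchange pixel-aligned information between the RGB and pointmap branches.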
Generating a consistent 4D scene from a single input image.
(Videos: cut_2.mp4–cut_6.mp4)
Reconstructing the 4D scene given only a few sparse frames.
(Videos: cut_7.mp4, cut_8.mp4, cut_11.mp4–cut_13.mp4)
High-fidelity reconstruction from a full video input.
(Videos: cut_9.mp4, cut_10.mp4, cut_14.mp4–cut_16.mp4)
Generating a consistent 4D scene from a pure text prompt.
(Videos: cut_19.mp4, cut_20.mp4)
If you find our work useful for your research, please consider citing us:
@article{mione4d2025,
  title={One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control},
  author={Mi, Zhenxing and Wang, Yuxin and Xu, Dan},
  journal={arXiv preprint arXiv:2511.18922},
  year={2025}
}
