Instruct-IPT: All-in-One Image Processing Transformer via Weight Modulation

Tian, Yuchuan; Han, Jianhong; Chen, Hanting; Xi, Yuanyuan; Zhang, Guoyang; Hu, Jie; Xu, Chao; Wang, Yunhe

Abstract:Due to the unaffordable size and intensive computation costs of low-level vision models, All-in-One models that are designed to address a handful of low-level vision tasks simultaneously have been popular. However, existing All-in-One models are limited in terms of the range of tasks and performance. To overcome these limitations, we propose Instruct-IPT -- an All-in-One Image Processing Transformer that could effectively address manifold image restoration tasks with large inter-task gaps, such as denoising, deblurring, deraining, dehazing, and desnowing. Rather than popular feature adaptation methods, we propose weight modulation that adapts weights to specific tasks. Firstly, we figure out task-sensitive weights via a toy experiment and introduce task-specific biases on top of them. Secondly, we conduct rank analysis for a good compression strategy and perform low-rank decomposition on the biases. Thirdly, we propose synchronous training that updates the task-general backbone model and the task-specific biases simultaneously. In this way, the model is instructed to learn general and task-specific knowledge. Via our simple yet effective method that instructs the IPT to be task experts, Instruct-IPT could better cooperate between tasks with distinct characteristics at humble costs. Further, we propose to maneuver Instruct-IPT with text instructions for better user interfaces. We have conducted experiments on Instruct-IPT to demonstrate the effectiveness of our method on manifold tasks, and we have effectively extended our method to diffusion denoisers as well. The code is available at this https URL.

Comments:	15 pages, 4 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.00676 [cs.CV]
	(or arXiv:2407.00676v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.00676

Computer Science > Computer Vision and Pattern Recognition

Title:Instruct-IPT: All-in-One Image Processing Transformer via Weight Modulation

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators