UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt

(* Equal Contributions, ECCV 2024)
1University of Science and Technology of China, 2National University of Singapore, 3Microsoft Research Asia

Abstract

Compressed Image Super-resolution (CSR) aims to simultaneously super-resolve compressed images and tackle the challenging hybrid distortions caused by compression. However, existing works on CSR usually focus on a single compression codec, i.e., JPEG, ignoring the diverse traditional and learning-based codecs used in practical applications, e.g., HEVC, VVC, HIFIC, etc. In this work, we propose the first universal CSR framework, dubbed UCIP, with dynamic prompt learning, intended to jointly support the CSR distortions of any compression codec/mode. In particular, an efficient dynamic prompt strategy is proposed to mine content/spatial-aware, task-adaptive contextual information for the universal CSR task, using only a small number of prompts of spatial size 1x1. To simplify contextual information mining, we introduce a novel MLP-like backbone for our UCIP by adapting the Active Token Mixer (ATM) to CSR tasks for the first time, where global information modeling is performed only along the horizontal and vertical directions with offset prediction. We also build an all-in-one benchmark dataset for the CSR task by collecting datasets compressed with 6 popular and diverse traditional and learning-based codecs, including JPEG, HEVC, VVC, HIFIC, etc., resulting in 23 common degradations. Extensive experiments demonstrate the consistently excellent performance of our UCIP on universal CSR tasks.

Method

In this work, we propose the first universal framework with a dynamic prompt strategy and an MLP-like module to tackle the challenging CSR task. Benefiting from our dynamic prompt-based offset learning, our UCIP is capable of encoding optimal content-aware contextual information, while maintaining task-aware adaptability via the prompt components. For more information, please refer to our paper.
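The core idea of the dynamic prompt strategy can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the paper's implementation: a small bank of learnable 1x1 prompts is mixed per pixel by predicted weights, yielding a content/spatial-aware prompt map. Here the weight predictor is stood in by a simple feature-prompt similarity; in UCIP it is a learned module, and the shapes (`N`, `C`, `H`, `W`) are arbitrary placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
N, C, H, W = 6, 8, 4, 4                        # hypothetical sizes
prompt_bank = rng.standard_normal((N, C))      # N learnable prompts, spatial size 1x1
features = rng.standard_normal((H, W, C))      # intermediate feature map

# Stand-in weight predictor: per-pixel similarity between features and
# prompts, normalized into mixing weights (learned in the real model).
logits = features @ prompt_bank.T              # (H, W, N)
weights = softmax(logits, axis=-1)             # spatially varying, sums to 1 per pixel

# Content/spatial-aware prompt map: weighted sum of the 1x1 prompts.
prompt_map = weights @ prompt_bank             # (H, W, C)
print(prompt_map.shape)                        # -> (4, 4, 8)
```

Because the prompts themselves carry no spatial extent, the spatial adaptivity comes entirely from the per-pixel mixing weights, which keeps the prompt bank small.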


Framework of the proposed UCIP.

UCSR Dataset Information

We build an all-in-one benchmark dataset for the compressed image super-resolution (CSR) task by collecting datasets compressed with 6 popular and diverse traditional and learning-based codecs, including traditional codecs: JPEG, HM, VTM; and learning-based codecs: \( C_{\text{PSNR}} \), \( C_{\text{SSIM}} \), HIFIC, resulting in 23 common degradations. We list the detailed quality factor (QF), quantization parameter (QP), and compression mode (Mode) below (from left to right: poorer quality -> better quality):

  • JPEG: QF=10,20,30,40
  • HM: QP=47,42,37,32
  • VTM: QP=47,42,37,32
  • \( C_{\text{PSNR}} \): Mode=1,2,3,4
  • \( C_{\text{SSIM}} \): Mode=1,2,3,4
  • HIFIC: Mode='low', 'med', 'high'
We establish our UCSR dataset based on the popular high-quality DF2K dataset. Taking the original image as the ground truth, we generate the compressed low-resolution image with x4 bicubic downsampling followed by the different compression codecs. For evaluation, we follow common image super-resolution (ISR) practice and adopt five standard benchmarks: Set5, Set14, BSD100, Urban100, and Manga109. We compress the x4 downsampled versions of these images with the various codecs. Our full dataset, including training and testing images, can be downloaded via this link. Note that the ground-truth images of DF2K should be downloaded via DIV2K (800 images) and Flickr2K (2650 images).
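For the JPEG branch of the pipeline, the degradation above (x4 bicubic downsampling, then compression at a given QF) can be sketched with Pillow. This is a hedged illustration, not the released generation script; the function name `make_csr_input` is our own, and the HM/VTM/learned-codec branches require their respective encoders instead of the JPEG save step shown here.

```python
from io import BytesIO

from PIL import Image


def make_csr_input(hr_img: Image.Image, qf: int = 10, scale: int = 4) -> Image.Image:
    """Bicubic x`scale` downsample the HR image, then JPEG-compress at quality `qf`."""
    lr = hr_img.resize(
        (hr_img.width // scale, hr_img.height // scale), Image.BICUBIC
    )
    buf = BytesIO()
    lr.save(buf, format="JPEG", quality=qf)   # in-memory round trip through the codec
    buf.seek(0)
    return Image.open(buf).convert("RGB")


# Toy example: a flat 256x256 image becomes a 64x64 compressed LR input.
hr = Image.new("RGB", (256, 256), (128, 64, 32))
lr = make_csr_input(hr, qf=10)
print(lr.size)  # -> (64, 64)
```

Sweeping `qf` over 10, 20, 30, and 40 reproduces the four JPEG degradation levels listed above.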


UCSR Dataset

BibTeX


      @inproceedings{li2024ucip,
        title={UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt},
        author={Li, Xin and Li, Bingchen and Jin, Yeying and Lan, Cuiling and Zhu, Hanxin and Ren, Yulin and Chen, Zhibo},
        booktitle={European Conference on Computer Vision},
        year={2024},
        organization={Springer}
      }