Xirong Li

ICMR2020: iCap: Interactive Image Captioning with Predictive Text

Our ICMR’20 paper on interactive image captioning is online.

In this paper we study a brand new topic of interactive image captioning with human in the loop. Different from automated image captioning where a given test image is the sole input in the inference stage, we have access to both the test image and a sequence of (incomplete) user-input sentences in the interactive scenario. We formulate the problem as Visually Conditioned Sentence Completion (VCSC). For VCSC, we propose ABD-Cap, asynchronous bidirectional decoding for image caption completion. With ABD-Cap as the core module, we build iCap, a web-based interactive image captioning system capable of predicting new text with respect to live input from a user. A number of experiments covering both automated evaluations and real user studies show the viability of our proposals.

Zhengxiong Jia, Xirong Li: iCap: Interactive Image Captioning with Predictive Text. In: ACM International Conference on Multimedia Retrieval (ICMR), 2020.

AE for the Multimedia Systems journal

I am delighted to serve as an Associate Editor of the Multimedia Systems journal. This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications.

MICCAI2019: Fully Deep Learning for Slit-lamp Photo based Nuclear Cataract Grading

Our MICCAI2019 paper on automated nuclear cataract grading is online.

Age-related cataract is a priority eye disease, with nuclear cataract as its most common type. This paper aims for automated nuclear cataract grading based on slit-lamp photos. Different from previous efforts which rely on traditional feature extraction and grade modeling techniques, we propose in this paper a fully deep learning based solution. Given a slit-lamp photo, we localize its nuclear region by Faster R-CNN, followed by a ResNet-101 based grading model. In order to alleviate the issue of imbalanced data, a simple batch balancing strategy is introduced for improving the training of the grading network. Tested on a clinical dataset of 157 slit-lamp photos from 39 female and 31 male patients, the proposed solution outperforms the state-of-the-art, reducing the mean absolute error from 0.357 to 0.313. In addition, our solution processes a slit-lamp photo in approximately 0.1 second, which is two order faster than the state-of-the-art. With its effectiveness and efficiency, the new solution is promising for automated nuclear cataract grading.

miccai2019-nuclear-cataract-grading

Chaoxi Xu, Xiangjia Zhu, Wenwen He, Yi Lu, Xixi He, Zongjiang Shang, Jun Wu, Keke Zhang, Yinglei Zhang, Xianfang Rong, Zhennan Zhao, Lei Cai, Dayong Ding, Xirong Li: Fully Deep Learning for Slit-lamp Photo based Nuclear Cataract Grading. In: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2019, (early accept).

MMM2019: Four Models for Automatic Recognition of Left and Right Eye in Fundus Images

Our MMM2019 paper on recognizing Left / Right Eye in Fundus Images is online.

Fundus image analysis is crucial for eye condition screening and diagnosis and consequently personalized health management in a long term. This paper targets at left and right eye recognition, a basic module for fundus image analysis. We study how to automatically assign left-eye/right-eye labels to fundus images of posterior pole. For this under-explored task, four models are developed. Two of them are based on optic disc localization, using extremely simple max intensity and more advanced Faster R-CNN, respectively. The other two models require no localization, but perform holistic image classification using classical Local Binary Patterns (LBP) features and fine-tuned ResNet18, respectively. The four models are tested on a real-world set of 1,633 fundus images from 834 subjects. Fine-tuned ResNet-18 has the highest accuracy of 0.9847. Interestingly, the LBP based model, with the trick of left-right contrastive classification, performs closely to the deep model, with an accuracy of 0.9718.

Xin Lai, Xirong Li, Rui Qian, Dayong Ding, Jun Wu, Jieping Xu: Four Models for Automatic Recognition of Left and Right Eye in Fundus Images. the 25th International Conference on MultiMedia Modeling (MMM), 2019.

ACCV2018: Laser Scar Detection in Fundus Images using Convolutional Neural Networks

We are going to present our work on detecting laser scars in color fundus images at the 14th Asian Conference on Computer Vision (ACCV 2018) at Perth, Australia. This is a joint work with Vistel Inc. and Peking Union Medical College Hospital.

In diabetic eye screening programme, a special pathway is designed for those who have received laser photocoagulation treatment. The treatment leaves behind circular or irregular scars in the retina. Laser scar detection in fundus images is thus important for automated DR screening. Despite its importance, the problem is understudied in terms of both datasets and methods. This paper makes the first attempt to detect laser-scar images by deep learning. To that end, we contribute to the community Fundus10K, a large-scale expert-labeled dataset for training and evaluating laser scar detectors. We study in this new context major design choices of state-of-the-art Convolutional Neural Networks including Inception-v3, ResNet and DenseNet. For more effective training we exploit transfer learning that passes on trained weights of ImageNet models to their laser-scar countcerparts. Experiments on the new dataset shows that our best model detects laser-scar images with sensitivity of 0.962, specificity of 0.999, precision of 0.974 and AP of 0.988 and AUC of 0.999. The same model is tested on the public LMD-BAPT test set, obtaining sensitivity of 0.765, specificity of 1, precision of 1, AP of 0.975 and AUC of 0.991, outperforming the state-of-the-art with a large margin. Data is available at https://github.com/li-xirong/fundus10k/

Qijie Wei, Xirong Li, Hao Wang, Dayong Ding, Weihong Yu, Youxin Chen: Laser Scar Detection in Fundus Images using Convolutional Neural Networks. Asian Conference on Computer Vision (ACCV), 2018.

Feature Re-Learning with Data Augmentation for Content-based Video Recommendation

We are going to present our work on content-based video recommendation in the Multimedia Grand Challenge session of the forthcoming ACM Multimedia 2018 Conference at Seoul. Source code will be available shortly at https://github.com/danieljf24/cbvr.

This paper describes our solution for the Hulu Content-based Video Relevance Prediction Challenge. Noting the deficiency of the original features, we propose feature re-learning to improve video relevance prediction. To generate more training instances for supervised learning, we develop two data augmentation strategies, one for frame-level features and the other for video-level features. In addition, late fusion of multiple models is employed to further boost the performance. Evaluation conducted by the organizers shows that our best run outperforms the Hulu baseline, obtaining relative improvements of 26.2% and 30.2% on the TV-shows track and the Movies track, respectively, in terms of recall@100. The results clearly justify the effectiveness of the proposed solution.

Jianfeng Dong, Xirong Li, Chaoxi Xu, Gang Yang, Xun Wang: Feature Re-Learning with Data Augmentation for Content-based Video Recommendation. ACM Multimedia, 2018, (Grand challenge paper).