Search | arXiv e-print repository

Self-Prior Guided Mamba-UNet Networks for Medical Image Super-Resolution

Authors: Zexin Ji, Beiji Zou, Xiaoyan Kui, Pierre Vera, Su Ruan

Abstract: In this paper, we propose a self-prior guided Mamba-UNet network (SMamba-UNet) for medical image super-resolution. Existing methods are primarily based on convolutional neural networks (CNNs) or Transformers. CNNs-based methods fail to capture long-range dependencies, while Transformer-based approaches face heavy calculation challenges due to their quadratic computational complexity. Recently, Sta… ▽ More In this paper, we propose a self-prior guided Mamba-UNet network (SMamba-UNet) for medical image super-resolution. Existing methods are primarily based on convolutional neural networks (CNNs) or Transformers. CNNs-based methods fail to capture long-range dependencies, while Transformer-based approaches face heavy calculation challenges due to their quadratic computational complexity. Recently, State Space Models (SSMs) especially Mamba have emerged, capable of modeling long-range dependencies with linear computational complexity. Inspired by Mamba, our approach aims to learn the self-prior multi-scale contextual features under Mamba-UNet networks, which may help to super-resolve low-resolution medical images in an efficient way. Specifically, we obtain self-priors by perturbing the brightness inpainting of the input image during network training, which can learn detailed texture and brightness information that is beneficial for super-resolution. Furthermore, we combine Mamba with Unet network to mine global features at different levels. We also design an improved 2D-Selective-Scan (ISS2D) module to divide image features into different directional sequences to learn long-range dependencies in multiple directions, and adaptively fuse sequence information to enhance super-resolved feature representation. Both qualitative and quantitative experimental results demonstrate that our approach outperforms current state-of-the-art methods on two public medical datasets: the IXI and fastMRI. △ Less

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05969 [pdf, other]

Deform-Mamba Network for MRI Super-Resolution

Authors: Zexin Ji, Beiji Zou, Xiaoyan Kui, Pierre Vera, Su Ruan

Abstract: In this paper, we propose a new architecture, called Deform-Mamba, for MR image super-resolution. Unlike conventional CNN or Transformer-based super-resolution approaches which encounter challenges related to the local respective field or heavy computational cost, our approach aims to effectively explore the local and global information of images. Specifically, we develop a Deform-Mamba encoder wh… ▽ More In this paper, we propose a new architecture, called Deform-Mamba, for MR image super-resolution. Unlike conventional CNN or Transformer-based super-resolution approaches which encounter challenges related to the local respective field or heavy computational cost, our approach aims to effectively explore the local and global information of images. Specifically, we develop a Deform-Mamba encoder which is composed of two branches, modulated deform block and vision Mamba block. We also design a multi-view context module in the bottleneck layer to explore the multi-view contextual content. Thanks to the extracted features of the encoder, which include content-adaptive local and efficient global information, the vision Mamba decoder finally generates high-quality MR images. Moreover, we introduce a contrastive edge loss to promote the reconstruction of edge and contrast related content. Quantitative and qualitative experimental results indicate that our approach on IXI and fastMRI datasets achieves competitive performance. △ Less

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.00038 [pdf, other]

JungleGPT: Designing and Optimizing Compound AI Systems for E-Commerce

Authors: Sherry Ruan, Tian Zhao

Abstract: LLMs have significantly advanced the e-commerce industry by powering applications such as personalized recommendations and customer service. However, most current efforts focus solely on monolithic LLMs and fall short in addressing the complexity and scale of real-world e-commerce scenarios. In this work, we present JungleGPT, the first compound AI system tailored for real-world e-commerce applica… ▽ More LLMs have significantly advanced the e-commerce industry by powering applications such as personalized recommendations and customer service. However, most current efforts focus solely on monolithic LLMs and fall short in addressing the complexity and scale of real-world e-commerce scenarios. In this work, we present JungleGPT, the first compound AI system tailored for real-world e-commerce applications. We outline the system's design and the techniques used to optimize its performance for practical use cases, which have proven to reduce inference costs to less than 1% of what they would be with a powerful, monolithic LLM. △ Less

Submitted 28 May, 2024; originally announced July 2024.

arXiv:2406.13721 [pdf, other]

Which One Changes More? A Novel Radial Visualization for State Change Comparison

Authors: Shaolun Ruan, Yong Wang, Qiang Guan

Abstract: It is common to compare state changes of multiple data items and identify which data items have changed more in various applications (e.g., annual GDP growth of different countries and daily increase of new COVID-19 cases in different regions). Grouped bar charts and slope graphs can visualize both state changes and their initial and final states of multiple data items, and are thus widely used fo… ▽ More It is common to compare state changes of multiple data items and identify which data items have changed more in various applications (e.g., annual GDP growth of different countries and daily increase of new COVID-19 cases in different regions). Grouped bar charts and slope graphs can visualize both state changes and their initial and final states of multiple data items, and are thus widely used for state change comparison. But they leverage implicit bar differences or line slopes to indicate state changes, which has been proven less effective for visual comparison. Both visualizations also suffer from visual scalability issues when an increasing number of data items need to be compared. This paper fills the research gap by proposing a novel radial visualization called Intercept Graph to facilitate visual comparison of multiple state changes. It consists of inner and outer axes, and leverages the lengths of line segments intercepted by the inner axis to explicitly encode the state changes. Users can interactively adjust the inner axis to filter large changes of their interest and magnify the difference of relatively-similar state changes, enhancing its visual scalability and comparison accuracy. We extensively evaluate the Intercept Graph in comparison with baseline methods through two usage scenarios, quantitative metric evaluations, and well-designed crowdsourcing user studies with 50 participants. Our results demonstrate the usefulness and effectiveness of the Intercept Graph. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.11659 [pdf, other]

Discriminative Hamiltonian Variational Autoencoder for Accurate Tumor Segmentation in Data-Scarce Regimes

Authors: Aghiles Kebaili, Jérôme Lapuyade-Lahorgue, Pierre Vera, Su Ruan

Abstract: Deep learning has gained significant attention in medical image segmentation. However, the limited availability of annotated training data presents a challenge to achieving accurate results. In efforts to overcome this challenge, data augmentation techniques have been proposed. However, the majority of these approaches primarily focus on image generation. For segmentation tasks, providing both ima… ▽ More Deep learning has gained significant attention in medical image segmentation. However, the limited availability of annotated training data presents a challenge to achieving accurate results. In efforts to overcome this challenge, data augmentation techniques have been proposed. However, the majority of these approaches primarily focus on image generation. For segmentation tasks, providing both images and their corresponding target masks is crucial, and the generation of diverse and realistic samples remains a complex task, especially when working with limited training datasets. To this end, we propose a new end-to-end hybrid architecture based on Hamiltonian Variational Autoencoders (HVAE) and a discriminative regularization to improve the quality of generated images. Our method provides an accuracte estimation of the joint distribution of the images and masks, resulting in the generation of realistic medical images with reduced artifacts and off-distribution instances. As generating 3D volumes requires substantial time and memory, our architecture operates on a slice-by-slice basis to segment 3D volumes, capitilizing on the richly augmented dataset. Experiments conducted on two public datasets, BRATS (MRI modality) and HECKTOR (PET modality), demonstrate the efficacy of our proposed method on different medical imaging modalities with limited data. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.05421 [pdf, other]

3D MRI Synthesis with Slice-Based Latent Diffusion Models: Improving Tumor Segmentation Tasks in Data-Scarce Regimes

Authors: Aghiles Kebaili, Jérôme Lapuyade-Lahorgue, Pierre Vera, Su Ruan

Abstract: Despite the increasing use of deep learning in medical image segmentation, the limited availability of annotated training data remains a major challenge due to the time-consuming data acquisition and privacy regulations. In the context of segmentation tasks, providing both medical images and their corresponding target masks is essential. However, conventional data augmentation approaches mainly fo… ▽ More Despite the increasing use of deep learning in medical image segmentation, the limited availability of annotated training data remains a major challenge due to the time-consuming data acquisition and privacy regulations. In the context of segmentation tasks, providing both medical images and their corresponding target masks is essential. However, conventional data augmentation approaches mainly focus on image synthesis. In this study, we propose a novel slice-based latent diffusion architecture designed to address the complexities of volumetric data generation in a slice-by-slice fashion. This approach extends the joint distribution modeling of medical images and their associated masks, allowing a simultaneous generation of both under data-scarce regimes. Our approach mitigates the computational complexity and memory expensiveness typically associated with diffusion models. Furthermore, our architecture can be conditioned by tumor characteristics, including size, shape, and relative position, thereby providing a diverse range of tumor variations. Experiments on a segmentation task using the BRATS2022 confirm the effectiveness of the synthesized volumes and masks for data augmentation. △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.03710 [pdf, other]

TwinS: Revisiting Non-Stationarity in Multivariate Time Series Forecasting

Authors: Jiaxi Hu, Qingsong Wen, Sijie Ruan, Li Liu, Yuxuan Liang

Abstract: Recently, multivariate time series forecasting tasks have garnered increasing attention due to their significant practical applications, leading to the emergence of various deep forecasting models. However, real-world time series exhibit pronounced non-stationary distribution characteristics. These characteristics are not solely limited to time-varying statistical properties highlighted by non-sta… ▽ More Recently, multivariate time series forecasting tasks have garnered increasing attention due to their significant practical applications, leading to the emergence of various deep forecasting models. However, real-world time series exhibit pronounced non-stationary distribution characteristics. These characteristics are not solely limited to time-varying statistical properties highlighted by non-stationary Transformer but also encompass three key aspects: nested periodicity, absence of periodic distributions, and hysteresis among time variables. In this paper, we begin by validating this theory through wavelet analysis and propose the Transformer-based TwinS model, which consists of three modules to address the non-stationary periodic distributions: Wavelet Convolution, Period-Aware Attention, and Channel-Temporal Mixed MLP. Specifically, The Wavelet Convolution models nested periods by scaling the convolution kernel size like wavelet transform. The Period-Aware Attention guides attention computation by generating period relevance scores through a convolutional sub-network. The Channel-Temporal Mixed MLP captures the overall relationships between time series through channel-time mixing learning. TwinS achieves SOTA performance compared to mainstream TS models, with a maximum improvement in MSE of 25.8\% over PatchTST. △ Less

Submitted 14 July, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.19802 [pdf, other]

Exploring the Robustness of Decision-Level Through Adversarial Attacks on LLM-Based Embodied Models

Authors: Shuyuan Liu, Jiawei Chen, Shouwei Ruan, Hang Su, Zhaoxia Yin

Abstract: Embodied intelligence empowers agents with a profound sense of perception, enabling them to respond in a manner closely aligned with real-world situations. Large Language Models (LLMs) delve into language instructions with depth, serving a crucial role in generating plans for intricate tasks. Thus, LLM-based embodied models further enhance the agent's capacity to comprehend and process information… ▽ More Embodied intelligence empowers agents with a profound sense of perception, enabling them to respond in a manner closely aligned with real-world situations. Large Language Models (LLMs) delve into language instructions with depth, serving a crucial role in generating plans for intricate tasks. Thus, LLM-based embodied models further enhance the agent's capacity to comprehend and process information. However, this amalgamation also ushers in new challenges in the pursuit of heightened intelligence. Specifically, attackers can manipulate LLMs to produce irrelevant or even malicious outputs by altering their prompts. Confronted with this challenge, we observe a notable absence of multi-modal datasets essential for comprehensively evaluating the robustness of LLM-based embodied models. Consequently, we construct the Embodied Intelligent Robot Attack Dataset (EIRAD), tailored specifically for robustness evaluation. Additionally, two attack strategies are devised, including untargeted attacks and targeted attacks, to effectively simulate a range of diverse attack scenarios. At the same time, during the attack process, to more accurately ascertain whether our method is successful in attacking the LLM-based embodied model, we devise a new attack success evaluation method utilizing the BLIP2 model. Recognizing the time and cost-intensive nature of the GCG algorithm in attacks, we devise a scheme for prompt suffix initialization based on various target tasks, thus expediting the convergence process. Experimental results demonstrate that our method exhibits a superior attack success rate when targeting LLM-based embodied models, indicating a lower level of decision-level robustness in these models. △ Less

Submitted 16 July, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.04490 [pdf, other]

Resource-Efficient and Self-Adaptive Quantum Search in a Quantum-Classical Hybrid System

Authors: Zihao Jiang, Zefan Du, Shaolun Ruan, Juntao Chen, Yong Wang, Long Cheng, Rajkumar Buyya, Ying Mao

Abstract: Over the past decade, the rapid advancement of deep learning and big data applications has been driven by vast datasets and high-performance computing systems. However, as we approach the physical limits of semiconductor fabrication in the post-Moore's Law era, questions arise about the future of these applications. In parallel, quantum computing has made significant progress with the potential to… ▽ More Over the past decade, the rapid advancement of deep learning and big data applications has been driven by vast datasets and high-performance computing systems. However, as we approach the physical limits of semiconductor fabrication in the post-Moore's Law era, questions arise about the future of these applications. In parallel, quantum computing has made significant progress with the potential to break limits. Major companies like IBM, Google, and Microsoft provide access to noisy intermediate-scale quantum (NISQ) computers. Despite the theoretical promise of Shor's and Grover's algorithms, practical implementation on current quantum devices faces challenges, such as demanding additional resources and a high number of controlled operations. To tackle these challenges and optimize the utilization of limited onboard qubits, we introduce ReSaQuS, a resource-efficient index-value searching system within a quantum-classical hybrid framework. Building on Grover's algorithm, ReSaQuS employs an automatically managed iterative search approach. This method analyzes problem size, filters fewer probable data points, and progressively reduces the dataset with decreasing qubit requirements. Implemented using Qiskit and evaluated through extensive experiments, ReSaQuS has demonstrated a substantial reduction, up to 86.36\% in cumulative qubit consumption and 72.72\% in active periods, reinforcing its potential in optimizing quantum computing application deployment. △ Less

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.00452 [pdf, other]

Predictive Accuracy-Based Active Learning for Medical Image Segmentation

Authors: Jun Shi, Shulan Ruan, Ziqi Zhu, Minfan Zhao, Hong An, Xudong Xue, Bing Yan

Abstract: Active learning is considered a viable solution to alleviate the contradiction between the high dependency of deep learning-based segmentation methods on annotated data and the expensive pixel-level annotation cost of medical images. However, most existing methods suffer from unreliable uncertainty assessment and the struggle to balance diversity and informativeness, leading to poor performance in… ▽ More Active learning is considered a viable solution to alleviate the contradiction between the high dependency of deep learning-based segmentation methods on annotated data and the expensive pixel-level annotation cost of medical images. However, most existing methods suffer from unreliable uncertainty assessment and the struggle to balance diversity and informativeness, leading to poor performance in segmentation tasks. In response, we propose an efficient Predictive Accuracy-based Active Learning (PAAL) method for medical image segmentation, first introducing predictive accuracy to define uncertainty. Specifically, PAAL mainly consists of an Accuracy Predictor (AP) and a Weighted Polling Strategy (WPS). The former is an attached learnable module that can accurately predict the segmentation accuracy of unlabeled samples relative to the target model with the predicted posterior probability. The latter provides an efficient hybrid querying scheme by combining predicted accuracy and feature representation, aiming to ensure the uncertainty and diversity of the acquired samples. Extensive experiment results on multiple datasets demonstrate the superiority of PAAL. PAAL achieves comparable accuracy to fully annotated data while reducing annotation costs by approximately 50% to 80%, showcasing significant potential in clinical applications. The code is available at https://github.com/shijun18/PAAL-MedSeg. △ Less

Submitted 29 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: 9 pages, 4 figures

arXiv:2404.14073 [pdf, other]

Towards Robust Trajectory Representations: Isolating Environmental Confounders with Causal Learning

Authors: Kang Luo, Yuanshao Zhu, Wei Chen, Kun Wang, Zhengyang Zhou, Sijie Ruan, Yuxuan Liang

Abstract: Trajectory modeling refers to characterizing human movement behavior, serving as a pivotal step in understanding mobility patterns. Nevertheless, existing studies typically ignore the confounding effects of geospatial context, leading to the acquisition of spurious correlations and limited generalization capabilities. To bridge this gap, we initially formulate a Structural Causal Model (SCM) to de… ▽ More Trajectory modeling refers to characterizing human movement behavior, serving as a pivotal step in understanding mobility patterns. Nevertheless, existing studies typically ignore the confounding effects of geospatial context, leading to the acquisition of spurious correlations and limited generalization capabilities. To bridge this gap, we initially formulate a Structural Causal Model (SCM) to decipher the trajectory representation learning process from a causal perspective. Building upon the SCM, we further present a Trajectory modeling framework (TrajCL) based on Causal Learning, which leverages the backdoor adjustment theory as an intervention tool to eliminate the spurious correlations between geospatial context and trajectories. Extensive experiments on two real-world datasets verify that TrajCL markedly enhances performance in trajectory classification tasks while showcasing superior generalization and interpretability. △ Less

Submitted 22 April, 2024; originally announced April 2024.

Comments: The paper has been accepted by IJCAI 2024

arXiv:2404.12139 [pdf, other]

Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models

Authors: Shouwei Ruan, Yinpeng Dong, Hanqing Liu, Yao Huang, Hang Su, Xingxing Wei

Abstract: Vision-Language Pre-training (VLP) models like CLIP have achieved remarkable success in computer vision and particularly demonstrated superior robustness to distribution shifts of 2D images. However, their robustness under 3D viewpoint variations is still limited, which can hinder the development for real-world applications. This paper successfully addresses this concern while keeping VLPs' origin… ▽ More Vision-Language Pre-training (VLP) models like CLIP have achieved remarkable success in computer vision and particularly demonstrated superior robustness to distribution shifts of 2D images. However, their robustness under 3D viewpoint variations is still limited, which can hinder the development for real-world applications. This paper successfully addresses this concern while keeping VLPs' original performance by breaking through two primary obstacles: 1) the scarcity of training data and 2) the suboptimal fine-tuning paradigms. To combat data scarcity, we build the Multi-View Caption (MVCap) dataset -- a comprehensive collection of over four million multi-view image-text pairs across more than 100K objects, providing more potential for VLP models to develop generalizable viewpoint-invariant representations. To address the limitations of existing paradigms in performance trade-offs and training efficiency, we design a novel fine-tuning framework named Omniview-Tuning (OVT). Specifically, OVT introduces a Cross-Viewpoint Alignment objective through a minimax-like optimization strategy, which effectively aligns representations of identical objects from diverse viewpoints without causing overfitting. Additionally, OVT fine-tunes VLP models in a parameter-efficient manner, leading to minimal computational cost. Extensive experiments on various VLP models with different architectures validate that OVT significantly improves the models' resilience to viewpoint shifts and keeps the original performance, establishing a pioneering standard for boosting the viewpoint invariance of VLP models. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: 20 pages

arXiv:2404.05363 [pdf, other]

A parameter-free clustering algorithm for missing datasets

Authors: Qi Li, Xianjun Zeng, Shuliang Wang, Wenhao Zhu, Shijie Ruan, Zhimeng Yuan

Abstract: Missing datasets, in which some objects have missing values in certain dimensions, are prevalent in the Real-world. Existing clustering algorithms for missing datasets first impute the missing values and then perform clustering. However, both the imputation and clustering processes require input parameters. Too many input parameters inevitably increase the difficulty of obtaining accurate clusteri… ▽ More Missing datasets, in which some objects have missing values in certain dimensions, are prevalent in the Real-world. Existing clustering algorithms for missing datasets first impute the missing values and then perform clustering. However, both the imputation and clustering processes require input parameters. Too many input parameters inevitably increase the difficulty of obtaining accurate clustering results. Although some studies have shown that decision graphs can replace the input parameters of clustering algorithms, current decision graphs require equivalent dimensions among objects and are therefore not suitable for missing datasets. To this end, we propose a Single-Dimensional Clustering algorithm, i.e., SDC. SDC, which removes the imputation process and adapts the decision graph to the missing datasets by splitting dimension and partition intersection fusion, can obtain valid clustering results on the missing datasets without input parameters. Experiments demonstrate that, across three evaluation metrics, SDC outperforms baseline algorithms by at least 13.7%(NMI), 23.8%(ARI), and 8.1%(Purity). △ Less

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2312.15276 [pdf, other]

VIOLET: Visual Analytics for Explainable Quantum Neural Networks

Authors: Shaolun Ruan, Zhiding Liang, Qiang Guan, Paul Griffin, Xiaolin Wen, Yanna Lin, Yong Wang

Abstract: With the rapid development of Quantum Machine Learning, quantum neural networks (QNN) have experienced great advancement in the past few years, harnessing the advantages of quantum computing to significantly speed up classical machine learning tasks. Despite their increasing popularity, the quantum neural network is quite counter-intuitive and difficult to understand, due to their unique quantum-s… ▽ More With the rapid development of Quantum Machine Learning, quantum neural networks (QNN) have experienced great advancement in the past few years, harnessing the advantages of quantum computing to significantly speed up classical machine learning tasks. Despite their increasing popularity, the quantum neural network is quite counter-intuitive and difficult to understand, due to their unique quantum-specific layers (e.g., data encoding and measurement) in their architecture. It prevents QNN users and researchers from effectively understanding its inner workings and exploring the model training status. To fill the research gap, we propose VIOLET, a novel visual analytics approach to improve the explainability of quantum neural networks. Guided by the design requirements distilled from the interviews with domain experts and the literature survey, we developed three visualization views: the Encoder View unveils the process of converting classical input data into quantum states, the Ansatz View reveals the temporal evolution of quantum states in the training process, and the Feature View displays the features a QNN has learned after the training process. Two novel visual designs, i.e., satellite chart and augmented heatmap, are proposed to visually explain the variational parameters and quantum circuit measurements respectively. We evaluate VIOLET through two case studies and in-depth interviews with 12 domain experts. The results demonstrate the effectiveness and usability of VIOLET in helping QNN users and developers intuitively understand and explore quantum neural networks △ Less

Submitted 23 December, 2023; originally announced December 2023.

arXiv:2312.09558 [pdf, other]

Towards Transferable Targeted 3D Adversarial Attack in the Physical World

Authors: Yao Huang, Yinpeng Dong, Shouwei Ruan, Xiao Yang, Hang Su, Xingxing Wei

Abstract: Compared with transferable untargeted attacks, transferable targeted adversarial attacks could specify the misclassification categories of adversarial samples, posing a greater threat to security-critical tasks. In the meanwhile, 3D adversarial samples, due to their potential of multi-view robustness, can more comprehensively identify weaknesses in existing deep learning systems, possessing great… ▽ More Compared with transferable untargeted attacks, transferable targeted adversarial attacks could specify the misclassification categories of adversarial samples, posing a greater threat to security-critical tasks. In the meanwhile, 3D adversarial samples, due to their potential of multi-view robustness, can more comprehensively identify weaknesses in existing deep learning systems, possessing great application value. However, the field of transferable targeted 3D adversarial attacks remains vacant. The goal of this work is to develop a more effective technique that could generate transferable targeted 3D adversarial examples, filling the gap in this field. To achieve this goal, we design a novel framework named TT3D that could rapidly reconstruct from few multi-view images into Transferable Targeted 3D textured meshes. While existing mesh-based texture optimization methods compute gradients in the high-dimensional mesh space and easily fall into local optima, leading to unsatisfactory transferability and distinct distortions, TT3D innovatively performs dual optimization towards both feature grid and Multi-layer Perceptron (MLP) parameters in the grid-based NeRF space, which significantly enhances black-box transferability while enjoying naturalness. Experimental results show that TT3D not only exhibits superior cross-model transferability but also maintains considerable adaptability across different renders and vision tasks. More importantly, we produce 3D adversarial examples with 3D printing techniques in the real world and verify their robust performance under various scenarios. △ Less

Submitted 10 June, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by CVPR 2024

arXiv:2311.10472 [pdf, other]

doi 10.1007/978-981-97-1335-6_3

End-to-end autoencoding architecture for the simultaneous generation of medical images and corresponding segmentation masks

Authors: Aghiles Kebaili, Jérôme Lapuyade-Lahorgue, Pierre Vera, Su Ruan

Abstract: Despite the increasing use of deep learning in medical image segmentation, acquiring sufficient training data remains a challenge in the medical field. In response, data augmentation techniques have been proposed; however, the generation of diverse and realistic medical images and their corresponding masks remains a difficult task, especially when working with insufficient training sets. To addres… ▽ More Despite the increasing use of deep learning in medical image segmentation, acquiring sufficient training data remains a challenge in the medical field. In response, data augmentation techniques have been proposed; however, the generation of diverse and realistic medical images and their corresponding masks remains a difficult task, especially when working with insufficient training sets. To address these limitations, we present an end-to-end architecture based on the Hamiltonian Variational Autoencoder (HVAE). This approach yields an improved posterior distribution approximation compared to traditional Variational Autoencoders (VAE), resulting in higher image generation quality. Our method outperforms generative adversarial architectures under data-scarce conditions, showcasing enhancements in image quality and precise tumor mask synthesis. We conduct experiments on two publicly available datasets, MICCAI's Brain Tumor Segmentation Challenge (BRATS), and Head and Neck Tumor Segmentation Challenge (HECKTOR), demonstrating the effectiveness of our method on different medical imaging modalities. △ Less

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2311.07980 [pdf, other]

QuantumEyes: Towards Better Interpretability of Quantum Circuits

Authors: Shaolun Ruan, Qiang Guan, Paul Griffin, Ying Mao, Yong Wang

Abstract: Quantum computing offers significant speedup compared to classical computing, which has led to a growing interest among users in learning and applying quantum computing across various applications. However, quantum circuits, which are fundamental for implementing quantum algorithms, can be challenging for users to understand due to their underlying logic, such as the temporal evolution of quantum… ▽ More Quantum computing offers significant speedup compared to classical computing, which has led to a growing interest among users in learning and applying quantum computing across various applications. However, quantum circuits, which are fundamental for implementing quantum algorithms, can be challenging for users to understand due to their underlying logic, such as the temporal evolution of quantum states and the effect of quantum amplitudes on the probability of basis quantum states. To fill this research gap, we propose QuantumEyes, an interactive visual analytics system to enhance the interpretability of quantum circuits through both global and local levels. For the global-level analysis, we present three coupled visualizations to delineate the changes of quantum states and the underlying reasons: a Probability Summary View to overview the probability evolution of quantum states; a State Evolution View to enable an in-depth analysis of the influence of quantum gates on the quantum states; a Gate Explanation View to show the individual qubit states and facilitate a better understanding of the effect of quantum gates. For the local-level analysis, we design a novel geometrical visualization Dandelion Chart to explicitly reveal how the quantum amplitudes affect the probability of the quantum state. We thoroughly evaluated QuantumEyes as well as the novel QuantumEyes integrated into it through two case studies on different types of quantum algorithms and in-depth expert interviews with 12 domain experts. The results demonstrate the effectiveness and usability of our approach in enhancing the interpretability of quantum circuits. △ Less

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2311.02565 [pdf, other]

KITS: Inductive Spatio-Temporal Kriging with Increment Training Strategy

Authors: Qianxiong Xu, Cheng Long, Ziyue Li, Sijie Ruan, Rui Zhao, Zhishuai Li

Abstract: Sensors are commonly deployed to perceive the environment. However, due to the high cost, sensors are usually sparsely deployed. Kriging is the tailored task to infer the unobserved nodes (without sensors) using the observed source nodes (with sensors). The essence of kriging task is transferability. Recently, several inductive spatio-temporal kriging methods have been proposed based on graph neur… ▽ More Sensors are commonly deployed to perceive the environment. However, due to the high cost, sensors are usually sparsely deployed. Kriging is the tailored task to infer the unobserved nodes (without sensors) using the observed source nodes (with sensors). The essence of kriging task is transferability. Recently, several inductive spatio-temporal kriging methods have been proposed based on graph neural networks, being trained based on a graph built on top of observed nodes via pretext tasks such as masking nodes out and reconstructing them. However, the graph in training is inevitably much sparser than the graph in inference that includes all the observed and unobserved nodes. The learned pattern cannot be well generalized for inference, denoted as graph gap. To address this issue, we first present a novel Increment training strategy: instead of masking nodes (and reconstructing them), we add virtual nodes into the training graph so as to mitigate the graph gap issue naturally. Nevertheless, the empty-shell virtual nodes without labels could have bad-learned features and lack supervision signals. To solve these issues, we pair each virtual node with its most similar observed node and fuse their features together; to enhance the supervision signal, we construct reliable pseudo labels for virtual nodes. As a result, the learned pattern of virtual nodes could be safely transferred to real unobserved nodes for reliable kriging. We name our new Kriging model with Increment Training Strategy as KITS. Extensive experiments demonstrate that KITS consistently outperforms existing kriging methods by large margins, e.g., the improvement over MAE score could be as high as 18.33%. △ Less

Submitted 5 November, 2023; originally announced November 2023.

arXiv:2310.08523 [pdf, other]

LLM-augmented Preference Learning from Natural Language

Authors: Inwon Kang, Sikai Ruan, Tyler Ho, Jui-Chien Lin, Farhad Mohsin, Oshani Seneviratne, Lirong Xia

Abstract: Finding preferences expressed in natural language is an important but challenging task. State-of-the-art(SotA) methods leverage transformer-based models such as BERT, RoBERTa, etc. and graph neural architectures such as graph attention networks. Since Large Language Models (LLMs) are equipped to deal with larger context lengths and have much larger model sizes than the transformer-based model, we… ▽ More Finding preferences expressed in natural language is an important but challenging task. State-of-the-art(SotA) methods leverage transformer-based models such as BERT, RoBERTa, etc. and graph neural architectures such as graph attention networks. Since Large Language Models (LLMs) are equipped to deal with larger context lengths and have much larger model sizes than the transformer-based model, we investigate their ability to classify comparative text directly. This work aims to serve as a first step towards using LLMs for the CPC task. We design and conduct a set of experiments that format the classification task into an input prompt for the LLM and a methodology to get a fixed-format response that can be automatically evaluated. Comparing performances with existing methods, we see that pre-trained LLMs are able to outperform the previous SotA models with no fine-tuning involved. Our results show that the LLMs can consistently outperform the SotA when the target text is large -- i.e. composed of multiple sentences --, and are still comparable to the SotA performance in shorter text. We also find that few-shot learning yields better performance than zero-shot learning. △ Less

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.06873 [pdf, other]

A review of uncertainty quantification in medical image analysis: probabilistic and non-probabilistic methods

Authors: Ling Huang, Su Ruan, Yucheng Xing, Mengling Feng

Abstract: The comprehensive integration of machine learning healthcare models within clinical practice remains suboptimal, notwithstanding the proliferation of high-performing solutions reported in the literature. A predominant factor hindering widespread adoption pertains to an insufficiency of evidence affirming the reliability of the aforementioned models. Recently, uncertainty quantification methods hav… ▽ More The comprehensive integration of machine learning healthcare models within clinical practice remains suboptimal, notwithstanding the proliferation of high-performing solutions reported in the literature. A predominant factor hindering widespread adoption pertains to an insufficiency of evidence affirming the reliability of the aforementioned models. Recently, uncertainty quantification methods have been proposed as a potential solution to quantify the reliability of machine learning models and thus increase the interpretability and acceptability of the result. In this review, we offer a comprehensive overview of prevailing methods proposed to quantify uncertainty inherent in machine learning models developed for various medical image tasks. Contrary to earlier reviews that exclusively focused on probabilistic methods, this review also explores non-probabilistic approaches, thereby furnishing a more holistic survey of research pertaining to uncertainty quantification for machine learning models. Analysis of medical images with the summary and discussion on medical applications and the corresponding uncertainty evaluation protocols are presented, which focus on the specific challenges of uncertainty in medical image analysis. We also highlight some potential future research work at the end. Generally, this review aims to allow researchers from both clinical and technical backgrounds to gain a quick and yet in-depth understanding of the research in uncertainty quantification for medical image analysis machine learning models. △ Less

Submitted 9 October, 2023; originally announced October 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2210.03736 by other authors

arXiv:2309.05919 [pdf, other]

Deep evidential fusion with uncertainty quantification and contextual discounting for multimodal medical image segmentation

Authors: Ling Huang, Su Ruan, Pierre Decazes, Thierry Denoeux

Abstract: Single-modality medical images generally do not contain enough information to reach an accurate and reliable diagnosis. For this reason, physicians generally diagnose diseases based on multimodal medical images such as, e.g., PET/CT. The effective fusion of multimodal information is essential to reach a reliable decision and explain how the decision is made as well. In this paper, we propose a fus… ▽ More Single-modality medical images generally do not contain enough information to reach an accurate and reliable diagnosis. For this reason, physicians generally diagnose diseases based on multimodal medical images such as, e.g., PET/CT. The effective fusion of multimodal information is essential to reach a reliable decision and explain how the decision is made as well. In this paper, we propose a fusion framework for multimodal medical image segmentation based on deep learning and the Dempster-Shafer theory of evidence. In this framework, the reliability of each single modality image when segmenting different objects is taken into account by a contextual discounting operation. The discounted pieces of evidence from each modality are then combined by Dempster's rule to reach a final decision. Experimental results with a PET-CT dataset with lymphomas and a multi-MRI dataset with brain tumors show that our method outperforms the state-of-the-art methods in accuracy and reliability. △ Less

Submitted 11 September, 2023; originally announced September 2023.

arXiv:2309.03926 [pdf, other]

Large-Scale Automatic Audiobook Creation

Authors: Brendan Walsh, Mark Hamilton, Greg Newby, Xi Wang, Serena Ruan, Sheng Zhao, Lei He, Shaofei Zhang, Eric Dettinger, William T. Freeman, Markus Weimer

Abstract: An audiobook can dramatically improve a work of literature's accessibility and improve reader engagement. However, audiobooks can take hundreds of hours of human effort to create, edit, and publish. In this work, we present a system that can automatically generate high-quality audiobooks from online e-books. In particular, we leverage recent advances in neural text-to-speech to create and release… ▽ More An audiobook can dramatically improve a work of literature's accessibility and improve reader engagement. However, audiobooks can take hundreds of hours of human effort to create, edit, and publish. In this work, we present a system that can automatically generate high-quality audiobooks from online e-books. In particular, we leverage recent advances in neural text-to-speech to create and release thousands of human-quality, open-license audiobooks from the Project Gutenberg e-book collection. Our method can identify the proper subset of e-book content to read for a wide collection of diversely structured books and can operate on hundreds of books in parallel. Our system allows users to customize an audiobook's speaking speed and style, emotional intonation, and can even match a desired voice using a small amount of sample audio. This work contributed over five thousand open-license audiobooks and an interactive demo that allows users to quickly create their own customized audiobooks. To listen to the audiobook collection visit \url{https://aka.ms/audiobook}. △ Less

Submitted 7 September, 2023; originally announced September 2023.

arXiv:2307.13125 [pdf, other]

doi 10.3390/jimaging9040081

Deep Learning Approaches for Data Augmentation in Medical Imaging: A Review

Authors: Aghiles Kebaili, Jérôme Lapuyade-Lahorgue, Su Ruan

Abstract: Deep learning has become a popular tool for medical image analysis, but the limited availability of training data remains a major challenge, particularly in the medical field where data acquisition can be costly and subject to privacy regulations. Data augmentation techniques offer a solution by artificially increasing the number of training samples, but these techniques often produce limited and… ▽ More Deep learning has become a popular tool for medical image analysis, but the limited availability of training data remains a major challenge, particularly in the medical field where data acquisition can be costly and subject to privacy regulations. Data augmentation techniques offer a solution by artificially increasing the number of training samples, but these techniques often produce limited and unconvincing results. To address this issue, a growing number of studies have proposed the use of deep generative models to generate more realistic and diverse data that conform to the true distribution of the data. In this review, we focus on three types of deep generative models for medical image augmentation: variational autoencoders, generative adversarial networks, and diffusion models. We provide an overview of the current state of the art in each of these models and discuss their potential for use in different downstream tasks in medical imaging, including classification, segmentation, and cross-modal translation. We also evaluate the strengths and limitations of each model and suggest directions for future research in this field. Our goal is to provide a comprehensive review about the use of deep generative models for medical image augmentation and to highlight the potential of these models for improving the performance of deep learning algorithms in medical image analysis. △ Less

Submitted 24 July, 2023; originally announced July 2023.

arXiv:2307.11528 [pdf, other]

Improving Viewpoint Robustness for Visual Recognition via Adversarial Training

Authors: Shouwei Ruan, Yinpeng Dong, Hang Su, Jianteng Peng, Ning Chen, Xingxing Wei

Abstract: Viewpoint invariance remains challenging for visual recognition in the 3D world, as altering the viewing directions can significantly impact predictions for the same object. While substantial efforts have been dedicated to making neural networks invariant to 2D image translations and rotations, viewpoint invariance is rarely investigated. Motivated by the success of adversarial training in enhanci… ▽ More Viewpoint invariance remains challenging for visual recognition in the 3D world, as altering the viewing directions can significantly impact predictions for the same object. While substantial efforts have been dedicated to making neural networks invariant to 2D image translations and rotations, viewpoint invariance is rarely investigated. Motivated by the success of adversarial training in enhancing model robustness, we propose Viewpoint-Invariant Adversarial Training (VIAT) to improve the viewpoint robustness of image classifiers. Regarding viewpoint transformation as an attack, we formulate VIAT as a minimax optimization problem, where the inner maximization characterizes diverse adversarial viewpoints by learning a Gaussian mixture distribution based on the proposed attack method GMVFool. The outer minimization obtains a viewpoint-invariant classifier by minimizing the expected loss over the worst-case viewpoint distributions that can share the same one for different objects within the same category. Based on GMVFool, we contribute a large-scale dataset called ImageNet-V+ to benchmark viewpoint robustness. Experimental results show that VIAT significantly improves the viewpoint robustness of various image classifiers based on the diversity of adversarial viewpoints generated by GMVFool. Furthermore, we propose ViewRS, a certified viewpoint robustness method that provides a certified radius and accuracy to demonstrate the effectiveness of VIAT from the theoretical perspective. △ Less

Submitted 21 July, 2023; originally announced July 2023.

Comments: 14 pages, 12 figures. arXiv admin note: substantial text overlap with arXiv:2307.10235

arXiv:2307.10235 [pdf, other]

Towards Viewpoint-Invariant Visual Recognition via Adversarial Training

Authors: Shouwei Ruan, Yinpeng Dong, Hang Su, Jianteng Peng, Ning Chen, Xingxing Wei

Abstract: Visual recognition models are not invariant to viewpoint changes in the 3D world, as different viewing directions can dramatically affect the predictions given the same object. Although many efforts have been devoted to making neural networks invariant to 2D image translations and rotations, viewpoint invariance is rarely investigated. As most models process images in the perspective view, it is c… ▽ More Visual recognition models are not invariant to viewpoint changes in the 3D world, as different viewing directions can dramatically affect the predictions given the same object. Although many efforts have been devoted to making neural networks invariant to 2D image translations and rotations, viewpoint invariance is rarely investigated. As most models process images in the perspective view, it is challenging to impose invariance to 3D viewpoint changes based only on 2D inputs. Motivated by the success of adversarial training in promoting model robustness, we propose Viewpoint-Invariant Adversarial Training (VIAT) to improve viewpoint robustness of common image classifiers. By regarding viewpoint transformation as an attack, VIAT is formulated as a minimax optimization problem, where the inner maximization characterizes diverse adversarial viewpoints by learning a Gaussian mixture distribution based on a new attack GMVFool, while the outer minimization trains a viewpoint-invariant classifier by minimizing the expected loss over the worst-case adversarial viewpoint distributions. To further improve the generalization performance, a distribution sharing strategy is introduced leveraging the transferability of adversarial viewpoints across objects. Experiments validate the effectiveness of VIAT in improving the viewpoint robustness of various image classifiers based on the diversity of adversarial viewpoints generated by GMVFool. △ Less

Submitted 16 July, 2023; originally announced July 2023.

Comments: Accepted by ICCV 2023

arXiv:2307.01486 [pdf, other]

H-DenseFormer: An Efficient Hybrid Densely Connected Transformer for Multimodal Tumor Segmentation

Authors: Jun Shi, Hongyu Kan, Shulan Ruan, Ziqi Zhu, Minfan Zhao, Liang Qiao, Zhaohui Wang, Hong An, Xudong Xue

Abstract: Recently, deep learning methods have been widely used for tumor segmentation of multimodal medical images with promising results. However, most existing methods are limited by insufficient representational ability, specific modality number and high computational complexity. In this paper, we propose a hybrid densely connected network for tumor segmentation, named H-DenseFormer, which combines the… ▽ More Recently, deep learning methods have been widely used for tumor segmentation of multimodal medical images with promising results. However, most existing methods are limited by insufficient representational ability, specific modality number and high computational complexity. In this paper, we propose a hybrid densely connected network for tumor segmentation, named H-DenseFormer, which combines the representational power of the Convolutional Neural Network (CNN) and the Transformer structures. Specifically, H-DenseFormer integrates a Transformer-based Multi-path Parallel Embedding (MPE) module that can take an arbitrary number of modalities as input to extract the fusion features from different modalities. Then, the multimodal fusion features are delivered to different levels of the encoder to enhance multimodal learning representation. Besides, we design a lightweight Densely Connected Transformer (DCT) block to replace the standard Transformer block, thus significantly reducing computational complexity. We conduct extensive experiments on two public multimodal datasets, HECKTOR21 and PI-CAI22. The experimental results show that our proposed method outperforms the existing state-of-the-art methods while having lower computational complexity. The source code is available at https://github.com/shijun18/H-DenseFormer. △ Less

Submitted 4 July, 2023; originally announced July 2023.

Comments: 11 pages, 2 figures. This paper has been accepted by Medical Image Computing and Computer-Assisted Intervention(MICCAI) 2023

arXiv:2306.16131 [pdf, other]

Distributional Modeling for Location-Aware Adversarial Patches

Authors: Xingxing Wei, Shouwei Ruan, Yinpeng Dong, Hang Su

Abstract: Adversarial patch is one of the important forms of performing adversarial attacks in the physical world. To improve the naturalness and aggressiveness of existing adversarial patches, location-aware patches are proposed, where the patch's location on the target object is integrated into the optimization process to perform attacks. Although it is effective, efficiently finding the optimal location… ▽ More Adversarial patch is one of the important forms of performing adversarial attacks in the physical world. To improve the naturalness and aggressiveness of existing adversarial patches, location-aware patches are proposed, where the patch's location on the target object is integrated into the optimization process to perform attacks. Although it is effective, efficiently finding the optimal location for placing the patches is challenging, especially under the black-box attack settings. In this paper, we propose the Distribution-Optimized Adversarial Patch (DOPatch), a novel method that optimizes a multimodal distribution of adversarial locations instead of individual ones. DOPatch has several benefits: Firstly, we find that the locations' distributions across different models are pretty similar, and thus we can achieve efficient query-based attacks to unseen models using a distributional prior optimized on a surrogate model. Secondly, DOPatch can generate diverse adversarial samples by characterizing the distribution of adversarial locations. Thus we can improve the model's robustness to location-aware patches via carefully designed Distributional-Modeling Adversarial Training (DOP-DMAT). We evaluate DOPatch on various face recognition and image recognition tasks and demonstrate its superiority and efficiency over existing methods. We also conduct extensive ablation studies and analyses to validate the effectiveness of our method and provide insights into the distribution of adversarial locations. △ Less

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.14646 [pdf, other]

Multi-View Attention Learning for Residual Disease Prediction of Ovarian Cancer

Authors: Xiangneng Gao, Shulan Ruan, Jun Shi, Guoqing Hu, Wei Wei

Abstract: In the treatment of ovarian cancer, precise residual disease prediction is significant for clinical and surgical decision-making. However, traditional methods are either invasive (e.g., laparoscopy) or time-consuming (e.g., manual analysis). Recently, deep learning methods make many efforts in automatic analysis of medical images. Despite the remarkable progress, most of them underestimated the im… ▽ More In the treatment of ovarian cancer, precise residual disease prediction is significant for clinical and surgical decision-making. However, traditional methods are either invasive (e.g., laparoscopy) or time-consuming (e.g., manual analysis). Recently, deep learning methods make many efforts in automatic analysis of medical images. Despite the remarkable progress, most of them underestimated the importance of 3D image information of disease, which might brings a limited performance for residual disease prediction, especially in small-scale datasets. To this end, in this paper, we propose a novel Multi-View Attention Learning (MuVAL) method for residual disease prediction, which focuses on the comprehensive learning of 3D Computed Tomography (CT) images in a multi-view manner. Specifically, we first obtain multi-view of 3D CT images from transverse, coronal and sagittal views. To better represent the image features in a multi-view manner, we further leverage attention mechanism to help find the more relevant slices in each view. Extensive experiments on a dataset of 111 patients show that our method outperforms existing deep-learning methods. △ Less

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.11448 [pdf, other]

Prepare the Chair for the Bear! Robot Imagination of Sitting Affordance to Reorient Previously Unseen Chairs

Authors: Xin Meng, Hongtao Wu, Sipu Ruan, Gregory S. Chirikjian

Abstract: In this letter, a paradigm for the classification and manipulation of previously unseen objects is established and demonstrated through a real example of chairs. We present a novel robot manipulation method, guided by the understanding of object stability, perceptibility, and affordance, which allows the robot to prepare previously unseen and randomly oriented chairs for a teddy bear to sit on. Sp… ▽ More In this letter, a paradigm for the classification and manipulation of previously unseen objects is established and demonstrated through a real example of chairs. We present a novel robot manipulation method, guided by the understanding of object stability, perceptibility, and affordance, which allows the robot to prepare previously unseen and randomly oriented chairs for a teddy bear to sit on. Specifically, the robot encounters an unknown object and first reconstructs a complete 3D model from perceptual data via active and autonomous manipulation. By inserting this model into a physical simulator (i.e., the robot's "imagination"), the robot assesses whether the object is a chair and determines how to reorient it properly to be used, i.e., how to reorient it to an upright and accessible pose. If the object is classified as a chair, the robot reorients the object to this pose and seats the teddy bear onto the chair. The teddy bear is a proxy for an elderly person, hospital patient, or child. Experiment results show that our method achieves a high success rate on the real robot task of chair preparation. Also, it outperforms several baseline methods on the task of upright pose prediction for chairs. △ Less

Submitted 20 June, 2023; originally announced June 2023.

arXiv:2306.09124 [pdf, other]

DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks

Authors: Caixin Kang, Yinpeng Dong, Zhengyi Wang, Shouwei Ruan, Yubo Chen, Hang Su, Xingxing Wei

Abstract: Adversarial attacks, particularly patch attacks, pose significant threats to the robustness and reliability of deep learning models. Developing reliable defenses against patch attacks is crucial for real-world applications. This paper introduces DIFFender, a novel defense framework that harnesses the capabilities of a text-guided diffusion model to combat patch attacks. Central to our approach is… ▽ More Adversarial attacks, particularly patch attacks, pose significant threats to the robustness and reliability of deep learning models. Developing reliable defenses against patch attacks is crucial for real-world applications. This paper introduces DIFFender, a novel defense framework that harnesses the capabilities of a text-guided diffusion model to combat patch attacks. Central to our approach is the discovery of the Adversarial Anomaly Perception (AAP) phenomenon, which empowers the diffusion model to detect and localize adversarial patches through the analysis of distributional discrepancies. DIFFender integrates dual tasks of patch localization and restoration within a single diffusion model framework, utilizing their close interaction to enhance defense efficacy. Moreover, DIFFender utilizes vision-language pre-training coupled with an efficient few-shot prompt-tuning algorithm, which streamlines the adaptation of the pre-trained diffusion model to defense tasks, thus eliminating the need for extensive retraining. Our comprehensive evaluation spans image classification and face recognition tasks, extending to real-world scenarios, where DIFFender shows good robustness against adversarial attacks. The versatility and generalizability of DIFFender are evident across a variety of settings, classifiers, and attack methodologies, marking an advancement in adversarial patch defense strategies. △ Less

Submitted 17 July, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.08817 [pdf, other]

Description-Enhanced Label Embedding Contrastive Learning for Text Classification

Authors: Kun Zhang, Le Wu, Guangyi Lv, Enhong Chen, Shulan Ruan, Jing Liu, Zhiqiang Zhang, Jun Zhou, Meng Wang

Abstract: Text Classification is one of the fundamental tasks in natural language processing, which requires an agent to determine the most appropriate category for input sentences. Recently, deep neural networks have achieved impressive performance in this area, especially Pre-trained Language Models (PLMs). Usually, these methods concentrate on input sentences and corresponding semantic embedding generati… ▽ More Text Classification is one of the fundamental tasks in natural language processing, which requires an agent to determine the most appropriate category for input sentences. Recently, deep neural networks have achieved impressive performance in this area, especially Pre-trained Language Models (PLMs). Usually, these methods concentrate on input sentences and corresponding semantic embedding generation. However, for another essential component: labels, most existing works either treat them as meaningless one-hot vectors or use vanilla embedding methods to learn label representations along with model training, underestimating the semantic information and guidance that these labels reveal. To alleviate this problem and better exploit label information, in this paper, we employ Self-Supervised Learning (SSL) in model learning process and design a novel self-supervised Relation of Relation (R2) classification task for label utilization from a one-hot manner perspective. Then, we propose a novel Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as optimization targets. Meanwhile, triplet loss is employed to enhance the analysis of differences and connections among labels. Moreover, considering that one-hot usage is still short of exploiting label information, we incorporate external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning and extend R2-Net to a novel Description-Enhanced Label Embedding network (DELE) from a label embedding perspective. ... △ Less

Submitted 14 June, 2023; originally announced June 2023.

Comments: This paper has been accepted by IEEE Transactions on Neural Networks and Learning Systems

arXiv:2305.15761 [pdf, other]

PRIMP: PRobabilistically-Informed Motion Primitives for Efficient Affordance Learning from Demonstration

Authors: Sipu Ruan, Weixiao Liu, Xiaoli Wang, Xin Meng, Gregory S. Chirikjian

Abstract: This paper proposes a learning-from-demonstration method using probability densities on the workspaces of robot manipulators. The method, named "PRobabilistically-Informed Motion Primitives (PRIMP)", learns the probability distribution of the end effector trajectories in the 6D workspace that includes both positions and orientations. It is able to adapt to new situations such as novel via poses wi… ▽ More This paper proposes a learning-from-demonstration method using probability densities on the workspaces of robot manipulators. The method, named "PRobabilistically-Informed Motion Primitives (PRIMP)", learns the probability distribution of the end effector trajectories in the 6D workspace that includes both positions and orientations. It is able to adapt to new situations such as novel via poses with uncertainty and a change of viewing frame. The method itself is robot-agnostic, in which the learned distribution can be transferred to another robot with the adaptation to its workspace density. The learned trajectory distribution is then used to guide an optimization-based motion planning algorithm to further help the robot avoid novel obstacles that are unseen during the demonstration process. The proposed methods are evaluated by several sets of benchmark experiments. PRIMP runs more than 5 times faster while generalizing trajectories more than twice as close to both the demonstrations and novel desired poses. It is then combined with our robot imagination method that learns object affordances, illustrating the applicability of PRIMP to learn tool use through physical experiments. △ Less

Submitted 25 May, 2023; originally announced May 2023.

Comments: 17 pages, 19 figures

arXiv:2304.13725 [pdf, other]

doi 10.1016/j.compmedimag.2023.102218

Prediction of brain tumor recurrence location based on multi-modal fusion and nonlinear correlation learning

Authors: Tongxue Zhou, Alexandra Noeuveglise, Romain Modzelewski, Fethi Ghazouani, Sébastien Thureau, Maxime Fontanilles, Su Ruan

Abstract: Brain tumor is one of the leading causes of cancer death. The high-grade brain tumors are easier to recurrent even after standard treatment. Therefore, developing a method to predict brain tumor recurrence location plays an important role in the treatment planning and it can potentially prolong patient's survival time. There is still little work to deal with this issue. In this paper, we present a… ▽ More Brain tumor is one of the leading causes of cancer death. The high-grade brain tumors are easier to recurrent even after standard treatment. Therefore, developing a method to predict brain tumor recurrence location plays an important role in the treatment planning and it can potentially prolong patient's survival time. There is still little work to deal with this issue. In this paper, we present a deep learning-based brain tumor recurrence location prediction network. Since the dataset is usually small, we propose to use transfer learning to improve the prediction. We first train a multi-modal brain tumor segmentation network on the public dataset BraTS 2021. Then, the pre-trained encoder is transferred to our private dataset for extracting the rich semantic features. Following that, a multi-scale multi-channel feature fusion model and a nonlinear correlation learning module are developed to learn the effective features. The correlation between multi-channel features is modeled by a nonlinear equation. To measure the similarity between the distributions of original features of one modality and the estimated correlated features of another modality, we propose to use Kullback-Leibler divergence. Based on this divergence, a correlation loss function is designed to maximize the similarity between the two feature distributions. Finally, two decoders are constructed to jointly segment the present brain tumor and predict its future tumor recurrence location. To the best of our knowledge, this is the first work that can segment the present tumor and at the same time predict future tumor recurrence location, making the treatment planning more efficient and precise. The experimental results demonstrated the effectiveness of our proposed method to predict the brain tumor recurrence location from the limited dataset. △ Less

Submitted 10 April, 2023; originally announced April 2023.

Comments: 23 pages, 4 figures

Journal ref: Computerized Medical Imaging and Graphics, 2023

arXiv:2304.04933 [pdf, other]

Reinforcement Learning Tutor Better Supported Lower Performers in a Math Task

Authors: Sherry Ruan, Allen Nie, William Steenbergen, Jiayu He, JQ Zhang, Meng Guo, Yao Liu, Kyle Dang Nguyen, Catherine Y Wang, Rui Ying, James A Landay, Emma Brunskill

Abstract: Resource limitations make it hard to provide all students with one of the most effective educational interventions: personalized instruction. Reinforcement learning could be a key tool to reduce the development cost and improve the effectiveness of intelligent tutoring software that aims to provide the right support, at the right time, to a student. Here we illustrate that deep reinforcement learn… ▽ More Resource limitations make it hard to provide all students with one of the most effective educational interventions: personalized instruction. Reinforcement learning could be a key tool to reduce the development cost and improve the effectiveness of intelligent tutoring software that aims to provide the right support, at the right time, to a student. Here we illustrate that deep reinforcement learning can be used to provide adaptive pedagogical support to students learning about the concept of volume in a narrative storyline software. Using explainable artificial intelligence tools, we extracted interpretable insights about the pedagogical policy learned and demonstrated that the resulting policy had similar performance in a different student population. Most importantly, in both studies, the reinforcement-learning narrative system had the largest benefit for those students with the lowest initial pretest scores, suggesting the opportunity for AI to adapt and provide support for those most in need. △ Less

Submitted 13 April, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

Comments: 23 pages. Under review

arXiv:2303.13190 [pdf, other]

Marching-Primitives: Shape Abstraction from Signed Distance Function

Authors: Weixiao Liu, Yuwei Wu, Sipu Ruan, Gregory S. Chirikjian

Abstract: Representing complex objects with basic geometric primitives has long been a topic in computer vision. Primitive-based representations have the merits of compactness and computational efficiency in higher-level tasks such as physics simulation, collision checking, and robotic manipulation. Unlike previous works which extract polygonal meshes from a signed distance function (SDF), in this paper, we… ▽ More Representing complex objects with basic geometric primitives has long been a topic in computer vision. Primitive-based representations have the merits of compactness and computational efficiency in higher-level tasks such as physics simulation, collision checking, and robotic manipulation. Unlike previous works which extract polygonal meshes from a signed distance function (SDF), in this paper, we present a novel method, named Marching-Primitives, to obtain a primitive-based abstraction directly from an SDF. Our method grows geometric primitives (such as superquadrics) iteratively by analyzing the connectivity of voxels while marching at different levels of signed distance. For each valid connected volume of interest, we march on the scope of voxels from which a primitive is able to be extracted in a probabilistic sense and simultaneously solve for the parameters of the primitive to capture the underlying local geometry. We evaluate the performance of our method on both synthetic and real-world datasets. The results show that the proposed method outperforms the state-of-the-art in terms of accuracy, and is directly generalizable among different categories and scales. The code is open-sourced at https://github.com/ChirikjianLab/Marching-Primitives.git. △ Less

Submitted 5 June, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: Accepted to CVPR2023 Highlight

arXiv:2303.08366 [pdf, other]

VENUS: A Geometrical Representation for Quantum State Visualization

Authors: Shaolun Ruan, Ribo Yuan, Qiang Guan, Yanna Lin, Ying Mao, Weiwen Jiang, Zhepeng Wang, Wei Xu, Yong Wang

Abstract: Visualizations have played a crucial role in helping quantum computing users explore quantum states in various quantum computing applications. Among them, Bloch Sphere is the widely-used visualization for showing quantum states, which leverages angles to represent quantum amplitudes. However, it cannot support the visualization of quantum entanglement and superposition, the two essential propertie… ▽ More Visualizations have played a crucial role in helping quantum computing users explore quantum states in various quantum computing applications. Among them, Bloch Sphere is the widely-used visualization for showing quantum states, which leverages angles to represent quantum amplitudes. However, it cannot support the visualization of quantum entanglement and superposition, the two essential properties of quantum computing. To address this issue, we propose VENUS, a novel visualization for quantum state representation. By explicitly correlating 2D geometric shapes based on the math foundation of quantum computing characteristics, VENUS effectively represents quantum amplitudes of both the single qubit and two qubits for quantum entanglement. Also, we use multiple coordinated semicircles to naturally encode probability distribution, making the quantum superposition intuitive to analyze. We conducted two well-designed case studies and an in-depth expert interview to evaluate the usefulness and effectiveness of VENUS. The result shows that VENUS can effectively facilitate the exploration of quantum states for the single qubit and two qubits. △ Less

Submitted 23 April, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

arXiv:2302.07547 [pdf, other]

Multimodal N-of-1 trials: A Novel Personalized Healthcare Design

Authors: Jingjing Fu, Shuheng Liu, Siqi Du, Siqiao Ruan, Xuliang Guo, Weiwei Pan, Abhishek Sharma, Stefan Konigorski

Abstract: N-of-1 trials aim to estimate treatment effects on the individual level and can be applied to personalize a wide range of physical and digital interventions in mHealth. In this study, we propose and apply a framework for multimodal N-of-1 trials in order to allow the inclusion of health outcomes assessed through images, audio or videos. We illustrate the framework in a series of N-of-1 trials that… ▽ More N-of-1 trials aim to estimate treatment effects on the individual level and can be applied to personalize a wide range of physical and digital interventions in mHealth. In this study, we propose and apply a framework for multimodal N-of-1 trials in order to allow the inclusion of health outcomes assessed through images, audio or videos. We illustrate the framework in a series of N-of-1 trials that investigate the effect of acne creams on acne severity assessed through pictures. For the analysis, we compare an expert-based manual labelling approach with different deep learning-based pipelines where in a first step, we train and fine-tune convolutional neural networks (CNN) on the images. Then, we use a linear mixed model on the scores obtained in the first step in order to test the effectiveness of the treatment. The results show that the CNN-based test on the images provides a similar conclusion as tests based on manual expert ratings of the images, and identifies a treatment effect in one individual. This illustrates that multimodal N-of-1 trials can provide a powerful way to identify individual treatment effects and can enable large-scale studies of a large variety of health outcomes that can be actively and passively assessed using technological advances in order to personalized health interventions. △ Less

Submitted 15 February, 2023; originally announced February 2023.

arXiv:2210.03895 [pdf, other]

ViewFool: Evaluating the Robustness of Visual Recognition to Adversarial Viewpoints

Authors: Yinpeng Dong, Shouwei Ruan, Hang Su, Caixin Kang, Xingxing Wei, Jun Zhu

Abstract: Recent studies have demonstrated that visual recognition models lack robustness to distribution shift. However, current work mainly considers model robustness to 2D image transformations, leaving viewpoint changes in the 3D world less explored. In general, viewpoint changes are prevalent in various real-world applications (e.g., autonomous driving), making it imperative to evaluate viewpoint robus… ▽ More Recent studies have demonstrated that visual recognition models lack robustness to distribution shift. However, current work mainly considers model robustness to 2D image transformations, leaving viewpoint changes in the 3D world less explored. In general, viewpoint changes are prevalent in various real-world applications (e.g., autonomous driving), making it imperative to evaluate viewpoint robustness. In this paper, we propose a novel method called ViewFool to find adversarial viewpoints that mislead visual recognition models. By encoding real-world objects as neural radiance fields (NeRF), ViewFool characterizes a distribution of diverse adversarial viewpoints under an entropic regularizer, which helps to handle the fluctuations of the real camera pose and mitigate the reality gap between the real objects and their neural representations. Experiments validate that the common image classifiers are extremely vulnerable to the generated adversarial viewpoints, which also exhibit high cross-model transferability. Based on ViewFool, we introduce ImageNet-V, a new out-of-distribution dataset for benchmarking viewpoint robustness of image classifiers. Evaluation results on 40 classifiers with diverse architectures, objective functions, and data augmentations reveal a significant drop in model performance when tested on ImageNet-V, which provides a possibility to leverage ViewFool as an effective data augmentation strategy to improve viewpoint robustness. △ Less

Submitted 7 October, 2022; originally announced October 2022.

Comments: NeurIPS 2022

arXiv:2207.14135 [pdf, other]

VACSEN: A Visualization Approach for Noise Awareness in Quantum Computing

Authors: Shaolun Ruan, Yong Wang, Weiwen Jiang, Ying Mao, Qiang Guan

Abstract: Quantum computing has attracted considerable public attention due to its exponential speedup over classical computing. Despite its advantages, today's quantum computers intrinsically suffer from noise and are error-prone. To guarantee the high fidelity of the execution result of a quantum algorithm, it is crucial to inform users of the noises of the used quantum computer and the compiled physical… ▽ More Quantum computing has attracted considerable public attention due to its exponential speedup over classical computing. Despite its advantages, today's quantum computers intrinsically suffer from noise and are error-prone. To guarantee the high fidelity of the execution result of a quantum algorithm, it is crucial to inform users of the noises of the used quantum computer and the compiled physical circuits. However, an intuitive and systematic way to make users aware of the quantum computing noise is still missing. In this paper, we fill the gap by proposing a novel visualization approach to achieve noise-aware quantum computing. It provides a holistic picture of the noise of quantum computing through multiple interactively coordinated views: a Computer Evolution View with a circuit-like design overviews the temporal evolution of the noises of different quantum computers, a Circuit Filtering View facilitates quick filtering of multiple compiled physical circuits for the same quantum algorithm, and a Circuit Comparison View with a coupled bar chart enables detailed comparison of the filtered compiled circuits. We extensively evaluate the performance of VACSEN through two case studies on quantum algorithms of different scales and an in-depth interviews with 12 quantum computing users. The results demonstrate the effectiveness and usability of VACSEN in achieving noise-aware quantum computing. △ Less

Submitted 28 July, 2022; originally announced July 2022.

arXiv:2206.11739 [pdf, other]

Evidence fusion with contextual discounting for multi-modality medical image segmentation

Authors: Ling Huang, Thierry Denoeux, Pierre Vera, Su Ruan

Abstract: As information sources are usually imperfect, it is necessary to take into account their reliability in multi-source information fusion tasks. In this paper, we propose a new deep framework allowing us to merge multi-MR image segmentation results using the formalism of Dempster-Shafer theory while taking into account the reliability of different modalities relative to different classes. The framew… ▽ More As information sources are usually imperfect, it is necessary to take into account their reliability in multi-source information fusion tasks. In this paper, we propose a new deep framework allowing us to merge multi-MR image segmentation results using the formalism of Dempster-Shafer theory while taking into account the reliability of different modalities relative to different classes. The framework is composed of an encoder-decoder feature extraction module, an evidential segmentation module that computes a belief function at each voxel for each modality, and a multi-modality evidence fusion module, which assigns a vector of discount rates to each modality evidence and combines the discounted evidence using Dempster's rule. The whole framework is trained by minimizing a new loss function based on a discounted Dice index to increase segmentation accuracy and reliability. The method was evaluated on the BraTs 2021 database of 1251 patients with brain tumors. Quantitative and qualitative results show that our method outperforms the state of the art, and implements an effective new idea for merging multi-information within deep neural networks. △ Less

Submitted 22 November, 2022; v1 submitted 23 June, 2022; originally announced June 2022.

Comments: MICCAI2022

arXiv:2206.11501 [pdf, other]

A novel adversarial learning strategy for medical image classification

Authors: Zong Fan, Xiaohui Zhang, Jacob A. Gasienica, Jennifer Potts, Su Ruan, Wade Thorstad, Hiram Gay, Pengfei Song, Xiaowei Wang, Hua Li

Abstract: Deep learning (DL) techniques have been extensively utilized for medical image classification. Most DL-based classification networks are generally structured hierarchically and optimized through the minimization of a single loss function measured at the end of the networks. However, such a single loss design could potentially lead to optimization of one specific value of interest but fail to lever… ▽ More Deep learning (DL) techniques have been extensively utilized for medical image classification. Most DL-based classification networks are generally structured hierarchically and optimized through the minimization of a single loss function measured at the end of the networks. However, such a single loss design could potentially lead to optimization of one specific value of interest but fail to leverage informative features from intermediate layers that might benefit classification performance and reduce the risk of overfitting. Recently, auxiliary convolutional neural networks (AuxCNNs) have been employed on top of traditional classification networks to facilitate the training of intermediate layers to improve classification performance and robustness. In this study, we proposed an adversarial learning-based AuxCNN to support the training of deep neural networks for medical image classification. Two main innovations were adopted in our AuxCNN classification framework. First, the proposed AuxCNN architecture includes an image generator and an image discriminator for extracting more informative image features for medical image classification, motivated by the concept of generative adversarial network (GAN) and its impressive ability in approximating target data distribution. Second, a hybrid loss function is designed to guide the model training by incorporating different objectives of the classification network and AuxCNN to reduce overfitting. Comprehensive experimental studies demonstrated the superior classification performance of the proposed model. The effect of the network-related factors on classification performance was investigated. △ Less

Submitted 7 July, 2022; v1 submitted 23 June, 2022; originally announced June 2022.

arXiv:2205.11748 [pdf, other]

doi 10.3390/children9070996

Deep Learning-based automated classification of Chinese Speech Sound Disorders

Authors: Yao-Ming Kuo, Shanq-Jang Ruan, Yu-Chin Chen, Ya-Wen Tu

Abstract: This article describes a system for analyzing acoustic data to assist in the diagnosis and classification of children's speech sound disorders (SSDs) using a computer. The analysis concentrated on identifying and categorizing four distinct types of Chinese SSDs. The study collected and generated a speech corpus containing 2540 stopping, backing, final consonant deletion process (FCDP), and affrica… ▽ More This article describes a system for analyzing acoustic data to assist in the diagnosis and classification of children's speech sound disorders (SSDs) using a computer. The analysis concentrated on identifying and categorizing four distinct types of Chinese SSDs. The study collected and generated a speech corpus containing 2540 stopping, backing, final consonant deletion process (FCDP), and affrication samples from 90 children aged 3--6 years with normal or pathological articulatory features. Each recording was accompanied by a detailed diagnostic annotation by two speech-language pathologists (SLPs). Classification of the speech samples was accomplished using three well-established neural network models for image classification. The feature maps were created using three sets of Mel-frequency cepstral coefficients (MFCC) parameters extracted from speech sounds and aggregated into a three-dimensional data structure as model input. We employed six techniques for data augmentation to augment the available dataset while avoiding overfitting. The experiments examine the usability of four different categories of Chinese phrases and characters. Experiments with different data subsets demonstrate the system's ability to accurately detect the analyzed pronunciation disorders. The best multi-class classification using a single Chinese phrase achieves an accuracy of 74.4~percent. △ Less

Submitted 6 July, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: Children 2022

Journal ref: Children 2022, 9, 996

arXiv:2205.01733 [pdf, other]

doi 10.1016/j.inffus.2022.11.008

Application of belief functions to medical image segmentation: A review

Authors: Ling Huang, Su Ruan, Thierry Denoeux

Abstract: The investigation of uncertainty is of major importance in risk-critical applications, such as medical image segmentation. Belief function theory, a formal framework for uncertainty analysis and multiple evidence fusion, has made significant contributions to medical image segmentation, especially since the development of deep learning. In this paper, we provide an introduction to the topic of medi… ▽ More The investigation of uncertainty is of major importance in risk-critical applications, such as medical image segmentation. Belief function theory, a formal framework for uncertainty analysis and multiple evidence fusion, has made significant contributions to medical image segmentation, especially since the development of deep learning. In this paper, we provide an introduction to the topic of medical image segmentation methods using belief function theory. We classify the methods according to the fusion step and explain how information with uncertainty or imprecision is modeled and fused with belief function theory. In addition, we discuss the challenges and limitations of present belief function-based medical image segmentation and propose orientations for future research. Future research could investigate both belief function theory and deep learning to achieve more promising and reliable segmentation results. △ Less

Submitted 5 December, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

Comments: Accepted by Information fusion

Journal ref: Information Fusion, Volume 91, March 2023, Pages 737-756

arXiv:2203.14714 [pdf, other]

Primitive-based Shape Abstraction via Nonparametric Bayesian Inference

Authors: Yuwei Wu, Weixiao Liu, Sipu Ruan, Gregory S. Chirikjian

Abstract: 3D shape abstraction has drawn great interest over the years. Apart from low-level representations such as meshes and voxels, researchers also seek to semantically abstract complex objects with basic geometric primitives. Recent deep learning methods rely heavily on datasets, with limited generality to unseen categories. Furthermore, abstracting an object accurately yet with a small number of prim… ▽ More 3D shape abstraction has drawn great interest over the years. Apart from low-level representations such as meshes and voxels, researchers also seek to semantically abstract complex objects with basic geometric primitives. Recent deep learning methods rely heavily on datasets, with limited generality to unseen categories. Furthermore, abstracting an object accurately yet with a small number of primitives still remains a challenge. In this paper, we propose a novel non-parametric Bayesian statistical method to infer an abstraction, consisting of an unknown number of geometric primitives, from a point cloud. We model the generation of points as observations sampled from an infinite mixture of Gaussian Superquadric Taper Models (GSTM). Our approach formulates the abstraction as a clustering problem, in which: 1) each point is assigned to a cluster via the Chinese Restaurant Process (CRP); 2) a primitive representation is optimized for each cluster, and 3) a merging post-process is incorporated to provide a concise representation. We conduct extensive experiments on two datasets. The results indicate that our method outperforms the state-of-the-art in terms of accuracy and is generalizable to various types of objects. △ Less

Submitted 19 July, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

Comments: Accepted by ECCV 2022

arXiv:2203.11943 [pdf, other]

doi 10.3390/e24040436

A Quantitative Comparison between Shannon and Tsallis Havrda Charvat Entropies Applied to Cancer Outcome Prediction

Authors: Thibaud Brochet, Jérôme Lapuyade-Lahorgue, Pierre Vera, Su Ruan

Abstract: In this paper, we propose to quantitatively compare loss functions based on parameterized Tsallis-Havrda-Charvat entropy and classical Shannon entropy for the training of a deep network in the case of small datasets which are usually encountered in medical applications. Shannon cross-entropy is widely used as a loss function for most neural networks applied to the segmentation, classification and… ▽ More In this paper, we propose to quantitatively compare loss functions based on parameterized Tsallis-Havrda-Charvat entropy and classical Shannon entropy for the training of a deep network in the case of small datasets which are usually encountered in medical applications. Shannon cross-entropy is widely used as a loss function for most neural networks applied to the segmentation, classification and detection of images. Shannon entropy is a particular case of Tsallis-Havrda-Charvat entropy. In this work, we compare these two entropies through a medical application for predicting recurrence in patients with head-neck and lung cancers after treatment. Based on both CT images and patient information, a multitask deep neural network is proposed to perform a recurrence prediction task using cross-entropy as a loss function and an image reconstruction task. Tsallis-Havrda-Charvat cross-entropy is a parameterized cross entropy with the parameter $α$. Shannon entropy is a particular case of Tsallis-Havrda-Charvat entropy for $α$ = 1. The influence of this parameter on the final prediction results is studied. In this paper, the experiments are conducted on two datasets including in total 580 patients, of whom 434 suffered from head-neck cancers and 146 from lung cancers. The results show that Tsallis-Havrda-Charvat entropy can achieve better performance in terms of prediction accuracy with some values of $α$. △ Less

Submitted 22 March, 2022; originally announced March 2022.

Comments: 11 pages, 3 figures

Journal ref: Entropy 2022, 24(4), 436;

arXiv:2203.00641 [pdf, other]

Multi-Task Multi-Scale Learning For Outcome Prediction in 3D PET Images

Authors: Amine Amyar, Romain Modzelewski, Pierre Vera, Vincent Morard, Su Ruan

Abstract: Background and Objectives: Predicting patient response to treatment and survival in oncology is a prominent way towards precision medicine. To that end, radiomics was proposed as a field of study where images are used instead of invasive methods. The first step in radiomic analysis is the segmentation of the lesion. However, this task is time consuming and can be physician subjective. Automated to… ▽ More Background and Objectives: Predicting patient response to treatment and survival in oncology is a prominent way towards precision medicine. To that end, radiomics was proposed as a field of study where images are used instead of invasive methods. The first step in radiomic analysis is the segmentation of the lesion. However, this task is time consuming and can be physician subjective. Automated tools based on supervised deep learning have made great progress to assist physicians. However, they are data hungry, and annotated data remains a major issue in the medical field where only a small subset of annotated images is available. Methods: In this work, we propose a multi-task learning framework to predict patient's survival and response. We show that the encoder can leverage multiple tasks to extract meaningful and powerful features that improve radiomics performance. We show also that subsidiary tasks serve as an inductive bias so that the model can better generalize. Results: Our model was tested and validated for treatment response and survival in lung and esophageal cancers, with an area under the ROC curve of 77% and 71% respectively, outperforming single task learning methods. Conclusions: We show that, by using a multi-task learning approach, we can boost the performance of radiomic analysis by extracting rich information of intratumoral and peritumoral regions. △ Less

Submitted 1 March, 2022; originally announced March 2022.

arXiv:2202.13520 [pdf, other]

Anti-Malware Sandbox Games

Authors: Sujoy Sikdar, Sikai Ruan, Qishen Han, Paween Pitimanaaree, Jeremy Blackthorne, Bulent Yener, Lirong Xia

Abstract: We develop a game theoretic model of malware protection using the state-of-the-art sandbox method, to characterize and compute optimal defense strategies for anti-malware. We model the strategic interaction between developers of malware (M) and anti-malware (AM) as a two player game, where AM commits to a strategy of generating sandbox environments, and M responds by choosing to either attack or h… ▽ More We develop a game theoretic model of malware protection using the state-of-the-art sandbox method, to characterize and compute optimal defense strategies for anti-malware. We model the strategic interaction between developers of malware (M) and anti-malware (AM) as a two player game, where AM commits to a strategy of generating sandbox environments, and M responds by choosing to either attack or hide malicious activity based on the environment it senses. We characterize the condition for AM to protect all its machines, and identify conditions under which an optimal AM strategy can be computed efficiently. For other cases, we provide a quadratically constrained quadratic program (QCQP)-based optimization framework to compute the optimal AM strategy. In addition, we identify a natural and easy to compute strategy for AM, which as we show empirically, achieves AM utility that is close to the optimal AM utility, in equilibrium. △ Less

Submitted 27 February, 2022; originally announced February 2022.

arXiv:2201.13078 [pdf, other]

doi 10.1016/j.ijar.2022.06.007

Lymphoma segmentation from 3D PET-CT images using a deep evidential network

Authors: Ling Huang, Su Ruan, Pierre Decazes, Thierry Denoeux

Abstract: An automatic evidential segmentation method based on Dempster-Shafer theory and deep learning is proposed to segment lymphomas from three-dimensional Positron Emission Tomography (PET) and Computed Tomography (CT) images. The architecture is composed of a deep feature-extraction module and an evidential layer. The feature extraction module uses an encoder-decoder framework to extract semantic feat… ▽ More An automatic evidential segmentation method based on Dempster-Shafer theory and deep learning is proposed to segment lymphomas from three-dimensional Positron Emission Tomography (PET) and Computed Tomography (CT) images. The architecture is composed of a deep feature-extraction module and an evidential layer. The feature extraction module uses an encoder-decoder framework to extract semantic feature vectors from 3D inputs. The evidential layer then uses prototypes in the feature space to compute a belief function at each voxel quantifying the uncertainty about the presence or absence of a lymphoma at this location. Two evidential layers are compared, based on different ways of using distances to prototypes for computing mass functions. The whole model is trained end-to-end by minimizing the Dice loss function. The proposed combination of deep feature extraction and evidential segmentation is shown to outperform the baseline UNet model as well as three other state-of-the-art models on a dataset of 173 patients. △ Less

Submitted 24 June, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

Comments: Preprint submitted to International Journal of Approximate Reasoning

Journal ref: International Journal of Approximate Reasoning, Volume 149, 2022, Pages 39-60,

arXiv:2201.00336 [pdf, other]

Visilence: An Interactive Visualization Tool for Error Resilience Analysis

Authors: Shaolun Ruan, Yong Wang, Qiang Guan

Abstract: Soft errors have become one of the major concerns for HPC applications, as those errors can result in seriously corrupted outcomes, such as silent data corruptions (SDCs). Prior studies on error resilience have studied the robustness of HPC applications. However, it is still difficult for program developers to identify potential vulnerability to soft errors. In this paper, we present Visilence, a… ▽ More Soft errors have become one of the major concerns for HPC applications, as those errors can result in seriously corrupted outcomes, such as silent data corruptions (SDCs). Prior studies on error resilience have studied the robustness of HPC applications. However, it is still difficult for program developers to identify potential vulnerability to soft errors. In this paper, we present Visilence, a novel visualization tool to visually analyze error vulnerability based on the control-flow graph generated from HPC applications. Visilence efficiently visualizes the affected program states under injected errors and presents the visual analysis of the most vulnerable parts of an application. We demonstrate the effectiveness of Visilence through a case study. △ Less

Submitted 2 January, 2022; originally announced January 2022.

arXiv:2112.15300 [pdf, other]

BatchLens: A Visualization Approach for Analyzing Batch Jobs in Cloud Systems

Authors: Shaolun Ruan, Yong Wang, Hailong Jiang, Weijia Xu, Qiang Guan

Abstract: Cloud systems are becoming increasingly powerful and complex. It is highly challenging to identify anomalous execution behaviors and pinpoint problems by examining the overwhelming intermediate results/states in complex application workflows. Domain scientists urgently need a friendly and functional interface to understand the quality of the computing services and the performance of their applicat… ▽ More Cloud systems are becoming increasingly powerful and complex. It is highly challenging to identify anomalous execution behaviors and pinpoint problems by examining the overwhelming intermediate results/states in complex application workflows. Domain scientists urgently need a friendly and functional interface to understand the quality of the computing services and the performance of their applications in real time. To meet these needs, we explore data generated by job schedulers and investigate general performance metrics (e.g., utilization of CPU, memory and disk I/O). Specifically, we propose an interactive visual analytics approach, BatchLens, to provide both providers and users of cloud service with an intuitive and effective way to explore the status of system batch jobs and help them conduct root-cause analysis of anomalous behaviors in batch jobs. We demonstrate the effectiveness of BatchLens through a case study on the public Alibaba bench workload trace datasets. △ Less

Submitted 30 December, 2021; originally announced December 2021.

Showing 1–50 of 88 results for author: Ruan, S