Search | arXiv e-print repository

arXiv:2408.00378 [pdf]

A deep spatio-temporal attention model of dynamic functional network connectivity shows sensitivity to Alzheimer's in asymptomatic individuals

Authors: Yuxiang Wei, Anees Abrol, James Lah, Deqiang Qiu, Vince D. Calhoun

Abstract: Alzheimer's disease (AD) progresses from asymptomatic changes to clinical symptoms, emphasizing the importance of early detection for proper treatment. Functional magnetic resonance imaging (fMRI), particularly dynamic functional network connectivity (dFNC), has emerged as an important biomarker for AD. Nevertheless, studies probing at-risk subjects in the pre-symptomatic stage using dFNC are limi… ▽ More Alzheimer's disease (AD) progresses from asymptomatic changes to clinical symptoms, emphasizing the importance of early detection for proper treatment. Functional magnetic resonance imaging (fMRI), particularly dynamic functional network connectivity (dFNC), has emerged as an important biomarker for AD. Nevertheless, studies probing at-risk subjects in the pre-symptomatic stage using dFNC are limited. To identify at-risk subjects and understand alterations of dFNC in different stages, we leverage deep learning advancements and introduce a transformer-convolution framework for predicting at-risk subjects based on dFNC, incorporating spatial-temporal self-attention to capture brain network dependencies and temporal dynamics. Our model significantly outperforms other popular machine learning methods. By analyzing individuals with diagnosed AD and mild cognitive impairment (MCI), we studied the AD progression and observed a higher similarity between MCI and asymptomatic AD. The interpretable analysis highlights the cognitive-control network's diagnostic importance, with the model focusing on intra-visual domain dFNC when predicting asymptomatic AD subjects. △ Less

Submitted 1 August, 2024; originally announced August 2024.

Comments: Accepted by EMBC 2024

arXiv:2407.21075 [pdf, other]

Apple Intelligence Foundation Language Models

Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek , et al. (130 additional authors not shown)

Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used… ▽ More We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used to train the model, the training process, how the models are optimized for inference, and the evaluation results. We highlight our focus on Responsible AI and how the principles are applied throughout the model development. △ Less

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.00295 [pdf, other]

A deep neural network framework for dynamic multi-valued mapping estimation and its applications

Authors: Geng Li, Di Qiu, Lok Ming Lui

Abstract: This paper addresses the problem of modeling and estimating dynamic multi-valued mappings. While most mathematical models provide a unique solution for a given input, real-world applications often lack deterministic solutions. In such scenarios, estimating dynamic multi-valued mappings is necessary to suggest different reasonable solutions for each input. This paper introduces a deep neural networ… ▽ More This paper addresses the problem of modeling and estimating dynamic multi-valued mappings. While most mathematical models provide a unique solution for a given input, real-world applications often lack deterministic solutions. In such scenarios, estimating dynamic multi-valued mappings is necessary to suggest different reasonable solutions for each input. This paper introduces a deep neural network framework incorporating a generative network and a classification component. The objective is to model the dynamic multi-valued mapping between the input and output by providing a reliable uncertainty measurement. Generating multiple solutions for a given input involves utilizing a discrete codebook comprising finite variables. These variables are fed into a generative network along with the input, producing various output possibilities. The discreteness of the codebook enables efficient estimation of the output's conditional probability distribution for any given input using a classifier. By jointly optimizing the discrete codebook and its uncertainty estimation during training using a specially designed loss function, a highly accurate approximation is achieved. The effectiveness of our proposed framework is demonstrated through its application to various imaging problems, using both synthetic and real imaging data. Experimental results show that our framework accurately estimates the dynamic multi-valued mapping with uncertainty estimation. △ Less

Submitted 28 June, 2024; originally announced July 2024.

arXiv:2406.18115 [pdf, other]

Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps

Authors: Dicong Qiu, Wenzong Ma, Zhenfu Pan, Hui Xiong, Junwei Liang

Abstract: Open-Vocabulary Mobile Manipulation (OVMM) is a crucial capability for autonomous robots, especially when faced with the challenges posed by unknown and dynamic environments. This task requires robots to explore and build a semantic understanding of their surroundings, generate feasible plans to achieve manipulation goals, adapt to environmental changes, and comprehend natural language instruction… ▽ More Open-Vocabulary Mobile Manipulation (OVMM) is a crucial capability for autonomous robots, especially when faced with the challenges posed by unknown and dynamic environments. This task requires robots to explore and build a semantic understanding of their surroundings, generate feasible plans to achieve manipulation goals, adapt to environmental changes, and comprehend natural language instructions from humans. To address these challenges, we propose a novel framework that leverages the zero-shot detection and grounded recognition capabilities of pretraining visual-language models (VLMs) combined with dense 3D entity reconstruction to build 3D semantic maps. Additionally, we utilize large language models (LLMs) for spatial region abstraction and online planning, incorporating human instructions and spatial semantic context. We have built a 10-DoF mobile manipulation robotic platform JSR-1 and demonstrated in real-world robot experiments that our proposed framework can effectively capture spatial semantics and process natural language user instructions for zero-shot OVMM tasks under dynamic environment settings, with an overall navigation and task success rate of 80.95% and 73.33% over 105 episodes, and better SFT and SPL by 157.18% and 19.53% respectively compared to the baseline. Furthermore, the framework is capable of replanning towards the next most probable candidate location based on the spatial semantic context derived from the 3D semantic map when initial plans fail, keeping an average success rate of 76.67%. △ Less

Submitted 26 June, 2024; originally announced June 2024.

Comments: Open-vocabulary, Mobile Manipulation, Dynamic Environments, 3D Semantic Maps, Zero-shot, LLMs, VLMs, 18 pages, 2 figures

arXiv:2405.13467 [pdf, other]

AdaFedFR: Federated Face Recognition with Adaptive Inter-Class Representation Learning

Authors: Di Qiu, Xinyang Lin, Kaiye Wang, Xiangxiang Chu, Pengfei Yan

Abstract: With the growing attention on data privacy and communication security in face recognition applications, federated learning has been introduced to learn a face recognition model with decentralized datasets in a privacy-preserving manner. However, existing works still face challenges such as unsatisfying performance and additional communication costs, limiting their applicability in real-world scena… ▽ More With the growing attention on data privacy and communication security in face recognition applications, federated learning has been introduced to learn a face recognition model with decentralized datasets in a privacy-preserving manner. However, existing works still face challenges such as unsatisfying performance and additional communication costs, limiting their applicability in real-world scenarios. In this paper, we propose a simple yet effective federated face recognition framework called AdaFedFR, by devising an adaptive inter-class representation learning algorithm to enhance the generalization of the generic face model and the efficiency of federated training under strict privacy-preservation. In particular, our work delicately utilizes feature representations of public identities as learnable negative knowledge to optimize the local objective within the feature space, which further encourages the local model to learn powerful representations and optimize personalized models for clients. Experimental results demonstrate that our method outperforms previous approaches on several prevalent face recognition benchmarks within less than 3 communication rounds, which shows communication-friendly and great efficiency. △ Less

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.11315 [pdf, other]

MediCLIP: Adapting CLIP for Few-shot Medical Image Anomaly Detection

Authors: Ximiao Zhang, Min Xu, Dehui Qiu, Ruixin Yan, Ning Lang, Xiuzhuang Zhou

Abstract: In the field of medical decision-making, precise anomaly detection in medical imaging plays a pivotal role in aiding clinicians. However, previous work is reliant on large-scale datasets for training anomaly detection models, which increases the development cost. This paper first focuses on the task of medical image anomaly detection in the few-shot setting, which is critically significant for the… ▽ More In the field of medical decision-making, precise anomaly detection in medical imaging plays a pivotal role in aiding clinicians. However, previous work is reliant on large-scale datasets for training anomaly detection models, which increases the development cost. This paper first focuses on the task of medical image anomaly detection in the few-shot setting, which is critically significant for the medical field where data collection and annotation are both very expensive. We propose an innovative approach, MediCLIP, which adapts the CLIP model to few-shot medical image anomaly detection through self-supervised fine-tuning. Although CLIP, as a vision-language model, demonstrates outstanding zero-/fewshot performance on various downstream tasks, it still falls short in the anomaly detection of medical images. To address this, we design a series of medical image anomaly synthesis tasks to simulate common disease patterns in medical imaging, transferring the powerful generalization capabilities of CLIP to the task of medical image anomaly detection. When only few-shot normal medical images are provided, MediCLIP achieves state-of-the-art performance in anomaly detection and location compared to other methods. Extensive experiments on three distinct medical anomaly detection tasks have demonstrated the superiority of our approach. The code is available at https://github.com/cnulab/MediCLIP. △ Less

Submitted 18 May, 2024; originally announced May 2024.

Comments: 12 pages, 3 figures, 5 tables, early accepted at MICCAI 2024

arXiv:2404.19519 [pdf, ps, other]

Generating Robust Counterfactual Witnesses for Graph Neural Networks

Authors: Dazhuo Qiu, Mengying Wang, Arijit Khan, Yinghui Wu

Abstract: This paper introduces a new class of explanation structures, called robust counterfactual witnesses (RCWs), to provide robust, both counterfactual and factual explanations for graph neural networks. Given a graph neural network M, a robust counterfactual witness refers to the fraction of a graph G that are counterfactual and factual explanation of the results of M over G, but also remains so for a… ▽ More This paper introduces a new class of explanation structures, called robust counterfactual witnesses (RCWs), to provide robust, both counterfactual and factual explanations for graph neural networks. Given a graph neural network M, a robust counterfactual witness refers to the fraction of a graph G that are counterfactual and factual explanation of the results of M over G, but also remains so for any "disturbed" G by flipping up to k of its node pairs. We establish the hardness results, from tractable results to co-NP-hardness, for verifying and generating robust counterfactual witnesses. We study such structures for GNN-based node classification, and present efficient algorithms to verify and generate RCWs. We also provide a parallel algorithm to verify and generate RCWs for large graphs with scalability guarantees. We experimentally verify our explanation generation process for benchmark datasets, and showcase their applications. △ Less

Submitted 30 April, 2024; originally announced April 2024.

Comments: This paper has been accepted by ICDE 2024

arXiv:2404.02225 [pdf, other]

CHOSEN: Contrastive Hypothesis Selection for Multi-View Depth Refinement

Authors: Di Qiu, Yinda Zhang, Thabo Beeler, Vladimir Tankovich, Christian Häne, Sean Fanello, Christoph Rhemann, Sergio Orts Escolano

Abstract: We propose CHOSEN, a simple yet flexible, robust and effective multi-view depth refinement framework. It can be employed in any existing multi-view stereo pipeline, with straightforward generalization capability for different multi-view capture systems such as camera relative positioning and lenses. Given an initial depth estimation, CHOSEN iteratively re-samples and selects the best hypotheses, a… ▽ More We propose CHOSEN, a simple yet flexible, robust and effective multi-view depth refinement framework. It can be employed in any existing multi-view stereo pipeline, with straightforward generalization capability for different multi-view capture systems such as camera relative positioning and lenses. Given an initial depth estimation, CHOSEN iteratively re-samples and selects the best hypotheses, and automatically adapts to different metric or intrinsic scales determined by the capture system. The key to our approach is the application of contrastive learning in an appropriate solution space and a carefully designed hypothesis feature, based on which positive and negative hypotheses can be effectively distinguished. Integrated in a simple baseline multi-view stereo pipeline, CHOSEN delivers impressive quality in terms of depth and normal accuracy compared to many current deep learning based multi-view stereo pipelines. △ Less

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.01296 [pdf, other]

MagicMirror: Fast and High-Quality Avatar Generation with a Constrained Search Space

Authors: Armand Comas-Massagué, Di Qiu, Menglei Chai, Marcel Bühler, Amit Raj, Ruiqi Gao, Qiangeng Xu, Mark Matthews, Paulo Gotardo, Octavia Camps, Sergio Orts-Escolano, Thabo Beeler

Abstract: We introduce a novel framework for 3D human avatar generation and personalization, leveraging text prompts to enhance user engagement and customization. Central to our approach are key innovations aimed at overcoming the challenges in photo-realistic avatar synthesis. Firstly, we utilize a conditional Neural Radiance Fields (NeRF) model, trained on a large-scale unannotated multi-view dataset, to… ▽ More We introduce a novel framework for 3D human avatar generation and personalization, leveraging text prompts to enhance user engagement and customization. Central to our approach are key innovations aimed at overcoming the challenges in photo-realistic avatar synthesis. Firstly, we utilize a conditional Neural Radiance Fields (NeRF) model, trained on a large-scale unannotated multi-view dataset, to create a versatile initial solution space that accelerates and diversifies avatar generation. Secondly, we develop a geometric prior, leveraging the capabilities of Text-to-Image Diffusion Models, to ensure superior view invariance and enable direct optimization of avatar geometry. These foundational ideas are complemented by our optimization pipeline built on Variational Score Distillation (VSD), which mitigates texture loss and over-saturation issues. As supported by our extensive experiments, these strategies collectively enable the creation of custom avatars with unparalleled visual quality and better adherence to input text prompts. You can find more results and videos in our website: https://syntec-research.github.io/MagicMirror △ Less

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00667 [pdf, other]

Weakly-Supervised Cross-Domain Segmentation of Electron Microscopy with Sparse Point Annotation

Authors: Dafei Qiu, Shan Xiong, Jiajin Yi, Jialin Peng

Abstract: Accurate segmentation of organelle instances from electron microscopy (EM) images plays an essential role in many neuroscience researches. However, practical scenarios usually suffer from high annotation costs, label scarcity, and large domain diversity. While unsupervised domain adaptation (UDA) that assumes no annotation effort on the target data is promising to alleviate these challenges, its p… ▽ More Accurate segmentation of organelle instances from electron microscopy (EM) images plays an essential role in many neuroscience researches. However, practical scenarios usually suffer from high annotation costs, label scarcity, and large domain diversity. While unsupervised domain adaptation (UDA) that assumes no annotation effort on the target data is promising to alleviate these challenges, its performance on complicated segmentation tasks is still far from practical usage. To address these issues, we investigate a highly annotation-efficient weak supervision, which assumes only sparse center-points on a small subset of object instances in the target training images. To achieve accurate segmentation with partial point annotations, we introduce instance counting and center detection as auxiliary tasks and design a multitask learning framework to leverage correlations among the counting, detection, and segmentation, which are all tasks with partial or no supervision. Building upon the different domain-invariances of the three tasks, we enforce counting estimation with a novel soft consistency loss as a global prior for center detection, which further guides the per-pixel segmentation. To further compensate for annotation sparsity, we develop a cross-position cut-and-paste for label augmentation and an entropy-based pseudo-label selection. The experimental results highlight that, by simply using extremely weak annotation, e.g., 15\% sparse points, for model training, the proposed model is capable of significantly outperforming UDA methods and produces comparable performance as the supervised counterpart. The high robustness of our model shown in the validations and the low requirement of expert knowledge for sparse point annotation further improve the potential application value of our model. △ Less

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2401.08957 [pdf, other]

SWBT: Similarity Weighted Behavior Transformer with the Imperfect Demonstration for Robotic Manipulation

Authors: Kun Wu, Ning Liu, Zhen Zhao, Di Qiu, Jinming Li, Zhengping Che, Zhiyuan Xu, Qinru Qiu, Jian Tang

Abstract: Imitation learning (IL), aiming to learn optimal control policies from expert demonstrations, has been an effective method for robot manipulation tasks. However, previous IL methods either only use expensive expert demonstrations and omit imperfect demonstrations or rely on interacting with the environment and learning from online experiences. In the context of robotic manipulation, we aim to conq… ▽ More Imitation learning (IL), aiming to learn optimal control policies from expert demonstrations, has been an effective method for robot manipulation tasks. However, previous IL methods either only use expensive expert demonstrations and omit imperfect demonstrations or rely on interacting with the environment and learning from online experiences. In the context of robotic manipulation, we aim to conquer the above two challenges and propose a novel framework named Similarity Weighted Behavior Transformer (SWBT). SWBT effectively learn from both expert and imperfect demonstrations without interaction with environments. We reveal that the easy-to-get imperfect demonstrations, such as forward and inverse dynamics, significantly enhance the network by learning fruitful information. To the best of our knowledge, we are the first to attempt to integrate imperfect demonstrations into the offline imitation learning setting for robot manipulation tasks. Extensive experiments on the ManiSkill2 benchmark built on the high-fidelity Sapien simulator and real-world robotic manipulation tasks demonstrated that the proposed method can extract better features and improve the success rates for all tasks. Our code will be released upon acceptance of the paper. △ Less

Submitted 16 January, 2024; originally announced January 2024.

Comments: 8 pages, 5 figures

ACM Class: I.2.9

arXiv:2401.02086 [pdf, other]

doi 10.1145/3639295

View-based Explanations for Graph Neural Networks

Authors: Tingyang Chen, Dazhuo Qiu, Yinghui Wu, Arijit Khan, Xiangyu Ke, Yunjun Gao

Abstract: Generating explanations for graph neural networks (GNNs) has been studied to understand their behavior in analytical tasks such as graph classification. Existing approaches aim to understand the overall results of GNNs rather than providing explanations for specific class labels of interest, and may return explanation structures that are hard to access, nor directly queryable.We propose GVEX, a no… ▽ More Generating explanations for graph neural networks (GNNs) has been studied to understand their behavior in analytical tasks such as graph classification. Existing approaches aim to understand the overall results of GNNs rather than providing explanations for specific class labels of interest, and may return explanation structures that are hard to access, nor directly queryable.We propose GVEX, a novel paradigm that generates Graph Views for EXplanation. (1) We design a two-tier explanation structure called explanation views. An explanation view consists of a set of graph patterns and a set of induced explanation subgraphs. Given a database G of multiple graphs and a specific class label l assigned by a GNN-based classifier M, it concisely describes the fraction of G that best explains why l is assigned by M. (2) We propose quality measures and formulate an optimization problem to compute optimal explanation views for GNN explanation. We show that the problem is $Σ^2_P$-hard. (3) We present two algorithms. The first one follows an explain-and-summarize strategy that first generates high-quality explanation subgraphs which best explain GNNs in terms of feature influence maximization, and then performs a summarization step to generate patterns. We show that this strategy provides an approximation ratio of 1/2. Our second algorithm performs a single-pass to an input node stream in batches to incrementally maintain explanation views, having an anytime quality guarantee of 1/4 approximation. Using real-world benchmark data, we experimentally demonstrate the effectiveness, efficiency, and scalability of GVEX. Through case studies, we showcase the practical applications of GVEX. △ Less

Submitted 7 January, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

Comments: This paper has been accepted by SIGMOD 2024

arXiv:2312.09463 [pdf, other]

Partial Rewriting for Multi-Stage ASR

Authors: Antoine Bruguier, David Qiu, Yanzhang He

Abstract: For many streaming automatic speech recognition tasks, it is important to provide timely intermediate streaming results, while refining a high quality final result. This can be done using a multi-stage architecture, where a small left-context only model creates streaming results and a larger left- and right-context model produces a final result at the end. While this significantly improves the qua… ▽ More For many streaming automatic speech recognition tasks, it is important to provide timely intermediate streaming results, while refining a high quality final result. This can be done using a multi-stage architecture, where a small left-context only model creates streaming results and a larger left- and right-context model produces a final result at the end. While this significantly improves the quality of the final results without compromising the streaming emission latency of the system, streaming results do not benefit from the quality improvements. Here, we propose using a text manipulation algorithm that merges the streaming outputs of both models. We improve the quality of streaming results by around 10%, without altering the final results. Our approach introduces no additional latency and reduces flickering. It is also lightweight, does not require retraining the model, and it can be applied to a wide variety of multi-stage architectures. △ Less

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.08553 [pdf, other]

USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models

Authors: Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal

Abstract: End-to-end automatic speech recognition (ASR) models have seen revolutionary quality gains with the recent development of large-scale universal speech models (USM). However, deploying these massive USMs is extremely expensive due to the enormous memory usage and computational cost. Therefore, model compression is an important research topic to fit USM-based ASR under budget in real-world scenarios… ▽ More End-to-end automatic speech recognition (ASR) models have seen revolutionary quality gains with the recent development of large-scale universal speech models (USM). However, deploying these massive USMs is extremely expensive due to the enormous memory usage and computational cost. Therefore, model compression is an important research topic to fit USM-based ASR under budget in real-world scenarios. In this study, we propose a USM fine-tuning approach for ASR, with a low-bit quantization and N:M structured sparsity aware paradigm on the model weights, reducing the model complexity from parameter precision and matrix topology perspectives. We conducted extensive experiments with a 2-billion parameter USM on a large-scale voice search dataset to evaluate our proposed method. A series of ablation studies validate the effectiveness of up to int4 quantization and 2:4 sparsity. However, a single compression technique fails to recover the performance well under extreme setups including int2 quantization and 1:4 sparsity. By contrast, our proposed method can compress the model to have 9.4% of the size, at the cost of only 7.3% relative word error rate (WER) regressions. We also provided in-depth analyses on the results and discussions on the limitations and potential solutions, which would be valuable for future studies. △ Less

Submitted 16 January, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

Comments: Accepted by ICASSP 2024. Preprint

arXiv:2312.03763 [pdf, other]

Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing

Authors: Yushi Lan, Feitong Tan, Di Qiu, Qiangeng Xu, Kyle Genova, Zeng Huang, Sean Fanello, Rohit Pandey, Thomas Funkhouser, Chen Change Loy, Yinda Zhang

Abstract: We present a novel framework for generating photorealistic 3D human head and subsequently manipulating and reposing them with remarkable flexibility. The proposed approach leverages an implicit function representation of 3D human heads, employing 3D Gaussians anchored on a parametric face model. To enhance representational capabilities and encode spatial information, we embed a lightweight tri-pla… ▽ More We present a novel framework for generating photorealistic 3D human head and subsequently manipulating and reposing them with remarkable flexibility. The proposed approach leverages an implicit function representation of 3D human heads, employing 3D Gaussians anchored on a parametric face model. To enhance representational capabilities and encode spatial information, we embed a lightweight tri-plane payload within each Gaussian rather than directly storing color and opacity. Additionally, we parameterize the Gaussians in a 2D UV space via a 3DMM, enabling effective utilization of the diffusion model for 3D head avatar generation. Our method facilitates the creation of diverse and realistic 3D human heads with fine-grained editing over facial features and expressions. Extensive experiments demonstrate the effectiveness of our method. △ Less

Submitted 19 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: project webpage: https://nirvanalan.github.io/projects/gaussian3diff/

arXiv:2311.04643 [pdf, other]

Software Architecture Recovery with Information Fusion

Authors: Yiran Zhang, Zhengzi Xu, Chengwei Liu, Hongxu Chen, Jianwen Sun, Dong Qiu, Yang Liu

Abstract: Understanding the architecture is vital for effectively maintaining and managing large software systems. However, as software systems evolve over time, their architectures inevitably change. To keep up with the change, architects need to track the implementation-level changes and update the architectural documentation accordingly, which is time-consuming and error-prone. Therefore, many automatic… ▽ More Understanding the architecture is vital for effectively maintaining and managing large software systems. However, as software systems evolve over time, their architectures inevitably change. To keep up with the change, architects need to track the implementation-level changes and update the architectural documentation accordingly, which is time-consuming and error-prone. Therefore, many automatic architecture recovery techniques have been proposed to ease this process. Despite efforts have been made to improve the accuracy of architecture recovery, existing solutions still suffer from two limitations. First, most of them only use one or two type of information for the recovery, ignoring the potential usefulness of other sources. Second, they tend to use the information in a coarse-grained manner, overlooking important details within it. To address these limitations, we propose SARIF, a fully automated architecture recovery technique, which incorporates three types of comprehensive information, including dependencies, code text and folder structure. SARIF can recover architecture more accurately by thoroughly analyzing the details of each type of information and adaptively fusing them based on their relevance and quality. To evaluate SARIF, we collected six projects with published ground-truth architectures and three open-source projects labeled by our industrial collaborators. We compared SARIF with nine state-of-the-art techniques using three commonly-used architecture similarity metrics and two new metrics. The experimental results show that SARIF is 36.1% more accurate than the best of the previous techniques on average. By providing comprehensive architecture, SARIF can help users understand systems effectively and reduce the manual effort of obtaining ground-truth architectures. △ Less

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2311.00353 [pdf, other]

LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation

Authors: Yuxiang Bao, Di Qiu, Guoliang Kang, Baochang Zhang, Bo Jin, Kaiye Wang, Pengfei Yan

Abstract: Leveraging the generative ability of image diffusion models offers great potential for zero-shot video-to-video translation. The key lies in how to maintain temporal consistency across generated video frames by image diffusion models. Previous methods typically adopt cross-frame attention, \emph{i.e.,} sharing the \textit{key} and \textit{value} tokens across attentions of different frames, to enc… ▽ More Leveraging the generative ability of image diffusion models offers great potential for zero-shot video-to-video translation. The key lies in how to maintain temporal consistency across generated video frames by image diffusion models. Previous methods typically adopt cross-frame attention, \emph{i.e.,} sharing the \textit{key} and \textit{value} tokens across attentions of different frames, to encourage the temporal consistency. However, in those works, temporal inconsistency issue may not be thoroughly solved, rendering the fidelity of generated videos limited.%The current state of the art cross-frame attention method aims at maintaining fine-grained visual details across frames, but it is still challenged by the temporal coherence problem. In this paper, we find the bottleneck lies in the unconstrained query tokens and propose a new zero-shot video-to-video translation framework, named \textit{LatentWarp}. Our approach is simple: to constrain the query tokens to be temporally consistent, we further incorporate a warping operation in the latent space to constrain the query tokens. Specifically, based on the optical flow obtained from the original video, we warp the generated latent features of last frame to align with the current frame during the denoising process. As a result, the corresponding regions across the adjacent frames can share closely-related query tokens and attention outputs, which can further improve latent-level consistency to enhance visual temporal coherence of generated videos. Extensive experiment results demonstrate the superiority of \textit{LatentWarp} in achieving video-to-video translation with temporal coherence. △ Less

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2309.11488 [pdf, other]

An Evaluation and Comparison of GPU Hardware and Solver Libraries for Accelerating the OPM Flow Reservoir Simulator

Authors: Tong Dong Qiu, Andreas Thune, Markus Blatt, Alf Birger Rustad, Razvan Nane

Abstract: Realistic reservoir simulation is known to be prohibitively expensive in terms of computation time when increasing the accuracy of the simulation or by enlarging the model grid size. One method to address this issue is to parallelize the computation by dividing the model in several partitions and using multiple CPUs to compute the result using techniques such as MPI and multi-threading. Alternativ… ▽ More Realistic reservoir simulation is known to be prohibitively expensive in terms of computation time when increasing the accuracy of the simulation or by enlarging the model grid size. One method to address this issue is to parallelize the computation by dividing the model in several partitions and using multiple CPUs to compute the result using techniques such as MPI and multi-threading. Alternatively, GPUs are also a good candidate to accelerate the computation due to their massively parallel architecture that allows many floating point operations per second to be performed. The numerical iterative solver takes thus the most computational time and is challenging to solve efficiently due to the dependencies that exist in the model between cells. In this work, we evaluate the OPM Flow simulator and compare several state-of-the-art GPU solver libraries as well as custom developed solutions for a BiCGStab solver using an ILU0 preconditioner and benchmark their performance against the default DUNE library implementation running on multiple CPU processors using MPI. The evaluated GPU software libraries include a manual linear solver in OpenCL and the integration of several third party sparse linear algebra libraries, such as cuSparse, rocSparse, and amgcl. To perform our bench-marking, we use small, medium, and large use cases, starting with the public test case NORNE that includes approximately 50k active cells and ending with a large model that includes approximately 1 million active cells. We find that a GPU can accelerate a single dual-threaded MPI process up to 5.6 times, and that it can compare with around 8 dual-threaded MPI processes. △ Less

Submitted 20 September, 2023; originally announced September 2023.

arXiv:2307.03870 [pdf, other]

Opacity of Parametric Discrete Event Systems: Models, Decidability, and Algorithms

Authors: Weilin Deng, Daowen Qiu, Jingkai Yang

Abstract: Finite automata (FAs) model is a popular tool to characterize discrete event systems (DESs) due to its succinctness. However, for some complex systems, it is difficult to describe the necessary details by means of FAs model. In this paper, we consider a kind of extended finite automata (EFAs) in which each transition carries a predicate over state and event parameters. We also consider a type of s… ▽ More Finite automata (FAs) model is a popular tool to characterize discrete event systems (DESs) due to its succinctness. However, for some complex systems, it is difficult to describe the necessary details by means of FAs model. In this paper, we consider a kind of extended finite automata (EFAs) in which each transition carries a predicate over state and event parameters. We also consider a type of simplified EFAs, called Event-Parameters EFAs (EP-EFAs), where the state parameters are removed. Based upon these two parametric models, we investigate the problem of opacity analysis for parametric DESs. First of all, it is shown that EFAs model is more expressive than EP-EFAs model. Secondly, it is proved that the opacity properties for EFAs are undecidable in general. Moreover, the decidable opacity properties for EP-EFAs are investigated. We present the verification algorithms for current-state opacity, initial-state opacity and infinite-step opacity, and then discuss the complexity. This paper establishes a preliminary theory for the opacity of parametric DESs, which lays a foundation for the opacity analysis of complex systems. △ Less

Submitted 7 July, 2023; originally announced July 2023.

Comments: 13 pages, 9 figures

arXiv:2306.07719 [pdf, other]

Contextual Dictionary Lookup for Knowledge Graph Completion

Authors: Jining Wang, Delai Qiu, YouMing Liu, Yining Wang, Chuan Chen, Zibin Zheng, Yuren Zhou

Abstract: Knowledge graph completion (KGC) aims to solve the incompleteness of knowledge graphs (KGs) by predicting missing links from known triples, numbers of knowledge graph embedding (KGE) models have been proposed to perform KGC by learning embeddings. Nevertheless, most existing embedding models map each relation into a unique vector, overlooking the specific fine-grained semantics of them under diffe… ▽ More Knowledge graph completion (KGC) aims to solve the incompleteness of knowledge graphs (KGs) by predicting missing links from known triples, numbers of knowledge graph embedding (KGE) models have been proposed to perform KGC by learning embeddings. Nevertheless, most existing embedding models map each relation into a unique vector, overlooking the specific fine-grained semantics of them under different entities. Additionally, the few available fine-grained semantic models rely on clustering algorithms, resulting in limited performance and applicability due to the cumbersome two-stage training process. In this paper, we present a novel method utilizing contextual dictionary lookup, enabling conventional embedding models to learn fine-grained semantics of relations in an end-to-end manner. More specifically, we represent each relation using a dictionary that contains multiple latent semantics. The composition of a given entity and the dictionary's central semantics serves as the context for generating a lookup, thus determining the fine-grained semantics of the relation adaptively. The proposed loss function optimizes both the central and fine-grained semantics simultaneously to ensure their semantic consistency. Besides, we introduce two metrics to assess the validity and accuracy of the dictionary lookup operation. We extend several KGE models with the method, resulting in substantial performance improvements on widely-used benchmark datasets. △ Less

Submitted 13 June, 2023; originally announced June 2023.

arXiv:2305.15536 [pdf, other]

RAND: Robustness Aware Norm Decay For Quantized Seq2seq Models

Authors: David Qiu, David Rim, Shaojin Ding, Oleg Rybakov, Yanzhang He

Abstract: With the rapid increase in the size of neural networks, model compression has become an important area of research. Quantization is an effective technique at decreasing the model size, memory access, and compute load of large models. Despite recent advances in quantization aware training (QAT) technique, most papers present evaluations that are focused on computer vision tasks, which have differen… ▽ More With the rapid increase in the size of neural networks, model compression has become an important area of research. Quantization is an effective technique at decreasing the model size, memory access, and compute load of large models. Despite recent advances in quantization aware training (QAT) technique, most papers present evaluations that are focused on computer vision tasks, which have different training dynamics compared to sequence tasks. In this paper, we first benchmark the impact of popular techniques such as straight through estimator, pseudo-quantization noise, learnable scale parameter, clipping, etc. on 4-bit seq2seq models across a suite of speech recognition datasets ranging from 1,000 hours to 1 million hours, as well as one machine translation dataset to illustrate its applicability outside of speech. Through the experiments, we report that noise based QAT suffers when there is insufficient regularization signal flowing back to the quantization scale. We propose low complexity changes to the QAT process to improve model accuracy (outperforming popular learnable scale and clipping methods). With the improved accuracy, it opens up the possibility to exploit some of the other benefits of noise based QAT: 1) training a single model that performs well in mixed precision mode and 2) improved generalization on long form speech recognition. △ Less

Submitted 24 May, 2023; originally announced May 2023.

arXiv:2305.06194 [pdf, other]

Concentric Tube Robot Redundancy Resolution via Velocity/Compliance Manipulability Optimization

Authors: Jia Shen, Yifan Wang, Milad Azizkhani, Deqiang Qiu, Yue Chen

Abstract: Concentric Tube Robots (CTR) have the potential to enable effective minimally invasive surgeries. While extensive modeling and control schemes have been proposed in the past decade, limited efforts have been made to improve the trajectory tracking performance from the perspective of manipulability , which can be critical to generate safe motion and feasible actuator commands. In this paper, we pro… ▽ More Concentric Tube Robots (CTR) have the potential to enable effective minimally invasive surgeries. While extensive modeling and control schemes have been proposed in the past decade, limited efforts have been made to improve the trajectory tracking performance from the perspective of manipulability , which can be critical to generate safe motion and feasible actuator commands. In this paper, we propose a gradient-based redundancy resolution framework that optimizes velocity/compliance manipulability-based performance indices during trajectory tracking for a kinematically redundant CTR. We efficiently calculate the gradients of manipulabilities by propagating the first- and second-order derivatives of state variables of the Cosserat rod model along the CTR arc length, reducing the gradient computation time by 68\% compared to finite difference method. Task-specific performance indices are optimized by projecting the gradient into the null-space of trajectory tracking. The proposed method is validated in three exemplary scenarios that involve trajectory tracking, obstacle avoidance, and external load compensation, respectively. Simulation results show that the proposed method is able to accomplish the required tasks while commonly used redundancy resolution approaches underperform or even fail. △ Less

Submitted 10 May, 2023; originally announced May 2023.

Comments: 8 pages, 5 figures

arXiv:2304.01436 [pdf, other]

Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos

Authors: Ziqian Bai, Feitong Tan, Zeng Huang, Kripasindhu Sarkar, Danhang Tang, Di Qiu, Abhimitra Meka, Ruofei Du, Mingsong Dou, Sergio Orts-Escolano, Rohit Pandey, Ping Tan, Thabo Beeler, Sean Fanello, Yinda Zhang

Abstract: We propose a method to learn a high-quality implicit 3D head avatar from a monocular RGB video captured in the wild. The learnt avatar is driven by a parametric face model to achieve user-controlled facial expressions and head poses. Our hybrid pipeline combines the geometry prior and dynamic tracking of a 3DMM with a neural radiance field to achieve fine-grained control and photorealism. To reduc… ▽ More We propose a method to learn a high-quality implicit 3D head avatar from a monocular RGB video captured in the wild. The learnt avatar is driven by a parametric face model to achieve user-controlled facial expressions and head poses. Our hybrid pipeline combines the geometry prior and dynamic tracking of a 3DMM with a neural radiance field to achieve fine-grained control and photorealism. To reduce over-smoothing and improve out-of-model expressions synthesis, we propose to predict local features anchored on the 3DMM geometry. These learnt features are driven by 3DMM deformation and interpolated in 3D space to yield the volumetric radiance at a designated query point. We further show that using a Convolutional Neural Network in the UV space is critical in incorporating spatial context and producing representative local features. Extensive experiments show that we are able to reconstruct high-quality avatars, with more accurate expression-dependent details, good generalization to out-of-training expressions, and quantitatively superior renderings compared to other state-of-the-art approaches. △ Less

Submitted 3 April, 2023; originally announced April 2023.

Comments: In CVPR2023. Project page: https://augmentedperception.github.io/monoavatar/

arXiv:2212.05719 [pdf, other]

Tensor Factorization via Transformed Tensor-Tensor Product for Image Alignment

Authors: Sijia Xia, Duo Qiu, Xiongjun Zhang

Abstract: In this paper, we study the problem of a batch of linearly correlated image alignment, where the observed images are deformed by some unknown domain transformations, and corrupted by additive Gaussian noise and sparse noise simultaneously. By stacking these images as the frontal slices of a third-order tensor, we propose to utilize the tensor factorization method via transformed tensor-tensor prod… ▽ More In this paper, we study the problem of a batch of linearly correlated image alignment, where the observed images are deformed by some unknown domain transformations, and corrupted by additive Gaussian noise and sparse noise simultaneously. By stacking these images as the frontal slices of a third-order tensor, we propose to utilize the tensor factorization method via transformed tensor-tensor product to explore the low-rankness of the underlying tensor, which is factorized into the product of two smaller tensors via transformed tensor-tensor product under any unitary transformation. The main advantage of transformed tensor-tensor product is that its computational complexity is lower compared with the existing literature based on transformed tensor nuclear norm. Moreover, the tensor $\ell_p$ $(0<p<1)$ norm is employed to characterize the sparsity of sparse noise and the tensor Frobenius norm is adopted to model additive Gaussian noise. A generalized Gauss-Newton algorithm is designed to solve the resulting model by linearizing the domain transformations and a proximal Gauss-Seidel algorithm is developed to solve the corresponding subproblem. Furthermore, the convergence of the proximal Gauss-Seidel algorithm is established, whose convergence rate is also analyzed based on the Kurdyka-$Ł$ojasiewicz property. Extensive numerical experiments on real-world image datasets are carried out to demonstrate the superior performance of the proposed method as compared to several state-of-the-art methods in both accuracy and computational time. △ Less

Submitted 13 December, 2022; v1 submitted 12 December, 2022; originally announced December 2022.

arXiv:2210.13109 [pdf, other]

WDA-Net: Weakly-Supervised Domain Adaptive Segmentation of Electron Microscopy

Authors: Dafei Qiu, Jiajin Yi, Jialin Peng

Abstract: Accurate segmentation of organelle instances, e.g., mitochondria, is essential for electron microscopy analysis. Despite the outstanding performance of fully supervised methods, they highly rely on sufficient per-pixel annotated data and are sensitive to domain shift. Aiming to develop a highly annotation-efficient approach with competitive performance, we focus on weakly-supervised domain adaptat… ▽ More Accurate segmentation of organelle instances, e.g., mitochondria, is essential for electron microscopy analysis. Despite the outstanding performance of fully supervised methods, they highly rely on sufficient per-pixel annotated data and are sensitive to domain shift. Aiming to develop a highly annotation-efficient approach with competitive performance, we focus on weakly-supervised domain adaptation (WDA) with a type of extremely sparse and weak annotation demanding minimal annotation efforts, i.e., sparse point annotations on only a small subset of object instances. To reduce performance degradation arising from domain shift, we explore multi-level transferable knowledge through conducting three complementary tasks, i.e., counting, detection, and segmentation, constituting a task pyramid with different levels of domain invariance. The intuition behind this is that after investigating a related source domain, it is much easier to spot similar objects in the target domain than to delineate their fine boundaries. Specifically, we enforce counting estimation as a global constraint to the detection with sparse supervision, which further guides the segmentation. A cross-position cut-and-paste augmentation is introduced to further compensate for the annotation sparsity. Extensive validations show that our model with only 15% point annotations can achieve comparable performance as supervised models and shows robustness to annotation selection. △ Less

Submitted 30 October, 2022; v1 submitted 24 October, 2022; originally announced October 2022.

Comments: Accepted by BIBM 2022: International Conference on Bioinformatics & Biomedicine

arXiv:2207.14709 [pdf, other]

Robust Quantitative Susceptibility Mapping via Approximate Message Passing with Parameter Estimation

Authors: Shuai Huang, James J. Lah, Jason W. Allen, Deqiang Qiu

Abstract: Purpose: For quantitative susceptibility mapping (QSM), the lack of ground-truth in clinical settings makes it challenging to determine suitable parameters for the dipole inversion. We propose a probabilistic Bayesian approach for QSM with built-in parameter estimation, and incorporate the nonlinear formulation of the dipole inversion to achieve a robust recovery of the susceptibility maps. Theo… ▽ More Purpose: For quantitative susceptibility mapping (QSM), the lack of ground-truth in clinical settings makes it challenging to determine suitable parameters for the dipole inversion. We propose a probabilistic Bayesian approach for QSM with built-in parameter estimation, and incorporate the nonlinear formulation of the dipole inversion to achieve a robust recovery of the susceptibility maps. Theory: From a Bayesian perspective, the image wavelet coefficients are approximately sparse and modelled by the Laplace distribution. The measurement noise is modelled by a Gaussian-mixture distribution with two components, where the second component is used to model the noise outliers. Through probabilistic inference, the susceptibility map and distribution parameters can be jointly recovered using approximate message passing (AMP). Methods: We compare our proposed AMP with built-in parameter estimation (AMP-PE) to the state-of-the-art L1-QSM, FANSI and MEDI approaches on the simulated and in vivo datasets, and perform experiments to explore the optimal settings of AMP-PE. Reproducible code is available at https://github.com/EmoryCN2L/QSM_AMP_PE Results: On the simulated Sim2Snr1 dataset, AMP-PE achieved the lowest NRMSE, DFCM and the highest SSIM, while MEDI achieved the lowest HFEN. On the in vivo datasets, AMP-PE is robust and successfully recovers the susceptibility maps using the estimated parameters, whereas L1-QSM, FANSI and MEDI typically require additional visual fine-tuning to select or double-check working parameters. Conclusion: AMP-PE provides automatic and adaptive parameter estimation for QSM and avoids the subjectivity from the visual fine-tuning step, making it an excellent choice for the clinical setting. △ Less

Submitted 30 May, 2023; v1 submitted 29 July, 2022; originally announced July 2022.

Comments: Keywords: Approximate message passing, Compressive sensing, Outlier modelling, Parameter estimation, Quantitative susceptibility mapping

arXiv:2205.13117 [pdf, other]

Learn to Cluster Faces via Pairwise Classification

Authors: Junfu Liu, Di Qiu, Pengfei Yan, Xiaolin Wei

Abstract: Face clustering plays an essential role in exploiting massive unlabeled face data. Recently, graph-based face clustering methods are getting popular for their satisfying performances. However, they usually suffer from excessive memory consumption especially on large-scale graphs, and rely on empirical thresholds to determine the connectivities between samples in inference, which restricts their ap… ▽ More Face clustering plays an essential role in exploiting massive unlabeled face data. Recently, graph-based face clustering methods are getting popular for their satisfying performances. However, they usually suffer from excessive memory consumption especially on large-scale graphs, and rely on empirical thresholds to determine the connectivities between samples in inference, which restricts their applications in various real-world scenes. To address such problems, in this paper, we explore face clustering from the pairwise angle. Specifically, we formulate the face clustering task as a pairwise relationship classification task, avoiding the memory-consuming learning on large-scale graphs. The classifier can directly determine the relationship between samples and is enhanced by taking advantage of the contextual information. Moreover, to further facilitate the efficiency of our method, we propose a rank-weighted density to guide the selection of pairs sent to the classifier. Experimental results demonstrate that our method achieves state-of-the-art performances on several public clustering benchmarks at the fastest speed and shows a great advantage in comparison with graph-based clustering methods on memory consumption. △ Less

Submitted 25 May, 2022; originally announced May 2022.

Comments: Accepted by ICCV2021

arXiv:2205.10448 [pdf, other]

doi 10.1109/TSP.2022.3167516

Approximate Message Passing with Parameter Estimation for Heavily Quantized Measurements

Authors: Shuai Huang, Deqiang Qiu, Trac D. Tran

Abstract: Designing efficient sparse recovery algorithms that could handle noisy quantized measurements is important in a variety of applications -- from radar to source localization, spectrum sensing and wireless networking. We take advantage of the approximate message passing (AMP) framework to achieve this goal given its high computational efficiency and state-of-the-art performance. In AMP, the signal o… ▽ More Designing efficient sparse recovery algorithms that could handle noisy quantized measurements is important in a variety of applications -- from radar to source localization, spectrum sensing and wireless networking. We take advantage of the approximate message passing (AMP) framework to achieve this goal given its high computational efficiency and state-of-the-art performance. In AMP, the signal of interest is assumed to follow certain prior distribution with unknown parameters. Previous works focused on finding the parameters that maximize the measurement likelihood via expectation maximization -- an increasingly difficult problem to solve in cases involving complicated probability models. In this paper, we treat the parameters as unknown variables and compute their posteriors via AMP. The parameters and signal of interest can then be jointly recovered. Compared to previous methods, the proposed approach leads to a simple and elegant parameter estimation scheme, allowing us to directly work with 1-bit quantization noise model. We then further extend our approach to general multi-bit quantization noise model. Experimental results show that the proposed framework provides significant improvement over state-of-the-art methods across a wide range of sparsity and noise levels. △ Less

Submitted 20 May, 2022; originally announced May 2022.

Comments: arXiv admin note: text overlap with arXiv:2007.07679

Journal ref: IEEE Transactions on Signal Processing, Vol. 70, pp. 2062-2077, Apr. 2022

arXiv:2202.00147 [pdf, ps, other]

Distributed Quantum Vote Based on Quantum Logical Operators, a New Battlefield of the Second Quantum Revolution

Authors: Xin Sun, Feifei He, Daowen Qiu, Piotr Kulicki, Mirek Sopek, Meiyun Guo

Abstract: We designed two rules of binary quantum computed vote: Quantum Logical Veto (QLV) and Quantum Logical Nomination (QLN). The conjunction and disjunction from quantum computational logic are used to define QLV and QLN, respectively. Compared to classical vote, quantum computed vote is fairer, more democratic and has stronger expressive power. Since the advantage of quantum computed vote is neither t… ▽ More We designed two rules of binary quantum computed vote: Quantum Logical Veto (QLV) and Quantum Logical Nomination (QLN). The conjunction and disjunction from quantum computational logic are used to define QLV and QLN, respectively. Compared to classical vote, quantum computed vote is fairer, more democratic and has stronger expressive power. Since the advantage of quantum computed vote is neither the speed of computing nor the security of communication, we believe it opens a new battlefield in the second quantum revolution. Compared to other rules of quantum computed vote, QLV and QLN have better scalability. Both QLV and QLN can be implemented by the current technology and the difficulty of implementation does not grow with the increase of the number of voters. △ Less

Submitted 31 January, 2022; originally announced February 2022.

Comments: 9 pages

MSC Class: 81Pxx

arXiv:2111.14041 [pdf, ps, other]

Learning Quantum Finite Automata with Queries

Authors: Daowen Qiu

Abstract: {\it Learning finite automata} (termed as {\it model learning}) has become an important field in machine learning and has been useful realistic applications. Quantum finite automata (QFA) are simple models of quantum computers with finite memory. Due to their simplicity, QFA have well physical realizability, but one-way QFA still have essential advantages over classical finite automata with regard… ▽ More {\it Learning finite automata} (termed as {\it model learning}) has become an important field in machine learning and has been useful realistic applications. Quantum finite automata (QFA) are simple models of quantum computers with finite memory. Due to their simplicity, QFA have well physical realizability, but one-way QFA still have essential advantages over classical finite automata with regard to state complexity (two-way QFA are more powerful than classical finite automata in computation ability as well). As a different problem in {\it quantum learning theory} and {\it quantum machine learning}, in this paper, our purpose is to initiate the study of {\it learning QFA with queries} (naturally it may be termed as {\it quantum model learning}), and the main results are regarding learning two basic one-way QFA: (1) We propose a learning algorithm for measure-once one-way QFA (MO-1QFA) with query complexity of polynomial time; (2) We propose a learning algorithm for measure-many one-way QFA (MM-1QFA) with query complexity of polynomial-time, as well. △ Less

Submitted 12 November, 2023; v1 submitted 27 November, 2021; originally announced November 2021.

Comments: 25pages; comments are welcome

arXiv:2110.03327 [pdf, other]

Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition

Authors: Qiujia Li, Yu Zhang, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland

Abstract: As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems. Recent research has shown that model-based confidence estimators have a significant advantage over using the output softmax probabilities. If the input data to the speech recogniser is from mismatched acoustic and linguistic conditions,… ▽ More As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems. Recent research has shown that model-based confidence estimators have a significant advantage over using the output softmax probabilities. If the input data to the speech recogniser is from mismatched acoustic and linguistic conditions, the ASR performance and the corresponding confidence estimators may exhibit severe degradation. Since confidence models are often trained on the same in-domain data as the ASR, generalising to out-of-domain (OOD) scenarios is challenging. By keeping the ASR model untouched, this paper proposes two approaches to improve the model-based confidence estimators on OOD data: using pseudo transcriptions and an additional OOD language model. With an ASR model trained on LibriSpeech, experiments show that the proposed methods can greatly improve the confidence metrics on TED-LIUM and Switchboard datasets while preserving in-domain performance. Furthermore, the improved confidence estimators are better calibrated on OOD data and can provide a much more reliable criterion for data selection. △ Less

Submitted 2 March, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

Comments: Accepted as a conference paper at ICASSP 2022

arXiv:2110.00165 [pdf, other]

Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning

Authors: Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He

Abstract: Self- and semi-supervised learning methods have been actively investigated to reduce labeled training data or enhance the model performance. However, the approach mostly focus on in-domain performance for public datasets. In this study, we utilize the combination of self- and semi-supervised learning methods to solve unseen domain adaptation problem in a large-scale production setting for online A… ▽ More Self- and semi-supervised learning methods have been actively investigated to reduce labeled training data or enhance the model performance. However, the approach mostly focus on in-domain performance for public datasets. In this study, we utilize the combination of self- and semi-supervised learning methods to solve unseen domain adaptation problem in a large-scale production setting for online ASR model. This approach demonstrates that using the source domain data with a small fraction of the target domain data (3%) can recover the performance gap compared to a full data baseline: relative 13.5% WER improvement for target domain data. △ Less

Submitted 15 February, 2022; v1 submitted 30 September, 2021; originally announced October 2021.

Comments: ICASSP 2022 accepted, 5 pages, 2 figures, 5 tables

arXiv:2107.09635 [pdf, ps, other]

Analyzing and predicting non-equilibrium many-body dynamics via dynamic mode decomposition

Authors: Jia Yin, Yang-hao Chan, Felipe da Jornada, Diana Qiu, Chao Yang, Steven G. Louie

Abstract: Simulating the dynamics of a nonequilibrium quantum many-body system by computing the two-time Green's function associated with such a system is computationally challenging. However, we are often interested in the time diagonal of such a Green's function or time dependent physical observables that are functions of one time. In this paper, we discuss the possibility of using dynamic model decomposi… ▽ More Simulating the dynamics of a nonequilibrium quantum many-body system by computing the two-time Green's function associated with such a system is computationally challenging. However, we are often interested in the time diagonal of such a Green's function or time dependent physical observables that are functions of one time. In this paper, we discuss the possibility of using dynamic model decomposition (DMD), a data-driven model order reduction technique, to characterize one-time observables associated with the nonequilibrium dynamics using snapshots computed within a small time window. The DMD method allows us to efficiently predict long time dynamics from a limited number of trajectory samples. We demonstrate the effectiveness of DMD on a model two-band system. We show that, in the equilibrium limit, the DMD analysis yields results that are consistent with those produced from a linear response analysis. In the nonequilibrium case, the extrapolated dynamics produced by DMD is more accurate than a special Fourier extrapolation scheme presented in this paper. We point out a potential pitfall of the standard DMD method caused by insufficient spatial/momentum resolution of the discretization scheme. We show how this problem can be overcome by using a variant of the DMD method known as higher order DMD. △ Less

Submitted 26 June, 2021; originally announced July 2021.

Comments: 22 pages, 17 pages

arXiv:2104.12870 [pdf, other]

Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction

Authors: David Qiu, Yanzhang He, Qiujia Li, Yu Zhang, Liangliang Cao, Ian McGraw

Abstract: Confidence scores are very useful for downstream applications of automatic speech recognition (ASR) systems. Recent works have proposed using neural networks to learn word or utterance confidence scores for end-to-end ASR. In those studies, word confidence by itself does not model deletions, and utterance confidence does not take advantage of word-level training signals. This paper proposes to joi… ▽ More Confidence scores are very useful for downstream applications of automatic speech recognition (ASR) systems. Recent works have proposed using neural networks to learn word or utterance confidence scores for end-to-end ASR. In those studies, word confidence by itself does not model deletions, and utterance confidence does not take advantage of word-level training signals. This paper proposes to jointly learn word confidence, word deletion, and utterance confidence. Empirical results show that multi-task learning with all three objectives improves confidence metrics (NCE, AUC, RMSE) without the need for increasing the model size of the confidence estimation module. Using the utterance-level confidence for rescoring also decreases the word error rates on Google's Voice Search and Long-tail Maps datasets by 3-5% relative, without needing a dedicated neural rescorer. △ Less

Submitted 26 April, 2021; originally announced April 2021.

Comments: Submitted to Interspeech 2021

arXiv:2104.09753 [pdf, other]

Supervisory Control of Quantum Discrete Event Systems

Authors: Daowen Qiu

Abstract: Discrete event systems (DES) have been deeply developed and applied in practice, but state complexity in DES still is an important problem to be better solved with innovative methods. With the development of quantum computing and quantum control, a natural problem is to simulate DES by means of quantum computing models and to establish {\it quantum DES} (QDES). The motivation is twofold: on the on… ▽ More Discrete event systems (DES) have been deeply developed and applied in practice, but state complexity in DES still is an important problem to be better solved with innovative methods. With the development of quantum computing and quantum control, a natural problem is to simulate DES by means of quantum computing models and to establish {\it quantum DES} (QDES). The motivation is twofold: on the one hand, QDES have potential applications when DES are simulated and processed by quantum computers, where quantum systems are employed to simulate the evolution of states driven by discrete events, and on the other hand, QDES may have essential advantages over DES concerning state complexity for imitating some practical problems. So, the goal of this paper is to establish a basic framework of QDES by using {\it quantum finite automata} (QFA) as the modelling formalisms, and the supervisory control theorems of QDES are established and proved. Then we present a polynomial-time algorithm to decide whether or not the controllability condition holds. In particular, we construct a number of new examples of QFA to illustrate the supervisory control of QDES and to verify the essential advantages of QDES over classical DES in state complexity. △ Less

Submitted 3 May, 2023; v1 submitted 20 April, 2021; originally announced April 2021.

Comments: 35 pages, 5 figures; comments are welcome

arXiv:2103.06716 [pdf, other]

Learning Word-Level Confidence For Subword End-to-End ASR

Authors: David Qiu, Qiujia Li, Yanzhang He, Yu Zhang, Bo Li, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li, Ke Hu, Tara N. Sainath, Ian McGraw

Abstract: We study the problem of word-level confidence estimation in subword-based end-to-end (E2E) models for automatic speech recognition (ASR). Although prior works have proposed training auxiliary confidence models for ASR systems, they do not extend naturally to systems that operate on word-pieces (WP) as their vocabulary. In particular, ground truth WP correctness labels are needed for training confi… ▽ More We study the problem of word-level confidence estimation in subword-based end-to-end (E2E) models for automatic speech recognition (ASR). Although prior works have proposed training auxiliary confidence models for ASR systems, they do not extend naturally to systems that operate on word-pieces (WP) as their vocabulary. In particular, ground truth WP correctness labels are needed for training confidence models, but the non-unique tokenization from word to WP causes inaccurate labels to be generated. This paper proposes and studies two confidence models of increasing complexity to solve this problem. The final model uses self-attention to directly learn word-level confidence without needing subword tokenization, and exploits full context features from multiple hypotheses to improve confidence accuracy. Experiments on Voice Search and long-tail test sets show standard metrics (e.g., NCE, AUC, RMSE) improving substantially. The proposed confidence module also enables a model selection approach to combine an on-device E2E model with a hybrid model on the server to address the rare word recognition problem for the E2E model. △ Less

Submitted 11 March, 2021; originally announced March 2021.

Comments: To appear in ICASSP 2021

arXiv:2102.12642 [pdf, other]

CelebA-Spoof Challenge 2020 on Face Anti-Spoofing: Methods and Results

Authors: Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu, Shuo Yang, Yuanjun Xiong, Wei Xia, Yan Xu, Man Luo, Jian Liu, Jianshu Li, Zhijun Chen, Mingyu Guo, Hui Li, Junfu Liu, Pengfei Gao, Tianqi Hong, Hao Han, Shijie Liu, Xinhua Chen, Di Qiu, Cheng Zhen, Dashuang Liang, Yufeng Jin, Zhanlong Hao

Abstract: As facial interaction systems are prevalently deployed, security and reliability of these systems become a critical issue, with substantial research efforts devoted. Among them, face anti-spoofing emerges as an important area, whose objective is to identify whether a presented face is live or spoof. Recently, a large-scale face anti-spoofing dataset, CelebA-Spoof which comprised of 625,537 picture… ▽ More As facial interaction systems are prevalently deployed, security and reliability of these systems become a critical issue, with substantial research efforts devoted. Among them, face anti-spoofing emerges as an important area, whose objective is to identify whether a presented face is live or spoof. Recently, a large-scale face anti-spoofing dataset, CelebA-Spoof which comprised of 625,537 pictures of 10,177 subjects has been released. It is the largest face anti-spoofing dataset in terms of the numbers of the data and the subjects. This paper reports methods and results in the CelebA-Spoof Challenge 2020 on Face AntiSpoofing which employs the CelebA-Spoof dataset. The model evaluation is conducted online on the hidden test set. A total of 134 participants registered for the competition, and 19 teams made valid submissions. We will analyze the top ranked solutions and present some discussion on future work directions. △ Less

Submitted 25 February, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

Comments: Technical report. Challenge website: https://competitions.codalab.org/competitions/26210

arXiv:2101.01745 [pdf, other]

doi 10.1145/3476229

Hardware Acceleration of HPC Computational Flow Dynamics using HBM-enabled FPGAs

Authors: Tom Hogervorst, Tong Dong Qiu, Giacomo Marchiori, Alf Birger, Markus Blatt, Razvan Nane

Abstract: Scientific computing is at the core of many High-Performance Computing applications, including computational flow dynamics. Because of the uttermost importance to simulate increasingly larger computational models, hardware acceleration is receiving increased attention due to its potential to maximize the performance of scientific computing. A Field-Programmable Gate Array is a reconfigurable hardw… ▽ More Scientific computing is at the core of many High-Performance Computing applications, including computational flow dynamics. Because of the uttermost importance to simulate increasingly larger computational models, hardware acceleration is receiving increased attention due to its potential to maximize the performance of scientific computing. A Field-Programmable Gate Array is a reconfigurable hardware accelerator that is fully customizable in terms of computational resources and memory storage requirements of an application during its lifetime. Therefore, it is an ideal candidate to accelerate scientific computing applications because of the possibility to fully customize the memory hierarchy important in irregular applications such as iterative linear solvers found in scientific libraries. In this paper, we study the potential of using FPGA in HPC because of the rapid advances in reconfigurable hardware, such as the increase in on-chip memory size, increasing number of logic cells, and the integration of High-Bandwidth Memories on board. To perform this study, we first propose a novel ILU0 preconditioner tightly integrated with a BiCGStab solver kernel designed using a mixture of High-Level Synthesis and Register-Transfer Level hand-coded design. Second, we integrate the developed preconditioned iterative solver in Flow from the Open Porous Media (OPM) project, a state-of-the-art open-source reservoir simulator. Finally, we perform a thorough evaluation of the FPGA solver kernel in both standalone mode and integrated into the reservoir simulator that includes all the on-chip URAM and BRAM, on-board High-Bandwidth Memory, and off-chip CPU memory data transfers required in a complex simulator software such as OPM's Flow. We evaluate the performance on the Norne field, a real-world case reservoir model using a grid with more than 10^5 cells and using 3 unknowns per cell. △ Less

Submitted 5 January, 2021; originally announced January 2021.

Report number: Article No.: 20, pp 1--35

Journal ref: ACM Transactions on Reconfigurable Technology and Systems, Volume 15, Issue 2, June 2022

arXiv:2010.11428 [pdf, other]

Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition

Authors: Qiujia Li, David Qiu, Yu Zhang, Bo Li, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman

Abstract: For various speech-related tasks, confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions. In traditional hidden Markov model-based automatic speech recognition (ASR) systems, confidence scores can be reliably obtained from word posteriors in decoding lattices. However, for an ASR system with an auto-regressive decoder, such as an attention-based seq… ▽ More For various speech-related tasks, confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions. In traditional hidden Markov model-based automatic speech recognition (ASR) systems, confidence scores can be reliably obtained from word posteriors in decoding lattices. However, for an ASR system with an auto-regressive decoder, such as an attention-based sequence-to-sequence model, computing word posteriors is difficult. An obvious alternative is to use the decoder softmax probability as the model confidence. In this paper, we first examine how some commonly used regularisation methods influence the softmax-based confidence scores and study the overconfident behaviour of end-to-end models. Then we propose a lightweight and effective approach named confidence estimation module (CEM) on top of an existing end-to-end ASR model. Experiments on LibriSpeech show that CEM can mitigate the overconfidence problem and can produce more reliable confidence scores with and without shallow fusion of a language model. Further analysis shows that CEM generalises well to speech from a moderately mismatched domain and can potentially improve downstream tasks such as semi-supervised learning. △ Less

Submitted 23 October, 2020; v1 submitted 22 October, 2020; originally announced October 2020.

Comments: Submitted to ICASSP 2021

arXiv:2008.05258 [pdf, other]

Guided Collaborative Training for Pixel-wise Semi-Supervised Learning

Authors: Zhanghan Ke, Di Qiu, Kaican Li, Qiong Yan, Rynson W. H. Lau

Abstract: We investigate the generalization of semi-supervised learning (SSL) to diverse pixel-wise tasks. Although SSL methods have achieved impressive results in image classification, the performances of applying them to pixel-wise tasks are unsatisfactory due to their need for dense outputs. In addition, existing pixel-wise SSL approaches are only suitable for certain tasks as they usually require to use… ▽ More We investigate the generalization of semi-supervised learning (SSL) to diverse pixel-wise tasks. Although SSL methods have achieved impressive results in image classification, the performances of applying them to pixel-wise tasks are unsatisfactory due to their need for dense outputs. In addition, existing pixel-wise SSL approaches are only suitable for certain tasks as they usually require to use task-specific properties. In this paper, we present a new SSL framework, named Guided Collaborative Training (GCT), for pixel-wise tasks, with two main technical contributions. First, GCT addresses the issues caused by the dense outputs through a novel flaw detector. Second, the modules in GCT learn from unlabeled data collaboratively through two newly proposed constraints that are independent of task-specific properties. As a result, GCT can be applied to a wide range of pixel-wise tasks without structural adaptation. Our extensive experiments on four challenging vision tasks, including semantic segmentation, real image denoising, portrait image matting, and night image enhancement, show that GCT outperforms state-of-the-art SSL methods by a large margin. Our code available at: https://github.com/ZHKKKe/PixelSSL. △ Less

Submitted 12 August, 2020; originally announced August 2020.

Comments: 16th European Conference on Computer Vision (ECCV 2020)

arXiv:2008.05157 [pdf, other]

Towards Geometry Guided Neural Relighting with Flash Photography

Authors: Di Qiu, Jin Zeng, Zhanghan Ke, Wenxiu Sun, Chengxi Yang

Abstract: Previous image based relighting methods require capturing multiple images to acquire high frequency lighting effect under different lighting conditions, which needs nontrivial effort and may be unrealistic in certain practical use scenarios. While such approaches rely entirely on cleverly sampling the color images under different lighting conditions, little has been done to utilize geometric infor… ▽ More Previous image based relighting methods require capturing multiple images to acquire high frequency lighting effect under different lighting conditions, which needs nontrivial effort and may be unrealistic in certain practical use scenarios. While such approaches rely entirely on cleverly sampling the color images under different lighting conditions, little has been done to utilize geometric information that crucially influences the high-frequency features in the images, such as glossy highlight and cast shadow. We therefore propose a framework for image relighting from a single flash photograph with its corresponding depth map using deep learning. By incorporating the depth map, our approach is able to extrapolate realistic high-frequency effects under novel lighting via geometry guided image decomposition from the flashlight image, and predict the cast shadow map from the shadow-encoding transformed depth map. Moreover, the single-image based setup greatly simplifies the data capture process. We experimentally validate the advantage of our geometry guided approach over state-of-the-art image-based approaches in intrinsic image decomposition and image relighting, and also demonstrate our performance on real mobile phone photo examples. △ Less

Submitted 12 August, 2020; originally announced August 2020.

arXiv:2008.01806 [pdf, other]

Fast Nonconvex $T_2^*$ Mapping Using ADMM

Authors: Shuai Huang, James J. Lah, Jason W. Allen, Deqiang Qiu

Abstract: Magnetic resonance (MR)-$T_2^*$ mapping is widely used to study hemorrhage, calcification and iron deposition in various clinical applications, it provides a direct and precise mapping of desired contrast in the tissue. However, the long acquisition time required by conventional 3D high-resolution $T_2^*$ mapping method causes discomfort to patients and introduces motion artifacts to reconstructed… ▽ More Magnetic resonance (MR)-$T_2^*$ mapping is widely used to study hemorrhage, calcification and iron deposition in various clinical applications, it provides a direct and precise mapping of desired contrast in the tissue. However, the long acquisition time required by conventional 3D high-resolution $T_2^*$ mapping method causes discomfort to patients and introduces motion artifacts to reconstructed images, which limits its wider applicability. In this paper we address this issue by performing $T_2^*$ mapping from undersampled data using compressive sensing (CS). We formulate the reconstruction as a nonconvex problem that can be decomposed into two subproblems. They can be solved either separately via the standard approach or jointly via the alternating direction method of multipliers (ADMM). Compared to previous CS-based approaches that only apply sparse regularization on the spin density $\boldsymbol X_0$ and the relaxation rate $\boldsymbol R_2^*$, our formulation enforces additional sparse priors on the $T_2^*$-weighted images at multiple echoes to improve the reconstruction performance. We performed convergence analysis of the proposed algorithm, evaluated its performance on in vivo data, and studied the effects of different sampling schemes. Experimental results showed that the proposed joint-recovery approach generally outperforms the state-of-the-art method, especially in the low-sampling rate regime, making it a preferred choice to perform fast 3D $T_2^*$ mapping in practice. The framework adopted in this work can be easily extended to other problems arising from MR or other imaging modalities with non-linearly coupled variables. △ Less

Submitted 4 August, 2020; originally announced August 2020.

arXiv:2007.14564 [pdf, other]

Bayesian Massive MIMO Channel Estimation with Parameter Estimation Using Low-Resolution ADCs

Authors: Shuai Huang, Deqiang Qiu, Trac D. Tran

Abstract: In order to reduce hardware complexity and power consumption, massive multiple-input multiple-output (MIMO) systems employ low-resolution analog-to-digital converters (ADCs) to acquire quantized measurements $\boldsymbol y$. This poses new challenges to the channel estimation problem, and the sparse prior on the channel coefficient vector $\boldsymbol x$ in the angle domain is often used to compen… ▽ More In order to reduce hardware complexity and power consumption, massive multiple-input multiple-output (MIMO) systems employ low-resolution analog-to-digital converters (ADCs) to acquire quantized measurements $\boldsymbol y$. This poses new challenges to the channel estimation problem, and the sparse prior on the channel coefficient vector $\boldsymbol x$ in the angle domain is often used to compensate for the information lost during quantization. By interpreting the sparse prior from a probabilistic perspective, we can assume $\boldsymbol x$ follows certain sparse prior distribution and recover it using approximate message passing (AMP). However, the distribution parameters are unknown in practice and need to be estimated. Due to the increased computational complexity in the quantization noise model, previous works either use an approximated noise model or manually tune the noise distribution parameters. In this paper, we treat both signals and parameters as random variables and recover them jointly within the AMP framework. The proposed approach leads to a much simpler parameter estimation method, allowing us to work with the quantization noise model directly. Experimental results show that the proposed approach achieves state-of-the-art performance under various noise levels and does not require parameter tuning, making it a practical and maintenance-free approach for channel estimation. △ Less

Submitted 11 February, 2021; v1 submitted 28 July, 2020; originally announced July 2020.

arXiv:2007.12942 [pdf, other]

Gradient Regularized Contrastive Learning for Continual Domain Adaptation

Authors: Peng Su, Shixiang Tang, Peng Gao, Di Qiu, Ni Zhao, Xiaogang Wang

Abstract: Human beings can quickly adapt to environmental changes by leveraging learning experience. However, the poor ability of adapting to dynamic environments remains a major challenge for AI models. To better understand this issue, we study the problem of continual domain adaptation, where the model is presented with a labeled source domain and a sequence of unlabeled target domains. There are two majo… ▽ More Human beings can quickly adapt to environmental changes by leveraging learning experience. However, the poor ability of adapting to dynamic environments remains a major challenge for AI models. To better understand this issue, we study the problem of continual domain adaptation, where the model is presented with a labeled source domain and a sequence of unlabeled target domains. There are two major obstacles in this problem: domain shifts and catastrophic forgetting. In this work, we propose Gradient Regularized Contrastive Learning to solve the above obstacles. At the core of our method, gradient regularization plays two key roles: (1) enforces the gradient of contrastive loss not to increase the supervised training loss on the source domain, which maintains the discriminative power of learned features; (2) regularizes the gradient update on the new domain not to increase the classification loss on the old target domains, which enables the model to adapt to an in-coming target domain while preserving the performance of previously observed domains. Hence our method can jointly learn both semantically discriminative and domain-invariant features with labeled source domain and unlabeled target domains. The experiments on Digits, DomainNet and Office-Caltech benchmarks demonstrate the strong performance of our approach when compared to the state-of-the-art. △ Less

Submitted 25 July, 2020; originally announced July 2020.

arXiv:2007.12858 [pdf, other]

Modal Uncertainty Estimation via Discrete Latent Representation

Authors: Di Qiu, Lok Ming Lui

Abstract: Many important problems in the real world don't have unique solutions. It is thus important for machine learning models to be capable of proposing different plausible solutions with meaningful probability measures. In this work we introduce such a deep learning framework that learns the one-to-many mappings between the inputs and outputs, together with faithful uncertainty measures. We call our fr… ▽ More Many important problems in the real world don't have unique solutions. It is thus important for machine learning models to be capable of proposing different plausible solutions with meaningful probability measures. In this work we introduce such a deep learning framework that learns the one-to-many mappings between the inputs and outputs, together with faithful uncertainty measures. We call our framework {\it modal uncertainty estimation} since we model the one-to-many mappings to be generated through a set of discrete latent variables, each representing a latent mode hypothesis that explains the corresponding type of input-output relationship. The discrete nature of the latent representations thus allows us to estimate for any input the conditional probability distribution of the outputs very effectively. Both the discrete latent space and its uncertainty estimation are jointly learned during training. We motivate our use of discrete latent space through the multi-modal posterior collapse problem in current conditional generative models, then develop the theoretical background, and extensively validate our method on both synthetic and realistic tasks. Our framework demonstrates significantly more accurate uncertainty estimation than the current state-of-the-art methods, and is informative and convenient for practical use. △ Less

Submitted 25 July, 2020; originally announced July 2020.

arXiv:2007.10924 [pdf, ps, other]

doi 10.3390/e23020189

Partial Boolean functions with exact quantum 1-query complexity

Authors: Guoliang Xu, Daowen Qiu

Abstract: We provide two sufficient and necessary conditions to characterize any $n$-bit partial Boolean function with exact quantum 1-query complexity. Using the first characterization, we present all $n$-bit partial Boolean functions that depend on $n$ bits and have exact quantum 1-query complexity. Due to the second characterization, we construct a function $F$ that maps any $n$-bit partial Boolean funct… ▽ More We provide two sufficient and necessary conditions to characterize any $n$-bit partial Boolean function with exact quantum 1-query complexity. Using the first characterization, we present all $n$-bit partial Boolean functions that depend on $n$ bits and have exact quantum 1-query complexity. Due to the second characterization, we construct a function $F$ that maps any $n$-bit partial Boolean function to some integer, and if an $n$-bit partial Boolean function $f$ depends on $k$ bits and has exact quantum 1-query complexity, then $F(f)$ is non-positive. In addition, we show that the number of all $n$-bit partial Boolean functions that depend on $k$ bits and have exact quantum 1-query complexity is not bigger than $n^{2}2^{2^{n-1}(1+2^{2-k})+2n^{2}}$ for all $n\geq 3$ and $k\geq 2$. △ Less

Submitted 4 September, 2020; v1 submitted 21 July, 2020; originally announced July 2020.

Comments: 11pages; comments are welcome

arXiv:2005.13382 [pdf, ps, other]

doi 10.1016/j.tcs.2019.12.008

Security Improvements of Several Basic Quantum Private Query Protocols with O(log N) Communication Complexity

Authors: Fang Yu, Daowen Qiu, Xiaoming Wang, Qin Li, Lvzhou Li, Jozef Gruska

Abstract: New quantum private database (with N elements) query protocols are presented and analyzed. Protocols preserve O(logN) communication complexity of known protocols for the same task, but achieve several significant improvements in security, especially concerning user privacy. For example, the randomized form of our protocol has a cheat-sensitive property - it allows the user to detect a dishonest da… ▽ More New quantum private database (with N elements) query protocols are presented and analyzed. Protocols preserve O(logN) communication complexity of known protocols for the same task, but achieve several significant improvements in security, especially concerning user privacy. For example, the randomized form of our protocol has a cheat-sensitive property - it allows the user to detect a dishonest database with a nonzero probability, while the phase-encoded private query protocols for the same task do not have such a property. Moreover, when the database performs the computational basis measurement, a particular projective measurement which can cause a significant loss of user privacy in the previous private query protocols with O(logN) communication complexity, at most half of the user privacy could leak to such a database in our protocol, while in the QPQ protocol, the entire user privacy could leak out. In addition, it is proved here that for large N, the user could detect a cheating via the computational basis measurement, with a probability close to 1/2 using O(\sqrt{N}) special queries. Finally, it is shown here, for both forms of our protocol, basic and randomized, how a dishonest database has to act in case it could not learn user's queries. △ Less

Submitted 27 May, 2020; originally announced May 2020.

Journal ref: Theoretical Computer Science 807 (2020): 330-340

arXiv:2004.01907 [pdf, other]

Knowledge Guided Metric Learning for Few-Shot Text Classification

Authors: Dianbo Sui, Yubo Chen, Binjie Mao, Delai Qiu, Kang Liu, Jun Zhao

Abstract: The training of deep-learning-based text classification models relies heavily on a huge amount of annotation data, which is difficult to obtain. When the labeled data is scarce, models tend to struggle to achieve satisfactory performance. However, human beings can distinguish new categories very efficiently with few examples. This is mainly due to the fact that human beings can leverage knowledge… ▽ More The training of deep-learning-based text classification models relies heavily on a huge amount of annotation data, which is difficult to obtain. When the labeled data is scarce, models tend to struggle to achieve satisfactory performance. However, human beings can distinguish new categories very efficiently with few examples. This is mainly due to the fact that human beings can leverage knowledge obtained from relevant tasks. Inspired by human intelligence, we propose to introduce external knowledge into few-shot learning to imitate human knowledge. A novel parameter generator network is investigated to this end, which is able to use the external knowledge to generate relation network parameters. Metrics can be transferred among tasks when equipped with these generated parameters, so that similar tasks use similar metrics while different tasks use different metrics. Through experiments, we demonstrate that our method outperforms the state-of-the-art few-shot text classification models. △ Less

Submitted 4 April, 2020; originally announced April 2020.

arXiv:2003.07071 [pdf, other]

Adapting Object Detectors with Conditional Domain Normalization

Authors: Peng Su, Kun Wang, Xingyu Zeng, Shixiang Tang, Dapeng Chen, Di Qiu, Xiaogang Wang

Abstract: Real-world object detectors are often challenged by the domain gaps between different datasets. In this work, we present the Conditional Domain Normalization (CDN) to bridge the domain gap. CDN is designed to encode different domain inputs into a shared latent space, where the features from different domains carry the same domain attribute. To achieve this, we first disentangle the domain-specific… ▽ More Real-world object detectors are often challenged by the domain gaps between different datasets. In this work, we present the Conditional Domain Normalization (CDN) to bridge the domain gap. CDN is designed to encode different domain inputs into a shared latent space, where the features from different domains carry the same domain attribute. To achieve this, we first disentangle the domain-specific attribute out of the semantic features from one domain via a domain embedding module, which learns a domain-vector to characterize the corresponding domain attribute information. Then this domain-vector is used to encode the features from another domain through a conditional normalization, resulting in different domains' features carrying the same domain attribute. We incorporate CDN into various convolution stages of an object detector to adaptively address the domain shifts of different level's representation. In contrast to existing adaptation works that conduct domain confusion learning on semantic features to remove domain-specific factors, CDN aligns different domain distributions by modulating the semantic features of one domain conditioned on the learned domain-vector of another domain. Extensive experiments show that CDN outperforms existing methods remarkably on both real-to-real and synthetic-to-real adaptation benchmarks, including 2D image detection and 3D point cloud detection. △ Less

Submitted 22 July, 2020; v1 submitted 16 March, 2020; originally announced March 2020.

Comments: Accepted at ECCV 2020

arXiv:2003.01357 [pdf, other]

doi 10.1098/rspa.2020.0147

Shape analysis via inconsistent surface registration

Authors: Gary P. T. Choi, Di Qiu, Lok Ming Lui

Abstract: In this work, we develop a framework for shape analysis using inconsistent surface mapping. Traditional landmark-based geometric morphometrics methods suffer from the limited degrees of freedom, while most of the more advanced non-rigid surface mapping methods rely on a strong assumption of the global consistency of two surfaces. From a practical point of view, given two anatomical surfaces with p… ▽ More In this work, we develop a framework for shape analysis using inconsistent surface mapping. Traditional landmark-based geometric morphometrics methods suffer from the limited degrees of freedom, while most of the more advanced non-rigid surface mapping methods rely on a strong assumption of the global consistency of two surfaces. From a practical point of view, given two anatomical surfaces with prominent feature landmarks, it is more desirable to have a method that automatically detects the most relevant parts of the two surfaces and finds the optimal landmark-matching alignment between those parts, without assuming any global 1-1 correspondence between the two surfaces. Our method is capable of solving this problem using inconsistent surface registration based on quasi-conformal theory. It further enables us to quantify the dissimilarity of two shapes using quasi-conformal distortion and differences in mean and Gaussian curvatures, thereby providing a natural way for shape classification. Experiments on Platyrrhine molars demonstrate the effectiveness of our method and shed light on the interplay between function and shape in nature. △ Less

Submitted 3 March, 2020; originally announced March 2020.

Journal ref: Proceedings of the Royal Society A, 476(2242), 20200147 (2020)

Showing 1–50 of 82 results for author: Qiu, D