
arXiv:2407.19436 [pdf, other]

X-Fake: Juggling Utility Evaluation and Explanation of Simulated SAR Images

Authors: Zhongling Huang, Yihan Zhuang, Zipei Zhong, Feng Xu, Gong Cheng, Junwei Han

Abstract: SAR image simulation has attracted much attention due to its great potential to supplement the scarce training data for deep learning algorithms. Consequently, evaluating the quality of simulated SAR images is crucial for practical applications. The current literature primarily uses image quality assessment (IQA) techniques for evaluation, which rely on human observers' perceptions. However, because of the unique imaging mechanism of SAR, these techniques may produce evaluation results that are not entirely valid. The distribution inconsistency between real and simulated data is the main obstacle that influences the utility of simulated SAR images. To this end, we propose X-Fake, the first trustworthy utility evaluation framework with counterfactual explanation for simulated SAR images. It unifies a probabilistic evaluator and a causal explainer to achieve a trustworthy utility assessment. We construct the evaluator using a probabilistic Bayesian deep model to learn the posterior distribution, conditioned on real data. Quantitatively, the predicted uncertainty of simulated data can reflect the distribution discrepancy. We build the causal explainer with an introspective variational auto-encoder (IntroVAE) to generate high-resolution counterfactuals. The latent code of IntroVAE is finally optimized with evaluation indicators and prior information to generate the counterfactual explanation, thus explicitly revealing the inauthentic details of simulated data.
The proposed framework is validated on four simulated SAR image datasets obtained from electromagnetic models and generative artificial intelligence approaches. The results demonstrate that the proposed X-Fake framework outperforms other IQA methods in terms of utility. Furthermore, the results illustrate that the generated counterfactual explanations are trustworthy and can further improve the data utility in applications.
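The abstract states that the predicted uncertainty of the Bayesian evaluator reflects the discrepancy between real and simulated data. As a minimal sketch of how such uncertainty is commonly quantified, assuming T Monte Carlo samples of class probabilities from some Bayesian classifier (the sampling method and all names below are illustrative, not X-Fake's implementation), the predictive entropy can be split into an expected-entropy term and a mutual-information term; the latter grows when the sampled models disagree, as one would expect for inputs far from the training distribution.

```python
import math
from typing import Dict, List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def decompose_uncertainty(mc_probs: List[Sequence[float]]) -> Dict[str, float]:
    """Split predictive uncertainty from T stochastic forward passes.

    mc_probs holds T class-probability vectors for ONE input, e.g. from a
    hypothetical Bayesian classifier sampled T times (MC dropout, ensembles).
    """
    t = len(mc_probs)
    n_classes = len(mc_probs[0])
    # Posterior predictive: average the sampled distributions.
    mean = [sum(p[c] for p in mc_probs) / t for c in range(n_classes)]
    total = entropy(mean)
    # Expected entropy of the individual samples (data noise).
    aleatoric = sum(entropy(p) for p in mc_probs) / t
    # Mutual information between prediction and weights (model disagreement).
    epistemic = total - aleatoric
    return {"total": total, "aleatoric": aleatoric, "epistemic": epistemic}


# Samples that agree vs. samples that disagree about the same input.
agree = decompose_uncertainty([[0.9, 0.1]] * 4)
disagree = decompose_uncertainty([[0.99, 0.01], [0.01, 0.99]])
print(agree["epistemic"], disagree["epistemic"])
```

Agreeing samples give a mutual-information term near zero, while disagreeing samples give a large one even though each individual sample is confident.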

Submitted 28 July, 2024; originally announced July 2024.

arXiv:2406.17672 [pdf, other]

SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond

Authors: Marco Comunità, Zhi Zhong, Akira Takahashi, Shiqi Yang, Mengjie Zhao, Koichi Saito, Yukara Ikemiya, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji

Abstract: Recent advances in generative models that iteratively synthesize audio clips have brought great success to text-to-audio synthesis (TTA), but at the cost of slow synthesis and heavy computation. Although there have been attempts to accelerate the iterative procedure, high-quality TTA systems remain inefficient because of the hundreds of iterations required at inference and their large number of model parameters. To address these challenges, we propose SpecMaskGIT, a lightweight, efficient yet effective TTA model based on masked generative modeling of spectrograms. First, SpecMaskGIT synthesizes a realistic 10 s audio clip in fewer than 16 iterations, an order of magnitude fewer than previous iterative TTA methods. As a discrete model, SpecMaskGIT outperforms larger VQ-Diffusion and auto-regressive models on the TTA benchmark, while running in real time with only 4 CPU cores, or even 30x faster with a GPU. Next, because it is built on a latent space of Mel-spectrograms, SpecMaskGIT supports a wider range of applications (e.g., zero-shot bandwidth extension) than similar methods built on the latent wave domain. Moreover, we interpret SpecMaskGIT as a generative extension of previous discriminative audio masked Transformers, and shed light on its potential for audio representation learning. We hope our work inspires the exploration of masked audio modeling in further diverse scenarios.
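SpecMaskGIT's speed comes from masked generative modeling, which reveals many tokens per iteration instead of one. The sketch below illustrates MaskGIT-style parallel decoding with a cosine masking schedule. The schedule, the confidence-based reveal rule, and `dummy_predict` (a random stand-in for the real transformer) are assumptions borrowed from the general MaskGIT recipe, not details confirmed by this abstract.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Tokens = List[Optional[int]]  # None marks a still-masked position


def masked_after_step(n_tokens: int, step: int, n_iters: int) -> int:
    """Cosine schedule: how many tokens remain masked after `step` of `n_iters`."""
    return int(math.floor(n_tokens * math.cos(math.pi / 2 * step / n_iters)))


def iterative_decode(n_tokens: int, n_iters: int,
                     predict: Callable[[Tokens], List[Tuple[int, float]]]) -> Tokens:
    tokens: Tokens = [None] * n_tokens
    for step in range(1, n_iters + 1):
        preds = predict(tokens)  # (token id, confidence) for every position
        masked = [i for i, t in enumerate(tokens) if t is None]
        # Reveal the most confident predictions; keep the rest masked.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        n_reveal = len(masked) - masked_after_step(n_tokens, step, n_iters)
        for i in masked[:n_reveal]:
            tokens[i] = preds[i][0]
    return tokens


rng = random.Random(0)


def dummy_predict(tokens: Tokens) -> List[Tuple[int, float]]:
    """Hypothetical stand-in for the transformer: random ids and confidences."""
    return [(rng.randrange(1024), rng.random()) for _ in tokens]


decoded = iterative_decode(n_tokens=64, n_iters=12, predict=dummy_predict)
```

Because the schedule reaches zero masked tokens at the final step, every position is filled after a fixed, small number of iterations regardless of sequence length.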

Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

Comments: 6 pages, 8 figures, 8 tables. Audio samples: https://zzaudio.github.io/SpecMaskGIT/index.html

arXiv:2405.18503 [pdf, other]

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Authors: Koichi Saito, Dongjun Kim, Takashi Shibuya, Chieh-Hsin Lai, Zhi Zhong, Yuhta Takida, Yuki Mitsufuji

Abstract: Sound content is an indispensable element of multimedia works such as video games, music, and films. Recent high-quality diffusion-based sound generation models can serve as valuable tools for creators. However, despite producing high-quality sounds, these models often suffer from slow inference. This drawback burdens creators, who typically refine their sounds through trial and error to align them with their artistic intentions. To address this issue, we introduce Sound Consistency Trajectory Models (SoundCTM). Our model enables flexible transitioning between high-quality 1-step sound generation and superior sound quality through multi-step generation. This allows creators to initially control sounds with 1-step samples before refining them through multi-step generation. While CTM fundamentally achieves flexible 1-step and multi-step generation, its impressive performance heavily depends on an additional pretrained feature extractor and an adversarial loss, which are expensive to train and not always available in other domains. Thus, we reframe CTM's training framework and introduce a novel feature distance that uses the teacher's network for a distillation loss. Additionally, while distilling classifier-free guided trajectories, we train conditional and unconditional student models simultaneously and interpolate between these models during inference. We also propose training-free controllable frameworks for SoundCTM, leveraging its flexible sampling capability.
SoundCTM achieves both promising 1-step and multi-step real-time sound generation without using any extra off-the-shelf networks. Furthermore, we demonstrate SoundCTM's capability for controllable sound generation in a training-free manner. Our code, pretrained models, and audio samples are available at https://github.com/sony/soundctm.
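The abstract mentions training conditional and unconditional student models and interpolating between them at inference. A minimal sketch, assuming the interpolation takes the familiar classifier-free-guidance form `uncond + omega * (cond - uncond)` applied element-wise to the model outputs (the exact SoundCTM formulation may differ), is:

```python
from typing import List, Sequence


def guided_output(cond: Sequence[float], uncond: Sequence[float], omega: float) -> List[float]:
    """Blend unconditional and conditional student outputs element-wise.

    omega = 0 -> unconditional, omega = 1 -> conditional, omega > 1 -> extrapolate.
    """
    if len(cond) != len(uncond):
        raise ValueError("outputs must have the same shape")
    return [u + omega * (c - u) for c, u in zip(cond, uncond)]


print(guided_output([0.4, -0.2], [0.1, 0.1], omega=2.0))
```

Setting `omega` between 0 and 1 blends the two students; values above 1 push further toward the condition, as in standard classifier-free guidance.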

Submitted 10 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: Audio samples: https://koichi-saito-sony.github.io/soundctm/. Code: https://github.com/sony/soundctm. Checkpoints: https://huggingface.co/Sony/soundctm

arXiv:2405.14905 [pdf, other]

Structural Entities Extraction and Patient Indications Incorporation for Chest X-ray Report Generation

Authors: Kang Liu, Zhuoqi Ma, Xiaolu Kang, Zhusi Zhong, Zhicheng Jiao, Grayson Baird, Harrison Bai, Qiguang Miao

Abstract: The automated generation of imaging reports proves invaluable in alleviating the workload of radiologists. A clinically applicable report generation algorithm should produce reports that accurately describe radiology findings and attend to patient-specific indications. In this paper, we introduce a novel method, Structural Entities extraction and patient indications Incorporation (SEI), for chest X-ray report generation. Specifically, we employ a structural entities extraction (SEE) approach to eliminate presentation-style vocabulary in reports and improve the quality of factual entity sequences. This reduces the noise in the subsequent cross-modal alignment module, which aligns X-ray images with the factual entity sequences in reports, thereby enhancing the precision of cross-modal alignment and further aiding the model in gradient-free retrieval of similar historical cases. Subsequently, we propose a cross-modal fusion network to integrate information from X-ray images, similar historical cases, and patient-specific indications. This process allows the text decoder to attend to discriminative features of X-ray images, assimilate historical diagnostic information from similar cases, and understand the examination intention of patients. This, in turn, helps the text decoder produce high-quality reports. Experiments conducted on MIMIC-CXR validate the superiority of SEI over state-of-the-art approaches on both natural language generation and clinical efficacy metrics.
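SEI retrieves similar historical cases without gradient updates. One simple way to realize gradient-free retrieval, sketched here under the assumption that each case is represented by a fixed embedding vector (the toy embeddings and case IDs are hypothetical, not SEI's), is cosine-similarity top-k search over a memory bank:

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_similar(query: Sequence[float], bank: Dict[str, Sequence[float]],
                     k: int = 2) -> List[Tuple[str, float]]:
    """Top-k stored cases by cosine similarity; no parameters are updated."""
    scored = [(case_id, cosine(query, emb)) for case_id, emb in bank.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


history = {  # hypothetical image embeddings of past cases
    "case_a": [1.0, 0.0],
    "case_b": [0.0, 1.0],
    "case_c": [0.7, 0.7],
}
print(retrieve_similar([1.0, 0.1], history))
```

The retrieved cases' reports could then be passed to a fusion module as extra context; better-aligned embeddings make this nearest-neighbour step more precise.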

Submitted 22 May, 2024; originally announced May 2024.

Comments: The code is available at https://github.com/mk-runner/SEI-Temp or https://github.com/mk-runner/SEI

arXiv:2405.14598 [pdf, other]

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

Authors: Shiqi Yang, Zhi Zhong, Mengjie Zhao, Shusuke Takahashi, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

Abstract: In recent years, owing to their realistic generation results and wide range of personalized applications, diffusion-based generative models have gained huge attention in both visual and audio generation. Compared with the considerable advances in text2image and text2audio generation, research on audio2visual and visual2audio generation has been relatively slow. Recent audio-visual generation methods usually resort to huge large language models or composable diffusion models. Instead of designing another giant model for audio-visual generation, in this paper we take a step back and show that a simple and lightweight generative transformer, which has not been fully investigated in multi-modal generation, can achieve excellent results on image2audio generation. The transformer operates in the discrete audio and visual Vector-Quantized GAN space and is trained in a mask-denoising manner. After training, classifier-free guidance can be deployed off the shelf to achieve better performance, without any extra training or modification. Since the transformer model is modality-symmetric, it can also be deployed directly for audio2image generation and co-generation. In the experiments, we show that our simple method surpasses recent image2audio generation methods. Generated audio samples can be found at https://docs.google.com/presentation/d/1ZtC0SeblKkut4XJcRaDsSTuCRIXB3ypxmSi7HTY3IyQ/

Submitted 24 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: 10 pages

arXiv:2405.14113 [pdf, other]

Multi-modality Regional Alignment Network for Covid X-Ray Survival Prediction and Report Generation

Authors: Zhusi Zhong, Jie Li, John Sollee, Scott Collins, Harrison Bai, Paul Zhang, Terrence Healey, Michael Atalay, Xinbo Gao, Zhicheng Jiao

Abstract: In response to the worldwide COVID-19 pandemic, advanced automated technologies have emerged as valuable tools that help healthcare professionals manage an increased workload by improving radiology report generation and prognostic analysis. This study proposes the Multi-modality Regional Alignment Network (MRANet), an explainable model for radiology report generation and survival prediction that focuses on high-risk regions. By learning spatial correlation in the detector, MRANet visually grounds region-specific descriptions, providing robust anatomical regions with a completion strategy. The visual features of each region are embedded using a novel survival attention mechanism, offering spatially and risk-aware features for sentence encoding while maintaining global coherence across tasks. A cross-LLM alignment is employed to enhance the image-to-text transfer process, resulting in sentences rich in clinical detail and improved explainability for radiologists. Multi-center experiments validate both MRANet's overall performance and each module's role within the model, encouraging further advances in radiology report generation research that emphasize clinical interpretation and trustworthiness in AI models applied to medical studies. The code is available at https://github.com/zzs95/MRANet.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.08288 [pdf, other]

Orthogonal Delay-Doppler Division Multiplexing Modulation with Tomlinson-Harashima Precoding

Authors: Yiyan Ma, Akram Shafie, Jinhong Yuan, Guoyu Ma, Zhangdui Zhong, Bo Ai

Abstract: Orthogonal delay-Doppler (DD) division multiplexing (ODDM) modulation has recently been proposed as a promising modulation scheme for next-generation communication systems with high mobility. Despite its benefits, ODDM modulation and other DD domain modulation schemes face the challenge of excessive equalization complexity. To address this challenge, we propose time domain Tomlinson-Harashima precoding (THP) for the ODDM transmitter, which makes a DD domain single-tap equalizer feasible and thereby reduces the equalization complexity. In our design, we first pre-cancel the inter-symbol interference (ISI) using the linear time-varying (LTV) channel information. Second, unlike classical THP designs, we introduce a modified modulo operation with an adaptive modulus, by which joint DD domain data multiplexing and time domain ISI pre-cancellation can be realized without excessively increasing the bit errors. We then analytically study the losses encountered in this design, namely the power loss, the modulo noise loss, and the modulo signal loss. Based on this analysis, bit error rate (BER) lower bounds of the ODDM system with time domain THP are derived when 4-QAM or 16-QAM is adopted for symbol mapping in the DD domain. Finally, numerical results validate our analysis and demonstrate that the ODDM system with time domain THP is a promising solution for achieving better BER performance over LTV channels than orthogonal frequency division multiplexing systems with a single-tap equalizer and ODDM systems with maximum ratio combining.
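For intuition about the modulo operation at the heart of THP, the following sketch implements classical Tomlinson-Harashima precoding with a fixed modulus over a time-invariant causal ISI channel. This is the textbook scheme the abstract contrasts with, not the paper's adaptive-modulus design over LTV channels; the channel taps are hypothetical and noise is omitted so that exact symbol recovery can be checked.

```python
import math
from typing import List, Sequence


def mod_a(x: float, a: float) -> float:
    """Classical THP modulo: fold x into the interval [-a, a)."""
    return x - 2 * a * math.floor((x + a) / (2 * a))


def cmod(z: complex, a: float) -> complex:
    """Apply the modulo to real and imaginary parts separately (square QAM)."""
    return complex(mod_a(z.real, a), mod_a(z.imag, a))


def thp_precode(symbols: Sequence[complex], h: Sequence[complex], a: float) -> List[complex]:
    """Pre-cancel causal ISI for channel taps h (assumes h[0] == 1)."""
    x: List[complex] = []
    for n, s in enumerate(symbols):
        isi = sum(h[k] * x[n - k] for k in range(1, len(h)) if n - k >= 0)
        x.append(cmod(s - isi, a))  # modulo keeps the transmit amplitude bounded
    return x


def channel(x: Sequence[complex], h: Sequence[complex]) -> List[complex]:
    """Noise-free causal convolution y[n] = sum_k h[k] x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


A = 2.0  # half-width for 4-QAM with points at +/-1 +/-1j
symbols = [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j, 1 + 1j, -1 - 1j]
taps = [1.0 + 0j, 0.8 - 0.3j, 0.2j]  # hypothetical causal ISI channel
tx = thp_precode(symbols, taps, A)
detected = [cmod(r, A) for r in channel(tx, taps)]
```

Because each received sample equals the intended symbol plus a multiple of 2A in each dimension, a memoryless modulo at the receiver removes the ISI; the price is the power and modulo losses that the paper analyzes.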

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2403.13225 [pdf, other]

Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation

Authors: Linshan Wu, Zhun Zhong, Jiayi Ma, Yunchao Wei, Hao Chen, Leyuan Fang, Shutao Li

Abstract: Weakly-Supervised Semantic Segmentation (WSSS) aims to train segmentation models with weak labels and is receiving significant attention due to its low annotation cost. Existing approaches focus on generating pseudo labels for supervision while largely ignoring the inherent semantic correlation among different pseudo labels. We observe that pseudo-labeled pixels that are close to each other in the feature space are more likely to share the same class, and those closer to the distribution centers tend to have higher confidence. Motivated by this, we propose to model the underlying label distributions and employ cross-label constraints to generate more accurate pseudo labels. In this paper, we develop a unified WSSS framework named Adaptive Gaussian Mixtures Model, which leverages a Gaussian mixture model (GMM) to model the label distributions. Specifically, we calculate the feature distribution centers of pseudo-labeled pixels and build the GMM by measuring the distance between the centers and each pseudo-labeled pixel. Then, we introduce an Online Expectation-Maximization (OEM) algorithm and a novel maximization loss to optimize the GMM adaptively, aiming to learn more discriminative decision boundaries between different class-wise Gaussian mixtures. Based on the label distributions, we leverage the GMM to generate high-quality pseudo labels for more reliable supervision. Our framework can handle different forms of weak labels: image-level labels, points, scribbles, blocks, and bounding boxes.
Extensive experiments on PASCAL, COCO, Cityscapes, and ADE20K datasets demonstrate that our framework can effectively provide more reliable supervision and outperform the state-of-the-art methods under all settings. Code will be available at https://github.com/Luffy03/AGMM-SASS.
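The core idea, scoring each pixel feature against class-wise Gaussians centered on the pseudo-label feature centers, can be sketched as follows. This assumes one isotropic Gaussian per class with a shared, fixed variance and equal priors, which is a simplification: the paper's adaptive GMM is optimized with an Online EM algorithm and a maximization loss that are not reproduced here. The toy features are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def class_centers(feats: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[int, List[float]]:
    """Mean feature vector of the pixels currently pseudo-labelled as each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for f, y in zip(feats, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def responsibilities(x: Sequence[float], centers: Dict[int, List[float]],
                     sigma: float = 1.0) -> Dict[int, float]:
    """Posterior of each class-wise Gaussian (shared isotropic variance, equal priors)."""
    logits = {c: -sum((a - b) ** 2 for a, b in zip(x, mu)) / (2 * sigma ** 2)
              for c, mu in centers.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {c: math.exp(v - m) for c, v in logits.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def refine_label(x: Sequence[float], centers: Dict[int, List[float]],
                 sigma: float = 1.0) -> Tuple[int, float]:
    """Return (pseudo label, confidence) for one pixel feature."""
    r = responsibilities(x, centers, sigma)
    best = max(r, key=r.get)
    return best, r[best]


feats = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]]
labels = [0, 0, 1, 1]
centers = class_centers(feats, labels)
print(refine_label([0.5, 0.3], centers))
```

Pixels near a center receive a confident label, while pixels midway between centers get a confidence near 0.5, which matches the abstract's observation that confidence rises toward the distribution centers.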

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.10497 [pdf, ps, other]

Data-Driven Distributionally Robust Safety Verification Using Barrier Certificates and Conditional Mean Embeddings

Authors: Oliver Schön, Zhengang Zhong, Sadegh Soudjani

Abstract: Algorithmic verification of realistic systems against safety and other temporal requirements has suffered from the poor scalability of the employed formal approaches. To design systems with rigorous guarantees, many approaches still rely on exact models of the underlying systems. Since this assumption can rarely be met in practice, models have to be inferred from measurement data or are bypassed completely. Whilst the former usually requires the model structure to be known a priori and immense amounts of data to be available, the latter gives rise to a plethora of restrictive mathematical assumptions about the unknown dynamics. In pursuit of scalable formal verification algorithms that do not shift the problem to unrealistic assumptions, we employ the concept of barrier certificates, which can guarantee the safety of the system, and learn the certificate directly from a compact set of system trajectories. We use conditional mean embeddings to embed data from the system into a reproducing kernel Hilbert space (RKHS) and construct an RKHS ambiguity set that can be inflated to robustify the result with respect to a set of plausible transition kernels. We show how to solve the resulting program efficiently using sum-of-squares optimization and a Gaussian process envelope. Our approach lifts the need for restrictive assumptions on the system dynamics and uncertainty, and suggests an improvement in the sample complexity of verifying the safety of a system on a tested case study compared to a state-of-the-art approach.
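As background on what a barrier certificate guarantees, the sketch below uses the classical stochastic barrier bound: if B >= 0, B <= gamma on the initial set, B >= 1 on the unsafe set, and E[B(x_{k+1}) | x_k] <= B(x_k) + c, then the probability of reaching the unsafe set within N steps is at most gamma + cN. The toy system and barrier are hypothetical, and `estimate_c` is only a naive Monte Carlo estimate from samples, not the paper's distributionally robust RKHS and sum-of-squares construction. A sampled estimate like this carries no formal guarantee, which is precisely the gap the paper targets.

```python
import random
from typing import Callable, Iterable


def safety_lower_bound(gamma: float, c: float, horizon: int) -> float:
    """Lower bound on P(no unsafe visit within `horizon` steps).

    Assumes a barrier B with B >= 0 everywhere, B <= gamma on the initial set,
    B >= 1 on the unsafe set, and E[B(x_{k+1}) | x_k] <= B(x_k) + c.
    """
    return max(0.0, 1.0 - (gamma + c * horizon))


def estimate_c(barrier: Callable[[float], float],
               step: Callable[[float, random.Random], float],
               states: Iterable[float],
               n_samples: int = 2000,
               seed: int = 0) -> float:
    """Monte Carlo estimate of max_x (E[B(f(x, w))] - B(x)) over sampled states."""
    rng = random.Random(seed)
    worst = float("-inf")
    for x in states:
        mean_next = sum(barrier(step(x, rng)) for _ in range(n_samples)) / n_samples
        worst = max(worst, mean_next - barrier(x))
    return worst


# Hypothetical toy system x+ = 0.5 x + w, w ~ U(-0.1, 0.1), with B(x) = x^2:
# initial set |x| <= 0.1 (so gamma = 0.01), unsafe set |x| >= 1 (B >= 1).
def toy_step(x: float, rng: random.Random) -> float:
    return 0.5 * x + rng.uniform(-0.1, 0.1)


def toy_barrier(x: float) -> float:
    return x * x


grid = [i / 10 for i in range(-10, 11)]
c_hat = estimate_c(toy_barrier, toy_step, grid)
print(safety_lower_bound(0.01, c_hat, horizon=10))
```

For this toy system the worst-case drift is at x = 0, where it equals E[w^2] = 0.01/3, so the bound over 10 steps is roughly 0.957.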

Submitted 15 March, 2024; originally announced March 2024.

Comments: 7 pages, 2 figures, accepted to American Control Conference (ACC) 2024

arXiv:2403.00505 [pdf, other]

A Cluster-Based Statistical Channel Model for Integrated Sensing and Communication Channels

Authors: Zhengyu Zhang, Ruisi He, Bo Ai, Mi Yang, Yong Niu, Zhangdui Zhong, Yujian Li, Xuejian Zhang, Jing Li

Abstract: The emerging 6G network envisions integrated sensing and communication (ISAC) as a promising solution to meet the growing demand for native perception ability. To optimize and evaluate ISAC systems and techniques, an accurate and realistic wireless channel model is crucial. However, some important features of ISAC channels have not been well characterized; for example, most existing ISAC channel models consider communication channels and sensing channels independently, ignoring their correlation within the same environment. Moreover, sensing channels have not been well modeled in existing standard-level channel models. Therefore, to better model ISAC channels, this paper proposes a cluster-based statistical channel model based on measurements conducted at 28 GHz. The model adopts a new framework based on the 3GPP standard that includes communication clusters and sensing clusters. Clustering and tracking algorithms are used to extract and analyze ISAC channel characteristics. Furthermore, special sensing cluster structures, such as shared sensing clusters and newborn sensing clusters, are defined to model the correlation and differences between communication and sensing channels. Finally, the accuracy of the proposed model is validated based on measurements and simulations.
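The abstract does not specify which clustering algorithm is used. As an illustrative sketch only, multipath components (MPCs) can be grouped in delay-angle space with a power-weighted k-means, loosely in the spirit of KPowerMeans-style channel clustering; the MPC values, initial centers, and scaling below are hypothetical.

```python
from typing import List, Sequence, Tuple

Mpc = Tuple[float, float, float]  # (delay in ns, azimuth in deg, linear power)


def power_weighted_kmeans(mpcs: Sequence[Mpc],
                          init: Sequence[Tuple[float, float]],
                          scale: Tuple[float, float] = (1.0, 1.0),
                          iters: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Group MPCs into clusters in (delay, angle) space.

    Assignment uses a scaled Euclidean distance; centroids are power-weighted
    means, so strong paths pull the cluster centre. Angle wrap-around is ignored.
    """
    centers = [list(c) for c in init]
    labels = [0] * len(mpcs)
    for _ in range(iters):
        # Assignment step: nearest centroid for every MPC.
        for i, (d, a, _p) in enumerate(mpcs):
            dists = [((d - cd) / scale[0]) ** 2 + ((a - ca) / scale[1]) ** 2
                     for cd, ca in centers]
            labels[i] = dists.index(min(dists))
        # Update step: power-weighted centroid; empty clusters keep their centre.
        for k in range(len(centers)):
            members = [m for m, lab in zip(mpcs, labels) if lab == k]
            total_p = sum(p for _, _, p in members)
            if total_p > 0:
                centers[k] = [sum(d * p for d, _, p in members) / total_p,
                              sum(a * p for _, a, p in members) / total_p]
    return labels, centers


paths = [(10.0, 30.0, 1.0), (12.0, 32.0, 0.5), (11.0, 29.0, 0.8),
         (80.0, -40.0, 0.2), (82.0, -42.0, 0.1)]
labels, centers = power_weighted_kmeans(paths, init=[(0.0, 0.0), (100.0, 0.0)])
print(labels, centers)
```

In practice the delay and angle axes would need a meaningful relative scale (the `scale` argument), since the raw units are not comparable.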

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2311.07128 [pdf, other]

doi 10.1109/TVT.2023.3331707

Sum Rate Maximization under AoI Constraints for RIS-Assisted mmWave Communications

Authors: Ziqi Guo, Yong Niu, Shiwen Mao, Changming Zhang, Ning Wang, Zhangdui Zhong, Bo Ai

Abstract: The concept of age of information (AoI) has been proposed to quantify information freshness, which is crucial for time-sensitive applications. However, in millimeter wave (mmWave) communication systems, link blockage caused by obstacles and severe path loss greatly impair the freshness of information received by user equipments (UEs). In this paper, we focus on reconfigurable intelligent surface (RIS)-assisted mmWave communications, where beamforming is performed at transceivers to provide directional beam gain and an RIS is deployed to combat link blockage. We aim to maximize the system sum rate while satisfying the information freshness requirements of UEs by jointly optimizing the beamforming at transceivers, the discrete RIS reflection coefficients, and the UE scheduling strategy. To facilitate a practical solution, we decompose the problem into two subproblems. We further decompose the first, per-UE data rate maximization, into a beamforming optimization subproblem and an RIS reflection coefficient optimization subproblem. Considering the difficulty of channel estimation, we utilize the hierarchical search method for the former and the local search method for the latter, and then adopt the block coordinate descent (BCD) method to solve them alternately. For the second, scheduling strategy design, a low-complexity heuristic scheduling algorithm is designed. Simulation results show that the proposed algorithm can effectively improve the system sum rate while satisfying the information freshness requirements of all UEs.
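To make the AoI constraint concrete, the sketch below tracks per-UE AoI in a slotted system where a served UE's age resets to 1 and every other UE's age grows by 1, and then checks a peak-AoI limit. The one-slot delivery model, the peak (rather than average) AoI constraint, and the max-age-first policy are illustrative assumptions; the paper's heuristic scheduler is not described in the abstract.

```python
from typing import List, Sequence


def simulate_aoi(n_ues: int, schedule: Sequence[int]) -> List[List[int]]:
    """AoI of every UE after each slot: the served UE resets to 1, others age by 1."""
    aoi = [1] * n_ues
    history = []
    for served in schedule:
        aoi = [1 if k == served else a + 1 for k, a in enumerate(aoi)]
        history.append(aoi)
    return history


def max_age_first(n_ues: int, n_slots: int) -> List[int]:
    """Hypothetical greedy heuristic: always serve the UE with the oldest information."""
    aoi = [1] * n_ues
    schedule = []
    for _ in range(n_slots):
        k = max(range(n_ues), key=lambda i: aoi[i])
        schedule.append(k)
        aoi = [1 if i == k else a + 1 for i, a in enumerate(aoi)]
    return schedule


def meets_aoi_constraint(history: List[List[int]], limits: Sequence[int]) -> bool:
    """Check a per-UE peak-AoI requirement over the simulated horizon."""
    return all(max(h[k] for h in history) <= limits[k] for k in range(len(limits)))


sched = max_age_first(3, 6)
trace = simulate_aoi(3, sched)
print(sched, meets_aoi_constraint(trace, [3, 3, 3]))
```

A sum-rate-oriented scheduler would instead favour UEs with high data rates, and the AoI check above is what limits how long any UE can be starved.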

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2310.13267 [pdf, other]

On the Language Encoder of Contrastive Cross-modal Models

Authors: Mengjie Zhao, Junya Ono, Zhi Zhong, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Wei-Hsiang Liao, Takashi Shibuya, Hiromi Wakaki, Yuki Mitsufuji

Abstract: Contrastive cross-modal models such as CLIP and CLAP aid various vision-language (VL) and audio-language (AL) tasks. However, there has been limited investigation of and improvement in their language encoder, the central component that encodes natural language descriptions of images/audio into vector representations. We extensively evaluate how unsupervised and supervised sentence embedding training affect language encoder quality and cross-modal task performance. In VL pretraining, we found that sentence embedding training enhances language encoder quality and aids cross-modal tasks, improving contrastive VL models such as CyCLIP. In contrast, AL pretraining benefits less from sentence embedding training, which may result from the limited amount of pretraining data. We analyze the representation spaces to understand the strengths of sentence embedding training, and find that it improves text-space uniformity at the cost of decreased cross-modal alignment.
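Text-space uniformity and cross-modal alignment are commonly measured with the alignment and uniformity metrics of Wang and Isola (2020). Assuming those definitions (the abstract does not state which exact metrics the authors use), a minimal implementation on plain lists of embeddings is:

```python
import math
from itertools import combinations
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def uniformity(embs: Sequence[Sequence[float]], t: float = 2.0) -> float:
    """log mean_{pairs} exp(-t ||u - v||^2) on unit-normalised embeddings (lower = more uniform)."""
    z = [normalize(e) for e in embs]
    pairs = list(combinations(z, 2))
    return math.log(sum(math.exp(-t * sq_dist(a, b)) for a, b in pairs) / len(pairs))


def alignment(xs: Sequence[Sequence[float]], ys: Sequence[Sequence[float]],
              alpha: float = 2.0) -> float:
    """Mean ||f(x) - f(y)||^alpha over matched (e.g. text, image) pairs (lower = better aligned)."""
    return sum(sq_dist(normalize(a), normalize(b)) ** (alpha / 2)
               for a, b in zip(xs, ys)) / len(xs)
```

The trade-off reported in the abstract corresponds to text embeddings spreading out on the hypersphere (lower uniformity value) while the distance to their paired image or audio embeddings grows (higher alignment value).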

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2310.12429 [pdf, other]

Reconfigurable Intelligent Surface Assisted High-Speed Train Communications: Coverage Performance Analysis and Placement Optimization

Authors: Changzhu Liu, Ruisi He, Yong Niu, Zhu Han, Bo Ai, Meilin Gao, Zhangfeng Ma, Gongpu Wang, Zhangdui Zhong

Abstract: Reconfigurable intelligent surface (RIS) has emerged as an efficient and promising technology for next-generation wireless networks and has attracted considerable attention owing to its capability of extending wireless coverage by reflecting signals toward targeted receivers. In this paper, we consider an RIS-assisted high-speed train (HST) communication system to enhance wireless coverage and improve coverage probability. First, the coverage performance of the downlink single-input single-output system is investigated, and a closed-form expression of the coverage probability is derived. Moreover, a travel distance maximization problem subject to a coverage probability constraint is formulated to facilitate RIS discrete phase design and RIS placement optimization. Simulation results validate that better coverage performance and longer travel distance can be achieved by deploying RIS. The impacts of key system parameters on system performance are investigated, including transmission power, signal-to-noise ratio threshold, number of RIS elements, number of RIS quantization bits, horizontal distance between the base station and the RIS, and HST speed. In addition, it is found that RIS can effectively improve coverage probability with limited power consumption for HST communications.
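A common baseline for discrete RIS phase design, sketched here as an assumption rather than the paper's method, is to compute the continuous phases that co-phase every cascaded path with the direct path and then round each to the nearest of 2^b levels. The random Rayleigh-like channels are hypothetical; the example compares the received gain for several quantization bit widths against the continuous-phase upper bound.

```python
import cmath
import math
import random
from typing import List, Sequence


def aligned_phases(h_d: complex, g: Sequence[complex], h: Sequence[complex]) -> List[float]:
    """Continuous phases that rotate every cascaded path g_i*h_i onto the direct path."""
    return [cmath.phase(h_d) - cmath.phase(gi * hi) for gi, hi in zip(g, h)]


def quantize_phase(theta: float, bits: int) -> float:
    """Nearest of the 2**bits uniformly spaced levels 2*pi*m / 2**bits."""
    levels = 2 ** bits
    step = 2 * math.pi / levels
    return (round(theta / step) % levels) * step


def channel_gain(h_d: complex, g: Sequence[complex], h: Sequence[complex],
                 thetas: Sequence[float]) -> float:
    """|h_d + sum_i g_i e^{j theta_i} h_i|^2 for a single-antenna link."""
    total = h_d + sum(gi * cmath.exp(1j * t) * hi for gi, hi, t in zip(g, h, thetas))
    return abs(total) ** 2


rng = random.Random(1)


def cgauss() -> complex:
    return complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)


n = 32  # hypothetical number of RIS elements
h_d = cgauss()
g = [cgauss() for _ in range(n)]
h = [cgauss() for _ in range(n)]
ideal = aligned_phases(h_d, g, h)
gains = {b: channel_gain(h_d, g, h, [quantize_phase(t, b) for t in ideal]) for b in (1, 2, 3, 8)}
upper = (abs(h_d) + sum(abs(gi * hi) for gi, hi in zip(g, h))) ** 2
print({b: round(v / upper, 4) for b, v in gains.items()})
```

Printing the ratio to `upper` shows how much array gain each bit width gives up, which is one way the number of quantization bits enters the coverage analysis.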

Submitted 18 October, 2023; originally announced October 2023.

Comments: 14 figures, accepted by IEEE Transactions on Vehicular Technology

arXiv:2308.09929 [pdf, other]

RIS-assisted High-Speed Railway Integrated Sensing and Communication System

Authors: Panpan Li, Yong Niu, Hao Wu, Zhu Han, Guiqi Sun, Ning Wang, Zhangdui Zhong, Bo Ai

Abstract: One technology with the potential to improve wireless communications in the years to come is integrated sensing and communication (ISAC). In this study, we exploit the potential advantages of reconfigurable intelligent surfaces (RIS) to achieve ISAC while using the same frequency and resources. Specifically, using its reflecting elements, the RIS dynamically modifies the strength or phase of radio waves to reshape the radio propagation environment and increase the transmission rate of the ISAC system. We investigate a single-cell RIS-assisted downlink communication scenario. We formulate a sum-rate maximization problem that jointly optimizes the ISAC base station (BS) beamforming and the RIS discrete phase shifts while guaranteeing the sensing signal. We use alternating maximization to obtain practical solutions by dividing the problem into two subproblems. The first, a power allocation subproblem, is non-convex; it is converted into a convex problem and solved with CVX. A local search strategy is used to solve the second subproblem, phase shift optimization. Simulation results show that using RIS with adjusted phase shifts can significantly enhance the ISAC system's performance.
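The abstract solves the discrete phase-shift subproblem with local search. A minimal coordinate-wise local search over discrete phase indices is sketched below; as a simplifying assumption, the objective is a single-user received gain standing in for the paper's sum rate with a sensing constraint, and all channel values are hypothetical.

```python
import cmath
import math
import random
from typing import List, Sequence, Tuple


def gain(h_d: complex, g: Sequence[complex], h: Sequence[complex],
         idx: Sequence[int], bits: int) -> float:
    """Objective stand-in: |h_d + sum_i g_i e^{j 2 pi m_i / 2^bits} h_i|^2."""
    step = 2 * math.pi / 2 ** bits
    total = h_d + sum(gi * cmath.exp(1j * step * m) * hi for gi, hi, m in zip(g, h, idx))
    return abs(total) ** 2


def local_search(h_d: complex, g: Sequence[complex], h: Sequence[complex],
                 bits: int, max_sweeps: int = 20) -> Tuple[List[int], float]:
    """Coordinate-wise local search over discrete RIS phase indices."""
    idx = [0] * len(g)
    best = gain(h_d, g, h, idx, bits)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(g)):
            for m in range(2 ** bits):
                if m == idx[i]:
                    continue
                old, idx[i] = idx[i], m
                val = gain(h_d, g, h, idx, bits)
                if val > best + 1e-12:
                    best, improved = val, True  # keep the better level
                else:
                    idx[i] = old  # revert the trial change
        if not improved:
            break  # no single-element change helps: local optimum
    return idx, best


rng = random.Random(7)
hd = complex(rng.gauss(0, 1), rng.gauss(0, 1))
gg = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]
hh = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]
phase_idx, best_gain = local_search(hd, gg, hh, bits=2, max_sweeps=100)
print(phase_idx, best_gain)
```

Each accepted move strictly increases the objective, so the search terminates at a point where no single-element change helps; it is not guaranteed to find the global optimum.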

Submitted 19 August, 2023; originally announced August 2023.

Comments: 12 pages

arXiv:2308.00393 [pdf, other]

A Survey of Time Series Anomaly Detection Methods in the AIOps Domain

Authors: Zhenyu Zhong, Qiliang Fan, Jiacheng Zhang, Minghua Ma, Shenglin Zhang, Yongqian Sun, Qingwei Lin, Yuzhi Zhang, Dan Pei

Abstract: Internet-based services have seen remarkable success, generating vast amounts of monitored key performance indicators (KPIs) as univariate or multivariate time series. Monitoring and analyzing these time series are crucial for researchers, service operators, and on-call engineers to detect outliers or anomalies indicating service failures or significant events. Numerous advanced anomaly detection methods have emerged to address availability and performance issues. This review offers a comprehensive overview of time series anomaly detection in Artificial Intelligence for IT operations (AIOps), which uses AI capabilities to automate and optimize operational workflows. Additionally, it explores future directions for real-world and next-generation time-series anomaly detection based on recent advancements.
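As a point of reference for the kind of detector the survey covers, here is a deliberately simple baseline: flag a KPI sample whose z-score against a trailing window exceeds a threshold. The window length and threshold are arbitrary illustrative choices, and this rule is not taken from the survey.

```python
import math
from typing import List, Sequence


def rolling_zscore_anomalies(series: Sequence[float], window: int = 10,
                             threshold: float = 3.0) -> List[int]:
    """Indices whose value deviates > threshold std-devs from the trailing window."""
    anomalies = []
    for t in range(window, len(series)):
        hist = series[t - window:t]
        mean = sum(hist) / window
        std = math.sqrt(sum((x - mean) ** 2 for x in hist) / window)
        if std == 0.0:
            # Flat history: any change at all is treated as anomalous.
            if series[t] != mean:
                anomalies.append(t)
        elif abs(series[t] - mean) / std > threshold:
            anomalies.append(t)
    return anomalies


kpi = [10.0, 11.0] * 10 + [50.0] + [10.0, 11.0] * 5  # hypothetical KPI with one spike
print(rolling_zscore_anomalies(kpi))  # [20]
```

Note that the spike inflates the window statistics for the next `window` samples, masking any follow-up anomaly; robust statistics or learned models avoid this.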

Submitted 1 August, 2023; originally announced August 2023.
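As a concrete example of the simplest kind of KPI anomaly detector, the sketch below flags points that deviate strongly from a rolling window of recent values (a z-score rule). It is a generic baseline chosen for illustration, not a method attributed to the survey; the window size, threshold, and latency series are hypothetical.

```python
from statistics import mean, pstdev

def rolling_zscore_anomalies(series, window=5, threshold=3.0):
    """Flag index i if series[i] lies more than `threshold` standard deviations
    from the mean of the preceding `window` points."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            # Flat history: treat any change as anomalous (a design choice).
            if series[i] != mu:
                anomalies.append(i)
        elif abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical KPI series (e.g. request latency in ms) with one spike.
kpi = [10, 11, 10, 12, 11, 10, 11, 50, 11, 10]
flagged = rolling_zscore_anomalies(kpi)  # [7]
```

The spike itself inflates the window statistics for the next few points, which is one reason more robust detectors exist for real KPI streams.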

arXiv:2307.16259 [pdf, ps, other]

Communication-Sensing Region for Cell-Free Massive MIMO ISAC Systems

Authors: Weihao Mao, Yang Lu, Chong-Yung Chi, Bo Ai, Zhangdui Zhong, Zhiguo Ding

Abstract: This paper investigates the system model and the transmit beamforming design for the Cell-Free massive multi-input multi-output (MIMO) integrated sensing and communication (ISAC) system. The impact of the uncertainty of the target locations on the propagation of wireless signals is considered during both uplink and downlink phases; in particular, the main statistics of the MIMO channel estimation error are derived theoretically in closed form. A fundamental performance metric, termed the communication-sensing (C-S) region, is defined for the considered system via three cases, i.e., the sensing-only case, the communication-only case, and the ISAC case. The transmit beamforming design problems for the three cases are addressed through different reformulations, e.g., the Lagrangian dual transform and the quadratic fractional transform, and combinations of the block coordinate descent method and the successive convex approximation method. Numerical results present a 3-dimensional C-S region with a dynamic number of access points to illustrate the trade-off between communication and radar sensing. The radar-sensing advantage of the Cell-Free massive MIMO system is also studied via a comparison with the traditional cellular system. Finally, the efficacy of the proposed beamforming scheme is validated in comparison with zero-forcing and maximum ratio transmission schemes.

Submitted 30 July, 2023; originally announced July 2023.
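The abstract benchmarks the proposed design against zero-forcing (ZF) and maximum ratio transmission (MRT). As a rough reference for those two baselines only, the sketch below builds both precoders for a hypothetical two-user, three-antenna channel `H` (rows are users) in plain Python; power normalization, multiple access points, and the sensing part are omitted.

```python
def hermitian(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]

def matmul(A, B):
    """Plain matrix product A @ B for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inv2(A):
    """Inverse of a 2x2 matrix (sufficient for the two-user example)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mrt_precoder(H):
    """Maximum ratio transmission: W = H^H (unnormalized)."""
    return hermitian(H)

def zf_precoder(H):
    """Zero-forcing: W = H^H (H H^H)^-1, so that H W = I (unnormalized)."""
    Hh = hermitian(H)
    return matmul(Hh, inv2(matmul(H, Hh)))

# Hypothetical 2-user, 3-antenna channel (rows = users).
H = [[1 + 2j, 0.5 - 1j, -0.3 + 0.2j],
     [0.2 + 0.1j, -1 + 0.5j, 0.8 - 0.7j]]
effective_zf = matmul(H, zf_precoder(H))    # approximately the identity matrix
effective_mrt = matmul(H, mrt_precoder(H))  # diagonal = ||h_k||^2, off-diagonal = interference
```

ZF makes the effective channel H W diagonal, removing inter-user interference at the cost of extra transmit power (or lower received gain after normalization) when user channels are correlated, while MRT maximizes each user's own gain and leaves the off-diagonal interference terms in place.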

arXiv:2306.07888 [pdf, other]

CAMEO: A Causal Transfer Learning Approach for Performance Optimization of Configurable Computer Systems

Authors: Md Shahriar Iqbal, Ziyuan Zhong, Iftakhar Ahmad, Baishakhi Ray, Pooyan Jamshidi

Abstract: Modern computer systems are highly configurable, with hundreds of configuration options that interact, resulting in an enormous configuration space. As a result, optimizing performance goals (e.g., latency) in such systems is challenging due to frequent uncertainties in their environments (e.g., workload fluctuations). Recently, transfer learning has been applied to address this problem by reusing knowledge from configuration measurements in source environments, where intervention is cheaper, in the target environment, where any intervention is costly or impossible. Recent empirical research showed that statistical models can perform poorly when the deployment environment changes, because the behavior of certain variables in the models can change dramatically from source to target. To address this issue, we propose CAMEO, a method that identifies invariant causal predictors under environmental changes, allowing the optimization process to operate in a reduced search space, leading to faster optimization of system performance. We demonstrate significant performance improvements over state-of-the-art optimization methods in MLperf deep learning systems, a video analytics pipeline, and a database system.

Submitted 3 October, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

arXiv:2305.10734 [pdf, other]

Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders

Authors: Hao Shi, Kazuki Shimada, Masato Hirano, Takashi Shibuya, Yuichiro Koyama, Zhi Zhong, Shusuke Takahashi, Tatsuya Kawahara, Yuki Mitsufuji

Abstract: Diffusion-based generative speech enhancement (SE) has recently received attention, but reverse diffusion remains time-consuming. One solution is to initialize the reverse diffusion process with enhanced features estimated by a predictive SE system. However, the current pipeline structure does not consider the combined use of generative and predictive decoders. The predictive decoder allows us to further exploit the complementarity between predictive and diffusion-based generative SE. In this paper, we propose a unified system that jointly uses generative and predictive decoders across two levels. The encoder encodes both generative and predictive information at the shared encoding level. At the decoded feature level, we fuse the two features decoded by the generative and predictive decoders. Specifically, the two SE modules are fused in the initial and final diffusion steps: the initial fusion initializes the diffusion process with the predictive SE to improve convergence, and the final fusion combines the two complementary SE outputs to enhance SE performance. Experiments conducted on the Voice-Bank dataset demonstrate that incorporating predictive information leads to faster decoding and higher PESQ scores compared with other score-based diffusion SE methods (StoRM and SGMSE+).

Submitted 28 February, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.06701 [pdf, ps, other]

Extending Audio Masked Autoencoders Toward Audio Restoration

Authors: Zhi Zhong, Hao Shi, Masato Hirano, Kazuki Shimada, Kazuya Tateishi, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji

Abstract: Audio classification and restoration are among the major downstream tasks in audio signal processing. However, restoration derives less benefit from pretrained models than classification, where pretrained models have had overwhelming success. Due to such unbalanced benefits, there has been rising interest in how to improve the performance of pretrained models for restoration tasks, e.g., speech enhancement (SE). Previous works have shown that the features extracted by pretrained audio encoders are effective for SE tasks, but these speech-specialized encoder-only models usually require extra decoders to become compatible with SE, and involve complicated pretraining procedures or complex data augmentation. Therefore, in pursuit of a universal audio model, the audio masked autoencoder (MAE), whose backbone is the autoencoder of Vision Transformers (ViT-AE), is extended from audio classification to SE, a representative restoration task with well-established evaluation standards. ViT-AE learns to restore masked audio signals via a mel-to-mel mapping during pretraining, which is similar to restoration tasks like SE. We propose variations of ViT-AE for better SE performance, where the mel-to-mel variations yield high scores in non-intrusive metrics and the STFT-oriented variation is effective at intrusive metrics such as PESQ. Different variations can be used in accordance with the scenario. Comprehensive evaluations reveal that MAE pretraining is beneficial to SE tasks and helps the ViT-AE generalize better to out-of-domain distortions. We further found that large-scale noisy data of general audio sources, rather than clean speech, is sufficiently effective for pretraining.

Submitted 17 August, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: WASPAA 2023. Copyright 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
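A minimal sketch of the masking step behind masked-autoencoder pretraining on spectrograms: cut a time-frequency grid into non-overlapping patches and randomly hide a fraction of them. The patch size and the 75% mask ratio are hypothetical choices, not values taken from the paper, and the ViT encoder/decoder and the reconstruction loss are omitted.

```python
import random

def patchify(spec, patch_t, patch_f):
    """Split a time x frequency spectrogram (list of rows) into non-overlapping patches.

    Trailing frames/bins that do not fill a whole patch are dropped.
    """
    n_t, n_f = len(spec) // patch_t, len(spec[0]) // patch_f
    return [[row[f * patch_f:(f + 1) * patch_f] for row in spec[t * patch_t:(t + 1) * patch_t]]
            for t in range(n_t) for f in range(n_f)]

def random_patch_mask(num_patches, mask_ratio, seed=0):
    """Choose which patch indices to hide from the encoder; the rest stay visible."""
    rng = random.Random(seed)
    n_masked = int(round(num_patches * mask_ratio))
    masked = sorted(rng.sample(range(num_patches), n_masked))
    masked_set = set(masked)
    visible = [i for i in range(num_patches) if i not in masked_set]
    return visible, masked

# Hypothetical 8-frame x 8-bin "mel-spectrogram" cut into 2x2 patches (16 patches).
spec = [[float(t * 8 + f) for f in range(8)] for t in range(8)]
patches = patchify(spec, 2, 2)
visible, masked = random_patch_mask(len(patches), mask_ratio=0.75)
```

During pretraining the encoder would see only the `visible` patches, and the model would be trained to reconstruct the `masked` ones.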

arXiv:2305.03704 [pdf, other]

A 3D Modeling Method for Scattering on Rough Surfaces at the Terahertz Band

Authors: Ben Chen, Ke Guan, Danping He, Pengxiang Xie, Zhangdui Zhong, Jianwu Dou, Shahid Mumtaz, Wael Bazzi

Abstract: The terahertz (THz) band (0.1-10 THz) is widely considered to be a candidate band for sixth-generation mobile communication technology (6G). However, due to its short wavelength (less than 1 mm), scattering becomes a particularly significant propagation mechanism. In previous studies, we proposed a scattering model to characterize scattering in THz bands, but it can only reconstruct the scattering in the incidence plane. In this paper, a three-dimensional (3D) stochastic model is proposed to characterize THz scattering on rough surfaces. We then use the proposed model to reconstruct the scattering on rough surfaces with different shapes and under different incidence angles. Good agreement is achieved between the proposed model and full-wave simulation results. This stochastic 3D scattering model can be integrated into the standard channel modeling framework to generate more realistic THz channel data for the evaluation of 6G.

Submitted 5 May, 2023; originally announced May 2023.

arXiv:2305.03308 [pdf]

Tiny-PPG: A Lightweight Deep Neural Network for Real-Time Detection of Motion Artifacts in Photoplethysmogram Signals on Edge Devices

Authors: Yali Zheng, Chen Wu, Peizheng Cai, Zhiqiang Zhong, Hongda Huang, Yuqi Jiang

Abstract: Photoplethysmogram (PPG) signals are easily contaminated by motion artifacts in real-world settings, despite their widespread use in Internet-of-Things (IoT) based wearable and smart health devices for cardiovascular health monitoring. This study proposed a lightweight deep neural network, called Tiny-PPG, for accurate and real-time PPG artifact segmentation on IoT edge devices. The model was trained and tested on a public dataset, PPG DaLiA, which featured complex artifacts with diverse lengths and morphologies during various daily activities of 15 subjects using a watch-type device (Empatica E4). The model structure, training method and loss function were specifically designed to balance detection accuracy and speed for real-time PPG artifact detection in resource-constrained embedded devices. To optimize the model size and capability in multi-scale feature representation, the model employed depth-wise separable convolution and atrous spatial pyramid pooling modules, respectively. Additionally, a contrastive loss was utilized to further optimize the feature embeddings. With additional model pruning, Tiny-PPG achieved state-of-the-art detection accuracy of 87.4% while only having 19,726 model parameters (0.15 megabytes), and was successfully deployed on an STM32 embedded system for real-time PPG artifact detection. Therefore, this study provides an effective solution for resource-constrained IoT smart health devices in PPG artifact detection.

Submitted 10 October, 2023; v1 submitted 5 May, 2023; originally announced May 2023.
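To illustrate why depth-wise separable convolution shrinks a model, the sketch below counts parameters for one convolution layer. It assumes 1-D convolutions (PPG is a 1-D signal) and a hypothetical layer size; it does not reproduce Tiny-PPG's architecture or its 19,726-parameter total.

```python
def conv1d_params(kernel, c_in, c_out, bias=True):
    """Weights (+ optional biases) of a standard 1-D convolution."""
    return kernel * c_in * c_out + (c_out if bias else 0)

def depthwise_separable_conv1d_params(kernel, c_in, c_out, bias=True):
    """Depthwise conv (one kernel per input channel) followed by a 1x1 pointwise conv."""
    depthwise = kernel * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

# Hypothetical layer: kernel size 3, 32 -> 64 channels, no biases.
standard = conv1d_params(3, 32, 64, bias=False)                       # 3*32*64 = 6144
separable = depthwise_separable_conv1d_params(3, 32, 64, bias=False)  # 3*32 + 32*64 = 2144
```

For this layer the separable version needs about 35% of the weights, and the relative saving grows with the kernel size and the number of output channels.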

arXiv:2303.11642 [pdf, other]

Visibility Constrained Wide-band Illumination Spectrum Design for Seeing-in-the-Dark

Authors: Muyao Niu, Zhuoxiao Li, Zhihang Zhong, Yinqiang Zheng

Abstract: Seeing-in-the-dark is one of the most important and challenging computer vision tasks due to its wide applications and the extreme complexity of in-the-wild scenarios. Existing methods can be mainly divided into two threads: 1) RGB-dependent methods restore information using degraded RGB inputs only (e.g., low-light enhancement); 2) RGB-independent methods translate images captured under auxiliary near-infrared (NIR) illuminants into the RGB domain (e.g., NIR2RGB translation). The latter is very attractive since it works in complete darkness and the illuminants are visually friendly to naked eyes, but it tends to be unstable due to its intrinsic ambiguities. In this paper, we try to robustify NIR2RGB translation by designing the optimal spectrum of auxiliary illumination in the wide-band VIS-NIR range, while keeping visual friendliness. Our core idea is to quantify the visibility constraint implied by the human vision system and incorporate it into the design pipeline. By modeling the formation process of images in the VIS-NIR range, the optimal multiplexing of a wide range of LEDs is automatically designed in a fully differentiable manner, within the feasible region defined by the visibility constraint. We also collect a substantially expanded VIS-NIR hyperspectral image dataset for experiments by using a customized 50-band filter wheel. Experimental results show that performance on the task can be significantly improved by using the optimized wide-band illumination rather than NIR only. Code available: https://github.com/MyNiuuu/VCSD.

Submitted 21 March, 2023; originally announced March 2023.

Comments: Accepted to CVPR 2023

arXiv:2302.08136 [pdf, ps, other]

An Attention-based Approach to Hierarchical Multi-label Music Instrument Classification

Authors: Zhi Zhong, Masato Hirano, Kazuki Shimada, Kazuya Tateishi, Shusuke Takahashi, Yuki Mitsufuji

Abstract: Although music is typically multi-label, many works have studied hierarchical music tagging in simplified settings such as single-label data. Moreover, a framework describing various joint training methods under the multi-label setting is lacking. To discuss these topics, we introduce the hierarchical multi-label music instrument classification task. The task provides a realistic setting where multi-instrument real music data is assumed. Various hierarchical methods that jointly train a DNN are summarized and explored in the context of the fusion of deep learning and conventional techniques. For effective joint training in the multi-label setting, we propose two methods to model the connection between fine- and coarse-level tags: one uses rule-based grouped max-pooling, and the other uses an attention mechanism obtained in a data-driven manner. Our evaluation reveals that the proposed methods have advantages over the method without joint training. In addition, the decision procedure within the proposed methods can be interpreted by visualizing attention maps or referring to fixed rules.

Submitted 16 February, 2023; originally announced February 2023.

Comments: To appear at ICASSP 2023
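A minimal sketch of rule-based grouped max-pooling from fine- to coarse-level tags, as named in the abstract. The instrument taxonomy, the scores, and the 0.5 decision threshold are hypothetical, and the attention-based variant is not shown.

```python
def grouped_max_pool(fine_probs, groups):
    """Coarse-level score = max of the fine-level scores in that group (rule-based)."""
    return {coarse: max(fine_probs[tag] for tag in members)
            for coarse, members in groups.items()}

def predict_labels(probs, threshold=0.5):
    """Multi-label decision: every tag whose score reaches the threshold is active."""
    return sorted(tag for tag, p in probs.items() if p >= threshold)

# Hypothetical two-level instrument taxonomy and fine-level model outputs.
groups = {"strings": ["violin", "cello"], "brass": ["trumpet", "trombone"]}
fine = {"violin": 0.9, "cello": 0.2, "trumpet": 0.1, "trombone": 0.4}
coarse = grouped_max_pool(fine, groups)  # {"strings": 0.9, "brass": 0.4}
```

Since each coarse score is the maximum of its members, a fine tag that passes the threshold always implies that its parent passes too, which keeps the two levels consistent.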

arXiv:2301.11557 [pdf, other]

A Ray-tracing and Deep Learning Fusion Super-resolution Modeling Method for Wireless Mobile Channel

Authors: Zhao Zhang, Danping He, Xiping Wang, Ke Guan, Zhangdui Zhong, Jianwu Dou

Abstract: Mobile channel modeling has always been a core part of the design, deployment, and optimization of communication systems, especially in the 5G and beyond era. Deterministic channel modeling can describe the mobile channel precisely, but it is demanding in equipment and time-consuming. In this paper, we propose a novel super-resolution (SR) model for cluster characteristics prediction. The model is based on deep neural networks with residual connections. A series of simulations at 3.5 GHz are conducted with a three-dimensional ray tracing (RT) simulator in diverse scenarios. Cluster characteristics are extracted and corresponding data sets are constructed to train the model. Experiments demonstrate that the proposed SR approach achieves better power and cluster location prediction performance than the traditional interpolation method, with relative root mean square error (RMSE) reductions of 51% and 78%, respectively. The channel impulse response (CIR) is reconstructed based on cluster characteristics and matches the multi-path components (MPCs) well. The proposed method can be used to efficiently and accurately generate big data of mobile channels, significantly reducing computation time compared with RT alone.

Submitted 27 January, 2023; originally announced January 2023.

Comments: 5 pages, 7 figures, accepted by EuCAP 2023
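For reference, a relative RMSE reduction such as the reported 51% is (RMSE_baseline - RMSE_proposed) / RMSE_baseline. The sketch below computes both quantities; the numbers in it are hypothetical and chosen only to reproduce a 51% drop.

```python
import math

def rmse(pred, target):
    """Root mean square error between two equal-length sequences."""
    if len(pred) != len(target) or not target:
        raise ValueError("pred and target must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))

def relative_reduction(baseline_rmse, proposed_rmse):
    """Fractional drop in RMSE relative to the baseline, e.g. 0.51 for a 51% drop."""
    return (baseline_rmse - proposed_rmse) / baseline_rmse

# Hypothetical numbers only: a baseline RMSE of 2.0 and a proposed RMSE of 0.98.
drop = relative_reduction(2.0, 0.98)
```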

arXiv:2301.05629 [pdf, other]

Electromagnetic-Compliant Channel Modeling and Performance Evaluation for Holographic MIMO

Authors: Tengjiao Wang, Wei Han, Zhimeng Zhong, Jiyong Pang, Guohua Zhou, Shaobo Wang, Qiang Li

Abstract: Recently, the concept of holographic multiple-input multiple-output (MIMO) is emerging as one of the promising technologies beyond massive MIMO. Many challenges need to be addressed to bring this novel idea into practice, including electromagnetic (EM)-compliant channel modeling and accurate performance evaluation. In this paper, an EM-compliant channel model is proposed for the holographic MIMO systems, which is able to model both the characteristics of the propagation channel and the non-ideal factors caused by mutual coupling at the transceivers, including the antenna pattern distortion and the decrease of antenna efficiency. Based on the proposed channel model, a more realistic performance evaluation is conducted to show the performance of the holographic MIMO system in both the single-user and the multi-user scenarios. Key challenges and future research directions are further provided based on the theoretical analyses and numerical results.

Submitted 13 January, 2023; originally announced January 2023.

Comments: 6 pages, 4 figures, to be published in IEEE GLOBECOM 2022

arXiv:2211.14595 [pdf, other]

Tube-based Distributionally Robust Model Predictive Control for Nonlinear Process Systems via Linearization

Authors: Zhengang Zhong, Ehecatl Antonio del Rio-Chanona, Panagiotis Petsagkourakis

Abstract: Model predictive control (MPC) is an effective approach to control multivariable dynamic systems with constraints. Most real dynamic models are, however, affected by plant-model mismatch and process uncertainties, which can lead to closed-loop performance deterioration and constraint violations. Methods such as stochastic MPC (SMPC) have been proposed to alleviate these problems; however, the resulting closed-loop state trajectory might still significantly violate the prescribed constraints if the real system deviates from the disturbance distributions assumed during the controller design. In this work, we propose a novel data-driven distributionally robust MPC scheme for nonlinear systems. Unlike SMPC, which requires exact knowledge of the disturbance distribution, our scheme decides the control action with respect to the worst distribution from a distribution ambiguity set. This ambiguity set is defined as a Wasserstein ball centered at the empirical distribution. Because potential model errors can cause offsets, the scheme is also extended with an offset-free method. The favorable results of this control scheme are demonstrated and empirically verified with a nonlinear mass-spring system and a nonlinear CSTR case study.

Submitted 26 November, 2022; originally announced November 2022.
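A sketch of the ambiguity-set idea for scalar disturbances: the Wasserstein-1 distance between two 1-D empirical distributions with the same number of equally weighted samples, and a membership test for a ball of a given radius around the empirical distribution. The sample values are hypothetical, and the metric order and norm used in the paper are not assumed here.

```python
def wasserstein1_empirical(xs, ys):
    """W1 distance between two 1-D empirical distributions with equal sample counts.

    For equally weighted samples of the same size, the optimal transport plan
    matches sorted samples, so W1 is the mean absolute difference of order statistics.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def in_wasserstein_ball(center_samples, candidate_samples, radius):
    """True if the candidate empirical distribution lies in the ball around the center."""
    return wasserstein1_empirical(center_samples, candidate_samples) <= radius

# Hypothetical scalar disturbance samples and a shifted candidate distribution.
observed = [0.0, 1.0, 2.0]
shifted = [1.0, 2.0, 3.0]
```

A distributionally robust controller would hedge against the worst distribution anywhere in this ball, usually through a tractable reformulation, rather than test a single candidate as done here.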

arXiv:2209.14245 [pdf, other]

Framework for Highway Traffic Profiling using Connected Vehicle Data

Authors: Zijia Zhong, Liuhui Zhao, Branislav Dimitrijevic, Dejan Besenski, Joyoung Lee

Abstract: Connected vehicle (CV) data could potentially revolutionize the traffic monitoring landscape, as a new source of CV data, collected exclusively from original equipment manufacturers (OEMs), has emerged in the commercial market in recent years. Compared with existing CV data used by agencies, the new generation of CV data has certain advantages, including nearly ubiquitous coverage, high temporal resolution, high spatial accuracy, and enriched vehicle telematics data (e.g., hard braking events). This paper proposes a traffic profiling framework that targets vehicle-level performance indexes across mobility, safety, riding comfort, traffic flow stability, and fuel consumption. A proof-of-concept study of a major interstate highway (i.e., I-280 in NJ) using the CV data illustrates the feasibility of going beyond traditional aggregated traffic metrics. Lastly, potential applications for both historical analysis and near real-time monitoring are discussed. The proposed framework can be easily scaled and is particularly valuable for agencies that wish to systematically monitor regional or statewide roadways without substantial investment in infrastructure-based sensing (and the associated ongoing maintenance costs).

Submitted 6 September, 2022; originally announced September 2022.

Comments: 7 pages, 8 figures

arXiv:2208.04703 [pdf, other]

Assessing Connected Vehicle Data Coverage on New Jersey Roadways

Authors: Branislav Dimitrijevic, Zijia Zhong, Liuhui Zhao, Dejan Besenski, Joyoung Lee

Abstract: Connected vehicle data (CVD) is one of the most promising emerging sources of mobility data, greatly increasing the ability to effectively monitor transportation system performance. A commercial vehicle trajectory dataset was evaluated for market penetration and coverage to establish whether it represents a sufficient sample of the vehicle volumes across the statewide roadway network of New Jersey. The dataset (officially named Wejo Vehicle Movement data) was compared with the vehicle volumes obtained from 46 weigh-in-motion (WIM) traffic count stations during the corresponding two-month period. The observed market penetration rates of the Movement data for interstate highways, non-interstate expressways, major arterials, and minor arterials are 2.55% (std. dev. 0.76%), 2.31% (std. dev. 1.07%), 3.25% (std. dev. 1.48%), and 4.39% (std. dev. 2.65%), respectively. Additionally, the temporal resolution of the dataset (i.e., the time interval between consecutive Wejo vehicle trips captured at a given roadway section, time-of-day variation, and day-of-month variation) was also found to be consistent among the evaluated WIM locations. Although relatively low (less than 5%), the consistent market penetration, combined with the uniform spatial distribution of equipped vehicles within the traffic flow, could enable or enhance a wide range of traffic analytics applications.

Submitted 7 September, 2022; v1 submitted 23 July, 2022; originally announced August 2022.

Comments: 6 pages, 9 figures
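Market penetration here is the share of counted vehicles that also appear in the CV data. A minimal sketch of the per-station computation and the summary statistics follows; the counts are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption.

```python
from statistics import mean, stdev

def penetration_rates(cv_counts, reference_counts):
    """Per-station market penetration: CV trips observed / total vehicles counted.

    Stations with a zero reference count are skipped to avoid division by zero.
    """
    return [cv / ref for cv, ref in zip(cv_counts, reference_counts) if ref > 0]

def summarize(rates):
    """Mean and sample standard deviation of the per-station rates."""
    return mean(rates), stdev(rates)

# Hypothetical counts for three stations over the same period (not the paper's data).
rates = penetration_rates([25, 30, 20], [1000, 1000, 1000])
avg, sd = summarize(rates)  # about 0.025 and 0.005, i.e. 2.5% (std. dev. 0.5%)
```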

arXiv:2207.03127 [pdf]

5G for Railways: the Next Generation Railway Dedicated Communications

Authors: Ruisi He, Bo Ai, Zhangdui Zhong, Mi Yang, Ruifeng Chen, Jianwen Ding, Zhangfeng Ma, Guiqi Sun, Changzhu Liu

Abstract: To cope with increasing traffic, provide various new services, further ensure safety and security, and significantly improve travel comfort, railways require a new communication system. Since 2019, public networks worldwide have been evolving to fifth-generation communication (5G), whereas the main communication system of railways is still based on second-generation communication (2G). It is thus necessary for railways to replace the current 2G-based technology with a next-generation railway dedicated communication system with improved capacity and capability, and 5G for railways (5G-R) technology is a promising solution for future intelligent railways. This article reviews the current developments of next-generation railway communications, followed by a discussion of the typical services that 5G-R can provide to intelligent railways. Then, the main application scenarios of 5G-R are summarized and system configurations are compared. Some key technologies of 5G-R, such as network architecture, massive MIMO, millimeter-wave, multiple access schemes, ultra-reliable low-latency communication, and advanced video processing, are presented and analyzed. Finally, some challenges of 5G-R are highlighted.

Submitted 7 July, 2022; originally announced July 2022.

arXiv:2206.02308 [pdf, other]

Reconfigurable intelligent surfaces: Channel characterization and modeling

Authors: Jie Huang, Cheng-Xiang Wang, Yingzhuo Sun, Rui Feng, Jialing Huang, Bolun Guo, Zhimeng Zhong, Tie Jun Cui

Abstract: Reconfigurable intelligent surfaces (RISs) are two-dimensional (2D) metasurfaces that can intelligently manipulate electromagnetic waves with low-cost, nearly passive reflecting elements. RIS is viewed as a potential key technology for sixth-generation (6G) wireless communication systems, mainly due to its advantages in tuning wireless signals and thus smartly controlling propagation environments. In this paper, we aim to address channel characterization and modeling issues of RIS-assisted wireless communication systems. First, the concept, principle, and potential applications of RIS are given. An overview of RIS-based channel measurements and experiments is presented, classified by frequency bands, scenarios, system configurations, RIS constructions, experiment purposes, and channel observations. Then, RIS-based channel characteristics are studied, including reflection and transmission, Doppler effect and multipath fading mitigation, channel reciprocity, channel hardening, rank improvement, far field and near field, etc. RIS-based channel modeling works are investigated, including large-scale path loss models and small-scale multipath fading models. Finally, future research directions related to RIS-assisted channels are discussed.

Submitted 5 June, 2022; originally announced June 2022.

arXiv:2205.13133 [pdf, other]

Coverage Probability Analysis of RIS-Assisted High-Speed Train Communications

Authors: Changzhu Liu, Ruisi He, Yong Niu, Bo Ai, Zhu Han, Zhangfeng Ma, Meilin Gao, Zhangdui Zhong, Ning Wang

Abstract: Reconfigurable intelligent surface (RIS) has received increasing attention due to its capability of extending cell coverage by reflecting signals toward receivers. This paper considers an RIS-assisted high-speed train (HST) communication system to improve the coverage probability. We derive a closed-form expression for the coverage probability. Moreover, we analyze the impact of key system parameters, including transmission power, signal-to-noise ratio threshold, and the horizontal distance between the base station and the RIS. Simulation results verify the efficiency of RIS-assisted HST communications in terms of coverage probability.

Submitted 25 May, 2022; originally announced May 2022.

Comments: 6 pages, 6 figures, submitted to GlobeCom 2022

arXiv:2201.00863 [pdf, other]

Adaptive Model Predictive Control of Wheeled Mobile Robots

Authors: Nikhil Potu Surya Prakash, Tamara Perreault, Trevor Voth, Zejun Zhong

Abstract: In this paper, a control algorithm for guiding a two-wheeled mobile robot with unknown inertia to a desired point and orientation using an Adaptive Model Predictive Control (AMPC) framework is presented. The two-wheeled mobile robot is modeled as a knife edge or a skate with nonholonomic kinematic constraints, and the dynamical equations are derived using the Lagrangian approach. The inputs at every time instant are obtained from Model Predictive Control (MPC) with a set of nominal parameters, which are updated using a recursive least squares algorithm. The efficacy of the algorithm is demonstrated through numerical simulations at the end of the paper.
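The recursive-least-squares parameter update mentioned above can be illustrated with a generic scalar RLS estimator. This is a minimal sketch, not the authors' implementation: it assumes a hypothetical one-parameter model `torque = inertia * angular_accel`, noise-free samples, and a forgetting factor `lam` of 1.0; the name `rls_update` is illustrative.

```python
def rls_update(theta, p, phi, y, lam=1.0):
    """One scalar recursive-least-squares step for the model y = theta * phi.

    theta: current estimate, p: current estimate covariance,
    phi: regressor sample, y: measured output, lam: forgetting factor in (0, 1].
    """
    gain = p * phi / (lam + phi * p * phi)    # weight given to the new sample
    theta = theta + gain * (y - phi * theta)  # correct the estimate by the prediction error
    p = (p - gain * phi * p) / lam            # shrink the covariance (re-inflated if lam < 1)
    return theta, p


# Hypothetical example: recover an unknown inertia of 2.5 from torque/acceleration pairs.
true_inertia = 2.5
theta, p = 1.0, 100.0  # deliberately wrong nominal value, large initial uncertainty
for accel in [0.5, 1.0, -0.8, 1.2, 0.3, -1.1, 0.9, 0.7, -0.4, 1.0]:
    torque = true_inertia * accel
    theta, p = rls_update(theta, p, accel, torque)
print(round(theta, 3))
```

With noise-free data the estimate approaches 2.5 within a few samples; in an adaptive MPC loop the refreshed estimate would replace the nominal parameter in the prediction model at the next time step.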

Submitted 3 January, 2022; originally announced January 2022.

Comments: 5 pages, 7 figures

arXiv:2111.12228 [pdf, other]

doi 10.1109/TAP.2022.3149665

Artificial intelligence enabled radio propagation for communications-Part II: Scenario identification and channel modeling

Authors: Chen Huang, Ruisi He, Bo Ai, Andreas F. Molisch, Buon Kiong Lau, Katsuyuki Haneda, Bo Liu, Cheng-Xiang Wang, Mi Yang, Claude Oestges, Zhangdui Zhong

Abstract: This two-part paper investigates the application of artificial intelligence (AI), and in particular machine learning (ML), to the study of wireless propagation channels. In Part I, we introduced AI and ML and provided a comprehensive survey on ML-enabled channel characterization and antenna-channel optimization; in this part (Part II), we review the state-of-the-art literature on scenario identification and channel modeling. In particular, the key ideas of ML for scenario identification and channel modeling/prediction are presented, and the widely used ML methods for propagation scenario identification and channel modeling and prediction are analyzed and compared. Based on the state of the art, the future challenges of AI/ML-based channel data processing techniques are also given.

Submitted 23 November, 2021; originally announced November 2021.

arXiv:2111.12227 [pdf, other]

doi 10.1109/TAP.2022.3149663

Artificial intelligence enabled radio propagation for communications-Part I: Channel characterization and antenna-channel optimization

Authors: Chen Huang, Ruisi He, Bo Ai, Andreas F. Molisch, Buon Kiong Lau, Katsuyuki Haneda, Bo Liu, Cheng-Xiang Wang, Mi Yang, Claude Oestges, Zhangdui Zhong

Abstract: To provide higher data rates, as well as better coverage, cost efficiency, security, adaptability, and scalability, 5G and beyond-5G networks are being developed with various artificial intelligence techniques. In this two-part paper, we investigate the application of artificial intelligence (AI), and in particular machine learning (ML), to the study of wireless propagation channels. This first part provides a comprehensive overview of ML for channel characterization and ML-based antenna-channel optimization, and Part II gives a state-of-the-art literature review of channel scenario identification and channel modeling. Fundamental results and key concepts of ML for communication networks are presented, and widely used ML methods for channel data processing, propagation channel estimation, and characterization are analyzed and compared. A discussion of challenges and future research directions for ML-enabled next-generation networks on the topics covered in this part rounds off the paper.

Submitted 23 November, 2021; originally announced November 2021.

arXiv:2108.13782 [pdf, other]

Robust Symbol-Level Precoding and Passive Beamforming for IRS-Aided Communications

Authors: Guangyang Zhang, Chao Shen, Bo Ai, Zhangdui Zhong

Abstract: This paper investigates a joint beamforming design in a multiuser multiple-input single-output (MISO) communication network aided by an intelligent reflecting surface (IRS) panel. Symbol-level precoding (SLP) is adopted to enhance system performance by exploiting the multiuser interference (MUI), taking bounded channel uncertainty into account. The joint beamforming design is formulated as a nonconvex worst-case robust program that minimizes the transmit power subject to signal-to-noise ratio (SNR) requirements. To address the challenges posed by the constant-modulus constraint and the coupling of the beamformers, we first study the single-user case. Specifically, we propose and compare two algorithms based on the semidefinite relaxation (SDR) and alternating optimization (AO) methods, respectively. It turns out that the AO-based algorithm has much lower computational complexity while requiring almost the same transmit power as the SDR-based algorithm. Then, we apply the AO technique to the multiuser case and thereby develop an algorithm based on the proximal gradient descent (PGD) method. The algorithm can be generalized to the case of finite-resolution IRS and to the scenario with direct links from the transmitter to the users. Numerical results show that SLP can significantly improve system performance. Meanwhile, 3-bit phase shifters can achieve near-optimal power performance.
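For intuition about the finite-resolution IRS case, a b-bit phase shifter can be modeled as rounding each desired phase to the nearest of 2^b uniformly spaced levels. This is a minimal sketch under that common uniform-level assumption; the paper's exact quantization model is not given here, and the function names are illustrative.

```python
import cmath
import math


def quantize_phase(theta, bits):
    """Round a phase (radians) to the nearest of 2**bits uniform levels in [0, 2*pi)."""
    levels = 2 ** bits
    step = 2 * math.pi / levels
    index = round((theta % (2 * math.pi)) / step) % levels  # wrap 2*pi back to level 0
    return index * step


def reflection_coefficient(theta, bits):
    """Unit-modulus element response exp(j * quantized phase)."""
    return cmath.exp(1j * quantize_phase(theta, bits))


# With 3 bits the levels are multiples of pi/4, so the worst-case phase error is pi/8.
desired = 1.0
q = quantize_phase(desired, 3)
print(q, abs(desired - q) <= math.pi / 8)
```

The constant-modulus property is preserved by construction, since only the phase is rounded.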

Submitted 31 August, 2021; originally announced August 2021.

arXiv:2108.11902 [pdf, other]

Cluster-based Characterization and Modeling for UAV Air-to-Ground Time-Varying Channels

Authors: Zhuangzhuang Cui, Ke Guan, Claude Oestges, César Briso-Rodríguez, Bo Ai, Zhangdui Zhong

Abstract: With the deep integration between unmanned aerial vehicles (UAVs) and wireless communication, UAV-based air-to-ground (AG) propagation channels need more detailed descriptions and accurate models. In this paper, we aim to perform cluster-based characterization and modeling for AG channels. To the best of our knowledge, this is the first study that concentrates on the clustering and tracking of multipath components (MPCs) for time-varying AG channels. Based on measurement data at 6.5 GHz with 500 MHz of bandwidth, we first estimate potential MPCs using the space-alternating generalized expectation-maximization (SAGE) algorithm. Then, we cluster the extracted MPCs considering their static and dynamic characteristics by employing the K-Power-Means (KPM) algorithm under the multipath component distance (MCD) measure. For characterizing time-variant clusters, we exploit a clustering-based tracking (CBT) method, which efficiently quantifies the survival lengths of clusters. Finally, we establish a cluster-based channel model, and validations illustrate the accuracy of the proposed model. This work not only promotes a better understanding of AG propagation channels but also provides a general cluster-based AG channel model with certain extensibility.
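The clustering step can be sketched as a power-weighted k-means in the spirit of KPM: each MPC joins its nearest centroid, and centroids are updated as power-weighted means so that strong MPCs dominate. This is a simplified sketch, not the authors' code: plain squared Euclidean distance stands in for the MCD, the (delay, angle) features and powers are hypothetical, and the number of clusters and initial centroids are fixed by hand.

```python
def sq_dist(a, b):
    """Squared Euclidean distance; a stand-in for the multipath component distance (MCD)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def power_weighted_kmeans(points, powers, centroids, iters=50):
    """Nearest-centroid assignment plus power-weighted centroid update. Returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iters):
        # Assignment step: each MPC joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: strong MPCs pull the centroid harder than weak ones.
        updated = []
        for j, old in enumerate(centroids):
            members = [(p, w) for p, w, lab in zip(points, powers, labels) if lab == j]
            if not members:
                updated.append(old)  # keep an empty cluster where it was
                continue
            total = sum(w for _, w in members)
            updated.append(tuple(sum(w * p[d] for p, w in members) / total
                                 for d in range(len(old))))
        if updated == centroids:  # converged: centroids no longer move
            break
        centroids = updated
    return labels, centroids


mpcs = [(10, 0), (12, 2), (11, 1), (50, 40), (52, 42), (51, 41)]  # hypothetical (delay, angle) pairs
power = [1, 1, 2, 1, 1, 2]
labels, centers = power_weighted_kmeans(mpcs, power, [(0, 0), (60, 60)])
print(labels, centers)
```

Because the MCD normalizes delay and angle differences, the unscaled Euclidean distance here is only a placeholder; mixing raw units in practice would need such a normalization.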

Submitted 26 August, 2021; originally announced August 2021.

arXiv:2105.13717 [pdf, other]

doi 10.1109/GLOBECOM46510.2021.9685078

Coverage Analysis of Cellular-Connected UAV Communications with 3GPP Antenna and Channel Models

Authors: Zhuangzhuang Cui, Ke Guan, İsmail Güvenç, Claude Oestges, Zhangdui Zhong

Abstract: For reliable and efficient communications of aerial platforms, such as unmanned aerial vehicles (UAVs), the cellular network is envisioned to provide connectivity for aerial and ground user equipment (GUE) simultaneously, which challenges the existing base station (BS) pattern tailored for ground-level services. Thus, we focus on coverage probability analysis to investigate the coexistence of aerial and terrestrial users, employing realistic antenna and channel models reported by the 3rd Generation Partnership Project (3GPP). A homogeneous Poisson point process (PPP) is used to describe the BS distribution, and the BS antenna is adjustable in its down-tilt angle and in the number of antenna elements. Meanwhile, omnidirectional antennas are used for cellular users. We first derive an approximation of the coverage probability and then conduct numerous simulations to evaluate the impacts of antenna numbers, down-tilt angles, carrier frequencies, and user heights. One of the essential findings indicates that the coverage probabilities of high-altitude users become less sensitive to the down-tilt angle. Moreover, we found that aerial user equipment (AUE) within a certain range of heights can achieve the same or better coverage probability than GUE, which provides insight into the effective deployment of cellular-connected aerial communications.
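A homogeneous PPP of base stations can be simulated by drawing a Poisson-distributed number of points and placing them uniformly at random. This minimal sketch restricts the process to a finite disk, which is a simulation convenience rather than part of the analytical model; the density of 10 BSs per km² and the 1 km radius are hypothetical. Python's standard library has no Poisson sampler, so Knuth's multiplication method is used, which is adequate for small means.

```python
import math
import random


def sample_poisson(mean, rng):
    """Knuth's multiplication method for one Poisson(mean) draw; fine for small means."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_ppp_disk(density, radius, rng):
    """Homogeneous PPP restricted to a disk: Poisson count, then uniform positions."""
    n = sample_poisson(density * math.pi * radius ** 2, rng)
    points = []
    for _ in range(n):
        r = radius * math.sqrt(rng.random())  # sqrt makes points uniform in area
        phi = 2 * math.pi * rng.random()
        points.append((r * math.cos(phi), r * math.sin(phi)))
    return points


rng = random.Random(42)
bs = sample_ppp_disk(density=10.0, radius=1.0, rng=rng)  # hypothetical: 10 BSs/km^2, 1 km radius
print(len(bs))
```

Averaged over many drops, the number of BSs approaches density × π × radius², about 31.4 here.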

Submitted 28 May, 2021; originally announced May 2021.

arXiv:2105.08414 [pdf, other]

Data-driven distributionally robust MPC using the Wasserstein metric

Authors: Zhengang Zhong, Ehecatl Antonio del Rio-Chanona, Panagiotis Petsagkourakis

Abstract: A data-driven MPC scheme is proposed to safely control constrained stochastic linear systems using distributionally robust optimization. Distributionally robust constraints based on the Wasserstein metric are imposed to bound the state constraint violations in the presence of process disturbance. A feedback control law is solved for to guarantee that the predicted states comply with constraints. The stochastic constraints are satisfied with regard to the worst-case distribution within the Wasserstein ball centered at their discrete empirical probability distribution. The resulting distributionally robust MPC framework is computationally tractable and efficient, as well as recursively feasible. The innovation of this approach is that all the information about the uncertainty can be determined empirically from the data. The effectiveness of the proposed scheme is demonstrated through numerical case studies.
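For intuition about the Wasserstein ball: for two one-dimensional empirical distributions with the same number of equally weighted samples, the 1-Wasserstein distance equals the mean absolute difference between the sorted samples. This sketch uses that fact to test whether a candidate sample lies inside a ball of hypothetical radius `eps` around the data; the paper's multivariate formulation inside the MPC optimization is not reproduced.

```python
def wasserstein1_1d(xs, ys):
    """W1 between two equally weighted 1-D empirical distributions of the same size."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def in_wasserstein_ball(candidate, data, eps):
    """True if the candidate empirical distribution lies within radius eps of the data."""
    return wasserstein1_1d(candidate, data) <= eps


data = [0.1, -0.2, 0.05, 0.3]      # hypothetical empirical disturbance samples
shifted = [x + 0.1 for x in data]  # the same samples shifted by 0.1
print(wasserstein1_1d(shifted, data))
```

A uniform shift by 0.1 moves every sample by 0.1, so the distance is 0.1 (up to floating-point rounding).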

Submitted 18 May, 2021; originally announced May 2021.

arXiv:2105.01511 [pdf]

Radio Communication Scenarios in 5G-Railways

Authors: Ruisi He, Bo Ai, Zhangdui Zhong, Mi Yang, Chen Huang, Ruifeng Chen, Jianwen Ding, Hang Mi, Zhangfeng Ma, Guiqi Sun, Changzhu Liu

Abstract: With the rapid development of railways, especially high-speed railways, there is an increasingly urgent demand for new wireless communication systems for railways. Taking the mature 5G technology as an opportunity, 5G-railways (5G-R) have been widely regarded as a solution to meet the diversified demands of railway wireless communications. For the design, deployment, and improvement of 5G-R networks, radio communication scenario classification plays an important role, affecting channel modeling and system performance evaluation. In this paper, a standardized radio communication scenario classification, including 18 scenarios, is proposed for 5G-R. This paper analyzes the differences between 5G-R scenarios and those of traditional cellular networks and GSM-railways, according to 5G-R requirements and the unique physical environment and propagation characteristics. The proposed standardized scenario classification helps deepen the research of 5G-R and promotes the development and application of existing advanced technologies in railways.

Submitted 6 April, 2021; originally announced May 2021.

Comments: 7 pages

arXiv:2103.02025 [pdf]

Rightsizing the Railway Signal Workforce: a Zero-Based Resourcing Approach Towards Asset Management

Authors: Alex Lu, Zhiqi Zhong, Thomas Barger, Michael Brotzman

Abstract: Classic asset management approaches begin by inventorying all infrastructure assets and then assigning maintenance tasks and resources. Our approach collects similar data, but by starting with current personnel assignments and describing their job responsibilities and work processes, it minimizes staff resistance in a railroad infrastructure owner-operator environment. The resulting "manning model" quantitatively measures signal maintenance burden, including Federally mandated tests, trouble tickets, non-FRA maintenance, overhead and vacation coverage, location/shift assignment, administrative processes, and work curfew productivity losses. It can deliver immediate results by rightsizing the allocation of workforce across shifts and maintenance base locations, even before all assets are formally inventoried. Typical data from a commuter passenger railroad show that work curfews and shift assignment constraints have significant impacts on workforce productivity. Just over half of signal maintenance employee-hours are devoted to Federally mandated tests, whilst non-FRA and repair maintenance consume about 25% each. These indicators provide intelligence driving strategic management actions to improve signal maintenance cost-effectiveness. This model provides workload-based employee assignment by craft, location, gang, and shift for maintenance managers' use, but also provides an analytical basis for establishing or abolishing positions in the budgeting process.
Comparing its results with current employee payroll provides a measure of how much staffing stress the maintenance organization is under, which can help determine whether current overtime usage is appropriate. Asset and maintenance task inventories collected in this process can also feed normal asset management processes to assess replacement cycles and asset failure risk, and to inform strategic and investment decisions.

Submitted 2 March, 2021; originally announced March 2021.

Comments: 22 pages, 12 figures

arXiv:2102.04517 [pdf]

Power Off! Challenges in Planning and Executing Power Isolations on Shared-Use Electrified Railways

Authors: Alex Lu, Aleksandr Lukatskiy, Zhiqi Zhong, John G. Allen

Abstract: Electric railways are fast, clean, and safe, but complex to operate and maintain. Electric traction infrastructure includes signal power and feeder lines that remain live during isolations and complicate maintenance processes. Stakeholders involved in power outage planning include contractors, linemen, groundmen, power directors, dispatchers, conductor-flag, and support personnel. Weekly planning processes for track time require many contingencies due to the large number of moving parts and factors not known in advance, such as personnel availability. Electrical and mechanical environments faced by crews working in adjacent areas may be entirely different and require a "bespoke" circuit configuration to de-energize the catenary, which must be planned meticulously. Although recent automation has improved real-time "plate order" communications between power directors and dispatchers, each outage still requires many manual switching operations. The net effect of this isolation process is to reduce the available nightly construction work window from a nominal 7 hours to 2 hours 39 minutes. We recommend joint design of electrical and civil infrastructure, cross-training between disciplines, limiting the maximum number of concurrent outages, formal study of maintenance outage capacity, and further automation in power switching.

Submitted 8 February, 2021; originally announced February 2021.

Comments: 26 pages, 6 figures

arXiv:2012.06707 [pdf, other]

Channel Modeling for UAV Communications: State of the Art, Case Studies, and Future Directions

Authors: Zhuangzhuang Cui, Ke Guan, César Briso-Rodríguez, Bo Ai, Zhangdui Zhong, Claude Oestges

Abstract: As essential aerial platforms, unmanned aerial vehicles (UAVs) play an increasingly important role in broad wireless connectivity and high-data-rate transmission for future communication systems. Notably, various communication scenarios are involved in UAV communications, such as intercommunications between UAVs and communications with ground user equipment, cellular base stations, and ground stations, to name a few. However, existing works mostly focus on a single communication scenario, a designated channel type, and a specific operating frequency, so a comprehensive understanding of multi-scenario, multi-frequency, and multi-type UAV channels is urgently required. This article focuses on the essentials of the corresponding air-to-air (A2A) and air-to-ground (A2G) channels in UAV communications. We first identify the latest key challenges of channel modeling for UAV communications. We then provide the state of the art for A2A and A2G channel properties and models based on extensive measurement campaigns. In particular, we conduct realistic case studies to further demonstrate critical channel characterizations and machine learning-based modeling methods. Last but not least, potential directions are discussed to pave the way toward more accurate and effective channel models for UAV communications.

Submitted 16 April, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

arXiv:2012.03171 [pdf, other]

doi 10.1109/TVT.2021.3063408

Coverage Probability Analysis of IRS-Aided Communication Systems

Authors: Zhuangzhuang Cui, Ke Guan, Jiayi Zhang, Zhangdui Zhong

Abstract: The intelligent reflective surface (IRS) technology has received much interest in recent years thanks to its potential uses in future wireless communications, one promising use case being coverage extension, especially in line-of-sight-blocked scenarios. Therefore, it is critical to analyze the corresponding coverage probability of IRS-aided communication systems. To the best of our knowledge, however, previous works on this issue are very limited. In this paper, we analyze the coverage probability under the Rayleigh fading channel, taking the number and size of the array elements into consideration. We first derive the exact closed-form coverage probability for a single element. Afterward, using moment matching, the coverage probability is approximated as the ratio of the upper incomplete Gamma function to the Gamma function, allowing an arbitrary number of elements. Finally, we comprehensively evaluate the impacts of essential factors on the coverage probability, such as the coefficient of the fading channel, the number and size of the elements, and the angle of incidence. Overall, the paper provides a succinct and general expression of coverage probability, which can be helpful in the performance evaluation and practical implementation of IRS.
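The moment-matching step can be sketched generically: if the SNR has mean μ and variance σ², a Gamma law with shape k = μ²/σ² and scale θ = σ²/μ has the same first two moments, and P(SNR > T) = Γ(k, T/θ)/Γ(k). The sketch below assumes μ and σ² are already known inputs; the paper's element-count and element-size dependent expressions for them are not reproduced. The standard library has no incomplete Gamma function, so a series/continued-fraction evaluation is included.

```python
import math


def gamma_q(a, x, tol=1e-14, max_iter=1000):
    """Regularized upper incomplete gamma function Q(a, x) = Gamma(a, x) / Gamma(a)."""
    if x <= 0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series for the lower function P(a, x); then Q = 1 - P.
        term = total = 1.0 / a
        for n in range(1, max_iter):
            term *= x / (a + n)
            total += term
            if abs(term) < abs(total) * tol:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = d if abs(d) > tiny else tiny
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < tol:
            break
    return math.exp(log_prefactor) * h


def coverage_probability(mean_snr, var_snr, threshold):
    """P(SNR > threshold) after matching the SNR's first two moments to a Gamma law."""
    shape = mean_snr ** 2 / var_snr  # k = mu^2 / sigma^2
    scale = var_snr / mean_snr       # theta = sigma^2 / mu
    return gamma_q(shape, threshold / scale)


# Hypothetical moments: mean SNR 4, variance 8, threshold 1 (all linear scale).
print(coverage_probability(4.0, 8.0, 1.0))
```

When μ² = σ² the matched law is exponential, so the coverage reduces to exp(-T/μ), the familiar single-link Rayleigh result.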

Submitted 5 December, 2020; originally announced December 2020.

arXiv:2010.15376 [pdf, ps, other]

doi 10.1109/JSAC.2020.3036959

Solving Sparse Linear Inverse Problems in Communication Systems: A Deep Learning Approach With Adaptive Depth

Authors: Wei Chen, Bowen Zhang, Shi Jin, Bo Ai, Zhangdui Zhong

Abstract: Sparse signal recovery problems from noisy linear measurements appear in many areas of wireless communications. In recent years, deep learning (DL) based approaches have attracted researchers' interest for solving the sparse linear inverse problem by unfolding iterative algorithms as neural networks. Typically, such DL research assumes a fixed number of network layers. However, this ignores a key characteristic of traditional iterative algorithms: the number of iterations required for convergence changes with the sparsity level. By investigating projected gradient descent, we unveil the drawbacks of existing DL methods with fixed depth. We then propose an end-to-end trainable DL architecture that involves an extra halting score at each layer. The proposed method therefore learns how many layers to execute before emitting an output, and the network depth is dynamically adjusted for each task in the inference phase. We conduct experiments using both synthetic data and applications including random access in massive MTC and massive MIMO channel estimation, and the results demonstrate the improved efficiency of the proposed approach.
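The abstract does not spell out how the per-layer halting scores turn into a depth. A common convention, borrowed here from adaptive computation time (ACT) as an assumption and not necessarily the authors' rule, is to stop at the first layer where the cumulative halting score reaches 1 - eps. The scores below are made up; in a trained network each would come from a small halting head attached to the layer.

```python
def halting_depth(halting_scores, eps=0.01):
    """Number of layers run before the cumulative halting score reaches 1 - eps.

    halting_scores[t] is the (hypothetical) score in [0, 1] emitted by layer t + 1;
    if the threshold is never reached, every available layer is run.
    """
    cumulative = 0.0
    for depth, score in enumerate(halting_scores, start=1):
        cumulative += score
        if cumulative >= 1.0 - eps:
            return depth
    return len(halting_scores)


# Different inputs can produce different depths (scores are illustrative only).
print(halting_depth([0.7, 0.3]), halting_depth([0.1, 0.2, 0.2, 0.3, 0.4]))
```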

Submitted 29 October, 2020; originally announced October 2020.

Comments: IEEE Journal on Selected Areas in Communications (JSAC), accepted

Journal ref: IEEE Journal on Selected Areas in Communications, vol. 39, no. 1, 2021

arXiv:2009.06184 [pdf, other]

VC-Net: Deep Volume-Composition Networks for Segmentation and Visualization of Highly Sparse and Noisy Image Data

Authors: Yifan Wang, Guoli Yan, Haikuan Zhu, Sagar Buch, Ying Wang, Ewart Mark Haacke, Jing Hua, Zichun Zhong

Abstract: The motivation of our work is to present a new visualization-guided computing paradigm that combines direct 3D volume processing and volume-rendered clues for effective 3D exploration, such as extracting and visualizing microstructures in vivo. However, it is still challenging to extract and visualize high-fidelity 3D vessel structures due to their high sparseness, noisiness, and complex topology variations. In this paper, we present an end-to-end deep learning method, VC-Net, for robust extraction of 3D microvasculature that embeds the image composition generated by maximum intensity projection (MIP) into 3D volume image learning to enhance performance. The core novelty is to automatically leverage the volume visualization technique (MIP) to enhance 3D data exploration at the deep learning level. The MIP embedding features can enhance the local vessel signal and are adaptive to the geometric variability and scalability of vessels, which is crucial in microvascular tracking. A multi-stream convolutional neural network is proposed to learn the 3D volume and 2D MIP features respectively, and then explore their inter-dependencies in a joint volume-composition embedding space by unprojecting the MIP features into the 3D volume embedding space. The proposed framework can better capture small/micro vessels and improve vessel connectivity.
To our knowledge, this is the first deep learning framework to construct a joint convolutional embedding space, where the computed vessel probabilities from volume-rendering-based 2D projection and 3D volume can be explored and integrated synergistically. Experimental results are compared with traditional 3D vessel segmentation methods and state-of-the-art deep learning methods on public and real patient (micro-)cerebrovascular image datasets. Our method demonstrates potential for powerful MR arteriogram and venogram diagnosis of vascular diseases.
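Maximum intensity projection keeps, for each ray along the viewing axis, the brightest voxel; thin bright vessels therefore survive even when they are sparse in 3D. This minimal sketch projects a nested-list volume along z. The `unproject` helper is a hypothetical, naive replication of the 2D map back along z; VC-Net unprojects learned features inside the network, which this does not reproduce.

```python
def mip_along_z(volume):
    """Maximum intensity projection of a volume indexed [z][y][x] onto the (y, x) plane."""
    depth, rows, cols = len(volume), len(volume[0]), len(volume[0][0])
    return [[max(volume[z][y][x] for z in range(depth)) for x in range(cols)]
            for y in range(rows)]


def unproject(mip, depth):
    """Hypothetical helper: copy a 2-D map back along z so it aligns with 3-D features."""
    return [[row[:] for row in mip] for _ in range(depth)]


vol = [[[0, 5], [1, 0]],
       [[3, 2], [0, 7]]]
print(mip_along_z(vol))  # [[3, 5], [1, 7]]
```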

Submitted 14 September, 2020; originally announced September 2020.

Comments: 15 pages, 10 figures, proceeding to IEEE Transactions on Visualization and Computer Graphics (TVCG) (IEEE SciVis 2020), October, 2020

arXiv:2007.12619 [pdf, other]

Channel-Level Variable Quantization Network for Deep Image Compression

Authors: Zhisheng Zhong, Hiroaki Akutsu, Kiyoharu Aizawa

Abstract: Deep image compression systems mainly contain four components: encoder, quantizer, entropy model, and decoder. To optimize these four components, a joint rate-distortion framework was proposed, and many deep neural network-based methods have achieved great success in image compression. However, almost all convolutional neural network-based methods treat channel-wise feature maps equally, reducing the flexibility in handling different types of information. In this paper, we propose a channel-level variable quantization network to dynamically allocate more bitrate to significant channels and withdraw bitrate from negligible channels. Specifically, we propose a variable quantization controller consisting of two key components: the channel importance module, which can dynamically learn the importance of channels during training, and the splitting-merging module, which can allocate different bitrates to different channels. We also formulate the quantizer as a Gaussian mixture model. Quantitative and qualitative experiments verify the effectiveness of the proposed model and demonstrate that our method achieves superior performance and produces much better visual reconstructions.
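The idea of giving important channels more bits can be sketched without the learned components. This is a simplified illustration, not the paper's method: importance scores are taken as given (the paper learns them), channels are split by importance rank into equal groups with hypothetical bit depths of 2, 4, and 8, and a plain uniform scalar quantizer over [-1, 1] replaces the Gaussian-mixture quantizer.

```python
def allocate_bits(importance, bit_levels=(2, 4, 8)):
    """Rank channels by importance and split the ranking into len(bit_levels)
    equal groups; the least important group gets the fewest bits."""
    order = sorted(range(len(importance)), key=lambda c: importance[c])
    bits = [0] * len(importance)
    for rank, channel in enumerate(order):
        group = rank * len(bit_levels) // len(importance)
        bits[channel] = bit_levels[group]
    return bits


def uniform_quantize(value, bits, lo=-1.0, hi=1.0):
    """Clip to [lo, hi] and round to the nearest of 2**bits evenly spaced levels."""
    step = (hi - lo) / (2 ** bits - 1)
    clipped = min(max(value, lo), hi)
    return lo + round((clipped - lo) / step) * step


importance = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]  # hypothetical channel importance scores
bits = allocate_bits(importance)
features = [0.3] * len(importance)
print(bits, [round(uniform_quantize(f, b), 4) for f, b in zip(features, bits)])
```

The same feature value is reproduced more faithfully in the 8-bit channels than in the 2-bit ones, which is the trade-off the variable quantization controller is designed to exploit.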

Submitted 15 July, 2020; originally announced July 2020.

Comments: Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), 2020

arXiv:2006.14345 [pdf]

doi 10.1016/j.patcog.2021.108515

Collaborative Boundary-aware Context Encoding Networks for Error Map Prediction

Authors: Zhenxi Zhang, Chunna Tian, Jie Li, Zhusi Zhong, Zhicheng Jiao, Xinbo Gao

Abstract: Medical image segmentation is usually regarded as one of the most important intermediate steps in clinical situations and medical imaging research. Thus, accurately assessing the segmentation quality of automatically generated predictions is essential for guaranteeing the reliability of the results of computer-assisted diagnosis (CAD). Many researchers apply neural networks to train segmentation quality regression models that estimate the segmentation quality of a new data cohort without labeled ground truth. Recently, a novel idea was proposed that transforms the segmentation quality assessment (SQA) problem into a pixel-wise error map prediction task in the form of segmentation. However, simply applying vanilla segmentation structures to medical images fails to detect some small and thin error regions of the auto-generated masks with complex anatomical structures. In this paper, we propose collaborative boundary-aware context encoding networks, called AEP-Net, for the error prediction task. Specifically, we propose a collaborative feature transformation branch for better feature fusion between images and masks and for precise localization of error regions. Further, we propose a context encoding module that uses the global predictor from the error map to enhance the feature representation and regularize the networks. We perform experiments on the IBSR v2.0 dataset and the ACDC dataset.
The AEP-Net achieves average DSCs of 0.8358 and 0.8164 for the error prediction task, and shows a high Pearson correlation coefficient of 0.9873 between the actual segmentation accuracy and the accuracy inferred from the predicted error map on the IBSR v2.0 dataset, which verifies the efficacy of our AEP-Net.

Submitted 25 June, 2020; originally announced June 2020.

Journal ref: Pattern Recognition PR_108515, 2022
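The AEP-Net entry reports Dice similarity coefficients (DSC) for the predicted error maps and a Pearson correlation between actual and inferred segmentation accuracy. The Python sketch below shows how those two quantities are commonly computed on flattened binary masks. The helper `dice_from_error_map` is hypothetical: it assumes the error map marks exactly the pixels where the mask disagrees with the ground truth, so flipping those pixels yields an estimated ground truth. The abstract does not say how AEP-Net actually infers accuracy from its error map.

```python
import math
from typing import Sequence


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for flattened binary masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention (assumption): two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * inter / total


def dice_from_error_map(mask: Sequence[int], error_map: Sequence[int]) -> float:
    """Hypothetical: estimate Dice from a predicted error map.

    Assumes error_map[i] == 1 exactly where mask[i] is wrong, so XOR-ing
    flips those pixels and recovers an estimate of the ground truth.
    """
    est_truth = [m ^ e for m, e in zip(mask, error_map)]
    return dice(mask, est_truth)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Usage: with a perfect error map, the inferred Dice equals the true Dice.
mask = [1, 1, 0, 0]
truth = [1, 0, 0, 1]
errors = [m ^ t for m, t in zip(mask, truth)]  # perfect error map, for illustration
print(dice(mask, truth), dice_from_error_map(mask, errors))  # prints 0.5 0.5
```

Over a cohort, `pearson` would then be applied to the per-case true Dice values and the per-case values inferred from predicted (imperfect) error maps.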

arXiv:2003.11988 [pdf]

Severity Assessment of Coronavirus Disease 2019 (COVID-19) Using Quantitative Features from Chest CT Images

Authors: Zhenyu Tang, Wei Zhao, Xingzhi Xie, Zheng Zhong, Feng Shi, Jun Liu, Dinggang Shen

Abstract: Background: Chest computed tomography (CT) is recognized as an important tool for COVID-19 severity assessment. As the number of affected patients increases rapidly, manual severity assessment becomes a labor-intensive task and may lead to delayed treatment. Purpose: To use machine learning to achieve automatic severity assessment (non-severe or severe) of COVID-19 based on chest CT images, and to explore the severity-related features from the resulting assessment model. Materials and Method: Chest CT images of 176 patients (age 45.3 ± 16.5 years; 96 male and 80 female) with confirmed COVID-19 are used, from which 63 quantitative features are calculated, e.g., the infection volume/ratio of the whole lung and the volume of ground-glass opacity (GGO) regions. A random forest (RF) model is trained to assess severity (non-severe or severe) from the quantitative features. The importance of each quantitative feature, which reflects its correlation with COVID-19 severity, is calculated from the RF model. Results: Using three-fold cross-validation, the RF model shows promising results: a true positive rate of 0.933, a true negative rate of 0.745, an accuracy of 0.875, and an area under the receiver operating characteristic curve (AUC) of 0.91.
The resulting feature importances show that the volume of ground-glass opacity (GGO) regions and its ratio to the whole lung volume are highly related to the severity of COVID-19, and that quantitative features calculated from the right lung are more related to severity than those calculated from the left lung. Conclusion: The RF-based model can achieve automatic severity assessment (non-severe or severe) of COVID-19 infection with promising performance. Several quantitative features with the potential to reflect COVID-19 severity were revealed.

Submitted 26 March, 2020; originally announced March 2020.
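The COVID-19 entry evaluates a random forest with three-fold cross-validation and reports the true positive rate, true negative rate, and accuracy. The Python sketch below shows one common way to produce such numbers: collect out-of-fold predictions, then compute the rates from the confusion counts. `fit_predict` is a hypothetical stand-in for any classifier (for example a random forest); the contiguous, unshuffled folds and the choice of "severe" as the positive class (label 1) are assumptions, not details taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def kfold_indices(n: int, k: int = 3) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds; return (train, test) index lists per fold."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # fold sizes differ by at most one
        test = list(range(start, start + size))
        train = [j for j in range(n) if j < start or j >= start + size]
        folds.append((train, test))
        start += size
    return folds


def out_of_fold_predictions(features, labels, fit_predict: Callable, k: int = 3) -> List[int]:
    """Predict every sample with a model trained only on the other k-1 folds."""
    preds: List[int] = [0] * len(labels)
    for train, test in kfold_indices(len(labels), k):
        fold_preds = fit_predict(
            [features[i] for i in train],  # training features
            [labels[i] for i in train],    # training labels
            [features[i] for i in test],   # held-out features to predict
        )
        for i, p in zip(test, fold_preds):
            preds[i] = p
    return preds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """TPR, TNR and accuracy, with label 1 (assumed: 'severe') as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    nan = float("nan")  # returned when a class is absent from y_true
    return {
        "tpr": tp / (tp + fn) if tp + fn else nan,
        "tnr": tn / (tn + fp) if tn + fp else nan,
        "accuracy": (tp + tn) / len(pairs),
    }
```

For 176 patients and k = 3, `kfold_indices` yields test folds of 59, 59, and 58 cases. Practical pipelines usually shuffle and stratify the folds (scikit-learn's `StratifiedKFold` does both) so each fold keeps the severe/non-severe ratio; AUC would additionally need the classifier's scores rather than hard labels.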

arXiv:1912.12265 [pdf, ps, other]

Deep Transfer Learning Based Downlink Channel Prediction for FDD Massive MIMO Systems

Authors: Yuwen Yang, Feifei Gao, Zhimeng Zhong, Bo Ai, Ahmed Alkhateeb

Abstract: Artificial intelligence (AI) based downlink channel state information (CSI) prediction for frequency division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems has attracted growing attention recently. However, existing works focus on downlink CSI prediction for users in a given environment and are hard to adapt to users in new environments, especially when labeled data is limited. To address this issue, we formulate downlink channel prediction as a deep transfer learning (DTL) problem, where each learning task aims to predict the downlink CSI from the uplink CSI for a single environment. Specifically, we develop a direct-transfer algorithm based on a fully-connected neural network architecture, where the network is trained on data from all previous environments in the manner of classical deep learning and is then fine-tuned for new environments. To further improve transfer efficiency, we propose a meta-learning algorithm that trains the network by alternating inner-task and across-task updates and then adapts to a new environment with a small amount of labeled data. Simulation results show that the direct-transfer algorithm achieves better performance than the deep learning algorithm, which implies that transfer learning benefits downlink channel prediction in new environments. Moreover, the meta-learning algorithm significantly outperforms the direct-transfer algorithm in terms of both prediction accuracy and stability, which validates its effectiveness and superiority.

Submitted 7 September, 2020; v1 submitted 27 December, 2019; originally announced December 2019.

Comments: Accepted by IEEE Transactions on Communications
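The channel prediction entry describes a meta-learning algorithm that alternates inner-task and across-task updates, then adapts to a new environment with few labeled samples. The toy Python sketch below illustrates that pattern with Reptile-style first-order meta-learning on one-dimensional linear regression tasks. It is a substitute chosen for brevity, not necessarily the authors' algorithm: a linear model stands in for the fully-connected network, and the tasks (slopes 1.8 to 2.2, intercept 3) are hypothetical stand-ins for environments.

```python
import random
from typing import List, Tuple

Data = List[Tuple[float, float]]  # (x, y) pairs for one hypothetical environment


def mse(w: float, b: float, data: Data) -> float:
    """Mean squared error of the linear predictor y ≈ w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def gd_steps(w: float, b: float, data: Data, lr: float, steps: int) -> Tuple[float, float]:
    """Inner-task update: full-batch gradient descent on MSE, starting from (w, b)."""
    n = len(data)
    for _ in range(steps):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
        gb = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * gw, b - lr * gb
    return w, b


def reptile(tasks: List[Data], meta_lr=0.5, inner_lr=0.05, inner_steps=10, epochs=200, seed=0):
    """Across-task update (Reptile-style): move the shared init toward each task's adapted weights."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        w_t, b_t = gd_steps(w, b, rng.choice(tasks), inner_lr, inner_steps)
        w, b = w + meta_lr * (w_t - w), b + meta_lr * (b_t - b)
    return w, b


xs = [-1.0, -0.5, 0.0, 0.5, 1.0]


def make_task(slope: float) -> Data:
    """Hypothetical environment: y = slope * x + 3 sampled at a few points."""
    return [(x, slope * x + 3.0) for x in xs]


train_tasks = [make_task(a) for a in (1.8, 2.0, 2.2)]
w0, b0 = reptile(train_tasks)

# Adapt to an unseen environment with only five gradient steps.
new_task = make_task(2.1)
w_meta, b_meta = gd_steps(w0, b0, new_task, lr=0.05, steps=5)
w_zero, b_zero = gd_steps(0.0, 0.0, new_task, lr=0.05, steps=5)
print(mse(w_meta, b_meta, new_task), mse(w_zero, b_zero, new_task))
```

In this toy setup the meta-learned initialization already sits near the shared structure of the tasks (intercept 3, slope near 2), so five adaptation steps from it reach a far lower error than the same five steps from a zero initialization.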

arXiv:1912.11221 [pdf, ps, other]

FDD Massive MIMO Uplink and Downlink Channel Reciprocity Properties: Full or Partial Reciprocity?

Authors: Zhimeng Zhong, Li Fan, Shibin Ge

Abstract: One challenge for FDD massive MIMO communication systems is how to obtain downlink channel state information (CSI) at the base station. Besides traditional codebook feedback through uplink pilot transmission, some channel reciprocity properties can be exploited through uplink channel estimation and channel parameter estimation algorithms. In this paper, the uplink and downlink channel reciprocity properties are analyzed. It is theoretically proved that not all multipath parameters of the FDD downlink and uplink channels are equivalent. Therefore, the so-called full reciprocity property does not hold, while the partial reciprocity property does. Moreover, a channel measurement campaign is conducted to verify the theoretical analysis. Finally, to support the partial reciprocity property, a revision of the standardized 5G channel model is proposed as well. With the contribution of this paper, FDD massive MIMO transmission scheme design can be steered in the right direction.

Submitted 30 December, 2019; v1 submitted 24 December, 2019; originally announced December 2019.
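The reciprocity entry argues that only some multipath parameters carry over between the FDD uplink and downlink. A minimal Python sketch of a narrowband multipath model for a uniform linear array makes the distinction concrete. It assumes, as partial reciprocity is commonly interpreted, that path angles and delays are shared between the two links while complex path gains are not; here the gains are drawn independently, and the carrier pair, array spacing, and path geometry are hypothetical values.

```python
import cmath
import math
import random
from typing import List, Tuple

C = 3e8  # speed of light, m/s

# One propagation path: (angle theta in rad, delay tau in s, complex gain alpha)
Path = Tuple[float, float, complex]


def channel(paths: List[Path], freq: float, n_ant: int, spacing: float) -> List[complex]:
    """Narrowband channel of an n_ant-element ULA at carrier frequency `freq`.

    h_n(f) = sum_l alpha_l * exp(-j 2 pi f (tau_l + n d sin(theta_l) / c))
    """
    h = []
    for n in range(n_ant):
        acc = 0j
        for theta, tau, alpha in paths:
            phase = 2 * math.pi * freq * (tau + n * spacing * math.sin(theta) / C)
            acc += alpha * cmath.exp(-1j * phase)
        h.append(acc)
    return h


def random_gain(rng: random.Random) -> complex:
    """Unit-power circularly symmetric complex Gaussian gain (assumed fading model)."""
    return complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)


rng = random.Random(1)
f_ul, f_dl = 1.92e9, 2.11e9  # hypothetical FDD uplink/downlink carriers
d = C / f_dl / 2             # half a downlink wavelength (assumption)
geometry = [(math.radians(20), 100e-9), (math.radians(-35), 250e-9)]  # shared angles/delays

# Angles and delays are reused on both links; only the complex gains differ.
ul_paths = [(th, tau, random_gain(rng)) for th, tau in geometry]
dl_paths = [(th, tau, random_gain(rng)) for th, tau in geometry]

h_ul = channel(ul_paths, f_ul, 8, d)
h_dl = channel(dl_paths, f_dl, 8, d)
```

Even if the gains were identical, the frequency-dependent phase terms would make `h_ul` and `h_dl` differ, so the uplink channel vector cannot simply be reused; what a base station can reuse under this model is the angle and delay estimates, with the downlink gains still to be obtained by feedback or prediction.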

Showing 1–50 of 68 results for author: Zhong, Z