Search | arXiv e-print repository

Recent Advances in Data-driven Intelligent Control for Wireless Communication: A Comprehensive Survey

Authors: Wei Huo, Huiwen Yang, Nachuan Yang, Zhaohua Yang, Jiuzhou Zhang, Fuhai Nan, Xingzhou Chen, Yifan Mao, Suyang Hu, Pengyu Wang, Xuanyu Zheng, Mingming Zhao, Ling Shi

Abstract: The advent of next-generation wireless communication systems heralds an era characterized by high data rates, low latency, massive connectivity, and superior energy efficiency. These systems necessitate innovative and adaptive strategies for resource allocation and device behavior control in wireless networks. Traditional optimization-based methods have been found inadequate in meeting the complex demands of these emerging systems. As the volume of data continues to escalate, the integration of data-driven methods has become indispensable for enabling adaptive and intelligent control mechanisms in future wireless communication systems. This comprehensive survey explores recent advancements in data-driven methodologies applied to wireless communication networks. It focuses on developments over the past five years and their application to various control objectives within wireless cyber-physical systems. It encompasses critical areas such as link adaptation, user scheduling, spectrum allocation, beam management, power control, and the co-design of communication and control systems. We provide an in-depth exploration of the technical underpinnings that support these data-driven approaches, including the algorithms, models, and frameworks developed to enhance network performance and efficiency. We also examine the challenges that current data-driven algorithms face, particularly in the context of the dynamic and heterogeneous nature of next-generation wireless networks. The paper provides a critical analysis of these challenges and offers insights into potential solutions and future research directions. This includes discussing the adaptability, integration with 6G, and security of data-driven methods in the face of increasing network complexity and data volume.

Submitted 6 August, 2024; originally announced August 2024.

arXiv:2407.21394 [pdf, other]

Force Sensing Guided Artery-Vein Segmentation via Sequential Ultrasound Images

Authors: Yimeng Geng, Gaofeng Meng, Mingcong Chen, Guanglin Cao, Mingyang Zhao, Jianbo Zhao, Hongbin Liu

Abstract: Accurate identification of arteries and veins in ultrasound images is crucial for vascular examinations and interventions in robotics-assisted surgeries. However, current methods for ultrasound vessel segmentation face challenges in distinguishing between arteries and veins due to their morphological similarities. To address this challenge, this study introduces a novel force sensing guided segmentation approach to enhance artery-vein segmentation accuracy by leveraging their distinct deformability. Our proposed method utilizes force magnitude to identify key frames with the most significant vascular deformation in a sequence of ultrasound images. These key frames are then integrated with the current frame through attention mechanisms, with weights assigned in accordance with force magnitude. Our proposed force sensing guided framework can be seamlessly integrated into various segmentation networks and achieves significant performance improvements in multiple U-shaped networks such as U-Net, Swin-unet and Transunet. Furthermore, we contribute the first multimodal ultrasound artery-vein segmentation dataset, Mus-V, which encompasses both force and image data simultaneously. The dataset comprises 3114 ultrasound images of carotid and femoral vessels extracted from 105 videos, with corresponding force data recorded by the force sensor mounted on the US probe. Our code and dataset will be publicly available.

Submitted 31 July, 2024; originally announced July 2024.

arXiv:2407.05619 [pdf, other]

AIRA: A Low-cost IR-based Approach Towards Autonomous Precision Drone Landing and NLOS Indoor Navigation

Authors: Yanchen Liu, Minghui Zhao, Kaiyuan Hou, Junxi Xia, Charlie Carver, Stephen Xia, Xia Zhou, Xiaofan Jiang

Abstract: Automatic drone landing is an important step for achieving fully autonomous drones. Although there are many works that leverage GPS, video, wireless signals, and active acoustic sensing to perform precise landing, autonomous drone landing remains an unsolved challenge for palm-sized microdrones that may not be able to support the high computational requirements of vision, wireless, or active audio sensing. We propose AIRA, a low-cost infrared light-based platform that targets precise and efficient landing of low-resource microdrones. AIRA consists of an infrared light bulb at the landing station along with an energy-efficient hardware photodiode (PD) sensing platform at the bottom of the drone. AIRA costs under 83 USD, while achieving comparable performance to existing vision-based methods at a fraction of the energy cost. AIRA requires only three PDs without any complex pattern recognition models to accurately land the drone, under 10 cm of error, from up to 11.1 meters away, compared to camera-based methods that require recognizing complex markers using high-resolution images with a range of only up to 1.2 meters from the same height. Moreover, we demonstrate that AIRA can accurately guide drones in low-light and partial non-line-of-sight scenarios, which are difficult for traditional vision-based approaches.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2406.17672 [pdf, other]

SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond

Authors: Marco Comunità, Zhi Zhong, Akira Takahashi, Shiqi Yang, Mengjie Zhao, Koichi Saito, Yukara Ikemiya, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji

Abstract: Recent advances in generative models that iteratively synthesize audio clips have brought great success to text-to-audio synthesis (TTA), but at the cost of slow synthesis speed and heavy computation. Although there have been attempts to accelerate the iterative procedure, high-quality TTA systems remain inefficient due to the hundreds of iterations required in the inference phase and the large number of model parameters. To address these challenges, we propose SpecMaskGIT, a lightweight, efficient yet effective TTA model based on the masked generative modeling of spectrograms. First, SpecMaskGIT synthesizes a realistic 10s audio clip in fewer than 16 iterations, an order of magnitude fewer than previous iterative TTA methods. As a discrete model, SpecMaskGIT outperforms larger VQ-Diffusion and auto-regressive models in the TTA benchmark, while being real-time with only 4 CPU cores or even 30x faster with a GPU. Next, built upon a latent space of Mel-spectrograms, SpecMaskGIT has a wider range of applications (e.g., zero-shot bandwidth extension) than similar methods built on the latent wave domain. Moreover, we interpret SpecMaskGIT as a generative extension to previous discriminative audio masked Transformers, and shed light on its audio representation learning potential. We hope our work inspires the exploration of masked audio modeling toward further diverse scenarios.

Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

Comments: 6 pages, 8 figures, 8 tables. Audio samples: https://zzaudio.github.io/SpecMaskGIT/index.html

arXiv:2406.08305 [pdf, other]

Large Language Model(LLM) assisted End-to-End Network Health Management based on Multi-Scale Semanticization

Authors: Fengxiao Tang, Xiaonan Wang, Xun Yuan, Linfeng Luo, Ming Zhao, Nei Kato

Abstract: Network device and system health management is the foundation of modern network operations and maintenance. Traditional health management methods, relying on expert identification or simple rule-based algorithms, struggle to cope with the dynamic heterogeneous networks (DHNs) environment. Moreover, current state-of-the-art distributed anomaly detection methods, which utilize specific machine learning techniques, lack multi-scale adaptivity for heterogeneous device information, resulting in unsatisfactory diagnostic accuracy for DHNs. In this paper, we develop an LLM-assisted end-to-end intelligent network health management framework. The framework first proposes a Multi-Scale Semanticized Anomaly Detection Model (MSADM), incorporating semantic rule trees with an attention mechanism to address the multi-scale anomaly detection problem in DHNs. Secondly, a chain-of-thought-based large language model is embedded downstream to adaptively analyze the fault detection results and produce an analysis report with detailed fault information and optimization strategies. Experimental results show that the accuracy of our proposed MSADM for heterogeneous network entity anomaly detection is as high as 91.31%.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2405.19516 [pdf, other]

Enabling Visual Recognition at Radio Frequency

Authors: Haowen Lai, Gaoxiang Luo, Yifei Liu, Mingmin Zhao

Abstract: This paper introduces PanoRadar, a novel RF imaging system that brings RF resolution close to that of LiDAR, while providing resilience against conditions challenging for optical signals. Our LiDAR-comparable 3D imaging results enable, for the first time, a variety of visual recognition tasks at radio frequency, including surface normal estimation, semantic segmentation, and object detection. PanoRadar utilizes a rotating single-chip mmWave radar, along with a combination of novel signal processing and machine learning algorithms, to create high-resolution 3D images of the surroundings. Our system accurately estimates robot motion, allowing for coherent imaging through a dense grid of synthetic antennas. It also exploits the high azimuth resolution to enhance elevation resolution using learning-based methods. Furthermore, PanoRadar tackles 3D learning via 2D convolutions and addresses challenges due to the unique characteristics of RF signals. Our results demonstrate PanoRadar's robust performance across 12 buildings.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.16791 [pdf, ps, other]

Joint Node Selection and Resource Allocation Optimization for Cooperative Sensing with a Shared Wireless Backhaul

Authors: Mingxin Chen, Ming-Min Zhao, An Liu, Min Li, Qingjiang Shi

Abstract: In this paper, we consider a cooperative sensing framework in the context of a future multi-functional network with both communication and sensing abilities, where one base station (BS) serves as a sensing transmitter and several nearby BSs serve as sensing receivers. Each receiver receives the sensing signal reflected by the target and communicates with the fusion center (FC) through a wireless multiple access channel (MAC) for cooperative target localization. To improve the localization performance, we present a hybrid information-signal domain cooperative sensing (HISDCS) design, where each sensing receiver transmits both the estimated time delay/effective reflecting coefficient and the received sensing signal sampled around the estimated time delay to the FC. Then, we propose to minimize the number of channel uses by utilizing an efficient Karhunen-Loève transformation (KLT) encoding scheme for signal quantization and proper node selection, under the Cramér-Rao lower bound (CRLB) constraint and the capacity limits of the MAC. A novel matrix-inequality constrained successive convex approximation (MCSCA) algorithm is proposed to optimize the wireless backhaul resource allocation, together with a greedy strategy for node selection. Despite the high non-convexity of the considered problem, we prove that the proposed MCSCA algorithm is able to converge to the set of Karush-Kuhn-Tucker (KKT) solutions of a relaxed problem obtained by relaxing the discrete variables. Besides, a low-complexity quantization bit reallocation algorithm is designed, which does not perform explicit node selection, and is able to harvest most of the performance gain brought by HISDCS. Finally, numerical simulations are presented to show that the proposed HISDCS design is able to significantly outperform the baseline schemes.

Submitted 26 May, 2024; originally announced May 2024.

Comments: 13 pages, 10 figures

arXiv:2405.14598 [pdf, other]

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

Authors: Shiqi Yang, Zhi Zhong, Mengjie Zhao, Shusuke Takahashi, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

Abstract: In recent years, with their realistic generation results and wide range of personalized applications, diffusion-based generative models have gained huge attention in both visual and audio generation areas. Compared to the considerable advancements of text2image or text2audio generation, research in audio2visual or visual2audio generation has been relatively slow. Recent audio-visual generation methods usually resort to huge large language models or composable diffusion models. Instead of designing another giant model for audio-visual generation, in this paper we take a step back and show that a simple and lightweight generative transformer, which has not been fully investigated in multi-modal generation, can achieve excellent results on image2audio generation. The transformer operates in the discrete audio and visual Vector-Quantized GAN space, and is trained in the mask denoising manner. After training, classifier-free guidance can be deployed off-the-shelf to achieve better performance, without any extra training or modification. Since the transformer model is modality symmetrical, it can also be directly deployed for audio2image generation and co-generation. In the experiments, we show that our simple method surpasses recent image2audio generation methods. Generated audio samples can be found at https://docs.google.com/presentation/d/1ZtC0SeblKkut4XJcRaDsSTuCRIXB3ypxmSi7HTY3IyQ/

Submitted 24 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: 10 pages

arXiv:2405.12125 [pdf, other]

doi 10.1109/TRO.2023.3327634

Design, Control, and Motion-Planning for a Root-Perching Rotor-Distributed Manipulator

Authors: Takuzumi Nishio, Moju Zhao, Kei Okada, Masayuki Inaba

Abstract: Manipulation performance improvement is crucial for aerial robots. For aerial manipulators, the baselink position and attitude errors directly affect the precision at the end effector. To address this stability problem, fixed-body approaches such as perching on the environment using the rotor suction force are useful. Additionally, conventional arm-equipped multirotors, called rotor-concentrated manipulators (RCMs), find it difficult to generate a large wrench at the end effector due to joint torque limitations. By distributing rotors to each link, the thrust can support each link's weight, decreasing the arm joints' torque. Based on this approach, rotor-distributed manipulators (RDMs) can increase the feasible wrench and reachability of the end effector. This paper introduces a minimal configuration of a rotor-distributed manipulator that can perch on surfaces, especially ceilings, using a part of its body. First, we design a minimal rotor-distributed arm considering the flight and end-effector performance. Second, a flight controller is proposed for this minimal RDM along with a perching controller adaptable for various types of aerial robots. Third, we propose a motion planning method based on inverse kinematics (IK), considering constraints specific to the proposed RDMs such as perching force. Finally, we evaluate flight and perching motions and confirm that the proposed manipulator can significantly improve the manipulation performance.

Submitted 20 May, 2024; originally announced May 2024.

Comments: IEEE Transactions on Robotics (2023)

arXiv:2405.04027 [pdf, other]

Joint Visibility Region Detection and Channel Estimation for XL-MIMO Systems via Alternating MAP

Authors: Wenkang Xu, An Liu, Min-jian Zhao

Abstract: We investigate a joint visibility region (VR) detection and channel estimation problem in extremely large-scale multiple-input-multiple-output (XL-MIMO) systems, where near-field propagation and spatial non-stationary effects exist. In this case, each scatterer can only see a subset of antennas, i.e., it has a certain VR over the antennas. Because of the spatial correlation among adjacent sub-arrays, the VR of scatterers exhibits a two-dimensional (2D) clustered sparsity. We design a 2D Markov prior model to capture such a structured sparsity. Based on this, a novel alternating maximum a posteriori (MAP) framework is developed for high-accuracy VR detection and channel estimation. The alternating MAP framework consists of three basic modules: a channel estimation module, a VR detection module, and a grid update module. Specifically, the first module is a low-complexity inverse-free variational Bayesian inference (IF-VBI) algorithm that avoids the matrix inverse via minimizing a relaxed Kullback-Leibler (KL) divergence. The second module is a structured expectation propagation (EP) algorithm which has the ability to deal with complicated prior information. The third module refines polar-domain grid parameters via gradient ascent. Simulations demonstrate the superiority of the proposed algorithm in both VR detection and channel estimation.

Submitted 21 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: 13 pages, 14 figures, submitted to IEEE TSP

arXiv:2405.01242 [pdf, other]

TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms

Authors: Yueyuan Sui, Minghui Zhao, Junxi Xia, Xiaofan Jiang, Stephen Xia

Abstract: We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state-of-the-art models with memory footprints of hundreds of MBs and methods better suited for resource-constrained systems. To adapt TRAMBA to vibration-based sensing modalities, we pre-train TRAMBA with audio speech datasets that are widely available. Then, users fine-tune with a small amount of bone conduction data. TRAMBA outperforms state-of-the-art GANs by up to 7.3% in PESQ and 1.8% in STOI, with an order of magnitude smaller memory footprint and an inference speedup of up to 465 times. We integrate TRAMBA into real systems and show that TRAMBA (i) improves battery life of wearables by up to 160% by requiring less data sampling and transmission; (ii) generates higher quality voice in noisy environments than over-the-air speech; (iii) requires a memory footprint of less than 20.0 MB.

Submitted 29 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

arXiv:2403.10873 [pdf, other]

CSI Transfer From Sub-6G to mmWave: Reduced-Overhead Multi-User Hybrid Beamforming

Authors: Weicao Deng, Min Li, Ming-Min Zhao, Min-Jian Zhao, Osvaldo Simeone

Abstract: Hybrid beamforming is vital in modern wireless systems, especially for massive MIMO and millimeter-wave deployments, offering efficient directional transmission with reduced hardware complexity. However, effective beamforming in multi-user scenarios relies heavily on accurate channel state information, the acquisition of which often incurs excessive pilot overhead, degrading system performance. To address this and inspired by the spatial congruence between sub-6GHz (sub-6G) and mmWave channels, we propose a Sub-6G information Aided Multi-User Hybrid Beamforming (SA-MUHBF) framework, avoiding excessive use of pilots. SA-MUHBF employs a convolutional neural network to predict mmWave beamspace from sub-6G channel estimate, followed by a novel multi-layer graph neural network for analog beam selection and a linear minimum mean-square error algorithm for digital beamforming. Numerical results demonstrate that SA-MUHBF efficiently predicts the mmWave beamspace representation and achieves superior spectrum efficiency over state-of-the-art benchmarks. Moreover, SA-MUHBF demonstrates robust performance across varied sub-6G system configurations and exhibits strong generalization to unseen scenarios.

Submitted 16 March, 2024; originally announced March 2024.

Comments: 13 pages, 12 figures, submitted

arXiv:2402.03042 [pdf, other]

Semi-Passive Intelligent Reflecting Surface Enabled Sensing Systems

Authors: Qiaoyan Peng, Qingqing Wu, Wen Chen, Shaodan Ma, Ming-Min Zhao, Octavia A. Dobre

Abstract: Intelligent reflecting surface (IRS) has garnered growing interest and attention due to its potential for facilitating and supporting wireless communications and sensing. This paper studies a semi-passive IRS-enabled sensing system, where an IRS consists of both passive reflecting elements and active sensors. Our goal is to minimize the Cramér-Rao bound (CRB) for parameter estimation under both point and extended target cases. Towards this goal, we begin by deriving the CRB for the direction-of-arrival (DoA) estimation in closed-form and then theoretically analyze the IRS reflecting elements and sensors allocation design based on the CRB under the point target case with a single-antenna base station (BS). To efficiently solve the corresponding optimization problem for the case with a multi-antenna BS, we propose an efficient algorithm by jointly optimizing the IRS phase shifts and the BS beamformers. Under the extended target case, the CRB for the target response matrix (TRM) estimation is minimized via the optimization of the BS transmit beamformers. Moreover, we explore the influence of various system parameters on the CRB and compare these effects to those observed under the point target case. Simulation results show the effectiveness of the semi-passive IRS and our proposed beamforming design for improving the performance of the sensing system.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2311.12745 [pdf, other]

Learn to Augment Network Simulators Towards Digital Network Twins

Authors: Yuru Zhang, Ming Zhao, Qiang Liu

Abstract: Digital network twin (DNT) is a promising paradigm to replicate real-world cellular networks toward continual assessment, proactive management, and what-if analysis. Existing discussions have been focusing on using only deep learning techniques to build DNTs, which raises widespread concerns regarding their generalization, explainability, and transparency. In this paper, we explore an alternative approach to augment network simulators with context-aware neural agents. The main challenge lies in the non-trivial simulation-to-reality (sim-to-real) discrepancy between offline simulators and real-world networks. To solve the challenge, we propose a new learn-to-bridge algorithm to cost-efficiently bridge the sim-to-real discrepancy in two alternative stages. In the first stage, we select states to query performances in real-world networks by using newly-designed cost-aware Bayesian optimization. In the second stage, we train the neural agent to learn the state context and bridge the probabilistic discrepancy based on Bayesian neural networks (BNN). In addition, we build a small-scale end-to-end network testbed based on OpenAirInterface RAN and Core with USRP B210 and a smartphone, and replicate the network in NS-3. The evaluation results show that our proposed solution substantially outperforms existing methods, with more than 92% reduction in the sim-to-real discrepancy.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.08201 [pdf, other]

Joint Location Sensing and Channel Estimation for IRS-Aided mmWave ISAC Systems

Authors: Zijian Chen, Ming-Min Zhao, Min Li, Fan Xu, Qingqing Wu, Min-Jian Zhao

Abstract: In this paper, we investigate a self-sensing intelligent reflecting surface (IRS) aided millimeter wave (mmWave) integrated sensing and communication (ISAC) system. Unlike the conventional purely passive IRS, the self-sensing IRS can effectively reduce the path loss of sensing-related links, thus rendering it advantageous in ISAC systems. Aiming to jointly sense the target/scatterer/user positions as well as estimate the sensing and communication (SAC) channels in the considered system, we propose a two-phase transmission scheme, where the coarse and refined sensing/channel estimation (CE) results are respectively obtained in the first phase (using scanning-based IRS reflection coefficients) and second phase (using optimized IRS reflection coefficients). For each phase, an angle-based sensing turbo variational Bayesian inference (AS-TVBI) algorithm, which combines the VBI, message passing and expectation-maximization (EM) methods, is developed to solve the considered joint location sensing and CE problem. The proposed algorithm effectively exploits the partial overlapping structured (POS) sparsity and 2-dimensional (2D) block sparsity inherent in the SAC channels to enhance the overall performance. Based on the estimation results from the first phase, we formulate a Cramér-Rao bound (CRB) minimization problem for optimizing IRS reflection coefficients, and through proper reformulations, a low-complexity manifold-based optimization algorithm is proposed to solve this problem. Simulation results are provided to verify the superiority of the proposed transmission scheme and associated algorithms.

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2311.08188 [pdf, ps, other]

Fast List Decoding of High-Rate Polar Codes

Authors: Yang Lu, Ming-Min Zhao, Ming Lei, Min-Jian Zhao

Abstract: Due to the ability to provide superior error-correction performance, the successive cancellation list (SCL) algorithm is widely regarded as one of the most promising decoding algorithms for polar codes with short-to-moderate code lengths. However, the application of SCL decoding in low-latency communication scenarios is limited due to its sequential nature. To reduce the decoding latency, developing tailored fast and efficient list decoding algorithms of specific polar substituent codes (special nodes) is a promising solution. Recently, fast list decoding algorithms have been proposed by considering special nodes with low code rates. Aiming to further speed up the SCL decoding, this paper presents fast list decoding algorithms for two types of high-rate special nodes, namely single-parity-check (SPC) nodes and sequence rate one or single-parity-check (SR1/SPC) nodes. In particular, we develop two classes of fast list decoding algorithms for these nodes, where the first class uses a sequential decoding procedure to yield decoding latency that is linear with the list size, and the second further parallelizes the decoding process by pre-determining the redundant candidate paths offline. Simulation results show that the proposed list decoding algorithms are able to achieve up to 70.7% lower decoding latency than state-of-the-art fast SCL decoders, while exhibiting the same error-correction performance.

Submitted 14 November, 2023; originally announced November 2023.

Comments: 13 pages, 8 figures

arXiv:2310.13267 [pdf, other]

On the Language Encoder of Contrastive Cross-modal Models

Authors: Mengjie Zhao, Junya Ono, Zhi Zhong, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Wei-Hsiang Liao, Takashi Shibuya, Hiromi Wakaki, Yuki Mitsufuji

Abstract: Contrastive cross-modal models such as CLIP and CLAP aid various vision-language (VL) and audio-language (AL) tasks. However, there has been limited investigation of and improvement in their language encoder, which is the central component of encoding natural language descriptions of image/audio into vector representations. We extensively evaluate how unsupervised and supervised sentence embedding training affect language encoder quality and cross-modal task performance. In VL pretraining, we found that sentence embedding training improves language encoder quality and aids in cross-modal tasks, improving contrastive VL models such as CyCLIP. In contrast, AL pretraining benefits less from sentence embedding training, which may result from the limited amount of pretraining data. We analyze the representation spaces to understand the strengths of sentence embedding training, and find that it improves text-space uniformity, at the cost of decreased cross-modal alignment.

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2310.05382 [pdf, other]

A Stochastic Particle Variational Bayesian Inference Inspired Deep-Unfolding Network for Non-Convex Parameter Estimation

Authors: Zhixiang Hu, An Liu, Minjian Zhao

Abstract: Future wireless networks are envisioned to provide ubiquitous sensing services, which also gives rise to a substantial demand for high-dimensional non-convex parameter estimation, i.e., the associated likelihood function is non-convex and contains numerous local optima. Variational Bayesian inference (VBI) provides a powerful tool for modeling complex estimation problems and reasoning with prior information, but poses a long-standing challenge on computing intractable posteriori distributions. Most existing variational methods generally rely on assumptions about specific distribution families to derive closed-form solutions, and are difficult to apply in high-dimensional, non-convex scenarios. Given these challenges, firstly, we propose a parallel stochastic particle variational Bayesian inference (PSPVBI) algorithm. Thanks to innovations such as particle approximation, additional updates of particle positions, and parallel stochastic successive convex approximation (PSSCA), PSPVBI can flexibly drive particles to fit the posteriori distribution with acceptable complexity, yielding high-precision estimates of the target parameters. Furthermore, additional speedup can be obtained by deep-unfolding (DU) the PSPVBI algorithm. Specifically, superior hyperparameters are learned to dramatically reduce the number of algorithmic iterations. In this PSPVBI-induced Deep-Unfolding Network, some techniques related to gradient computation, data sub-sampling, differentiable sampling, and generalization ability are also employed to facilitate the practical deployment. Finally, we apply the LPSPVBI to solve several important parameter estimation problems in wireless sensing scenarios. Simulations indicate that the LPSPVBI algorithm outperforms existing solutions.

Submitted 8 October, 2023; originally announced October 2023.

arXiv:2309.04508 [pdf, other]

Spatial-Temporal Graph Attention Fuser for Calibration in IoT Air Pollution Monitoring Systems

Authors: Keivan Faghih Niresi, Mengjie Zhao, Hugo Bissig, Henri Baumann, Olga Fink

Abstract: The use of Internet of Things (IoT) sensors for air pollution monitoring has significantly increased, resulting in the deployment of low-cost sensors. Despite this advancement, accurately calibrating these sensors in uncontrolled environmental conditions remains a challenge. To address this, we propose a novel approach that leverages graph neural networks, specifically the graph attention network module, to enhance the calibration process by fusing data from sensor arrays. Through our experiments, we demonstrate the effectiveness of our approach in significantly improving the calibration accuracy of sensors in IoT air pollution monitoring platforms.

Submitted 8 September, 2023; originally announced September 2023.

arXiv:2309.03114 [pdf, ps, other]

NUV-DoA: NUV Prior-based Bayesian Sparse Reconstruction with Spatial Filtering for Super-Resolution DoA Estimation

Authors: Mengyuan Zhao, Guy Revach, Tirza Routtenberg, Nir Shlezinger

Abstract: Achieving high-resolution Direction of Arrival (DoA) recovery typically requires high Signal to Noise Ratio (SNR) and a sufficiently large number of snapshots. This paper presents the NUV-DoA algorithm, which augments Bayesian sparse reconstruction with spatial filtering for super-resolution DoA estimation. By modeling each direction on the azimuth's grid with the sparsity-promoting normal with unknown variance (NUV) prior, the non-convex optimization problem is reduced to iteratively reweighted least-squares under Gaussian distribution, where the mean of the snapshots is a sufficient statistic. This approach not only simplifies our solution but also accurately detects the DoAs. We utilize a hierarchical approach for interference cancellation in multi-source scenarios. Empirical evaluations show the superiority of NUV-DoA, especially in low SNRs, compared to alternative DoA estimators.

Submitted 25 December, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

Comments: This paper has 5 pages including reference, 11 figures. This paper has been accepted to ICASSP 2024 - 2024 International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

arXiv:2308.13996 [pdf]

Improve in-situ life prediction and classification performance by capturing both the present state and evolution rate of battery aging

Authors: Mingyuan Zhao, Yongzhi Zhang

Abstract: This study develops a methodology by capturing both the battery aging state and degradation rate for improved life prediction performance. The aging state is indicated by six physical features of an equivalent circuit model that are extracted from the voltage relaxation data. And the degradation rate is captured by two features extracted from the differences between the voltage relaxation curves within a moving window (for life prediction), or the differences between the capacity vs. voltage curves at different cycles (for life classification). Two machine learning models, which are constructed based on Gaussian Processes, are used to describe the relationships between these physical features and battery lifetimes for the life prediction and classification, respectively. The methodology is validated with the aging data of 74 battery cells of three different types. Experimental results show that based on only 3-12 minutes' sampling data, the method with novel features predicts accurate battery lifetimes, with the prediction accuracy improved by up to 67.09% compared with the benchmark method. And the batteries are classified into three groups (long, medium, and short) with an overall accuracy larger than 90% based on only two adjacent cycles' information, enabling the highly efficient regrouping of retired batteries.

Submitted 26 August, 2023; originally announced August 2023.

arXiv:2308.09349 [pdf, other]

Intelligent Reflecting Surface Aided Multi-Tier Hybrid Computing

Authors: Yapeng Zhao, Qingqing Wu, Guangji Chen, Wen Chen, Ruiqi Liu, Ming-Min Zhao, Yuan Wu, Shaodan Ma

Abstract: The digital twin edge network (DITEN) aims to integrate mobile edge computing (MEC) and digital twin (DT) to provide real-time system configuration and flexible resource allocation for the sixth-generation network. This paper investigates an intelligent reflecting surface (IRS)-aided multi-tier hybrid computing system that can achieve mutual benefits for DT and MEC in the DITEN. For the first time, this paper presents the opportunity to realize the network-wide convergence of DT and MEC. In the considered system, specifically, over-the-air computation (AirComp) is employed to monitor the status of the DT system, while MEC is performed with the assistance of DT to provide low-latency computing services. Besides, the IRS is utilized to enhance signal transmission and mitigate interference among heterogeneous nodes. We propose a framework for designing the hybrid computing system, aiming to maximize the sum computation rate under communication and computation resources constraints. To tackle the non-convex optimization problem, alternative optimization and successive convex approximation techniques are leveraged to decouple variables and then transform the problem into a more tractable form. Simulation results verify the effectiveness of the proposed algorithm and demonstrate that the IRS can significantly improve the system performance with appropriate phase shift configurations. Moreover, the results indicate that the DT assisted MEC system can precisely achieve the balance between local computing and task offloading since real-time system status can be obtained with the help of DT.

Submitted 25 October, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

arXiv:2307.09149 [pdf, other]

Successive Linear Approximation VBI for Joint Sparse Signal Recovery and Dynamic Grid Parameters Estimation

Authors: Wenkang Xu, An Liu, Bingpeng Zhou, Minjian Zhao

Abstract: For many practical applications in wireless communications, we need to recover a structured sparse signal from a linear observation model with dynamic grid parameters in the sensing matrix. Conventional expectation maximization (EM)-based compressed sensing (CS) methods, such as turbo compressed sensing (Turbo-CS) and turbo variational Bayesian inference (Turbo-VBI), have double-loop iterations, where the inner loop (E-step) obtains a Bayesian estimation of sparse signals and the outer loop (M-step) obtains a point estimation of dynamic grid parameters. This leads to a slow convergence rate. Furthermore, each iteration of the E-step involves a complicated matrix inverse in general. To overcome these drawbacks, we first propose a successive linear approximation VBI (SLA-VBI) algorithm that can provide Bayesian estimation of both sparse signals and dynamic grid parameters. Besides, we simplify the matrix inverse operation based on the majorization-minimization (MM) algorithmic framework. In addition, we extend our proposed algorithm from an independent sparse prior to more complicated structured sparse priors, which can exploit structured sparsity in specific applications to further enhance the performance. Finally, we apply our proposed algorithm to solve two practical application problems in wireless communications and verify that the proposed algorithm can achieve faster convergence, lower complexity, and better performance compared to the state-of-the-art EM-based methods.

Submitted 12 November, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

Comments: 14 pages, 15 figures, submitted to IEEE Transactions on Wireless Communications

arXiv:2307.07796 [pdf, other]

doi 10.1109/Allerton58177.2023.10313359

Theoretical Analysis of Binary Masks in Snapshot Compressive Imaging Systems

Authors: Mengyu Zhao, Shirin Jalali

Abstract: Snapshot compressive imaging (SCI) systems have gained significant attention in recent years. While previous theoretical studies have primarily focused on the performance analysis of Gaussian masks, practical SCI systems often employ binary-valued masks. Furthermore, recent research has demonstrated that optimized binary masks can significantly enhance system performance. In this paper, we present a comprehensive theoretical characterization of binary masks and their impact on SCI system performance. Initially, we investigate the scenario where the masks are binary and independent and identically distributed (iid), revealing a noteworthy finding that aligns with prior numerical results. Specifically, we show that the optimal probability of non-zero elements in the masks is smaller than 0.5. This result provides valuable insights into the design and optimization of binary masks for SCI systems, facilitating further advancements in the field. Additionally, we extend our analysis to characterize the performance of SCI systems where the mask entries are not independent but are generated based on a stationary first-order Markov process. Overall, our theoretical framework offers a comprehensive understanding of the performance implications associated with binary masks in SCI systems.

Submitted 15 July, 2023; originally announced July 2023.

arXiv:2307.01486 [pdf, other]

H-DenseFormer: An Efficient Hybrid Densely Connected Transformer for Multimodal Tumor Segmentation

Authors: Jun Shi, Hongyu Kan, Shulan Ruan, Ziqi Zhu, Minfan Zhao, Liang Qiao, Zhaohui Wang, Hong An, Xudong Xue

Abstract: Recently, deep learning methods have been widely used for tumor segmentation of multimodal medical images with promising results. However, most existing methods are limited by insufficient representational ability, specific modality number and high computational complexity. In this paper, we propose a hybrid densely connected network for tumor segmentation, named H-DenseFormer, which combines the representational power of the Convolutional Neural Network (CNN) and the Transformer structures. Specifically, H-DenseFormer integrates a Transformer-based Multi-path Parallel Embedding (MPE) module that can take an arbitrary number of modalities as input to extract the fusion features from different modalities. Then, the multimodal fusion features are delivered to different levels of the encoder to enhance multimodal learning representation. Besides, we design a lightweight Densely Connected Transformer (DCT) block to replace the standard Transformer block, thus significantly reducing computational complexity. We conduct extensive experiments on two public multimodal datasets, HECKTOR21 and PI-CAI22. The experimental results show that our proposed method outperforms the existing state-of-the-art methods while having lower computational complexity. The source code is available at https://github.com/shijun18/H-DenseFormer.

Submitted 4 July, 2023; originally announced July 2023.

Comments: 11 pages, 2 figures. This paper has been accepted by Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2023

arXiv:2307.00269 [pdf, other]

AE-RED: A Hyperspectral Unmixing Framework Powered by Deep Autoencoder and Regularization by Denoising

Authors: Min Zhao, Jie Chen, Nicolas Dobigeon

Abstract: Spectral unmixing has been extensively studied with a variety of methods and used in many applications. Recently, data-driven techniques based on deep learning have attracted great attention in spectral unmixing for their superior ability to automatically learn structural information. In particular, autoencoder-based architectures are elaborately designed to solve blind unmixing and model complex nonlinear mixtures. Nevertheless, these methods perform the unmixing task as black boxes and lack interpretability. On the other hand, conventional unmixing methods carefully design the regularizer to add explicit information, in which algorithms such as plug-and-play (PnP) strategies utilize off-the-shelf denoisers to plug in powerful priors. In this paper, we propose a generic unmixing framework to integrate the autoencoder network with regularization by denoising (RED), named AE-RED. More specifically, we decompose the unmixing optimization problem into two subproblems. The first one is solved using deep autoencoders to implicitly regularize the estimates and model the mixture mechanism. The second one leverages the denoiser to bring in the explicit information. In this way, both the characteristics of the deep autoencoder-based unmixing methods and the priors provided by denoisers are merged into our well-designed framework to enhance the unmixing performance. Experimental results on both synthetic and real data sets show the superiority of our proposed framework compared with state-of-the-art unmixing approaches.

Submitted 1 July, 2023; originally announced July 2023.

arXiv:2306.17197 [pdf, other]

Guided Deep Generative Model-based Spatial Regularization for Multiband Imaging Inverse Problems

Authors: Min Zhao, Nicolas Dobigeon, Jie Chen

Abstract: When adopting a model-based formulation, solving inverse problems encountered in multiband imaging requires defining spatial and spectral regularizations. In most works in the literature, spectral information is extracted from the observations directly to derive data-driven spectral priors. Conversely, the choice of the spatial regularization often boils down to the use of conventional penalizations (e.g., total variation) promoting expected features of the reconstructed image (e.g., piecewise constant). In this work, we propose a generic framework able to capitalize on an auxiliary acquisition of high spatial resolution to derive tailored data-driven spatial regularizations. This approach leverages the ability of deep learning to extract high-level features. More precisely, the regularization is conceived as a deep generative network able to encode spatial semantic features contained in this auxiliary image of high spatial resolution. To illustrate the versatility of this approach, it is instantiated to conduct two particular tasks, namely multiband image fusion and multiband image inpainting. Experimental results obtained on these two tasks demonstrate the benefit of this class of informed regularizations when compared to more conventional ones.

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.05775 [pdf, other]

Weight Freezing: A Regularization Approach for Fully Connected Layers with an Application in EEG Classification

Authors: Zhengqing Miao, Meirong Zhao

Abstract: In the realm of EEG decoding, enhancing the performance of artificial neural networks (ANNs) carries significant potential. This study introduces a novel approach, termed "weight freezing", that is anchored on the principles of ANN regularization and neuroscience prior knowledge. The concept of weight freezing revolves around the idea of reducing certain neurons' influence on the decision-making process for a specific EEG task by freezing specific weights in the fully connected layer during the backpropagation process. This is actualized through the use of a mask matrix and a threshold to determine the proportion of weights to be frozen during backpropagation. Moreover, by setting the masked weights to zero, weight freezing can not only realize sparse connections in networks with a fully connected layer as the classifier but also function as an efficacious regularization method for fully connected layers. Through experiments involving three distinct ANN architectures and three widely recognized EEG datasets, we validate the potency of weight freezing. Our method significantly surpasses previous peak performances in classification accuracy across all examined datasets. Supplementary control experiments offer insights into performance differences pre and post weight freezing implementation and scrutinize the influence of the threshold in the weight freezing process. Our study underscores the superior efficacy of weight freezing compared to traditional fully connected networks for EEG feature classification tasks. With its proven effectiveness, this innovative approach holds substantial promise for contributing to future strides in EEG decoding research.

Submitted 11 June, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

Comments: 16 pages, 5 figures
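
Illustrative note on the weight-freezing abstract above: the mask-and-threshold idea can be sketched in a few lines of plain Python. This is only a minimal sketch under stated assumptions, not the authors' implementation. The function names (`make_freeze_mask`, `apply_mask`, `sgd_step`), the uniform random draw used to build the mask, and the plain gradient-descent update are all hypothetical choices made for illustration; the abstract does not specify how the mask is generated.

```python
import random


def make_freeze_mask(rows, cols, threshold, seed=0):
    """Build a 0/1 mask for a fully connected layer.

    Assumption: each weight gets a uniform random draw, and draws below
    `threshold` mark the weight as frozen (mask value 0), so `threshold`
    roughly controls the proportion of frozen weights.
    """
    rng = random.Random(seed)
    return [[0 if rng.random() < threshold else 1 for _ in range(cols)]
            for _ in range(rows)]


def apply_mask(matrix, mask):
    """Element-wise product: entries at frozen positions become zero."""
    return [[m * v for m, v in zip(mask_row, value_row)]
            for mask_row, value_row in zip(mask, matrix)]


def sgd_step(weights, grads, mask, lr=0.1):
    """One gradient-descent update that skips frozen weights.

    Gradients are masked before the update, so a frozen weight keeps
    whatever value it had (zero, if it was zeroed beforehand).
    """
    masked_grads = apply_mask(grads, mask)
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, masked_grads)]


# Usage: zero the frozen weights once, then train only the remaining ones.
mask = make_freeze_mask(rows=4, cols=3, threshold=0.5)
weights = apply_mask([[1.0] * 3 for _ in range(4)], mask)
grads = [[1.0] * 3 for _ in range(4)]
updated = sgd_step(weights, grads, mask)
```

Masking the gradients rather than re-zeroing the weights after every step keeps the update rule itself unchanged, which is one simple way to obtain the sparse connections the abstract describes.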

arXiv:2306.05704 [pdf, other]

Exploring Effective Mask Sampling Modeling for Neural Image Compression

Authors: Lin Liu, Mingming Zhao, Shanxin Yuan, Wenlong Lyu, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

Abstract: Image compression aims to reduce the information redundancy in images. Most existing neural image compression methods rely on side information from hyperprior or context models to eliminate spatial redundancy, but rarely address the channel redundancy. Inspired by the mask sampling modeling in recent self-supervised learning methods for natural language processing and high-level vision, we propose a novel pretraining strategy for neural image compression. Specifically, a Cube Mask Sampling Module (CMSM) is proposed to apply both spatial and channel mask sampling modeling to image compression in the pre-training stage. Moreover, to further reduce channel redundancy, we propose the Learnable Channel Mask Module (LCMM) and the Learnable Channel Completion Module (LCCM). Our plug-and-play CMSM, LCMM, and LCCM modules can apply to both CNN-based and Transformer-based architectures, significantly reduce the computational cost, and improve the quality of images. Experiments on the public Kodak and Tecnick datasets demonstrate that our method achieves competitive performance with lower computational complexity compared to state-of-the-art image compression methods.

Submitted 9 June, 2023; originally announced June 2023.

Comments: 10 pages

arXiv:2304.14508 [pdf]

3D Brainformer: 3D Fusion Transformer for Brain Tumor Segmentation

Authors: Rui Nian, Guoyao Zhang, Yao Sui, Yuqi Qian, Qiuying Li, Mingzhang Zhao, Jianhui Li, Ali Gholipour, Simon K. Warfield

Abstract: Magnetic resonance imaging (MRI) is critically important for brain mapping in both scientific research and clinical studies. Precise segmentation of brain tumors facilitates clinical diagnosis, evaluations, and surgical planning. Deep learning has recently emerged to improve brain tumor segmentation and achieved impressive results. Convolutional architectures are widely used to implement those neural networks. Because of their limited receptive fields, however, those architectures are limited in representing long-range spatial dependencies of the voxel intensities in MRI images. Transformers have been leveraged recently to address the above limitations of convolutional networks. Unfortunately, the majority of current Transformer-based segmentation methods operate on 2D MRI slices instead of 3D volumes. Moreover, it is difficult to incorporate the structures between layers because each head is calculated independently in the Multi-Head Self-Attention mechanism (MHSA). In this work, we proposed a 3D Transformer-based segmentation approach. We developed a Fusion-Head Self-Attention mechanism (FHSA) to combine each attention head through attention logic and weight mapping, for the exploration of the long-range spatial dependencies in 3D MRI images. We implemented a plug-and-play self-attention module, named the Infinite Deformable Fusion Transformer Module (IDFTM), to extract features on any deformable feature maps. We applied our approach to the task of brain tumor segmentation, and assessed it on the public BRATS datasets. The experimental results demonstrated that our proposed approach achieved superior performance, in comparison to several state-of-the-art segmentation methods.

Submitted 27 April, 2023; originally announced April 2023.

Comments: 10 pages, 4 figures

MSC Class: 68T07 ACM Class: I.4.6; I.5.1

arXiv:2304.01461 [pdf, other]

Time-space-frequency feature Fusion for 3-channel motor imagery classification

Authors: Zhengqing Miao, Meirong Zhao

Abstract: Low-channel EEG devices are crucial for portable and entertainment applications. However, the low spatial resolution of EEG presents challenges in decoding low-channel motor imagery. This study introduces TSFF-Net, a novel network architecture that integrates time-space-frequency features, effectively compensating for the limitations of single-mode feature extraction networks based on time-series… ▽ More Low-channel EEG devices are crucial for portable and entertainment applications. However, the low spatial resolution of EEG presents challenges in decoding low-channel motor imagery. This study introduces TSFF-Net, a novel network architecture that integrates time-space-frequency features, effectively compensating for the limitations of single-mode feature extraction networks based on time-series or time-frequency modalities. TSFF-Net comprises four main components: time-frequency representation, time-frequency feature extraction, time-space feature extraction, and feature fusion and classification. Time-frequency representation and feature extraction transform raw EEG signals into time-frequency spectrograms and extract relevant features. The time-space network processes time-series EEG trials as input and extracts temporal-spatial features. Feature fusion employs MMD loss to constrain the distribution of time-frequency and time-space features in the Reproducing Kernel Hilbert Space, subsequently combining these features using a weighted fusion approach to obtain effective time-space-frequency features. Moreover, few studies have explored the decoding of three-channel motor imagery based on time-frequency spectrograms. This study proposes a shallow, lightweight decoding architecture (TSFF-img) based on time-frequency spectrograms and compares its classification performance in low-channel motor imagery with other methods using two publicly available datasets. 
Experimental results demonstrate that TSFF-Net not only compensates for the shortcomings of single-mode feature extraction networks in EEG decoding, but also outperforms other state-of-the-art methods. Overall, TSFF-Net offers considerable advantages in decoding low-channel motor imagery and provides valuable insights for algorithmically enhancing low-channel EEG decoding.
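The MMD loss and weighted fusion mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the bandwidth `sigma`, the fusion weight `alpha`, and all function names are assumptions chosen for illustration. The sketch computes the biased estimate of the squared maximum mean discrepancy between two feature batches and then blends the two batches with a scalar weight.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel matrix between the rows of x and y (assumed kernel choice)."""
    d2 = (np.sum(x ** 2, axis=1)[:, None]
          + np.sum(y ** 2, axis=1)[None, :]
          - 2.0 * x @ y.T)
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2.0 * rbf_kernel(x, y, sigma).mean())

def weighted_fusion(f_tf, f_ts, alpha=0.5):
    """Blend time-frequency and time-space features (alpha is hypothetical)."""
    return alpha * f_tf + (1.0 - alpha) * f_ts

rng = np.random.default_rng(0)
f_tf = rng.normal(size=(50, 4))        # stand-in time-frequency features
f_ts = rng.normal(size=(50, 4)) + 3.0  # stand-in time-space features, shifted
print(mmd2(f_tf, f_tf), mmd2(f_tf, f_ts))
fused = weighted_fusion(f_tf, f_ts)
```

A small MMD value indicates that the two feature distributions are close in the kernel-induced space, which is the property a loss of this kind would encourage before the features are fused.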

Submitted 3 April, 2023; originally announced April 2023.

Comments: 15 pages, 4 Figures

arXiv:2303.16407 [pdf, other]

LMDA-Net:A lightweight multi-dimensional attention network for general EEG-based brain-computer interface paradigms and interpretability

Authors: Zhengqing Miao, Xin Zhang, Meirong Zhao, Dong Ming

Abstract: EEG-based recognition of activities and states involves the use of prior neuroscience knowledge to generate quantitative EEG features, which may limit BCI performance. Although neural network-based methods can effectively extract features, they often encounter issues such as poor generalization across datasets, high predicting volatility, and low model interpretability. Hence, we propose a novel lightweight multi-dimensional attention network, called LMDA-Net. By incorporating two novel attention modules designed specifically for EEG signals, the channel attention module and the depth attention module, LMDA-Net can effectively integrate features from multiple dimensions, resulting in improved classification performance across various BCI tasks. LMDA-Net was evaluated on four high-impact public datasets, including motor imagery (MI) and P300-Speller paradigms, and was compared with other representative models. The experimental results demonstrate that LMDA-Net outperforms other representative methods in terms of classification accuracy and predicting volatility, achieving the highest accuracy in all datasets within 300 training epochs. Ablation experiments further confirm the effectiveness of the channel attention module and the depth attention module. To facilitate an in-depth understanding of the features extracted by LMDA-Net, we propose class-specific neural network feature interpretability algorithms that are suitable for event-related potentials (ERPs) and event-related desynchronization/synchronization (ERD/ERS).
By mapping the output of the specific layer of LMDA-Net to the time or spatial domain through class activation maps, the resulting feature visualizations can provide interpretable analysis and establish connections with EEG time-spatial analysis in neuroscience. In summary, LMDA-Net shows great potential as a general online decoding model for various EEG tasks.

Submitted 28 March, 2023; originally announced March 2023.

Comments: 20 pages, 7 Figures

arXiv:2302.12368 [pdf, other]

Power System Recovery Coordinated with (Non-)Black-Start Generators

Authors: Meng Zhao, Patrick R. Maloney, Xinda Ke, Juan Carlos Bedoya Ceballos, Xiaoyuan Fan, Marcelo A. Elizondo

Abstract: Power restoration is an urgent task after a black-out, and recovery efficiency is critical when quantifying system resilience. Multiple elements should be considered to restore the power system quickly and safely. This paper proposes a recovery model to solve a direct-current optimal power flow (DCOPF) based on mixed-integer linear programming (MILP). Since most of the generators cannot start independently, the interaction between black-start (BS) and non-black-start (NBS) generators must be modeled appropriately. The energization status of the NBS is coordinated with the recovery status of transmission lines, and both of them are modeled as binary variables. Also, an NBS unit is allowed to participate in the subsequent system dispatch only after it receives cranking power through connected transmission lines. The amount of cranking power is estimated as a fixed proportion of the maximum generation capacity. The proposed model is validated on several test systems, as well as a 1393-bus representation system of the Puerto Rican electric power grid. Test results demonstrate how the recovery of NBS units and damaged transmission lines can be optimized, resulting in an efficient and well-coordinated recovery procedure.

Submitted 23 February, 2023; originally announced February 2023.

Comments: 5 pages, 6 figures

arXiv:2302.02587 [pdf, other]

Joint Scattering Environment Sensing and Channel Estimation Based on Non-stationary Markov Random Field

Authors: Wenkang Xu, Yongbo Xiao, An Liu, Ming Lei, Minjian Zhao

Abstract: This paper considers an integrated sensing and communication system, where some radar targets also serve as communication scatterers. A location domain channel modeling method is proposed based on the position of targets and scatterers in the scattering environment, and the resulting radar and communication channels exhibit a two-dimensional (2-D) joint burst sparsity. We propose a joint scattering environment sensing and channel estimation scheme to enhance the target/scatterer localization and channel estimation performance simultaneously, where a spatially non-stationary Markov random field (MRF) model is proposed to capture the 2-D joint burst sparsity. An expectation maximization (EM) based method is designed to solve the joint estimation problem, where the E-step obtains the Bayesian estimation of the radar and communication channels and the M-step automatically learns the dynamic position grid and prior parameters in the MRF. However, the existing sparse Bayesian inference methods used in the E-step involve a high-complexity matrix inverse per iteration. Moreover, due to the complicated non-stationary MRF prior, the complexity of the M-step is exponentially large. To address these difficulties, we propose an inverse-free variational Bayesian inference algorithm for the E-step and a low-complexity method based on pseudo-likelihood approximation for the M-step. In the simulations, the proposed scheme achieves better performance than the state-of-the-art method while reducing the computational overhead significantly.

Submitted 18 July, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

Comments: 15 pages, 13 figures, submitted to IEEE Transactions on Wireless Communications

arXiv:2302.01619 [pdf, other]

Joint Scattering Environment Sensing and Channel Estimation for Integrated Sensing and Communication

Authors: Wenkang Xu, Yongbo Xiao, An Liu, Minjian Zhao

Abstract: This paper considers an integrated sensing and communication system, where some radar targets also serve as communication scatterers. A location domain channel modeling method is proposed based on the position of targets and scatterers in the scattering environment, and the resulting radar and communication channels exhibit a partially common sparsity. By exploiting this, we propose a joint scattering environment sensing and channel estimation scheme to enhance the target/scatterer localization and channel estimation performance simultaneously. Specifically, the base station (BS) first transmits downlink pilots to sense the targets in the scattering environment. Then the user transmits uplink pilots to estimate the communication channel. Finally, joint scattering environment sensing and channel estimation are performed at the BS based on the reflected downlink pilot signal and received uplink pilot signal. A message passing based algorithm is designed by combining the turbo approach and the expectation maximization method. The advantages of our proposed scheme are verified in the simulations.

Submitted 3 February, 2023; originally announced February 2023.

arXiv:2302.00953 [pdf]

Deep-Learning Tool for Early Identifying Non-Traumatic Intracranial Hemorrhage Etiology based on CT Scan

Authors: Meng Zhao, Yifan Hu, Ruixuan Jiang, Yuanli Zhao, Dong Zhang, Yan Zhang, Rong Wang, Yong Cao, Qian Zhang, Yonggang Ma, Jiaxi Li, Shaochen Yu, Wenjie Li, Ran Zhang, Yefeng Zheng, Shuo Wang, Jizong Zhao

Abstract: Background: To develop an artificial intelligence system that can accurately identify acute non-traumatic intracranial hemorrhage (ICH) etiology based on non-contrast CT (NCCT) scans and investigate whether clinicians can benefit from it in a diagnostic setting. Materials and Methods: The deep learning model was developed with 1868 eligible NCCT scans with non-traumatic ICH collected between January 2011 and April 2018. We tested the model on two independent datasets (TT200 and SD98) collected after April 2018. The model's diagnostic performance was compared with clinicians' performance. We further designed a simulated study to compare the clinicians' performance with and without the deep learning system augmentation. Results: The proposed deep learning system achieved area under the receiver operating curve of 0.986 (95% CI 0.967-1.000) on aneurysms, 0.952 (0.917-0.987) on hypertensive hemorrhage, 0.950 (0.860-1.000) on arteriovenous malformation (AVM), 0.749 (0.586-0.912) on Moyamoya disease (MMD), 0.837 (0.704-0.969) on cavernous malformation (CM), and 0.839 (0.722-0.959) on other causes in the TT200 dataset. Given a 90% specificity level, the sensitivities of our model were 97.1% and 90.9% for aneurysm and AVM diagnosis, respectively. The model also shows impressive generalizability on the independent dataset SD98. The clinicians achieve significant improvements in the sensitivity, specificity, and accuracy of diagnoses of certain hemorrhage etiologies with the proposed system augmentation.
Conclusions: The proposed deep learning algorithms can be an effective tool for early identification of hemorrhage etiologies based on NCCT scans. It may also provide more information for clinicians for triage and further imaging examination selection.

Submitted 2 February, 2023; originally announced February 2023.

arXiv:2301.13507 [pdf, ps, other]

An Analysis of Classification Approaches for Hit Song Prediction using Engineered Metadata Features with Lyrics and Audio Features

Authors: Mengyisong Zhao, Morgan Harvey, David Cameron, Frank Hopfgartner, Valerie J. Gillet

Abstract: Hit song prediction, one of the emerging fields in music information retrieval (MIR), remains a considerable challenge. Being able to understand what makes a given song a hit is clearly beneficial to the whole music industry. Previous approaches to hit song prediction have focused on using audio features of a record. This study aims to improve the prediction of the top 10 hits among Billboard Hot 100 songs using additional metadata, including song audio features provided by Spotify, song lyrics, and novel metadata-based features (title topic, popularity continuity and genre class). Five machine learning approaches are applied: k-nearest neighbours, Naive Bayes, Random Forest, Logistic Regression and Multilayer Perceptron. Our results show that Random Forest (RF) and Logistic Regression (LR) with all features (including novel features, song audio features and lyrics features) outperform other models, achieving 89.1% and 87.2% accuracy, and 0.91 and 0.93 AUC, respectively. Our findings also demonstrate the utility of our novel music metadata features, which contributed most to the models' discriminative performance.

Submitted 31 January, 2023; originally announced January 2023.

arXiv:2211.16666 [pdf, ps, other]

Secrecy Rate Maximization of RIS-assisted SWIPT Systems: A Two-Timescale Beamforming Design Approach

Authors: Ming-Min Zhao, Kaidi Xu, Yunlong Cai, Yong Niu, Lajos Hanzo

Abstract: Reconfigurable intelligent surfaces (RISs) achieve high passive beamforming gains for signal enhancement or interference nulling by dynamically adjusting their reflection coefficients. Their employment is particularly appealing for improving both the wireless security and the efficiency of radio frequency (RF)-based wireless power transfer. Motivated by this, we conceive and investigate a RIS-assisted secure simultaneous wireless information and power transfer (SWIPT) system designed for information and power transfer from a base station (BS) to an information user (IU) and to multiple energy users (EUs), respectively. Moreover, the EUs are also potential eavesdroppers that may overhear the communication between the BS and IU. We adopt two-timescale transmission for reducing the signal processing complexity as well as channel training overhead, and aim for maximizing the average worst-case secrecy rate achieved by the IU. This is achieved by jointly optimizing the short-term transmit beamforming vectors at the BS as well as the long-term phase shifts at the RIS, under the energy harvesting constraints considered at the EUs and the power constraint at the BS. The stochastic optimization problem formulated is non-convex with intricately coupled variables, and is non-smooth due to the existence of multiple EUs/eavesdroppers. No standard optimization approach is available for this challenging scenario. To tackle this challenge, we propose a smooth approximation aided stochastic successive convex approximation (SA-SSCA) algorithm.
Furthermore, a low-complexity heuristic algorithm is proposed for reducing the computational complexity without unduly eroding the performance. Simulation results show the efficiency of the RIS in securing SWIPT systems. The significant performance gains achieved by our proposed algorithms over the relevant benchmark schemes are also demonstrated.

Submitted 29 November, 2022; originally announced November 2022.

Comments: 16 pages, 12 figures, accepted for publication in IEEE Transactions on Wireless Communications

arXiv:2211.12082 [pdf, other]

Brain MRI-to-PET Synthesis using 3D Convolutional Attention Networks

Authors: Ramy Hussein, David Shin, Moss Zhao, Jia Guo, Guido Davidzon, Michael Moseley, Greg Zaharchuk

Abstract: Accurate quantification of cerebral blood flow (CBF) is essential for the diagnosis and assessment of a wide range of neurological diseases. Positron emission tomography (PET) with radiolabeled water (15O-water) is considered the gold-standard for the measurement of CBF in humans. PET imaging, however, is not widely available because of its prohibitive costs and use of short-lived radiopharmaceutical tracers that typically require onsite cyclotron production. Magnetic resonance imaging (MRI), in contrast, is more readily accessible and does not involve ionizing radiation. This study presents a convolutional encoder-decoder network with attention mechanisms to predict gold-standard 15O-water PET CBF from multi-sequence MRI scans, thereby eliminating the need for radioactive tracers. Inputs to the prediction model include several commonly used MRI sequences (T1-weighted, T2-FLAIR, and arterial spin labeling). The model was trained and validated using 5-fold cross-validation in a group of 126 subjects consisting of healthy controls and cerebrovascular disease patients, all of whom underwent simultaneous 15O-water PET/MRI. The results show that such a model can successfully synthesize high-quality PET CBF measurements (with an average SSIM of 0.924 and PSNR of 38.8 dB) and is more accurate compared to concurrent and previous PET synthesis methods. We also demonstrate the clinical significance of the proposed algorithm by evaluating the agreement for identifying the vascular territories with abnormally low CBF.
Such methods may enable more widespread and accurate CBF evaluation in larger cohorts who cannot undergo PET imaging due to radiation concerns, lack of access, or logistic challenges.

Submitted 22 November, 2022; originally announced November 2022.

Comments: 19 pages, 14 figures

arXiv:2210.16197 [pdf]

Dimensionality Reduced Antenna Array for Beamforming/steering

Authors: Shiyi Xia, Mingyang Zhao, Qian Ma, Xunnan Zhang, Ling Yang, Yazhi Pi, Hyunchul Chung, Ad Reniers, A. M. J. Koonen, Zizheng Cao

Abstract: Beamforming makes possible a focused communication method. It is extensively employed in many disciplines involving electromagnetic waves, including arrayed ultrasonic, optical, and high-speed wireless communication. Conventional beam steering often requires the addition of separate active amplitude phase control units after each radiating element. The high power consumption and complexity of large-scale phased arrays can be overcome by reducing the number of active controllers, pushing beamforming into satellite communications and deep space exploration. Here, we suggest a brand-new design for a phased array antenna with a dimension reduced cascaded angle offset (DRCAO-PAA). Furthermore, the suggested DRCAO-PAA was compressed by using the concept of singular value decomposition. To pave the way for practical application, the particle swarm optimization algorithm and deep neural network Transformer were adopted. Based on this theoretical framework, an experimental board was built to verify the theory. Finally, 16/8/4-array beam steering was demonstrated by using 4/3/2 active controllers, respectively.

Submitted 28 October, 2022; originally announced October 2022.

arXiv:2210.14725 [pdf, other]

Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition

Authors: Xulong Zhang, Jianzong Wang, Ning Cheng, Mengyuan Zhao, Zhiyong Zhang, Jing Xiao

Abstract: The recent emergence of the joint CTC-Attention model shows significant improvement in automatic speech recognition (ASR). The improvement largely lies in the modeling of linguistic information by the decoder. The decoder, jointly optimized with an acoustic encoder, renders the language model from ground-truth sequences in an auto-regressive manner during training. However, the training corpus of the decoder is limited to the speech transcriptions, which is far less than the corpus needed to train an acceptable language model. This leads to poor robustness of the decoder. To alleviate this problem, we propose a linguistic-enhanced transformer, which introduces refined CTC information to the decoder during the training process, so that the decoder can be more robust. Our experiments on the AISHELL-1 speech corpus show that the character error rate (CER) is relatively reduced by up to 7%. We also find that in the joint CTC-Attention ASR model, the decoder is more sensitive to linguistic information than acoustic information.

Submitted 25 October, 2022; originally announced October 2022.

Comments: Accepted by ECAISS2022, The Fourth International Workshop on Edge Computing and Artificial Intelligence based Sensor-Cloud System

arXiv:2210.08483 [pdf, ps, other]

Direct Computing on Control Capability for Linear Continuous-time Systems Based on Hurwitz Matrix

Authors: Mingwang Zhao

Abstract: In this paper, based on the controllable canonical form and the Hurwitz matrix of the Hurwitz stability criterion, an analytical volume computing method for the smooth controllability zonotope for linear continuous-time (LCT) systems, which does not require computing the eigenvalues of the systems, is presented. The computing method is then generalized to the volume computing of the controllability ellipsoid of LCT systems. Because the controllability zonotope and ellipsoid are directly related to control capability and their volumes are the main index describing the control capability, the new volume computing methods proposed in this paper can greatly help the computing, analysis and optimization of the control capability of LCT systems.

Submitted 16 October, 2022; originally announced October 2022.

Comments: 16 pages

arXiv:2210.08480 [pdf, ps, other]

Analytical Volume Analysis for the Finite-time Controllable Region of the Linear Discrete-time Systems

Authors: Mingwang Zhao

Abstract: In this paper, the works on the analytical volume analysis for the controllable regions of linear discrete-time (LDT) systems in papers \cite{zhaomw202001} and \cite{zhaomw202004} are discussed further, and a new theorem on the analytical computing of the finite-time controllability zonotope (controllable region) of LDT systems is proven. Then, three analytical factors describing the control capability of the systems are deconstructed successfully from the analytical volume expression of the controllable region. Finally, the theorem is generalized to three cases: the narrow controllable region, the matrix $A$ with $n$ negative eigenvalues, and linear continuous-time systems.

Submitted 16 October, 2022; originally announced October 2022.

Comments: 27 pages

arXiv:2210.06111 [pdf, ps, other]

THUEE system description for NIST 2020 SRE CTS challenge

Authors: Yu Zheng, Jinghan Peng, Miao Zhao, Yufeng Ma, Min Liu, Xinyue Ma, Tianyu Liang, Tianlong Kong, Liang He, Minqiang Xu

Abstract: This paper presents the system description of the THUEE team for the NIST 2020 Speaker Recognition Evaluation (SRE) conversational telephone speech (CTS) challenge. The subsystems including ResNet74, ResNet152, and RepVGG-B2 are developed as speaker embedding extractors in this evaluation. We used combined AM-Softmax and AAM-Softmax based loss functions, namely CM-Softmax. We adopted a two-staged training strategy to further improve system performance. We fused all individual systems as our final submission. Our approach leads to excellent performance and ranks 1st in the challenge.

Submitted 12 October, 2022; originally announced October 2022.

Comments: 3 pages, 1 table; System description of NIST 2020 SRE CTS challenge

arXiv:2208.12133 [pdf, other]

doi 10.1145/3536221.3558066

The ReprGesture entry to the GENEA Challenge 2022

Authors: Sicheng Yang, Zhiyong Wu, Minglei Li, Mengchen Zhao, Jiuxin Lin, Liyang Chen, Weihong Bao

Abstract: This paper describes the ReprGesture entry to the Generation and Evaluation of Non-verbal Behaviour for Embodied Agents (GENEA) challenge 2022. The GENEA challenge provides the processed datasets and performs crowdsourced evaluations to compare the performance of different gesture generation systems. In this paper, we explore an automatic gesture generation system based on multimodal representation learning. We use WavLM features for audio, FastText features for text and position and rotation matrix features for gesture. Each modality is projected to two distinct subspaces: modality-invariant and modality-specific. To learn inter-modality-invariant commonalities and capture the characters of modality-specific representations, gradient reversal layer based adversarial classifier and modality reconstruction decoders are used during training. The gesture decoder generates proper gestures using all representations and features related to the rhythm in the audio. Our code, pre-trained models and demo are available at https://github.com/YoungSeng/ReprGesture.

Submitted 25 August, 2022; originally announced August 2022.

Comments: 8 pages, 4 figures, ICMI 2022

arXiv:2208.03752 [pdf]

doi 10.1007/s12350-023-03226-2

Automatic reorientation by deep learning to generate short axis SPECT myocardial perfusion images

Authors: Fubao Zhu, Guojie Wang, Chen Zhao, Saurabh Malhotra, Min Zhao, Zhuo He, Jianzhou Shi, Zhixin Jiang, Weihua Zhou

Abstract: Single photon emission computed tomography (SPECT) myocardial perfusion images (MPI) can be displayed both in traditional short-axis (SA) cardiac planes and polar maps for interpretation and quantification. It is essential to reorient the reconstructed transaxial SPECT MPI into standard SA slices. This study aims to develop a deep-learning-based approach for automatic reorientation of MPI. Methods: A total of 254 patients were enrolled, including 228 stress SPECT MPIs and 248 rest SPECT MPIs. Five-fold cross-validation with 180 stress and 201 rest MPIs was used for training and internal validation; the remaining images were used for testing. The rigid transformation parameters (translation and rotation) from manual reorientation were annotated by an experienced operator and used as the ground truth. A convolutional neural network (CNN) was designed to predict the transformation parameters. Then, the derived transform was applied to the grid generator and sampler in spatial transformer network (STN) to generate the reoriented image. A loss function containing mean absolute errors for translation and mean square errors for rotation was employed. A three-stage optimization strategy was adopted for model optimization: 1) optimize the translation parameters while fixing the rotation parameters; 2) optimize rotation parameters while fixing the translation parameters; 3) optimize both translation and rotation parameters together.
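The loss described in this abstract (mean absolute error on translation plus mean square error on rotation) can be sketched as follows. This is an illustration only, not the authors' code: the six-element parameter layout (three translations followed by three rotation angles), the equal default weights `w_trans` and `w_rot`, and the function name are all assumptions.

```python
import numpy as np

def reorientation_loss(pred, target, w_trans=1.0, w_rot=1.0):
    """MAE on translation plus MSE on rotation.

    Assumed layout (hypothetical): the last axis holds
    [tx, ty, tz, rx, ry, rz] for each sample.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Mean absolute error over the three translation parameters.
    trans_err = np.mean(np.abs(pred[..., :3] - target[..., :3]))
    # Mean square error over the three rotation parameters.
    rot_err = np.mean((pred[..., 3:] - target[..., 3:]) ** 2)
    return w_trans * trans_err + w_rot * rot_err

loss = reorientation_loss([1.0, 2.0, 3.0, 0.1, 0.2, 0.3], np.zeros(6))
print(loss)
```

The abstract does not state how the two terms are weighted, so the unit weights here are placeholders that a real implementation would need to tune.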

Submitted 7 August, 2022; originally announced August 2022.

Comments: 27 pages, 7 figures

arXiv:2207.10427 [pdf, other]

A Two-stage Multiband WiFi Sensing Scheme via Stochastic Particle-Based Variational Bayesian Inference

Authors: Zhixiang Hu, An Liu, Yubo Wan, Tony Xiao Han, Minjian Zhao

Abstract: Multiband fusion enhances WiFi sensing by jointly utilizing signals from multiple non-contiguous frequency bands. However, in the multi-band WiFi sensing signal model, there are many local optimums in the associated likelihood function due to the existence of high frequency component and phase distortion factors, posing challenges for high-accuracy parameter estimation. To address this, we propose a two-stage scheme equipped with different signal models derived from the original model, where the first-stage coarse estimation is performed using a weighted root MUSIC algorithm to narrow down the search range for the subsequent stage, and the second-stage refined estimation utilizes a Bayesian approach to avoid convergence to bad suboptimal solutions. Specifically, we apply the block stochastic successive convex approximation (SSCA) approach to derive a novel stochastic particle-based variational Bayesian inference (SPVBI) algorithm in the refined stage. Unlike conventional particle-based VBI (PVBI) that optimizes only particle probability and incurs exponential per-iteration complexity with particle count, our more flexible SPVBI algorithm optimizes both the position and probability of each particle. Additionally, it utilizes block SSCA to significantly improve sampling efficiency by averaging over iterations, making it suitable for high-dimensional problems. Extensive simulations demonstrate the superiority of our proposed algorithm over various baseline methods.

Submitted 9 October, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

arXiv:2207.08123 [pdf, ps, other]

Latency Minimization for mmWave D2D Mobile Edge Computing Systems: Joint Task Allocation and Hybrid Beamforming Design

Authors: Yanzhen Liu, Yunlong Cai, An Liu, Minjian Zhao, Lajos Hanzo

Abstract: Mobile edge computing (MEC) and millimeter wave (mmWave) communications are capable of significantly reducing the network's delay and enhancing its capacity. In this paper, we investigate a mmWave and device-to-device (D2D) assisted MEC system, in which user A carries out some computational tasks and shares the results with user B with the aid of a base station (BS). We propose a novel two-timescale joint hybrid beamforming and task allocation algorithm to reduce the system latency whilst cutting down the required signaling overhead. Specifically, the high-dimensional analog beamforming matrices are updated in a frame-based manner based on the channel state information (CSI) samples, where each frame consists of a number of time slots, while the low-dimensional digital beamforming matrices and the offloading ratio are optimized more frequently, relying on the low-dimensional effective channel matrices in each time slot. A stochastic successive convex approximation (SSCA) based algorithm is developed to design the long-term analog beamforming matrices. As for the short-term variables, the digital beamforming matrices are optimized relying on the innovative penalty-concave convex procedure (penalty-CCCP) for handling the mmWave non-linear transmit power constraint, and the offloading ratio can be obtained via the derived closed-form solution. Simulation results verify the effectiveness of the proposed algorithm by comparison with the benchmarks.

Submitted 17 July, 2022; originally announced July 2022.

arXiv:2206.05508 [pdf, other]

doi 10.1109/MSP.2022.3208987

Integration of Physics-Based and Data-Driven Models for Hyperspectral Image Unmixing

Authors: Jie Chen, Min Zhao, Xiuheng Wang, Cédric Richard, Susanto Rahardja

Abstract: Spectral unmixing is one of the most important quantitative analysis tasks in hyperspectral data processing. Conventional physics-based models are characterized by clear interpretation. However, they may not be suitable for analyzing scenes with unknown complex physical characteristics. Data-driven methods have developed rapidly in recent years, in particular deep learning methods, because they possess superior capability in modeling complex and nonlinear systems. Simply transferring these methods as black boxes to conduct unmixing may lead to low physical interpretability and generalization ability. This article reviews hyperspectral unmixing works that integrate advantages of both physics-based models and data-driven methods by means of deep neural network structure design, prior design and loss design. Most of these methods derive from a common mathematical optimization framework, and combine good interpretability with high accuracy.

Submitted 27 August, 2022; v1 submitted 11 June, 2022; originally announced June 2022.

Comments: IEEE Signal Process. Mag., to be published. Manuscript submitted March 14, 2022; revised June 25, 2022 and July 27, 2022; accepted August 27, 2022

arXiv:2204.12115 [pdf, ps, other]

Fast Successive-Cancellation Decoding of Polar Codes with Sequence Nodes

Authors: Yang Lu, Ming-Min Zhao, Ming Lei, Min-Jian Zhao

Abstract: Due to the sequential nature of the successive-cancellation (SC) algorithm, the decoding of polar codes suffers from significant decoding latencies. Fast SC decoding is able to speed up the SC decoding process by implementing parallel decoders at the intermediate levels of the SC decoding tree for some special nodes with specific information and frozen bit patterns. To further improve the parallelism of SC decoding, this paper presents a new class of special nodes composed of a sequence of rate one or single-parity-check (SR1/SPC) nodes, which can be easily found, especially in high-rate polar codes, and is able to envelop a wide variety of existing special node types. Then, we analyse the parity constraints caused by the frozen bits in each descendant node, such that the decoding performance of the SR1/SPC node can be preserved once the parity constraints are satisfied. Finally, a generalized fast decoding algorithm is proposed to decode SR1/SPC nodes efficiently, where the corresponding parity constraints are taken into consideration. Simulation results show that the proposed decoding algorithm of the SR1/SPC node can achieve near-ML performance, and the overall decoding latency can be reduced by 43.8% as compared to the state-of-the-art fast SC decoder.

Submitted 18 November, 2022; v1 submitted 26 April, 2022; originally announced April 2022.

Comments: 30 pages, 6 figures, submitted for possible journal publication

Showing 1–50 of 126 results for author: Zhao, M