Search | arXiv e-print repository

arXiv:2407.14947 [pdf, other]

A Distributionally Robust Optimization Framework for Stochastic Assessment of Power System Flexibility in Economic Dispatch

Authors: Xinyi Zhao, Lei Fan, Fei Ding, Weijia Liu, Chaoyue Zhao

Abstract: Given the complexity of power systems, particularly the high-dimensional variability of net loads, accurately depicting the entire operational range of net loads poses a challenge. To address this, recent methodologies have sought to gauge the maximum range of net load uncertainty across all buses. In this paper, we consider the stochastic nature of the net load and introduce a distributionally robust optimization framework that assesses system flexibility stochastically, accommodating a minimal extent of system violations. We verify the proposed method by evaluating the flexibility of the real-time economic dispatch problem on four IEEE standard test systems. Compared to traditional deterministic flexibility evaluations, our approach consistently yields less conservative flexibility outcomes.

Submitted 20 July, 2024; originally announced July 2024.

arXiv:2406.11697 [pdf, other]

GridSweep Simulation: Measuring Subsynchronous Impedance Spectra of Distribution Feeder

Authors: Lingling Fan, Zhixin Miao, Jason MacDonald, Alex McEachern

Abstract: Peaks and troughs in the subsynchronous impedance spectrum of a distribution feeder may be a useful indication of oscillation risk, or more importantly lack of oscillation risk, if inverter-based resource (IBR) deployments are increased on that feeder. GridSweep is a new instrument for measuring the subsynchronous impedance spectra of distribution feeders. It combines an active probing device that modulates a 120-volt 1-kW load sinusoidally at a user-selected GPS-phase-locked frequency from 1.0 to 40.0 Hz with a recorder that takes ultra-high-precision continuous point-on-wave (CPOW) 120-volt synchrowaveforms at 4 kHz. This paper presents a computer simulation of GridSweep's probing and measurement capability. We construct an electromagnetic transient (EMT) simulation of a single-phase distribution feeder equipped with multiple inverter-based resources (IBRs). We include a model of the GridSweep probing device, then demonstrate the model's capability to measure the subsynchronous apparent impedance spectrum of the feeder. Peaks in that spectrum align with the system's dominant oscillation modes caused by IBRs.

Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: 10 pages, 18 figures
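
The abstract above describes measuring an apparent impedance spectrum by modulating a load at a chosen frequency. As a rough illustration only (not GridSweep's actual processing chain), the sketch below estimates the complex voltage and current amplitudes at the probing frequency with a single-bin DFT and takes their ratio. The function names, the ideal noise-free waveforms, and the assumption that the record spans a whole number of probing periods are all hypothetical; real measurements would also need windowing, averaging, and a sign convention for source versus load impedance.

```python
import cmath
import math


def phasor_at(samples, fs, f):
    """Single-bin DFT: complex amplitude of `samples` at frequency f (Hz).

    Assumes the record spans an integer number of periods of f, so no
    spectral leakage reaches this bin.
    """
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * f * k / fs) for k, x in enumerate(samples))
    return 2 * acc / n  # scaled so A*cos(2*pi*f*t + phi) maps to A*exp(j*phi)


def apparent_impedance(v, i, fs, f):
    """Hypothetical helper: ratio of voltage and current phasors at f."""
    return phasor_at(v, fs, f) / phasor_at(i, fs, f)


# Usage: one second of data at the 4 kHz rate quoted above holds exactly ten
# periods of a 10 Hz probe.
fs = 4000
v = [2 * math.cos(2 * math.pi * 10 * k / fs + 0.5) for k in range(fs)]
i = [math.cos(2 * math.pi * 10 * k / fs) for k in range(fs)]
z = apparent_impedance(v, i, fs, 10.0)
print(abs(z), cmath.phase(z))  # magnitude close to 2, phase close to 0.5 rad
```

Sweeping `f` over the 1.0 to 40.0 Hz range and repeating the ratio would trace out a spectrum; this sketch says nothing about how the instrument itself synchronizes or filters its records.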

arXiv:2312.17407 [pdf]

Comparing roughness maps generated by five roughness descriptors for LiDAR-derived digital elevation models

Authors: Lei Fan, Yang Zhao

Abstract: Terrain surface roughness, often described abstractly, poses challenges in quantitative characterisation with various descriptors found in the literature. This study compares five commonly used roughness descriptors, exploring correlations among their quantified terrain surface roughness maps across three terrains with distinct spatial variations. Additionally, the study investigates the impacts of spatial scales and interpolation methods on these correlations. Dense point cloud data obtained through Light Detection and Ranging (LiDAR) are used in this study. The findings highlight both global pattern similarities and local pattern distinctions in the derived roughness maps, emphasizing the significance of incorporating multiple descriptors in studies where local roughness values play a crucial role in subsequent analyses. Spatial scale was found to have a smaller impact on rougher terrain, while interpolation methods had minimal influence on roughness maps derived from different descriptors.

Submitted 13 March, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: 14 pages, 6 figures
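
The abstract does not name the five descriptors it compares. As an assumed example of what a single roughness descriptor computes, the sketch below uses one widely used choice, the standard deviation of elevation in a moving square window over a gridded DEM; the window `radius` plays the role of the spatial scale mentioned above. The function name and edge handling (windows clipped at the grid border) are illustrative choices, not the study's implementation.

```python
import math


def local_std_roughness(dem, radius=1):
    """Roughness as the standard deviation of elevations in a square window.

    dem: 2-D list of elevations (rows x cols). Windows are clipped at the
    grid edges, so border cells use fewer neighbours.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                dem[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            mean = sum(window) / len(window)
            # Population standard deviation of the window's elevations
            out[r][c] = math.sqrt(sum((z - mean) ** 2 for z in window) / len(window))
    return out


spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(local_std_roughness(spike)[1][1])  # sqrt(8): one 9 among eight zeros
```

Increasing `radius` smooths the roughness map, which is one way the scale dependence described in the abstract can arise.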

arXiv:2312.14448 [pdf, other]

doi 10.1109/TNSE.2024.3435444

Quantum-Assisted Joint Caching and Power Allocation for Integrated Satellite-Terrestrial Networks

Authors: Yu Zhang, Yanmin Gong, Lei Fan, Yu Wang, Zhu Han, Yuanxiong Guo

Abstract: Low earth orbit (LEO) satellite networks can complement terrestrial networks to achieve global wireless coverage and improve delay-sensitive Internet services. This paper proposes an integrated satellite-terrestrial network (ISTN) architecture to provide ground users with seamless and reliable content delivery services. For optimal service provisioning in this architecture, we formulate an optimization model to maximize the network throughput by jointly optimizing content delivery policy, cache placement, and transmission power allocation. The resulting optimization model is a large-scale mixed-integer nonlinear program (MINLP) that is intractable for classical computer solvers. Inspired by quantum computing techniques, we propose a hybrid quantum-classical generalized Benders' decomposition (HQCGBD) algorithm to address this challenge. Specifically, we first exploit the generalized Benders' decomposition (GBD) to decompose the problem into a master problem and a subproblem and then leverage a state-of-the-art quantum annealer to solve the challenging master problem.

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2310.04645 [pdf, other]

Do self-supervised speech and language models extract similar representations as human brain?

Authors: Peili Chen, Linyang He, Li Fu, Lu Fan, Edward F. Chang, Yuanning Li

Abstract: Speech and language models trained through self-supervised learning (SSL) demonstrate strong alignment with brain activity during speech and language perception. However, given their distinct training modalities, it remains unclear whether they correlate with the same neural aspects. We directly address this question by evaluating the brain prediction performance of two representative SSL models, Wav2Vec2.0 and GPT-2, designed for speech and language tasks. Our findings reveal that both models accurately predict speech responses in the auditory cortex, with a significant correlation between their brain predictions. Notably, shared speech contextual information between Wav2Vec2.0 and GPT-2 accounts for the majority of explained variance in brain activity, surpassing static semantic and lower-level acoustic-phonetic information. These results underscore the convergence of speech contextual representations in SSL models and their alignment with the neural network underlying speech perception, offering valuable insights into both SSL models and the neural basis of speech and language processing.

Submitted 31 January, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: To appear in 2024 IEEE International Conference on Acoustics, Speech and Signal Processing

arXiv:2310.04644 [pdf, other]

Neural2Speech: A Transfer Learning Framework for Neural-Driven Speech Reconstruction

Authors: Jiawei Li, Chunxu Guo, Li Fu, Lu Fan, Edward F. Chang, Yuanning Li

Abstract: Reconstructing natural speech from neural activity is vital for enabling direct communication via brain-computer interfaces. Previous efforts have explored the conversion of neural recordings into speech using complex deep neural network (DNN) models trained on extensive neural recording data, which is resource-intensive under regular clinical constraints. However, achieving satisfactory performance in reconstructing speech from limited-scale neural recordings has been challenging, mainly due to the complexity of speech representations and the neural data constraints. To overcome these challenges, we propose a novel transfer learning framework for neural-driven speech reconstruction, called Neural2Speech, which consists of two distinct training phases. First, a speech autoencoder is pre-trained on readily available speech corpora to decode speech waveforms from the encoded speech representations. Second, a lightweight adaptor is trained on the small-scale neural recordings to align the neural activity and the speech representation for decoding. Remarkably, our proposed Neural2Speech demonstrates the feasibility of neural-driven speech reconstruction even with only 20 minutes of intracranial data, and significantly outperforms existing baseline methods in terms of speech fidelity and intelligibility.

Submitted 31 January, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: To appear in 2024 IEEE International Conference on Acoustics, Speech and Signal Processing

arXiv:2309.03351 [pdf, other]

Using Neural Networks for Fast SAR Roughness Estimation of High Resolution Images

Authors: Li Fan, Jeova Farias Sales Rocha Neto

Abstract: The analysis of Synthetic Aperture Radar (SAR) imagery is an important step in remote sensing applications, and it is a challenging problem due to its inherent speckle noise. One typical solution is to model the data using the $G_I^0$ distribution and extract its roughness information, which in turn can be used in subsequent imaging tasks, such as segmentation, classification and interpretation. This leads to the need for quick and reliable estimation of the roughness parameter from SAR data, especially with high resolution images. Unfortunately, traditional parameter estimation procedures are slow and prone to estimation failures. In this work, we propose a neural network-based estimation framework that first learns how to predict underlying parameters of $G_I^0$ samples and then can be used to estimate the roughness of unseen data. We show that this approach leads to an estimator that is quicker, yields less estimation error and is less prone to failures than the traditional estimation procedures for this problem, even when we use a simple network. More importantly, we show that this same methodology can be generalized to handle image inputs and, even if trained on purely synthetic data for a few seconds, is able to perform real-time pixel-wise roughness estimation for high resolution real SAR imagery.

Submitted 6 September, 2023; originally announced September 2023.
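
The abstract notes that the estimator can be trained on purely synthetic $G_I^0$ data. One common way to generate such samples is the multiplicative model Z = X * Y, with unit-mean gamma speckle X ~ Gamma(L, 1/L) and reciprocal-gamma backscatter Y = gamma / G, G ~ Gamma(-alpha, 1), where alpha < 0 is the roughness parameter. The sketch below assumes this textbook parametrization; the abstract does not state the exact parametrization or sampler the authors used, and the function name is hypothetical.

```python
import random


def sample_gi0(alpha, gamma, looks, n, rng=None):
    """Draw n intensity samples from a G_I^0(alpha, gamma, L) model.

    Multiplicative model Z = X * Y (assumed parametrization):
      X ~ Gamma(shape=L, scale=1/L)            unit-mean speckle
      Y = gamma / G, G ~ Gamma(shape=-alpha)    reciprocal-gamma backscatter
    alpha close to 0 means a rough (heterogeneous) area; strongly negative
    alpha means a homogeneous one.
    """
    if alpha >= 0 or gamma <= 0 or looks <= 0:
        raise ValueError("need alpha < 0, gamma > 0, looks > 0")
    rng = rng or random.Random()
    samples = []
    for _ in range(n):
        speckle = rng.gammavariate(looks, 1.0 / looks)
        backscatter = gamma / rng.gammavariate(-alpha, 1.0)
        samples.append(speckle * backscatter)
    return samples


# With alpha = -5 and gamma = 4, E[Y] = gamma / (-alpha - 1) = 1, so E[Z] = 1.
z = sample_gi0(-5.0, 4.0, looks=4, n=100_000, rng=random.Random(0))
print(sum(z) / len(z))  # should be close to 1
```

A training set for a parameter-regression network could pair windows of such samples with the alpha used to draw them; how the paper builds its training pairs is not described in the abstract.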

arXiv:2309.02106 [pdf, other]

Leveraging Label Information for Multimodal Emotion Recognition

Authors: Peiying Wang, Sunlu Zeng, Junqing Chen, Lu Fan, Meng Chen, Youzheng Wu, Xiaodong He

Abstract: Multimodal emotion recognition (MER) aims to detect the emotional status of a given expression by combining the speech and text information. Intuitively, label information should be capable of helping the model locate the salient tokens/frames relevant to the specific emotion, which finally facilitates the MER task. Inspired by this, we propose a novel approach for MER by leveraging label information. Specifically, we first obtain the representative label embeddings for both text and speech modalities, then learn the label-enhanced text/speech representations for each utterance via label-token and label-frame interactions. Finally, we devise a novel label-guided attentive fusion module to fuse the label-aware text and speech representations for emotion classification. Extensive experiments were conducted on the public IEMOCAP dataset, and experimental results demonstrate that our proposed approach outperforms existing baselines and achieves new state-of-the-art performance.

Submitted 5 September, 2023; originally announced September 2023.

Comments: Accepted by Interspeech 2023

arXiv:2307.00327 [pdf]

SDRCNN: A single-scale dense residual connected convolutional neural network for pansharpening

Authors: Yuan Fang, Yuanzhi Cai, Lei Fan

Abstract: Pansharpening is a process of fusing a high spatial resolution panchromatic image and a low spatial resolution multispectral image to create a high-resolution multispectral image. A novel single-branch, single-scale lightweight convolutional neural network, named SDRCNN, is developed in this study. By using a novel dense residual connected structure and convolution block, SDRCNN achieved a better trade-off between accuracy and efficiency. The performance of SDRCNN was tested using four datasets from the WorldView-3, WorldView-2 and QuickBird satellites. The compared methods include eight traditional methods (i.e., GS, GSA, PRACS, BDSD, SFIM, GLP-CBD, CDIF and LRTCFPan) and five lightweight deep learning methods (i.e., PNN, PanNet, BayesianNet, DMDNet and FusionNet). Based on a visual inspection of the pansharpened images created and the associated absolute residual maps, SDRCNN exhibited the least spatial detail blurring and spectral distortion amongst all the methods considered. The values of the quantitative evaluation metrics were closest to their ideal values when SDRCNN was used. The processing time of SDRCNN was also the shortest among all methods tested. Finally, the effectiveness of each component in the SDRCNN was demonstrated in ablation experiments. All of these results confirmed the superiority of SDRCNN.

Submitted 1 July, 2023; originally announced July 2023.

Comments: This paper has been accepted for publication in the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
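
For context on the traditional baselines listed above, SFIM (Smoothing Filter-based Intensity Modulation) is simple enough to sketch: each upsampled multispectral band is multiplied by the ratio of the panchromatic image to a low-pass version of it, which injects spatial detail while keeping the band's local radiometry. This is a baseline illustration, not SDRCNN. The box filter, its `radius`, the `eps` guard, and the assumption that the band is already upsampled to the PAN grid are all illustrative choices; in practice the filter size is usually tied to the resolution ratio between the two images.

```python
def box_blur(img, radius=1):
    """Mean filter with edge-clipped square windows (img: 2-D list)."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [
                img[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = sum(vals) / len(vals)
    return out


def sfim(ms_band_up, pan, radius=1, eps=1e-12):
    """SFIM fusion of one multispectral band already upsampled to PAN size.

    fused = MS_up * PAN / smooth(PAN); eps avoids division by zero.
    """
    pan_low = box_blur(pan, radius)
    return [
        [m * p / (pl + eps) for m, p, pl in zip(m_row, p_row, l_row)]
        for m_row, p_row, l_row in zip(ms_band_up, pan, pan_low)
    ]


pan = [[1.0, 1.0, 1.0], [1.0, 10.0, 1.0], [1.0, 1.0, 1.0]]
ms = [[1.0] * 3 for _ in range(3)]
print(sfim(ms, pan)[1][1])  # 10 / mean(3x3 window) = 10 / 2 = 5
```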

arXiv:2306.02541 [pdf, other]

OTF: Optimal Transport based Fusion of Supervised and Self-Supervised Learning Models for Automatic Speech Recognition

Authors: Li Fu, Siqi Li, Qingtao Li, Fangzhu Li, Liping Deng, Lu Fan, Meng Chen, Youzheng Wu, Xiaodong He

Abstract: Self-Supervised Learning (SSL) Automatic Speech Recognition (ASR) models have shown great promise over Supervised Learning (SL) ones in low-resource settings. However, the advantages of SSL are gradually weakened when the amount of labeled data increases in many industrial applications. To further improve the ASR performance when abundant labels are available, we first explore the potential of combining SL and SSL ASR models via analyzing their complementarity in recognition accuracy and optimization property. Then, we propose a novel Optimal Transport based Fusion (OTF) method for SL and SSL models without incurring extra computation cost in inference. Specifically, optimal transport is adopted to softly align the layer-wise weights to unify the two different networks into a single one. Experimental results on the public 1k-hour English LibriSpeech dataset and our in-house 2.6k-hour Chinese dataset show that OTF largely outperforms the individual models with lower error rates.

Submitted 4 June, 2023; originally announced June 2023.

Comments: Accepted by Interspeech 2023
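
The abstract says optimal transport is used to softly align layer-wise weights before merging two networks. The abstract gives no further details, so the sketch below shows one generic reading under stated assumptions: an entropy-regularized (Sinkhorn) transport plan between the neurons of two layers with uniform marginals, a squared-distance cost between weight rows, a barycentric projection of the second layer onto the first, and a plain average. The names `sinkhorn` and `fuse_layers`, the regularization `reg`, and the averaging step are hypothetical and not OTF's exact procedure.

```python
import math


def sinkhorn(cost, reg=0.1, iters=200):
    """Entropic OT plan between two uniform distributions.

    cost: n x m matrix (list of lists). Returns an n x m plan whose columns
    sum to 1/m and rows to 1/n (approximately). Costs should be modest
    relative to reg, otherwise exp(-cost/reg) can underflow to zero.
    """
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    a, b = 1.0 / n, 1.0 / m
    for _ in range(iters):
        u = [a / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def fuse_layers(w_a, w_b, reg=0.1):
    """Soft-align the neurons (rows) of w_b to those of w_a, then average."""
    cost = [[sum((x - y) ** 2 for x, y in zip(ra, rb)) for rb in w_b] for ra in w_a]
    plan = sinkhorn(cost, reg)
    n = len(w_a)
    # Barycentric projection: each row of plan sums to 1/n, so rescale by n.
    aligned_b = [
        [n * sum(plan[i][k] * w_b[k][d] for k in range(len(w_b))) for d in range(len(w_b[0]))]
        for i in range(n)
    ]
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(w_a, aligned_b)]


# Two layers that differ only by a neuron permutation fuse back to the first.
print(fuse_layers([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]))
```

Naively averaging the permuted layers would give [[0.5, 0.5], [0.5, 0.5]]; the transport plan undoes the permutation first, which is the motivation for alignment before fusion.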

arXiv:2305.03250 [pdf, other]

Experimentally Realizing Convolution Processing in the Photonic Synthetic Frequency Dimension

Authors: Lingling Fan, Kai Wang, Heming Wang, Avik Dutt, Shanhui Fan

Abstract: Convolution is an essential operation in signal and image processing and consumes most of the computing power in convolutional neural networks. Photonic convolution has the promise of addressing computational bottlenecks and outperforming electronic implementations. Performing photonic convolution in the synthetic frequency dimension, which harnesses the dynamics of light in the spectral degrees of freedom for photons, can lead to highly compact devices. Here we experimentally realize convolution operations in the synthetic frequency dimension. Using a modulated ring resonator, we synthesize arbitrary convolution kernels using a pre-determined modulation waveform with high accuracy. We demonstrate the convolution computation between input frequency combs and synthesized kernels. We also introduce the idea of an additive offset to broaden the kinds of kernels that can be implemented experimentally when the modulation strength is limited. Our work demonstrates the use of the synthetic frequency dimension to efficiently encode data and implement computation tasks, leading to a compact and scalable photonic computation architecture.

Submitted 11 August, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

Comments: Science Advances, in press
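
The operation the abstract describes, convolving input frequency-comb amplitudes with a kernel, is ordinary discrete convolution, y[n] = sum_k x[k] h[n - k]. The sketch below is only the digital reference computation one might compare a photonic output against, with each list index standing for one comb line; it does not model the ring resonator, the modulation waveform, or the additive-offset technique.

```python
def convolve(x, h):
    """Full discrete convolution y[n] = sum_k x[k] * h[n - k].

    x: input amplitudes (e.g. one value per comb line), h: kernel.
    The result has len(x) + len(h) - 1 entries.
    """
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


print(convolve([1, 2, 3], [0, 1, 0.5]))  # [0, 1, 2.5, 4.0, 1.5]
```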

arXiv:2301.04811 [pdf]

Deformation measurement of a soil mixing retaining wall using terrestrial laser scanning

Authors: Yang Zhao, Lei Fan, Hyungjoon Seo

Abstract: Retaining walls are often built to prevent excessive lateral movements of the ground surrounding an excavation site. During an excavation, failure of retaining walls could cause catastrophic accidents and hence their lateral deformations are monitored regularly. Laser scanning can rapidly acquire the spatial data of a relatively large area at fine spatial resolutions, which is ideal for monitoring retaining walls' deformations. This paper attempts to apply laser scanning to measurements of the lateral deformations of a soil mixing retaining wall at an ongoing excavation site. Reference measurements by total station and inclinometer were also conducted to verify those from the laser scanning. The deformations derived using laser scanning data were consistent with the reference measurements at the top part of the retaining wall (i.e., mainly the ring beam of the wall). This research also shows that the multi-scale-model-to-model method was the most accurate deformation estimation method on the research data.

Submitted 11 January, 2023; originally announced January 2023.

Comments: 22 pages

Journal ref: Lasers in Engineering: Volume 54, Number 1-3 (2023)

arXiv:2210.14515 [pdf, other]

UFO2: A unified pre-training framework for online and offline speech recognition

Authors: Li Fu, Siqi Li, Qingtao Li, Liping Deng, Fangzhu Li, Lu Fan, Meng Chen, Xiaodong He

Abstract: In this paper, we propose a Unified pre-training Framework for Online and Offline (UFO2) Automatic Speech Recognition (ASR), which 1) simplifies the two separate training workflows for online and offline modes into one process, and 2) improves the Word Error Rate (WER) performance with limited utterance annotation. Specifically, we extend the conventional offline-mode Self-Supervised Learning (SSL)-based ASR approach to a unified manner, where the model training is conditioned on both the full-context and dynamic-chunked inputs. To enhance the pre-trained representation model, a stop-gradient operation is applied to decouple the online-mode objectives from the quantizer. Moreover, in both the pre-training and the downstream fine-tuning stages, joint losses are proposed to train the unified model with full-weight sharing for the two modes. Experimental results on the LibriSpeech dataset show that UFO2 outperforms the SSL-based baseline method by 29.7% and 18.2% relative WER reduction in offline and online modes, respectively.

Submitted 3 April, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

Comments: Accepted by ICASSP 2023
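
"Full-context and dynamic-chunked inputs" are commonly realized with a self-attention mask. The sketch below shows one widely used chunk-based mask, offered as an assumption rather than UFO2's documented implementation: frames are grouped into consecutive chunks, each frame may attend to its own chunk and all earlier chunks (online mode), and a chunk size at least as large as the utterance recovers full context (offline mode). In dynamic chunking the chunk size would typically be re-sampled per batch; the function name is hypothetical.

```python
def chunk_attention_mask(num_frames, chunk_size):
    """mask[i][j] is True if frame i may attend to frame j.

    A frame sees every frame in its own chunk and in all earlier chunks.
    chunk_size >= num_frames reduces to full-context (offline) attention.
    """
    return [
        [(j // chunk_size) <= (i // chunk_size) for j in range(num_frames)]
        for i in range(num_frames)
    ]


for row in chunk_attention_mask(4, 2):
    print(row)
# [True, True, False, False]
# [True, True, False, False]
# [True, True, True, True]
# [True, True, True, True]
```

Because both modes share one mask-parameterized model, the same weights can serve streaming and non-streaming decoding, which matches the full-weight sharing described in the abstract.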

arXiv:2208.09635 [pdf, other]

Mobile Robot Navigation in Complex Polygonal Workspaces Using Conformal Navigation Transformations

Authors: Li Fan, Jianchang Liu

Abstract: This work proposes a novel transformation termed the conformal navigation transformation to achieve collision-free navigation of a robot in a workspace populated with arbitrary polygonal obstacles. The properties of the conformal navigation transformation in the polygonal workspace are investigated in this work, as well as its capability to provide a solution to the navigation problem. The definition of the navigation function is generalized to accommodate non-smooth obstacle boundaries. Based on the proposed transformation and the generalized navigation function, a provably correct feedback controller is derived for the automatic guidance and motion control of the kinematic mobile robot. Moreover, an iterative method is proposed to construct the conformal navigation transformation in a multi-connected polygonal workspace, which transforms the multi-connected problem into multiple single-connected problems to achieve fast convergence. In addition to the analytic guarantees, the simulation study verifies the effectiveness of the proposed methodology in a workspace with non-trivial polygonal obstacles.

Submitted 20 August, 2022; originally announced August 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2208.06876

arXiv:2208.06876 [pdf, other]

Conformal Navigation Transformations with Application to Robot Navigation in Complex Workspaces

Authors: Li Fan, Jianchang Liu, Wenle Zhang, Peng Xu

Abstract: Navigation functions provide both path and motion planning, which can be used to ensure obstacle avoidance and convergence in the sphere world. When dealing with complex and realistic scenarios, constructing a transformation to the sphere world is essential and, at the same time, challenging. This work proposes a novel transformation termed the conformal navigation transformation to achieve collision-free navigation of a robot in a workspace populated with obstacles of arbitrary shapes. The properties of the conformal navigation transformation, including uniqueness, invariance of navigation properties, and no angular deformation, are investigated, which contribute to the solution of the robot navigation problem in complex environments. Based on navigation functions and the proposed transformation, feedback controllers are derived for the automatic guidance and motion control of kinematic and dynamic mobile robots. Moreover, an iterative method is proposed to construct the conformal navigation transformation in a multiply-connected workspace, which transforms the multiply-connected problem into multiple simply-connected problems to achieve fast convergence. In addition to the analytic guarantees, simulation studies verify the effectiveness of the proposed methodology in workspaces with non-trivial obstacles.

Submitted 2 October, 2022; v1 submitted 14 August, 2022; originally announced August 2022.
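
Both abstracts above build on navigation functions in the sphere world, the setting the conformal transformation maps into. For orientation, the sketch below evaluates the classical Rimon-Koditschek form phi = gamma / (gamma^k + beta)^(1/k) on a 2-D disk workspace with disk obstacles, where gamma is the squared distance to the goal and beta is the product of obstacle functions. It is background only, not the conformal navigation transformation itself; the function name, the tuning exponent `k`, and the clipping to 1 outside free space are illustrative assumptions, and `k` must be large enough in practice to rule out spurious local minima.

```python
def navigation_function(q, goal, world_center, world_radius, obstacles, k=4):
    """Rimon-Koditschek-style navigation function on a 2-D sphere world.

    obstacles: list of (center, radius) disks inside the workspace disk.
    Returns 0 at the goal, values in (0, 1) in free space, 1 on boundaries.
    """
    def sq_dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    gamma = sq_dist(q, goal)  # attractive term
    beta = world_radius ** 2 - sq_dist(q, world_center)  # outer boundary
    for center, radius in obstacles:
        beta *= sq_dist(q, center) - radius ** 2  # one factor per obstacle
    if beta <= 0:
        return 1.0  # on a boundary or outside free space: clip to 1
    return gamma / (gamma ** k + beta) ** (1.0 / k)


obs = [((-0.8, 0.0), 0.3)]
print(navigation_function((0.5, 0.0), (0.5, 0.0), (0.0, 0.0), 2.0, obs))  # 0.0 at goal
print(navigation_function((1.0, 1.0), (0.5, 0.0), (0.0, 0.0), 2.0, obs))  # between 0 and 1
```

A kinematic robot following the negative gradient of phi descends toward the goal while phi rises to 1 at every boundary, which is the property the conformal transformation is meant to carry over to more complex workspaces.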

arXiv:2206.03393 [pdf, other]

Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition

Authors: Guangke Chen, Zhe Zhao, Fu Song, Sen Chen, Lingling Fan, Feng Wang, Jiashui Wang

Abstract: Speaker recognition systems (SRSs) have recently been shown to be vulnerable to adversarial attacks, raising significant security concerns. In this work, we systematically investigate transformation-based and adversarial-training-based defenses for securing SRSs. Based on the characteristics of SRSs, we present 22 diverse transformations and thoroughly evaluate them using 7 recent promising adversarial attacks (4 white-box and 3 black-box) on speaker recognition. With careful regard for best practices in defense evaluations, we analyze the strength of transformations to withstand adaptive attacks. We also evaluate and understand their effectiveness against adaptive attacks when combined with adversarial training. Our study provides many useful insights and findings, many of which are new or inconsistent with the conclusions in the image and speech recognition domains; e.g., variable and constant bit rate speech compressions perform differently, and some non-differentiable transformations remain effective against current promising evasion techniques that often work well in the image domain. We demonstrate that the proposed novel feature-level transformation combined with adversarial training is considerably more effective than adversarial training alone in a complete white-box setting, e.g., increasing the accuracy by 13.62% and the attack cost by two orders of magnitude, while other transformations do not necessarily improve the overall defense capability. This work sheds further light on the research directions in this field.
We also release our evaluation platform SPEAKERGUARD to foster further research.

Submitted 7 June, 2022; originally announced June 2022.

arXiv:2206.03351 [pdf, other]

AS2T: Arbitrary Source-To-Target Adversarial Attack on Speaker Recognition Systems

Authors: Guangke Chen, Zhe Zhao, Fu Song, Sen Chen, Lingling Fan, Yang Liu

Abstract: Recent work has illuminated the vulnerability of speaker recognition systems (SRSs) to adversarial attacks, raising significant security concerns in deploying SRSs. However, these works considered only a few settings (e.g., some combinations of source and target speakers), leaving many interesting and important settings in real-world attack scenarios unexplored. In this work, we present AS2T, the first attack in this domain that covers all the settings, thus allowing the adversary to craft adversarial voices using arbitrary source and target speakers for any of three main recognition tasks. Since none of the existing loss functions can be applied to all the settings, we explore many candidate loss functions for each setting, including existing and newly designed ones. We thoroughly evaluate their efficacy and find that some existing loss functions are suboptimal. Then, to improve the robustness of AS2T for practical over-the-air attacks, we study the possible distortions that occur in over-the-air transmission, utilize different transformation functions with different parameters to model those distortions, and incorporate them into the generation of adversarial voices. Our simulated over-the-air evaluation validates the effectiveness of our solution in producing robust adversarial voices that remain effective across various hardware devices and acoustic environments with different reverberation, ambient noises, and noise levels.
Finally, we leverage AS2T to perform thus far the largest-scale evaluation to understand transferability among 14 diverse SRSs. The transferability analysis provides many interesting and useful insights which challenge several findings and conclusion drawn in previous works in the image domain. Our study also sheds light on future directions of adversarial attacks in the speaker recognition domain. △ Less

Submitted 7 June, 2022; originally announced June 2022.

arXiv:2203.09954 [pdf, other]

Learning to Optimize Resource Assignment for Task Offloading in Mobile Edge Computing

Authors: Yurong Qian, Jindan Xu, Shuhan Zhu, Wei Xu, Lisheng Fan, George K. Karagiannidis

Abstract: In this paper, we consider a multiuser mobile edge computing (MEC) system, where a mixed-integer offloading strategy is used to assist the resource assignment for task offloading. Although the conventional branch and bound (BnB) approach can be applied to solve this problem, its heavy computational burden limits its application. To address this issue, we propose an intelligent BnB (IBnB) approach that applies deep learning (DL) to learn the pruning strategy of the BnB approach. With this learning scheme, the structure of the BnB approach ensures near-optimal performance, while the DL-based pruning strategy significantly reduces the complexity. Numerical results verify that the proposed IBnB approach achieves optimal performance with complexity reduced by over 80%.

Submitted 15 March, 2022; originally announced March 2022.
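The IBnB idea is to keep the branch-and-bound skeleton but let a learned rule decide which nodes to prune. A minimal sketch of where such a rule plugs in, assuming a 0/1 knapsack as a stand-in for the paper's mixed-integer offloading problem and an LP-relaxation bound for exact pruning; `prune_policy` is a hypothetical hook where a trained classifier would go, not the authors' network.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical hook for a learned pruning rule: (depth, value, bound, incumbent) -> prune?
PrunePolicy = Callable[[int, float, float, float], bool]


def lp_bound(depth: int, value: float, weight: float,
             vals: List[float], wts: List[float], capacity: float) -> float:
    """Upper bound from the LP relaxation: fill the remaining capacity greedily
    by value density, taking a fraction of the first item that does not fit."""
    bound, room = value, capacity - weight
    for v, w in zip(vals[depth:], wts[depth:]):
        if w <= room:
            bound += v
            room -= w
        else:
            bound += v * room / w
            break
    return bound


def branch_and_bound(values: List[float], weights: List[float], capacity: float,
                     prune_policy: Optional[PrunePolicy] = None) -> Tuple[float, int]:
    """Depth-first BnB for a 0/1 knapsack (positive weights); returns (best value, nodes visited)."""
    # Sorting by value density makes the greedy LP bound valid.
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    vals = [values[i] for i in order]
    wts = [weights[i] for i in order]
    best, visited = 0.0, 0
    stack = [(0, 0.0, 0.0)]  # (next item index, value so far, weight so far)
    while stack:
        depth, value, weight = stack.pop()
        visited += 1
        if weight > capacity:
            continue  # infeasible branch
        best = max(best, value)  # leaving the remaining items out is always feasible
        if depth == len(vals):
            continue
        bound = lp_bound(depth, value, weight, vals, wts, capacity)
        if bound <= best:
            continue  # exact pruning: this subtree cannot beat the incumbent
        if prune_policy is not None and prune_policy(depth, value, bound, best):
            continue  # heuristic pruning: saves work but may lose optimality
        stack.append((depth + 1, value, weight))  # branch: skip item `depth`
        stack.append((depth + 1, value + vals[depth], weight + wts[depth]))  # take it
    return best, visited
```

With no policy, `branch_and_bound([60, 100, 120], [10, 20, 30], 50)` finds the optimum 220. An overly aggressive policy such as `lambda *args: True` stops at the root and returns 0, which is why a learned policy has to be trained to prune only nodes that are unlikely to contain the optimum.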

arXiv:2111.03753 [pdf, other]

CloudRCA: A Root Cause Analysis Framework for Cloud Computing Platforms

Authors: Yingying Zhang, Zhengxiong Guan, Huajie Qian, Leili Xu, Hengbo Liu, Qingsong Wen, Liang Sun, Junwei Jiang, Lunting Fan, Min Ke

Abstract: As Alibaba's business expands across the world and into various industries, higher standards are imposed on the service quality and reliability of the big data cloud computing platforms that constitute the infrastructure of Alibaba Cloud. However, root cause analysis in these platforms is non-trivial due to their complicated system architecture. In this paper, we propose a root cause analysis framework called CloudRCA, which makes use of heterogeneous multi-source data, including Key Performance Indicators (KPIs), logs, and topology, and extracts important features via state-of-the-art anomaly detection and log analysis techniques. The engineered features are then used in a Knowledge-informed Hierarchical Bayesian Network (KHBN) model to infer root causes with high accuracy and efficiency. An ablation study and comprehensive experimental comparisons demonstrate that, compared to existing frameworks, CloudRCA 1) consistently outperforms existing approaches in F1-score across different cloud systems; 2) can handle novel types of root causes thanks to the hierarchical structure of KHBN; 3) performs more robustly with respect to algorithmic configurations; and 4) scales more favorably in the data and feature sizes. Experiments also show that a cross-platform transfer learning mechanism can be adopted to further improve the accuracy by more than 10%. CloudRCA has been integrated into the diagnosis system of Alibaba Cloud and employed in three typical cloud computing platforms: MaxCompute, Realtime Compute, and Hologres.
It has saved Site Reliability Engineers (SREs) more than 20% of the time spent resolving failures over the past twelve months and has improved service reliability significantly.

Submitted 5 November, 2021; originally announced November 2021.

Comments: Accepted by CIKM 2021; 10 pages, 3 figures, 12 tables

Journal ref: 30th ACM International Conference on Information and Knowledge Management (CIKM 2021)

arXiv:2111.02506 [pdf, other]

Real-Time Simulation of Level 1, Level 2, and Level 3 Electric Vehicle Charging Systems

Authors: Li Bao, Lingling Fan, Zhixin Miao

Abstract: A charging system is required to convert ac electricity from the grid to dc electricity to charge an electric vehicle (EV) battery. According to the Society of Automotive Engineers (SAE) standard, EV chargers can be divided into three levels based on power rating: Level 1, Level 2, and Level 3. This paper investigates the circuit topologies and control principles of EV charging systems at each level. Three high-fidelity testbeds of EV charging systems for a 10 kWh battery are designed and implemented in the real-time digital simulator RT-Lab. The testbeds include modeling details such as the switching of semiconductors. A twenty-five-minute real-time simulation is conducted for each testbed. Detailed dynamic performance of the circuits and the controls at every stage is presented to demonstrate the charging process. The EV charging systems at all three levels employ a dual active bridge (DAB) dc/dc converter with an embedded high-frequency transformer to regulate the battery-side dc voltage and current. Hence, an average-model-based linear system analysis is given to configure the parameters of the phase shift control adopted by the DAB dc/dc converter. In addition, the power factor control (PFC) employed for the Level 1 and Level 2 single-phase ac charging systems and the three-phase voltage source converter control employed for the Level 3 three-phase ac charging systems are both analyzed. The three testbeds, with their detailed circuit and control parameters presented, can be used as reference testbeds for EV grid integration research.

Submitted 3 November, 2021; originally announced November 2021.
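The abstract refers to the phase shift control of the DAB converter. In the textbook single-phase-shift (SPS) model, the power transferred between the two bridges depends on their phase shift φ as P = n·V1·V2·φ(π − |φ|) / (2π²·f_s·L), peaking at φ = π/2. A minimal sketch, assuming this idealized lossless SPS model rather than the paper's average model; the 400 V, 100 kHz, and 50 µH values in the usage note are illustrative and not taken from the testbeds.

```python
import math


def dab_power(v1: float, v2: float, n: float, phi: float, f_s: float, l_k: float) -> float:
    """Single-phase-shift DAB power transfer (idealized, lossless).

    v1, v2: primary and secondary dc voltages [V]; n: turns ratio so that n*v2 is v2
    referred to the primary; phi: bridge phase shift [rad], |phi| <= pi/2 in normal use;
    f_s: switching frequency [Hz]; l_k: series (leakage) inductance referred to primary [H].
    """
    return v1 * n * v2 * phi * (math.pi - abs(phi)) / (2 * math.pi ** 2 * f_s * l_k)


def phase_for_power(p: float, v1: float, v2: float, n: float, f_s: float, l_k: float) -> float:
    """Invert dab_power on 0 <= phi <= pi/2 (the smaller root of the quadratic in phi)."""
    p_max = v1 * n * v2 / (8 * f_s * l_k)  # transferred power at phi = pi/2
    if not 0 <= p <= p_max:
        raise ValueError(f"requested power {p} W outside [0, {p_max:.1f}] W")
    return math.pi / 2 * (1 - math.sqrt(1 - p / p_max))
```

With V1 = V2 = 400 V, n = 1, f_s = 100 kHz, and L = 50 µH, the maximum transfer is 400·400 / (8·100e3·50e-6) = 4000 W, and `phase_for_power(2000, ...)` returns the phase shift (about 0.46 rad) that a phase shift controller would command for half of that.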

arXiv:2110.04187 [pdf, other]

SCaLa: Supervised Contrastive Learning for End-to-End Speech Recognition

Authors: Li Fu, Xiaoxiao Li, Runyu Wang, Lu Fan, Zhengchen Zhang, Meng Chen, Youzheng Wu, Xiaodong He

Abstract: End-to-end Automatic Speech Recognition (ASR) models are usually trained to optimize the loss of the whole token sequence, while neglecting explicit phonemic-granularity supervision. This can result in recognition errors due to similar-phoneme confusion or phoneme reduction. To alleviate this problem, we propose a novel framework based on Supervised Contrastive Learning (SCaLa) to enhance phonemic representation learning for end-to-end ASR systems. Specifically, we extend the self-supervised Masked Contrastive Predictive Coding (MCPC) to a fully-supervised setting, where the supervision is applied as follows. First, SCaLa masks variable-length encoder features according to phoneme boundaries, given the phoneme forced alignment extracted from a pre-trained acoustic model; it then predicts the masked features via contrastive learning. The forced alignment provides phoneme labels that mitigate the noise introduced by positive-negative pairs in self-supervised MCPC. Experiments on reading and spontaneous speech datasets show that our proposed approach achieves absolute Character Error Rate (CER) reductions of 2.8 and 1.4 points, respectively, compared to the baseline.

Submitted 20 June, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

Comments: INTERSPEECH 2022
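The abstract describes predicting masked encoder features via contrastive learning and using forced-alignment phoneme labels to clean up positive-negative pairs. A minimal sketch of that idea, assuming a standard InfoNCE loss with cosine similarity and temperature τ, and assuming negatives are simply filtered by phoneme label; the function names and τ = 0.1 are hypothetical, and this is not the authors' SCaLa loss.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(pred: Vector, positive: Vector, negatives: List[Vector], tau: float = 0.1) -> float:
    """-log of the softmax score of the positive among {positive} + negatives (cosine / tau)."""
    logits = [cosine(pred, positive) / tau] + [cosine(pred, n) / tau for n in negatives]
    m = max(logits)  # log-sum-exp shift for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[0]


def label_aware_negatives(frames: List[Vector], labels: List[str], anchor_label: str) -> List[Vector]:
    """Keep only frames whose (forced-alignment) phoneme label differs from the anchor's,
    so frames of the same phoneme are never pushed apart as false negatives."""
    return [f for f, lab in zip(frames, labels) if lab != anchor_label]
```

When the prediction for a masked frame matches its true feature, the loss is close to zero; when it lines up with a frame of a different phoneme, the loss is large. The label filter is the supervised ingredient: without it, a same-phoneme frame drawn as a negative would be penalized even though it is a legitimate match.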

arXiv:2110.01161 [pdf, other]

Enhance Images as You Like with Unpaired Learning

Authors: Xiaopeng Sun, Muxingzi Li, Tianyu He, Lubin Fan

Abstract: Low-light image enhancement is ill-posed by nature, as a given image may have many enhanced versions, yet recent studies focus on building a deterministic mapping from the input to one enhanced version. In contrast, we propose a lightweight one-path conditional generative adversarial network (cGAN) to learn a one-to-many relation from low-light to normal-light image space, given only sets of low- and normal-light training images without any correspondence. By formulating this ill-posed problem as a modulation code learning task, our network learns to generate a collection of enhanced images from a given input conditioned on various reference images. Therefore, our inference model easily adapts to various user preferences, given a few favorable photos from each user. Our model achieves competitive visual and quantitative results on par with fully supervised methods on both noisy and clean datasets, while being 6 to 10 times lighter than state-of-the-art generative adversarial network (GAN) approaches.

Submitted 3 October, 2021; originally announced October 2021.

Comments: 7 pages; IJCAI 2021

arXiv:2109.01766 [pdf, other]

SEC4SR: A Security Analysis Platform for Speaker Recognition

Authors: Guangke Chen, Zhe Zhao, Fu Song, Sen Chen, Lingling Fan, Yang Liu

Abstract: Adversarial attacks have been expanded to speaker recognition (SR). However, existing attacks are often assessed using different SR models, recognition tasks, and datasets, and only a few adversarial defenses borrowed from computer vision are considered. Yet, these defenses have not been thoroughly evaluated against adaptive attacks. Thus, there is still a lack of quantitative understanding of the strengths and limitations of adversarial attacks and defenses. More effective defenses are also required for securing SR systems. To bridge this gap, we present SEC4SR, the first platform enabling researchers to systematically and comprehensively evaluate adversarial attacks and defenses in SR. SEC4SR incorporates 4 white-box and 2 black-box attacks, and 24 defenses including our novel feature-level transformations. It also contains techniques for mounting adaptive attacks. Using SEC4SR, we conduct the largest-scale empirical study thus far on adversarial attacks and defenses in SR, involving 23 defenses, 15 attacks, and 4 attack settings.
Our study provides many useful findings that may advance future research, such as: (1) all the transformations slightly degrade accuracy on benign examples, and their effectiveness varies with attacks; (2) most transformations become less effective under adaptive attacks, but some become more effective; (3) few transformations combined with adversarial training yield stronger defenses over some but not all attacks, while our feature-level transformation combined with adversarial training yields the strongest defense over all the attacks. Extensive experiments demonstrate the capabilities and advantages of SEC4SR, which can benefit future research in SR.

Submitted 3 September, 2021; originally announced September 2021.

arXiv:2108.00303 [pdf]

doi 10.1109/TSG.2022.3148978

Practical Adoption of Cloud Computing in Power Systems- Drivers, Challenges, Guidance, and Real-world Use Cases

Authors: Song Zhang, Amritanshu Pandey, Xiaochuan Luo, Maggy Powell, Ranjan Banerji, Lei Fan, Abhineet Parchure, Edgardo Luzcando

Abstract: Motivated by the Federal Energy Regulatory Commission's (FERC) recent direction and ever-growing interest in cloud adoption by power utilities, a Task Force was established to assist power system practitioners with secure, reliable, and cost-effective adoption of cloud technology to meet various business needs. This paper summarizes the business drivers, challenges, guidance, and best practices for cloud adoption in power systems from the Task Force's perspective, after extensive review and deliberation by its members, including grid operators, utility companies, software vendors, and cloud providers. The paper begins by enumerating various business drivers for cloud adoption in the power industry. It follows with a discussion of the challenges and risks of migrating power grid utility workloads to the cloud. Next, for each corresponding challenge or risk, the paper provides appropriate guidance. Notably, the guidance is directed toward power industry professionals who are considering cloud solutions but remain hesitant about the practical execution. Finally, to tie all the sections together, the paper documents various real-world use cases of cloud technology in the power system domain, which both power industry practitioners and software vendors can look to when designing and selecting their own future cloud solutions. We hope that the information in this paper will serve as helpful guidance for the development of NERC guidelines and standards relevant to cloud adoption in the industry.

Submitted 2 February, 2022; v1 submitted 31 July, 2021; originally announced August 2021.

arXiv:2107.11222 [pdf]

Multi-channel Speech Enhancement with 2-D Convolutional Time-frequency Domain Features and a Pre-trained Acoustic Model

Authors: Quandong Wang, Junnan Wu, Zhao Yan, Sichong Qian, Liyong Guo, Lichun Fan, Weiji Zhuang, Peng Gao, Yujun Wang

Abstract: We propose a multi-channel speech enhancement approach with a novel two-stage feature fusion method and a pre-trained acoustic model in a multi-task learning paradigm. In the first fusion stage, the time-domain and frequency-domain features are extracted separately. In the time domain, the multi-channel convolution sum (MCS) and inter-channel convolution difference (ICD) features are computed and then integrated with a first 2-D convolutional layer, while in the frequency domain, the log-power spectra (LPS) features from both the original channels and the super-directive beamforming outputs are combined with a second 2-D convolutional layer. To fully integrate the rich information of multi-channel speech, i.e., the time-frequency domain features and the array geometry, we apply a third 2-D convolutional layer in the second fusion stage to obtain the final convolutional features. Furthermore, we propose to use a fixed clean acoustic model trained with the end-to-end lattice-free maximum mutual information criterion to force the enhanced output to have the same distribution as the clean waveform, which alleviates the over-estimation problem of the enhancement task and constrains distortion. On the Task1 development dataset of the ConferencingSpeech 2021 challenge, PESQ improvements of 0.24 and 0.19 are attained compared to the official baseline and a recently proposed multi-channel separation method, respectively.

Submitted 24 September, 2021; v1 submitted 23 July, 2021; originally announced July 2021.

Comments: 7 pages, 3 figures, accepted to APSIPA 2021, revised

arXiv:2106.10707 [pdf, other]

Minimizing Delay in Network Function Visualization with Quantum Computing

Authors: Wenlu Xuan, Zhongqi Zhao, Lei Fan, Zhu Han

Abstract: Network function virtualization (NFV) is a crucial technology for 5G network development because it can improve the flexibility of employing hardware and reduce the construction of base stations. NFV relies on vast numbers of service chains, each composed of a sequence of network functions, to meet users' requests. These virtual network functions (VNFs) are implemented in software in virtual machines (VMs) and virtual environments. Deciding how to deploy VMs to process the VNFs of the service chains as soon as possible after users' requests are received is very challenging for traditional algorithms at large scale. Compared with traditional algorithms, quantum computing has better computational performance because of quantum parallelism. We build an integer linear programming model of the VNF scheduling problem with the objective of minimizing delays and transform it into a quadratic unconstrained binary optimization (QUBO) model. Our proposed heuristic algorithm employs a quantum annealer to solve the model. Finally, we evaluate the computational results and explore the feasibility of leveraging quantum computing to solve the VNF scheduling problem.

Submitted 20 June, 2021; originally announced June 2021.

Comments: Invited Paper by IEEE MASS 2021
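The key modeling step in the abstract is turning a constrained integer program into a QUBO that an annealer can sample. A minimal sketch of the standard quadratic-penalty construction on a deliberately simplified, hypothetical assignment problem (each VNF goes on exactly one VM, no two VNFs share a VM, minimize total processing delay). Using x² = x for binary variables, the one-hot penalty P(Σ_j x_ij − 1)² expands to P(1 − Σ_j x_ij + 2Σ_{j<k} x_ij x_ik). The delay matrix and penalty weight are made up, the paper's scheduling model is richer, and exhaustive search stands in for the quantum annealer.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

Qubo = Dict[Tuple[int, int], float]  # upper-triangular coefficients, (a, a) = linear term


def build_qubo(delay: List[List[float]], penalty: float) -> Tuple[Qubo, int]:
    """Encode x[i][j] = 1 iff VNF i runs on VM j; constant terms of the penalties are dropped."""
    n_vnf, n_vm = len(delay), len(delay[0])
    idx = lambda i, j: i * n_vm + j  # flatten (VNF, VM) into one variable index
    q: Qubo = {}

    def add(a: int, b: int, w: float) -> None:
        key = (min(a, b), max(a, b))
        q[key] = q.get(key, 0.0) + w

    for i in range(n_vnf):
        for j in range(n_vm):
            add(idx(i, j), idx(i, j), delay[i][j] - penalty)  # delay + linear one-hot term
            for k in range(j + 1, n_vm):
                add(idx(i, j), idx(i, k), 2 * penalty)  # one-hot cross terms
    for j in range(n_vm):
        for i in range(n_vnf):
            for other in range(i + 1, n_vnf):
                add(idx(i, j), idx(other, j), penalty)  # two VNFs on the same VM
    return q, n_vnf * n_vm


def qubo_energy(q: Qubo, x: Sequence[int]) -> float:
    return sum(w * x[a] * x[b] for (a, b), w in q.items())


def brute_force(q: Qubo, n: int) -> Tuple[int, ...]:
    """Exhaustive minimizer standing in for an annealer (only viable for tiny n)."""
    return min(itertools.product((0, 1), repeat=n), key=lambda x: qubo_energy(q, x))
```

For delays `[[1, 5], [4, 2]]` and penalty 10, the minimizer is `(1, 0, 0, 1)` (VNF 0 on VM 0, VNF 1 on VM 1) with energy −17; adding back the dropped constant 2·10 recovers the total delay of 3. The penalty must be large relative to the delays, otherwise violating a constraint can look cheaper than satisfying it.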

arXiv:2106.04043 [pdf, other]

Dilated Convolution based CSI Feedback Compression for Massive MIMO Systems

Authors: Shunpu Tang, Junjuan Xia, Lisheng Fan, Xianfu Lei, Wei Xu, Arumugam Nallanathan

Abstract: Although the frequency-division duplex (FDD) massive multiple-input multiple-output (MIMO) system can offer high spectral and energy efficiency, it requires feeding back the downlink channel state information (CSI) from users to the base station (BS) to fulfill the precoding design at the BS. However, the large dimension of CSI matrices in massive MIMO systems makes CSI feedback very challenging, and it is urgent to compress the feedback CSI. To this end, this paper proposes a novel dilated convolution based CSI feedback network, namely DCRNet. Specifically, dilated convolutions are used to enlarge the receptive field (RF) of the proposed DCRNet without increasing the convolution size. Moreover, advanced encoder and decoder blocks are designed to improve the reconstruction performance and reduce computational complexity as well. Numerical results are presented to show the superiority of the proposed DCRNet over conventional networks. In particular, the proposed DCRNet can achieve almost state-of-the-art (SOTA) performance with much lower floating point operations (FLOPs). The open source code and checkpoint of this work are available at https://github.com/recusant7/DCRNet.

Submitted 7 June, 2021; originally announced June 2021.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
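The abstract states that dilated convolutions enlarge the receptive field without enlarging the kernel. The standard arithmetic: a k-tap kernel with dilation d spans k + (k − 1)(d − 1) inputs, and stacking stride-1 layers grows the receptive field by (k − 1)·d per layer, while each filter still has only k × k weights per input channel. A short sketch; the layer configurations in the usage note are hypothetical and not DCRNet's actual architecture.

```python
from typing import List, Tuple


def effective_kernel(k: int, d: int) -> int:
    """Span of a k-tap kernel with dilation d (d - 1 skipped inputs between taps)."""
    return k + (k - 1) * (d - 1)


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field along one axis of stacked stride-1 convs, given (kernel, dilation) pairs."""
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d
    return rf
```

Three plain 3×3 layers see 7 inputs per axis, whereas three 3×3 layers with dilations 1, 2, and 4 see 15, at the same weight count.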

arXiv:2008.07112 [pdf, other]

AnciNet: An Efficient Deep Learning Approach for Feedback Compression of Estimated CSI in Massive MIMO Systems

Authors: Yuyao Sun, Wei Xu, Lisheng Fan, Geoffrey Ye Li, George K. Karagiannidis

Abstract: Accurate channel state information (CSI) feedback plays a vital role in improving the performance gain of massive multiple-input multiple-output (m-MIMO) systems, where the dilemma is excessive CSI overhead versus limited feedback bandwidth. Considering the noisy CSI caused by imperfect channel estimation, we propose a novel deep neural network architecture, namely AnciNet, to conduct CSI feedback with limited bandwidth. AnciNet extracts noise-free features from the noisy CSI samples to achieve effective CSI compression for the feedback. Experimental results verify that the proposed AnciNet approach outperforms existing techniques under various conditions.

Submitted 17 August, 2020; originally announced August 2020.

arXiv:2008.00250 [pdf, ps, other]

Deep Reinforcement Learning Based Mobile Edge Computing for Intelligent Internet of Things

Authors: Rui Zhao, Xinjie Wang, Junjuan Xia, Liseng Fan

Abstract: In this paper, we investigate mobile edge computing (MEC) networks for the intelligent internet of things (IoT), where the computational tasks of multiple users are assisted by multiple computational access points (CAPs). By offloading some tasks to the CAPs, system performance can be improved by reducing latency and energy consumption, the two important metrics of interest in MEC networks. We design the offloading strategy intelligently using a deep reinforcement learning algorithm. In this algorithm, a Deep Q-Network is used to automatically learn the offloading decision in order to optimize system performance, and a neural network (NN) is trained to predict the offloading action, where the training data are generated from the environmental system. Moreover, we employ bandwidth allocation to optimize the wireless spectrum for the links between the users and CAPs, for which several bandwidth allocation schemes are proposed. Furthermore, we use CAP selection to choose the best CAP to assist the computational tasks from the users. Simulation results are finally presented to show the effectiveness of the proposed reinforcement learning offloading strategy. In particular, the system cost of latency and energy consumption can be reduced significantly by the proposed deep reinforcement learning based algorithm.

Submitted 1 August, 2020; originally announced August 2020.
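The abstract's Deep Q-Network learns offloading decisions that lower a combined latency and energy cost. A minimal sketch of the standard DQN ingredients such a scheme relies on, assuming the reward is the negative of a weighted system cost and actions index offloading choices (e.g., local execution or a particular CAP); the weight `w`, discount `gamma`, and plain lists standing in for the network's Q-value outputs are all hypothetical, not the paper's design.

```python
import random
from typing import Sequence


def system_cost(latency: float, energy: float, w: float = 0.5) -> float:
    """Hypothetical weighted system cost; w trades latency against energy."""
    return w * latency + (1 - w) * energy


def td_target(reward: float, next_q: Sequence[float], gamma: float = 0.9, done: bool = False) -> float:
    """One-step DQN target y = r + gamma * max_a' Q(s', a'); just r at a terminal state."""
    return reward if done else reward + gamma * max(next_q)


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Explore with probability epsilon, otherwise pick the action with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

During training, the reward for a decision would be `-system_cost(latency, energy)`, and the network's Q-value for the chosen action would be regressed toward `td_target(...)`; at deployment `epsilon` is set to 0 so the agent always takes the lowest-cost offloading action it has learned.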

arXiv:2006.03890 [pdf, ps, other]

Flexibility Management in Economic Dispatch with Dynamic Automatic Generation Control

Authors: Lei Fan, Chaoyue Zhao, Guangyuan Zhang, Qiuhua Huang

Abstract: As the installation of electronically interconnected renewable energy resources grows rapidly, system frequency maintenance and control become challenging problems for maintaining reliability in bulk power systems. Economic Dispatch (ED) and Automatic Generation Control (AGC) are two of the most important frequency control actions in the control centers of independent system operators (ISOs) and utilities, and the interaction between them attracts increasing attention. In this paper, we propose a robust optimization based framework to measure system flexibility by considering the interaction between these two hierarchical processes (i.e., ED and AGC). We propose a cutting plane algorithm with a reformulation technique to obtain seven different indices of the system. In addition, we study the impacts of several system factors (i.e., the budget of operational cost, ramping capability, and transmission line capacity) and show numerically how these factors can influence system flexibility.

Submitted 6 June, 2020; originally announced June 2020.

Comments: 8 pages, submitted

arXiv:2004.06912 [pdf, other]

Combining Visible Light and Infrared Imaging for Efficient Detection of Respiratory Infections such as COVID-19 on Portable Device

Authors: Zheng Jiang, Menghan Hu, Lei Fan, Yaling Pan, Wei Tang, Guangtao Zhai, Yong Lu

Abstract: Coronavirus Disease 2019 (COVID-19) has become a serious global epidemic in the past few months and has caused huge losses to human society worldwide. For such a large-scale epidemic, early detection and isolation of potential virus carriers is essential to curb its spread. Recent studies have shown that one important feature of COVID-19 is the abnormal respiratory status caused by viral infections. During the epidemic, many people wear masks to reduce the risk of getting sick. Therefore, in this paper, we propose a portable non-contact method to screen the health condition of people wearing masks through analysis of their respiratory characteristics. The device mainly consists of a FLIR one thermal camera and an Android phone. This may help identify potential COVID-19 patients in practical scenarios such as pre-inspection in schools and hospitals. In this work, we perform the health screening by combining the RGB and thermal videos obtained from the dual-mode camera with a deep learning architecture. We first develop a respiratory data capture technique for people wearing masks using face recognition. Then, a bidirectional GRU neural network with an attention mechanism is applied to the respiratory data to obtain the health screening result. The results of validation experiments show that our model can identify respiratory health status with an accuracy of 83.7% on the real-world dataset.
The abnormal respiratory data and part of the normal respiratory data are collected from Ruijin Hospital Affiliated to the Shanghai Jiao Tong University Medical School. The other normal respiratory data are obtained from healthy people around our researchers. This work demonstrates that the proposed portable and intelligent health screening device can be used as a pre-scan method for respiratory infections, which may help fight the current COVID-19 epidemic.

Submitted 15 April, 2020; originally announced April 2020.

arXiv:2003.03860 [pdf, other]

A Modular Small-Signal Analysis Framework for Inverter Penetrated Power Grids: Measurement, Assembling, Aggregation, and Stability Assessment

Authors: Lingling Fan, Zhixin Miao

Abstract: Unprecedented dynamic phenomena may appear in power grids due to the increasing penetration of inverter-based resources (IBRs), e.g., wind and solar photovoltaic (PV). A major challenge in dynamic modeling and analysis is that, unlike synchronous generators, whose analytical models are well studied and known to system planners, inverter models are proprietary, and utilities are provided only with black-box models. Thus, measurement-based characterization is a popular approach to finding the frequency-domain response of an IBR. The resulting admittances are essentially small-signal current/voltage relationships in the frequency domain. Integrating admittances for grid dynamic modeling and analysis requires a new framework, namely a modular small-signal analysis framework. In this visionary paper, we examine the current state of the art of dynamic modeling and analysis of power grids with IBRs, including inverter admittance characterization, the procedure of component assembling and aggregation, and stability assessment. We push forward a computationally efficient modular modeling and analysis framework via four visions: (i) efficient and accurate admittance model characterization via model building and time-domain responses, (ii) accurate assembling of components, (iii) efficient aggregation, and (iv) stability assessment relying on network admittance matrices. The challenges of admittance-based modular analysis are demonstrated using examples, and techniques to tackle those challenges are pointed out.

Submitted 8 March, 2020; originally announced March 2020.

arXiv:1912.11221 [pdf, ps, other]

FDD Massive MIMO Uplink and Downlink Channel Reciprocity Properties: Full or Partial Reciprocity?

Authors: Zhimeng Zhong, Li Fan, Shibin Ge

Abstract: One challenge for FDD massive MIMO communication systems is how to obtain the downlink channel state information (CSI) at the base station. Besides traditional codebook feedback through uplink pilot transmission, some channel reciprocity properties can be exploited through uplink channel estimation and channel parameter estimation algorithms. In this paper, the uplink and downlink channel reciprocity properties are analyzed. It is theoretically proved that not all multipath parameters of the FDD downlink and uplink channels are equivalent. Therefore, the so-called full reciprocity property does not hold, while the partial reciprocity property does. Moreover, a channel measurement campaign is conducted to verify our theoretical analysis. Finally, to support the partial reciprocity property, a revision of the standardized 5G channel model is proposed as well. The contributions of this paper could steer FDD massive MIMO transmission scheme design in the right direction.

Submitted 30 December, 2019; v1 submitted 24 December, 2019; originally announced December 2019.

arXiv:1911.01840 [pdf, other]

Who is Real Bob? Adversarial Attacks on Speaker Recognition Systems

Authors: Guangke Chen, Sen Chen, Lingling Fan, Xiaoning Du, Zhe Zhao, Fu Song, Yang Liu

Abstract: Speaker recognition (SR) is widely used in our daily life as a biometric authentication or identification mechanism. The popularity of SR brings in serious security concerns, as demonstrated by recent adversarial attacks. However, the impacts of such threats in the practical black-box setting are still open, since current attacks consider the white-box setting only. In this paper, we conduct the first comprehensive and systematic study of the adversarial attacks on SR systems (SRSs) to understand their security weakness in the practical blackbox setting. For this purpose, we propose an adversarial attack, named FAKEBOB, to craft adversarial samples. Specifically, we formulate the adversarial sample generation as an optimization problem, incorporated with the confidence of adversarial samples and maximal distortion to balance between the strength and imperceptibility of adversarial voices. One key contribution is to propose a novel algorithm to estimate the score threshold, a feature in SRSs, and use it in the optimization problem to solve the optimization problem. We demonstrate that FAKEBOB achieves 99% targeted attack success rate on both open-source and commercial systems. We further demonstrate that FAKEBOB is also effective on both open-source and commercial systems when playing over the air in the physical world. Moreover, we have conducted a human study which reveals that it is hard for human to differentiate the speakers of the original and adversarial voices.
Last but not least, we show that four promising defense methods for adversarial attack from the speech recognition domain become ineffective on SRSs against FAKEBOB, which calls for more effective defense methods. We highlight that our study peeks into the security implications of adversarial attacks on SRSs, and realistically fosters to improve the security robustness of SRSs.

Submitted 23 April, 2020; v1 submitted 3 November, 2019; originally announced November 2019.

Comments: IEEE Symposium on Security and Privacy 2021

arXiv:1910.11496 [pdf, other]

L2RS: A Learning-to-Rescore Mechanism for Automatic Speech Recognition

Authors: Yuanfeng Song, Di Jiang, Xuefang Zhao, Qian Xu, Raymond Chi-Wing Wong, Lixin Fan, Qiang Yang

Abstract: Modern Automatic Speech Recognition (ASR) systems primarily rely on scores from an Acoustic Model (AM) and a Language Model (LM) to rescore the N-best lists. With the abundance of recent natural language processing advances, the information utilized by current ASR for evaluating the linguistic and semantic legitimacy of the N-best hypotheses is rather limited. In this paper, we propose a novel Learning-to-Rescore (L2RS) mechanism, which is specialized for utilizing a wide range of textual information from the state-of-the-art NLP models and automatically deciding their weights to rescore the N-best lists for ASR systems. Specifically, we incorporate features including BERT sentence embedding, topic vector, and perplexity scores produced by n-gram LM, topic modeling LM, BERT LM and RNNLM to train a rescoring model. We conduct extensive experiments based on a public dataset, and experimental results show that L2RS outperforms not only traditional rescoring methods but also its deep neural network counterparts by a substantial improvement of 20.67% in terms of NDCG@10. L2RS paves the way for developing more effective rescoring models for ASR.

Submitted 24 October, 2019; originally announced October 2019.

Comments: 5 pages, 3 figures

arXiv:1909.09300 [pdf, other]

Making the Invisible Visible: Action Recognition Through Walls and Occlusions

Authors: Tianhong Li, Lijie Fan, Mingmin Zhao, Yingcheng Liu, Dina Katabi

Abstract: Understanding people's actions and interactions typically depends on seeing them. Automating the process of action recognition from visual data has been the topic of much research in the computer vision community. But what if it is too dark, or if the person is occluded or behind a wall? In this paper, we introduce a neural network model that can detect human actions through walls and occlusions, and in poor lighting conditions. Our model takes radio frequency (RF) signals as input, generates 3D human skeletons as an intermediate representation, and recognizes actions and interactions of multiple people over time. By translating the input to an intermediate skeleton-based representation, our model can learn from both vision-based and RF-based datasets, and allow the two tasks to help each other. We show that our model achieves comparable accuracy to vision-based action recognition systems in visible scenarios, yet continues to work accurately when people are not visible, hence addressing scenarios that are beyond the limit of today's vision-based action recognition.

Submitted 19 September, 2019; originally announced September 2019.

Comments: ICCV 2019. The first two authors contributed equally to this paper

arXiv:1504.03524 [pdf, other]

Achieving Economic Operation and Secondary Frequency Regulation Simultaneously Through Feedback Control

Authors: Zhixin Miao, Lingling Fan

Abstract: This article presents an exciting finding for the power industry: the parameters of secondary frequency control based on integral or proportional integral control can be tuned to achieve economic operation and frequency regulation simultaneously. We show that if the power imbalance is represented by frequency deviation, an iterative dual decomposition based economic dispatch solving is equivalent to integral control. An iterative method of multipliers based economic dispatch is equivalent to proportional integral control. Similarly, if the controller parameters of the secondary frequency controls are chosen based on generator cost functions, these secondary frequency controllers achieve both economic operation and frequency regulation simultaneously.

Submitted 14 April, 2015; originally announced April 2015.

Comments: submitted to IEEE PES letters
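The dual-decomposition/integral-control equivalence described above can be seen in a toy dispatch. This is a minimal sketch, not the authors' formulation: it assumes quadratic generator costs C_i(P) = a_i P^2 + b_i P with hypothetical coefficients, limits, and demand. Each unit responds to a price λ by minimizing C_i(P) - λP, and the dual (subgradient) update adds a step proportional to the power imbalance. If that imbalance is sensed through frequency deviation, as the abstract assumes, the λ update is an integrator acting on Δf. Only the integral-control case is shown; the method-of-multipliers/PI case is not.

```python
import numpy as np

# Hypothetical quadratic cost coefficients: C_i(P) = a_i * P**2 + b_i * P
a = np.array([0.010, 0.015, 0.020])
b = np.array([10.0, 9.0, 11.0])
p_min = np.zeros(3)
p_max = np.array([400.0, 300.0, 250.0])
demand = 600.0

def dispatch(lam):
    """Each generator locally minimizes C_i(P) - lam * P within its limits."""
    return np.clip((lam - b) / (2 * a), p_min, p_max)

def dual_ascent(alpha=0.005, iters=2000):
    """Subgradient update on the dual variable (the system price lam).

    lam accumulates the power imbalance over iterations. If the imbalance
    is measured through the frequency deviation (assumed proportional to
    -delta_f here), this accumulation is an integral controller on delta_f.
    """
    lam = 0.0
    for _ in range(iters):
        imbalance = demand - dispatch(lam).sum()  # stand-in for -K * delta_f
        lam += alpha * imbalance                  # integral action
    return lam, dispatch(lam)

lam, p = dual_ascent()
print(lam, p, p.sum())
```

At convergence the imbalance is zero (frequency restored) and every unmarginalized unit runs at the same incremental cost 2·a_i·P_i + b_i = λ, which is the economic-dispatch optimality condition. The step size alpha plays the role of the integral gain, so too large a value makes the iteration (and the controller) oscillate.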

arXiv:1503.09087 [pdf, other]

Dual Decomposition-Based Privacy-Preserving Multi-Horizon Utility-Community Decision Making Paradigms

Authors: Vahid R. Disfani, Zhixin Miao, Lingling Fan, Bo Zeng

Abstract: Two types of privacy-preserving decision making paradigms for utility-community interactions for multi-horizon operation are examined in this paper. In both designs, communities with renewable energy sources, distributed generators, and energy storage systems minimize their costs with limited information exchange with the utility. The utility makes decision based on the information provided from the communities. Through an iterative process, all parties achieve agreement. The authors' previous research results on subgradient and lower-upper-bound switching (LUBS)-based distributed optimization oriented multi-agent control strategies are examined and the convergence analysis of both strategies are provided. The corresponding decision making architectures, including information flow among agents and learning (or iteration) procedure, are developed for multi-horizon decision making scenarios. Numerical results illustrate the decision making procedures and demonstrate their feasibility of practical implementation. The two decision making architectures are compared for their implementation requirements as well as performance.

Submitted 31 March, 2015; originally announced March 2015.

Comments: 8 pages, 15 figures, submitted to IEEE trans. Power Systems

arXiv:1503.05224 [pdf, other]

Least Squares Estimation-Based Synchronous Generator Parameter Estimation Using PMU Data

Authors: Bander Mogharbel, Lingling Fan, Zhixin Miao

Abstract: In this paper, least square estimation (LSE)-based dynamic generator model parameter identification is investigated. Electromechanical dynamics related parameters such as inertia constant and primary frequency control droop for a synchronous generator are estimated using Phasor Measurement Unit (PMU) data obtained at the generator terminal bus. The key idea of applying LSE for dynamic parameter estimation is to have a discrete autoregression with exogenous input (ARX) model. With an ARX model, a linear estimation problem can be formulated and the parameters of the ARX model can be found. This paper gives the detailed derivation of converting a generator model with primary frequency control into an ARX model. The generator parameters will be recovered from the estimated ARX model parameters afterwards. Two types of conversion methods are presented: zero-order hold (ZOH) method and Tustin method. Numerical results are presented to illustrate the proposed LSE application in dynamic system parameter identification using PMU data.

Submitted 17 March, 2015; originally announced March 2015.

Comments: 5 pages, 6 figures, accepted by IEEE PESGM 2015
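The LSE-on-ARX idea summarized above can be sketched under a deliberately simplified model that is an assumption here, not the paper's: the swing equation with an instantaneous droop response and no damping or turbine/governor lag, 2H dΔω/dt = -ΔP_e - Δω/R. Its exact zero-order-hold (ZOH) discretization is a first-order ARX model, Δω[k+1] = a·Δω[k] + b·ΔP_e[k], with a = exp(-T/(2HR)) and b = -R(1 - a). So a and b follow from linear least squares, and H and R are recovered by inverting those two maps. The "PMU" data below are synthesized from hypothetical H, R, and sampling-interval values.

```python
import numpy as np

# Hypothetical "true" parameters used only to synthesize data
H_true, R_true = 4.0, 0.05      # inertia constant [s], droop [pu]
T = 1.0 / 30.0                  # sampling interval, e.g. 30 samples/s (assumption)

# Exact ZOH discretization of 2H d(dw)/dt = -dPe - dw/R (first-order sketch)
tau = 2 * H_true * R_true
a_true = np.exp(-T / tau)
b_true = -R_true * (1 - a_true)

rng = np.random.default_rng(0)
n = 600
dpe = 0.02 * rng.standard_normal(n)   # electrical power perturbation (input)
dw = np.zeros(n)                      # speed deviation (output)
for k in range(n - 1):
    dw[k + 1] = a_true * dw[k] + b_true * dpe[k]
dw += 1e-6 * rng.standard_normal(n)   # small measurement noise

# ARX(1,1) regression: dw[k+1] = a*dw[k] + b*dpe[k], solved by least squares
Phi = np.column_stack([dw[:-1], dpe[:-1]])
theta, *_ = np.linalg.lstsq(Phi, dw[1:], rcond=None)
a_hat, b_hat = theta

# Invert the ZOH mapping to recover the continuous-time parameters
R_hat = -b_hat / (1 - a_hat)
H_hat = -T / (2 * R_hat * np.log(a_hat))
print(H_hat, R_hat)
```

A Tustin conversion would instead substitute s ≈ (2/T)(z - 1)/(z + 1), which adds a ΔP_e[k+1] term to the regressor and changes the inverse mapping from ARX coefficients back to H and R; the abstract lists both conversion methods.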

Showing 1–39 of 39 results for author: Fan, L