Electrical Engineering and Systems Science
Showing new listings for Friday, 8 November 2024
- [1] arXiv:2411.04128 [pdf, other]
Title: On the analysis of saturated pressure to detect fatigue
Comments: 12 pages. arXiv admin note: substantial text overlap with arXiv:2203.14782
Journal-ref: In: Parziale, A., Diaz, M., Melo, F. (eds) Graphonomics in Human Body Movement. IGS 2023. Lecture Notes in Computer Science, vol 14285. Springer, Cham
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
This paper examines the saturation of pressure signals during various handwriting tasks, including drawings, cursive text, text in capital letters, and signatures, under different levels of fatigue. Experimental results demonstrate a significant rise in the proportion of saturated samples following strenuous exercise in tasks performed without resting the wrist. The saturation analysis reveals significant differences between the baseline condition and strenuous fatigue.
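The "proportion of saturated samples" mentioned above can be illustrated with a minimal sketch. The abstract does not give the tablet's pressure range or the paper's exact saturation criterion, so the 10-bit maximum level (`MAX_LEVEL = 1023`), the function name `saturated_fraction`, and the sample values below are all assumptions made for illustration only.

```python
def saturated_fraction(pressure, max_level):
    """Return the fraction of samples whose pressure reaches max_level."""
    if not pressure:
        raise ValueError("pressure sequence is empty")
    saturated = sum(1 for p in pressure if p >= max_level)
    return saturated / len(pressure)


# Assumed 10-bit digitizer: pressure values range from 0 to 1023.
MAX_LEVEL = 1023

# Made-up sample sequences, not data from the paper.
baseline = [200, 640, 1023, 810, 500, 1023, 300, 720]
after_exercise = [900, 1023, 1023, 1023, 760, 1023, 1023, 640]

print(saturated_fraction(baseline, MAX_LEVEL))        # 0.25
print(saturated_fraction(after_exercise, MAX_LEVEL))  # 0.625
```

Comparing the two fractions per task is one simple way to express the kind of baseline-versus-fatigue difference the abstract describes.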
- [2] arXiv:2411.04131 [pdf, other]
Title: Data Processing Chain and Products of EOS-06 OCM-3 Payload From Signal Processing to Geometric Precision
Authors: Ankur Garg, Tushar Shukla, Sunita Arya, Ghansham Sangar, Sampa Roy, Meenakshi Sarkar, S. Manthira Moorthi, Debajyoti Dhar
Comments: Preprint
Subjects: Signal Processing (eess.SP)
The Ocean Color Monitor-3, launched aboard Oceansat-3, represents a significant advancement in ocean observation technology, building upon the capabilities of its predecessors. With thirteen spectral bands, OCM-3 enhances feature identification and atmospheric correction, enabling precise data collection from a sun-synchronous orbit. Operating at an altitude of 732.5 km, the satellite achieves high signal-to-noise ratios (SNR) through sophisticated onboard and ground processing techniques, including advanced geometric modeling for pixel [...]. The OCM-3 processing pipeline, consisting of multiple levels, ensures rigorous calibration and correction of radiometric and geometric data. Key methodologies such as dark data modeling, photo response non-uniformity correction, and smear correction are employed to enhance data quality. The effective implementation of ground time delay integration (TDI) allows for the refinement of SNR, with evaluations demonstrating that performance specifications were exceeded. Geometric calibration procedures, including band-to-band registration and geolocation accuracy assessments, which further optimize data reliability, are also presented. Advanced image registration techniques leveraging Ground Control Points (GCPs) and residual error analysis significantly reduce geolocation errors, achieving precision within specified thresholds. Overall, OCM-3's comprehensive calibration and processing strategies ensure high-quality, reliable data crucial for ocean monitoring and change detection applications, facilitating improved understanding of ocean dynamics and environmental changes.
- [3] arXiv:2411.04142 [pdf, html, other]
Title: Unified Pathological Speech Analysis with Prompt Tuning
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Pathological speech analysis, used in the detection of diseases such as depression and Alzheimer's disease, attracts much interest from researchers. However, previous pathological speech analysis models are commonly designed for a specific disease while overlooking the connection between diseases, which may constrain performance and lower training efficiency. Instead of fine-tuning deep models for different tasks, prompt tuning is a much more efficient training paradigm. We thus propose a unified pathological speech analysis system for as many as three diseases with the prompt tuning technique. This system uses prompt tuning to adjust only a small part of the parameters to detect different diseases from the speech of possible patients. Our system leverages a pre-trained spoken language model and demonstrates strong performance across multiple disorders while only fine-tuning a fraction of the parameters. This efficient training approach leads to faster convergence and improved F1 scores by allowing knowledge to be shared across tasks. Our experiments on Alzheimer's disease, depression, and Parkinson's disease show competitive results, highlighting the effectiveness of our method in pathological speech analysis.
- [4] arXiv:2411.04152 [pdf, other]
Title: A Contrastive Self-Supervised Learning scheme for beat tracking amenable to few-shot learning
Journal-ref: ISMIR 2024, Nov 2024, San Francisco, California, United States
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
In this paper, we propose a novel Self-Supervised-Learning scheme to train rhythm analysis systems and instantiate it for few-shot beat tracking. Taking inspiration from the Contrastive Predictive Coding paradigm, we propose to train a Log-Mel-Spectrogram Transformer encoder to contrast observations at times separated by hypothesized beat intervals from those that are not. We do this without the knowledge of ground-truth tempo or beat positions, as we rely on the local maxima of a Predominant Local Pulse function, considered as a proxy for Tatum positions, to define candidate anchors, candidate positives (located at a distance of a power of two from the anchor) and negatives (remaining time positions). We show that a model pre-trained using this approach on the unlabeled FMA, MTT and MTG-Jamendo datasets can successfully be fine-tuned in the few-shot regime, i.e. with just a few annotated examples to get a competitive beat-tracking performance.
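The anchor/positive/negative split described above can be sketched as follows. This is only one possible reading: it assumes the "distance of a power of two" is counted in pulse-position indices (tatum steps) rather than in seconds or frames, which the abstract does not specify. The function name `split_candidates` and the toy sizes are hypothetical.

```python
def split_candidates(num_positions, anchor):
    """Split pulse-position indices into positives and negatives for one anchor.

    Assumption: a position is a positive when its index distance to the
    anchor is a power of two (1, 2, 4, 8, ...); all others are negatives.
    """
    positives, negatives = [], []
    for idx in range(num_positions):
        if idx == anchor:
            continue
        dist = abs(idx - anchor)
        # A positive integer is a power of two iff exactly one bit is set.
        if dist & (dist - 1) == 0:
            positives.append(idx)
        else:
            negatives.append(idx)
    return positives, negatives


positives, negatives = split_candidates(num_positions=12, anchor=4)
print(positives)  # [0, 2, 3, 5, 6, 8]
print(negatives)  # [1, 7, 9, 10, 11]
```

In a contrastive setup, the encoder outputs at the positive indices would be pulled toward the anchor's representation and those at the negative indices pushed away; the loss itself is not shown here.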
- [5] arXiv:2411.04153 [pdf, html, other]
Title: Urban Flood Mapping Using Satellite Synthetic Aperture Radar Data: A Review of Characteristics, Approaches and Datasets
Comments: Accepted by IEEE Geoscience and Remote Sensing Magazine
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Understanding the extent of urban flooding is crucial for assessing building damage, casualties and economic losses. Synthetic Aperture Radar (SAR) technology offers significant advantages for mapping flooded urban areas due to its ability to collect data regardless of weather and solar illumination conditions. However, the wide range of existing methods makes it difficult to choose the best approach for a specific situation and to identify future research directions. Therefore, this study provides a comprehensive review of current research on urban flood mapping using SAR data, summarizing key characteristics of floodwater in SAR images and outlining various approaches from scientific articles. Additionally, we provide a brief overview of the advantages and disadvantages of each method category, along with guidance on selecting the most suitable approach for different scenarios. This study focuses on the challenges and advancements in SAR-based urban flood mapping. It specifically addresses the limitations of spatial and temporal resolution in SAR data and discusses the essential pre-processing steps. Moreover, the article explores the potential benefits of Polarimetric SAR (PolSAR) techniques and uncertainty analysis for future research. Furthermore, it highlights a lack of open-access SAR datasets for urban flood mapping, which hinders the development of advanced deep learning-based methods. In addition, we evaluate the Technology Readiness Levels (TRLs) of urban flood mapping techniques to identify challenges and future research areas. Finally, the study explores the practical applications of SAR-based urban flood mapping in both the private and public sectors and provides a comprehensive overview of the benefits and potential impact of these methods.
- [6] arXiv:2411.04155 [pdf, html, other]
Title: MINDSETS: Multi-omics Integration with Neuroimaging for Dementia Subtyping and Effective Temporal Study
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
In the complex realm of cognitive disorders, Alzheimer's disease (AD) and vascular dementia (VaD) are the two most prevalent dementia types, presenting entangled symptoms yet requiring distinct treatment approaches. The crux of effective treatment in slowing neurodegeneration lies in early, accurate diagnosis, as this significantly assists doctors in determining the appropriate course of action. However, current diagnostic practices often delay VaD diagnosis, impeding timely intervention and adversely affecting patient prognosis. This paper presents an innovative multi-omics approach to accurately differentiate AD from VaD, achieving a diagnostic accuracy of 89.25%. The proposed method segments the longitudinal MRI scans and extracts advanced radiomics features. Subsequently, it synergistically integrates the radiomics features with an ensemble of clinical, cognitive, and genetic data to provide state-of-the-art diagnostic accuracy, setting a new benchmark in classification accuracy on a large public dataset. The paper's primary contribution is proposing a comprehensive methodology utilizing multi-omics data to provide a nuanced understanding of dementia subtypes. Additionally, the paper introduces an interpretable model to enhance clinical decision-making coupled with a novel model architecture for evaluating treatment efficacy. These advancements lay the groundwork for future work not only aimed at improving differential diagnosis but also mitigating and preventing the progression of dementia.
- [7] arXiv:2411.04158 [pdf, html, other]
Title: Analyzing Multimodal Features of Spontaneous Voice Assistant Commands for Mild Cognitive Impairment Detection
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Mild cognitive impairment (MCI) is a major public health concern due to its high risk of progressing to dementia. This study investigates the potential of detecting MCI with spontaneous voice assistant (VA) commands from 35 older adults in a controlled setting. Specifically, a command-generation task is designed with pre-defined intents for participants to freely generate commands that are more associated with cognitive ability than read commands. We develop MCI classification and regression models with audio, textual, intent, and multimodal fusion features. We find the command-generation task outperforms the command-reading task with an average classification accuracy of 82%, achieved by leveraging multimodal fusion features. In addition, generated commands correlate more strongly with memory and attention subdomains than read commands. Our results confirm the effectiveness of the command-generation task and imply the promise of using longitudinal in-home commands for MCI detection.
- [8] arXiv:2411.04202 [pdf, html, other]
Title: Observability and Generalized Sensor Placement for Nonlinear Quality Models in Drinking Water Networks
Subjects: Systems and Control (eess.SY)
This paper studies the problem of optimal geographic placement of water quality (WQ) sensors in drinking water distribution networks (WDNs), with a specific focus on chlorine transport, decay, and reaction models. Such models are traditionally used as suitable proxies for WQ. The literature on this topic is well established, but has a key limitation: it utilizes simplified single-species decay and reaction models that do not capture WQ transients for nonlinear, multi-species interactions. This results in sensor placements that do not account for nonlinear WQ dynamics. Furthermore, as WQ simulations are parameterized by hydraulic profiles and demand patterns, the placement of sensors is often hydraulics-dependent. This study produces a simple algorithm that addresses the two aforementioned limitations. The presented algorithm is grounded in nonlinear dynamic system sciences and observability theory, and yields sensor placements that are robust to hydraulic changes. Thorough case studies on benchmark water networks are provided. The key findings provide practical recommendations for WDN operators.
- [9] arXiv:2411.04274 [pdf, html, other]
Title: Effective Capacity of a Battery Energy Storage System Captive to a Wind Farm
Subjects: Systems and Control (eess.SY); Applications (stat.AP)
Wind energy's role in the global electric grid is set to expand significantly. New York State alone anticipates offshore wind farms (WFs) contributing 9 GW by 2035. Integration of energy storage emerges as crucial for this advancement. In this study, we focus on a WF paired with a captive battery energy storage system (BESS). We aim to ascertain the capacity credit for a BESS with specified energy and power ratings. Unlike prior methods rooted in reliability theory, we define a power alignment function, which leads to a straightforward definition of capacity and incremental capacity for the BESS. We develop a solution method based on a linear programming formulation. Our analysis utilizes wind data collected by NYSERDA off Long Island's coast and load demand data from NYISO. Additionally, we present theoretical insights into BESS sizing and a key time-series property influencing BESS capacity, aiding in simulating wind and demand for estimating BESS energy requirements.
- [10] arXiv:2411.04364 [pdf, html, other]
Title: Efficient Position Determination of Highly Directional RF Emitters via Iterated Beampattern Analysis
Comments: 9 pages, 16 figures, submitted to IEEE Transactions on Aerospace and Electronic Systems
Subjects: Signal Processing (eess.SP)
Received signal strength (RSS) information has seldom been incorporated in the direct position determination (DPD) method of passive radio emitter localization to date. Further, the common use of directional emitters modulates the RSS such that omnidirectional assumptions would dramatically decrease accuracy. This paper introduces a new DPD approach utilizing an RSS-enhanced adaptive beamforming method that demonstrates performance on par with or better than the state of the art at very low SNR for omnidirectional emitters. The technique is then applied to directional emitters, taking the imposed RSS modulation into account using a beampattern library, significantly improving localization region confidence as compared to omnidirectional assumption approaches. This is the first approach to date in the open literature for localizing directional emitters.
- [11] arXiv:2411.04370 [pdf, html, other]
Title: Non-Reciprocal Beyond Diagonal RIS: Multiport Network Models and Performance Benefits in Full-Duplex Systems
Comments: 13 pages, 11 figures, submitted to IEEE journal for publication
Subjects: Signal Processing (eess.SP)
Beyond diagonal reconfigurable intelligent surfaces (BD-RIS) is a new advance in RIS techniques that introduces reconfigurable inter-element connections to generate scattering matrices not limited to being diagonal. BD-RIS has been recently proposed and proven to have benefits in enhancing channel gain and enlarging coverage in wireless communications. Uniquely, BD-RIS enables reciprocal and non-reciprocal architectures characterized by symmetric and non-symmetric scattering matrices. However, the performance benefits and new use cases enabled by non-reciprocal BD-RIS for wireless systems remain unexplored. This work takes a first step toward closing this knowledge gap and studies the non-reciprocal BD-RIS in full-duplex systems and its performance benefits over reciprocal counterparts. We start by deriving a general RIS-aided full-duplex system model using multiport circuit theory, followed by a simplified channel model based on physically consistent assumptions. With the considered channel model, we investigate the effect of BD-RIS non-reciprocity and identify the theoretical conditions for reciprocal and non-reciprocal BD-RISs to simultaneously achieve the maximum received power of the signal of interest in the uplink and the downlink. Simulation results validate the theory and highlight the significant benefits offered by non-reciprocal BD-RIS in full-duplex systems. These gains arise from the non-reciprocity principle: a wave that hits the non-reciprocal BD-RIS from one direction is treated differently from a wave that hits it from the opposite direction. This enables an uplink user and a downlink user at different locations to optimally communicate with the same full-duplex base station via a non-reciprocal BD-RIS, which would not be possible with reciprocal surfaces.
- [12] arXiv:2411.04379 [pdf, html, other]
Title: A Pre-training Framework that Encodes Noise Information for Speech Quality Assessment
Subjects: Audio and Speech Processing (eess.AS)
Self-supervised learning (SSL) has grown in interest within the speech processing community, since it produces representations that are useful for many downstream tasks. SSL uses global and contextual methods to produce robust representations, where SSL even outperforms supervised models. Most self-supervised approaches, however, are limited to embedding information about, e.g., the phonemes, speaker identity, and emotion into the extracted representations, which become invariant to background sounds due to contrastive and auto-regressive learning. This is limiting because many downstream tasks leverage noise information to function accurately. Therefore, we propose a pre-training framework that learns information pertaining to background noise in a supervised manner, while jointly embedding speech information using a self-supervised strategy. We experiment with multiple encoders and show that our framework is useful for perceptual speech quality estimation, which relies on background cues. Our results show that the proposed approach improves performance with fewer parameters, in comparison to multiple baselines.
- [13] arXiv:2411.04382 [pdf, html, other]
Title: Holographic-Pattern Based Multi-User Beam Training in RHS-Aided Hybrid Near-Field and Far-Field Communications
Comments: 13 pages, 15 figures
Subjects: Signal Processing (eess.SP)
Reconfigurable holographic surfaces (RHSs) have been suggested as an energy-efficient solution for extremely large-scale arrays. By controlling the amplitude of RHS elements, high-gain directional holographic patterns can be achieved. However, the complexity of acquiring real-time channel state information (CSI) for beamforming is exceedingly high, particularly in large-scale RHS-assisted communications, where users may be distributed in the near-field region of the RHS. This paper proposes a one-shot multi-user beam training scheme in large-scale RHS-assisted systems applicable to both near and far fields. The proposed beam training scheme comprises two phases: angle search and distance search, both conducted simultaneously for all users. For the angle search, an RHS angular codebook is designed based on holographic principles so that each codeword covers multiple angles in both near-field and far-field regions, enabling simultaneous angular search for all users. For the distance search, we construct distance-adaptive codewords covering all candidate angles of users in real time by leveraging the additivity of holographic patterns, which differs from the traditional phased array case. Simulation results demonstrate that the proposed scheme achieves higher system throughput compared to traditional beam training schemes. The beam training accuracy approaches the upper bound of exhaustive search at a significantly reduced overhead.
- [14] arXiv:2411.04398 [pdf, html, other]
Title: Radio-Based Passive Target Tracking by a Mobile Receiver with Unknown Transmitter Position
Subjects: Signal Processing (eess.SP)
In this paper, we propose a radio-based passive target tracking algorithm using multipath measurements, including the angle of arrival and relative distance. We focus on a scenario in which a mobile receiver continuously receives radio signals from a transmitter located at an unknown position. The receiver utilizes multipath measurements extracted from the received signal to jointly localize the transmitter and the scatterers over time, with scatterers comprising a moving target and stationary objects that can reflect signals within the environment. We develop a comprehensive probabilistic model for the target tracking problem, incorporating the localization of the transmitter and scatterers, the identification of false alarms and missed detections in the measurements, and the association between scatterers and measurements. We employ a belief propagation approach to compute the posterior distributions of the positions of the scatterers and the transmitter. Additionally, we introduce a particle implementation for the belief propagation method. Simulation results demonstrate that our proposed algorithm outperforms existing benchmark methods in terms of target tracking accuracy.
- [15] arXiv:2411.04404 [pdf, html, other]
Title: Enhancing Bronchoscopy Depth Estimation through Synthetic-to-Real Domain Adaptation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Monocular depth estimation has shown promise in general imaging tasks, aiding in localization and 3D reconstruction. While effective in various domains, its application to bronchoscopic images is hindered by the lack of labeled data, challenging the use of supervised learning methods. In this work, we propose a transfer learning framework that leverages synthetic data with depth labels for training and adapts domain knowledge for accurate depth estimation in real bronchoscope data. Our network demonstrates improved depth prediction on real footage using domain adaptation compared to training solely on synthetic data, validating our approach.
- [16] arXiv:2411.04419 [pdf, html, other]
Title: Joint Discrete Antenna Positioning and Beamforming Optimization in Movable Antenna Enabled Full-Duplex ISAC Networks
Subjects: Signal Processing (eess.SP)
In this paper, we propose a full-duplex integrated sensing and communication (ISAC) system enabled by a movable antenna (MA). By leveraging the characteristic of MA that can increase the spatial diversity gain, the performance of the system can be enhanced. We formulate a problem of minimizing the total transmit power consumption via jointly optimizing the discrete position of MA elements, beamforming vectors, sensing signal covariance matrix and user transmit power. Given the significant coupling of optimization variables, the formulated problem presents a non-convex optimization challenge that poses difficulties for direct resolution. To address this challenging issue, the discrete binary particle swarm optimization (BPSO) algorithm framework is employed to solve the formulated problem. Specifically, the discrete positions of MA elements are first obtained by iteratively solving the fitness function. The difference-of-convex (DC) programming and successive convex approximation (SCA) are used to handle non-convex and rank-1 terms in the fitness function. Once the BPSO iteration is complete, the discrete positions of MA elements can be determined, and we can obtain the solutions for beamforming vectors, sensing signal covariance matrix and user transmit power. Numerical results demonstrate the superiority of the proposed system in reducing the total transmit power consumption compared with fixed antenna arrays.
- [17] arXiv:2411.04423 [pdf, html, other]
Title: Model Predictive Control Enabled UAV Trajectory Optimization and Secure Resource Allocation
Subjects: Signal Processing (eess.SP)
In this paper, we investigate a secure communication architecture based on an unmanned aerial vehicle (UAV), which enhances the security performance of the communication system through UAV trajectory optimization. We formulate a control problem of minimizing the UAV flight path and power consumption while maximizing the secure communication rate over an infinite horizon by jointly optimizing the UAV trajectory, transmit beamforming vector, and artificial noise (AN) vector. Given the non-uniqueness of the optimization objective and the significant coupling of the optimization variables, the problem is a non-convex optimization problem which is difficult to solve directly. To address this complex issue, an alternating-iteration technique is employed to decouple the optimization variables. Specifically, the problem is divided into three subproblems, i.e., UAV trajectory, transmit beamforming vector, and AN vector, which are solved alternately. Additionally, considering the susceptibility of the UAV trajectory to disturbances, the model predictive control (MPC) approach is applied to obtain the UAV trajectory and enhance the system robustness. Numerical results demonstrate the superiority of the proposed optimization algorithm in maintaining an accurate UAV trajectory and a high secure communication rate compared with other benchmark schemes.
- [18] arXiv:2411.04467 [pdf, html, other]
Title: A Distributionally Robust Control Strategy for Frequency Safety based on Koopman Operator Described System Model
Subjects: Systems and Control (eess.SY)
As the proportion of renewable energy and power electronics in the power system increases, modeling frequency dynamics under power deficits becomes more challenging. Although data-driven methods help mitigate these challenges, they are exposed to data noise and training errors, leading to uncertain prediction errors. To address uncertain and limited statistical information of prediction errors, we introduce a distributionally robust data-enabled emergency frequency control (DREFC) framework. It aims to ensure a high probability of frequency safety and allows for adjustable control conservativeness for decision makers. Specifically, DREFC solves a min-max optimization problem to find the optimal control that is robust to the distribution of prediction errors within a Wasserstein-distance-based ambiguity set. With an analytical approximation for the Value-at-Risk (VaR) constraints, we achieve a computationally efficient reformulation. Simulations demonstrate that DREFC ensures frequency safety, low control costs and low computation time.
- [19] arXiv:2411.04472 [pdf, other]
Title: Accurate Calculation of Switching Events in Electromagnetic Transient Simulation Considering State Variable Discontinuities
Subjects: Systems and Control (eess.SY)
Accurate calculation of switching events is important for electromagnetic transient simulation to obtain reliable results. The common presumption of continuous differential state variables could prevent accurate calculation, thus leading to unreliable results. This paper explores the accurate calculation of switching events without presuming continuous differential state variables. Methods are proposed to show that such calculation is possible. The feasibility and accuracy of the proposed methods are demonstrated and analyzed via numerical case studies.
- [20] arXiv:2411.04510 [pdf, html, other]
Title: Sliding Mode Roll Control of Active Suspension Electric Vehicles
Subjects: Systems and Control (eess.SY)
Vehicle roll control is a well-studied problem. One of the ubiquitous methods to mitigate vehicle rollover in the automobile industry is a mechanical anti-roll bar. However, with the advent of electric vehicles, rollover mitigation can be pursued using electric actuation. In this work, we study a roll control algorithm using sliding mode control for active suspension vehicles, where the actuation for the roll control signal is generated by electric motors independently at the four corners of the vehicle. This technology removes the need for mechanical actuation, which is often slower, and for an anti-roll bar to mitigate vehicle rollover. We provide an implementation of the proposed algorithm and conduct numerical experiments to validate its functionality and effectiveness. Specifically, we perform slalom and J-turn maneuvering tests on an active suspension electric vehicle with sliding mode roll control, which is shown to mitigate rollover by at least 50% compared to passive suspension vehicles while simultaneously maintaining rider comfort.
- [21] arXiv:2411.04511 [pdf, html, other]
Title: Improve the Fitting Accuracy of Deep Learning for the Nonlinear Schrödinger Equation Using Linear Feature Decoupling Method
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
We utilize the Feature Decoupling Distributed (FDD) method to enhance the capability of deep learning to fit the Nonlinear Schrödinger Equation (NLSE), significantly reducing the NLSE loss compared to a non-decoupling model.
- [22] arXiv:2411.04541 [pdf, html, other]
Title: Low Complexity Joint Chromatic Dispersion and Time/Frequency Offset Estimation Based on Fractional Fourier Transform
Comments: 5 pages, 5 figures, 1 table, accepted at ACPIPOC2024
Subjects: Signal Processing (eess.SP)
We propose and experimentally validate a joint estimation method for chromatic dispersion and time-frequency offset based on the fractional Fourier transform, which reduces computational complexity by more than 50% while keeping estimation accuracy.
- [23] arXiv:2411.04548 [pdf, html, other]
Title: Convergence and Robustness of Value and Policy Iteration for the Linear Quadratic Regulator
Comments: This work has been submitted to the European Control Conference 2025
Subjects: Systems and Control (eess.SY)
This paper revisits and extends the convergence and robustness properties of value and policy iteration algorithms for discrete-time linear quadratic regulator problems. In the model-based case, we extend current results concerning the region of exponential convergence of both algorithms. In the case where there is uncertainty on the value of the system matrices, we provide input-to-state stability results capturing the effect of model parameter uncertainties. Our findings offer new insights into these algorithms at the heart of several approximate dynamic programming schemes, highlighting their convergence and robustness behaviors. Numerical examples illustrate the significance of some of the theoretical results.
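For readers unfamiliar with value iteration in the discrete-time LQR setting, the following is a minimal sketch of the textbook Riccati recursion for a scalar system x_{k+1} = a x_k + b u_k with stage cost q x^2 + r u^2. It illustrates only the basic algorithm, not the paper's extended convergence or robustness results; the function names and the numerical values of a, b, q, r are assumptions chosen for illustration.

```python
def lqr_value_iteration(a, b, q, r, tol=1e-12, max_iter=1000):
    """Iterate the scalar Riccati recursion from P = 0 until it settles.

    P_next = q + a^2 P - (a b P)^2 / (r + b^2 P)
    """
    p = 0.0
    for _ in range(max_iter):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    raise RuntimeError("value iteration did not converge")


def lqr_gain(a, b, r, p):
    """Feedback gain K for the control law u = -K x given cost-to-go P."""
    return a * b * p / (r + b * b * p)


# Hypothetical open-loop unstable plant (|a| > 1) with unit weights.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
p = lqr_value_iteration(a, b, q, r)
k = lqr_gain(a, b, r, p)

print("P =", p)
print("K =", k)
print("closed loop stable:", abs(a - b * k) < 1.0)
```

For these values the fixed point satisfies P^2 - 1.44 P - 1 = 0, which gives a quick check that the iteration has reached the solution of the algebraic Riccati equation.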
- [24] arXiv:2411.04575 [pdf, html, other]
Title: Generative Semantic Communications with Foundation Models: Perception-Error Analysis and Semantic-Aware Power Allocation
Subjects: Signal Processing (eess.SP)
Generative foundation models can revolutionize the design of semantic communication (SemCom) systems, allowing high-fidelity exchange of semantic information at ultra-low rates. In this work, a generative SemCom framework with pretrained foundation models is proposed, where both uncoded forward-with-error and coded discard-with-error schemes are developed for the semantic decoder. To characterize the impact of transmission reliability on the perceptual quality of the regenerated signal, their mathematical relationship is analyzed from a rate-distortion-perception perspective and is proved to be non-decreasing. The semantic values are defined to measure the semantic information of multimodal semantic features accordingly. We also investigate semantic-aware power allocation problems aiming at power consumption minimization for ultra-low-rate and high-fidelity SemComs. To solve these problems, two semantic-aware power allocation methods are proposed by leveraging the non-decreasing property of the perception-error relationship. Numerically, perception-error functions and semantic values of semantic data streams under both schemes for image tasks are obtained based on the Kodak dataset. Simulation results show that our proposed semantic-aware method significantly outperforms conventional approaches, particularly in the channel-coded case (up to 90% power saving).
- [25] arXiv:2411.04581 [pdf, html, other]
-
Title: URLLC Networks enabled by STAR-RIS, Rate Splitting, and Multiple Antennas
Comments: Accepted at 2025 International Conference on Mobile and Miniaturized Terahertz Systems (ICMMTS)
Subjects: Signal Processing (eess.SP)
The challenges in dense ultra-reliable low-latency communication networks to deliver the required service to multiple devices are addressed by three main technologies: multiple antennas at the base station (MISO), rate splitting multiple access (RSMA) with private and common message encoding, and simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS). Careful resource allocation, encompassing beamforming and RIS optimization, is required to exploit the synergy between the three. We propose an alternating optimization-based algorithm, relying on minorization-maximization. Numerical results show that the achievable second-order max-min rates of the proposed scheme outperform the baselines significantly. MISO, RSMA, and STAR-RIS all contribute to enabling ultra-reliable low-latency communication (URLLC).
- [26] arXiv:2411.04595 [pdf, html, other]
-
Title: TexLiverNet: Leveraging Medical Knowledge and Spatial-Frequency Perception for Enhanced Liver Tumor Segmentation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Integrating textual data with imaging in liver tumor segmentation is essential for enhancing diagnostic accuracy. However, current multi-modal medical datasets offer only general text annotations, lacking lesion-specific details critical for extracting nuanced features, especially for fine-grained segmentation of tumor boundaries and small lesions. To address these limitations, we developed datasets with lesion-specific text annotations for liver tumors and introduced the TexLiverNet model. TexLiverNet employs an agent-based cross-attention module that integrates text features efficiently with visual features, significantly reducing computational costs. Additionally, enhanced spatial and adaptive frequency domain perception is proposed to precisely delineate lesion boundaries, reduce background interference, and recover fine details in small lesions. Comprehensive evaluations on public and private datasets demonstrate that TexLiverNet achieves superior performance compared to current state-of-the-art methods.
- [27] arXiv:2411.04611 [pdf, html, other]
-
Title: Compressive Spectrum Sensing with 1-bit ADCs
Subjects: Signal Processing (eess.SP)
Efficient wideband spectrum sensing (WSS) is essential for managing spectrum scarcity in wireless communications. However, existing compressed sensing (CS)-based WSS methods require high sampling rates and power consumption, particularly with high-precision analog-to-digital converters (ADCs). Although 1-bit CS with low-precision ADCs can mitigate these demands, most approaches still depend on multi-user cooperation and prior sparsity information, which are often unavailable in WSS scenarios. This paper introduces a non-cooperative WSS method using multicoset sampling with 1-bit ADCs to achieve sub-Nyquist sampling without requiring sparsity knowledge. We analyze the impact of 1-bit quantization on multiband signals, then apply eigenvalue decomposition to isolate the signal subspace from noise, enabling spectrum support estimation without signal reconstruction. This approach provides a power-efficient solution for WSS that eliminates the need for cooperation and prior information.
- [28] arXiv:2411.04648 [pdf, other]
-
Title: Bayesian reconstruction of sparse raster-scanned mid-infrared optoacoustic signals enables fast, label-free chemical microscopy
Constantin Berger, Myeongseop Kim, Lukas Scheel-Platz, Vasilis Ntziachristos, Dominik Jüstel, Miguel A. Pleitez
Subjects: Image and Video Processing (eess.IV)
Hyperspectral optoacoustic microscopy (OAM) enables obtaining images with label-free biomolecular contrast, offering excellent perspectives as a diagnostic tool to assess freshly excised and unprocessed tissues. However, time-consuming raster-scanning image formation currently limits the translation potential of OAM into the clinical setting (for instance, in intraoperative histopathological assessments), where micrographs of excised tissue need to be taken within a few minutes for fast clinical decision-making. Here, we present a non-data-driven computational framework tailored to enable fast OAM by sparse data acquisition and model-based image reconstruction, termed Bayesian raster-computed optoacoustic microscopy (BayROM). Unlike conventional machine learning, BayROM doesn't require training datasets, but instead, it employs 1) optomechanical system properties to define a forward model and 2) prior knowledge of the imaged samples to facilitate reconstructing images based on the sparsely acquired data. We show that BayROM enables acquiring micrographs ten times faster and with structural similarity (SSIM) indices greater than 0.93 compared to conventional raster scanning microscopy, thus facilitating the clinical translation of OAM for fast, label-free intraoperative histopathology.
- [29] arXiv:2411.04675 [pdf, html, other]
-
Title: Advancing Multi-Connectivity in Satellite-Terrestrial Integrated Networks: Architectures, Challenges, and Applications
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Systems and Control (eess.SY)
Multi-connectivity (MC) in satellite-terrestrial integrated networks (STINs), included in 3GPP standards, is regarded as a promising technology for future networks. The significant advantages of MC in improving coverage, communication, and sensing through satellite-terrestrial collaboration have sparked widespread interest. In this article, we first introduce three fundamental deployment architectures of MC systems in STINs, including multi-satellite, single-satellite single-base-station, and multi-satellite multi-base-station configurations. Considering the emerging but still evolving satellite networking, we explore system design challenges such as satellite networking schemes, e.g., cell-free and multi-tier satellite networks. Then, key technical challenges that severely influence the quality of mutual communications, including beamforming, channel estimation, and synchronization, are discussed. Furthermore, typical applications such as coverage enhancement, traffic offloading, collaborative sensing, and low-altitude communication are demonstrated, followed by a case study comparing coverage performance in MC and single-connectivity (SC) configurations. Several essential future research directions for MC in STINs are presented to facilitate further exploration.
- [30] arXiv:2411.04689 [pdf, html, other]
-
Title: Over-the-Air DPD and Reciprocity Calibration in Massive MIMO and Beyond
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
In this paper, we study an over-the-air (OTA) approach for digital pre-distortion (DPD) and reciprocity calibration in massive multiple-input-multiple-output systems. In particular, we consider a memory-less non-linearity model for the base station (BS) transmitters and propose a methodology to linearize the transmitters and perform the calibration by using mutual coupling OTA measurements between BS antennas. We show that by only using the OTA-based data, we can linearize the transmitters and design the calibration to compensate for both the non-linearity and non-reciprocity of BS transceivers effectively. This alleviates the requirement to have dedicated hardware modules for transceiver characterization. Moreover, exploiting the results of the DPD linearization step, our calibration method may be formulated in terms of closed-form transformations, achieving a significant complexity reduction over state-of-the-art methods, which usually rely on costly iterative computations. Simulation results showcase the potential of our approach in terms of the calibration matrix estimation error and downlink data rates when applying zero-forcing precoding after using our OTA-based DPD and calibration method.
- [31] arXiv:2411.04702 [pdf, html, other]
-
Title: Large Intelligent Surfaces with Low-End Receivers: From Scaling to Antenna and Panel Selection
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
We analyze the performance of a large intelligent surface (LIS) with hardware distortion at its RX-chains. In particular, we consider the memory-less polynomial model for non-ideal hardware and derive analytical expressions for the signal to noise plus distortion ratio after applying maximum ratio combining (MRC) at the LIS. We also study the effect of back-off and automatic gain control on the RX-chains. The derived expressions enable us to evaluate the scalability of LIS when hardware impairments are present. We also study the cost of assuming ideal hardware by analyzing the minimum scaling required to achieve the same performance with non-ideal hardware. Then, we exploit the analytical expressions to propose optimized antenna selection schemes for LIS and we show that such schemes can improve the performance significantly. In particular, the antenna selection schemes allow the LIS to use a lower number of non-ideal RX-chains for signal reception while maintaining good performance. We also consider a more practical case where the LIS is deployed as a grid of multi-antenna panels, and we propose panel selection schemes to optimize the complexity-performance trade-offs and improve the overall system efficiency.
- [32] arXiv:2411.04753 [pdf, html, other]
-
Title: Efficient Channel Estimation With Shorter Pilots in RIS-Aided Communications: Using Array Geometries and Interference Statistics
Comments: 16 pages, 9 figures, to appear in IEEE Transactions on Wireless Communications
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
Accurate estimation of the cascaded channel from a user equipment (UE) to a base station (BS) via each reconfigurable intelligent surface (RIS) element is critical to realizing the full potential of the RIS's ability to control the overall channel. The number of parameters to be estimated is equal to the number of RIS elements, requiring an equal number of pilots unless an underlying structure can be identified. In this paper, we show how the spatial correlation inherent in the different RIS channels provides this desired structure. We first optimize the RIS phase-shift pattern using a much-reduced pilot length (determined by the rank of the spatial correlation matrices) to minimize the mean square error (MSE) in the channel estimation under electromagnetic interference. In addition to considering the linear minimum MSE (LMMSE) channel estimator, we propose a novel channel estimator that requires only knowledge of the array geometry while not requiring any user-specific statistical information. We call this the reduced-subspace least squares (RS-LS) estimator and optimize the RIS phase-shift pattern for it. This novel estimator significantly outperforms the conventional LS estimator. For both the LMMSE and RS-LS estimators, the proposed optimized RIS configurations result in significant channel estimation improvements over the benchmarks.
- [33] arXiv:2411.04782 [pdf, html, other]
-
Title: An Effective Pipeline for Whole-Slide Image Glomerulus Segmentation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Whole-slide image (WSI) glomerulus segmentation is essential for accurately diagnosing kidney diseases. In this work, we propose a practical pipeline for glomerulus segmentation that effectively enhances both patch-level and WSI-level segmentation tasks. Our approach leverages stitching on overlapping patches, increasing the detection coverage, especially when glomeruli are located near patch image borders. In addition, we conduct comprehensive evaluations of different segmentation models across two large and diverse datasets with over 30K glomerulus annotations. Experimental results demonstrate that models using our pipeline outperform the previous state-of-the-art method, achieving superior results across both datasets and setting a new benchmark for glomerulus segmentation in WSIs. The code and pre-trained models are available at this https URL.
- [34] arXiv:2411.04789 [pdf, other]
-
Title: Distributed Attack-Resilient Platooning Against False Data Injection
Subjects: Systems and Control (eess.SY)
This paper presents a novel distributed vehicle platooning control and coordination strategy. We propose a distributed predecessor-follower CACC scheme that allows choosing an arbitrarily small inter-vehicle distance while guaranteeing that no rear-end collisions occur, even in the presence of undetected cyber-attacks on the communication channels such as false data injection. The safety guarantees of the CACC policy are derived by combining a sensor-based ACC policy that explicitly accounts for actuator saturation, and a communication-based predictive term that has state-dependent limits on its control authority, thus containing the effects of an unreliable communication channel. An undetected attack may, however, still be able to degrade platooning performance. To mitigate it, we propose a tailored Kalman observer-based attack detection algorithm that initially triggers a switch from the CACC policy to the ACC policy. Subsequently, by relying on a high-level coordinator, our strategy allows isolating a compromised vehicle from the platoon formation by reconfiguring the platoon topology itself. The coordinator can also handle merging and splitting requests. We compare our algorithm in simulation against a state-of-the-art distributed MPC scheme and we extensively test our full method in practice on a real system, a team of scaled-down car-like robots. Furthermore, we share the code to run both the simulations and robotic experiments.
- [35] arXiv:2411.04791 [pdf, html, other]
-
Title: A Continuification-Based Control Solution for Large-Scale Shepherding
Subjects: Systems and Control (eess.SY)
In this paper, we address the large-scale shepherding control problem using a continuification-based strategy. We consider a scenario in which a large group of follower agents (targets) must be confined within a designated goal region through indirect interactions with a controllable set of leader agents (herders). Our approach transforms the microscopic agent-based dynamics into a macroscopic continuum model via partial differential equations (PDEs). This formulation enables efficient, scalable control design for the herders' behavior, with guarantees of global convergence. Numerical and experimental validations in a mixed-reality swarm robotics framework demonstrate the method's effectiveness.
- [36] arXiv:2411.04833 [pdf, html, other]
-
Title: Finding Control Invariant Sets via Lipschitz Constants of Linear Programs
Subjects: Systems and Control (eess.SY)
Control invariant sets play an important role in safety-critical control and find broad application in numerous fields such as obstacle avoidance for mobile robots. However, finding valid control invariant sets of dynamical systems under input limitations is notoriously difficult. We present an approach to safely expand an initial set while always guaranteeing that the set is control invariant. Specifically, we define an expansion law for the boundary of a set and check for control invariance using Linear Programs (LPs). To verify control invariance on a continuous domain, we leverage recently proposed Lipschitz constants of LPs to transform the problem of continuous verification into a finite number of LPs. Using concepts from differentiable optimization, we derive the safe expansion law of the control invariant set and show how it can be interpreted as a second invariance problem in the space of possible boundaries. Finally, we show how the obtained set can be used to obtain a minimally invasive safety filter in a Control Barrier Function (CBF) framework. Our work is supported by theoretical results as well as numerical examples.
- [37] arXiv:2411.04844 [pdf, html, other]
-
Title: Differentiable Gaussian Representation for Incomplete CT Reconstruction
Shaokai Wu, Yuxiang Lu, Wei Ji, Suizhi Huang, Fengyu Yang, Shalayiding Sirejiding, Qichen He, Jing Tong, Yanbiao Ji, Yue Ding, Hongtao Lu
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Incomplete Computed Tomography (CT) benefits patients by reducing radiation exposure. However, reconstructing high-fidelity images from limited views or angles remains challenging due to the ill-posed nature of the problem. Deep Learning Reconstruction (DLR) methods have shown promise in enhancing image quality, but the paradox between training data diversity and high generalization ability remains unsolved. In this paper, we propose a novel Gaussian Representation for Incomplete CT Reconstruction (GRCT) without the usage of any neural networks or full-dose CT data. Specifically, we model the 3D volume as a set of learnable Gaussians, which are optimized directly from the incomplete sinogram. Our method can be applied to multiple views and angles without changing the architecture. Additionally, we propose a differentiable Fast CT Reconstruction method for efficient clinical usage. Extensive experiments on multiple datasets and settings demonstrate significant improvements in reconstruction quality metrics and high efficiency. We plan to release our code as open-source.
- [38] arXiv:2411.04864 [pdf, other]
-
Title: Voltage Support Capability Analysis of Grid-Forming Inverters with Current-Limiting Control Under Asymmetrical Grid Faults
Subjects: Systems and Control (eess.SY)
Voltage support capability is critical for grid-forming (GFM) inverters with current-limiting control (CLC) during grid faults. Despite the findings on the voltage support for symmetrical grid faults, its applicability to more common but complex asymmetrical grid faults has yet to be verified rigorously. This letter fills the gap in the voltage support capability analysis for asymmetrical grid faults by establishing and analyzing positive- and negative-sequence equivalent circuit models, where the virtual impedance is adopted to emulate various CLCs. It is discovered that matching the phase angle of the virtual impedance, emulated by the CLC, with that of the composed impedance from the capacitor to the fault location can maximize the voltage support capability of GFM inverters under asymmetrical grid faults. Rigorous theoretical analysis and experimental results verify this conclusion.
New submissions (showing 38 of 38 entries)
- [39] arXiv:2405.00695 (cross-list from cs.RO) [pdf, html, other]
-
Title: Joint torques prediction of a robotic arm using neural networks
Comments: 6 pages, 5 figures, submitted to CASE 2024
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)
Accurate dynamic models are crucial for many robotic applications. Traditional approaches to deriving these models are based on the application of Lagrangian or Newtonian mechanics. Although these methods provide a good insight into the physical behaviour of the system, they rely on the exact knowledge of parameters such as inertia, friction and joint flexibility. In addition, the system is often affected by uncertain and nonlinear effects, such as saturation and dead zones, which can be difficult to model. A popular alternative is the application of Machine Learning (ML) techniques - e.g., Neural Networks (NNs) - in the context of a "black-box" methodology. This paper reports on our experience with this approach for a real-life 6 degrees of freedom (DoF) manipulator. Specifically, we considered several NN architectures: single NN, multiple NNs, and cascade NN. We compared the performance of the system by using different policies for selecting the NN hyperparameters. Our experiments reveal that the best accuracy and performance are obtained by a cascade NN, in which we encode our prior physical knowledge about the dependencies between joints, complemented by an appropriate optimisation of the hyperparameters.
- [40] arXiv:2411.04337 (cross-list from cs.SD) [pdf, html, other]
-
Title: Model and Deep learning based Dynamic Range Compression Inversion
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Dynamic Range Compression (DRC) is a popular audio effect used to control the dynamic range of a signal. Inverting DRC can also help to restore the original dynamics to produce new mixes and/or to improve the overall quality of the audio signal. Since state-of-the-art DRC inversion techniques either ignore parameters or require precise parameters that are difficult to estimate, we fill the gap by combining a model-based approach with neural networks for DRC inversion. To this end, depending on the scenario, we use different neural networks to estimate DRC parameters. Then, a model-based inversion is completed to restore the original audio signal. Our experimental results show the effectiveness and robustness of the proposed method in comparison to several state-of-the-art methods when applied to two music datasets.
- [41] arXiv:2411.04366 (cross-list from cs.SD) [pdf, html, other]
-
Title: The Concatenator: A Bayesian Approach To Real Time Concatenative Musaicing
Comments: 12 pages, 6 figures, Accepted for Publication in The International Society for Music Information Retrieval Proceedings, 2024
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
We present "The Concatenator," a real time system for audio-guided concatenative synthesis. Similarly to Driedger et al.'s "musaicing" (or "audio mosaicing") technique, we concatenate a set number of windows within a corpus of audio to re-create the harmonic and percussive aspects of a target audio stream. Unlike Driedger's NMF-based technique, however, we instead use an explicitly Bayesian point of view, where corpus window indices are hidden states and the target audio stream is an observation. We use a particle filter to infer the best hidden corpus states in real-time. Our transition model includes a tunable parameter to control the time-continuity of corpus grains, and our observation model allows users to prioritize how quickly windows change to match the target. Because the computational complexity of the system is independent of the corpus size, our system scales to corpora that are hours long, which is an important feature in the age of vast audio data collections. Within The Concatenator module itself, composers can vary grain length, fit to target, and pitch shift in real time while reacting to the sounds they hear, enabling them to rapidly iterate ideas. To conclude our work, we evaluate our system with extensive quantitative tests of the effects of parameters, as well as a qualitative evaluation with artistic insights. Based on the quality of the results, we believe the real-time capability unlocks new avenues for musical expression and control, suitable for live performance and modular synthesis integration, which furthermore represents an essential breakthrough in concatenative synthesis technology.
- [42] arXiv:2411.04376 (cross-list from cs.LG) [pdf, html, other]
-
Title: Game-Theoretic Defenses for Robust Conformal Prediction Against Adversarial Attacks in Medical Imaging
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Image and Video Processing (eess.IV)
Adversarial attacks pose significant threats to the reliability and safety of deep learning models, especially in critical domains such as medical imaging. This paper introduces a novel framework that integrates conformal prediction with game-theoretic defensive strategies to enhance model robustness against both known and unknown adversarial perturbations. We address three primary research questions: constructing valid and efficient conformal prediction sets under known attacks (RQ1), ensuring coverage under unknown attacks through conservative thresholding (RQ2), and determining optimal defensive strategies within a zero-sum game framework (RQ3). Our methodology involves training specialized defensive models against specific attack types and employing maximum and minimum classifiers to aggregate defenses effectively. Extensive experiments conducted on the MedMNIST datasets, including PathMNIST, OrganAMNIST, and TissueMNIST, demonstrate that our approach maintains high coverage guarantees while minimizing prediction set sizes. The game-theoretic analysis reveals that the optimal defensive strategy often converges to a singular robust model, outperforming uniform and simple strategies across all evaluated datasets. This work advances the state-of-the-art in uncertainty quantification and adversarial robustness, providing a reliable mechanism for deploying deep learning models in adversarial environments.
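For readers unfamiliar with conformal prediction sets, the following is a minimal sketch of the generic split conformal procedure on softmax outputs. It is not the paper's attack-aware or game-theoretic method. The function names, the score `1 - p(true class)`, and the toy calibration data are all assumptions made for illustration.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Return the split-conformal threshold from calibration data."""
    n = len(cal_labels)
    # Nonconformity score: one minus the probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank, ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points: every class is kept
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, threshold):
    """Boolean mask of shape (num_samples, num_classes): True means 'in the set'."""
    return (1.0 - test_probs) <= threshold

# Toy calibration data (hypothetical): 4 samples, 3 classes.
cal_probs = np.array([[0.90, 0.05, 0.05],
                      [0.10, 0.80, 0.10],
                      [0.20, 0.20, 0.60],
                      [0.30, 0.40, 0.30]])
cal_labels = np.array([0, 1, 2, 0])

q = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
sets = prediction_sets(np.array([[0.7, 0.2, 0.1]]), q)
```

A smaller `alpha` raises the threshold and therefore enlarges the prediction sets, which is the coverage versus set-size trade-off the abstract refers to.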
- [43] arXiv:2411.04452 (cross-list from quant-ph) [pdf, html, other]
-
Title: Optimal Allocation of Pauli Measurements for Low-rank Quantum State Tomography
Subjects: Quantum Physics (quant-ph); Signal Processing (eess.SP); Optimization and Control (math.OC)
The process of reconstructing quantum states from experimental measurements, accomplished through quantum state tomography (QST), plays a crucial role in verifying and benchmarking quantum devices. A key challenge of QST is to find out how the accuracy of the reconstruction depends on the number of state copies used in the measurements. When multiple measurement settings are used, the total number of state copies is determined by multiplying the number of measurement settings with the number of repeated measurements for each setting. Due to statistical noise intrinsic to quantum measurements, a large number of repeated measurements is often used in practice. However, recent studies have shown that even with single-sample measurements--where only one measurement sample is obtained for each measurement setting--high accuracy QST can still be achieved with a sufficiently large number of different measurement settings. In this paper, we establish a theoretical understanding of the trade-off between the number of measurement settings and the number of repeated measurements per setting in QST. Our focus is primarily on low-rank density matrix recovery using Pauli measurements. We delve into the global landscape underlying the low-rank QST problem and demonstrate that the joint consideration of measurement settings and repeated measurements ensures a bounded recovery error for all second-order critical points, to which optimization algorithms tend to converge. This finding suggests the advantage of minimizing the number of repeated measurements per setting when the total number of state copies is held fixed. Additionally, we prove that the Wirtinger gradient descent algorithm can converge to the region of second-order critical points with a linear convergence rate. We have also performed numerical experiments to support our theoretical findings.
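The measurement budget described in the abstract above can be written compactly. The symbols $K$ (number of measurement settings), $M$ (repeated measurements per setting), and $N$ (total state copies) are notation assumed here for illustration and may differ from the paper's.

```latex
N = K \cdot M,
\qquad
\text{single-sample regime: } M = 1 \;\Rightarrow\; N = K .
```

With $N$ held fixed, reducing $M$ increases $K = N / M$, which is the allocation the abstract argues in favor of.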
- [44] arXiv:2411.04494 (cross-list from cs.RO) [pdf, html, other]
-
Title: Online Omnidirectional Jumping Trajectory Planning for Quadrupedal Robots on Uneven Terrains
Linzhu Yue, Zhitao Song, Jinhu Dong, Zhongyu Li, Hongbo Zhang, Lingwei Zhang, Xuanqi Zeng, Koushil Sreenath, Yun-hui Liu
Comments: Submitted to IJRR
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Natural terrain complexity often necessitates agile movements like jumping in animals to improve traversal efficiency. To enable similar capabilities in quadruped robots, complex real-time jumping maneuvers are required. Current research does not adequately address the problem of online omnidirectional jumping and neglects the robot's kinodynamic constraints during trajectory generation. This paper proposes a general and complete cascade online optimization framework for omnidirectional jumping for quadruped robots. Our solution systematically encompasses jumping trajectory generation, a trajectory tracking controller, and a landing controller. It also incorporates environmental perception to navigate obstacles that standard locomotion cannot bypass, such as jumping from high platforms. We introduce a novel jumping plane to parameterize omnidirectional jumping motion and formulate a tightly coupled optimization problem accounting for the kinodynamic constraints, simultaneously optimizing CoM trajectory, Ground Reaction Forces (GRFs), and joint states. To meet the online requirements, we propose an accelerated evolutionary algorithm as the trajectory optimizer to address the complexity of kinodynamic constraints. To ensure stability and accuracy in environmental perception post-landing, we introduce a coarse-to-fine relocalization method that combines global Branch and Bound (BnB) search with Maximum a Posteriori (MAP) estimation for precise positioning during navigation and jumping. The proposed framework achieves jump trajectory generation in approximately 0.1 seconds with a warm start and has been successfully validated on two quadruped robots on uneven terrains. Additionally, we extend the framework's versatility to humanoid robots.
- [45] arXiv:2411.04515 (cross-list from physics.med-ph) [pdf, other]
-
Title: Effect of the geometry of butt-joint implant-supported restorations on the fatigue life of prosthetic screws
Journal-ref: The Journal of Prosthetic Dentistry (2022) 127: 477
Subjects: Medical Physics (physics.med-ph); Systems and Control (eess.SY)
Statement of problem. Dental implant geometry affects the mechanical performance and fatigue behavior of butt-joint implant-supported restorations. However, failure of the implant component has been generally studied by ignoring the prosthetic screw, which is frequently the critical restoration component. Purpose. To evaluate the effect of 3 main implant geometric parameters: the implant body diameter, the platform diameter, and the implant-abutment connection type (external versus internal butt-joint) on the fatigue life of the prosthetic screw. The experimental values were further compared with the theoretical ones obtained by using a previously published methodology. Material and methods. Four different designs of direct-to-implant dental restorations from the manufacturer BTI were tested. Forty-eight fatigue tests were performed in an axial fatigue testing machine according to ISO 14801. Linear regression models, 95% confidence interval bands for the linear regression, and 95% prediction intervals of the fatigue load-life results were obtained and compared through an analysis of covariance to determine the influence of the 3 parameters under study on the fatigue behavior. Results. Linear regression models showed a statistical difference when the implant body diameter was increased by 1 mm; an average 3.5-fold increase in fatigue life was observed. Increasing the implant abutment connection diameter by 1.4 mm also showed a significant difference, leading to 7-fold longer fatigue life on average. No significant statistical evidence was found to demonstrate a difference in fatigue life between internal and external connections. Conclusions. Increasing the implant platform and body diameter significantly improved the fatigue life of the screw, whereas external and internal connections provided similar results. In addition, experimental results proved the accuracy of the fatigue life prediction methodology.
- [46] arXiv:2411.04568 (cross-list from cs.HC) [pdf, html, other]
-
Title: Dynamic-Attention-based EEG State Transition Modeling for Emotion Recognition
Comments: 14 pages, 6 figures
Subjects: Human-Computer Interaction (cs.HC); Signal Processing (eess.SP); Neurons and Cognition (q-bio.NC)
Electroencephalogram (EEG)-based emotion decoding can objectively quantify people's emotional state and has broad application prospects in human-computer interaction and early detection of emotional disorders. Recently emerging deep learning architectures have significantly improved the performance of EEG emotion decoding. However, existing methods still fall short of fully capturing the complex spatiotemporal dynamics of neural signals, which are crucial for representing emotion processing. This study proposes a Dynamic-Attention-based EEG State Transition (DAEST) modeling method to characterize EEG spatiotemporal dynamics. The model extracts spatiotemporal components of EEG that represent multiple parallel neural processes and estimates dynamic attention weights on these components to capture transitions in brain states. The model is optimized within a contrastive learning framework for cross-subject emotion recognition. The proposed method achieved state-of-the-art performance on three publicly available datasets: FACED, SEED, and SEED-V. It achieved 75.4% accuracy in the binary classification of positive and negative emotions and 59.3% in nine-class discrete emotion classification on the FACED dataset, 88.1% in the three-class classification of positive, negative, and neutral emotions on the SEED dataset, and 73.6% in five-class discrete emotion classification on the SEED-V dataset. The learned EEG spatiotemporal patterns and dynamic transition properties offer valuable insights into neural dynamics underlying emotion processing.
- [47] arXiv:2411.04570 (cross-list from cs.LG) [pdf, html, other]
-
Title: Higher-Order GNNs Meet Efficiency: Sparse Sobolev Graph Neural NetworksJhony H. Giraldo, Aref Einizade, Andjela Todorovic, Jhon A. Castro-Correa, Mohsen Badiey, Thierry Bouwmans, Fragkiskos D. MalliarosJournal-ref: IEEE Transactions on Signal and Information Processing over Networks, 2024Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Graph Neural Networks (GNNs) have shown great promise in modeling relationships between nodes in a graph, but capturing higher-order relationships remains a challenge for large-scale networks. Previous studies have primarily attempted to utilize the information from higher-order neighbors in the graph, involving the incorporation of powers of the shift operator, such as the graph Laplacian or adjacency matrix. This approach comes with a trade-off in terms of increased computational and memory demands. Relying on graph spectral theory, we make a fundamental observation: the regular and the Hadamard power of the Laplacian matrix behave similarly in the spectrum. This observation has significant implications for capturing higher-order information in GNNs for various tasks such as node classification and semi-supervised learning. Consequently, we propose a novel graph convolutional operator based on the sparse Sobolev norm of graph signals. Our approach, known as Sparse Sobolev GNN (S2-GNN), employs Hadamard products between matrices to maintain the sparsity level in graph representations. S2-GNN utilizes a cascade of filters with increasing Hadamard powers to generate a diverse set of functions. We theoretically analyze the stability of S2-GNN to show the robustness of the model against possible graph perturbations. We also conduct a comprehensive evaluation of S2-GNN across various graph mining, semi-supervised node classification, and computer vision tasks. In particular use cases, our algorithm demonstrates competitive performance compared to state-of-the-art GNNs in terms of performance and running time.
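As a rough illustration of the sparsity argument in the abstract above, the following sketch compares the regular (matrix) power and the Hadamard (elementwise) power of a graph Laplacian. The toy path graph, the power of 3, and all function names are assumptions made for this example; it does not reproduce the S2-GNN operator itself.

```python
# Illustrative sketch only: a toy comparison of matrix powers and Hadamard
# powers of a graph Laplacian. Graph, power, and helper names are hypothetical.

def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of an undirected path graph with n nodes."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        L[i][i] += 1.0
        L[i + 1][i + 1] += 1.0
        L[i][i + 1] -= 1.0
        L[i + 1][i] -= 1.0
    return L

def matmul(A, B):
    """Dense product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matrix_power(A, p):
    """Regular power A^p for an integer p >= 1."""
    out = A
    for _ in range(p - 1):
        out = matmul(out, A)
    return out

def hadamard_power(A, p):
    """Elementwise power: every entry is raised to p, so zeros stay zero."""
    return [[a ** p for a in row] for row in A]

def nnz(A):
    """Number of nonzero entries."""
    return sum(1 for row in A for a in row if a != 0.0)

L = path_laplacian(8)
print("nonzeros in L:             ", nnz(L))
print("nonzeros in matrix power:  ", nnz(matrix_power(L, 3)))
print("nonzeros in Hadamard power:", nnz(hadamard_power(L, 3)))
```

The Hadamard power keeps exactly the sparsity pattern of the original Laplacian, whereas the matrix power fills in entries for multi-hop neighbors, which is the memory and compute cost the abstract refers to.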
- [48] arXiv:2411.04573 (cross-list from cs.CL) [pdf, html, other]
-
Title: Multistage Fine-tuning Strategies for Automatic Speech Recognition in Low-resource LanguagesSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
This paper presents a novel multistage fine-tuning strategy designed to enhance automatic speech recognition (ASR) performance in low-resource languages using OpenAI's Whisper model. In this approach, we aim to build an ASR model for languages with limited digital resources by sequentially adapting the model across linguistically similar languages. We tested this approach on the Malasar language, a Dravidian language spoken by approximately ten thousand people in the Western Ghats of South India. The Malasar language faces critical challenges for technological intervention due to its lack of a native script and absence of digital or spoken data resources. Working in collaboration with Wycliffe India and Malasar community members, we created a spoken Malasar corpus paired with transcription in the script of Tamil, a closely related major language. To build an ASR model for Malasar, we first build an intermediate Tamil ASR model, leveraging the higher availability of annotated Tamil speech. This intermediate model is subsequently fine-tuned on Malasar data, allowing for more effective ASR adaptation despite limited resources. The multistage fine-tuning strategy demonstrated significant improvements over direct fine-tuning on Malasar data alone, achieving a word error rate (WER) of 51.9%, a 4.5% absolute reduction compared to the direct fine-tuning method. A further WER reduction to 47.3% was achieved through punctuation removal in post-processing, which addresses formatting inconsistencies that impact evaluation. Our results underscore the effectiveness of sequential multistage fine-tuning combined with targeted post-processing as a scalable strategy for ASR system development in low-resource languages, especially where linguistic similarities can be leveraged to bridge gaps in training data.
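To make the effect of punctuation removal on WER concrete, here is a minimal sketch of a word-level WER computation with an optional punctuation-stripping step. The example sentences and function names are hypothetical, and `string.punctuation` covers ASCII punctuation only, which is an assumption; the authors' actual scoring pipeline is not described in the abstract.

```python
import string

def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists
    (substitutions, insertions, and deletions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)

def strip_punctuation(text):
    """Remove ASCII punctuation (assumption: the scripts involved may need more)."""
    return text.translate(str.maketrans("", "", string.punctuation))

# Hypothetical example: the words match, only the punctuation differs.
reference = "hello world how are you"
hypothesis = "hello, world how are you?"

print(wer(reference, hypothesis))  # 2 of 5 words count as substitutions
print(wer(strip_punctuation(reference), strip_punctuation(hypothesis)))
```

In this toy case the raw WER is 0.4 and drops to 0.0 after stripping, because tokens such as "hello," and "hello" no longer count as different words.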
- [49] arXiv:2411.04574 (cross-list from cs.IT) [pdf, html, other]
-
Title: RIS-Assisted Space Shift Keying with Non-Ideal Transceivers and Greedy DetectionComments: 12 pages, 8 figuresSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Reconfigurable intelligent surfaces (RIS) and index modulation (IM) represent key technologies for enabling reliable wireless communication with high energy efficiency. However, to fully take advantage of these technologies in practical deployments, comprehending the impact of the non-ideal nature of the underlying transceivers is paramount. In this context, this paper introduces two RIS-assisted IM communication models, in which the RIS is part of the transmitter and space-shift keying (SSK) is employed for IM, and assesses their performance in the presence of hardware impairments. In the first model, the RIS acts as a passive reflector only, reflecting the oncoming SSK modulated signal intelligently towards the desired receive diversity branch/antenna. The second model employs RIS as a transmitter, employing M-ary phase-shift keying for reflection phase modulation (RPM), and as a reflector for the incoming SSK modulated signal. Considering transmissions subjected to Nakagami-m fading, and a greedy detection rule at the receiver, the performance of both the system configurations is evaluated. Specifically, the pairwise probability of erroneous index detection and the probability of erroneous index detection are adopted as performance metrics, and their closed-form expressions are derived for the RIS-assisted SSK and RIS-assisted SSK-RPM system models. Monte-Carlo simulation studies are carried out to verify the analytical framework, and numerical results are presented to study the dependency of the error performance on the system parameters. The findings highlight the effect of hardware impairment on the performance of the communication system under study.
- [50] arXiv:2411.04620 (cross-list from cs.CV) [pdf, html, other]
-
Title: Multi-temporal crack segmentation in concrete structure using deep learning approachesSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Cracks are among the earliest indicators of deterioration in concrete structures. Early automatic detection of these cracks can significantly extend the lifespan of critical infrastructures, such as bridges, buildings, and tunnels, while simultaneously reducing maintenance costs and facilitating efficient structural health monitoring. This study investigates whether leveraging multi-temporal data for crack segmentation can enhance segmentation quality. Therefore, we compare a Swin UNETR trained on multi-temporal data with a U-Net trained on mono-temporal data to assess the effect of temporal information compared with conventional single-epoch approaches. To this end, a multi-temporal dataset comprising 1356 images, each with 32 sequential crack propagation images, was created. After training the models, experiments were conducted to analyze their generalization ability, temporal consistency, and segmentation quality. The multi-temporal approach consistently outperformed its mono-temporal counterpart, achieving an IoU of $82.72\%$ and an F1-score of $90.54\%$, representing a significant improvement over the mono-temporal model's IoU of $76.69\%$ and F1-score of $86.18\%$, despite requiring only half of the trainable parameters. The multi-temporal model also displayed a more consistent segmentation quality, with reduced noise and fewer errors. These results suggest that temporal information significantly enhances the performance of segmentation models, offering a promising solution for improved crack detection and the long-term monitoring of concrete structures, even with limited sequential data.
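For readers unfamiliar with the two metrics quoted above, the following minimal sketch computes IoU and F1-score for a pair of binary masks. The tiny masks and function names are invented for illustration; how the paper aggregates scores over images is not stated in the abstract.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives
    for two binary masks given as lists of rows of 0/1 values."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn

def iou(tp, fp, fn):
    """Intersection over union: TP / (TP + FP + FN)."""
    denom = tp + fp + fn
    return tp / denom if denom else 1.0  # two empty masks agree perfectly

def f1_score(tp, fp, fn):
    """F1 (Dice) score: 2 TP / (2 TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0

# Hypothetical 3x4 crack masks (1 = crack pixel).
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 1, 0]]
pred = [[0, 1, 1, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 0]]

tp, fp, fn = confusion_counts(pred, truth)
print(tp, fp, fn)
print(iou(tp, fp, fn), f1_score(tp, fp, fn))
```

For a single pair of masks the two metrics are tied by F1 = 2 IoU / (1 + IoU), so F1 is never lower than IoU; averaging over many images can break this exact relation.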
- [51] arXiv:2411.04672 (cross-list from cs.LG) [pdf, html, other]
-
Title: Semantic-Aware Resource Management for C-V2X Platooning via Multi-Agent Reinforcement LearningComments: This paper has been submitted to IEEE Journal. The source code has been released at: this https URLSubjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
This paper presents a semantic-aware multi-modal resource allocation (SAMRA) scheme for multiple tasks using multi-agent reinforcement learning (MARL), termed SAMRAMARL, for use in platoon systems where cellular vehicle-to-everything (C-V2X) communication is employed. The proposed approach leverages semantic information to optimize the allocation of communication resources. By integrating a distributed MARL algorithm, SAMRAMARL enables autonomous decision-making for each vehicle, covering channel assignment, power allocation, and semantic symbol length selection based on the contextual importance of the transmitted information. This semantic awareness ensures that both vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications prioritize data that is critical for maintaining safe and efficient platoon operations. The framework also introduces a tailored quality of experience (QoE) metric for semantic communication, aiming to maximize QoE in V2V links while improving the success rate of semantic information transmission (SRS). Extensive simulations have demonstrated that SAMRAMARL outperforms existing methods, achieving significant gains in QoE and communication efficiency in C-V2X platooning scenarios.
- [52] arXiv:2411.04676 (cross-list from math.OC) [pdf, html, other]
-
Title: A Comparative Study of Distributed Feedback Optimizing Control ArchitecturesComments: Accepted to IEEE Transactions on Control Systems TechnologySubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
This paper considers the problem of steady-state real-time optimization (RTO) of interconnected systems with a common constraint that couples several units, for example, a shared resource. Such problems are often studied in the context of distributed optimization, where decisions are made locally in each subsystem and are coordinated to optimize the overall performance. Here, we use a distributed feedback-optimizing control framework, where the local systems and the coordinator problems are converted into feedback control problems. This is a powerful scheme that allows us to design feedback control loops and estimate parameters locally, as well as provide fast local response, allowing different closed-loop time constants for each local subsystem. This paper provides a comparative study of different distributed feedback-optimizing control architectures using two case studies. The first case study considers the problem of demand response in a residential energy hub powered by a common renewable energy source, and compares the different feedback-optimizing control approaches using simulations. The second case study experimentally validates and compares the different approaches using a lab-scale experimental rig that emulates a subsea oil production network, where the common resource is the gas lift that must be optimally allocated among the wells.
- [53] arXiv:2411.04682 (cross-list from cs.CV) [pdf, html, other]
-
Title: DNN-based 3D Cloud Retrieval for Variable Solar Illumination and Multiview Spaceborne ImagingComments: 4 pages, 4 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Climate studies often rely on remotely sensed images to retrieve two-dimensional maps of cloud properties. To advance volumetric analysis, we focus on recovering the three-dimensional (3D) heterogeneous extinction coefficient field of shallow clouds using multiview remote sensing data. Climate research requires large-scale worldwide statistics. To enable scalable data processing, previous deep neural networks (DNNs) can infer at spaceborne remote sensing downlink rates. However, prior methods are limited to a fixed solar illumination direction. In this work, we introduce the first scalable DNN-based system for 3D cloud retrieval that accommodates varying camera poses and solar directions. By integrating multiview cloud intensity images with camera poses and solar direction data, we achieve greater flexibility in recovery. Training of the DNN is performed by a novel two-stage scheme to address the high number of degrees of freedom in this problem. Our approach shows substantial improvements over previous state-of-the-art, particularly in handling variations in the sun's zenith angle.
- [54] arXiv:2411.04711 (cross-list from cs.CV) [pdf, html, other]
-
Title: Progressive Multi-Level Alignments for Semi-Supervised Domain Adaptation SAR Target Recognition Using Simulated DataSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Recently, an intriguing research trend for automatic target recognition (ATR) from synthetic aperture radar (SAR) imagery has arisen: using simulated data to train ATR models is a feasible solution to the issue of inadequate measured data. To close the domain gap that exists between the real and simulated data, unsupervised domain adaptation (UDA) techniques are frequently exploited to construct ATR models. However, for UDA, the target domain lacks labeled data to direct the model training, posing a great challenge to ATR performance. To address this problem, a semi-supervised domain adaptation (SSDA) framework is proposed that adopts progressive multi-level alignments for simulated data-aided SAR ATR. First, a progressive wavelet transform data augmentation (PWTDA) is presented by analyzing the discrepancies of wavelet decomposition sub-bands of two domain images, obtaining the domain-level alignment. Specifically, the domain gap is narrowed by mixing the wavelet transform high-frequency sub-band components. Second, we develop an asymptotic instance-prototype alignment (AIPA) strategy to push the source domain instances close to the corresponding target prototypes, aiming to achieve category-level alignment. Moreover, the consistency alignment is implemented by excavating the strong-weak augmentation consistency of both individual samples and the multi-sample relationship, enhancing the generalization capability of the model. Extensive experiments on the Synthetic and Measured Paired Labeled Experiment (SAMPLE) dataset indicate that our approach obtains recognition accuracies of 99.63% and 98.91% in two common experimental settings with only one labeled sample per class of the target domain, outperforming the most advanced SSDA techniques.
- [55] arXiv:2411.04727 (cross-list from quant-ph) [pdf, html, other]
-
Title: Quantum Speedup for Polar Maximum Likelihood DecodingComments: 5pages, 6 figuresSubjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Signal Processing (eess.SP)
Conventional decoding algorithms for polar codes strive to balance achievable performance and computational complexity in classical computing. While maximum likelihood (ML) decoding guarantees optimal performance, its NP-hard nature makes it impractical for real-world systems. In this letter, we propose a novel ML decoding architecture for polar codes based on the Grover adaptive search, a quantum exhaustive search algorithm. Unlike conventional studies, our approach, enabled by a newly formulated objective function, uniquely supports Gray-coded multi-level modulation without expanding the search space size compared to the classical ML decoding. Simulation results demonstrate that our proposed quantum decoding achieves ML performance while providing a pure quadratic speedup in query complexity.
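To give a feel for what a quadratic speedup in query complexity means for an exhaustive search over $2^K$ candidate messages, the sketch below compares the classical worst-case number of candidate evaluations with the textbook Grover iteration count $\lfloor (\pi/4)\sqrt{2^K} \rfloor$ for a single marked item. This is an assumption-laden illustration: it uses plain Grover search rather than the Grover adaptive search of the letter, and the values of K are arbitrary.

```python
import math

def classical_queries(k):
    """Worst-case candidate evaluations for exhaustive search over 2^k messages."""
    return 2 ** k

def grover_iterations(k):
    """Textbook Grover iteration count for one marked item among 2^k:
    floor(pi/4 * sqrt(2^k)). Illustrative only, not the letter's algorithm."""
    return math.floor(math.pi / 4 * math.sqrt(2 ** k))

# Hypothetical numbers of information bits K.
for k in (8, 16, 32):
    print(k, classical_queries(k), grover_iterations(k))
```

Doubling K squares the classical search-space size but also only squares the Grover iteration count, which is itself roughly the square root of the classical one; the search remains exponential in K either way.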
- [56] arXiv:2411.04762 (cross-list from cs.NI) [pdf, html, other]
-
Title: JC5A: Service Delay Minimization for Aerial MEC-assisted Industrial Cyber-Physical SystemsSubjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
In the era of the sixth generation (6G) and industrial Internet of Things (IIoT), an industrial cyber-physical system (ICPS) drives the proliferation of sensor devices and computing-intensive tasks. To address the limited resources of IIoT sensor devices, unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) has emerged as a promising solution, providing flexible and cost-effective services in close proximity of IIoT sensor devices (ISDs). However, leveraging aerial MEC to meet the delay-sensitive and computation-intensive requirements of the ISDs could face several challenges, including the limited communication, computation and caching (3C) resources, stringent offloading requirements for 3C services, and constrained on-board energy of UAVs. To address these issues, we first present a collaborative aerial MEC-assisted ICPS architecture by incorporating the computing capabilities of the macro base station (MBS) and UAVs. We then formulate a service delay minimization optimization problem (SDMOP). Since the SDMOP is proved to be an NP-hard problem, we propose a joint computation offloading, caching, communication resource allocation, computation resource allocation, and UAV trajectory control approach (JC5A). Specifically, JC5A consists of a block successive upper bound minimization method of multipliers (BSUMM) for computation offloading and service caching, a convex optimization-based method for communication and computation resource allocation, and a successive convex approximation (SCA)-based method for UAV trajectory control. Moreover, we theoretically prove the convergence and polynomial complexity of JC5A. Simulation results demonstrate that the proposed approach can achieve superior system performance compared to the benchmark approaches and algorithms.
- [57] arXiv:2411.04809 (cross-list from math.OC) [pdf, html, other]
-
Title: Minimax Linear Regulator Problems for Positive SystemsComments: 26 pages, 5 figuresSubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
Exceptional are the instances where explicit solutions to optimal control problems are obtainable. Of particular interest are the explicit solutions derived for minimax problems, which provide a framework for tackling challenges characterized by adversarial conditions and uncertainties. This work builds on recent discrete-time research, extending it to a multi-disturbance minimax linear framework for linear time-invariant systems in continuous time. Disturbances are considered to be bounded by elementwise linear constraints, along with unconstrained positive disturbances. Dynamic programming theory is applied to derive explicit solutions to the Hamilton-Jacobi-Bellman (HJB) equation for both finite and infinite horizons. For the infinite horizon, a fixed-point method is proposed to compute the solution of the HJB equation. Moreover, the Linear Regulator (LR) problem is introduced, which, analogous to the Linear-Quadratic Regulator (LQR) problem, can be utilized for the stabilization of positive systems. A linear program formulation for the LR problem is proposed which computes the associated stabilizing controller, if it exists. Additionally, necessary and sufficient conditions for minimizing the $l_1$-induced gain of the system are derived and characterized through the disturbance penalty of the cost function of the minimax problem class. We motivate the prospective scalability properties of our framework with a large-scale water management network.
- [58] arXiv:2411.04810 (cross-list from cs.CV) [pdf, html, other]
-
Title: GANESH: Generalizable NeRF for Lensless ImagingRakesh Raj Madavan, Akshat Kaimal, Badhrinarayanan K V, Vinayak Gupta, Rohit Choudhary, Chandrakala Shanmuganathan, Kaushik MitraJournal-ref: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Lensless imaging offers a significant opportunity to develop ultra-compact cameras by removing the conventional bulky lens system. However, without a focusing element, the sensor's output is no longer a direct image but a complex multiplexed scene representation. Traditional methods have attempted to address this challenge by employing learnable inversions and refinement models, but these methods are primarily designed for 2D reconstruction and do not generalize well to 3D reconstruction. We introduce GANESH, a novel framework designed to enable simultaneous refinement and novel view synthesis from multi-view lensless images. Unlike existing methods that require scene-specific training, our approach supports on-the-fly inference without retraining on each scene. Moreover, our framework allows us to tune our model to specific scenes, enhancing the rendering and refinement quality. To facilitate research in this area, we also present the first multi-view lensless dataset, LenslessScenes. Extensive experiments demonstrate that our method outperforms current approaches in reconstruction accuracy and refinement quality. Code and video results are available at this https URL
- [59] arXiv:2411.04821 (cross-list from cs.CV) [pdf, html, other]
-
Title: End-to-end Inception-Unet based Generative Adversarial Networks for Snow and Rain RemovalsSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
The superior performance of deep learning approaches in removing atmospheric particles such as snow and rain from a single image favors their use over classical ones. However, deep learning-based approaches still suffer from challenges related to the particle appearance characteristics such as size, type, and transparency. Furthermore, due to the unique characteristics of rain and snow particles, single-network deep learning approaches struggle to handle both degradation scenarios simultaneously. In this paper, a global framework that consists of two Generative Adversarial Networks (GANs) is proposed, each of which handles the removal of one particle type. The architectures of both the desnowing and deraining GANs integrate a feature extraction phase with the classical U-net generator network, which in turn enhances the removal performance in the presence of severe variations in size and appearance. Furthermore, a realistic dataset is presented that contains pairs of snowy images and their ground-truth images, estimated using a low-rank approximation approach. The experiments show that the proposed desnowing and deraining approaches achieve significant improvements in comparison to the state-of-the-art approaches when tested on both synthetic and realistic datasets.
- [60] arXiv:2411.04949 (cross-list from cs.IT) [pdf, html, other]
-
Title: Global Optimal Closed-Form Solutions for Intelligent Surfaces With Mutual Coupling: Is Mutual Coupling Detrimental or Beneficial?Comments: Submitted to IEEE for publicationSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Reconfigurable Intelligent Surface (RIS) is a breakthrough technology enabling the dynamic control of the propagation environment in wireless communications through programmable surfaces. To improve the flexibility of conventional diagonal RIS (D-RIS), beyond diagonal RIS (BD-RIS) has emerged as a family of more general RIS architectures. However, D-RIS and BD-RIS have been commonly explored neglecting mutual coupling effects, while the global optimization of RIS with mutual coupling, its performance limits, and scaling laws remain unexplored. This study addresses these gaps by deriving global optimal closed-form solutions for BD-RIS with mutual coupling to maximize the channel gain, specifically fully- and tree-connected RISs. Besides, we provide the expression of the maximum channel gain achievable in the presence of mutual coupling and its scaling law in closed form. By using the derived scaling laws, we analytically prove that mutual coupling increases the channel gain on average under Rayleigh fading channels. Our theoretical analysis, confirmed by numerical simulations, shows that both fully- and tree-connected RISs with mutual coupling achieve the same channel gain upper bound when optimized with the proposed global optimal solutions. Furthermore, we observe that a mutual coupling-unaware optimization of RIS can cause a channel gain degradation of up to 5 dB.
Cross submissions (showing 22 of 22 entries)
- [61] arXiv:2304.12507 (replaced) [pdf, html, other]
-
Title: Learning Task-Specific Strategies for Accelerated MRIZihui Wu, Tianwei Yin, Yu Sun, Robert Frost, Andre van der Kouwe, Adrian V. Dalca, Katherine L. BoumanJournal-ref: IEEE Transactions on Computational Imaging (Volume 10, 2024) 1040 - 1054Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Compressed sensing magnetic resonance imaging (CS-MRI) seeks to recover visual information from subsampled measurements for diagnostic tasks. Traditional CS-MRI methods often separately address measurement subsampling, image reconstruction, and task prediction, resulting in a suboptimal end-to-end performance. In this work, we propose TACKLE as a unified co-design framework for jointly optimizing subsampling, reconstruction, and prediction strategies for the performance on downstream tasks. The naïve approach of simply appending a task prediction module and training with a task-specific loss leads to suboptimal downstream performance. Instead, we develop a training procedure where a backbone architecture is first trained for a generic pre-training task (image reconstruction in our case), and then fine-tuned for different downstream tasks with a prediction head. Experimental results on multiple public MRI datasets show that TACKLE achieves an improved performance on various tasks over traditional CS-MRI methods. We also demonstrate that TACKLE is robust to distribution shifts by showing that it generalizes to a new dataset we experimentally collected using different acquisition setups from the training data. Without additional fine-tuning, TACKLE leads to both numerical and visual improvements compared to existing baselines. We have further implemented a learned 4$\times$-accelerated sequence on a Siemens 3T MRI Skyra scanner. Compared to the fully-sampling scan that takes 335 seconds, our optimized sequence only takes 84 seconds, achieving a four-fold time reduction as desired, while maintaining high performance.
- [62] arXiv:2309.13539 (replaced) [pdf, html, other]
-
Title: MediViSTA: Medical Video Segmentation via Temporal Fusion SAM Adaptation for EchocardiographySekeun Kim, Pengfei Jin, Cheng Chen, Kyungsang Kim, Zhiliang Lyu, Hui Ren, Sunghwan Kim, Zhengliang Liu, Aoxiao Zhong, Tianming Liu, Xiang Li, Quanzheng LiSubjects: Image and Video Processing (eess.IV)
Despite achieving impressive results in general-purpose semantic segmentation with strong generalization on natural images, the Segment Anything Model (SAM) has shown less precision and stability in medical image segmentation. In particular, the original SAM architecture is designed for 2D natural images and is therefore not designed to handle three-dimensional information, which is particularly important for medical imaging modalities that are often volumetric or video data. In this paper, we introduce MediViSTA, a parameter-efficient fine-tuning method designed to adapt the vision foundation model for medical video, with a specific focus on echocardiographic segmentation. To achieve spatial adaptation, we propose a frequency feature fusion technique that injects spatial frequency information from a CNN branch. For temporal adaptation, we integrate temporal adapters within the transformer blocks of the image encoder. Using a fine-tuning strategy, only a small subset of pre-trained parameters is updated, allowing efficient adaptation to echocardiographic data. The effectiveness of our method has been comprehensively evaluated on three datasets, comprising two public datasets and one multi-center in-house dataset. Our method consistently outperforms various state-of-the-art approaches without using any prompts. Furthermore, our model exhibits strong generalization capabilities on unseen datasets, surpassing the second-best approach by 2.15% in Dice and 0.09 in temporal consistency. The results demonstrate the potential of MediViSTA to significantly advance echocardiographic video segmentation, offering improved accuracy and robustness in cardiac assessment applications.
- [63] arXiv:2311.18386 (replaced) [pdf, other]
-
Title: A Novel Variational Approach for Multiphoton Microscopy Image Restoration: from PSF Estimation to 3D DeconvolutionJulien Ajdenbaum (OPIS, CVN), Emilie Chouzenoux (OPIS, CVN), Claire Lefort (XLIM), Ségolène Martin (OPIS, CVN), Jean-Christophe Pesquet (OPIS, CVN)Journal-ref: Inverse Problems, 2024, 40 (6)Subjects: Image and Video Processing (eess.IV); Optimization and Control (math.OC)
In multi-photon microscopy (MPM), a recent in-vivo fluorescence microscopy system, the task of image restoration can be decomposed into two interlinked inverse problems: firstly, the characterization of the Point Spread Function (PSF) and subsequently, the deconvolution (i.e., deblurring) to remove the PSF effect and reduce noise. The acquired MPM image quality is critically affected by PSF blurring and intense noise. The PSF in MPM is highly spread in 3D and is not well characterized, presenting high variability with respect to the observed objects. This makes the restoration of MPM images challenging. Common PSF estimation methods in fluorescence microscopy, including MPM, involve capturing images of sub-resolution beads, followed by quantifying the resulting ellipsoidal 3D spot. In this work, we revisit this approach, coping with its inherent limitations in terms of accuracy and practicality. We estimate the PSF from the observation of relatively large beads (approximately 1 $\mu$m in diameter). This goes through the formulation and resolution of an original non-convex minimization problem, for which we propose a proximal alternating method along with convergence guarantees. Following the PSF estimation step, we then introduce an innovative strategy to deal with the high-level multiplicative noise degrading the acquisitions. We rely on a heteroscedastic noise model for which we estimate the parameters. We then solve a constrained optimization problem to restore the image, accounting for the estimated PSF and noise, while allowing minimal hyper-parameter tuning. Theoretical guarantees are given for the restoration algorithm. These algorithmic contributions lead to an end-to-end pipeline for 3D image restoration in MPM, which we share as publicly available Python software. We demonstrate its effectiveness through several experiments on both simulated and real data.
- [64] arXiv:2312.08946 (replaced) [pdf, html, other]
-
Title: Color Agnostic Cross-Spectral Disparity EstimationJournal-ref: 2024 IEEE International Conference on Acoustics, Speech and Signal ProcessingSubjects: Image and Video Processing (eess.IV)
As camera modules become more and more affordable, multispectral camera arrays have found their way from special applications to the mass market, e.g., in automotive systems, smartphones, or drones. Due to the multiple modalities, the registration of different viewpoints and the required cross-spectral disparity estimation remain extremely challenging. To overcome this problem, we introduce a novel spectral image synthesis in combination with a color agnostic transform. Thus, any recently published stereo matching network can be turned into a cross-spectral disparity estimator. Our novel algorithm requires only RGB stereo data to train a cross-spectral disparity estimator, and a generalization from artificial training data to camera-captured images is obtained. The theoretical examination of the novel color agnostic method is completed by an extensive evaluation against the state of the art, including self-recorded multispectral data and a reference implementation. The novel color agnostic disparity estimation improves cross-spectral as well as conventional color stereo matching by reducing the average end-point error by 41% for cross-spectral and by 22% for mono-modal content, respectively.
- [65] arXiv:2312.08949 (replaced) [pdf, html, other]
-
Title: A Guided Upsampling Network for Short Wave Infrared Images Using Graph RegularizationJournal-ref: 2024 IEEE International Conference on Acoustics, Speech and Signal ProcessingSubjects: Image and Video Processing (eess.IV)
Exploiting the infrared area of the spectrum for classification problems is becoming increasingly popular, because many materials have characteristic absorption bands in this area. However, sensors in the short wave infrared (SWIR) area and at even higher wavelengths have a very low spatial resolution in comparison to classical cameras that operate in the visible wavelength area. Thus, in this paper an upsampling method for SWIR images guided by a visible image is presented. For that, the proposed guided upsampling network (GUNet) uses a graph-regularized optimization problem based on learned affinities. The evaluation is based on a novel synthetic near-field visible-SWIR stereo database. Different guided upsampling methods are evaluated, which shows an improvement of nearly 1 dB on this database for the proposed upsampling method in comparison to the second best guided upsampling network. Furthermore, a visual example of an upsampled SWIR image of a real-world scene is depicted to show real-world applicability.
- [66] arXiv:2401.09980 (replaced) [pdf, other]
-
Title: A Comparative Analysis of U-Net-based models for Segmentation of Cardiac MRISubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Medical imaging refers to the technologies and methods utilized to view the human body and its inside, in order to diagnose, monitor, or even treat medical disorders. This paper aims to explore the application of deep learning techniques in the semantic segmentation of cardiac short-axis MRI (Magnetic Resonance Imaging) images, aiming to enhance the diagnosis, monitoring, and treatment of medical disorders related to the heart. The focus centers on implementing various architectures that are derivatives of U-Net, to effectively isolate specific parts of the heart for comprehensive anatomical and functional analysis. Through a combination of images, graphs, and quantitative metrics, the efficacy of the models and their predictions is showcased. Additionally, this paper addresses encountered challenges and outlines strategies for future improvements. This abstract provides a concise overview of the efforts in utilizing deep learning for cardiac image segmentation, emphasizing both the accomplishments and areas for further refinement.
- [67] arXiv:2403.02046 (replaced) [pdf, other]
-
Title: Array Synthesis in Terms of Characteristic Modes and Generalized Scattering MatricesSubjects: Signal Processing (eess.SP)
The synthesis of antenna arrays in the presence of mutual coupling using generalized scattering matrices in terms of characteristic modes is proposed. For the synthesis, the array is built of synthetic elements that are described by their modal scattering and radiation behavior. In particular, the question of how to describe the degrees of freedom of such elements is addressed. The eigenvalues of the characteristic modes of the element geometry and the modal radiation behavior of the antenna are thereby selected as degrees of freedom for the model of the synthetic elements. Using this model and a modal coupling matrix, an approach to optimize the modal configuration of the elements within an array is proposed. Finally, a close-to-reality example shows how the proposed theory can be used to enhance the cross-polarization rejection of a circularly polarized patch antenna array with a fixed beam.
- [68] arXiv:2403.04655 (replaced) [pdf, html, other]
-
Title: Closed-loop Performance Optimization of Model Predictive Control with Robustness GuaranteesSubjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Model mismatch and process noise are two frequently occurring phenomena that can drastically affect the performance of model predictive control (MPC) in practical applications. We propose a principled way to tune the cost function and the constraints of linear MPC schemes to improve the closed-loop performance and robust constraint satisfaction on uncertain nonlinear dynamics with additive noise. The tuning is performed using a novel MPC tuning algorithm based on backpropagation developed in our earlier work. Using the scenario approach, we provide probabilistic bounds on the likelihood of closed-loop constraint violation over a finite horizon. We showcase the effectiveness of the proposed method on linear and nonlinear simulation examples.
- [69] arXiv:2405.10649 (replaced) [pdf, html, other]
-
Title: Efficient Recovery of Sparse Graph Signals from Graph Filter OutputsComments: This work has been submitted to the IEEE for possible publicationSubjects: Signal Processing (eess.SP); Systems and Control (eess.SY); Optimization and Control (math.OC)
This paper investigates the recovery of a node-domain sparse graph signal from the output of a graph filter. This problem, which is often referred to as the identification of the source of a diffused sparse graph signal, is seminal in the field of graph signal processing (GSP). Sparse graph signals can be used in the modeling of a variety of real-world applications in networks, such as social, biological, and power systems, and enable various GSP tasks, such as graph signal reconstruction, blind deconvolution, and sampling. In this paper, we assume double sparsity of both the graph signal and the graph topology, as well as a low-order graph filter. We propose three algorithms to reconstruct the support set of the input sparse graph signal from the graph filter output samples, leveraging these assumptions and the generalized information criterion (GIC). First, we describe the graph multiple GIC (GM-GIC) method, which is based on partitioning the dictionary elements (graph filter matrix columns) that capture information on the signal into smaller subsets. Then, the local GICs are computed for each subset and aggregated to make a global decision. Second, inspired by the well-known branch and bound (BNB) approach, we develop the graph-based branch and bound GIC (graph-BNB-GIC), and incorporate a new tractable heuristic bound tailored to the graph and graph filter characteristics. Third, we propose the graph-based first order correction (GFOC) method, which improves existing sparse recovery methods by iteratively examining potential improvements to the GIC cost function by replacing elements from the estimated support set with elements from their one-hop neighborhood. Finally, we investigate the application of our graph-based sparse recovery methods in blind deconvolution scenarios where the graph filter is unknown.
- [70] arXiv:2405.18782 (replaced) [pdf, html, other]
-
Title: Principled Probabilistic Imaging using Diffusion Models as Plug-and-Play PriorsComments: Accepted to NeurIPS 2024Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Diffusion models (DMs) have recently shown outstanding capabilities in modeling complex image distributions, making them expressive image priors for solving Bayesian inverse problems. However, most existing DM-based methods rely on approximations in the generative process to be generic to different inverse problems, leading to inaccurate sample distributions that deviate from the target posterior defined within the Bayesian framework. To harness the generative power of DMs while avoiding such approximations, we propose a Markov chain Monte Carlo algorithm that performs posterior sampling for general inverse problems by reducing it to sampling the posterior of a Gaussian denoising problem. Crucially, we leverage a general DM formulation as a unified interface that allows for rigorously solving the denoising problem with a range of state-of-the-art DMs. We demonstrate the effectiveness of the proposed method on six inverse problems (three linear and three nonlinear), including a real-world black hole imaging problem. Experimental results indicate that our proposed method offers more accurate reconstructions and posterior estimation compared to existing DM-based imaging inverse methods.
- [71] arXiv:2406.10395 (replaced) [pdf, html, other]
-
Title: BrainSegFounder: Towards 3D Foundation Models for Neuroimage SegmentationComments: 19 pages, 5 figures, to be published in Medical Image AnalysisSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
The burgeoning field of brain health research increasingly leverages artificial intelligence (AI) to interpret and analyze neurological data. This study introduces a novel approach towards the creation of medical foundation models by integrating a large-scale multi-modal magnetic resonance imaging (MRI) dataset derived from 41,400 participants in its own. Our method involves a novel two-stage pretraining approach using vision transformers. The first stage is dedicated to encoding anatomical structures in generally healthy brains, identifying key features such as shapes and sizes of different brain regions. The second stage concentrates on spatial information, encompassing aspects like location and the relative positioning of brain structures. We rigorously evaluate our model, BrainFounder, using the Brain Tumor Segmentation (BraTS) challenge and Anatomical Tracings of Lesions After Stroke v2.0 (ATLAS v2.0) datasets. BrainFounder demonstrates a significant performance gain, surpassing the achievements of the previous winning solutions using fully supervised learning. Our findings underscore the impact of scaling up both the complexity of the model and the volume of unlabeled training data derived from generally healthy brains, which enhances the accuracy and predictive capabilities of the model in complex neuroimaging tasks with MRI. The implications of this research provide transformative insights and practical applications in healthcare and make substantial steps towards the creation of foundation models for Medical AI. Our pretrained models and training code can be found at this https URL.
- [72] arXiv:2406.18624 (replaced) [pdf, html, other]
-
Title: Robust Low-Cost Drone Detection and Classification in Low SNR EnvironmentsComments: 11 pages, 9 figuresJournal-ref: IEEE Journal of Radio Frequency Identification, Volume 8, 2024Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
The proliferation of drones, or unmanned aerial vehicles (UAVs), has raised significant safety concerns due to their potential misuse in activities such as espionage, smuggling, and infrastructure disruption. This paper addresses the critical need for effective drone detection and classification systems that operate independently of UAV cooperation. We evaluate various convolutional neural networks (CNNs) for their ability to detect and classify drones using spectrogram data derived from consecutive Fourier transforms of signal components. The focus is on model robustness in low signal-to-noise ratio (SNR) environments, which is critical for real-world applications. A comprehensive dataset is provided to support future model development. In addition, we demonstrate a low-cost drone detection system using a standard computer, software-defined radio (SDR) and antenna, validated through real-world field testing. On our development dataset, all models consistently achieved an average balanced classification accuracy of >= 85% at SNR > -12 dB. In the field test, these models achieved an average balanced accuracy of > 80%, depending on transmitter distance and antenna direction. Our contributions include: a publicly available dataset for model development, a comparative analysis of CNNs for drone detection under low SNR conditions, and the deployment and field evaluation of a practical, low-cost detection system.
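The abstract reports balanced classification accuracy but does not define it. The following is a minimal sketch assuming the common definition (the mean of per-class recall); the function name and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (assumed definition, not from the paper)."""
    hits = defaultdict(int)    # correctly classified samples per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels: 8 "noise" frames, 2 "drone" frames.
y_true = ["noise"] * 8 + ["drone"] * 2
y_pred = ["noise"] * 8 + ["noise", "drone"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 9 of 10 samples correct
print(balanced_accuracy(y_true, y_pred))  # mean of recall 1.0 and 0.5
```

Under this definition a classifier that favors the majority class is penalized, which is one plausible reason to prefer the metric when class frequencies differ.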
- [73] arXiv:2407.03671 (replaced) [pdf, other]
-
Title: Spatio-temporal cooperative control Method of Highway Ramp Merge Based on Vehicle-road CoordinationSubjects: Systems and Control (eess.SY)
The merging area of highway ramps faces multiple challenges, including traffic congestion, collision risks, speed mismatches, driver behavior uncertainties, limited visibility, and bottleneck effects. However, autonomous vehicles engaging in in-depth coordination between vehicle and road in merging zones, by pre-planning and uploading travel trajectories, can significantly enhance the safety and efficiency of merging [...]. In this paper, we mainly introduce a mainline priority cooperation method to achieve the time and space cooperative control of highway [...]. Vehicle-mounted intelligent units share real-time vehicle status and driving intentions with Road Section Management Units, which pre-plan the spatiotemporal trajectories of vehicle travel. After receiving these trajectories, Vehicle Intelligent Units strictly adhere to them. Through this deep collaboration between vehicles and roads, conflicts in time and space during vehicle travel are eliminated in advance.
- [74] arXiv:2407.03794 (replaced) [pdf, html, other]
-
Title: CardioSpectrum: Comprehensive Myocardium Motion Analysis with 3D Deep Learning and Geometric InsightsComments: This paper has been early accepted to MICCAI 2024, LNCS 15005, Springer, 2024Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
The ability to map left ventricle (LV) myocardial motion using computed tomography angiography (CTA) is essential to diagnosing cardiovascular conditions and guiding interventional procedures. Due to their inherent locality, conventional neural networks typically have difficulty predicting subtle tangential movements, which considerably lessens the level of precision at which myocardium three-dimensional (3D) mapping can be performed. Using 3D optical flow techniques and Functional Maps (FMs), we present a comprehensive approach to address this problem. FMs are known for their capacity to capture global geometric features, thus providing a fuller understanding of 3D geometry. As an alternative to traditional segmentation-based priors, we employ surface-based two-dimensional (2D) constraints derived from spectral correspondence methods. Our 3D deep learning architecture, based on the ARFlow model, is optimized to handle complex 3D motion analysis tasks. By incorporating FMs, we can capture the subtle tangential movements of the myocardium surface precisely, hence significantly improving the accuracy of 3D mapping of the myocardium. The experimental results confirm the effectiveness of this method in enhancing myocardium motion analysis. This approach can contribute to improving cardiovascular diagnosis and treatment. Our code and additional resources are available at: this https URL
- [75] arXiv:2407.06633 (replaced) [pdf, html, other]
-
Title: Variational Zero-shot Multispectral PansharpeningSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Pansharpening aims to generate a high spatial resolution multispectral image (HRMS) by fusing a low spatial resolution multispectral image (LRMS) and a panchromatic image (PAN). The most challenging issue for this task is that only the to-be-fused LRMS and PAN are available, and the existing deep learning-based methods are unsuitable since they rely on many training pairs. Traditional variational optimization (VO) based methods are well-suited for addressing such a problem. They focus on carefully designing explicit fusion rules as well as regularizations for an optimization problem, which are based on the researcher's discovery of the image relationships and image structures. Unlike previous VO-based methods, in this work, we explore such complex relationships by a parameterized term rather than a manually designed one. Specifically, we propose a zero-shot pansharpening method by introducing a neural network into the optimization objective. This network estimates a representation component of HRMS, which mainly describes the relationship between HRMS and PAN. In this way, the network achieves a similar goal to the so-called deep image prior because it implicitly regulates the relationship between the HRMS and PAN images through its inherent structure. We directly minimize this optimization objective via network parameters and the expected HRMS image through iterative updating. Extensive experiments on various benchmark datasets demonstrate that our proposed method can achieve better performance compared with other state-of-the-art methods. The codes are available at this https URL.
- [76] arXiv:2407.09038 (replaced) [pdf, other]
-
Title: High-Resolution Hyperspectral Video Imaging Using A Hexagonal Camera ArrayJournal-ref: J. Opt. Soc. Am. A 41, 2303-2315 (2024)Subjects: Image and Video Processing (eess.IV)
Retrieving the reflectance spectrum from objects is an essential task for many classification and detection problems, since many materials and processes have a unique spectral behaviour. In many cases, it is highly desirable to capture hyperspectral images due to the high spectral flexibility. Often, it is even necessary to capture hyperspectral videos or at least to be able to record a hyperspectral image at once, also called snapshot hyperspectral imaging, to avoid spectral smearing. For this task, a high-resolution snapshot hyperspectral camera array using a hexagonal shape is [...]. The hexagonal array for hyperspectral imaging uses off-the-shelf hardware, which enables high flexibility regarding employed cameras, lenses and filters. Hence, the spectral range can be easily varied by mounting a different set of filters. Moreover, the concept of using off-the-shelf hardware enables low prices in comparison to other approaches with highly specialized hardware. Since classical industrial cameras are used in this hyperspectral camera array, the spatial and temporal resolution is very high, while recording 37 hyperspectral channels in the range from 400 nm to 760 nm in 10 nm steps. A registration process is required for near-field imaging, which maps the peripheral camera views to the center view. It is shown that this combination of a hyperspectral camera array and the corresponding image registration pipeline is superior in comparison to other popular snapshot approaches. For this evaluation, a synthetic hyperspectral database is rendered. On the synthetic data, the novel approach outperforms its best competitor by more than 3 dB in reconstruction quality. This synthetic data is also used to show the superiority of the hexagonal shape in comparison to an orthogonal-spaced one. Moreover, a real-world high-resolution hyperspectral video database is provided.
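As a quick arithmetic check of the channel count stated in the abstract, the sketch below enumerates center wavelengths from 400 nm to 760 nm in 10 nm steps. It assumes both endpoints are included; the variable names are illustrative only.

```python
# Center wavelengths in nanometers, assuming both 400 nm and 760 nm are included.
wavelengths_nm = list(range(400, 761, 10))

# (760 - 400) / 10 + 1 channels in total.
num_channels = len(wavelengths_nm)
print(num_channels, wavelengths_nm[0], wavelengths_nm[-1])
```

With inclusive endpoints the count matches the 37 channels reported in the abstract.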
- [77] arXiv:2408.10552 (replaced) [pdf, html, other]
-
Title: Near-Field Multiuser Communications Aided by Movable AntennasComments: This paper has been accepted by IEEE Wireless Communications LettersSubjects: Signal Processing (eess.SP)
This letter investigates movable antenna (MA)-aided downlink (DL) multiuser communication systems under the near-field channel condition, where both the base station (BS) and the users are equipped with MAs to fully exploit the degrees of freedom (DoFs) in antenna position optimization. We develop a general channel model to accurately describe the channel characteristics in the near-field region and formulate an MA-position optimization problem to minimize the BS's transmit power subject to users' individual rate constraints. To solve this problem, we propose a two-loop dynamic neighborhood pruning particle swarm optimization (DNPPSO) algorithm that significantly reduces the computational complexity as compared to the standard particle swarm optimization (PSO) algorithm while achieving similar performance. Simulation results validate the effectiveness and advantages of the proposed scheme in power-saving for near-field multiuser communications.
- [78] arXiv:2408.14050 (replaced) [pdf, html, other]
-
Title: Fast Edge-Aware Occlusion Detection in the Context of Multispectral Camera ArraysSubjects: Image and Video Processing (eess.IV)
Multispectral imaging is very beneficial in diverse applications, like healthcare and agriculture, since it can capture absorption bands of molecules in different spectral areas. Camera arrays are a promising approach for multispectral snapshot imaging. Image processing is necessary to warp all different views to the same view to retrieve a consistent multispectral datacube. This process is also called multispectral image registration. After a cross-spectral disparity estimation, an occlusion detection is required to find the pixels that were not recorded by the peripheral cameras. In this paper, a novel fast edge-aware occlusion detection is presented, which is shown to reduce the runtime by at least a factor of 12. Moreover, an evaluation on ground truth data reveals better performance in terms of precision and recall. Finally, the quality of a final multispectral datacube can be improved by more than 1.5 dB in terms of PSNR as well as in terms of SSIM in an existing multispectral registration pipeline. The source code is available at this https URL.
- [79] arXiv:2410.07269 (replaced) [pdf, other]
-
Title: Deep Learning for Surgical Instrument Recognition and Segmentation in Robotic-Assisted Surgeries: A Systematic ReviewFatimaelzahraa Ali Ahmed, Mahmoud Yousef, Mariam Ali Ahmed, Hasan Omar Ali, Anns Mahboob, Hazrat Ali, Zubair Shah, Omar Aboumarzouk, Abdulla Al Ansari, Shidin BalakrishnanComments: 57 pages, 9 figures, Published in Artificial Intelligence Reviews journal this https URLSubjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Applying deep learning (DL) for annotating surgical instruments in robot-assisted minimally invasive surgeries (MIS) represents a significant advancement in surgical technology. This systematic review examines 48 studies of advanced DL methods and architectures. These sophisticated DL models have shown notable improvements in the precision and efficiency of detecting and segmenting surgical tools. The enhanced capabilities of these models support various clinical applications, including real-time intraoperative guidance, comprehensive postoperative evaluations, and objective assessments of surgical skills. By accurately identifying and segmenting surgical instruments in video data, DL models provide detailed feedback to surgeons, thereby improving surgical outcomes and reducing complication risks. Furthermore, the application of DL in surgical education is transformative. The review underscores the significant impact of DL on improving the accuracy of skill assessments and the overall quality of surgical training programs. However, implementing DL in surgical tool detection and segmentation faces challenges, such as the need for large, accurately annotated datasets to train these models effectively. The manual annotation process is labor-intensive and time-consuming, posing a significant bottleneck. Future research should focus on automating the detection and segmentation process and enhancing the robustness of DL models against environmental variations. Expanding the application of DL models across various surgical specialties will be essential to fully realize this technology's potential. Integrating DL with other emerging technologies, such as augmented reality (AR), also offers promising opportunities to further enhance the precision and efficacy of surgical procedures.
- [80] arXiv:2410.12438 (replaced) [pdf, other]
-
Title: Modeling, Prediction and Risk Management of Distribution System Voltages with Non-Gaussian Probability DistributionsSubjects: Systems and Control (eess.SY)
High renewable energy penetration into power distribution systems causes a substantial risk of exceeding voltage security limits, which needs to be accurately assessed and properly managed. However, the existing methods usually rely on the joint probability models of power generation and loads provided by probabilistic prediction to quantify the voltage risks, where inaccurate prediction results could lead to over- or underestimated risks. This paper proposes an uncertain voltage component (UVC) prediction method for assessing and managing voltage risks. First, we define the UVC to evaluate voltage variations caused by the uncertainties associated with power generation and loads. Second, we propose a Gaussian mixture model-based probabilistic UVC prediction method to depict the non-Gaussian distribution of voltage variations. Then, we derive the voltage risk indices, including value-at-risk (VaR) and conditional value-at-risk (CVaR), based on the probabilistic UVC prediction model. Third, we investigate the mechanism of UVC-based voltage risk management and establish the voltage risk management problems, which are reformulated as linear programming or mixed-integer linear programming problems for convenient solution. The proposed method is tested on power distribution systems with actual photovoltaic power and load data and compared with methods that consider probabilistic prediction of nodal power injections. Numerical results show that the proposed method is computationally efficient in assessing voltage risks and outperforms existing methods in managing voltage risks. The deviation of voltage risks obtained by the proposed method is only 15% of that obtained by the methods based on probabilistic prediction of nodal power injections.
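The two risk indices named in the abstract, VaR and CVaR, can be illustrated with a sample-based estimate. This is a minimal sketch under stated assumptions, not the paper's derivation: it uses one common empirical convention (VaR as an upper-tail order statistic, CVaR as the mean of the samples at or above it), and the function name and sample values are hypothetical. In the paper's setting the samples could plausibly be drawn from the fitted Gaussian mixture, but that step is not shown here.

```python
import math


def var_cvar(samples, alpha):
    """Empirical upper-tail VaR and CVaR at confidence level alpha.

    Assumed convention: VaR is the ceil(alpha * n)-th smallest sample,
    and CVaR is the mean of all samples at or above that position.
    """
    xs = sorted(samples)
    n = len(xs)
    k = min(n - 1, max(0, math.ceil(alpha * n) - 1))  # zero-based index of VaR
    tail = xs[k:]
    return xs[k], sum(tail) / len(tail)


# Hypothetical voltage-deviation magnitudes (arbitrary units).
deviations = [3, 1, 8, 2, 6, 4, 7, 5]
var75, cvar75 = var_cvar(deviations, alpha=0.75)
print(var75, cvar75)
```

CVaR is never smaller than VaR under this convention, since it averages the tail that begins at the VaR sample.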
- [81] arXiv:2410.14116 (replaced) [pdf, html, other]
-
Title: Robustness to Model Approximation, Empirical Model Learning, and Sample Complexity in Wasserstein Regular MDPsComments: 35 pages; submitted to Mathematics of Operations ResearchSubjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
The paper studies the robustness properties of discrete-time stochastic optimal control under Wasserstein model approximation for both discounted cost and average cost criteria. Specifically, we study the performance loss when applying an optimal policy designed for an approximate model to the true dynamics compared with the optimal cost for the true model under the sup-norm-induced metric, and relate it to the Wasserstein-1 distance between the approximate and true transition kernels. A primary motivation of this analysis is empirical model learning, as well as empirical noise distribution learning, where Wasserstein convergence holds under mild conditions but stronger convergence criteria, such as total variation, may not. We discuss applications of the results to the disturbance estimation problem, where sample complexity bounds are given, and also to a general empirical model learning approach, obtained under either Markov or i.i.d. learning settings. Further applications regarding the continuity of invariant probability measures with respect to transition kernels are also discussed.
- [82] arXiv:2410.23247 (replaced) [pdf, html, other]
-
Title: bit2bit: 1-bit quanta video reconstruction via self-supervised photon predictionComments: NeurIPS 2024Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Quanta image sensors, such as SPAD arrays, are an emerging sensor technology, producing 1-bit arrays representing photon detection events over exposures as short as a few nanoseconds. In practice, raw data are post-processed using heavy spatiotemporal binning to create more useful and interpretable images at the cost of degrading spatiotemporal resolution. In this work, we propose bit2bit, a new method for reconstructing high-quality image stacks at the original spatiotemporal resolution from sparse binary quanta image data. Inspired by recent work on Poisson denoising, we developed an algorithm that creates a dense image sequence from sparse binary photon data by predicting the photon arrival location probability distribution. However, due to the binary nature of the data, we show that the assumption of a Poisson distribution is inadequate. Instead, we model the process with a Bernoulli lattice process from the truncated Poisson. This leads to the proposal of a novel self-supervised solution based on a masked loss function. We evaluate our method using both simulated and real data. On simulated data from a conventional video, we achieve 34.35 mean PSNR with extremely photon-sparse binary input (<0.06 photons per pixel per frame). We also present a novel dataset containing a wide range of real SPAD high-speed videos under various challenging imaging conditions. The scenes cover strong/weak ambient light, strong motion, ultra-fast events, etc., which will be made available to the community, on which we demonstrate the promise of our approach. Both reconstruction quality and throughput substantially surpass the state-of-the-art methods (e.g., Quanta Burst Photography (QBP)). Our approach significantly enhances the visualization and usability of the data, enabling the application of existing analysis techniques.
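The abstract states that a Poisson assumption is inadequate for binary data and that a Bernoulli model derived from the truncated Poisson is used instead. One way to read this, stated here as an assumption rather than as the paper's exact formulation, is that each 1-bit pixel records whether at least one photon arrived, so a Poisson rate maps to a detection probability of 1 - exp(-rate). The function names below are hypothetical.

```python
import math


def detection_probability(rate):
    """P(bit = 1) = P(at least one photon) = 1 - exp(-rate) for Poisson arrivals."""
    return -math.expm1(-rate)  # numerically stable form of 1 - exp(-rate)


def rate_from_probability(p):
    """Inverse mapping: rate = -ln(1 - p), valid for 0 <= p < 1."""
    return -math.log1p(-p)


# Illustrative low-flux value, in photons per pixel per frame.
rate = 0.06
p = detection_probability(rate)
print(p, rate_from_probability(p))
```

At low flux the detection probability is slightly below the rate itself, and the gap grows as the rate increases, because a 1-bit pixel cannot count more than one photon per exposure.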
- [83] arXiv:2411.02072 (replaced) [pdf, html, other]
-
Title: Multitarget Bistatic MIMO RADARComments: 10 pages, 10 figuresSubjects: Signal Processing (eess.SP)
This paper investigates bistatic MIMO radar for estimating multitarget parameters of interest in the presence of clutter and noise. The parameters of interest include Direction of Departure (DOD), Direction of Arrival (DOA), range, and velocity. A novel algorithm is proposed for estimating these target parameters based on the concepts of the "array manifold" and "manifold extenders". The performance of the proposed algorithm is evaluated using Monte Carlo simulation studies.
- [84] arXiv:2302.12921 (replaced) [pdf, html, other]
-
Title: Pre-Finetuning for Few-Shot Emotional Speech RecognitionComments: Published at INTERSPEECH 2023. 5 pages, 4 figures. Code available at this https URLJournal-ref: Proc. INTERSPEECH 2023, 3602-3606 (2023)Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Speech models have long been known to overfit individual speakers for many classification tasks. This leads to poor generalization in settings where the speakers are out-of-domain or out-of-distribution, as is common in production environments. We view speaker adaptation as a few-shot learning problem and propose investigating transfer learning approaches inspired by recent success with pre-trained models in natural language tasks. We propose pre-finetuning speech models on difficult tasks to distill knowledge into few-shot downstream classification objectives. We pre-finetune Wav2Vec2.0 on every permutation of four multiclass emotional speech recognition corpora and evaluate our pre-finetuned models through 33,600 few-shot fine-tuning trials on the Emotional Speech Dataset.
- [85] arXiv:2305.15405 (replaced) [pdf, html, other]
-
Title: Textless Speech-to-Speech Translation With Limited Parallel Data
Comments: Accepted to EMNLP 2024 Findings
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Existing speech-to-speech translation (S2ST) models fall into two camps: they either leverage text as an intermediate step or require hundreds of hours of parallel speech data. Both approaches are incompatible with textless languages or language pairs with limited parallel data. We present PFB, a framework for training textless S2ST models that require just dozens of hours of parallel speech data. We first pretrain a model on large-scale monolingual speech data, finetune it with a small amount of parallel speech data (20-60 hours), and lastly train with an unsupervised backtranslation objective. We train and evaluate our models for English-to-German, German-to-English and Marathi-to-English translation on three different domains (European Parliament, Common Voice, and All India Radio) with single-speaker synthesized speech. Evaluated using the ASR-BLEU metric, our models achieve reasonable performance on all three domains, with some being within 1-2 points of our higher-resourced topline.
- [86] arXiv:2306.05597 (replaced) [pdf, html, other]
-
Title: On the implementation of zero-determinant strategies in repeated games
Comments: 21 pages
Subjects: Statistical Mechanics (cond-mat.stat-mech); Computer Science and Game Theory (cs.GT); Systems and Control (eess.SY); Physics and Society (physics.soc-ph)
Zero-determinant strategies are a class of strategies in repeated games which unilaterally control payoffs. Zero-determinant strategies have attracted much attention in studies of social dilemmas, particularly in the context of the evolution of cooperation. So far, not only have general properties of zero-determinant strategies been investigated, but zero-determinant strategies have also been applied to control in the fields of information and communications technology and to the analysis of imitation. Here, we further deepen our understanding of the general mathematical properties of zero-determinant strategies. We first prove that zero-determinant strategies, if they exist, can be implemented by some one-dimensional transition probability. Next, we prove that, if a two-player game has a non-trivial potential function, a zero-determinant strategy exists in its repeated version. These results help us implement zero-determinant strategies in broader situations.
- [87] arXiv:2309.02340 (replaced) [pdf, html, other]
-
Title: Local Padding in Patch-Based GANs for Seamless Infinite-Sized Texture Synthesis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Texture models based on Generative Adversarial Networks (GANs) use zero-padding to implicitly encode positional information of the image features. However, when extending the spatial input to generate images at large sizes, zero-padding can often lead to degradation in image quality due to the incorrect positional information at the center of the image. Moreover, zero-padding can limit the diversity within the generated large images. In this paper, we propose a novel approach for generating stochastic texture images at large arbitrary sizes using GANs based on patch-by-patch generation. Instead of zero-padding, the model uses local padding in the generator that shares border features between the generated patches, providing positional context and ensuring consistency at the boundaries. The proposed models are trainable on a single texture image and have a constant GPU scalability with respect to the output image size, and hence can generate images of infinite sizes. We show in the experiments that our method significantly improves on existing GAN-based texture models in terms of the quality and diversity of the generated textures. Furthermore, the implementation of local padding in state-of-the-art super-resolution models effectively eliminates tiling artifacts, enabling large-scale super-resolution. Our code is available at this https URL.
- [88] arXiv:2310.00903 (replaced) [pdf, html, other]
-
Title: Symmetric Solutions to Symmetric Partial Difference Equations
Comments: To appear in Journal of Difference Equations and Applications
Subjects: Classical Analysis and ODEs (math.CA); Systems and Control (eess.SY); Optimization and Control (math.OC)
This paper studies systems of linear difference equations on the lattice $\mathbb{Z}^n$ that are invariant under a finite group of symmetries, and shows that there exist solutions to such systems that are also invariant under this group of symmetries.
- [89] arXiv:2401.03115 (replaced) [pdf, html, other]
-
Title: Transferable Learned Image Compression-Resistant Adversarial Perturbations
Comments: Accepted by BMVC 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
Adversarial attacks can readily disrupt image classification systems, revealing the vulnerability of DNN-based recognition tasks. While existing adversarial perturbations are primarily applied to uncompressed images or to images compressed by the traditional image compression method, i.e., JPEG, limited studies have investigated the robustness of models for image classification in the context of DNN-based image compression. With the rapid evolution of advanced image compression, DNN-based learned image compression has emerged as a promising approach for transmitting images in many security-critical applications, such as cloud-based face recognition and autonomous driving, due to its superior performance over traditional compression. Therefore, there is a pressing need to fully investigate the robustness of a classification system post-processed by learned image compression. To bridge this research gap, we explore the adversarial attack on a new pipeline that targets image classification models that utilize learned image compressors as pre-processing modules. Furthermore, to enhance the transferability of perturbations across various quality levels and architectures of learned image compression models, we introduce a saliency score-based sampling method to enable the fast generation of transferable perturbations. Extensive experiments with popular attack methods demonstrate the enhanced transferability of our proposed method when attacking images that have been post-processed with different learned image compression models.
- [90] arXiv:2402.05625 (replaced) [pdf, html, other]
-
Title: Coded Many-User Multiple Access via Approximate Message Passing
Comments: 25 pages, 8 figures. A shorter version of this paper appeared in the Proceedings of IEEE ISIT 2024
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
We consider communication over the Gaussian multiple-access channel in the regime where the number of users grows linearly with the codelength. In this regime, schemes based on sparse superposition coding can achieve a near-optimal tradeoff between spectral efficiency and signal-to-noise ratio. However, these schemes are feasible only for small values of user payload. This paper investigates efficient schemes for larger user payloads, focusing on coded CDMA schemes where each user's information is encoded via a linear code before being modulated with a signature sequence. We propose an efficient approximate message passing (AMP) decoder that can be tailored to the structure of the linear code, and provide an exact asymptotic characterization of its performance. Based on this result, we consider a decoder that integrates AMP and belief propagation and characterize its tradeoff between spectral efficiency and signal-to-noise ratio, for a given target error rate. Simulation results show that the decoder achieves state-of-the-art performance at finite lengths, with a coded CDMA scheme defined using LDPC codes and a spatially coupled matrix of signature sequences.
- [91] arXiv:2403.02633 (replaced) [pdf, html, other]
-
Title: Spatially Non-Stationary XL-MIMO Channel Estimation: A Three-Layer Generalized Approximate Message Passing Method
Comments: A revised manuscript has been submitted to the IEEE journal for possible publication
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
In this paper, the channel estimation problem for extremely large-scale multi-input multi-output (XL-MIMO) systems is investigated, taking into account the spherical wavefront effect and the spatially non-stationary (SnS) property. Due to the diversity of SnS characteristics among different propagation paths, the concurrent channel estimation of multiple paths becomes intractable. To address this challenge, we propose a two-phase channel estimation scheme. In the first phase, the angles of departure (AoDs) on the user side are estimated, and a carefully designed pilot transmission scheme enables the decomposition of the received signal from different paths. In the second phase, the subchannel estimation corresponding to different paths is formulated as a three-layer Bayesian inference problem. Specifically, the first layer captures block sparsity in the angular domain, the second layer promotes the SnS property in the antenna domain, and the third layer decouples the subchannels from the observed signals. To efficiently facilitate Bayesian inference, we propose a novel three-layer generalized approximate message passing (TL-GAMP) algorithm based on structured variational message passing and belief propagation rules. Simulation results validate the convergence and effectiveness of the proposed algorithm, showcasing its robustness to different channel scenarios.
- [92] arXiv:2403.10012 (replaced) [pdf, html, other]
-
Title: Representing Domain-Mixing Optical Degradation for Real-World Computational Aberration Correction via Vector Quantization
Authors: Qi Jiang, Zhonghua Yi, Shaohua Gao, Yao Gao, Xiaolong Qian, Hao Shi, Lei Sun, JinXing Niu, Kaiwei Wang, Kailun Yang, Jian Bai
Comments: Accepted to Optics & Laser Technology. Codes and datasets are made publicly available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Image and Video Processing (eess.IV); Optics (physics.optics)
Relying on paired synthetic data, existing learning-based Computational Aberration Correction (CAC) methods are confronted with the intricate and multifaceted synthetic-to-real domain gap, which leads to suboptimal performance in real-world applications. In this paper, in contrast to improving the simulation pipeline, we deliver a novel insight into real-world CAC from the perspective of Unsupervised Domain Adaptation (UDA). By incorporating readily accessible unpaired real-world data into training, we formalize the Domain Adaptive CAC (DACAC) task, and then introduce a comprehensive Real-world aberrated images (Realab) dataset to benchmark it. This task presents a formidable challenge due to the intricacy of understanding the target optical degradation domain. To this end, we propose a novel Quantized Domain-Mixing Representation (QDMR) framework as a potent solution to the issue. Centering around representing and quantizing the optical degradation which is consistent across different images, QDMR adapts the CAC model to the target domain from three key aspects: (1) reconstructing aberrated images of both domains by a VQGAN to learn a Domain-Mixing Codebook (DMC) characterizing the optical degradation; (2) modulating the deep features in the CAC model with DMC to transfer the target domain knowledge; and (3) leveraging the trained VQGAN to generate pseudo target aberrated images from the source ones for convincing target domain supervision. Extensive experiments on both synthetic and real-world benchmarks reveal that the models with QDMR consistently surpass the competitive methods in mitigating the synthetic-to-real gap, producing visually pleasant real-world CAC results with fewer artifacts. Codes and datasets are made publicly available at this https URL.
- [93] arXiv:2406.06005 (replaced) [pdf, html, other]
-
Title: WoCoCo: Learning Whole-Body Humanoid Control with Sequential Contacts
Comments: Website, Code, and Videos: this https URL
Subjects: Robotics (cs.RO); Graphics (cs.GR); Systems and Control (eess.SY)
Humanoid activities involving sequential contacts are crucial for complex robotic interactions and operations in the real world and are traditionally solved by model-based motion planning, which is time-consuming and often relies on simplified dynamics models. Although model-free reinforcement learning (RL) has become a powerful tool for versatile and robust whole-body humanoid control, it still requires tedious task-specific tuning and state machine design and suffers from long-horizon exploration issues in tasks involving contact sequences. In this work, we propose WoCoCo (Whole-Body Control with Sequential Contacts), a unified framework to learn whole-body humanoid control with sequential contacts by naturally decomposing the tasks into separate contact stages. Such decomposition facilitates simple and general policy learning pipelines through task-agnostic reward and sim-to-real designs, requiring only one or two task-related terms to be specified for each task. We demonstrate that end-to-end RL-based controllers trained with WoCoCo enable four challenging whole-body humanoid tasks involving diverse contact sequences in the real world without any motion priors: 1) versatile parkour jumping, 2) box loco-manipulation, 3) dynamic clap-and-tap dancing, and 4) cliffside climbing. We further show that WoCoCo is a general framework beyond humanoids by applying it to 22-DoF dinosaur robot loco-manipulation tasks.
- [94] arXiv:2406.16148 (replaced) [pdf, html, other]
-
Title: Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking
Authors: Yuwei Zhang, Tong Xia, Jing Han, Yu Wu, Georgios Rizos, Yang Liu, Mohammed Mosuily, Jagmohan Chauhan, Cecilia Mascolo
Comments: accepted by NeurIPS 2024 Track Datasets and Benchmarks
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Respiratory audio, such as coughing and breathing sounds, has predictive power for a wide range of healthcare applications, yet is currently under-explored. The main problem for those applications arises from the difficulty in collecting large labeled task-specific data for model development. Generalizable respiratory acoustic foundation models pretrained with unlabeled data would offer appealing advantages and possibly unlock this impasse. However, given the safety-critical nature of healthcare applications, it is pivotal to also ensure openness and replicability for any proposed foundation model solution. To this end, we introduce OPERA, an OPEn Respiratory Acoustic foundation model pretraining and benchmarking system, as the first approach answering this need. We curate large-scale respiratory audio datasets (~136K samples, over 400 hours), pretrain three pioneering foundation models, and build a benchmark consisting of 19 downstream respiratory health tasks for evaluation. Our pretrained models demonstrate superior performance (against existing acoustic models pretrained with general audio on 16 out of 19 tasks) and generalizability (to unseen datasets and new respiratory audio modalities). This highlights the great promise of respiratory acoustic foundation models and encourages more studies using OPERA as an open resource to accelerate research on respiratory audio for health. The system is accessible from this https URL.
- [95] arXiv:2407.16803 (replaced) [pdf, html, other]
-
Title: C3T: Cross-modal Transfer Through Time for Human Action Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Signal Processing (eess.SP)
In order to unlock the potential of diverse sensors, we investigate a method to transfer knowledge between modalities using the structure of a unified multimodal representation space for Human Action Recognition (HAR). We formalize and explore an understudied cross-modal transfer setting we term Unsupervised Modality Adaptation (UMA), where the modality used in testing is not used in supervised training, i.e., zero labeled instances of the test modality are available during training. We develop three methods to perform UMA: Student-Teacher (ST), Contrastive Alignment (CA), and Cross-modal Transfer Through Time (C3T). Our extensive experiments on various camera+IMU datasets compare these methods to each other in the UMA setting, and to their empirical upper bound in the supervised setting. The results indicate that C3T is the most robust and the highest performing, by a margin of at least 8%, and that it nears the supervised setting performance even in the presence of temporal noise. This method introduces a novel mechanism for aligning signals across time-varying latent vectors, extracted from the receptive field of temporal convolutions. Our findings suggest that C3T has significant potential for developing generalizable models for time-series sensor data, opening new avenues for multi-modal learning in various applications.
- [96] arXiv:2408.05988 (replaced) [pdf, html, other]
-
Title: Eigenvalue Based Active User Enumeration for Grant-Free Access Under Carrier Frequency Offsets
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
This paper investigates a grant-free non-orthogonal multiple access (GF-NOMA) system in the presence of carrier frequency offsets. We propose two schemes for enumerating active users in such a GF-NOMA system, which is equivalent to estimating the sparsity level. Both schemes utilize a short common pilot and the eigenvalues of the sample covariance matrix of the received signal. The two schemes differ in their treatment of noise variance: one exploits known variance information, while the other is designed to function without this knowledge. Simulation results demonstrate the effectiveness of the proposed schemes in terms of the normalized root-mean-squared error.
- [97] arXiv:2408.15140 (replaced) [pdf, other]
-
Title: GEM: A GEneral Memristive Transistor Model
Comments: 5 pages, 5 figures
Subjects: Applied Physics (physics.app-ph); Signal Processing (eess.SP)
Neuromorphic devices, with their distinct advantages in energy efficiency and parallel processing, are pivotal in advancing artificial intelligence applications. Among these devices, memristive transistors have attracted significant attention due to their superior stability and operation flexibility compared to two-terminal memristors. However, the lack of a robust model that accurately captures their complex electrical behavior has hindered further exploration of their potential. In this work, we introduce the GEneral Memristive transistor (GEM) model to address this challenge. The GEM model incorporates a time-dependent differential equation, a voltage-controlled moving window function, and a nonlinear current output function, enabling precise representation of both switching and output characteristics in memristive transistors. Compared to previous models, the GEM model demonstrates a 300% improvement in modeling the switching behavior, while effectively capturing the inherent nonlinearities and physical limits of these devices. This advancement significantly enhances the realistic simulation of memristive transistors, thereby facilitating further exploration and application development.
- [98] arXiv:2410.19615 (replaced) [pdf, html, other]
-
Title: Equilibrium Adaptation-Based Control for Track Stand of Single-Track Two-Wheeled Robots
Comments: 11 pages, 7 figures
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Stationary balance control is challenging for single-track two-wheeled (STTW) robots due to the lack of elegant balancing mechanisms and the conflict between the limited attraction domain and external disturbances. To address the absence of balancing mechanisms, we draw inspiration from cyclists and leverage the track stand maneuver, which relies solely on steering and rear-wheel actuation. To achieve accurate tracking in the presence of matched and mismatched disturbances, we propose an equilibrium adaptation-based control (EABC) scheme that can be seamlessly integrated with standard disturbance observers and controllers. This scheme enables adaptation to slow-varying disturbances by utilizing a disturbed equilibrium estimator, effectively handling both matched and mismatched disturbances in a unified manner while ensuring accurate tracking with zero steady-state error. We integrate the EABC scheme with nonlinear model predictive control (MPC) for the track stand of STTW robots and validate its effectiveness through two experimental scenarios. Our method demonstrates significant improvements in tracking accuracy, reducing errors by several orders of magnitude.
- [99] arXiv:2411.02471 (replaced) [pdf, other]
-
Title: Energy-Aware Dynamic Neural Inference
Comments: © 2024 IEEE. This work has been submitted to the IEEE for possible publication
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP); Systems and Control (eess.SY)
The growing demand for intelligent applications beyond the network edge, coupled with the need for sustainable operation, is driving the seamless integration of deep learning (DL) algorithms into energy-limited and even energy-harvesting end-devices. However, the stochastic nature of ambient energy sources often results in insufficient harvesting rates, failing to meet the energy requirements for inference and causing significant performance degradation in energy-agnostic systems. To address this problem, we consider an on-device adaptive inference system equipped with an energy harvester and finite-capacity energy storage. We then allow the device to reduce the run-time execution cost on demand, either by switching between differently sized neural networks, referred to as multi-model selection (MMS), or by enabling earlier predictions at intermediate layers, called early exiting (EE). The model to be employed, or the exit point, is then dynamically chosen based on the energy storage and harvesting process states. We also study the efficacy of integrating the prediction confidence into the decision-making process. We derive a principled policy with theoretical guarantees for confidence-aware and -agnostic controllers. Moreover, in multi-exit networks, we study the advantages of taking decisions incrementally, exit-by-exit, by designing a lightweight reinforcement learning-based controller. Experimental results show that, as the rate of the ambient energy increases, energy- and confidence-aware control schemes show approximately 5% improvement in accuracy compared to their energy-aware confidence-agnostic counterparts. Incremental approaches achieve even higher accuracy, particularly when the energy storage capacity is limited relative to the energy consumption of the inference model.
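As a rough illustration of choosing a model or exit point from the energy state, the sketch below uses a naive greedy rule: pick the most accurate option whose energy cost fits in the current storage, then update a finite-capacity store. This is not the policy derived in the paper. The class, function names, and numbers are hypothetical, and the rule ignores the harvesting-process state and prediction confidence that the abstract mentions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Option:
    """One inference choice: a whole model (MMS) or an exit point (EE)."""
    name: str
    energy_cost: float        # energy needed to run this option once
    expected_accuracy: float  # assumed known from offline profiling


def pick_option(options: Sequence[Option], stored_energy: float) -> Optional[Option]:
    """Greedy rule: the most accurate option that the stored energy can pay for."""
    feasible = [o for o in options if o.energy_cost <= stored_energy]
    if not feasible:
        return None  # not enough energy for any option; skip this inference
    return max(feasible, key=lambda o: o.expected_accuracy)


def step_storage(stored: float, harvested: float, spent: float, capacity: float) -> float:
    """Finite-capacity storage update: spend, then harvest, then clip at capacity."""
    return min(capacity, stored - spent + harvested)


if __name__ == "__main__":
    options = [
        Option("small", 1.0, 0.70),
        Option("medium", 2.5, 0.80),
        Option("large", 5.0, 0.90),
    ]
    stored, capacity = 3.0, 4.0
    for harvested in [0.5, 4.0, 0.0, 2.0]:  # illustrative harvest trace
        choice = pick_option(options, stored)
        spent = choice.energy_cost if choice else 0.0
        print(f"stored={stored:.1f} -> {choice.name if choice else 'skip'}")
        stored = step_storage(stored, harvested, spent, capacity)
```

A greedy rule like this spends energy as soon as it is available, so it can leave the store empty just before a harvesting lull. That shortcoming is one reason to condition the decision on the harvesting process as well as the storage level.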
- [100] arXiv:2411.02551 (replaced) [pdf, html, other]
-
Title: PIAST: A Multimodal Piano Dataset with Audio, Symbolic and Text
Comments: Accepted for publication at the 3rd Workshop on NLP for Music and Audio (NLP4MusA 2024)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
While piano music has become a significant area of study in Music Information Retrieval (MIR), there is a notable lack of datasets for piano solo music with text labels. To address this gap, we present PIAST (PIano dataset with Audio, Symbolic, and Text), a piano music dataset. Utilizing a piano-specific taxonomy of semantic tags, we collected 9,673 tracks from YouTube and added human annotations for 2,023 tracks by music experts, resulting in two subsets: PIAST-YT and PIAST-AT. Both include audio, text, tag annotations, and transcribed MIDI utilizing state-of-the-art piano transcription and beat tracking models. Among many possible tasks with the multi-modal dataset, we conduct music tagging and retrieval using both audio and MIDI data and report baseline performances to demonstrate its potential as a valuable resource for MIR research.