Search | arXiv e-print repository

Cellular Plasticity Model for Bottom-Up Robotic Design

Authors: Trevor R. Smith, Thomas J. Smith, Nicholas S. Szczecinski, Sergiy Yakovenko, Yu Gu

Abstract: Traditional top-down robotic design often lacks the adaptability needed to handle real-world complexities, prompting the need for more flexible approaches. Therefore, this study introduces a novel cellular plasticity model tailored for bottom-up robotic design. The proposed model utilizes an activator-inhibitor reaction, a common foundation of Turing patterns, which are fundamental in morphogenesis -- the emergence of form from simple interactions. Turing patterns describe how diffusion and interactions between two chemical substances, an activator and an inhibitor, can lead to complex patterns and structures, such as the formation of limbs and feathers. Our study extends this concept by modeling cellular plasticity as an activator-inhibitor reaction augmented with environmental stimuli, encapsulating the core phenomena observed across various cell types: stem cells, neurons, and muscle cells. In addition to demonstrating self-regulation and self-containment, this approach ensures that a robot's form and function are direct emergent responses to its environment without a comprehensive environmental model. In the proposed model, a factory acts as the activator, producing a product that serves as the inhibitor, which is then influenced by environmental stimuli through consumption. These components are regulated by cellular plasticity phenomena as feedback loops. We calculate the equilibrium points of the model and the stability criterion. Simulations examine how varying parameters affect the system's transient behavior and the impact of competing functions on its functional capacity. Results show the model converges to a single stable equilibrium tuned to the environmental stimulation. Such dynamic behavior underscores the model's utility for generating predictable responses within robotics and biological systems, showcasing its potential for navigating the complexities of adaptive systems.

Submitted 10 August, 2024; originally announced August 2024.

Comments: 15 pages, 7 figures, Living Machines 2024

arXiv:2408.01452 [pdf, other]

Building a Domain-specific Guardrail Model in Production

Authors: Mohammad Niknazar, Paul V Haley, Latha Ramanan, Sang T. Truong, Yedendra Shrinivasan, Ayan Kumar Bhowmick, Prasenjit Dey, Ashish Jagmohan, Hema Maheshwari, Shom Ponoth, Robert Smith, Aditya Vempaty, Nick Haber, Sanmi Koyejo, Sharad Sundararajan

Abstract: Generative AI holds the promise of enabling a range of sought-after capabilities and revolutionizing workflows in various consumer and enterprise verticals. However, putting a model in production involves much more than just generating an output. It involves ensuring the model is reliable, safe, performant and also adheres to the policy of operation in a particular domain. Guardrails as a necessity for models have evolved around the need to enforce appropriate behavior of models, especially when they are in production. In this paper, we use education as a use case, given its stringent requirements of the appropriateness of content in the domain, to demonstrate how a guardrail model can be trained and deployed in production. Specifically, we describe our experience in building a production-grade guardrail model for a K-12 educational platform. We begin by formulating the requirements for deployment to this sensitive domain. We then describe the training and benchmarking of our domain-specific guardrail model, which outperforms competing open- and closed- instruction-tuned models of similar and larger size, on proprietary education-related benchmarks and public benchmarks related to general aspects of safety. Finally, we detail the choices we made on architecture and the optimizations for deploying this service in production; these range across the stack from the hardware infrastructure to the serving layer to language model inference optimizations. We hope this paper will be instructive to other practitioners looking to create production-grade domain-specific services based on generative AI and large language models.

Submitted 24 July, 2024; originally announced August 2024.

arXiv:2407.17579 [pdf, other]

doi 10.1145/3678884.3681833

Envisioning New Futures of Positive Social Technology: Beyond Paradigms of Fixing, Protecting, and Preventing

Authors: JaeWon Kim, Lindsay Popowski, Anna Fang, Cassidy Pyle, Guo Freeman, Ryan M. Kelly, Angela Y. Lee, Fannie Liu, Angela D. R. Smith, Alexandra To, Amy X. Zhang

Abstract: Social technology research today largely focuses on mitigating the negative impacts of technology and, therefore, often misses the potential of technology to enhance human connections and well-being. However, we see a potential to shift towards a holistic view of social technology's impact on human flourishing. We introduce Positive Social Technology (Positech), a framework that shifts emphasis toward leveraging social technologies to support and augment human flourishing. This workshop is organized around three themes relevant to Positech: 1) "Exploring Relevant and Adjacent Research" to define and widen the Positech scope with insights from related fields, 2) "Projecting the Landscape of Positech" for participants to outline the domain's key aspects and 3) "Envisioning the Future of Positech," anchored around strategic planning towards a sustainable research community. Ultimately, this workshop will serve as a platform to shift the narrative of social technology research towards a more positive, human-centric approach. It will foster research that goes beyond fixing technologies to protect humans from harm, to also pursue enriching human experiences and connections through technology.

Submitted 27 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

arXiv:2406.16399 [pdf, other]

doi 10.4204/EPTCS.403.17

Pop Stacks with a Bypass

Authors: Lapo Cioni, Luca Ferrari, Rebecca Smith

Abstract: We consider sorting procedures for permutations making use of pop stacks with a bypass operation, and explore the combinatorial properties of the associated algorithms.

Submitted 24 June, 2024; originally announced June 2024.

Comments: In Proceedings GASCom 2024, arXiv:2406.14588

ACM Class: G.2.1

Journal ref: EPTCS 403, 2024, pp. 73-78

arXiv:2405.18368 [pdf, other]

The 2024 Brain Tumor Segmentation (BraTS) Challenge: Glioma Segmentation on Post-treatment MRI

Authors: Maria Correia de Verdier, Rachit Saluja, Louis Gagnon, Dominic LaBella, Ujjwall Baid, Nourel Hoda Tahon, Martha Foltyn-Dumitru, Jikai Zhang, Maram Alafif, Saif Baig, Ken Chang, Gennaro D'Anna, Lisa Deptula, Diviya Gupta, Muhammad Ammar Haider, Ali Hussain, Michael Iv, Marinos Kontzialis, Paul Manning, Farzan Moodi, Teresa Nunes, Aaron Simon, Nico Sollmann, David Vu, Maruf Adewole, et al. (60 additional authors not shown)

Abstract: Gliomas are the most common malignant primary brain tumors in adults and one of the deadliest types of cancer. There are many challenges in treatment and monitoring due to the genetic diversity and high intrinsic heterogeneity in appearance, shape, histology, and treatment response. Treatments include surgery, radiation, and systemic therapies, with magnetic resonance imaging (MRI) playing a key role in treatment planning and post-treatment longitudinal assessment. The 2024 Brain Tumor Segmentation (BraTS) challenge on post-treatment glioma MRI will provide a community standard and benchmark for state-of-the-art automated segmentation models based on the largest expert-annotated post-treatment glioma MRI dataset. Challenge competitors will develop automated segmentation models to predict four distinct tumor sub-regions consisting of enhancing tissue (ET), surrounding non-enhancing T2/fluid-attenuated inversion recovery (FLAIR) hyperintensity (SNFH), non-enhancing tumor core (NETC), and resection cavity (RC). Models will be evaluated on separate validation and test datasets using standardized performance metrics utilized across the BraTS 2024 cluster of challenges, including lesion-wise Dice Similarity Coefficient and Hausdorff Distance. Models developed during this challenge will advance the field of automated MRI segmentation and contribute to their integration into clinical practice, ultimately enhancing patient care.
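
The abstract names the Dice Similarity Coefficient as one of the evaluation metrics. As a rough illustration only, the sketch below computes the plain voxel-wise Dice score between two binary masks. It is not the challenge's lesion-wise variant, which scores lesions individually; the function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions made for this example.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Voxel-wise Dice score 2|A∩B| / (|A| + |B|) for two binary masks.

    Hypothetical helper for illustration; not the BraTS lesion-wise metric.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    overlap = np.logical_and(pred, truth).sum()
    return 2.0 * overlap / denom
```

For example, a prediction that marks two voxels, one of which is in a one-voxel ground truth, scores 2·1 / (2 + 1) = 2/3.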

Submitted 28 May, 2024; originally announced May 2024.

Comments: 10 pages, 4 figures, 1 table

arXiv:2404.06529 [pdf, other]

Emergent Braitenberg-style Behaviours for Navigating the ViZDoom 'My Way Home' Labyrinth

Authors: Caleidgh Bayer, Robert J. Smith, Malcolm I. Heywood

Abstract: The navigation of complex labyrinths with tens of rooms under visual partially observable state is typically addressed using recurrent deep reinforcement learning architectures. In this work, we show that navigation can be achieved through the emergent evolution of a simple Braitenberg-style heuristic that structures the interaction between agent and labyrinth, i.e. complex behaviour from simple heuristics. To do so, the approach of tangled program graphs is assumed in which programs cooperatively coevolve to develop a modular indexing scheme that only employs 0.8% of the state space. We attribute this simplicity to several biases implicit in the representation, such as the use of pixel indexing as opposed to deploying a convolutional kernel or image processing operators.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.01100 [pdf, other]

Finite Sample Frequency Domain Identification

Authors: Anastasios Tsiamis, Mohamed Abdalmoaty, Roy S. Smith, John Lygeros

Abstract: We study non-parametric frequency-domain system identification from a finite-sample perspective. We assume an open loop scenario where the excitation input is periodic and consider the Empirical Transfer Function Estimate (ETFE), where the goal is to estimate the frequency response at certain desired (evenly-spaced) frequencies, given input-output samples. We show that under sub-Gaussian colored noise (in time-domain) and stability assumptions, the ETFE estimates are concentrated around the true values. The error rate is of the order of $\mathcal{O}((d_{\mathrm{u}}+\sqrt{d_{\mathrm{u}}d_{\mathrm{y}}})\sqrt{M/N_{\mathrm{tot}}})$, where $N_{\mathrm{tot}}$ is the total number of samples, $M$ is the number of desired frequencies, and $d_{\mathrm{u}},\,d_{\mathrm{y}}$ are the dimensions of the input and output signals respectively. This rate remains valid for general irrational transfer functions and does not require a finite order state-space representation. By tuning $M$, we obtain a $N_{\mathrm{tot}}^{-1/3}$ finite-sample rate for learning the frequency response over all frequencies in the $ \mathcal{H}_{\infty}$ norm. Our result draws upon an extension of the Hanson-Wright inequality to semi-infinite matrices. We study the finite-sample behavior of ETFE in simulations.
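
To make the ETFE idea concrete, here is a minimal sketch for the single-input, single-output case only (the paper treats general input and output dimensions). It assumes a periodic excitation, discards an initial number of periods to let the transient decay, averages the remaining periods, and divides the output DFT by the input DFT at the requested bins. The names `etfe`, `simulate_first_order`, and `skip_periods`, and the first-order test system, are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def etfe(u, y, period, freq_idx, skip_periods=1):
    """SISO Empirical Transfer Function Estimate at the DFT bins freq_idx.

    Assumes u is periodic with the given period (in samples). The first
    skip_periods periods are dropped so the transient can die out.
    """
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    n_periods = len(u) // period
    if n_periods <= skip_periods:
        raise ValueError("need more periods than skip_periods")
    keep = slice(skip_periods * period, n_periods * period)
    # Average over the retained periods to reduce the effect of noise.
    u_avg = u[keep].reshape(-1, period).mean(axis=0)
    y_avg = y[keep].reshape(-1, period).mean(axis=0)
    U = np.fft.fft(u_avg)
    Y = np.fft.fft(y_avg)
    # Ratio of output to input spectra at the requested frequencies.
    return Y[freq_idx] / U[freq_idx]


def simulate_first_order(u, a=0.5):
    """Hypothetical test system y[t] = a*y[t-1] + u[t-1], zero initial state."""
    y = np.zeros(len(u))
    for t in range(1, len(u)):
        y[t] = a * y[t - 1] + u[t - 1]
    return y
```

With a noise-free periodic input, the estimate at bin k should match the true response of the test system at frequency 2πk/period, namely e^{-jω}/(1 - a e^{-jω}), up to the residual transient. With noise added, the error shrinks as more periods are averaged.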

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.05899 [pdf, other]

Online Identification of Stochastic Continuous-Time Wiener Models Using Sampled Data

Authors: Mohamed Abdalmoaty, Efe C. Balta, John Lygeros, Roy S. Smith

Abstract: It is well known that ignoring the presence of stochastic disturbances in the identification of stochastic Wiener models leads to asymptotically biased estimators. On the other hand, optimal statistical identification, via likelihood-based methods, is sensitive to the assumptions on the data distribution and is usually based on relatively complex sequential Monte Carlo algorithms. We develop a simple recursive online estimation algorithm based on an output-error predictor, for the identification of continuous-time stochastic parametric Wiener models through stochastic approximation. The method is applicable to generic model parameterizations and, as demonstrated in the numerical simulation examples, it is robust with respect to the assumptions on the spectrum of the disturbance process.

Submitted 9 March, 2024; originally announced March 2024.

arXiv:2402.04447 [pdf, other]

Context-Aware Spectrum Coexistence of Terrestrial Beyond 5G Networks in Satellite Bands

Authors: Ta Seen Reaz Niloy, Zoheb Hasan, Rob Smith, Vikram R. Anapana, Vijay K. Shah

Abstract: Spectrum sharing between terrestrial 5G and incumbent networks in the satellite bands presents a promising avenue to satisfy the ever-increasing bandwidth demand of the next-generation wireless networks. However, protecting incumbent operations from harmful interference poses a fundamental challenge in accommodating terrestrial broadband cellular networks in the satellite bands. State-of-the-art spectrum-sharing policies usually consider several worst-case assumptions and ignore site-specific contextual factors in making spectrum-sharing decisions, and thus often result in under-utilization of the shared band for the secondary licensees. To address such limitations, this paper introduces the CAT3S (Context-Aware Terrestrial-Satellite Spectrum Sharing) framework that empowers the coexisting terrestrial 5G network to maximize utilization of the shared satellite band without creating harmful interference to the incumbent links by exploiting the contextual factors. CAT3S consists of the following two components: (i) context-acquisition unit to collect and process essential contextual information for spectrum sharing and (ii) context-aware base station (BS) control unit to optimize the set of operational BSs and their operation parameters (i.e., transmit power and active beams per sector). To evaluate the performance of the CAT3S, a realistic spectrum coexistence case study over the 12 GHz band is considered. Experimental results demonstrate that the proposed CAT3S achieves notably higher spectrum utilization than state-of-the-art spectrum-sharing policies in different weather contexts.

Submitted 14 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2312.12566 [pdf, other]

doi 10.1109/ICRA57147.2024.10611283

Johnsen-Rahbek Capstan Clutch: A High Torque Electrostatic Clutch

Authors: Timothy E. Amish, Jeffrey T. Auletta, Chad C. Kessens, Joshua R. Smith, Jeffrey I. Lipton

Abstract: In many robotic systems, the holding state consumes power, limits operating time, and increases operating costs. Electrostatic clutches have the potential to improve robotic performance by generating holding torques with low power consumption. A key limitation of electrostatic clutches has been their low specific shear stresses which restrict generated holding torque, limiting many applications. Here we show how combining the Johnsen-Rahbek (JR) effect with the exponential tension scaling capstan effect can produce clutches with the highest specific shear stress in the literature. Our system generated 31.3 N/cm^2 shear stress and a total holding torque of 7.1 Nm while consuming only 2.5 mW/cm^2 at 500 V. We demonstrate a theoretical model of an electrostatic adhesive capstan clutch and demonstrate how large angle (theta > 2pi) designs increase efficiency over planar or small angle (theta < pi) clutch designs. We also report the first unfilled polymeric material, polybenzimidazole (PBI), to exhibit the JR-effect.

Submitted 27 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Journal ref: 2024 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2310.12804 [pdf, other]

Differentiable Vertex Fitting for Jet Flavour Tagging

Authors: Rachel E. C. Smith, Inês Ochoa, Rúben Inácio, Jonathan Shoemaker, Michael Kagan

Abstract: We propose a differentiable vertex fitting algorithm that can be used for secondary vertex fitting, and that can be seamlessly integrated into neural networks for jet flavour tagging. Vertex fitting is formulated as an optimization problem where gradients of the optimized solution vertex are defined through implicit differentiation and can be passed to upstream or downstream neural network components for network training. More broadly, this is an application of differentiable programming to integrate physics knowledge into neural network models in high energy physics. We demonstrate how differentiable secondary vertex fitting can be integrated into larger transformer-based models for flavour tagging and improve heavy flavour jet classification.

Submitted 19 October, 2023; originally announced October 2023.

Comments: 11 pages

arXiv:2310.03223 [pdf, other]

TacoGFN: Target-conditioned GFlowNet for Structure-based Drug Design

Authors: Tony Shen, Seonghwan Seo, Grayson Lee, Mohit Pandey, Jason R Smith, Artem Cherkasov, Woo Youn Kim, Martin Ester

Abstract: Searching the vast chemical space for drug-like and synthesizable molecules with high binding affinity to a protein pocket is a challenging task in drug discovery. Recently, molecular deep generative models have been introduced which promise to be more efficient than exhaustive virtual screening, by directly generating molecules based on the protein structure. However, since they learn the distribution of a limited protein-ligand complex dataset, the existing methods struggle with generating novel molecules with significant property improvements. In this paper, we frame the generation task as a Reinforcement Learning task, where the goal is to search the wider chemical space for molecules with desirable properties as opposed to fitting a training data distribution. More specifically, we propose TacoGFN, a Generative Flow Network conditioned on protein pocket structure, using binding affinity, drug-likeness and synthesizability measures as our reward. Empirically, our method outperforms state-of-the-art methods on the CrossDocked2020 benchmark for every molecular property (Vina score, QED, SA), while significantly improving the generation time. TacoGFN achieves $-8.82$ in median docking score and $52.63\%$ in Novel Hit Rate.

Submitted 7 April, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: Accepted at NeurIPS 2023 AID3 and at NeurIPS 2023 GenBio as Spotlight

Journal ref: NeurIPS 2023 Generative AI and Biology (GenBio) Workshop

arXiv:2308.13135 [pdf, other]

Nonparametric Additive Value Functions: Interpretable Reinforcement Learning with an Application to Surgical Recovery

Authors: Patrick Emedom-Nnamdi, Timothy R. Smith, Jukka-Pekka Onnela, Junwei Lu

Abstract: We propose a nonparametric additive model for estimating interpretable value functions in reinforcement learning. Learning effective adaptive clinical interventions that rely on digital phenotyping features is a major concern for medical practitioners. With respect to spine surgery, different post-operative recovery recommendations concerning patient mobilization can lead to significant variation in patient recovery. While reinforcement learning has achieved widespread success in domains such as games, recent methods heavily rely on black-box methods, such as neural networks. Unfortunately, these methods hinder the ability to examine the contribution each feature makes in producing the final suggested decision. While such interpretations are easily provided in classical algorithms such as Least Squares Policy Iteration, basic linearity assumptions prevent learning higher-order flexible interactions between features. In this paper, we present a novel method that offers a flexible technique for estimating action-value functions without making explicit parametric assumptions regarding their additive functional form. This nonparametric estimation strategy relies on incorporating local kernel regression and basis expansion to obtain a sparse, additive representation of the action-value function. Under this approach, we are able to locally approximate the action-value function and retrieve the nonlinear, independent contribution of select features as well as joint feature pairs. We validate the proposed approach with a simulation study, and, in an application to spine disease, uncover recovery recommendations that are in line with related clinical knowledge.

Submitted 24 August, 2023; originally announced August 2023.

Comments: 28 pages, 13 figures

arXiv:2308.11066 [pdf, other]

doi 10.1109/ACCESS.2024.3446274

CSM-H-R: A Context Modeling Framework in Supporting Reasoning Automation for Interoperable Intelligent Systems and Privacy Protection

Authors: Songhui Yue, Xiaoyan Hong, Randy K. Smith

Abstract: The automation of High-Level Context (HLC) reasoning across intelligent systems at scale is imperative because of the unceasing accumulation of contextual data, the trend of the fusion of data from multiple sources (e.g., sensors, intelligent systems), and the intrinsic complexity and dynamism of context-based decision-making processes. To mitigate the challenges posed by these issues, we propose a novel Hierarchical Ontology-State Modeling (HOSM) framework CSM-H-R, which programmatically combines ontologies and states at the modeling phase and runtime phase for attaining the ability to recognize meaningful HLC. It builds on the model of our prior work on the Context State Machine (CSM) engine by incorporating the H (Hierarchy) and R (Relationship and tRansition) dimensions to take care of the dynamic aspects of context. The design of the framework supports the sharing and interoperation of context among intelligent systems and the components for handling CSMs and the management of hierarchy, relationship, and transition. Case studies are developed for IntellElevator and IntellRestaurant, two intelligent applications in a smart campus setting. The prototype implementation of the framework experiments on translating the HLC reasoning into vector and matrix computing and presents the potential of using advanced probabilistic models to reach the next level of automation in integrating intelligent systems; meanwhile, privacy protection support is achieved in the application domain by anonymization through indexing and reducing information correlation. An implementation of the framework is available at https://github.com/songhui01/CSM-H-R.

Submitted 5 April, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

Comments: 13 pages, 10 figures, Keywords: Automation, Context Dynamism, Context Modeling, Context Reasoning, Intelligent System, Interoperability, Privacy Protection, System Integration

arXiv:2308.08029 [pdf, other]

Planning to Learn: A Novel Algorithm for Active Learning during Model-Based Planning

Authors: Rowan Hodson, Bruce Bassett, Charel van Hoof, Benjamin Rosman, Mark Solms, Jonathan P. Shock, Ryan Smith

Abstract: Active Inference is a recent framework for modeling planning under uncertainty. Empirical and theoretical work have now begun to evaluate the strengths and weaknesses of this approach and how it might be improved. A recent extension - the sophisticated inference (SI) algorithm - improves performance on multi-step planning problems through recursive decision tree search. However, little work to date has been done to compare SI to other established planning algorithms. SI was also developed with a focus on inference as opposed to learning. The present paper has two aims. First, we compare performance of SI to Bayesian reinforcement learning (RL) schemes designed to solve similar problems. Second, we present an extension of SI - sophisticated learning (SL) - that more fully incorporates active learning during planning. SL maintains beliefs about how model parameters would change under the future observations expected under each policy. This allows a form of counterfactual retrospective inference in which the agent considers what could be learned from current or past observations given different future observations. To accomplish these aims, we make use of a novel, biologically inspired environment designed to highlight the problem structure for which SL offers a unique solution. Here, an agent must continually search for available (but changing) resources in the presence of competing affordances for information gain. Our simulations show that SL outperforms all other algorithms in this context - most notably, Bayes-adaptive RL and upper confidence bound algorithms, which aim to solve multi-step planning problems using similar principles (i.e., directed exploration and counterfactual reasoning). These results provide added support for the utility of Active Inference in solving this class of biologically-relevant problems and offer added tools for testing hypotheses about human cognition.

Submitted 15 August, 2023; originally announced August 2023.

Comments: 31 pages, 5 figures

arXiv:2308.05866 [pdf]

Using Twitter Data to Determine Hurricane Category: An Experiment

Authors: Songhui Yue, Jyothsna Kondari, Aibek Musaev, Randy K. Smith, Songqing Yue

Abstract: Social media posts contain an abundant amount of information about public opinion on major events, especially natural disasters such as hurricanes. Posts related to an event are usually published by the users who live near the place of the event at the time of the event. Special correlation between the social media data and the events can be obtained using data mining approaches. This paper presents research work to find the mappings between social media data and the severity level of a disaster. Specifically, we have investigated the Twitter data posted during hurricanes Harvey and Irma, and attempted to find the correlation between the Twitter data of a specific area and the hurricane level in that area. Our experimental results indicate a positive correlation between them. We also present a method to predict the hurricane category for a specific area using relevant Twitter data.

Submitted 10 August, 2023; originally announced August 2023.

Comments: 9 Pages, 6 Figures, in Proceedings of the 15th ISCRAM Conference Rochester, NY, USA May 2018

arXiv:2305.03735 [pdf, other]

Stackelberg Games for Learning Emergent Behaviors During Competitive Autocurricula

Authors: Boling Yang, Liyuan Zheng, Lillian J. Ratliff, Byron Boots, Joshua R. Smith

Abstract: Autocurricular training is an important sub-area of multi-agent reinforcement learning (MARL) that allows multiple agents to learn emergent skills in an unsupervised co-evolving scheme. The robotics community has experimented with autocurricular training on physically grounded problems, such as robust control and interactive manipulation tasks. However, the asymmetric nature of these tasks makes the generation of sophisticated policies challenging. Indeed, the asymmetry in the environment may implicitly or explicitly provide an advantage to a subset of agents which could, in turn, lead to a low-quality equilibrium. This paper proposes a novel game-theoretic algorithm, Stackelberg Multi-Agent Deep Deterministic Policy Gradient (ST-MADDPG), which formulates a two-player MARL problem as a Stackelberg game with one player as the 'leader' and the other as the 'follower' in a hierarchical interaction structure wherein the leader has an advantage. We first demonstrate that the leader's advantage from ST-MADDPG can be used to alleviate the inherent asymmetry in the environment. By exploiting the leader's advantage, ST-MADDPG improves the quality of a co-evolution process and results in more sophisticated and complex strategies that work well even against an unseen strong opponent.

Submitted 4 May, 2023; originally announced May 2023.

arXiv:2303.14055 [pdf, ps, other]

Addressing Potential Pitfalls of SAR Assistance on the Aging Population

Authors: Emilyann Nault, Ronnie Smith, Lynne Baillie

Abstract: In the field of Human Robot Interaction (HRI), socially assistive robots are being investigated to see if they can help combat challenges that can come with aging by providing different forms of support to older adults. As a result, it is imperative that the HRI community are aware of the potential pitfalls that can occur such as over-attachment, over-reliance, and increased isolation. This position paper argues designers should (a) avoid pitfalls that can lead to a negative impact on decline, and (b) leverage SAR decision making to avoid the pitfalls while attaining the benefits of this technology. Finally, we describe the concept for a framework as a starting point for addressing the concerns raised in this paper.

Submitted 24 March, 2023; originally announced March 2023.

Comments: To be presented at the CHI 2023 Workshop: Socially Assistive Robots as Decision Makers: Transparency, Motivations, and Intentions. 5 pages

Report number: SARTMI/2023/3

MSC Class: 68T40

arXiv:2302.04337 [pdf, ps, other]

(Re)Defining Expertise in Machine Learning Development

Authors: Mark Díaz, Angela D. R. Smith

Abstract: Domain experts are often engaged in the development of machine learning systems in a variety of ways, such as in data collection and evaluation of system performance. At the same time, who counts as an 'expert' and what constitutes 'expertise' is not always explicitly defined. In this project, we conduct a systematic literature review of machine learning research to understand 1) the bases on which expertise is defined and recognized and 2) the roles experts play in ML development. Our goal is to produce a high-level taxonomy to highlight limits and opportunities in how experts are identified and engaged in ML research.

Submitted 8 February, 2023; originally announced February 2023.

Comments: NeurIPS 2022 Workshop on Data-Centric AI, 2 pages

arXiv:2301.10619 [pdf, other]

Simultaneous Transmitting and Reflecting (STAR)-RIS for Harmonious Millimeter Wave Spectrum Sharing

Authors: Omar Hashash, Walid Saad, Mohammadreza F. Imani, David R. Smith

Abstract: The opening of the millimeter wave (mmWave) spectrum bands for 5G communications has motivated the need for novel spectrum sharing solutions at these high frequencies. In fact, reconfigurable intelligent surfaces (RISs) have recently emerged to enable spectrum sharing while enhancing the incumbents' quality-of-service (QoS). Nonetheless, co-existence over mmWave bands remains persistently challenging due to their unfavorable propagation characteristics. Hence, initiating mmWave spectrum sharing requires the RIS to further assist in improving the QoS over mmWave bands without jeopardizing spectrum sharing demands. In this paper, a novel simultaneous transmitting and reflecting RIS (STAR-RIS)-aided solution to enable mmWave spectrum sharing is proposed. In particular, the transmitting and reflecting abilities of the STAR-RIS are leveraged to tackle the mmWave spectrum sharing and QoS requirements separately. The STAR-RIS-enabled spectrum sharing problem between a primary network (e.g. a radar transmit-receive pair) and a secondary network is formulated as an optimization problem whose goal is to maximize the downlink sum-rate over a secondary multiple-input-single-output (MISO) network, while limiting interference over a primary network. Moreover, the STAR-RIS response coefficients and beamforming matrix in the secondary network are jointly optimized. To solve this non-convex problem, an alternating iterative algorithm is employed, where the STAR-RIS response coefficients and beamforming matrix are obtained using the successive convex approximation method. Simulation results show that the proposed solution outperforms conventional RIS schemes for mmWave spectrum sharing by achieving a 14.57% spectral efficiency gain.

Submitted 25 January, 2023; originally announced January 2023.

arXiv:2212.11367 [pdf, other]

doi 10.1029/2023GH000784

Forecasting West Nile Virus with Graph Neural Networks: Harnessing Spatial Dependence in Irregularly Sampled Geospatial Data

Authors: Adam Tonks, Trevor Harris, Bo Li, William Brown, Rebecca Smith

Abstract: Machine learning methods have seen increased application to geospatial environmental problems, such as precipitation nowcasting, haze forecasting, and crop yield prediction. However, many of the machine learning methods applied to mosquito population and disease forecasting do not inherently take into account the underlying spatial structure of the given data. In our work, we apply a spatially aware graph neural network model consisting of GraphSAGE layers to forecast the presence of West Nile virus in Illinois, to aid mosquito surveillance and abatement efforts within the state. More generally, we show that graph neural networks applied to irregularly sampled geospatial data can exceed the performance of a range of baseline methods including logistic regression, XGBoost, and fully-connected neural networks.

Submitted 21 December, 2022; originally announced December 2022.

Journal ref: GeoHealth 8 (7), e2023GH000784

arXiv:2211.01134 [pdf, other]

doi 10.1088/2058-9565/aceb87

Faster variational quantum algorithms with quantum kernel-based surrogate models

Authors: Alistair W. R. Smith, A. J. Paige, M. S. Kim

Abstract: We present a new optimization method for small-to-intermediate scale variational algorithms on noisy near-term quantum processors which uses a Gaussian process surrogate model equipped with a classically-evaluated quantum kernel. Variational algorithms are typically optimized using gradient-based approaches; however, these are difficult to implement on current noisy devices, requiring large numbers of objective function evaluations. Our scheme shifts this computational burden onto the classical optimizer component of these hybrid algorithms, greatly reducing the number of queries to the quantum processor. We focus on the variational quantum eigensolver (VQE) algorithm and demonstrate numerically that such surrogate models are particularly well suited to the algorithm's objective function. Next, we apply these models to both noiseless and noisy VQE simulations and show that they exhibit better performance than widely-used classical kernels in terms of final accuracy and convergence speed. Compared to the typically-used stochastic gradient-descent approach for VQAs, our quantum kernel-based approach is found to consistently achieve significantly higher accuracy while requiring less than an order of magnitude fewer quantum circuit evaluations. We analyse the performance of the quantum kernel-based models in terms of the kernels' induced feature spaces and explicitly construct their feature maps. Finally, we describe a scheme for approximating the best-performing quantum kernel using a classically-efficient tensor network representation of its input state and so provide a pathway for scaling these methods to larger systems.

Submitted 14 August, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Journal ref: Quantum Sci. Technol. 8 045016 (2023)

arXiv:2209.09313 [pdf, ps, other]

Natural Wave Numbers, Natural Wave Co-numbers, and the Computation of the Primes

Authors: Terence R. Smith

Abstract: The paper exploits an isomorphism between the natural numbers N and a space U of periodic sequences of the roots of unity in constructing a recursive procedure for representing and computing the prime numbers. The nth wave number ${\bf u}_n$ is the countable sequence of the nth roots of unity having frequencies k/n for all integer phases k. The space U is closed under a commutative and associative binary operation ${\bf u}_m \odot{\bf u}_n={\bf u}_{mn}$, termed the circular product, and is isomorphic with N under their respective product operators. Functions are defined on U that partition wave numbers into two complementary sequences, of which the co-number $ {\overset {\bf \ast }{ \bf u}}_n$ is a function of a wave number in which zeros replace its positive roots of unity. The recursive procedure $ {\overset {\bf \ast }{ \bf U}}_{N+1}= {\overset {\bf \ast }{ \bf U}}_{N}\odot{\overset {\bf \ast }{\bf u}}_{N+1}$ represents prime numbers explicitly in terms of preceding prime numbers, starting with $p_1=2$, and is shown never to terminate. If ${p}_1, ... , { p}_{N+1}$ are the first $N+1$ prime phases, then the phases in the range $p_{N+1} \leq k < p^2_{N+1}$ that are associated with the non-zero terms of $ {\overset {\bf \ast }{\bf U}}_{N}$ are, together with $ p_1, ...,p_N$, all of the prime phases less than $p^2_{N+1}$. When applied with all of the primes identified at the previous step, the recursive procedure identifies approximately $7^{2(N-1)}/(2(N-1)\ln 7)$ primes at each iteration for $ N>1$. When the phases of wave numbers are represented in modular arithmetic, the prime phases are representable in terms of sums of reciprocals of the initial set of prime phases and have a relation with the zeta-function.

Submitted 19 September, 2022; originally announced September 2022.

Comments: 16 pages

arXiv:2208.12187 [pdf, other]

JAXFit: Trust Region Method for Nonlinear Least-Squares Curve Fitting on the GPU

Authors: Lucas R. Hofer, Milan Krstajić, Robert P. Smith

Abstract: We implement a trust region method on the GPU for nonlinear least squares curve fitting problems using a new deep learning Python library called JAX. Our open source package, JAXFit, works for both unconstrained and constrained curve fitting problems and allows the fit functions to be defined in Python alone -- without any specialized knowledge of either the GPU or CUDA programming. Since JAXFit runs on the GPU, it is much faster than CPU based libraries and even other GPU based libraries, despite being very easy to use. Additionally, due to JAX's deep learning foundations, the Jacobian in JAXFit's trust region algorithm is calculated with automatic differentiation, rather than using derivative approximations or requiring the user to define the fit function's partial derivatives.

Submitted 25 August, 2022; originally announced August 2022.

arXiv:2208.11701 [pdf]

Ontology-Driven Self-Supervision for Adverse Childhood Experiences Identification Using Social Media Datasets

Authors: Jinge Wu, Rowena Smith, Honghan Wu

Abstract: Adverse Childhood Experiences (ACEs) are defined as a collection of highly stressful, and potentially traumatic, events or circumstances that occur throughout childhood and/or adolescence. They have been shown to be associated with increased risks of mental health diseases or other abnormal behaviours in later lives. However, the identification of ACEs from textual data with Natural Language Processing (NLP) is challenging because (a) there are no NLP ready ACE ontologies; (b) there are few resources available for machine learning, necessitating the data annotation from clinical experts; (c) costly annotations by domain experts and large number of documents for supporting large machine learning models. In this paper, we present an ontology-driven self-supervised approach (derive concept embeddings using an auto-encoder from baseline NLP results) for producing a publicly available resource that would support large-scale machine learning (e.g., training transformer based large language models) on social media corpus. This resource as well as the proposed approach are aimed to facilitate the community in training transferable NLP models for effectively surfacing ACEs in low-resource scenarios like NLP on clinical notes within Electronic Health Records. The resource including a list of ACE ontology terms, ACE concept embeddings and the NLP annotated corpus is available at https://github.com/knowlab/ACE-NLP.

Submitted 24 August, 2022; originally announced August 2022.

Comments: arXiv admin note: text overlap with arXiv:2208.11466

arXiv:2208.11466 [pdf]

Adverse Childhood Experiences Identification from Clinical Notes with Ontologies and NLP

Authors: Jinge Wu, Rowena Smith, Honghan Wu

Abstract: Adverse Childhood Experiences (ACEs) are defined as a collection of highly stressful, and potentially traumatic, events or circumstances that occur throughout childhood and/or adolescence. They have been shown to be associated with increased risks of mental health diseases or other abnormal behaviours in later lives. However, the identification of ACEs from free-text Electronic Health Records (EHRs) with Natural Language Processing (NLP) is challenging because (a) there are no NLP ready ACE ontologies; (b) there are limited cases available for machine learning, necessitating the data annotation from clinical experts. We are currently developing a tool that would use NLP techniques to assist us in surfacing ACEs from clinical notes. This will enable further research in identifying evidence of the relationship between ACEs and the subsequent developments of mental illness (e.g., addictions) in large-scale and longitudinal free-text EHRs, which has previously not been possible.

Submitted 24 August, 2022; originally announced August 2022.

arXiv:2208.09285 [pdf, other]

Shadows Aren't So Dangerous After All: A Fast and Robust Defense Against Shadow-Based Adversarial Attacks

Authors: Andrew Wang, Wyatt Mayor, Ryan Smith, Gopal Nookula, Gregory Ditzler

Abstract: Robust classification is essential in tasks like autonomous vehicle sign recognition, where the downsides of misclassification can be grave. Adversarial attacks threaten the robustness of neural network classifiers, causing them to consistently and confidently misidentify road signs. One such class of attack, shadow-based attacks, causes misidentifications by applying a natural-looking shadow to input images, resulting in road signs that appear natural to a human observer but confusing for these classifiers. Current defenses against such attacks use a simple adversarial training procedure to achieve a rather low 25% and 40% robustness on the GTSRB and LISA test sets, respectively. In this paper, we propose a robust, fast, and generalizable method, designed to defend against shadow attacks in the context of road sign recognition, that augments source images with binary adaptive threshold and edge maps. We empirically show its robustness against shadow attacks, and reformulate the problem to show its similarity to $\varepsilon$ perturbation-based attacks. Experimental results show that our edge defense results in 78% robustness while maintaining 98% benign test accuracy on the GTSRB test set, with similar results from our threshold defense. Link to our code is in the paper.

Submitted 28 September, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

Comments: This is a draft version - our core results are reported, but additional experiments for journal submission are still being run

arXiv:2208.00921 [pdf, other]

AdaWCT: Adaptive Whitening and Coloring Style Injection

Authors: Antoine Dufour, Yohan Poirier-Ginter, Alexandre Lessard, Ryan Smith, Michael Lockyer, Jean-Francois Lalonde

Abstract: Adaptive instance normalization (AdaIN) has become the standard method for style injection: by re-normalizing features through scale-and-shift operations, it has found widespread use in style transfer, image generation, and image-to-image translation. In this work, we present a generalization of AdaIN which relies on the whitening and coloring transformation (WCT) which we dub AdaWCT, that we apply for style injection in large GANs. We show, through experiments on the StarGANv2 architecture, that this generalization, albeit conceptually simple, results in significant improvements in the quality of the generated images.

Submitted 1 August, 2022; originally announced August 2022.

Comments: 4 pages + refs

arXiv:2207.03928 [pdf, other]

doi 10.1038/s41524-023-01028-1

Accelerating Material Design with the Generative Toolkit for Scientific Discovery

Authors: Matteo Manica, Jannis Born, Joris Cadow, Dimitrios Christofidellis, Ashish Dave, Dean Clarke, Yves Gaetan Nana Teukam, Giorgio Giannone, Samuel C. Hoffman, Matthew Buchan, Vijil Chenthamarakshan, Timothy Donovan, Hsiang Han Hsu, Federico Zipoli, Oliver Schilter, Akihiro Kishimoto, Lisa Hamada, Inkit Padhi, Karl Wehden, Lauren McHugh, Alexy Khrabrov, Payel Das, Seiji Takeda, John R. Smith

Abstract: With the growing availability of data within various scientific domains, generative models hold enormous potential to accelerate scientific discovery. They harness powerful representations learned from datasets to speed up the formulation of novel hypotheses with the potential to impact material discovery broadly. We present the Generative Toolkit for Scientific Discovery (GT4SD). This extensible open-source library enables scientists, developers, and researchers to train and use state-of-the-art generative models to accelerate scientific discovery focused on material design.

Submitted 31 January, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

Comments: 15 pages, 2 figures

Journal ref: Nature Partner Journals (npj) Computational Materials 9, 69 (2023)

arXiv:2205.06304 [pdf, other]

Overparameterization Improves StyleGAN Inversion

Authors: Yohan Poirier-Ginter, Alexandre Lessard, Ryan Smith, Jean-François Lalonde

Abstract: Deep generative models like StyleGAN hold the promise of semantic image editing: modifying images by their content, rather than their pixel values. Unfortunately, working with arbitrary images requires inverting the StyleGAN generator, which has remained challenging so far. Existing inversion approaches obtain promising yet imperfect results, having to trade off between reconstruction quality and downstream editability. To improve quality, these approaches must resort to various techniques that extend the model latent space after training. Taking a step back, we observe that these methods essentially all propose, in one way or another, to increase the number of free parameters. This suggests that inversion might be difficult because it is underconstrained. In this work, we address this directly and dramatically overparameterize the latent space, before training, with simple changes to the original StyleGAN architecture. Our overparameterization increases the available degrees of freedom, which in turn facilitates inversion. We show that this allows us to obtain near-perfect image reconstruction without the need for encoders or for altering the latent space after training. Our approach also retains editability, which we demonstrate by realistically interpolating between images.

Submitted 12 May, 2022; originally announced May 2022.

Comments: 6 pages, accepted for publication at AI for Content Creation Workshop (CVPR 2022)

arXiv:2205.02318 [pdf, other]

Language Models in the Loop: Incorporating Prompting into Weak Supervision

Authors: Ryan Smith, Jason A. Fries, Braden Hancock, Stephen H. Bach

Abstract: We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited. Rather than apply the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework. To create a classifier, we first prompt the model to answer multiple distinct queries about an example and define how the possible responses should be mapped to votes for labels and abstentions. We then denoise these noisy label sources using the Snorkel system and train an end classifier with the resulting training data. Our experimental evaluation shows that prompting large language models within a weak supervision framework can provide significant gains in accuracy. On the WRENCH weak supervision benchmark, this approach can significantly improve over zero-shot performance, an average 19.5% reduction in errors. We also find that this approach produces classifiers with comparable or superior accuracy to those trained from hand-engineered rules.

Submitted 4 May, 2022; originally announced May 2022.

arXiv:2204.04722 [pdf, ps, other]

Regret Analysis of Online Gradient Descent-based Iterative Learning Control with Model Mismatch

Authors: Efe C. Balta, Andrea Iannelli, Roy S. Smith, John Lygeros

Abstract: In Iterative Learning Control (ILC), a sequence of feedforward control actions is generated at each iteration on the basis of partial model knowledge and past measurements with the goal of steering the system toward a desired reference trajectory. This is framed here as an online learning task, where the decision-maker takes sequential decisions by solving a sequence of optimization problems having only partial knowledge of the cost functions. Having established this connection, the performance of an online gradient-descent based scheme using inexact gradient information is analyzed in the setting of dynamic and static regret, standard measures in online learning. Fundamental limitations of the scheme and its integration with adaptation mechanisms are further investigated, followed by numerical simulations on a benchmark ILC problem.

Submitted 10 April, 2022; originally announced April 2022.

arXiv:2204.04330 [pdf, other]

Improved Object Pose Estimation via Deep Pre-touch Sensing

Authors: Patrick Lancaster, Boling Yang, Joshua R. Smith

Abstract: For certain manipulation tasks, object pose estimation from head-mounted cameras may not be sufficiently accurate. This is at least in part due to our inability to perfectly calibrate the coordinate frames of today's high degree of freedom robot arms that link the head to the end-effectors. We present a novel framework combining pre-touch sensing and deep learning to more accurately estimate pose in an efficient manner. The use of pre-touch sensing allows our method to localize the object directly with respect to the robot's end effector, thereby avoiding error caused by miscalibration of the arms. Instead of requiring the robot to scan the entire object with its pre-touch sensor, we use a deep neural network to detect object regions that contain distinctive geometric features. By focusing pre-touch sensing on these regions, the robot can more efficiently gather the information necessary to adjust its original pose estimate. Our region detection network was trained using a new dataset containing objects of widely varying geometries and has been labeled in a scalable fashion that is free from human bias. This dataset is applicable to any task that involves a pre-touch sensor gathering geometric information, and has been made publicly available. We evaluate our framework by having the robot re-estimate the pose of a number of objects of varying geometries. Compared to two simpler region proposal methods, we find that our deep neural network performs significantly better. In addition, we find that after a sequence of scans, objects can typically be localized to within 0.5 cm of their true position. We also observe that the original pose estimate can often be significantly improved after collecting a single quick scan.

Submitted 8 April, 2022; originally announced April 2022.

Comments: 8 pages, 6 figures

arXiv:2204.02371 [pdf, other]

Optical Proximity Sensing for Pose Estimation During In-Hand Manipulation

Authors: Patrick Lancaster, Pratik Gyawali, Christoforos Mavrogiannis, Siddhartha S. Srinivasa, Joshua R. Smith

Abstract: During in-hand manipulation, robots must be able to continuously estimate the pose of the object in order to generate appropriate control actions. The performance of algorithms for pose estimation hinges on the robot's sensors being able to detect discriminative geometric object features, but previous sensing modalities are unable to make such measurements robustly. The robot's fingers can occlude the view of environment- or robot-mounted image sensors, and tactile sensors can only measure at the local areas of contact. Motivated by fingertip-embedded proximity sensors' robustness to occlusion and ability to measure beyond the local areas of contact, we present the first evaluation of proximity sensor based pose estimation for in-hand manipulation. We develop a novel two-fingered hand with fingertip-embedded optical time-of-flight proximity sensors as a testbed for pose estimation during planar in-hand manipulation. Here, the in-hand manipulation task consists of the robot moving a cylindrical object from one end of its workspace to the other. We demonstrate, with statistical significance, that proximity-sensor based pose estimation via particle filtering during in-hand manipulation: a) exhibits 50% lower average pose error than a tactile-sensor based baseline; b) empowers a model predictive controller to achieve 30% lower final positioning error compared to when using tactile-sensor based pose estimates.

Submitted 30 October, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

Comments: 8 pages, 6 figures

arXiv:2203.01363 [pdf, other]

Faking feature importance: A cautionary tale on the use of differentially-private synthetic data

Authors: Oscar Giles, Kasra Hosseini, Grigorios Mingas, Oliver Strickson, Louise Bowler, Camila Rangel Smith, Harrison Wilde, Jen Ning Lim, Bilal Mateen, Kasun Amarasinghe, Rayid Ghani, Alison Heppenstall, Nik Lomax, Nick Malleson, Martin O'Reilly, Sebastian Vollmerteke

Abstract: Synthetic datasets are often presented as a silver-bullet solution to the problem of privacy-preserving data publishing. However, for many applications, synthetic data has been shown to have limited utility when used to train predictive models. One promising potential application of these data is in the exploratory phase of the machine learning workflow, which involves understanding, engineering and selecting features. This phase often involves considerable time, and depends on the availability of data. There would be substantial value in synthetic data that permitted these steps to be carried out while, for example, data access was being negotiated, or with fewer information governance restrictions. This paper presents an empirical analysis of the agreement between the feature importance obtained from raw and from synthetic data, on a range of artificially generated and real-world datasets (where feature importance represents how useful each feature is when predicting the outcome). We employ two differentially-private methods to produce synthetic data, and apply various utility measures to quantify the agreement in feature importance as this varies with the level of privacy. Our results indicate that synthetic data can sometimes preserve several representations of the ranking of feature importance in simple settings but their performance is not consistent and depends upon a number of factors. Particular caution should be exercised in more nuanced real-world settings, where synthetic data can lead to differences in ranked feature importance that could alter key modelling decisions. This work has important implications for developing synthetic versions of highly sensitive data sets in fields such as finance and healthcare.

Submitted 2 March, 2022; originally announced March 2022.

Comments: 27 pages, 8 figures

arXiv:2202.07074 [pdf, other]

doi 10.1109/LRA.2020.2969912

Benchmarking Robot Manipulation with the Rubik's Cube

Authors: Boling Yang, Patrick E. Lancaster, Siddhartha S. Srinivasa, Joshua R. Smith

Abstract: Benchmarks for robot manipulation are crucial to measuring progress in the field, yet there are few benchmarks that demonstrate critical manipulation skills, possess standardized metrics, and can be attempted by a wide array of robot platforms. To address a lack of such benchmarks, we propose Rubik's cube manipulation as a benchmark to measure simultaneous performance of precise manipulation and sequential manipulation. The sub-structure of the Rubik's cube demands precise positioning of the robot's end effectors, while its highly reconfigurable nature enables tasks that require the robot to manage pose uncertainty throughout long sequences of actions. We present a protocol for quantitatively measuring both the accuracy and speed of Rubik's cube manipulation. This protocol can be attempted by any general-purpose manipulator, and only requires a standard 3x3 Rubik's cube and a flat surface upon which the Rubik's cube initially rests (e.g. a table). We demonstrate this protocol for two distinct baseline approaches on a PR2 robot. The first baseline provides a fundamental approach for pose-based Rubik's cube manipulation. The second baseline demonstrates the benchmark's ability to quantify improved performance by the system, particularly that resulting from the integration of pre-touch sensing. To demonstrate the benchmark's applicability to other robot platforms and algorithmic approaches, we present the functional blocks required to enable the HERB robot to manipulate the Rubik's cube via push-grasping.

Submitted 14 February, 2022; originally announced February 2022.

Comments: IEEE RAL

Journal ref: IEEE Robotics and Automation Letters 5.2 (2020): 2094-2099

arXiv:2202.07068 [pdf, other]

Motivating Physical Activity via Competitive Human-Robot Interaction

Authors: Boling Yang, Golnaz Habibi, Patrick E. Lancaster, Byron Boots, Joshua R. Smith

Abstract: This project aims to motivate research in competitive human-robot interaction by creating a robot competitor that can challenge human users in certain scenarios such as physical exercise and games. With this goal in mind, we introduce the Fencing Game, a human-robot competition used to evaluate both the capabilities of the robot competitor and user experience. We develop the robot competitor through iterative multi-agent reinforcement learning and show that it can perform well against human competitors. Our user study additionally found that our system was able to continuously create challenging and enjoyable interactions that significantly increased human subjects' heart rates. The majority of human subjects considered the system to be entertaining and desirable for improving the quality of their exercise.

Submitted 14 February, 2022; originally announced February 2022.

Comments: Conference on Robot Learning. PMLR, 2022

arXiv:2112.02607 [pdf, other]

Differentiating Approach and Avoidance from Traditional Notions of Sentiment in Economic Contexts

Authors: Jacob Turton, Ali Kabiri, David Tuckett, Robert Elliott Smith, David P. Vinson

Abstract: There is growing interest in the role of sentiment in economic decision-making. However, most research on the subject has focused on positive and negative valence. Conviction Narrative Theory (CNT) places Approach and Avoidance sentiment (that which drives action) at the heart of real-world decision-making, and argues that it better captures emotion in financial markets. This research, bringing together psychology and machine learning, introduces new techniques to differentiate Approach and Avoidance from positive and negative sentiment on a fundamental level of meaning. It does this by comparing word-lists, previously constructed to capture these concepts in text data, across a large range of semantic features. The results demonstrate that Avoidance in particular is well defined as a separate type of emotion, which is evaluative/cognitive and action-orientated in nature. Refining the Avoidance word-list according to these features improves macroeconomic models, suggesting that they capture the essence of Avoidance and that it plays a crucial role in driving real-world economic decision-making.

Submitted 5 December, 2021; originally announced December 2021.

arXiv:2111.08629 [pdf, other]

doi 10.1073/pnas.2201337119

Communication by means of Modulated Johnson Noise

Authors: Zerina Kapetanovic, Miguel Morales, Joshua R. Smith

Abstract: We present the design of a new passive wireless communication system that does not rely on ambient or generated RF sources. Instead, we exploit the Johnson (thermal) noise generated by a resistor to transmit information bits wirelessly. By switching the load connected to an antenna between a resistor and open circuit, we can achieve data rates of up to 26 bps and distances of up to 7.3 meters. This communication method consumes orders of magnitude less power than conventional communication schemes and presents the opportunity to enable wireless communication in areas with a complete lack of connectivity.

Submitted 6 August, 2022; v1 submitted 16 November, 2021; originally announced November 2021.
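For orientation on the scale of the signal in the Johnson-noise entry above, the standard available thermal noise power of a resistor is k·T·B. The sketch below simply evaluates that textbook formula; the 290 K temperature and the bandwidths are illustrative assumptions, not values taken from the paper, and the function name is hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def thermal_noise_dbm(temperature_k: float, bandwidth_hz: float) -> float:
    """Available thermal (Johnson) noise power k*T*B, expressed in dBm."""
    power_w = BOLTZMANN * temperature_k * bandwidth_hz
    return 10.0 * math.log10(power_w / 1e-3)


# Assumed reference temperature of 290 K; bandwidths chosen only for illustration.
per_hz = thermal_noise_dbm(290.0, 1.0)
per_mhz = thermal_noise_dbm(290.0, 1e6)
print(f"{per_hz:.2f} dBm in 1 Hz, {per_mhz:.2f} dBm in 1 MHz")
```

The result for a 1 Hz bandwidth is the familiar noise floor of roughly -174 dBm/Hz, which shows how weak the signal being modulated is.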

arXiv:2110.15911 [pdf, other]

doi 10.1016/j.apenergy.2021.118491

Physics-informed linear regression is competitive with two Machine Learning methods in residential building MPC

Authors: Felix Bünning, Benjamin Huber, Adrian Schalbetter, Ahmed Aboudonia, Mathias Hudoba de Badyn, Philipp Heer, Roy S. Smith, John Lygeros

Abstract: Because physics-based building models are difficult to obtain as each building is individual, there is an increasing interest in generating models suitable for building MPC directly from measurement data. Machine learning methods have been widely applied to this problem and validated mostly in simulation; there are, however, few studies on a direct comparison of different models or validation in real buildings to be found in the literature. Methods that are indeed validated in application often lead to computationally complex non-convex optimization problems. Here we compare physics-informed Autoregressive-Moving-Average with Exogenous Inputs (ARMAX) models to Machine Learning models based on Random Forests and Input Convex Neural Networks and the resulting convex MPC schemes in experiments on a practical building application with the goal of minimizing energy consumption while maintaining occupant comfort, and in a numerical case study. We demonstrate that Predictive Control in general leads to savings between 26% and 49% of heating and cooling energy, compared to the building's baseline hysteresis controller. Moreover, we show that all model types lead to satisfactory control performance in terms of constraint satisfaction and energy reduction. However, we also see that the physics-informed ARMAX models have a lower computational burden, and a superior sample efficiency compared to the Machine Learning based models. Moreover, even if abundant training data is available, the ARMAX models have a significantly lower prediction error than the Machine Learning models, which indicates that the encoded physics-based prior of the former cannot independently be found by the latter.

Submitted 26 January, 2022; v1 submitted 29 October, 2021; originally announced October 2021.

Comments: 17 pages, 11 Figures, submitted to Applied Energy

Journal ref: Applied Energy 310 (2020) 118491

arXiv:2110.03790 [pdf, other]

Scaling Bayesian Optimization With Game Theory

Authors: L. Mathesen, G. Pedrielli, R. L. Smith

Abstract: We introduce the algorithm Bayesian Optimization (BO) with Fictitious Play (BOFiP) for the optimization of high dimensional black box functions. BOFiP decomposes the original, high dimensional, space into several sub-spaces defined by non-overlapping sets of dimensions. These sets are randomly generated at the start of the algorithm, and they form a partition of the dimensions of the original space. BOFiP searches the original space with alternating BO, within sub-spaces, and information exchange among sub-spaces, to update the sub-space function evaluation. The basic idea is to distribute the high dimensional optimization across low dimensional sub-spaces, where each sub-space is a player in an equal interest game. At each iteration, BO produces approximate best replies that update the players' belief distribution. The belief update and BO alternate until a stopping condition is met. High dimensional problems are common in real applications, and several contributions in the BO literature have highlighted the difficulty in scaling to high dimensions due to the computational complexity associated with the estimation of the model hyperparameters. Such complexity is exponential in the problem dimension, resulting in substantial loss of performance for most techniques with the increase of the input dimensionality. We compare BOFiP to several state-of-the-art approaches in the field of high dimensional black box optimization. The numerical experiments show the performance over three benchmark objective functions from 20 up to 1000 dimensions. A neural network architecture design problem is tested with 42 up to 911 nodes in 6 up to 92 layers, respectively, resulting in networks with 500 up to 10,000 weights. These sets of experiments empirically show that BOFiP outperforms its competitors, showing consistent performance across different problems and increasing problem dimensionality.

Submitted 7 October, 2021; originally announced October 2021.

Comments: 17 pages, 6 Figures, 4 Tables. Submitted for Journal Publication, Under review
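The decomposition step described in the BOFiP entry above (randomly generated, non-overlapping sets of dimensions that together partition the original space) can be sketched as follows. This is only an illustration of a random partition of dimension indices; the function name, the near-equal subset sizes, and the seed argument are assumptions, and the BO and fictitious-play steps of the algorithm are not shown.

```python
import random


def random_partition(n_dims, n_subspaces, seed=None):
    """Split dimension indices 0..n_dims-1 into disjoint, randomly chosen subsets.

    Hypothetical helper: subset sizes are made as equal as possible, which is
    an assumption and not something the abstract specifies.
    """
    rng = random.Random(seed)
    dims = list(range(n_dims))
    rng.shuffle(dims)
    # Deal the shuffled indices round-robin so every index lands in exactly one subset.
    return [sorted(dims[i::n_subspaces]) for i in range(n_subspaces)]


# Example: a 20-dimensional problem split among 4 sub-spaces ("players").
subspaces = random_partition(20, 4, seed=0)
for player, dims in enumerate(subspaces):
    print(player, dims)
```

Each sub-space would then be optimized over its own dimensions while the remaining coordinates are fixed by information shared among the players.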

arXiv:2109.05103 [pdf, other]

No Size Fits All: Automated Radio Configuration for LPWANs

Authors: Zerina Kapetanovic, Deepak Vasisht, Tusher Chakraborty, Joshua R. Smith, Ranveer Chandra

Abstract: Low power long-range networks like LoRa have become increasingly mainstream for Internet of Things deployments. Given the versatility of applications that these protocols enable, they support many data rates and bandwidths. Yet, for a given network that supports hundreds of devices over multiple miles, the network operator typically needs to specify the same configuration, or one among a small subset of configurations, for all the client devices to communicate with the gateway. This one-size-fits-all approach is highly inefficient in large networks. We propose an alternative approach: we allow network devices to transmit at any data rate they choose. The gateway uses the first few symbols in the preamble to classify the correct data rate, switches its configuration, and then decodes the data. Our design leverages the inherent asymmetry in outdoor IoT deployments where the clients are power-starved and resource-constrained, but the gateway is not. Our gateway design, Proteus, runs a neural network architecture and is backward compatible with existing LoRa protocols. Our experiments reveal that Proteus can identify the correct configuration with over 97% accuracy in both indoor and outdoor deployments. Our network architecture leads to a 3.8 to 11 times increase in throughput for our LoRa testbed.

Submitted 10 September, 2021; originally announced September 2021.

arXiv:2108.12453 [pdf, other]

Convolutional Autoencoders for Reduced-Order Modeling

Authors: Sreeram Venkat, Ralph C. Smith, Carl T. Kelley

Abstract: In the construction of reduced-order models for dynamical systems, linear projection methods, such as proper orthogonal decompositions, are commonly employed. However, for many dynamical systems, the lower dimensional representation of the state space can most accurately be described by a nonlinear manifold. Previous research has shown that deep learning can provide an efficient method for performing nonlinear dimension reduction, though such methods are dependent on the availability of training data and are often problem-specific \citep[see][]{carlberg_ca}. Here, we utilize randomized training data to create and train convolutional autoencoders that perform nonlinear dimension reduction for the wave and Kuramoto-Sivashinsky equations. Moreover, we present training methods that are independent of full-order model samples and use the manifold least-squares Petrov-Galerkin projection method to define a reduced-order model for the heat, wave, and Kuramoto-Sivashinsky equations using the same autoencoder.

Submitted 27 August, 2021; originally announced August 2021.

arXiv:2108.07206 [pdf, other]

doi 10.1109/TRO.2021.3111786

Proximity Perception in Human-Centered Robotics: A Survey on Sensing Systems and Applications

Authors: Stefan Escaida Navarro, Stephan Mühlbacher-Karrer, Hosam Alagi, Hubert Zangl, Keisuke Koyama, Björn Hein, Christian Duriez, Joshua R. Smith

Abstract: Proximity perception is a technology that has the potential to play an essential role in the future of robotics. It can fulfill the promise of safe, robust, and autonomous systems in industry and everyday life, alongside humans, as well as in remote locations in space and underwater. In this survey paper, we cover the developments of this field from the early days up to the present, with a focus on human-centered robotics. Here, proximity sensors are typically deployed in two scenarios: first, on the exterior of manipulator arms to support safety and interaction functionality, and second, on the inside of grippers or hands to support grasping and exploration. Starting from this observation, we propose a categorization for the approaches found in the literature. To provide a basis for understanding these approaches, we devote effort to presenting the technologies and different measuring principles that were developed over the years, also providing a summary in the form of a table. Then, we show the diversity of applications that have been presented in the literature. Finally, we give an overview of the most important trends that will shape the future of this domain.

Submitted 17 August, 2021; v1 submitted 16 August, 2021; originally announced August 2021.

Journal ref: IEEE Transactions on Robotics 2021

arXiv:2108.01067 [pdf, other]

Co-Design of Assistive Robotics with Additive Manufacturing and Cyber-Physical Modularity to Improve Trust

Authors: Alexandre Colle, Ronnie Smith, Scott Macleod, Mauro Dragone

Abstract: Robotics and automation have the potential to significantly improve quality of life for people with assistive needs and their carers. Adoption of such technologies at this point in time is far from widespread. This paper presents a novel approach to the design of highly customisable robotic concepts, embracing modularity and a co-design process to increase the involvement of end-users in the development life cycle. We discuss this process within the context of an elderly care use case. Using design methodology and additive manufacturing, we outline how key stakeholders can be involved from initial conception through to integration of the final product within their environments. In future work, we will apply this process to demonstrate the effectiveness of our approach for improving long-term acceptance and trust of robotic technology in care contexts.

Submitted 2 August, 2021; originally announced August 2021.

Comments: 4 pages workshop paper, REDOUBLE 2021 Roman-2021 August 8th 2021

Report number: REDOUBLE/2021/1

arXiv:2105.00794 [pdf, ps, other]

Robust 3D Cell Segmentation: Extending the View of Cellpose

Authors: Dennis Eschweiler, Richard S. Smith, Johannes Stegmaier

Abstract: Increasing data set sizes of 3D microscopy imaging experiments demand an automation of segmentation processes to be able to extract meaningful biomedical information. Due to the shortage of annotated 3D image data that can be used for machine learning-based approaches, 3D segmentation approaches are required to be robust and to generalize well to unseen data. The Cellpose approach proposed by Stringer et al. proved to be such a generalist approach for cell instance segmentation tasks. In this paper, we extend the Cellpose approach to improve segmentation accuracy on 3D image data and we further show how the formulation of the gradient maps can be simplified while still being robust and reaching similar segmentation accuracy. The code is publicly available and was integrated into two established open-source applications that allow using the 3D extension of Cellpose without any programming knowledge.

Submitted 1 February, 2022; v1 submitted 3 May, 2021; originally announced May 2021.

arXiv:2104.12444 [pdf, other]

Improved Bounded Model Checking of Timed Automata

Authors: Robert L. Smith, Marcello M. Bersani, Matteo Rossi, Pierluigi San Pietro

Abstract: Timed Automata (TA) are a very popular modeling formalism for systems with time-sensitive properties. A common task is to verify if a network of TA satisfies a given property, usually expressed in Linear Temporal Logic (LTL), or in a subset of Timed Computation Tree Logic (TCTL). In this paper, we build upon the TACK bounded model checker for TA, which supports a signal-based semantics of TA and the richer Metric Interval Temporal Logic (MITL). TACK encodes both the TA network and property into a variant of LTL, Constraint LTL over clocks (CLTLoc). The produced CLTLoc formula can then be solved by tools such as Zot, which transforms CLTLoc properties into the input logics of Satisfiability Modulo Theories (SMT) solvers. We present a novel method that preserves TACK's encoding of MITL properties while encoding the TA network directly into the SMT solver language, making use of both the BitVector logic and the logic of real arithmetic. We also introduce several optimizations that allow us to significantly outperform the CLTLoc encoding in many practical scenarios.

Submitted 26 April, 2021; originally announced April 2021.

arXiv:2102.11537 [pdf, other]

Revisiting the Role of Euler Numerical Integration on Acceleration and Stability in Convex Optimization

Authors: Peiyuan Zhang, Antonio Orvieto, Hadi Daneshmand, Thomas Hofmann, Roy Smith

Abstract: Viewing optimization methods as numerical integrators for ordinary differential equations (ODEs) provides a thought-provoking modern framework for studying accelerated first-order optimizers. In this literature, acceleration is often supposed to be linked to the quality of the integrator (accuracy, energy preservation, symplecticity). In this work, we propose a novel ordinary differential equation that questions this connection: both the explicit and the semi-implicit (a.k.a. symplectic) Euler discretizations on this ODE lead to an accelerated algorithm for convex programming. Although semi-implicit methods are well-known in numerical analysis to enjoy many desirable features for the integration of physical systems, our findings show that these properties do not necessarily relate to acceleration.

Submitted 23 February, 2021; originally announced February 2021.

Comments: 18 pages, 5 figures; Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021, San Diego, California, USA. PMLR: Volume 130
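To make concrete the two discretizations named in the Euler-integration entry above, the sketch below applies explicit and semi-implicit (symplectic) Euler to a unit harmonic oscillator, q' = p, p' = -q. The oscillator is a stand-in chosen for illustration and is an assumption; it is not the ODE proposed in the paper, and the function names, step size, and step count are hypothetical.

```python
def explicit_euler(q, p, h, steps):
    """Explicit Euler: both updates use the state from the previous step."""
    for _ in range(steps):
        q, p = q + h * p, p - h * q
    return q, p


def semi_implicit_euler(q, p, h, steps):
    """Semi-implicit (symplectic) Euler: update p first, then q with the new p."""
    for _ in range(steps):
        p = p - h * q
        q = q + h * p
    return q, p


def energy(q, p):
    """Energy of the unit harmonic oscillator."""
    return 0.5 * (q * q + p * p)


# Same initial condition and step size for both schemes (illustrative values).
q0, p0, h, steps = 1.0, 0.0, 0.1, 1000
print("initial energy      :", energy(q0, p0))
print("explicit Euler      :", energy(*explicit_euler(q0, p0, h, steps)))
print("semi-implicit Euler :", energy(*semi_implicit_euler(q0, p0, h, steps)))
```

On this physical system the explicit scheme's energy grows by a factor of (1 + h²) per step while the semi-implicit scheme's energy stays bounded, which is the kind of integrator-quality property the abstract argues does not necessarily relate to acceleration.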

arXiv:2012.15353 [pdf]

Deriving Contextualised Semantic Features from BERT (and Other Transformer Model) Embeddings

Authors: Jacob Turton, David Vinson, Robert Elliott Smith

Abstract: Models based on the transformer architecture, such as BERT, have marked a crucial step forward in the field of Natural Language Processing. Importantly, they allow the creation of word embeddings that capture important semantic information about words in context. However, as single entities, these embeddings are difficult to interpret and the models used to create them have been described as opaque. Binder and colleagues proposed an intuitive embedding space where each dimension is based on one of 65 core semantic features. Unfortunately, the space only exists for a small dataset of 535 words, limiting its uses. Previous work (Utsumi, 2018, 2020, Turton, Vinson & Smith, 2020) has shown that Binder features can be derived from static embeddings and successfully extrapolated to a large new vocabulary. Taking the next step, this paper demonstrates that Binder features can be derived from the BERT embedding space. This provides contextualised Binder embeddings, which can aid in understanding semantic differences between words in context. It additionally provides insights into how semantic features are represented across the different layers of the BERT model.

Submitted 30 December, 2020; originally announced December 2020.

arXiv:2012.11022 [pdf, ps, other]

Parameter Identification for Digital Fabrication: A Gaussian Process Learning Approach

Authors: Yvonne R. Stürz, Mohammad Khosravi, Roy S. Smith

Abstract: Tensioned cable nets can be used as supporting structures for the efficient construction of lightweight building elements, such as thin concrete shell structures. To guarantee important mechanical properties of the latter, the tolerances on deviations of the tensioned cable net geometry from the desired target form are very tight. Therefore, the form needs to be readjusted on the construction site. In order to employ model-based optimization techniques, the precise identification of important uncertain model parameters of the cable net system is required. This paper proposes the use of Gaussian process regression to learn the function that maps the cable net geometry to the uncertain parameters. In contrast to previously proposed methods, this approach requires only a single form measurement for the identification of the cable net model parameters. This is beneficial since measurements of the cable net form on the construction site are very expensive. For the training of the Gaussian processes, simulated data is efficiently computed via convex programming. The effectiveness of the proposed method and the impact of the precise identification of the parameters on the form of the cable net are demonstrated in numerical experiments on a quarter-scale prototype of a roof structure.

Submitted 20 December, 2020; originally announced December 2020.

Showing 1–50 of 141 results for author: Smith, R