-
GPT Deciphering Fedspeak: Quantifying Dissent Among Hawks and Doves
Authors:
Denis Peskoff,
Adam Visokay,
Sander Schulhoff,
Benjamin Wachspress,
Alan Blinder,
Brandon M. Stewart
Abstract:
Markets and policymakers around the world hang on the consequential monetary policy decisions made by the Federal Open Market Committee (FOMC). Publicly available textual documentation of their meetings provides insight into members' attitudes about the economy. We use GPT-4 to quantify dissent among members on the topic of inflation. We find that transcripts and minutes reflect the diversity of member views about the macroeconomic outlook in a way that is lost or omitted from the public statements. In fact, diverging opinions that shed light upon the committee's "true" attitudes are almost entirely omitted from the final statements. Hence, we argue that forecasting FOMC sentiment based solely on statements will not sufficiently reflect dissent among the hawks and doves.
Submitted 26 July, 2024;
originally announced July 2024.
-
More Victories, Less Cooperation: Assessing Cicero's Diplomacy Play
Authors:
Wichayaporn Wongkamjan,
Feng Gu,
Yanze Wang,
Ulf Hermjakob,
Jonathan May,
Brandon M. Stewart,
Jonathan K. Kummerfeld,
Denis Peskoff,
Jordan Lee Boyd-Graber
Abstract:
The boardgame Diplomacy is a challenging setting for communicative and cooperative artificial intelligence. The most prominent communicative Diplomacy AI, Cicero, has excellent strategic abilities, exceeding human players. However, the best Diplomacy players master communication, not just tactics, which is why the game has received attention as an AI challenge. This work seeks to understand the degree to which Cicero succeeds at communication. First, we annotate in-game communication with abstract meaning representation to separate in-game tactics from general language. Second, we run two dozen games with humans and Cicero, totaling over 200 human-player hours of competition. While AI can consistently outplay human players, AI-Human communication is still limited because of AI's difficulty with deception and persuasion. This shows that Cicero relies on strategy and has not yet reached the full promise of communicative and cooperative AI.
Submitted 7 June, 2024;
originally announced June 2024.
-
CleanGraph: Human-in-the-loop Knowledge Graph Refinement and Completion
Authors:
Tyler Bikaun,
Michael Stewart,
Wei Liu
Abstract:
This paper presents CleanGraph, an interactive web-based tool designed to facilitate the refinement and completion of knowledge graphs. Maintaining the reliability of knowledge graphs, which are grounded in high-quality and error-free facts, is crucial for real-world applications such as question-answering and information retrieval systems. These graphs are often automatically assembled from textual sources by extracting semantic triples via information extraction. However, assuring the quality of these extracted triples, especially when dealing with large or low-quality datasets, can pose a significant challenge and adversely affect the performance of downstream applications. CleanGraph allows users to perform Create, Read, Update, and Delete (CRUD) operations on their graphs, as well as apply models in the form of plugins for graph refinement and completion tasks. These functionalities enable users to enhance the integrity and reliability of their graph data. A demonstration of CleanGraph and its source code can be accessed at https://github.com/nlp-tlp/CleanGraph under the MIT License.
Submitted 7 May, 2024; v1 submitted 6 May, 2024;
originally announced May 2024.
-
Wake Vision: A Large-scale, Diverse Dataset and Benchmark Suite for TinyML Person Detection
Authors:
Colby Banbury,
Emil Njor,
Matthew Stewart,
Pete Warden,
Manjunath Kudlur,
Nat Jeffries,
Xenofon Fafoutis,
Vijay Janapa Reddi
Abstract:
Tiny machine learning (TinyML), which enables machine learning applications on extremely low-power devices, suffers from limited size and quality of relevant datasets. To address this issue, we introduce Wake Vision, a large-scale, diverse dataset tailored for person detection, the canonical task for TinyML visual sensing. Wake Vision comprises over 6 million images, representing a hundredfold increase compared to the previous standard, and has undergone thorough quality filtering. We provide two Wake Vision training sets: Wake Vision (Large) and Wake Vision (Quality), a smaller set with higher-quality labels. Our results demonstrate that using the Wake Vision (Quality) training set produces more accurate models than the Wake Vision (Large) training set, strongly suggesting that label quality is more important than quantity in our setting. We find use for the large training set for pre-training and knowledge distillation. To minimize label errors that can obscure true model performance, we manually label the validation and test sets, improving the test set error rate from 7.8% in the prior standard to only 2.2%. In addition to the dataset, we provide a collection of five detailed benchmark sets to facilitate the evaluation of model quality in challenging real world scenarios that are often ignored when focusing solely on overall accuracy. These novel fine-grained benchmarks assess model performance on specific segments of the test data, such as varying lighting conditions, distances from the camera, and demographic characteristics of subjects. Our results demonstrate that using Wake Vision for training results in a 2.49% increase in accuracy compared to the established dataset. We also show the importance of dataset quality for low-capacity models and the value of dataset size for high-capacity models. wakevision.ai
Submitted 6 June, 2024; v1 submitted 1 May, 2024;
originally announced May 2024.
-
Materiality and Risk in the Age of Pervasive AI Sensors
Authors:
Matthew Stewart,
Emanuel Moss,
Pete Warden,
Brian Plancher,
Susan Kennedy,
Mona Sloane,
Vijay Janapa Reddi
Abstract:
Artificial intelligence systems connected to sensor-laden devices are becoming pervasive, which has significant implications for a range of AI risks, including to privacy, the environment, autonomy, and more. There is therefore a growing need for increased accountability around the responsible development and deployment of these technologies. In this paper, we provide a comprehensive analysis of the evolution of sensors, the risks they pose by virtue of their material existence in the world, and the impacts of ubiquitous sensing and on-device AI. We propose incorporating sensors into risk management frameworks and call for more responsible sensor and system design paradigms that address risks of such systems. To do so, we trace the evolution of sensors from analog devices to intelligent, networked systems capable of real-time data analysis and decision-making at the extreme edge of the network. We show that the proliferation of sensors is driven by calculative models that prioritize data collection and cost reduction and produce risks that emerge around privacy, surveillance, waste, and power dynamics. We then analyze these risks, highlighting issues of validity, safety, security, accountability, interpretability, and bias. We surface sensor-related risks not commonly captured in existing approaches to AI risk management, using a materiality lens that reveals how physical sensor properties shape data and algorithmic models. We conclude by advocating for increased attention to the materiality of algorithmic systems, and of on-device AI sensors in particular, and highlight the need for development of a responsible sensor design paradigm that empowers users and communities and leads to a future of increased fairness, accountability and transparency.
Submitted 16 February, 2024;
originally announced February 2024.
-
Rel2Graph: Automated Mapping From Relational Databases to a Unified Property Knowledge Graph
Authors:
Ziyu Zhao,
Wei Liu,
Tim French,
Michael Stewart
Abstract:
Although a few approaches have been proposed to convert relational databases to graphs, there is a genuine lack of systematic evaluation across a wider spectrum of databases. Recognising the important issue of query mapping, this paper proposes Rel2Graph, an automatic knowledge graph construction (KGC) approach that works from an arbitrary number of relational databases. Our approach also supports the mapping of conjunctive SQL queries into pattern-based NoSQL queries. We evaluate our proposed approach on two widely used relational database-oriented datasets: the Spider and KaggleDBQA benchmarks for semantic parsing. We employ the execution accuracy (EA) metric to quantify the proportion of results obtained by executing the NoSQL queries on the property knowledge graph we construct that align with the results of SQL queries performed on relational databases. Consequently, the counterpart property knowledge graph of benchmarks with high accuracy and integrity can be ensured. The code and data are available on GitHub at https://github.com/nlp-tlp/Rel2Graph.
Submitted 26 October, 2023; v1 submitted 2 October, 2023;
originally announced October 2023.
-
RobotPerf: An Open-Source, Vendor-Agnostic, Benchmarking Suite for Evaluating Robotics Computing System Performance
Authors:
Víctor Mayoral-Vilches,
Jason Jabbour,
Yu-Shun Hsiao,
Zishen Wan,
Martiño Crespo-Álvarez,
Matthew Stewart,
Juan Manuel Reina-Muñoz,
Prateek Nagras,
Gaurav Vikhe,
Mohammad Bakhshalipour,
Martin Pinzger,
Stefan Rass,
Smruti Panigrahi,
Giulio Corradi,
Niladri Roy,
Phillip B. Gibbons,
Sabrina M. Neuman,
Brian Plancher,
Vijay Janapa Reddi
Abstract:
We introduce RobotPerf, a vendor-agnostic benchmarking suite designed to evaluate robotics computing performance across a diverse range of hardware platforms using ROS 2 as its common baseline. The suite encompasses ROS 2 packages covering the full robotics pipeline and integrates two distinct benchmarking approaches: black-box testing, which measures performance by eliminating upper layers and replacing them with a test application, and grey-box testing, an application-specific measure that observes internal system states with minimal interference. Our benchmarking framework provides ready-to-use tools and is easily adaptable for the assessment of custom ROS 2 computational graphs. Drawing from the knowledge of leading robot architects and system architecture experts, RobotPerf establishes a standardized approach to robotics benchmarking. As an open-source initiative, RobotPerf remains committed to evolving with community input to advance the future of hardware-accelerated robotics.
Submitted 29 January, 2024; v1 submitted 17 September, 2023;
originally announced September 2023.
-
Large Language Models for Failure Mode Classification: An Investigation
Authors:
Michael Stewart,
Melinda Hodkiewicz,
Sirui Li
Abstract:
In this paper we present the first investigation into the effectiveness of Large Language Models (LLMs) for Failure Mode Classification (FMC). FMC, the task of automatically labelling an observation with a corresponding failure mode code, is a critical task in the maintenance domain as it reduces the need for reliability engineers to spend their time manually analysing work orders. We detail our approach to prompt engineering to enable an LLM to predict the failure mode of a given observation using a restricted code list. We demonstrate that the performance of a GPT-3.5 model (F1=0.80) fine-tuned on annotated data is a significant improvement over a currently available text classification model (F1=0.60) trained on the same annotated data set. The fine-tuned model also outperforms the out-of-the-box GPT-3.5 (F1=0.46). This investigation reinforces the need for high-quality fine-tuning data sets for domain-specific tasks using LLMs.
Submitted 15 September, 2023;
originally announced September 2023.
-
REFORMS: Reporting Standards for Machine Learning Based Science
Authors:
Sayash Kapoor,
Emily Cantrell,
Kenny Peng,
Thanh Hien Pham,
Christopher A. Bail,
Odd Erik Gundersen,
Jake M. Hofman,
Jessica Hullman,
Michael A. Lones,
Momin M. Malik,
Priyanka Nanayakkara,
Russell A. Poldrack,
Inioluwa Deborah Raji,
Michael Roberts,
Matthew J. Salganik,
Marta Serra-Garcia,
Brandon M. Stewart,
Gilles Vandewiele,
Arvind Narayanan
Abstract:
Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways across disciplines. Motivated by this observation, our goal is to provide clear reporting standards for ML-based science. Drawing from an extensive review of past literature, we present the REFORMS checklist (REporting Standards FOR Machine Learning Based Science). It consists of 32 questions and a paired set of guidelines. REFORMS was developed based on a consensus of 19 researchers across computer science, data science, mathematics, social sciences, and biomedical sciences. REFORMS can serve as a resource for researchers when designing and implementing a study, for referees when reviewing papers, and for journals when enforcing standards for transparency and reproducibility.
Submitted 19 September, 2023; v1 submitted 15 August, 2023;
originally announced August 2023.
-
Datasheets for Machine Learning Sensors: Towards Transparency, Auditability, and Responsibility for Intelligent Sensing
Authors:
Matthew Stewart,
Pete Warden,
Yasmine Omri,
Shvetank Prakash,
Joao Santos,
Shawn Hymel,
Benjamin Brown,
Jim MacArthur,
Nat Jeffries,
Sachin Katti,
Brian Plancher,
Vijay Janapa Reddi
Abstract:
Machine learning (ML) sensors are enabling intelligence at the edge by empowering end-users with greater control over their data. ML sensors offer a new paradigm for sensing that moves the processing and analysis to the device itself rather than relying on the cloud, bringing benefits like lower latency and greater data privacy. The rise of these intelligent edge devices, while revolutionizing areas like the internet of things (IoT) and healthcare, also throws open critical questions about privacy, security, and the opacity of AI decision-making. As ML sensors become more pervasive, they require judicious governance regarding transparency, accountability, and fairness. To this end, we introduce a standard datasheet template for these ML sensors and discuss and evaluate the design and motivation for each section of the datasheet in detail, including: standard datasheet components like the system's hardware specifications, IoT and AI components like the ML model and dataset attributes, as well as novel components like end-to-end performance metrics and expanded environmental impact metrics. To provide a case study of the application of our datasheet template, we also designed and developed two examples for ML sensors performing computer vision-based person detection: one an open-source ML sensor designed and developed in-house, and a second commercial ML sensor developed by our industry collaborators. Together, ML sensors and their datasheets provide greater privacy, security, transparency, explainability, auditability, and user-friendliness for ML-enabled embedded systems. We conclude by emphasizing the need for standardization of datasheets across the broader ML community to ensure the responsible use of sensor data.
Submitted 16 February, 2024; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Using Imperfect Surrogates for Downstream Inference: Design-based Supervised Learning for Social Science Applications of Large Language Models
Authors:
Naoki Egami,
Musashi Hinck,
Brandon M. Stewart,
Hanying Wei
Abstract:
In computational social science (CSS), researchers analyze documents to explain social and political phenomena. In most scenarios, CSS researchers first obtain labels for documents and then explain labels using interpretable regression analyses in the second step. One increasingly common way to annotate documents cheaply at scale is through large language models (LLMs). However, like other scalable ways of producing annotations, such surrogate labels are often imperfect and biased. We present a new algorithm for using imperfect annotation surrogates for downstream statistical analyses while guaranteeing statistical properties -- like asymptotic unbiasedness and proper uncertainty quantification -- which are fundamental to CSS research. We show that direct use of surrogate labels in downstream statistical analyses leads to substantial bias and invalid confidence intervals, even with high surrogate accuracy of 80-90%. To address this, we build on debiased machine learning to propose the design-based supervised learning (DSL) estimator. DSL employs a doubly-robust procedure to combine surrogate labels with a smaller number of high-quality, gold-standard labels. Our approach guarantees valid inference for downstream statistical analyses, even when surrogates are arbitrarily biased and without requiring stringent assumptions, by controlling the probability of sampling documents for gold-standard labeling. Both our theoretical analysis and experimental results show that DSL provides valid statistical inference while achieving root mean squared errors comparable to existing alternatives that focus only on prediction without inferential guarantees.
Submitted 14 January, 2024; v1 submitted 7 June, 2023;
originally announced June 2023.
-
NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems
Authors:
Jason Yik,
Korneel Van den Berghe,
Douwe den Blanken,
Younes Bouhadjar,
Maxime Fabre,
Paul Hueber,
Denis Kleyko,
Noah Pacik-Nelson,
Pao-Sheng Vincent Sun,
Guangzhi Tang,
Shenqi Wang,
Biyan Zhou,
Soikat Hasan Ahmed,
George Vathakkattil Joseph,
Benedetto Leto,
Aurora Micheli,
Anurag Kumar Mishra,
Gregor Lenz,
Tao Sun,
Zergham Ahmed,
Mahmoud Akl,
Brian Anderson,
Andreas G. Andreou,
Chiara Bartolozzi,
Arindam Basu
, et al. (73 additional authors not shown)
Abstract:
Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it difficult to accurately measure technological advancements, compare performance with conventional methods, and identify promising future research directions. Prior neuromorphic computing benchmark efforts have not seen widespread adoption due to a lack of inclusive, actionable, and iterative benchmark design and guidelines. To address these shortcomings, we present NeuroBench: a benchmark framework for neuromorphic computing algorithms and systems. NeuroBench is a collaboratively-designed effort from an open community of nearly 100 co-authors across over 50 institutions in industry and academia, aiming to provide a representative structure for standardizing the evaluation of neuromorphic approaches. The NeuroBench framework introduces a common set of tools and systematic methodology for inclusive benchmark measurement, delivering an objective reference framework for quantifying neuromorphic approaches in both hardware-independent (algorithm track) and hardware-dependent (system track) settings. In this article, we present initial performance baselines across various model architectures on the algorithm track and outline the system track benchmark tasks and guidelines. NeuroBench is intended to continually expand its benchmarks and features to foster and track the progress made by the research community.
Submitted 17 January, 2024; v1 submitted 10 April, 2023;
originally announced April 2023.
-
Is TinyML Sustainable? Assessing the Environmental Impacts of Machine Learning on Microcontrollers
Authors:
Shvetank Prakash,
Matthew Stewart,
Colby Banbury,
Mark Mazumder,
Pete Warden,
Brian Plancher,
Vijay Janapa Reddi
Abstract:
The sustained growth of carbon emissions and global waste elicits significant sustainability concerns for our environment's future. The growing Internet of Things (IoT) has the potential to exacerbate this issue. However, an emerging area known as Tiny Machine Learning (TinyML) has the opportunity to help address these environmental challenges through sustainable computing practices. TinyML, the deployment of machine learning (ML) algorithms onto low-cost, low-power microcontroller systems, enables on-device sensor analytics that unlocks numerous always-on ML applications. This article discusses both the potential of these TinyML applications to address critical sustainability challenges, as well as the environmental footprint of this emerging technology. Through a complete life cycle analysis (LCA), we find that TinyML systems present opportunities to offset their carbon emissions by enabling applications that reduce the emissions of other sectors. Nevertheless, when globally scaled, the carbon footprint of TinyML systems is not negligible, necessitating that designers factor in environmental impact when formulating new devices. Finally, we outline research directions to enable further sustainable contributions of TinyML.
Submitted 21 November, 2023; v1 submitted 27 January, 2023;
originally announced January 2023.
-
Standing Balance Improvement Using Vibrotactile Feedback in Virtual Reality
Authors:
M. Rasel Mahmud,
Michael Stewart,
Alberto Cordova,
John Quarles
Abstract:
Virtual Reality (VR) users often encounter postural instability, i.e., balance issues, which can be a significant impediment to universal usability and accessibility, particularly for those with balance impairments. Prior research has validated imbalance issues, but little effort has been made to mitigate them. We recruited 39 participants (with balance impairments: 18, without balance impairments: 21) to examine the effect of various vibrotactile feedback techniques on balance in virtual reality, specifically spatial vibrotactile, static vibrotactile, rhythmic vibrotactile, and vibrotactile feedback mapped to the center of pressure (CoP). Participants completed standing visual exploration and standing reach and grasp tasks. According to within-subject results, each vibrotactile feedback enhanced balance in VR significantly (p < .001) for those with and without balance impairments. Spatial and CoP vibrotactile feedback enhanced balance significantly more (p < .001) than other vibrotactile feedback. This study presents strategies that might be used in future virtual environments to enhance standing balance and bring VR closer to universal usage.
Submitted 18 August, 2022;
originally announced August 2022.
-
Auditory Feedback to Make Walking in Virtual Reality More Accessible
Authors:
M. Rasel Mahmud,
Michael Stewart,
Alberto Cordova,
John Quarles
Abstract:
The objective of this study is to investigate the impact of several auditory feedback modalities on gait (i.e., walking patterns) in virtual reality (VR). Prior research has substantiated gait disturbances in VR users as one of the primary obstacles to VR usability. However, minimal research has been done to mitigate this issue. We recruited 39 participants (with mobility impairments: 18, without mobility impairments: 21) who completed timed walking tasks in a real-world environment and the same tasks in a VR environment with various types of auditory feedback. Within-subject results showed that each auditory condition significantly improved gait performance while in VR (p < .001) compared to the no auditory condition in VR for both groups of participants with and without mobility impairments. Moreover, spatial audio improved gait performance significantly (p < .001) compared to other auditory conditions for both groups of participants. This research could help to make walking in VR more accessible for people with and without mobility impairments.
Submitted 17 August, 2022;
originally announced August 2022.
-
Vibrotactile Feedback to Make Real Walking in Virtual Reality More Accessible
Authors:
M. Rasel Mahmud,
Michael Stewart,
Alberto Cordova,
John Quarles
Abstract:
This research aims to examine the effects of various vibrotactile feedback techniques on gait (i.e., walking patterns) in virtual reality (VR). Prior studies have demonstrated that gait disturbances in VR users are significant usability barriers. However, adequate research has not been performed to address this problem. In our study, 39 participants (with mobility impairments: 18, without mobility impairments: 21) performed timed walking tasks in a real-world environment and identical activities in a VR environment with different forms of vibrotactile feedback (spatial, static, and rhythmic). Within-group results revealed that each form of vibrotactile feedback improved gait performance in VR significantly (p < .001) relative to the no vibrotactile condition in VR for individuals with and without mobility impairments. Moreover, spatial vibrotactile feedback increased gait performance significantly (p < .001) in both participant groups compared to other vibrotactile conditions. The findings of this research will help to make real walking in VR more accessible for those with and without mobility impairments.
Submitted 3 August, 2022;
originally announced August 2022.
-
Multiface: A Dataset for Neural Face Rendering
Authors:
Cheng-hsin Wuu,
Ningyuan Zheng,
Scott Ardisson,
Rohan Bali,
Danielle Belko,
Eric Brockmeyer,
Lucas Evans,
Timothy Godisart,
Hyowon Ha,
Xuhua Huang,
Alexander Hypes,
Taylor Koska,
Steven Krenn,
Stephen Lombardi,
Xiaomin Luo,
Kevyn McPhail,
Laura Millerschoen,
Michal Perdoch,
Mark Pitts,
Alexander Richard,
Jason Saragih,
Junko Saragih,
Takaaki Shiratori,
Tomas Simon,
Matt Stewart
, et al. (6 additional authors not shown)
Abstract:
Photorealistic avatars of human faces have come a long way in recent years, yet research in this area is limited by a lack of publicly available, high-quality datasets covering both dense multi-view camera captures and rich facial expressions of the captured subjects. In this work, we present Multiface, a new multi-view, high-resolution human face dataset collected from 13 identities at Reality Labs Research for neural face rendering. We introduce Mugsy, a large scale multi-camera apparatus to capture high-resolution synchronized videos of a facial performance. The goal of Multiface is to close the gap in accessibility to high quality data in the academic community and to enable research in VR telepresence. Along with the release of the dataset, we conduct ablation studies on the influence of different model architectures toward the model's interpolation capacity of novel viewpoint and expressions. With a conditional VAE model serving as our baseline, we found that adding spatial bias, texture warp field, and residual connections improves performance on novel view synthesis. Our code and data are available at: https://github.com/facebookresearch/multiface
Submitted 26 June, 2023; v1 submitted 22 July, 2022;
originally announced July 2022.
-
Machine Learning Sensors
Authors:
Pete Warden,
Matthew Stewart,
Brian Plancher,
Colby Banbury,
Shvetank Prakash,
Emma Chen,
Zain Asgar,
Sachin Katti,
Vijay Janapa Reddi
Abstract:
Machine learning sensors represent a paradigm shift for the future of embedded machine learning applications. Current instantiations of embedded machine learning (ML) suffer from complex integration, lack of modularity, and privacy and security concerns from data movement. This article proposes a more data-centric paradigm for embedding sensor intelligence on edge devices to combat these challenges. Our vision for "sensor 2.0" entails segregating sensor input data and ML processing from the wider system at the hardware level and providing a thin interface that mimics traditional sensors in functionality. This separation leads to a modular and easy-to-use ML sensor device. We discuss challenges presented by the standard approach of building ML processing into the software stack of the controlling microprocessor on an embedded system and how the modularity of ML sensors alleviates these problems. ML sensors increase privacy and accuracy while making it easier for system builders to integrate ML into their products as a simple component. We provide examples of prospective ML sensors and an illustrative datasheet as a demonstration and hope that this will build a dialogue to progress us towards sensor 2.0.
Submitted 7 June, 2022;
originally announced June 2022.
-
Auditory Feedback for Standing Balance Improvement in Virtual Reality
Authors:
M. Rasel Mahmud,
Michael Stewart,
Alberto Cordova,
John Quarles
Abstract:
Virtual Reality (VR) users often experience postural instability, i.e., balance problems, which could be a major barrier to universal usability and accessibility for all, especially for persons with balance impairments. Prior research has confirmed the imbalance effect, but minimal research has been conducted to reduce this effect. We recruited 42 participants (with balance impairments: 21, without balance impairments: 21) to investigate the impact of several auditory techniques on balance in VR, specifically spatial audio, static rest frame audio, rhythmic audio, and audio mapped to the center of pressure (CoP). Participants performed two types of tasks - standing visual exploration and standing reach and grasp. Within-subject results showed that each auditory technique improved balance in VR for both persons with and without balance impairments. Spatial and CoP audio improved balance significantly more than other auditory conditions. The techniques presented in this research could be used in future virtual environments to improve standing balance and help push VR closer to universal usability.
Submitted 9 February, 2022;
originally announced February 2022.
-
Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond
Authors:
Amir Feder,
Katherine A. Keith,
Emaad Manzoor,
Reid Pryzant,
Dhanya Sridhar,
Zach Wood-Doughty,
Jacob Eisenstein,
Justin Grimmer,
Roi Reichart,
Margaret E. Roberts,
Brandon M. Stewart,
Victor Veitch,
Diyi Yang
Abstract:
A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets and clear articulations of the challenges and opportunities in the application of causal inference to the textual domain, with its unique properties. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects with text, encompassing settings where text is used as an outcome, treatment, or to address confounding. In addition, we explore potential uses of causal inference to improve the robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the NLP community.
Submitted 30 July, 2022; v1 submitted 2 September, 2021;
originally announced September 2021.
-
Widening Access to Applied Machine Learning with TinyML
Authors:
Vijay Janapa Reddi,
Brian Plancher,
Susan Kennedy,
Laurence Moroney,
Pete Warden,
Anant Agarwal,
Colby Banbury,
Massimo Banzi,
Matthew Bennett,
Benjamin Brown,
Sharad Chitlangia,
Radhika Ghosal,
Sarah Grafman,
Rupert Jaeger,
Srivatsan Krishnan,
Maximilian Lam,
Daniel Leiker,
Cara Mann,
Mark Mazumder,
Dominic Pajak,
Dhilan Ramaprasad,
J. Evan Smith,
Matthew Stewart,
Dustin Tingley
Abstract:
Broadening access to both computational and educational resources is critical to diffusing machine-learning (ML) innovation. However, today, most ML resources and experts are siloed in a few countries and organizations. In this paper, we describe our pedagogical approach to increasing access to applied ML through a massive open online course (MOOC) on Tiny Machine Learning (TinyML). We suggest that TinyML, ML on resource-constrained embedded devices, is an attractive means to widen access because TinyML both leverages low-cost and globally accessible hardware, and encourages the development of complete, self-contained applications, from data collection to deployment. To this end, a collaboration between academia (Harvard University) and industry (Google) produced a four-part MOOC that provides application-oriented instruction on how to develop solutions using TinyML. The series is openly available on the edX MOOC platform, has no prerequisites beyond basic programming, and is designed for learners from a global variety of backgrounds. It introduces pupils to real-world applications, ML algorithms, data-set engineering, and the ethical considerations of these technologies via hands-on programming and deployment of TinyML applications in both the cloud and their own microcontrollers. To facilitate continued learning, community building, and collaboration beyond the courses, we launched a standalone website, a forum, a chat, and an optional course-project competition. We also released the course materials publicly, hoping they will inspire the next generation of ML practitioners and educators and further broaden access to cutting-edge ML technologies.
Submitted 9 June, 2021; v1 submitted 7 June, 2021;
originally announced June 2021.
-
The evolving ecosystem of COVID-19 contact tracing applications
Authors:
Benjamin Levy,
Matthew Stewart
Abstract:
Since the outbreak of the novel coronavirus, COVID-19, there has been increased interest in the use of digital contact tracing as a means of stopping chains of viral transmission, provoking alarm from privacy advocates. Concerning the ethics of this technology, recent studies have predominantly focused on (1) the formation of guidelines for ethical contact tracing, (2) the analysis of specific implementations, or (3) the review of a select number of contact tracing applications and their relevant privacy or ethical implications. In this study, we provide a comprehensive survey of the evolving ecosystem of COVID-19 tracing applications, examining 152 contact tracing applications and assessing the extent to which they comply with existing guidelines for ethical contact tracing. The assessed criteria cover areas including data collection and storage, transparency and consent, and whether the implementation is open source. We find that although many apps released early in the pandemic fell short of best practices, apps released more recently, following the publication of the Apple/Google exposure notification protocol, have tended to be more closely aligned with ethical contact tracing principles. This dataset will be publicly available and may be updated as the pandemic continues.
Submitted 18 March, 2021;
originally announced March 2021.
-
Using machine learning to reduce ensembles of geological models for oil and gas exploration
Authors:
Anna Roubícková,
Lucy MacGregor,
Nick Brown,
Oliver Thomson Brown,
Mike Stewart
Abstract:
Exploration using borehole drilling is a key activity in determining the most appropriate locations for the petroleum industry to develop oil fields. However, estimating the amount of Oil In Place (OIP) relies on computing with a very significant number of geological models, which, due to the ever increasing capability to capture and refine data, is becoming infeasible. As such, data reduction techniques are required to reduce this set down to a smaller, yet still fully representative ensemble. In this paper we explore different approaches to identifying the key groupings of models, based on their most important features, and then use this information to select a reduced set which we can be confident fully represents the overall model space. The result of this work is an approach which enables us to describe the entire state space using only 0.5\% of the models, along with a series of lessons learnt. The techniques that we describe are not only applicable to oil and gas exploration, but also more generally to the HPC community as we are forced to work with reduced data-sets due to the rapid increase in data collection capability.
Submitted 17 October, 2020;
originally announced October 2020.
-
Naïve regression requires weaker assumptions than factor models to adjust for multiple cause confounding
Authors:
Justin Grimmer,
Dean Knox,
Brandon M. Stewart
Abstract:
The empirical practice of using factor models to adjust for shared, unobserved confounders, $\mathbf{Z}$, in observational settings with multiple treatments, $\mathbf{A}$, is widespread in fields including genetics, networks, medicine, and politics. Wang and Blei (2019, WB) formalizes these procedures and develops the "deconfounder," a causal inference method using factor models of $\mathbf{A}$ to estimate "substitute confounders," $\hat{\mathbf{Z}}$, then estimating treatment effects by regressing the outcome, $\mathbf{Y}$, on part of $\mathbf{A}$ while adjusting for $\hat{\mathbf{Z}}$. WB claim the deconfounder is unbiased when there are no single-cause confounders and $\hat{\mathbf{Z}}$ is "pinpointed." We clarify pinpointing requires each confounder to affect infinitely many treatments. We prove under these assumptions, a naïve semiparametric regression of $\mathbf{Y}$ on $\mathbf{A}$ is asymptotically unbiased. Deconfounder variants nesting this regression are therefore also asymptotically unbiased, but variants using $\hat{\mathbf{Z}}$ and subsets of causes require further untestable assumptions. We replicate every deconfounder analysis with available data and find it fails to consistently outperform naïve regression. In practice, the deconfounder produces implausible estimates in WB's case study to movie earnings: estimates suggest comic author Stan Lee's cameo appearances causally contributed \$15.5 billion, most of Marvel movie revenue. We conclude neither approach is a viable substitute for careful research design in real-world applications.
Submitted 24 July, 2020;
originally announced July 2020.
-
E2EET: From Pipeline to End-to-end Entity Typing via Transformer-Based Embeddings
Authors:
Michael Stewart,
Wei Liu
Abstract:
Entity Typing (ET) is the process of identifying the semantic types of every entity within a corpus. In contrast to Named Entity Recognition, where each token in a sentence is labelled with zero or one class label, ET involves labelling each entity mention with one or more class labels. Existing entity typing models, which operate at the mention level, are limited by two key factors: they do not make use of recently-proposed context-dependent embeddings, and are trained on fixed context windows. They are therefore sensitive to window size selection and are unable to incorporate the context of the entire document. In light of these drawbacks we propose to incorporate context using transformer-based embeddings for a mention-level model, and an end-to-end model using a Bi-GRU to remove the dependency on window size. An extensive ablative study demonstrates the effectiveness of contextualised embeddings for mention-level models and the competitiveness of our end-to-end model for entity typing.
Submitted 23 March, 2020;
originally announced March 2020.
-
Word-level Lexical Normalisation using Context-Dependent Embeddings
Authors:
Michael Stewart,
Wei Liu,
Rachel Cardell-Oliver
Abstract:
Lexical normalisation (LN) is the process of correcting each word in a dataset to its canonical form so that it may be more easily and more accurately analysed. Most lexical normalisation systems operate at the character level, while word-level models are seldom used. Recent language models offer solutions to the drawbacks of word-level LN models, yet, to the best of our knowledge, no research has investigated their effectiveness on LN. In this paper we introduce a word-level GRU-based LN model and investigate the effectiveness of recent embedding techniques on word-level LN. Our results show that our GRU-based word-level model produces better results than character-level models, and outperforms existing deep-learning based LN techniques on Twitter data. We also find that randomly-initialised embeddings are capable of outperforming pre-trained embedding models in certain scenarios. Finally, we release a substantial lexical normalisation dataset to the community.
Submitted 13 November, 2019;
originally announced November 2019.
-
ICDM 2019 Knowledge Graph Contest: Team UWA
Authors:
Michael Stewart,
Majigsuren Enkhsaikhan,
Wei Liu
Abstract:
We present an overview of our triple extraction system for the ICDM 2019 Knowledge Graph Contest. Our system uses a pipeline-based approach to extract a set of triples from a given document. It offers a simple and effective solution to the challenge of knowledge graph construction from domain-specific text. It also provides the facility to visualise useful information about each triple such as the degree, betweenness, structured relation type(s), and named entity types.
Submitted 4 September, 2019;
originally announced September 2019.
-
How to Make Causal Inferences Using Texts
Authors:
Naoki Egami,
Christian J. Fong,
Justin Grimmer,
Margaret E. Roberts,
Brandon M. Stewart
Abstract:
New text as data techniques offer a great promise: the ability to inductively discover measures that are useful for testing social science theories of interest from large collections of text. We introduce a conceptual framework for making causal inferences with discovered measures as a treatment or outcome. Our framework enables researchers to discover high-dimensional textual interventions and estimate the ways that observed treatments affect text-based outcomes. We argue that nearly all text-based causal inferences depend upon a latent representation of the text and we provide a framework to learn the latent representation. But estimating this latent representation, we show, creates new risks: we may introduce an identification problem or overfit. To address these risks we describe a split-sample framework and apply it to estimate causal effects from an experiment on immigration attitudes and a study on bureaucratic response. Our work provides a rigorous foundation for text-based causal inferences.
Submitted 6 February, 2018;
originally announced February 2018.
-
How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility
Authors:
Allison J. B. Chaney,
Brandon M. Stewart,
Barbara E. Engelhardt
Abstract:
Recommendation systems are ubiquitous and impact many domains; they have the potential to influence product consumption, individuals' perceptions of the world, and life-altering decisions. These systems are often evaluated or trained with data from users already exposed to algorithmic recommendations; this creates a pernicious feedback loop. Using simulations, we demonstrate how using data confounded in this way homogenizes user behavior without increasing utility.
Submitted 26 November, 2018; v1 submitted 30 October, 2017;
originally announced October 2017.
-
Natural Language Feature Selection via Cooccurrence
Authors:
Michael Stewart
Abstract:
Specificity is important for extracting collocations, keyphrases, multi-word and index terms [Newman et al. 2012]. It is also useful for tagging, ontology construction [Ryu and Choi 2006], and automatic summarization of documents [Louis and Nenkova 2011, Chali and Hassan 2012]. Term frequency and inverse-document frequency (TF-IDF) are typically used to do this, but fail to take advantage of the semantic relationships between terms [Church and Gale 1995]. The result is that general idiomatic terms are mistaken for specific terms. We demonstrate use of relational data for estimation of term specificity. The specificity of a term can be learned from its distribution of relations with other terms. This technique is useful for identifying relevant words or terms for other natural language processing tasks.
Submitted 8 March, 2014;
originally announced March 2014.
-
Algorithmic Diversity for Software Security
Authors:
Michael Stewart
Abstract:
Software diversity protects against modern-day exploits such as code-reuse attacks. When an attacker designs a code-reuse attack on an example executable, the attack relies on replicating the target environment. With software diversity, the attacker cannot reliably replicate their target. This is a security benefit which can be applied to massive-scale software distribution. When applied to large-scale communities, an invested attacker may perform analysis of samples to improve the chances of a successful attack (M. Franz).
We present a general NOP-insertion algorithm which can be expanded and customized for security, performance, or other costs. We demonstrate an improvement in security so that a code-reuse attack based on any one variant has minimal chances of success on another and analyse the costs of this method. Alternately, the variants may be customized to meet performance or memory overhead constraints. Deterministic diversification allows for the flexibility to balance these needs in a way that doesn't exist in a random online method.
Submitted 13 December, 2013;
originally announced December 2013.