-
CoDi: Conversational Distillation for Grounded Question Answering
Authors:
Patrick Huber,
Arash Einolghozati,
Rylan Conway,
Kanika Narang,
Matt Smith,
Waqar Nayyar,
Adithya Sagar,
Ahmed Aly,
Akshat Shrivastava
Abstract:
Distilling conversational skills into Small Language Models (SLMs) with approximately 1 billion parameters presents significant challenges. Firstly, SLMs have limited capacity in their model parameters to learn extensive knowledge compared to larger models. Secondly, high-quality conversational datasets are often scarce, small, and domain-specific. Addressing these challenges, we introduce a novel…
▽ More
Distilling conversational skills into Small Language Models (SLMs) with approximately 1 billion parameters presents significant challenges. Firstly, SLMs have limited capacity in their model parameters to learn extensive knowledge compared to larger models. Secondly, high-quality conversational datasets are often scarce, small, and domain-specific. Addressing these challenges, we introduce a novel data distillation framework named CoDi (short for Conversational Distillation, pronounced "Cody"), allowing us to synthesize large-scale, assistant-style datasets in a steerable and diverse manner. Specifically, while our framework is task agnostic at its core, we explore and evaluate the potential of CoDi on the task of conversational grounded reasoning for question answering. This is a typical on-device scenario for specialist SLMs, allowing for open-domain model responses, without requiring the model to "memorize" world knowledge in its limited weights. Our evaluations show that SLMs trained with CoDi-synthesized data achieve performance comparable to models trained on human-annotated data in standard metrics. Additionally, when using our framework to generate larger datasets from web data, our models surpass larger, instruction-tuned models in zero-shot conversational grounded reasoning tasks.
△ Less
Submitted 20 August, 2024;
originally announced August 2024.
-
Deep Generative Classification of Blood Cell Morphology
Authors:
Simon Deltadahl,
Julian Gilbey,
Christine Van Laer,
Nancy Boeckx,
Mathie Leers,
Tanya Freeman,
Laura Aiken,
Timothy Farren,
Matthew Smith,
Mohamad Zeina,
BloodCounts! consortium,
Concetta Piazzese,
Joseph Taylor,
Nicholas Gleadall,
Carola-Bibiane Schönlieb,
Suthesh Sivapalaratnam,
Michael Roberts,
Parashkev Nachev
Abstract:
Accurate classification of haematological cells is critical for diagnosing blood disorders, but presents significant challenges for machine automation owing to the complexity of cell morphology, heterogeneities of biological, pathological, and imaging characteristics, and the imbalance of cell type frequencies. We introduce CytoDiffusion, a diffusion-based classifier that effectively models blood…
▽ More
Accurate classification of haematological cells is critical for diagnosing blood disorders, but presents significant challenges for machine automation owing to the complexity of cell morphology, heterogeneities of biological, pathological, and imaging characteristics, and the imbalance of cell type frequencies. We introduce CytoDiffusion, a diffusion-based classifier that effectively models blood cell morphology, combining accurate classification with robust anomaly detection, resistance to distributional shifts, interpretability, data efficiency, and superhuman uncertainty quantification. Our approach outperforms state-of-the-art discriminative models in anomaly detection (AUC 0.976 vs. 0.919), resistance to domain shifts (85.85% vs. 74.38% balanced accuracy), and performance in low-data regimes (95.88% vs. 94.95% balanced accuracy). Notably, our model generates synthetic blood cell images that are nearly indistinguishable from real images, as demonstrated by a Turing test in which expert haematologists achieved only 52.3% accuracy (95% CI: [50.5%, 54.2%]). Furthermore, we enhance model explainability through the generation of directly interpretable counterfactual heatmaps. Our comprehensive evaluation framework, encompassing these multiple performance dimensions, establishes a new benchmark for medical image analysis in haematology, ultimately enabling improved diagnostic accuracy in clinical settings. Our code is available at https://github.com/Deltadahl/CytoDiffusion.
△ Less
Submitted 16 August, 2024;
originally announced August 2024.
-
pathfinder: A Semantic Framework for Literature Review and Knowledge Discovery in Astronomy
Authors:
Kartheik G. Iyer,
Mikaeel Yunus,
Charles O'Neill,
Christine Ye,
Alina Hyk,
Kiera McCormick,
Ioana Ciuca,
John F. Wu,
Alberto Accomazzi,
Simone Astarita,
Rishabh Chakrabarty,
Jesse Cranney,
Anjalie Field,
Tirthankar Ghosal,
Michele Ginolfi,
Marc Huertas-Company,
Maja Jablonska,
Sandor Kruk,
Huiling Liu,
Gabriel Marchidan,
Rohit Mistry,
J. P. Naiman,
J. E. G. Peek,
Mugdha Polimera,
Sergio J. Rodriguez
, et al. (5 additional authors not shown)
Abstract:
The exponential growth of astronomical literature poses significant challenges for researchers navigating and synthesizing general insights or even domain-specific knowledge. We present Pathfinder, a machine learning framework designed to enable literature review and knowledge discovery in astronomy, focusing on semantic searching with natural language instead of syntactic searches with keywords.…
▽ More
The exponential growth of astronomical literature poses significant challenges for researchers navigating and synthesizing general insights or even domain-specific knowledge. We present Pathfinder, a machine learning framework designed to enable literature review and knowledge discovery in astronomy, focusing on semantic searching with natural language instead of syntactic searches with keywords. Utilizing state-of-the-art large language models (LLMs) and a corpus of 350,000 peer-reviewed papers from the Astrophysics Data System (ADS), Pathfinder offers an innovative approach to scientific inquiry and literature exploration. Our framework couples advanced retrieval techniques with LLM-based synthesis to search astronomical literature by semantic context as a complement to currently existing methods that use keywords or citation graphs. It addresses complexities of jargon, named entities, and temporal aspects through time-based and citation-based weighting schemes. We demonstrate the tool's versatility through case studies, showcasing its application in various research scenarios. The system's performance is evaluated using custom benchmarks, including single-paper and multi-paper tasks. Beyond literature review, Pathfinder offers unique capabilities for reformatting answers in ways that are accessible to various audiences (e.g. in a different language or as simplified text), visualizing research landscapes, and tracking the impact of observatories and methodologies. This tool represents a significant advancement in applying AI to astronomical research, aiding researchers at all career stages in navigating modern astronomy literature.
△ Less
Submitted 2 August, 2024;
originally announced August 2024.
-
The Llama 3 Herd of Models
Authors:
Abhimanyu Dubey,
Abhinav Jauhri,
Abhinav Pandey,
Abhishek Kadian,
Ahmad Al-Dahle,
Aiesha Letman,
Akhil Mathur,
Alan Schelten,
Amy Yang,
Angela Fan,
Anirudh Goyal,
Anthony Hartshorn,
Aobo Yang,
Archi Mitra,
Archie Sravankumar,
Artem Korenev,
Arthur Hinsvark,
Arun Rao,
Aston Zhang,
Aurelien Rodriguez,
Austen Gregerson,
Ava Spataru,
Baptiste Roziere,
Bethany Biron,
Binh Tang
, et al. (510 additional authors not shown)
Abstract:
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…
▽ More
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.
△ Less
Submitted 15 August, 2024; v1 submitted 31 July, 2024;
originally announced July 2024.
-
AI-Driven Healthcare: A Survey on Ensuring Fairness and Mitigating Bias
Authors:
Sribala Vidyadhari Chinta,
Zichong Wang,
Xingyu Zhang,
Thang Doan Viet,
Ayesha Kashif,
Monique Antoinette Smith,
Wenbin Zhang
Abstract:
Artificial intelligence (AI) is rapidly advancing in healthcare, enhancing the efficiency and effectiveness of services across various specialties, including cardiology, ophthalmology, dermatology, emergency medicine, etc. AI applications have significantly improved diagnostic accuracy, treatment personalization, and patient outcome predictions by leveraging technologies such as machine learning,…
▽ More
Artificial intelligence (AI) is rapidly advancing in healthcare, enhancing the efficiency and effectiveness of services across various specialties, including cardiology, ophthalmology, dermatology, emergency medicine, etc. AI applications have significantly improved diagnostic accuracy, treatment personalization, and patient outcome predictions by leveraging technologies such as machine learning, neural networks, and natural language processing. However, these advancements also introduce substantial ethical and fairness challenges, particularly related to biases in data and algorithms. These biases can lead to disparities in healthcare delivery, affecting diagnostic accuracy and treatment outcomes across different demographic groups. This survey paper examines the integration of AI in healthcare, highlighting critical challenges related to bias and exploring strategies for mitigation. We emphasize the necessity of diverse datasets, fairness-aware algorithms, and regulatory frameworks to ensure equitable healthcare delivery. The paper concludes with recommendations for future research, advocating for interdisciplinary approaches, transparency in AI decision-making, and the development of innovative and inclusive AI applications.
△ Less
Submitted 28 July, 2024;
originally announced July 2024.
-
Amplifying the Kinematics of Origami Mechanisms With Spring Joints
Authors:
Malcolm Smith
Abstract:
Due to its rigid foldability and predictable kinematics, the reverse fold is the fundamental mechanism behind some of the most well known origami kinematic structures, including the Miura Ori, Yoshimura, and waterbomb patterns. However, the reverse fold only has one parameter to control its behavior: the starting fold angle. In this paper I introduce an alternative to the traditional reverse fold,…
▽ More
Due to its rigid foldability and predictable kinematics, the reverse fold is the fundamental mechanism behind some of the most well known origami kinematic structures, including the Miura Ori, Yoshimura, and waterbomb patterns. However, the reverse fold only has one parameter to control its behavior: the starting fold angle. In this paper I introduce an alternative to the traditional reverse fold, based on the spring into action pattern, called the spring joint. This novel rigidly foldable mechanism is able to couple multiple reverse folds into a compact space to amplify the kinematic output of a traditional reverse fold by up to ten times, and to add one parameter for each reverse fold, giving more programmatic control of origami structures. Methods of parameterizing both the starting angle, the path of travel, and the axis of motion are also introduced. Unfortunately, this versatility comes at the cost of a large buildup of layers, making the spring joint impractical for thick origami mechanisms. To solve this problem, I also introduce a modular alternative to the spring joint that has no additional layers, with the same kinematic properties. Both of these mechanisms are tested as replacements for the reverse fold in both traditional and custom origami structures.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
Maritime Tracking Data Analysis and Integration with AISdb
Authors:
Gabriel Spadon,
Jay Kumar,
Jinkun Chen,
Matthew Smith,
Casey Hilliard,
Sarah Vela,
Romina Gehrmann,
Claudio DiBacco,
Stan Matwin,
Ronald Pelot
Abstract:
Efficiently handling Automatic Identification System (AIS) data is vital for enhancing maritime safety and navigation, yet is hindered by the system's high volume and error-prone datasets. This paper introduces the Automatic Identification System Database (AISdb), a novel tool designed to address the challenges of processing and analyzing AIS data. AISdb is a comprehensive, open-source platform th…
▽ More
Efficiently handling Automatic Identification System (AIS) data is vital for enhancing maritime safety and navigation, yet is hindered by the system's high volume and error-prone datasets. This paper introduces the Automatic Identification System Database (AISdb), a novel tool designed to address the challenges of processing and analyzing AIS data. AISdb is a comprehensive, open-source platform that enables the integration of AIS data with environmental datasets, thus enriching analyses of vessel movements and their environmental impacts. By facilitating AIS data collection, cleaning, and spatio-temporal querying, AISdb significantly advances AIS data research. Utilizing AIS data from various sources, AISdb demonstrates improved handling and analysis of vessel information, contributing to enhancing maritime safety, security, and environmental sustainability efforts.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
A framework for developing a knowledge management platform
Authors:
Marie Lisandra Zepeda Mendoza,
Sonali Agarwal,
James A. Blackshaw,
Vanesa Bol,
Audrey Fazzi,
Filippo Fiorini,
Amy Louise Foreman,
Nancy George,
Brett R. Johnson,
Brian Martin,
Dave McComb,
Euphemia Mutasa-Gottgens,
Helen Parkinson,
Martin Romacker,
Rolf Russell,
Valérien Ségard,
Shawn Zheng Kai Tan,
Wei Kheng Teh,
F. P. Winstanley,
Benedict Wong,
Adrian M. Smith
Abstract:
Knowledge management (KM) involves collecting, organizing, storing, and disseminating information to improve decision-making, innovation, and performance. Implementing KM at scale has become essential for organizations to effectively leverage vast accessible data. This paper is a compilation of concepts that emerged from KM workshops hosted by EMBL-EBI, attended by SMEs and industry. We provide gu…
▽ More
Knowledge management (KM) involves collecting, organizing, storing, and disseminating information to improve decision-making, innovation, and performance. Implementing KM at scale has become essential for organizations to effectively leverage vast accessible data. This paper is a compilation of concepts that emerged from KM workshops hosted by EMBL-EBI, attended by SMEs and industry. We provide guidance on envisioning, executing, evaluating, and evolving knowledge management platforms. We emphasize essential considerations such as setting knowledge domain boundaries and measuring success, as well as the importance of making knowledge accessible for downstream applications and non-computational users and highlights necessary personal and organizational skills for success. We stress the importance of collaboration and the need for convergence on shared principles and commitment to provide or seek resources to advance KM. The community is invited to join the journey of KM and contribute to the advancement of the field by applying and improving on the guidelines described.
△ Less
Submitted 18 June, 2024;
originally announced June 2024.
-
Visual instrument co-design embracing the unique movement capabilities of a dancer with physical disability
Authors:
Sam Trolland,
Melinda Smith,
Alon Ilsar,
Jon McCormack
Abstract:
This paper explores the design of an expressive visual instrument that embraces the unique movement style of a dancer living with physical disability. Through a collaboration between the dancer and an interaction designer/visual artist, the creative qualities of wearable devices for motion tracking are investigated, with emphasis on integrating the dancer's specific movement capabilities with thei…
▽ More
This paper explores the design of an expressive visual instrument that embraces the unique movement style of a dancer living with physical disability. Through a collaboration between the dancer and an interaction designer/visual artist, the creative qualities of wearable devices for motion tracking are investigated, with emphasis on integrating the dancer's specific movement capabilities with their creative goals. The affordances of this technology for imagining new forms of creative expression play a critical role in the design process. These themes are drawn together through an experiential performance which augments an improvised dance with an ephemeral real-time visualisation of the performer's movements. Through practice-based research, the design, development and presentation of this performance work is examined as a 'testbed' for new ideas, allowing for the exploration of HCI concepts within a creative context. This paper outlines the creative process behind the development of the work, the insights derived from the practice-based research enquiry, and the role of movement technology in encouraging new ways of moving through creative expression.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Aya 23: Open Weight Releases to Further Multilingual Progress
Authors:
Viraat Aryabumi,
John Dang,
Dwarak Talupuru,
Saurabh Dash,
David Cairuz,
Hangyu Lin,
Bharat Venkitesh,
Madeline Smith,
Jon Ander Campos,
Yi Chern Tan,
Kelly Marchisio,
Max Bartolo,
Sebastian Ruder,
Acyr Locatelli,
Julia Kreutzer,
Nick Frosst,
Aidan Gomez,
Phil Blunsom,
Marzieh Fadaee,
Ahmet Üstün,
Sara Hooker
Abstract:
This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-art language modelin…
▽ More
This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-art language modeling capabilities to approximately half of the world's population. The Aya model covered 101 languages whereas Aya 23 is an experiment in depth vs breadth, exploring the impact of allocating more capacity to fewer languages that are included during pre-training. Aya 23 outperforms both previous massively multilingual models like Aya 101 for the languages it covers, as well as widely used models like Gemma, Mistral and Mixtral on an extensive range of discriminative and generative tasks. We release the open weights for both the 8B and 35B models as part of our continued commitment for expanding access to multilingual progress.
△ Less
Submitted 31 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
AstroPT: Scaling Large Observation Models for Astronomy
Authors:
Michael J. Smith,
Ryan J. Roberts,
Eirini Angeloudi,
Marc Huertas-Company
Abstract:
This work presents AstroPT, an autoregressive pretrained transformer developed with astronomical use-cases in mind. The AstroPT models presented here have been pretrained on 8.6 million $512 \times 512$ pixel $grz$-band galaxy postage stamp observations from the DESI Legacy Survey DR8. We train a selection of foundation models of increasing size from 1 million to 2.1 billion parameters, and find t…
▽ More
This work presents AstroPT, an autoregressive pretrained transformer developed with astronomical use-cases in mind. The AstroPT models presented here have been pretrained on 8.6 million $512 \times 512$ pixel $grz$-band galaxy postage stamp observations from the DESI Legacy Survey DR8. We train a selection of foundation models of increasing size from 1 million to 2.1 billion parameters, and find that AstroPT follows a similar saturating log-log scaling law to textual models. We also find that the models' performances on downstream tasks as measured by linear probing improves with model size up to the model parameter saturation point. We believe that collaborative community development paves the best route towards realising an open source `Large Observation Model' -- a model trained on data taken from the observational sciences at the scale seen in natural language processing. To this end, we release the source code, weights, and dataset for AstroPT under the MIT license, and invite potential collaborators to join us in collectively building and researching these models.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models
Authors:
Yi-Lin Tuan,
Xilun Chen,
Eric Michael Smith,
Louis Martin,
Soumya Batra,
Asli Celikyilmaz,
William Yang Wang,
Daniel M. Bikel
Abstract:
As large language models (LLMs) become easily accessible nowadays, the trade-off between safety and helpfulness can significantly impact user experience. A model that prioritizes safety will cause users to feel less engaged and assisted while prioritizing helpfulness will potentially cause harm. Possible harms include teaching people how to build a bomb, exposing youth to inappropriate content, an…
▽ More
As large language models (LLMs) become easily accessible nowadays, the trade-off between safety and helpfulness can significantly impact user experience. A model that prioritizes safety will cause users to feel less engaged and assisted while prioritizing helpfulness will potentially cause harm. Possible harms include teaching people how to build a bomb, exposing youth to inappropriate content, and hurting users' mental health. In this work, we propose to balance safety and helpfulness in diverse use cases by controlling both attributes in LLM. We explore training-free and fine-tuning methods that do not require extra human annotations and analyze the challenges of controlling safety and helpfulness in LLMs. Our experiments demonstrate that our method can rewind a learned model and unlock its controllability.
△ Less
Submitted 1 April, 2024;
originally announced April 2024.
-
Universal Bovine Identification via Depth Data and Deep Metric Learning
Authors:
Asheesh Sharma,
Lucy Randewich,
William Andrew,
Sion Hannuna,
Neill Campbell,
Siobhan Mullan,
Andrew W. Dowsey,
Melvyn Smith,
Mark Hansen,
Tilo Burghardt
Abstract:
This paper proposes and evaluates, for the first time, a top-down (dorsal view), depth-only deep learning system for accurately identifying individual cattle and provides associated code, datasets, and training weights for immediate reproducibility. An increase in herd size skews the cow-to-human ratio at the farm and makes the manual monitoring of individuals more challenging. Therefore, real-tim…
▽ More
This paper proposes and evaluates, for the first time, a top-down (dorsal view), depth-only deep learning system for accurately identifying individual cattle and provides associated code, datasets, and training weights for immediate reproducibility. An increase in herd size skews the cow-to-human ratio at the farm and makes the manual monitoring of individuals more challenging. Therefore, real-time cattle identification is essential for the farms and a crucial step towards precision livestock farming. Underpinned by our previous work, this paper introduces a deep-metric learning method for cattle identification using depth data from an off-the-shelf 3D camera. The method relies on CNN and MLP backbones that learn well-generalised embedding spaces from the body shape to differentiate individuals -- requiring neither species-specific coat patterns nor close-up muzzle prints for operation. The network embeddings are clustered using a simple algorithm such as $k$-NN for highly accurate identification, thus eliminating the need to retrain the network for enrolling new individuals. We evaluate two backbone architectures, ResNet, as previously used to identify Holstein Friesians using RGB images, and PointNet, which is specialised to operate on 3D point clouds. We also present CowDepth2023, a new dataset containing 21,490 synchronised colour-depth image pairs of 99 cows, to evaluate the backbones. Both ResNet and PointNet architectures, which consume depth maps and point clouds, respectively, led to high accuracy that is on par with the coat pattern-based backbone.
△ Less
Submitted 29 March, 2024;
originally announced April 2024.
-
Cycling on the Freeway: The Perilous State of Open Source Neuroscience Software
Authors:
Britta U. Westner,
Daniel R. McCloy,
Eric Larson,
Alexandre Gramfort,
Daniel S. Katz,
Arfon M. Smith,
invited co-signees
Abstract:
Most scientists need software to perform their research (Barker et al., 2020; Carver et al., 2022; Hettrick, 2014; Hettrick et al., 2014; Switters and Osimo, 2019), and neuroscientists are no exception. Whether we work with reaction times, electrophysiological signals, or magnetic resonance imaging data, we rely on software to acquire, analyze, and statistically evaluate the raw data we obtain - o…
▽ More
Most scientists need software to perform their research (Barker et al., 2020; Carver et al., 2022; Hettrick, 2014; Hettrick et al., 2014; Switters and Osimo, 2019), and neuroscientists are no exception. Whether we work with reaction times, electrophysiological signals, or magnetic resonance imaging data, we rely on software to acquire, analyze, and statistically evaluate the raw data we obtain - or to generate such data if we work with simulations. In recent years there has been a shift toward relying on free, open-source scientific software (FOSSS) for neuroscience data analysis (Poldrack et al., 2019), in line with the broader open science movement in academia (McKiernan et al., 2016) and wider industry trends (Eghbal, 2016). Importantly, FOSSS is typically developed by working scientists (not professional software developers) which sets up a precarious situation given the nature of the typical academic workplace (wherein academics, especially in their early careers, are on short and fixed term contracts). In this paper, we will argue that the existing ecosystem of neuroscientific open source software is brittle, and discuss why and how the neuroscience community needs to come together to ensure a healthy growth of our software landscape to the benefit of all.
△ Less
Submitted 28 March, 2024;
originally announced March 2024.
-
MineXR: Mining Personalized Extended Reality Interfaces
Authors:
Hyunsung Cho,
Yukang Yan,
Kashyap Todi,
Mark Parent,
Missie Smith,
Tanya R. Jonker,
Hrvoje Benko,
David Lindlbauer
Abstract:
Extended Reality (XR) interfaces offer engaging user experiences, but their effective design requires a nuanced understanding of user behavior and preferences. This knowledge is challenging to obtain without the widespread adoption of XR devices. We introduce MineXR, a design mining workflow and data analysis platform for collecting and analyzing personalized XR user interaction and experience dat…
▽ More
Extended Reality (XR) interfaces offer engaging user experiences, but their effective design requires a nuanced understanding of user behavior and preferences. This knowledge is challenging to obtain without the widespread adoption of XR devices. We introduce MineXR, a design mining workflow and data analysis platform for collecting and analyzing personalized XR user interaction and experience data. MineXR enables elicitation of personalized interfaces from participants of a data collection: for any particular context, participants create interface elements using application screenshots from their own smartphone, place them in the environment, and simultaneously preview the resulting XR layout on a headset. Using MineXR, we contribute a dataset of personalized XR interfaces collected from 31 participants, consisting of 695 XR widgets created from 178 unique applications. We provide insights for XR widget functionalities, categories, clusters, UI element types, and placement. Our open-source tools and data support researchers and designers in developing future XR interfaces.
△ Less
Submitted 12 March, 2024;
originally announced March 2024.
-
Anomaly Detection in Offshore Wind Turbine Structures using Hierarchical Bayesian Modelling
Authors:
S. M. Smith,
A. J. Hughes,
T. A. Dardeno,
L. A. Bull,
N. Dervilis,
K. Worden
Abstract:
Population-based structural health monitoring (PBSHM), aims to share information between members of a population. An offshore wind (OW) farm could be considered as a population of nominally-identical wind-turbine structures. However, benign variations exist among members, such as geometry, sea-bed conditions and temperature differences. These factors could influence structural properties and there…
▽ More
Population-based structural health monitoring (PBSHM), aims to share information between members of a population. An offshore wind (OW) farm could be considered as a population of nominally-identical wind-turbine structures. However, benign variations exist among members, such as geometry, sea-bed conditions and temperature differences. These factors could influence structural properties and therefore the dynamic response, making it more difficult to detect structural problems via traditional SHM techniques. This paper explores the use of a hierarchical Bayesian model to infer expected soil stiffness distributions at both population and local levels, as a basis to perform anomaly detection, in the form of scour, for new and existing turbines. To do this, observations of natural frequency will be generated as though they are from a small population of wind turbines. Differences between individual observations will be introduced by postulating distributions over the soil stiffness and measurement noise, as well as reducing soil depth (to represent scour), in the case of anomaly detection.
△ Less
Submitted 29 February, 2024;
originally announced February 2024.
-
Robust Quantification of Percent Emphysema on CT via Domain Attention: the Multi-Ethnic Study of Atherosclerosis (MESA) Lung Study
Authors:
Xuzhe Zhang,
Elsa D. Angelini,
Eric A. Hoffman,
Karol E. Watson,
Benjamin M. Smith,
R. Graham Barr,
Andrew F. Laine
Abstract:
Robust quantification of pulmonary emphysema on computed tomography (CT) remains challenging for large-scale research studies that involve scans from different scanner types and for translation to clinical scans. Existing studies have explored several directions to tackle this challenge, including density correction, noise filtering, regression, hidden Markov measure field (HMMF) model-based segme…
▽ More
Robust quantification of pulmonary emphysema on computed tomography (CT) remains challenging for large-scale research studies that involve scans from different scanner types and for translation to clinical scans. Existing studies have explored several directions to tackle this challenge, including density correction, noise filtering, regression, hidden Markov measure field (HMMF) model-based segmentation, and volume-adjusted lung density. Despite some promising results, previous studies either required a tedious workflow or limited opportunities for downstream emphysema subtyping, limiting efficient adaptation on a large-scale study. To alleviate this dilemma, we developed an end-to-end deep learning framework based on an existing HMMF segmentation framework. We first demonstrate that a regular UNet cannot replicate the existing HMMF results because of the lack of scanner priors. We then design a novel domain attention block to fuse image feature with quantitative scanner priors which significantly improves the results.
△ Less
Submitted 6 March, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Closing the AI generalization gap by adjusting for dermatology condition distribution differences across clinical settings
Authors:
Rajeev V. Rikhye,
Aaron Loh,
Grace Eunhae Hong,
Preeti Singh,
Margaret Ann Smith,
Vijaytha Muralidharan,
Doris Wong,
Rory Sayres,
Michelle Phung,
Nicolas Betancourt,
Bradley Fong,
Rachna Sahasrabudhe,
Khoban Nasim,
Alec Eschholz,
Basil Mustafa,
Jan Freyberg,
Terry Spitz,
Yossi Matias,
Greg S. Corrado,
Katherine Chou,
Dale R. Webster,
Peggy Bui,
Yuan Liu,
Yun Liu,
Justin Ko
, et al. (1 additional authors not shown)
Abstract:
Recently, there has been great progress in the ability of artificial intelligence (AI) algorithms to classify dermatological conditions from clinical photographs. However, little is known about the robustness of these algorithms in real-world settings where several factors can lead to a loss of generalizability. Understanding and overcoming these limitations will permit the development of generali…
▽ More
Recently, there has been great progress in the ability of artificial intelligence (AI) algorithms to classify dermatological conditions from clinical photographs. However, little is known about the robustness of these algorithms in real-world settings where several factors can lead to a loss of generalizability. Understanding and overcoming these limitations will permit the development of generalizable AI that can aid in the diagnosis of skin conditions across a variety of clinical settings. In this retrospective study, we demonstrate that differences in skin condition distribution, rather than in demographics or image capture mode are the main source of errors when an AI algorithm is evaluated on data from a previously unseen source. We demonstrate a series of steps to close this generalization gap, requiring progressively more information about the new source, ranging from the condition distribution to training data enriched for data less frequently seen during training. Our results also suggest comparable performance from end-to-end fine tuning versus fine tuning solely the classification layer on top of a frozen embedding model. Our approach can inform the adaptation of AI algorithms to new settings, based on the information and resources available.
△ Less
Submitted 23 February, 2024;
originally announced February 2024.
-
Convolutional Neural Network Ensemble Learning for Hyperspectral Imaging-based Blackberry Fruit Ripeness Detection in Uncontrolled Farm Environment
Authors:
Chollette C. Olisah,
Ben Trewhella,
Bo Li,
Melvyn L. Smith,
Benjamin Winstone,
E. Charles Whitfield,
Felicidad Fernández Fernández,
Harriet Duncalfe
Abstract:
Fruit ripeness estimation models have for decades depended on spectral index features or colour-based features, such as mean, standard deviation, skewness, colour moments, and/or histograms for learning traits of fruit ripeness. Recently, few studies have explored the use of deep learning techniques to extract features from images of fruits with visible ripeness cues. However, the blackberry (Rubu…
▽ More
Fruit ripeness estimation models have for decades depended on spectral index features or colour-based features, such as mean, standard deviation, skewness, colour moments, and/or histograms for learning traits of fruit ripeness. Recently, few studies have explored the use of deep learning techniques to extract features from images of fruits with visible ripeness cues. However, the blackberry (Rubus fruticosus) fruit does not show obvious and reliable visible traits of ripeness when mature and therefore poses great difficulty to fruit pickers. The mature blackberry, to the human eye, is black before, during, and post-ripening. To address this engineering application challenge, this paper proposes a novel multi-input convolutional neural network (CNN) ensemble classifier for detecting subtle traits of ripeness in blackberry fruits. The multi-input CNN was created from a pre-trained visual geometry group 16-layer deep convolutional network (VGG16) model trained on the ImageNet dataset. The fully connected layers were optimized for learning traits of ripeness of mature blackberry fruits. The resulting model served as the base for building homogeneous ensemble learners that were ensemble using the stack generalization ensemble (SGE) framework. The input to the network is images acquired with a stereo sensor using visible and near-infrared (VIS-NIR) spectral filters at wavelengths of 700 nm and 770 nm. Through experiments, the proposed model achieved 95.1% accuracy on unseen sets and 90.2% accuracy with in-field conditions. Further experiments reveal that machine sensory is highly and positively correlated to human sensory over blackberry fruit skin texture.
△ Less
Submitted 9 January, 2024;
originally announced January 2024.
-
Corn Yield Prediction Model with Deep Neural Networks for Smallholder Farmer Decision Support System
Authors:
Chollette Olisah,
Lyndon Smith,
Melvyn Smith,
Lawrence Morolake,
Osi Ojukwu
Abstract:
Crop yield prediction has been modeled on the assumption that there is no interaction between weather and soil variables. However, this paper argues that an interaction exists, and it can be finely modelled using the Kendall Correlation coefficient. Given the nonlinearity of the interaction between weather and soil variables, a deep neural network regressor (DNNR) is carefully designed with consid…
▽ More
Crop yield prediction has been modeled on the assumption that there is no interaction between weather and soil variables. However, this paper argues that an interaction exists, and it can be finely modelled using the Kendall Correlation coefficient. Given the nonlinearity of the interaction between weather and soil variables, a deep neural network regressor (DNNR) is carefully designed with consideration to the depth, number of neurons of the hidden layers, and the hyperparameters with their optimizations. Additionally, a new metric, the average of absolute root squared error (ARSE) is proposed to combine the strengths of root mean square error (RMSE) and mean absolute error (MAE). With the ARSE metric, the proposed DNNR(s), optimised random forest regressor (RFR) and the extreme gradient boosting regressor (XGBR) achieved impressively small yield errors, 0.0172 t/ha, and 0.0243 t/ha, 0.0001 t/ha, and 0.001 t/ha, respectively. However, the DNNR(s), with changes to the explanatory variables to ensure generalizability to unforeseen data, DNNR(s) performed best. Further analysis reveals that a strong interaction does exist between weather and soil variables. Precisely, yield is observed to increase when precipitation is reduced and silt increased, and vice-versa. However, the degree of decrease or increase is not quantified in this paper. Contrary to existing yield models targeted towards agricultural policies and global food security, the goal of the proposed corn yield model is to empower the smallholder farmer to farm smartly and intelligently, thus the prediction model is integrated into a mobile application that includes education, and a farmer-to-market access module.
△ Less
Submitted 12 April, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets
Authors:
Ernest Perkowski,
Rui Pan,
Tuan Dung Nguyen,
Yuan-Sen Ting,
Sandor Kruk,
Tong Zhang,
Charlie O'Neill,
Maja Jablonska,
Zechang Sun,
Michael J. Smith,
Huiling Liu,
Kevin Schawinski,
Kartheik Iyer,
Ioana Ciucă for UniverseTBD
Abstract:
We explore the potential of enhancing LLM performance in astronomy-focused question-answering through targeted, continual pre-training. By employing a compact 7B-parameter LLaMA-2 model and focusing exclusively on a curated set of astronomy corpora -- comprising abstracts, introductions, and conclusions -- we achieve notable improvements in specialized topic comprehension. While general LLMs like…
▽ More
We explore the potential of enhancing LLM performance in astronomy-focused question-answering through targeted, continual pre-training. By employing a compact 7B-parameter LLaMA-2 model and focusing exclusively on a curated set of astronomy corpora -- comprising abstracts, introductions, and conclusions -- we achieve notable improvements in specialized topic comprehension. While general LLMs like GPT-4 excel in broader question-answering scenarios due to superior reasoning capabilities, our findings suggest that continual pre-training with limited resources can still enhance model performance on specialized topics. Additionally, we present an extension of AstroLLaMA: the fine-tuning of the 7B LLaMA model on a domain-specific conversational dataset, culminating in the release of the chat-enabled AstroLLaMA for community use. Comprehensive quantitative benchmarking is currently in progress and will be detailed in an upcoming full paper. The model, AstroLLaMA-Chat, is now available at https://huggingface.co/universeTBD, providing the first open-source conversational AI tool tailored for the astronomy community.
△ Less
Submitted 5 January, 2024; v1 submitted 2 January, 2024;
originally announced January 2024.
-
ROBBIE: Robust Bias Evaluation of Large Generative Language Models
Authors:
David Esiobu,
Xiaoqing Tan,
Saghar Hosseini,
Megan Ung,
Yuchen Zhang,
Jude Fernandes,
Jane Dwivedi-Yu,
Eleonora Presani,
Adina Williams,
Eric Michael Smith
Abstract:
As generative large language models (LLMs) grow more performant and prevalent, we must develop comprehensive enough tools to measure and improve their fairness. Different prompt-based datasets can be used to measure social bias across multiple text domains and demographic axes, meaning that testing LLMs on more datasets can potentially help us characterize their biases more fully, and better ensur…
▽ More
As generative large language models (LLMs) grow more performant and prevalent, we must develop comprehensive enough tools to measure and improve their fairness. Different prompt-based datasets can be used to measure social bias across multiple text domains and demographic axes, meaning that testing LLMs on more datasets can potentially help us characterize their biases more fully, and better ensure equal and equitable treatment of marginalized demographic groups. In this work, our focus is two-fold:
(1) Benchmarking: a comparison of 6 different prompt-based bias and toxicity metrics across 12 demographic axes and 5 families of generative LLMs. Out of those 6 metrics, AdvPromptSet and HolisticBiasR are novel datasets proposed in the paper. The comparison of those benchmarks gives us insights about the bias and toxicity of the compared models. Therefore, we explore the frequency of demographic terms in common LLM pre-training corpora and how this may relate to model biases.
(2) Mitigation: we conduct a comprehensive study of how well 3 bias/toxicity mitigation techniques perform across our suite of measurements. ROBBIE aims to provide insights for practitioners while deploying a model, emphasizing the need to not only measure potential harms, but also understand how they arise by characterizing the data, mitigate harms once found, and balance any trade-offs. We open-source our analysis code in hopes of encouraging broader measurements of bias in future LLMs.
△ Less
Submitted 29 November, 2023;
originally announced November 2023.
-
Statistical Complexity of Heterogeneous Geometric Networks
Authors:
Keith Malcolm Smith,
Jason P. Smith
Abstract:
Heterogeneity and geometry are key explanatory components underlying the structure of real-world networks. The relationship between these components and the statistical complexity of networks is not well understood. We introduce a parsimonious normalised measure of statistical complexity for networks -- normalised hierarchical complexity. The measure is trivially 0 in regular graphs and we prove t…
▽ More
Heterogeneity and geometry are key explanatory components underlying the structure of real-world networks. The relationship between these components and the statistical complexity of networks is not well understood. We introduce a parsimonious normalised measure of statistical complexity for networks -- normalised hierarchical complexity. The measure is trivially 0 in regular graphs and we prove that this measure tends to 0 in Erdös-Rényi random graphs in the thermodynamic limit. We go on to demonstrate that greater complexity arises from the combination of hierarchical and geometric components to the network structure than either on their own. Further, the levels of complexity achieved are similar to those found in many real-world networks. We also find that real world networks establish connections in a way which increases hierarchical complexity and which our null models and a range of attachment mechanisms fail to explain. This underlines the non-trivial nature of statistical complexity in real-world networks and provides foundations for the comparative analysis of network complexity within and across disciplines.
△ Less
Submitted 29 February, 2024; v1 submitted 31 October, 2023;
originally announced October 2023.
-
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Authors:
Seungwhan Moon,
Andrea Madotto,
Zhaojiang Lin,
Tushar Nagarajan,
Matt Smith,
Shashank Jain,
Chun-Fu Yeh,
Prakash Murugesan,
Peyman Heidari,
Yue Liu,
Kavya Srinet,
Babak Damavandi,
Anuj Kumar
Abstract:
We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e. text, image, video, audio, IMU motion sensor), and generates textual responses. AnyMAL inherits the powerful text-based reasoning abilities of the state-of-the-art LLMs including LLaMA-2 (70B), and converts modality-specific signals to the joint textual space through a…
▽ More
We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e. text, image, video, audio, IMU motion sensor), and generates textual responses. AnyMAL inherits the powerful text-based reasoning abilities of the state-of-the-art LLMs including LLaMA-2 (70B), and converts modality-specific signals to the joint textual space through a pre-trained aligner module. To further strengthen the multimodal LLM's capabilities, we fine-tune the model with a multimodal instruction set manually collected to cover diverse topics and tasks beyond simple QAs. We conduct comprehensive empirical analysis comprising both human and automatic evaluations, and demonstrate state-of-the-art performance on various multimodal tasks.
△ Less
Submitted 27 September, 2023;
originally announced September 2023.
-
Improving Section 230, Preserving Democracy and Protecting Free Speech
Authors:
Marshall Van Alstyne,
Michael David Smith,
Herb Lin
Abstract:
This article proposes a framework for content moderation based on a decentralized market where no one party, neither governments nor firms, controls the flow of information.
This article proposes a framework for content moderation based on a decentralized market where no one party, neither governments nor firms, controls the flow of information.
△ Less
Submitted 16 September, 2023;
originally announced September 2023.
-
EarthPT: a time series foundation model for Earth Observation
Authors:
Michael J. Smith,
Luke Fleming,
James E. Geach
Abstract:
We introduce EarthPT -- an Earth Observation (EO) pretrained transformer. EarthPT is a 700 million parameter decoding transformer foundation model trained in an autoregressive self-supervised manner and developed specifically with EO use-cases in mind. We demonstrate that EarthPT is an effective forecaster that can accurately predict future pixel-level surface reflectances across the 400-2300 nm r…
▽ More
We introduce EarthPT -- an Earth Observation (EO) pretrained transformer. EarthPT is a 700 million parameter decoding transformer foundation model trained in an autoregressive self-supervised manner and developed specifically with EO use-cases in mind. We demonstrate that EarthPT is an effective forecaster that can accurately predict future pixel-level surface reflectances across the 400-2300 nm range well into the future. For example, forecasts of the evolution of the Normalised Difference Vegetation Index (NDVI) have a typical error of approximately 0.05 (over a natural range of -1 -> 1) at the pixel level over a five month test set horizon, out-performing simple phase-folded models based on historical averaging. We also demonstrate that embeddings learnt by EarthPT hold semantically meaningful information and could be exploited for downstream tasks such as highly granular, dynamic land use classification. Excitingly, we note that the abundance of EO data provides us with -- in theory -- quadrillions of training tokens. Therefore, if we assume that EarthPT follows neural scaling laws akin to those derived for Large Language Models (LLMs), there is currently no data-imposed limit to scaling EarthPT and other similar `Large Observation Models.'
△ Less
Submitted 11 January, 2024; v1 submitted 13 September, 2023;
originally announced September 2023.
-
The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages
Authors:
Benjamin Muller,
Belen Alastruey,
Prangthip Hansanti,
Elahe Kalbassi,
Christophe Ropers,
Eric Michael Smith,
Adina Williams,
Luke Zettlemoyer,
Pierre Andrews,
Marta R. Costa-jussà
Abstract:
Gender biases in language generation systems are challenging to mitigate. One possible source for these biases is gender representation disparities in the training and evaluation data. Despite recent progress in documenting this problem and many attempts at mitigating it, we still lack shared methodology and tooling to report gender representation in large datasets. Such quantitative reporting wil…
▽ More
Gender biases in language generation systems are challenging to mitigate. One possible source for these biases is gender representation disparities in the training and evaluation data. Despite recent progress in documenting this problem and many attempts at mitigating it, we still lack shared methodology and tooling to report gender representation in large datasets. Such quantitative reporting will enable further mitigation, e.g., via data augmentation. This paper describes the Gender-GAP Pipeline (for Gender-Aware Polyglot Pipeline), an automatic pipeline to characterize gender representation in large-scale datasets for 55 languages. The pipeline uses a multilingual lexicon of gendered person-nouns to quantify the gender representation in text. We showcase it to report gender representation in WMT training data and development data for the News task, confirming that current data is skewed towards masculine representation. Having unbalanced datasets may indirectly optimize our systems towards outperforming one gender over the others. We suggest introducing our gender quantification pipeline in current datasets and, ideally, modifying them toward a balanced representation.
△ Less
Submitted 31 August, 2023;
originally announced August 2023.
-
AI ATAC 1: An Evaluation of Prominent Commercial Malware Detectors
Authors:
Robert A. Bridges,
Brian Weber,
Justin M. Beaver,
Jared M. Smith,
Miki E. Verma,
Savannah Norem,
Kevin Spakes,
Cory Watson,
Jeff A. Nichols,
Brian Jewell,
Michael. D. Iannacone,
Chelsey Dunivan Stahl,
Kelly M. T. Huffer,
T. Sean Oesch
Abstract:
This work presents an evaluation of six prominent commercial endpoint malware detectors, a network malware detector, and a file-conviction algorithm from a cyber technology vendor. The evaluation was administered as the first of the Artificial Intelligence Applications to Autonomous Cybersecurity (AI ATAC) prize challenges, funded by / completed in service of the US Navy. The experiment employed 1…
▽ More
This work presents an evaluation of six prominent commercial endpoint malware detectors, a network malware detector, and a file-conviction algorithm from a cyber technology vendor. The evaluation was administered as the first of the Artificial Intelligence Applications to Autonomous Cybersecurity (AI ATAC) prize challenges, funded by / completed in service of the US Navy. The experiment employed 100K files (50/50% benign/malicious) with a stratified distribution of file types, including ~1K zero-day program executables (increasing experiment size two orders of magnitude over previous work). We present an evaluation process of delivering a file to a fresh virtual machine donning the detection technology, waiting 90s to allow static detection, then executing the file and waiting another period for dynamic detection; this allows greater fidelity in the observational data than previous experiments, in particular, resource and time-to-detection statistics. To execute all 800K trials (100K files $\times$ 8 tools), a software framework is designed to choreographed the experiment into a completely automated, time-synced, and reproducible workflow with substantial parallelization. A cost-benefit model was configured to integrate the tools' recall, precision, time to detection, and resource requirements into a single comparable quantity by simulating costs of use. This provides a ranking methodology for cyber competitions and a lens through which to reason about the varied statistical viewpoints of the results. These statistical and cost-model results provide insights on state of commercial malware detection.
△ Less
Submitted 28 August, 2023;
originally announced August 2023.
-
Large Skew-t Copula Models and Asymmetric Dependence in Intraday Equity Returns
Authors:
Lin Deng,
Michael Stanley Smith,
Worapree Maneesoonthorn
Abstract:
Skew-t copula models are attractive for the modeling of financial data because they allow for asymmetric and extreme tail dependence. We show that the copula implicit in the skew-t distribution of Azzalini and Capitanio (2003) allows for a higher level of pairwise asymmetric dependence than two popular alternative skew-t copulas. Estimation of this copula in high dimensions is challenging, and we…
▽ More
Skew-t copula models are attractive for the modeling of financial data because they allow for asymmetric and extreme tail dependence. We show that the copula implicit in the skew-t distribution of Azzalini and Capitanio (2003) allows for a higher level of pairwise asymmetric dependence than two popular alternative skew-t copulas. Estimation of this copula in high dimensions is challenging, and we propose a fast and accurate Bayesian variational inference (VI) approach to do so. The method uses a generative representation of the skew-t distribution to define an augmented posterior that can be approximated accurately. A stochastic gradient ascent algorithm is used to solve the variational optimization. The methodology is used to estimate skew-t factor copula models with up to 15 factors for intraday returns from 2017 to 2021 on 93 U.S. equities. The copula captures substantial heterogeneity in asymmetric dependence over equity pairs, in addition to the variability in pairwise correlations. In a moving window study we show that the asymmetric dependencies also vary over time, and that intraday predictive densities from the skew-t copula are more accurate than those from benchmark copula models. Portfolio selection strategies based on the estimated pairwise asymmetric dependencies improve performance relative to the index.
△ Less
Submitted 2 July, 2024; v1 submitted 10 August, 2023;
originally announced August 2023.
-
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
Authors:
Zhewei Yao,
Reza Yazdani Aminabadi,
Olatunji Ruwase,
Samyam Rajbhandari,
Xiaoxia Wu,
Ammar Ahmad Awan,
Jeff Rasley,
Minjia Zhang,
Conglong Li,
Connor Holmes,
Zhongzhu Zhou,
Michael Wyatt,
Molly Smith,
Lev Kurilenko,
Heyang Qin,
Masahiro Tanaka,
Shuai Che,
Shuaiwen Leon Song,
Yuxiong He
Abstract:
ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning with Human Feedback) training pipeline for these powerful models, particularly when training at…
▽ More
ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning with Human Feedback) training pipeline for these powerful models, particularly when training at the scale of billions of parameters. This paper introduces DeepSpeed-Chat, a novel system that democratizes RLHF training, making it accessible to the AI community. DeepSpeed-Chat offers three key capabilities: an easy-to-use training and inference experience for ChatGPT-like models, a DeepSpeed-RLHF pipeline that replicates the training pipeline from InstructGPT, and a robust DeepSpeed-RLHF system that combines various optimizations for training and inference in a unified way. The system delivers unparalleled efficiency and scalability, enabling training of models with hundreds of billions of parameters in record time and at a fraction of the cost. With this development, DeepSpeed-Chat paves the way for broader access to advanced RLHF training, even for data scientists with limited resources, thereby fostering innovation and further development in the field of AI.
△ Less
Submitted 2 August, 2023;
originally announced August 2023.
-
Start Your EM(otion En)gine: Towards Computational Models of Emotion for Improving the Believability of Video Game Non-Player Characters
Authors:
Geneva M. Smith
Abstract:
Believable Non-Player Characters (NPCs) help motivate player engagement with narrative-driven games. An important aspect of believable characters is their contextually-relevant reactions to changing situations, which emotion often drives in humans. Therefore, giving NPCs "emotion" should enhance their believability. For adoption in industry, it is important to create tool development processes to…
▽ More
Believable Non-Player Characters (NPCs) help motivate player engagement with narrative-driven games. An important aspect of believable characters is their contextually-relevant reactions to changing situations, which emotion often drives in humans. Therefore, giving NPCs "emotion" should enhance their believability. For adoption in industry, it is important to create tool development processes to build NPCs "with emotion" that fit current development practices. Psychological validity-the grounding in affective science-is a necessary quality for plausible emotion-driven NPC behaviours. Computational Models of Emotion (CMEs) are one solution because they use at least one affective theory/model in their design. However, CME development tends to be under documented so that its processes seem unsystematic and poorly defined. This makes it difficult to reuse a CME's components, extend or scale them, or compare CMEs. This work draws from software engineering to propose three methods for acknowledging and limiting subjectivity in CME development to improve their reusability, maintainability, and verifiability: a systematic, document analysis-based methodology for choosing a CME's underlying affective theories/models using its high-level design goals and design scope, which critically influence a CME's functional requirements; an approach for transforming natural language descriptions of affective theories into a type-based formal model using an intermediate, second natural language description refining the original descriptions and showing where and what assumptions informed the formalization; and a literary character analysis-based methodology for developing acceptance test cases with known believable characters from professionally-crafted stories that do not rely on specific CME designs. Development of EMgine, a game development CME for generating NPC emotions, shows these methods in practice.
△ Less
Submitted 13 July, 2023;
originally announced July 2023.
-
Llama 2: Open Foundation and Fine-Tuned Chat Models
Authors:
Hugo Touvron,
Louis Martin,
Kevin Stone,
Peter Albert,
Amjad Almahairi,
Yasmine Babaei,
Nikolay Bashlykov,
Soumya Batra,
Prajjwal Bhargava,
Shruti Bhosale,
Dan Bikel,
Lukas Blecher,
Cristian Canton Ferrer,
Moya Chen,
Guillem Cucurull,
David Esiobu,
Jude Fernandes,
Jeremy Fu,
Wenyin Fu,
Brian Fuller,
Cynthia Gao,
Vedanuj Goswami,
Naman Goyal,
Anthony Hartshorn,
Saghar Hosseini
, et al. (43 additional authors not shown)
Abstract:
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be…
▽ More
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.
△ Less
Submitted 19 July, 2023; v1 submitted 18 July, 2023;
originally announced July 2023.
-
Explaining Reinforcement Learning with Shapley Values
Authors:
Daniel Beechey,
Thomas M. S. Smith,
Özgür Şimşek
Abstract:
For reinforcement learning systems to be widely adopted, their users must understand and trust them. We present a theoretical analysis of explaining reinforcement learning using Shapley values, following a principled approach from game theory for identifying the contribution of individual players to the outcome of a cooperative game. We call this general framework Shapley Values for Explaining Rei…
▽ More
For reinforcement learning systems to be widely adopted, their users must understand and trust them. We present a theoretical analysis of explaining reinforcement learning using Shapley values, following a principled approach from game theory for identifying the contribution of individual players to the outcome of a cooperative game. We call this general framework Shapley Values for Explaining Reinforcement Learning (SVERL). Our analysis exposes the limitations of earlier uses of Shapley values in reinforcement learning. We then develop an approach that uses Shapley values to explain agent performance. In a variety of domains, SVERL produces meaningful explanations that match and supplement human intuition.
△ Less
Submitted 9 June, 2023;
originally announced June 2023.
-
Improving Open Language Models by Learning from Organic Interactions
Authors:
Jing Xu,
Da Ju,
Joshua Lane,
Mojtaba Komeili,
Eric Michael Smith,
Megan Ung,
Morteza Behrooz,
William Ngan,
Rashel Moritz,
Sainbayar Sukhbaatar,
Y-Lan Boureau,
Jason Weston,
Kurt Shuster
Abstract:
We present BlenderBot 3x, an update on the conversational model BlenderBot 3, which is now trained using organic conversation and feedback data from participating users of the system in order to improve both its skills and safety. We are publicly releasing the participating de-identified interaction data for use by the research community, in order to spur further progress. Training models with org…
▽ More
We present BlenderBot 3x, an update on the conversational model BlenderBot 3, which is now trained using organic conversation and feedback data from participating users of the system in order to improve both its skills and safety. We are publicly releasing the participating de-identified interaction data for use by the research community, in order to spur further progress. Training models with organic data is challenging because interactions with people "in the wild" include both high quality conversations and feedback, as well as adversarial and toxic behavior. We study techniques that enable learning from helpful teachers while avoiding learning from people who are trying to trick the model into unhelpful or toxic responses. BlenderBot 3x is both preferred in conversation to BlenderBot 3, and is shown to produce safer responses in challenging situations. While our current models are still far from perfect, we believe further improvement can be achieved by continued use of the techniques explored in this work.
△ Less
Submitted 7 June, 2023;
originally announced June 2023.
-
Temporal Evolution of Risk Behavior in a Disease Spread Simulation
Authors:
Ollin D. Langle-Chimal,
Scott C. Merrill,
Eric M. Clark,
Gabriela Bucini,
Tung-Lin Liu,
Trisha R. Shrum,
Christopher Koliba,
Asim Zia,
Julia M. Smith,
Nicholas Cheney
Abstract:
Human behavior is a dynamic process that evolves with experience. Understanding the evolution of individual's risk propensity is critical to design public health interventions to propitiate the adoption of better biosecurity protocols and thus, prevent the transmission of an infectious disease. Using an experimental game that simulates the spread of a disease in a network of porcine farms, we meas…
▽ More
Human behavior is a dynamic process that evolves with experience. Understanding the evolution of individual's risk propensity is critical to design public health interventions to propitiate the adoption of better biosecurity protocols and thus, prevent the transmission of an infectious disease. Using an experimental game that simulates the spread of a disease in a network of porcine farms, we measure how learning from experience affects the risk aversion of over $1000$ players. We used a fully automated approach to segment the players into 4 categories based on the temporal trends of their game plays and compare the outcomes of their overall game performance. We found that the risk tolerant group is $50\%$ more likely to incur an infection than the risk averse one. We also find that while all individuals decrease the amount of time it takes to make decisions as they become more experienced at the game, we find a group of players with constant decision strategies who rapidly decrease their time to make a decision and a second context-aware decision group that contemplates longer before decisions while presumably performing a real-time risk assessment. The behavioral strategies employed by players in this simulated setting could be used in the future as an early warning signal to identify undesirable biosecurity-related risk aversion preferences, or changes in behavior, which may allow for targeted interventions to help mitigate them.
△ Less
Submitted 1 June, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Co-Learning Empirical Games and World Models
Authors:
Max Olan Smith,
Michael P. Wellman
Abstract:
Game-based decision-making involves reasoning over both world dynamics and strategic interactions among the agents. Typically, empirical models capturing these respective aspects are learned and used separately. We investigate the potential gain from co-learning these elements: a world model for dynamics and an empirical game for strategic interactions. Empirical games drive world models toward a…
▽ More
Game-based decision-making involves reasoning over both world dynamics and strategic interactions among the agents. Typically, empirical models capturing these respective aspects are learned and used separately. We investigate the potential gain from co-learning these elements: a world model for dynamics and an empirical game for strategic interactions. Empirical games drive world models toward a broader consideration of possible game dynamics induced by a diversity of strategy profiles. Conversely, world models guide empirical games to efficiently discover new strategies through planning. We demonstrate these benefits first independently, then in combination as realized by a new algorithm, Dyna-PSRO, that co-learns an empirical game and a world model. When compared to PSRO -- a baseline empirical-game building algorithm, Dyna-PSRO is found to compute lower regret solutions on partially observable general-sum games. In our experiments, Dyna-PSRO also requires substantially fewer experiences than PSRO, a key algorithmic advantage for settings where collecting player-game interaction data is a cost-limiting factor.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
Multi-AP Coordinated Spatial Reuse for Wi-Fi 8: Group Creation and Scheduling
Authors:
David Nunez,
Malcom Smith,
Boris Bellalta
Abstract:
Multi-Access Point Coordination (MAPC) will be a key feature in next generation Wi-Fi 8 networks. MAPC aims to improve the overall network performance by allowing Access Points (APs) to share time, frequency and/or spatial resources in a coordinated way, thus alleviating inter-AP contention and enabling new multi-AP channel access strategies. This paper introduces a framework to support periodic M…
▽ More
Multi-Access Point Coordination (MAPC) will be a key feature in next generation Wi-Fi 8 networks. MAPC aims to improve the overall network performance by allowing Access Points (APs) to share time, frequency and/or spatial resources in a coordinated way, thus alleviating inter-AP contention and enabling new multi-AP channel access strategies. This paper introduces a framework to support periodic MAPC transmissions on top of current Wi-Fi operation. We first focus on the problem of creating multi-AP groups that can transmit simultaneously to leverage Spatial Reuse opportunities. Then, once these groups are created, we study different scheduling algorithms to determine which groups will transmit at every MAPC transmission. Two different types of algorithms are tested: per-AP, and per-Group. While per-AP algorithms base their scheduling decision on the buffer state of individual APs, per-Group algorithms do that taking into account the aggregate buffer state of all APs in a group. Obtained results -- targetting worst-case delay -- show that per-AP based algorithms outperform per-Group ones due to their ability to guarantee that the AP with a) more packets, or b) with the oldest waiting packet in the buffer is selected.
△ Less
Submitted 8 May, 2023;
originally announced May 2023.
-
Traffic Characteristics of Extended Reality
Authors:
Abdullah Alnajim,
Seyedmohammad Salehi,
Chien-Chung Shen,
Malcolm Smith
Abstract:
This tutorial paper analyzes the traffic characteristics of immersive experiences with extended reality (XR) technologies, including Augmented reality (AR), virtual reality (VR), and mixed reality (MR). The current trend in XR applications is to offload the computation and rendering to an external server and use wireless communications between the XR head-mounted display (HMD) and the access point…
▽ More
This tutorial paper analyzes the traffic characteristics of immersive experiences with extended reality (XR) technologies, including Augmented reality (AR), virtual reality (VR), and mixed reality (MR). The current trend in XR applications is to offload the computation and rendering to an external server and use wireless communications between the XR head-mounted display (HMD) and the access points. This paradigm becomes essential owing to (1) its high flexibility (in terms of user mobility) compared to remote rendering through a wired connection, and (2) the high computing power available on the server compared to local rendering (on HMD). The requirements to facilitate a pleasant XR experience are analyzed in three aspects: capacity (throughput), latency, and reliability. For capacity, two VR experiences are analyzed: a human eye-like experience and an experience with the Oculus Quest 2 HMD. For latency, the key components of the motion-to-photon (MTP) delay are discussed. For reliability, the maximum packet loss rate (or the minimum packet delivery rate) is studied for different XR scenarios. Specifically, the paper reviews optimization techniques that were proposed to reduce the latency, conserve the bandwidth, extend the scalability, and/or increase the reliability to satisfy the stringent requirements of the emerging XR applications.
△ Less
Submitted 16 April, 2023;
originally announced April 2023.
-
Uncertainty estimation in Deep Learning for Panoptic segmentation
Authors:
Michael Smith,
Frank Ferrie
Abstract:
As deep learning-based computer vision algorithms continue to improve and advance the state of the art, their robustness to real-world data continues to lag their performance on datasets. This makes it difficult to bring an algorithm from the lab to the real world. Ensemble-based uncertainty estimation approaches such as Monte Carlo Dropout have been successfully used in many applications in an at…
▽ More
As deep learning-based computer vision algorithms continue to improve and advance the state of the art, their robustness to real-world data continues to lag their performance on datasets. This makes it difficult to bring an algorithm from the lab to the real world. Ensemble-based uncertainty estimation approaches such as Monte Carlo Dropout have been successfully used in many applications in an attempt to address this robustness issue. Unfortunately, it is not always clear if such ensemble-based approaches can be applied to a new problem domain. This is the case with panoptic segmentation, where the structure of the problem and architectures designed to solve it means that unlike image classification or even semantic segmentation, the typical solution of using a mean across samples cannot be directly applied. In this paper, we demonstrate how ensemble-based uncertainty estimation approaches such as Monte Carlo Dropout can be used in the panoptic segmentation domain with no changes to an existing network, providing both improved performance and more importantly a better measure of uncertainty for predictions made by the network. Results are demonstrated quantitatively and qualitatively on the COCO, KITTI-STEP and VIPER datasets.
△ Less
Submitted 4 April, 2023;
originally announced April 2023.
-
A "Perspectival" Mirror of the Elephant: Investigating Language Bias on Google, ChatGPT, YouTube, and Wikipedia
Authors:
Queenie Luo,
Michael J. Puett,
Michael D. Smith
Abstract:
Contrary to Google Search's mission of delivering information from "many angles so you can form your own understanding of the world," we find that Google and its most prominent returned results - Wikipedia and YouTube - simply reflect a narrow set of culturally dominant views tied to the search language for complex topics like "Buddhism," "Liberalism," "colonization," "Iran" and "America." Simply…
▽ More
Contrary to Google Search's mission of delivering information from "many angles so you can form your own understanding of the world," we find that Google and its most prominent returned results - Wikipedia and YouTube - simply reflect a narrow set of culturally dominant views tied to the search language for complex topics like "Buddhism," "Liberalism," "colonization," "Iran" and "America." Simply stated, they present, to varying degrees, distinct information across the same search in different languages, a phenomenon we call language bias. This paper presents evidence and analysis of language bias and discusses its larger social implications. We find that our online searches and emerging tools like ChatGPT turn us into the proverbial blind person touching a small portion of an elephant, ignorant of the existence of other cultural perspectives. Language bias sets a strong yet invisible cultural barrier online, where each language group thinks they can see other groups through searches, but in fact, what they see is their own reflection.
△ Less
Submitted 7 March, 2024; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Mesh Strikes Back: Fast and Efficient Human Reconstruction from RGB videos
Authors:
Rohit Jena,
Pratik Chaudhari,
James Gee,
Ganesh Iyer,
Siddharth Choudhary,
Brandon M. Smith
Abstract:
Human reconstruction and synthesis from monocular RGB videos is a challenging problem due to clothing, occlusion, texture discontinuities and sharpness, and framespecific pose changes. Many methods employ deferred rendering, NeRFs and implicit methods to represent clothed humans, on the premise that mesh-based representations cannot capture complex clothing and textures from RGB, silhouettes, and…
▽ More
Human reconstruction and synthesis from monocular RGB videos is a challenging problem due to clothing, occlusion, texture discontinuities and sharpness, and framespecific pose changes. Many methods employ deferred rendering, NeRFs and implicit methods to represent clothed humans, on the premise that mesh-based representations cannot capture complex clothing and textures from RGB, silhouettes, and keypoints alone. We provide a counter viewpoint to this fundamental premise by optimizing a SMPL+D mesh and an efficient, multi-resolution texture representation using only RGB images, binary silhouettes and sparse 2D keypoints. Experimental results demonstrate that our approach is more capable of capturing geometric details compared to visual hull, mesh-based methods. We show competitive novel view synthesis and improvements in novel pose synthesis compared to NeRF-based methods, which introduce noticeable, unwanted artifacts. By restricting the solution space to the SMPL+D model combined with differentiable rendering, we obtain dramatic speedups in compute, training times (up to 24x) and inference times (up to 192x). Our method therefore can be used as is or as a fast initialization to NeRF-based methods.
△ Less
Submitted 15 March, 2023;
originally announced March 2023.
-
Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning
Authors:
Marc Lanctot,
John Schultz,
Neil Burch,
Max Olan Smith,
Daniel Hennes,
Thomas Anthony,
Julien Perolat
Abstract:
Progress in fields of machine learning and adversarial planning has benefited significantly from benchmark domains, from checkers and the classic UCI data sets to Go and Diplomacy. In sequential decision-making, agent evaluation has largely been restricted to few interactions against experts, with the aim to reach some desired level of performance (e.g. beating a human professional player). We pro…
▽ More
Progress in fields of machine learning and adversarial planning has benefited significantly from benchmark domains, from checkers and the classic UCI data sets to Go and Diplomacy. In sequential decision-making, agent evaluation has largely been restricted to few interactions against experts, with the aim to reach some desired level of performance (e.g. beating a human professional player). We propose a benchmark for multiagent learning based on repeated play of the simple game Rock, Paper, Scissors along with a population of forty-three tournament entries, some of which are intentionally sub-optimal. We describe metrics to measure the quality of agents based both on average returns and exploitability. We then show that several RL, online learning, and language model approaches can learn good counter-strategies and generalize well, but ultimately lose to the top-performing bots, creating an opportunity for research in multiagent learning.
△ Less
Submitted 31 October, 2023; v1 submitted 2 March, 2023;
originally announced March 2023.
-
Natural Gradient Hybrid Variational Inference with Application to Deep Mixed Models
Authors:
Weiben Zhang,
Michael Stanley Smith,
Worapree Maneesoonthorn,
Ruben Loaiza-Maya
Abstract:
Stochastic models with global parameters and latent variables are common, and for which variational inference (VI) is popular. However, existing methods are often either slow or inaccurate in high dimensions. We suggest a fast and accurate VI method for this case that employs a well-defined natural gradient variational optimization that targets the joint posterior of the global parameters and late…
▽ More
Stochastic models with global parameters and latent variables are common, and for which variational inference (VI) is popular. However, existing methods are often either slow or inaccurate in high dimensions. We suggest a fast and accurate VI method for this case that employs a well-defined natural gradient variational optimization that targets the joint posterior of the global parameters and latent variables. It is a hybrid method, where at each step the global parameters are updated using the natural gradient and the latent variables are generated from their conditional posterior. A fast to compute expression for the Tikhonov damped Fisher information matrix is used, along with the re-parameterization trick, to provide a stable natural gradient. We apply the approach to deep mixed models, which are an emerging class of Bayesian neural networks with random output layer coefficients to allow for heterogeneity. A range of simulations show that using the natural gradient is substantially more efficient than using the ordinary gradient, and that the approach is faster and more accurate than two cutting-edge natural gradient VI methods. In a financial application we show that accounting for industry level heterogeneity using the deep mixed model improves the accuracy of asset pricing models. MATLAB code to implement the method can be found at: https://github.com/WeibenZhang07/NG-HVI.
△ Less
Submitted 24 July, 2024; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Why Target Networks Stabilise Temporal Difference Methods
Authors:
Mattie Fellows,
Matthew J. A. Smith,
Shimon Whiteson
Abstract:
Integral to recent successes in deep reinforcement learning has been a class of temporal difference methods that use infrequently updated target values for policy evaluation in a Markov Decision Process. Yet a complete theoretical explanation for the effectiveness of target networks remains elusive. In this work, we provide an analysis of this popular class of algorithms, to finally answer the que…
▽ More
Integral to recent successes in deep reinforcement learning has been a class of temporal difference methods that use infrequently updated target values for policy evaluation in a Markov Decision Process. Yet a complete theoretical explanation for the effectiveness of target networks remains elusive. In this work, we provide an analysis of this popular class of algorithms, to finally answer the question: `why do target networks stabilise TD learning'? To do so, we formalise the notion of a partially fitted policy evaluation method, which describes the use of target networks and bridges the gap between fitted methods and semigradient temporal difference algorithms. Using this framework we are able to uniquely characterise the so-called deadly triad - the use of TD updates with (nonlinear) function approximation and off-policy data - which often leads to nonconvergent algorithms. This insight leads us to conclude that the use of target networks can mitigate the effects of poor conditioning in the Jacobian of the TD update. Instead, we show that under mild regularity conditions and a well tuned target network update frequency, convergence can be guaranteed even in the extremely challenging off-policy sampling and nonlinear function approximation setting.
△ Less
Submitted 11 August, 2023; v1 submitted 24 February, 2023;
originally announced February 2023.
-
Do We Still Need Clinical Language Models?
Authors:
Eric Lehman,
Evan Hernandez,
Diwakar Mahajan,
Jonas Wulff,
Micah J. Smith,
Zachary Ziegler,
Daniel Nadler,
Peter Szolovits,
Alistair Johnson,
Emily Alsentzer
Abstract:
Although recent advances in scaling large language models (LLMs) have resulted in improvements on many NLP tasks, it remains unclear whether these models trained primarily with general web text are the right tool in highly specialized, safety critical domains such as clinical text. Recent results have suggested that LLMs encode a surprising amount of medical knowledge. This raises an important que…
▽ More
Although recent advances in scaling large language models (LLMs) have resulted in improvements on many NLP tasks, it remains unclear whether these models trained primarily with general web text are the right tool in highly specialized, safety critical domains such as clinical text. Recent results have suggested that LLMs encode a surprising amount of medical knowledge. This raises an important question regarding the utility of smaller domain-specific language models. With the success of general-domain LLMs, is there still a need for specialized clinical models? To investigate this question, we conduct an extensive empirical analysis of 12 language models, ranging from 220M to 175B parameters, measuring their performance on 3 different clinical tasks that test their ability to parse and reason over electronic health records. As part of our experiments, we train T5-Base and T5-Large models from scratch on clinical notes from MIMIC III and IV to directly investigate the efficiency of clinical tokens. We show that relatively small specialized clinical models substantially outperform all in-context learning approaches, even when finetuned on limited annotated data. Further, we find that pretraining on clinical tokens allows for smaller, more parameter-efficient models that either match or outperform much larger language models trained on general text. We release the code and the models used under the PhysioNet Credentialed Health Data license and data use agreement.
△ Less
Submitted 16 February, 2023;
originally announced February 2023.
-
A Strong Baseline for Batch Imitation Learning
Authors:
Matthew Smith,
Lucas Maystre,
Zhenwen Dai,
Kamil Ciosek
Abstract:
Imitation of expert behaviour is a highly desirable and safe approach to the problem of sequential decision making. We provide an easy-to-implement, novel algorithm for imitation learning under a strict data paradigm, in which the agent must learn solely from data collected a priori. This paradigm allows our algorithm to be used for environments in which safety or cost are of critical concern. Our…
▽ More
Imitation of expert behaviour is a highly desirable and safe approach to the problem of sequential decision making. We provide an easy-to-implement, novel algorithm for imitation learning under a strict data paradigm, in which the agent must learn solely from data collected a priori. This paradigm allows our algorithm to be used for environments in which safety or cost are of critical concern. Our algorithm requires no additional hyper-parameter tuning beyond any standard batch reinforcement learning (RL) algorithm, making it an ideal baseline for such data-strict regimes. Furthermore, we provide formal sample complexity guarantees for the algorithm in finite Markov Decision Problems. In doing so, we formally demonstrate an unproven claim from Kearns & Singh (1998). On the empirical side, our contribution is twofold. First, we develop a practical, robust and principled evaluation protocol for offline RL methods, making use of only the dataset provided for model selection. This stands in contrast to the vast majority of previous works in offline RL, which tune hyperparameters on the evaluation environment, limiting the practical applicability when deployed in new, cost-critical environments. As such, we establish precedent for the development and fair evaluation of offline RL algorithms. Second, we evaluate our own algorithm on challenging continuous control benchmarks, demonstrating its practical applicability and competitiveness with state-of-the-art performance, despite being a simpler algorithm.
△ Less
Submitted 6 February, 2023;
originally announced February 2023.
-
Astronomia ex machina: a history, primer, and outlook on neural networks in astronomy
Authors:
Michael J. Smith,
James E. Geach
Abstract:
In this review, we explore the historical development and future prospects of artificial intelligence (AI) and deep learning in astronomy. We trace the evolution of connectionism in astronomy through its three waves, from the early use of multilayer perceptrons, to the rise of convolutional and recurrent neural networks, and finally to the current era of unsupervised and generative deep learning m…
▽ More
In this review, we explore the historical development and future prospects of artificial intelligence (AI) and deep learning in astronomy. We trace the evolution of connectionism in astronomy through its three waves, from the early use of multilayer perceptrons, to the rise of convolutional and recurrent neural networks, and finally to the current era of unsupervised and generative deep learning methods. With the exponential growth of astronomical data, deep learning techniques offer an unprecedented opportunity to uncover valuable insights and tackle previously intractable problems. As we enter the anticipated fourth wave of astronomical connectionism, we argue for the adoption of GPT-like foundation models fine-tuned for astronomical applications. Such models could harness the wealth of high-quality, multimodal astronomical data to serve state-of-the-art downstream tasks. To keep pace with advancements driven by Big Tech, we propose a collaborative, open-source approach within the astronomy community to develop and maintain these foundation models, fostering a symbiotic relationship between AI and astronomy that capitalizes on the unique strengths of both fields.
△ Less
Submitted 12 May, 2023; v1 submitted 7 November, 2022;
originally announced November 2022.
-
Bayesian Neural Networks for Geothermal Resource Assessment: Prediction with Uncertainty
Authors:
Stephen Brown,
William L. Rodi,
Marco Seracini,
Chen Gu,
Michael Fehler,
James Faulds,
Connor M. Smith,
Sven Treitel
Abstract:
We consider the application of machine learning to the evaluation of geothermal resource potential. A supervised learning problem is defined where maps of 10 geological and geophysical features within the state of Nevada, USA are used to define geothermal potential across a broad region. We have available a relatively small set of positive training sites (known resources or active power plants) an…
▽ More
We consider the application of machine learning to the evaluation of geothermal resource potential. A supervised learning problem is defined where maps of 10 geological and geophysical features within the state of Nevada, USA are used to define geothermal potential across a broad region. We have available a relatively small set of positive training sites (known resources or active power plants) and negative training sites (known drill sites with unsuitable geothermal conditions) and use these to constrain and optimize artificial neural networks for this classification task. The main objective is to predict the geothermal resource potential at unknown sites within a large geographic area where the defining features are known. These predictions could be used to target promising areas for further detailed investigations. We describe the evolution of our work from defining a specific neural network architecture to training and optimization trials. Upon analysis we expose the inevitable problems of model variability and resulting prediction uncertainty. Finally, to address these problems we apply the concept of Bayesian neural networks, a heuristic approach to regularization in network training, and make use of the practical interpretation of the formal uncertainty measures they provide.
△ Less
Submitted 25 October, 2023; v1 submitted 30 September, 2022;
originally announced September 2022.
-
BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage
Authors:
Kurt Shuster,
Jing Xu,
Mojtaba Komeili,
Da Ju,
Eric Michael Smith,
Stephen Roller,
Megan Ung,
Moya Chen,
Kushal Arora,
Joshua Lane,
Morteza Behrooz,
William Ngan,
Spencer Poff,
Naman Goyal,
Arthur Szlam,
Y-Lan Boureau,
Melanie Kambadur,
Jason Weston
Abstract:
We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and having been trained on a large number of user defined tasks. We release both the model weights and code, and have also deployed the model on a public web page to interact with organic users. This technical report describes how the model was built (arc…
▽ More
We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and having been trained on a large number of user defined tasks. We release both the model weights and code, and have also deployed the model on a public web page to interact with organic users. This technical report describes how the model was built (architecture, model and training scheme), and details of its deployment, including safety mechanisms. Human evaluations show its superiority to existing open-domain dialogue agents, including its predecessors (Roller et al., 2021; Komeili et al., 2022). Finally, we detail our plan for continual learning using the data collected from deployment, which will also be publicly released. The goal of this research program is thus to enable the community to study ever-improving responsible agents that learn through interaction.
△ Less
Submitted 10 August, 2022; v1 submitted 5 August, 2022;
originally announced August 2022.
-
Shallow and Deep Nonparametric Convolutions for Gaussian Processes
Authors:
Thomas M. McDonald,
Magnus Ross,
Michael T. Smith,
Mauricio A. Álvarez
Abstract:
A key challenge in the practical application of Gaussian processes (GPs) is selecting a proper covariance function. The moving average, or process convolutions, construction of GPs allows some additional flexibility, but still requires choosing a proper smoothing kernel, which is non-trivial. Previous approaches have built covariance functions by using GP priors over the smoothing kernel, and by e…
▽ More
A key challenge in the practical application of Gaussian processes (GPs) is selecting a proper covariance function. The moving average, or process convolutions, construction of GPs allows some additional flexibility, but still requires choosing a proper smoothing kernel, which is non-trivial. Previous approaches have built covariance functions by using GP priors over the smoothing kernel, and by extension the covariance, as a way to bypass the need to specify it in advance. However, such models have been limited in several ways: they are restricted to single dimensional inputs, e.g. time; they only allow modelling of single outputs and they do not scale to large datasets since inference is not straightforward. In this paper, we introduce a nonparametric process convolution formulation for GPs that alleviates these weaknesses by using a functional sampling approach based on Matheron's rule to perform fast sampling using interdomain inducing variables. Furthermore, we propose a composition of these nonparametric convolutions that serves as an alternative to classic deep GP models, and allows the covariance functions of the intermediate layers to be inferred from the data. We test the performance of our model on benchmarks for single output GPs, multiple output GPs and deep GPs and find that our approach can provide improvements over standard GP models, particularly for larger datasets.
△ Less
Submitted 18 October, 2022; v1 submitted 17 June, 2022;
originally announced June 2022.