Showing 1–50 of 199 results for author: Smith, M

Searching in archive cs.

  1. arXiv:2408.11219  [pdf, other]

    cs.CL cs.AI

    CoDi: Conversational Distillation for Grounded Question Answering

    Authors: Patrick Huber, Arash Einolghozati, Rylan Conway, Kanika Narang, Matt Smith, Waqar Nayyar, Adithya Sagar, Ahmed Aly, Akshat Shrivastava

    Abstract: Distilling conversational skills into Small Language Models (SLMs) with approximately 1 billion parameters presents significant challenges. Firstly, SLMs have limited capacity in their model parameters to learn extensive knowledge compared to larger models. Secondly, high-quality conversational datasets are often scarce, small, and domain-specific. Addressing these challenges, we introduce a novel…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 13 pages

  2. arXiv:2408.08982  [pdf, other]

    cs.CV

    Deep Generative Classification of Blood Cell Morphology

    Authors: Simon Deltadahl, Julian Gilbey, Christine Van Laer, Nancy Boeckx, Mathie Leers, Tanya Freeman, Laura Aiken, Timothy Farren, Matthew Smith, Mohamad Zeina, BloodCounts! consortium, Concetta Piazzese, Joseph Taylor, Nicholas Gleadall, Carola-Bibiane Schönlieb, Suthesh Sivapalaratnam, Michael Roberts, Parashkev Nachev

    Abstract: Accurate classification of haematological cells is critical for diagnosing blood disorders, but presents significant challenges for machine automation owing to the complexity of cell morphology, heterogeneities of biological, pathological, and imaging characteristics, and the imbalance of cell type frequencies. We introduce CytoDiffusion, a diffusion-based classifier that effectively models blood…

    Submitted 16 August, 2024; originally announced August 2024.

  3. arXiv:2408.01556  [pdf, other]

    astro-ph.IM cs.DL cs.IR

    pathfinder: A Semantic Framework for Literature Review and Knowledge Discovery in Astronomy

    Authors: Kartheik G. Iyer, Mikaeel Yunus, Charles O'Neill, Christine Ye, Alina Hyk, Kiera McCormick, Ioana Ciuca, John F. Wu, Alberto Accomazzi, Simone Astarita, Rishabh Chakrabarty, Jesse Cranney, Anjalie Field, Tirthankar Ghosal, Michele Ginolfi, Marc Huertas-Company, Maja Jablonska, Sandor Kruk, Huiling Liu, Gabriel Marchidan, Rohit Mistry, J. P. Naiman, J. E. G. Peek, Mugdha Polimera, Sergio J. Rodriguez, et al. (5 additional authors not shown)

    Abstract: The exponential growth of astronomical literature poses significant challenges for researchers navigating and synthesizing general insights or even domain-specific knowledge. We present Pathfinder, a machine learning framework designed to enable literature review and knowledge discovery in astronomy, focusing on semantic searching with natural language instead of syntactic searches with keywords.…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 25 pages, 9 figures, submitted to AAS journals. Comments are welcome, and the tools mentioned are available online at https://pfdr.app

  4. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  5. arXiv:2407.19655  [pdf, ps, other]

    cs.AI

    AI-Driven Healthcare: A Survey on Ensuring Fairness and Mitigating Bias

    Authors: Sribala Vidyadhari Chinta, Zichong Wang, Xingyu Zhang, Thang Doan Viet, Ayesha Kashif, Monique Antoinette Smith, Wenbin Zhang

    Abstract: Artificial intelligence (AI) is rapidly advancing in healthcare, enhancing the efficiency and effectiveness of services across various specialties, including cardiology, ophthalmology, dermatology, emergency medicine, etc. AI applications have significantly improved diagnostic accuracy, treatment personalization, and patient outcome predictions by leveraging technologies such as machine learning,…

    Submitted 28 July, 2024; originally announced July 2024.

  6. arXiv:2407.17516  [pdf, other]

    cs.RO math.AG

    Amplifying the Kinematics of Origami Mechanisms With Spring Joints

    Authors: Malcolm Smith

    Abstract: Due to its rigid foldability and predictable kinematics, the reverse fold is the fundamental mechanism behind some of the best-known origami kinematic structures, including the Miura Ori, Yoshimura, and waterbomb patterns. However, the reverse fold only has one parameter to control its behavior: the starting fold angle. In this paper I introduce an alternative to the traditional reverse fold,…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 14 pages, 10 figures

  7. arXiv:2407.08082  [pdf, other]

    cs.DB

    Maritime Tracking Data Analysis and Integration with AISdb

    Authors: Gabriel Spadon, Jay Kumar, Jinkun Chen, Matthew Smith, Casey Hilliard, Sarah Vela, Romina Gehrmann, Claudio DiBacco, Stan Matwin, Ronald Pelot

    Abstract: Efficiently handling Automatic Identification System (AIS) data is vital for enhancing maritime safety and navigation, yet is hindered by the system's high volume and error-prone datasets. This paper introduces the Automatic Identification System Database (AISdb), a novel tool designed to address the challenges of processing and analyzing AIS data. AISdb is a comprehensive, open-source platform th…

    Submitted 10 July, 2024; originally announced July 2024.

  8. arXiv:2406.12313  [pdf]

    cs.DB

    A framework for developing a knowledge management platform

    Authors: Marie Lisandra Zepeda Mendoza, Sonali Agarwal, James A. Blackshaw, Vanesa Bol, Audrey Fazzi, Filippo Fiorini, Amy Louise Foreman, Nancy George, Brett R. Johnson, Brian Martin, Dave McComb, Euphemia Mutasa-Gottgens, Helen Parkinson, Martin Romacker, Rolf Russell, Valérien Ségard, Shawn Zheng Kai Tan, Wei Kheng Teh, F. P. Winstanley, Benedict Wong, Adrian M. Smith

    Abstract: Knowledge management (KM) involves collecting, organizing, storing, and disseminating information to improve decision-making, innovation, and performance. Implementing KM at scale has become essential for organizations to effectively leverage vast accessible data. This paper is a compilation of concepts that emerged from KM workshops hosted by EMBL-EBI, attended by SMEs and industry. We provide gu…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 18 pages, 1 figure

  9. Visual instrument co-design embracing the unique movement capabilities of a dancer with physical disability

    Authors: Sam Trolland, Melinda Smith, Alon Ilsar, Jon McCormack

    Abstract: This paper explores the design of an expressive visual instrument that embraces the unique movement style of a dancer living with physical disability. Through a collaboration between the dancer and an interaction designer/visual artist, the creative qualities of wearable devices for motion tracking are investigated, with emphasis on integrating the dancer's specific movement capabilities with thei…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Preprint of paper accepted at MOCO 24, The 9th International Conference on Movement and Computing, Utrecht, Netherlands, May 30-June 02, 2024

  10. arXiv:2405.15032  [pdf, other]

    cs.CL

    Aya 23: Open Weight Releases to Further Multilingual Progress

    Authors: Viraat Aryabumi, John Dang, Dwarak Talupuru, Saurabh Dash, David Cairuz, Hangyu Lin, Bharat Venkitesh, Madeline Smith, Jon Ander Campos, Yi Chern Tan, Kelly Marchisio, Max Bartolo, Sebastian Ruder, Acyr Locatelli, Julia Kreutzer, Nick Frosst, Aidan Gomez, Phil Blunsom, Marzieh Fadaee, Ahmet Üstün, Sara Hooker

    Abstract: This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-the-art language modelin…

    Submitted 31 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  11. arXiv:2405.14930  [pdf, other]

    astro-ph.IM astro-ph.GA cs.LG

    AstroPT: Scaling Large Observation Models for Astronomy

    Authors: Michael J. Smith, Ryan J. Roberts, Eirini Angeloudi, Marc Huertas-Company

    Abstract: This work presents AstroPT, an autoregressive pretrained transformer developed with astronomical use-cases in mind. The AstroPT models presented here have been pretrained on 8.6 million $512 \times 512$ pixel $grz$-band galaxy postage stamp observations from the DESI Legacy Survey DR8. We train a selection of foundation models of increasing size from 1 million to 2.1 billion parameters, and find t…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 12 pages, 4 figures, 1 table. Code available at https://github.com/Smith42/astroPT

  12. arXiv:2404.01295  [pdf, other]

    cs.CL cs.AI

    Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models

    Authors: Yi-Lin Tuan, Xilun Chen, Eric Michael Smith, Louis Martin, Soumya Batra, Asli Celikyilmaz, William Yang Wang, Daniel M. Bikel

    Abstract: As large language models (LLMs) become easily accessible nowadays, the trade-off between safety and helpfulness can significantly impact user experience. A model that prioritizes safety will cause users to feel less engaged and assisted, while prioritizing helpfulness will potentially cause harm. Possible harms include teaching people how to build a bomb, exposing youth to inappropriate content, an…

    Submitted 1 April, 2024; originally announced April 2024.

  13. arXiv:2404.00172  [pdf, other]

    cs.CV cs.AI cs.LG

    Universal Bovine Identification via Depth Data and Deep Metric Learning

    Authors: Asheesh Sharma, Lucy Randewich, William Andrew, Sion Hannuna, Neill Campbell, Siobhan Mullan, Andrew W. Dowsey, Melvyn Smith, Mark Hansen, Tilo Burghardt

    Abstract: This paper proposes and evaluates, for the first time, a top-down (dorsal view), depth-only deep learning system for accurately identifying individual cattle and provides associated code, datasets, and training weights for immediate reproducibility. An increase in herd size skews the cow-to-human ratio at the farm and makes the manual monitoring of individuals more challenging. Therefore, real-tim…

    Submitted 29 March, 2024; originally announced April 2024.

    Comments: LaTeX, 38 pages, 14 figures, 3 tables

  14. arXiv:2403.19394  [pdf, ps, other]

    cs.CY q-bio.OT

    Cycling on the Freeway: The Perilous State of Open Source Neuroscience Software

    Authors: Britta U. Westner, Daniel R. McCloy, Eric Larson, Alexandre Gramfort, Daniel S. Katz, Arfon M. Smith, invited co-signees

    Abstract: Most scientists need software to perform their research (Barker et al., 2020; Carver et al., 2022; Hettrick, 2014; Hettrick et al., 2014; Switters and Osimo, 2019), and neuroscientists are no exception. Whether we work with reaction times, electrophysiological signals, or magnetic resonance imaging data, we rely on software to acquire, analyze, and statistically evaluate the raw data we obtain - o…

    Submitted 28 March, 2024; originally announced March 2024.

  15. MineXR: Mining Personalized Extended Reality Interfaces

    Authors: Hyunsung Cho, Yukang Yan, Kashyap Todi, Mark Parent, Missie Smith, Tanya R. Jonker, Hrvoje Benko, David Lindlbauer

    Abstract: Extended Reality (XR) interfaces offer engaging user experiences, but their effective design requires a nuanced understanding of user behavior and preferences. This knowledge is challenging to obtain without the widespread adoption of XR devices. We introduce MineXR, a design mining workflow and data analysis platform for collecting and analyzing personalized XR user interaction and experience dat…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 17 pages, 18 figures, Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems

    ACM Class: H.5.2

  16. arXiv:2402.19295  [pdf, other]

    cs.LG

    Anomaly Detection in Offshore Wind Turbine Structures using Hierarchical Bayesian Modelling

    Authors: S. M. Smith, A. J. Hughes, T. A. Dardeno, L. A. Bull, N. Dervilis, K. Worden

    Abstract: Population-based structural health monitoring (PBSHM) aims to share information between members of a population. An offshore wind (OW) farm could be considered as a population of nominally-identical wind-turbine structures. However, benign variations exist among members, such as geometry, sea-bed conditions and temperature differences. These factors could influence structural properties and there…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: Submitted to International Workshop on Structural Health Monitoring 2023, Stanford University, California, USA

  17. arXiv:2402.18383  [pdf]

    cs.CV

    Robust Quantification of Percent Emphysema on CT via Domain Attention: the Multi-Ethnic Study of Atherosclerosis (MESA) Lung Study

    Authors: Xuzhe Zhang, Elsa D. Angelini, Eric A. Hoffman, Karol E. Watson, Benjamin M. Smith, R. Graham Barr, Andrew F. Laine

    Abstract: Robust quantification of pulmonary emphysema on computed tomography (CT) remains challenging for large-scale research studies that involve scans from different scanner types and for translation to clinical scans. Existing studies have explored several directions to tackle this challenge, including density correction, noise filtering, regression, hidden Markov measure field (HMMF) model-based segme…

    Submitted 6 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 5 pages, 5 figures. Accepted to IEEE International Symposium on Biomedical Imaging 2024 (ISBI 2024). Camera-ready version

  18. arXiv:2402.15566  [pdf]

    eess.IV cs.CV cs.LG

    Closing the AI generalization gap by adjusting for dermatology condition distribution differences across clinical settings

    Authors: Rajeev V. Rikhye, Aaron Loh, Grace Eunhae Hong, Preeti Singh, Margaret Ann Smith, Vijaytha Muralidharan, Doris Wong, Rory Sayres, Michelle Phung, Nicolas Betancourt, Bradley Fong, Rachna Sahasrabudhe, Khoban Nasim, Alec Eschholz, Basil Mustafa, Jan Freyberg, Terry Spitz, Yossi Matias, Greg S. Corrado, Katherine Chou, Dale R. Webster, Peggy Bui, Yuan Liu, Yun Liu, Justin Ko, et al. (1 additional author not shown)

    Abstract: Recently, there has been great progress in the ability of artificial intelligence (AI) algorithms to classify dermatological conditions from clinical photographs. However, little is known about the robustness of these algorithms in real-world settings where several factors can lead to a loss of generalizability. Understanding and overcoming these limitations will permit the development of generali…

    Submitted 23 February, 2024; originally announced February 2024.

  19. Convolutional Neural Network Ensemble Learning for Hyperspectral Imaging-based Blackberry Fruit Ripeness Detection in Uncontrolled Farm Environment

    Authors: Chollette C. Olisah, Ben Trewhella, Bo Li, Melvyn L. Smith, Benjamin Winstone, E. Charles Whitfield, Felicidad Fernández Fernández, Harriet Duncalfe

    Abstract: Fruit ripeness estimation models have for decades depended on spectral index features or colour-based features, such as mean, standard deviation, skewness, colour moments, and/or histograms for learning traits of fruit ripeness. Recently, few studies have explored the use of deep learning techniques to extract features from images of fruits with visible ripeness cues. However, the blackberry (Rubu…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 25 pages, 10 figures, 6 tables; submitted to EAAI

    Report number: Volume 132,

    Journal ref: Engineering Applications of Artificial Intelligence, June 2024, 107945

  20. arXiv:2401.03768  [pdf]

    cs.LG cs.AI cs.CY cs.HC

    Corn Yield Prediction Model with Deep Neural Networks for Smallholder Farmer Decision Support System

    Authors: Chollette Olisah, Lyndon Smith, Melvyn Smith, Lawrence Morolake, Osi Ojukwu

    Abstract: Crop yield prediction has been modeled on the assumption that there is no interaction between weather and soil variables. However, this paper argues that an interaction exists, and it can be finely modelled using the Kendall Correlation coefficient. Given the nonlinearity of the interaction between weather and soil variables, a deep neural network regressor (DNNR) is carefully designed with consid…

    Submitted 12 April, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: 30 Pages, 11 Figures, 3 Tables

  21. arXiv:2401.01916  [pdf, other]

    astro-ph.IM astro-ph.CO astro-ph.GA astro-ph.SR cs.CL cs.LG

    AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets

    Authors: Ernest Perkowski, Rui Pan, Tuan Dung Nguyen, Yuan-Sen Ting, Sandor Kruk, Tong Zhang, Charlie O'Neill, Maja Jablonska, Zechang Sun, Michael J. Smith, Huiling Liu, Kevin Schawinski, Kartheik Iyer, Ioana Ciucă for UniverseTBD

    Abstract: We explore the potential of enhancing LLM performance in astronomy-focused question-answering through targeted, continual pre-training. By employing a compact 7B-parameter LLaMA-2 model and focusing exclusively on a curated set of astronomy corpora -- comprising abstracts, introductions, and conclusions -- we achieve notable improvements in specialized topic comprehension. While general LLMs like…

    Submitted 5 January, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

    Comments: 4 pages, 1 figure, model is available at https://huggingface.co/universeTBD, published in RNAAS

  22. arXiv:2311.18140  [pdf, other]

    cs.CL

    ROBBIE: Robust Bias Evaluation of Large Generative Language Models

    Authors: David Esiobu, Xiaoqing Tan, Saghar Hosseini, Megan Ung, Yuchen Zhang, Jude Fernandes, Jane Dwivedi-Yu, Eleonora Presani, Adina Williams, Eric Michael Smith

    Abstract: As generative large language models (LLMs) grow more performant and prevalent, we must develop comprehensive enough tools to measure and improve their fairness. Different prompt-based datasets can be used to measure social bias across multiple text domains and demographic axes, meaning that testing LLMs on more datasets can potentially help us characterize their biases more fully, and better ensur…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

  23. arXiv:2310.20354  [pdf, other]

    cs.SI

    Statistical Complexity of Heterogeneous Geometric Networks

    Authors: Keith Malcolm Smith, Jason P. Smith

    Abstract: Heterogeneity and geometry are key explanatory components underlying the structure of real-world networks. The relationship between these components and the statistical complexity of networks is not well understood. We introduce a parsimonious normalised measure of statistical complexity for networks -- normalised hierarchical complexity. The measure is trivially 0 in regular graphs and we prove t…

    Submitted 29 February, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: 12 pages, 6 figures

  24. arXiv:2309.16058  [pdf, other]

    cs.LG cs.CL cs.CV

    AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

    Authors: Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Tushar Nagarajan, Matt Smith, Shashank Jain, Chun-Fu Yeh, Prakash Murugesan, Peyman Heidari, Yue Liu, Kavya Srinet, Babak Damavandi, Anuj Kumar

    Abstract: We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e. text, image, video, audio, IMU motion sensor), and generates textual responses. AnyMAL inherits the powerful text-based reasoning abilities of the state-of-the-art LLMs including LLaMA-2 (70B), and converts modality-specific signals to the joint textual space through a…

    Submitted 27 September, 2023; originally announced September 2023.

  25. arXiv:2309.09110  [pdf]

    cs.CR cs.SI

    Improving Section 230, Preserving Democracy and Protecting Free Speech

    Authors: Marshall Van Alstyne, Michael David Smith, Herb Lin

    Abstract: This article proposes a framework for content moderation based on a decentralized market where no one party, neither governments nor firms, controls the flow of information.

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: 3 pages. This is published by CACM but the authors hold (c) and are happy to license under the listed Creative Commons CC:BY-SA license

    ACM Class: K.4

    Journal ref: Communications of the ACM, 66(4), 26-28 (2023)

  26. arXiv:2309.07207  [pdf, other]

    cs.LG physics.geo-ph

    EarthPT: a time series foundation model for Earth Observation

    Authors: Michael J. Smith, Luke Fleming, James E. Geach

    Abstract: We introduce EarthPT -- an Earth Observation (EO) pretrained transformer. EarthPT is a 700 million parameter decoding transformer foundation model trained in an autoregressive self-supervised manner and developed specifically with EO use-cases in mind. We demonstrate that EarthPT is an effective forecaster that can accurately predict future pixel-level surface reflectances across the 400-2300 nm r…

    Submitted 11 January, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: 7 pages, 4 figures, accepted to NeurIPS CCAI workshop at https://www.climatechange.ai/papers/neurips2023/2 . Code available at https://github.com/aspiaspace/EarthPT

  27. arXiv:2308.16871  [pdf, other]

    cs.CL cs.AI

    The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages

    Authors: Benjamin Muller, Belen Alastruey, Prangthip Hansanti, Elahe Kalbassi, Christophe Ropers, Eric Michael Smith, Adina Williams, Luke Zettlemoyer, Pierre Andrews, Marta R. Costa-jussà

    Abstract: Gender biases in language generation systems are challenging to mitigate. One possible source for these biases is gender representation disparities in the training and evaluation data. Despite recent progress in documenting this problem and many attempts at mitigating it, we still lack shared methodology and tooling to report gender representation in large datasets. Such quantitative reporting wil…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: 15 pages

  28. arXiv:2308.14835  [pdf, other]

    cs.CR

    AI ATAC 1: An Evaluation of Prominent Commercial Malware Detectors

    Authors: Robert A. Bridges, Brian Weber, Justin M. Beaver, Jared M. Smith, Miki E. Verma, Savannah Norem, Kevin Spakes, Cory Watson, Jeff A. Nichols, Brian Jewell, Michael D. Iannacone, Chelsey Dunivan Stahl, Kelly M. T. Huffer, T. Sean Oesch

    Abstract: This work presents an evaluation of six prominent commercial endpoint malware detectors, a network malware detector, and a file-conviction algorithm from a cyber technology vendor. The evaluation was administered as the first of the Artificial Intelligence Applications to Autonomous Cybersecurity (AI ATAC) prize challenges, funded by / completed in service of the US Navy. The experiment employed 1…

    Submitted 28 August, 2023; originally announced August 2023.

  29. arXiv:2308.05564  [pdf, other]

    econ.EM cs.LG q-fin.ST stat.CO

    Large Skew-t Copula Models and Asymmetric Dependence in Intraday Equity Returns

    Authors: Lin Deng, Michael Stanley Smith, Worapree Maneesoonthorn

    Abstract: Skew-t copula models are attractive for the modeling of financial data because they allow for asymmetric and extreme tail dependence. We show that the copula implicit in the skew-t distribution of Azzalini and Capitanio (2003) allows for a higher level of pairwise asymmetric dependence than two popular alternative skew-t copulas. Estimation of this copula in high dimensions is challenging, and we…

    Submitted 2 July, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

  30. arXiv:2308.01320  [pdf, other]

    cs.LG cs.AI cs.CL

    DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

    Authors: Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He

    Abstract: ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning with Human Feedback) training pipeline for these powerful models, particularly when training at…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: 14 pages, 7 figures

  31. arXiv:2307.10031  [pdf, other]

    cs.SE

    Start Your EM(otion En)gine: Towards Computational Models of Emotion for Improving the Believability of Video Game Non-Player Characters

    Authors: Geneva M. Smith

    Abstract: Believable Non-Player Characters (NPCs) help motivate player engagement with narrative-driven games. An important aspect of believable characters is their contextually-relevant reactions to changing situations, which emotion often drives in humans. Therefore, giving NPCs "emotion" should enhance their believability. For adoption in industry, it is important to create tool development processes to…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: 358 pages, 36 figures; See record on McMaster's Institutional Repository at http://hdl.handle.net/11375/28699

    ACM Class: D.2.1; D.2.4; J.4; J.5

  32. arXiv:2307.09288  [pdf, other]

    cs.CL cs.AI

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Authors: Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, et al. (43 additional authors not shown)

    Abstract: In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be…

    Submitted 19 July, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

  33. arXiv:2306.05810  [pdf, other]

    cs.LG

    Explaining Reinforcement Learning with Shapley Values

    Authors: Daniel Beechey, Thomas M. S. Smith, Özgür Şimşek

    Abstract: For reinforcement learning systems to be widely adopted, their users must understand and trust them. We present a theoretical analysis of explaining reinforcement learning using Shapley values, following a principled approach from game theory for identifying the contribution of individual players to the outcome of a cooperative game. We call this general framework Shapley Values for Explaining Rei…

    Submitted 9 June, 2023; originally announced June 2023.

    Comments: 12 pages, 9 figures. Accepted at ICML 2023

  34. arXiv:2306.04707  [pdf, other]

    cs.CL cs.AI

    Improving Open Language Models by Learning from Organic Interactions

    Authors: Jing Xu, Da Ju, Joshua Lane, Mojtaba Komeili, Eric Michael Smith, Megan Ung, Morteza Behrooz, William Ngan, Rashel Moritz, Sainbayar Sukhbaatar, Y-Lan Boureau, Jason Weston, Kurt Shuster

    Abstract: We present BlenderBot 3x, an update on the conversational model BlenderBot 3, which is now trained using organic conversation and feedback data from participating users of the system in order to improve both its skills and safety. We are publicly releasing the participating de-identified interaction data for use by the research community, in order to spur further progress. Training models with org…

    Submitted 7 June, 2023; originally announced June 2023.

  35. arXiv:2305.16600  [pdf, other]

    cs.CY

    Temporal Evolution of Risk Behavior in a Disease Spread Simulation

    Authors: Ollin D. Langle-Chimal, Scott C. Merrill, Eric M. Clark, Gabriela Bucini, Tung-Lin Liu, Trisha R. Shrum, Christopher Koliba, Asim Zia, Julia M. Smith, Nicholas Cheney

    Abstract: Human behavior is a dynamic process that evolves with experience. Understanding the evolution of individuals' risk propensity is critical to design public health interventions to propitiate the adoption of better biosecurity protocols and thus, prevent the transmission of an infectious disease. Using an experimental game that simulates the spread of a disease in a network of porcine farms, we meas…

    Submitted 1 June, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: 12 pages, 1 table, 7 figures

    ACM Class: F.2.2; I.2.7

  36. arXiv:2305.14223  [pdf, other]

    cs.MA cs.AI cs.GT cs.LG

    Co-Learning Empirical Games and World Models

    Authors: Max Olan Smith, Michael P. Wellman

    Abstract: Game-based decision-making involves reasoning over both world dynamics and strategic interactions among the agents. Typically, empirical models capturing these respective aspects are learned and used separately. We investigate the potential gain from co-learning these elements: a world model for dynamics and an empirical game for strategic interactions. Empirical games drive world models toward a…

    Submitted 23 May, 2023; originally announced May 2023.

  37. arXiv:2305.04846  [pdf, other]

    cs.NI

    Multi-AP Coordinated Spatial Reuse for Wi-Fi 8: Group Creation and Scheduling

    Authors: David Nunez, Malcom Smith, Boris Bellalta

    Abstract: Multi-Access Point Coordination (MAPC) will be a key feature in next generation Wi-Fi 8 networks. MAPC aims to improve the overall network performance by allowing Access Points (APs) to share time, frequency and/or spatial resources in a coordinated way, thus alleviating inter-AP contention and enabling new multi-AP channel access strategies. This paper introduces a framework to support periodic M…

    Submitted 8 May, 2023; originally announced May 2023.

  38. arXiv:2304.07908  [pdf, other

    cs.NI cs.GR

    Traffic Characteristics of Extended Reality

    Authors: Abdullah Alnajim, Seyedmohammad Salehi, Chien-Chung Shen, Malcolm Smith

    Abstract: This tutorial paper analyzes the traffic characteristics of immersive experiences with extended reality (XR) technologies, including augmented reality (AR), virtual reality (VR), and mixed reality (MR). The current trend in XR applications is to offload the computation and rendering to an external server and use wireless communications between the XR head-mounted display (HMD) and the access point…

    Submitted 16 April, 2023; originally announced April 2023.

    Comments: 23 pages, 17 figures, tutorial paper

  39. arXiv:2304.02098  [pdf, other

    cs.CV

    Uncertainty estimation in Deep Learning for Panoptic segmentation

    Authors: Michael Smith, Frank Ferrie

    Abstract: As deep learning-based computer vision algorithms continue to improve and advance the state of the art, their robustness to real-world data continues to lag their performance on datasets. This makes it difficult to bring an algorithm from the lab to the real world. Ensemble-based uncertainty estimation approaches such as Monte Carlo Dropout have been successfully used in many applications in an at…

    Submitted 4 April, 2023; originally announced April 2023.

    Comments: 15 pages, 6 figures

    ACM Class: I.4.6; I.2.10

  40. arXiv:2303.16281  [pdf

    cs.CY cs.AI cs.CL cs.LG cs.SI

    A "Perspectival" Mirror of the Elephant: Investigating Language Bias on Google, ChatGPT, YouTube, and Wikipedia

    Authors: Queenie Luo, Michael J. Puett, Michael D. Smith

    Abstract: Contrary to Google Search's mission of delivering information from "many angles so you can form your own understanding of the world," we find that Google and its most prominent returned results - Wikipedia and YouTube - simply reflect a narrow set of culturally dominant views tied to the search language for complex topics like "Buddhism," "Liberalism," "colonization," "Iran" and "America." Simply…

    Submitted 7 March, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

  41. arXiv:2303.08808  [pdf, other

    cs.CV

    Mesh Strikes Back: Fast and Efficient Human Reconstruction from RGB videos

    Authors: Rohit Jena, Pratik Chaudhari, James Gee, Ganesh Iyer, Siddharth Choudhary, Brandon M. Smith

    Abstract: Human reconstruction and synthesis from monocular RGB videos is a challenging problem due to clothing, occlusion, texture discontinuities and sharpness, and frame-specific pose changes. Many methods employ deferred rendering, NeRFs and implicit methods to represent clothed humans, on the premise that mesh-based representations cannot capture complex clothing and textures from RGB, silhouettes, and…

    Submitted 15 March, 2023; originally announced March 2023.

  42. arXiv:2303.03196  [pdf, other

    cs.GT cs.AI cs.LG cs.MA

    Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning

    Authors: Marc Lanctot, John Schultz, Neil Burch, Max Olan Smith, Daniel Hennes, Thomas Anthony, Julien Perolat

    Abstract: Progress in fields of machine learning and adversarial planning has benefited significantly from benchmark domains, from checkers and the classic UCI data sets to Go and Diplomacy. In sequential decision-making, agent evaluation has largely been restricted to few interactions against experts, with the aim to reach some desired level of performance (e.g. beating a human professional player). We pro…

    Submitted 31 October, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: 25 pages, 8 figures, Accepted at TMLR October 2023

  43. arXiv:2302.13536  [pdf, other

    stat.ML cs.LG

    Natural Gradient Hybrid Variational Inference with Application to Deep Mixed Models

    Authors: Weiben Zhang, Michael Stanley Smith, Worapree Maneesoonthorn, Ruben Loaiza-Maya

    Abstract: Stochastic models with global parameters and latent variables are common, and for which variational inference (VI) is popular. However, existing methods are often either slow or inaccurate in high dimensions. We suggest a fast and accurate VI method for this case that employs a well-defined natural gradient variational optimization that targets the joint posterior of the global parameters and late…

    Submitted 24 July, 2024; v1 submitted 27 February, 2023; originally announced February 2023.

  44. arXiv:2302.12537  [pdf, other

    cs.LG cs.AI

    Why Target Networks Stabilise Temporal Difference Methods

    Authors: Mattie Fellows, Matthew J. A. Smith, Shimon Whiteson

    Abstract: Integral to recent successes in deep reinforcement learning has been a class of temporal difference methods that use infrequently updated target values for policy evaluation in a Markov Decision Process. Yet a complete theoretical explanation for the effectiveness of target networks remains elusive. In this work, we provide an analysis of this popular class of algorithms, to finally answer the que…

    Submitted 11 August, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

    Comments: Found a small error in Appendix (Proposition 1, Appendix B3, penultimate line) that affects results presented in the original submission. These have been fixed and this version is the one accepted at ICML 2023

    Journal ref: ICML 2023

  45. arXiv:2302.08091  [pdf, other

    cs.CL

    Do We Still Need Clinical Language Models?

    Authors: Eric Lehman, Evan Hernandez, Diwakar Mahajan, Jonas Wulff, Micah J. Smith, Zachary Ziegler, Daniel Nadler, Peter Szolovits, Alistair Johnson, Emily Alsentzer

    Abstract: Although recent advances in scaling large language models (LLMs) have resulted in improvements on many NLP tasks, it remains unclear whether these models trained primarily with general web text are the right tool in highly specialized, safety critical domains such as clinical text. Recent results have suggested that LLMs encode a surprising amount of medical knowledge. This raises an important que…

    Submitted 16 February, 2023; originally announced February 2023.

  46. arXiv:2302.02788  [pdf, other

    cs.LG

    A Strong Baseline for Batch Imitation Learning

    Authors: Matthew Smith, Lucas Maystre, Zhenwen Dai, Kamil Ciosek

    Abstract: Imitation of expert behaviour is a highly desirable and safe approach to the problem of sequential decision making. We provide an easy-to-implement, novel algorithm for imitation learning under a strict data paradigm, in which the agent must learn solely from data collected a priori. This paradigm allows our algorithm to be used for environments in which safety or cost are of critical concern. Our…

    Submitted 6 February, 2023; originally announced February 2023.

    Comments: 28 pages (10 main, 18 appendix), 4 figures

  47. arXiv:2211.03796  [pdf, other

    astro-ph.IM cs.LG

    Astronomia ex machina: a history, primer, and outlook on neural networks in astronomy

    Authors: Michael J. Smith, James E. Geach

    Abstract: In this review, we explore the historical development and future prospects of artificial intelligence (AI) and deep learning in astronomy. We trace the evolution of connectionism in astronomy through its three waves, from the early use of multilayer perceptrons, to the rise of convolutional and recurrent neural networks, and finally to the current era of unsupervised and generative deep learning m…

    Submitted 12 May, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

    Comments: 75 pages, 327 references, 32 figures. Review accepted in Royal Society Open Science

  48. arXiv:2209.15543  [pdf, other

    physics.geo-ph cs.LG

    Bayesian Neural Networks for Geothermal Resource Assessment: Prediction with Uncertainty

    Authors: Stephen Brown, William L. Rodi, Marco Seracini, Chen Gu, Michael Fehler, James Faulds, Connor M. Smith, Sven Treitel

    Abstract: We consider the application of machine learning to the evaluation of geothermal resource potential. A supervised learning problem is defined where maps of 10 geological and geophysical features within the state of Nevada, USA are used to define geothermal potential across a broad region. We have available a relatively small set of positive training sites (known resources or active power plants) an…

    Submitted 25 October, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: 27 pages, 12 figures

  49. arXiv:2208.03188  [pdf, other

    cs.CL cs.AI

    BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

    Authors: Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, Moya Chen, Kushal Arora, Joshua Lane, Morteza Behrooz, William Ngan, Spencer Poff, Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, Jason Weston

    Abstract: We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and having been trained on a large number of user defined tasks. We release both the model weights and code, and have also deployed the model on a public web page to interact with organic users. This technical report describes how the model was built (arc…

    Submitted 10 August, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  50. arXiv:2206.08972  [pdf, other

    stat.ML cs.LG

    Shallow and Deep Nonparametric Convolutions for Gaussian Processes

    Authors: Thomas M. McDonald, Magnus Ross, Michael T. Smith, Mauricio A. Álvarez

    Abstract: A key challenge in the practical application of Gaussian processes (GPs) is selecting a proper covariance function. The moving average, or process convolutions, construction of GPs allows some additional flexibility, but still requires choosing a proper smoothing kernel, which is non-trivial. Previous approaches have built covariance functions by using GP priors over the smoothing kernel, and by e…

    Submitted 18 October, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

    Comments: 19 pages, 7 figures. NP-DGP results and discussion updated