Showing 1–50 of 176 results for author: Yang, A

Searching in archive cs. Search in all archives.
  1. arXiv:2408.12796  [pdf, other]

    cs.AI cs.CV

    Real-Time Posture Monitoring and Risk Assessment for Manual Lifting Tasks Using MediaPipe and LSTM

    Authors: Ereena Bagga, Ang Yang

    Abstract: This research focuses on developing a real-time posture monitoring and risk assessment system for manual lifting tasks using advanced AI and computer vision technologies. Musculoskeletal disorders (MSDs) are a significant concern for workers involved in manual lifting, and traditional methods for posture correction are often inadequate due to delayed feedback and lack of personalized assessment. O…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Proceedings of the 1st International Workshop on Multimedia Computing for Health and Medicine at ACM MM'24

  2. arXiv:2408.08239  [pdf, other]

    cs.IT

    Strong Data Processing Inequalities and their Applications to Reliable Computation

    Authors: Andrew K. Yang

    Abstract: In 1952, von Neumann gave a series of groundbreaking lectures that proved it was possible for circuits consisting of 3-input majority gates that have a sufficiently small independent probability $\delta > 0$ of malfunctioning to reliably compute Boolean functions. In 1999, Evans and Schulman used a strong data-processing inequality (SDPI) to establish the tightest known necessary condition…

    Submitted 15 August, 2024; originally announced August 2024.

  3. arXiv:2408.01112  [pdf, other]

    cs.MA

    Agentic LLM Workflows for Generating Patient-Friendly Medical Reports

    Authors: Malavikha Sudarshan, Sophie Shih, Estella Yee, Alina Yang, John Zou, Cathy Chen, Quan Zhou, Leon Chen, Chinmay Singhal, George Shih

    Abstract: The application of Large Language Models (LLMs) in healthcare is expanding rapidly, with one potential use case being the translation of formal medical reports into patient-legible equivalents. Currently, LLM outputs often need to be edited and evaluated by a human to ensure both factual accuracy and comprehensibility, and this is true for the above use case. We aim to minimize this step by propos…

    Submitted 5 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: 12 pages, 7 figures

  4. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  5. arXiv:2407.19178  [pdf, other]

    cs.CV eess.SP

    Power-LLaVA: Large Language and Vision Assistant for Power Transmission Line Inspection

    Authors: Jiahao Wang, Mingxuan Li, Haichen Luo, Jinguo Zhu, Aijun Yang, Mingzhe Rong, Xiaohua Wang

    Abstract: The inspection of power transmission line has achieved notable achievements in the past few years, primarily due to the integration of deep learning technology. However, current inspection approaches continue to encounter difficulties in generalization and intelligence, which restricts their further applicability. In this paper, we introduce Power-LLaVA, the first large language and vision assista…

    Submitted 27 July, 2024; originally announced July 2024.

  6. arXiv:2407.10671  [pdf, other]

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a…

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  7. arXiv:2407.08255  [pdf, other]

    cs.CV cs.LG

    GraphMamba: An Efficient Graph Structure Learning Vision Mamba for Hyperspectral Image Classification

    Authors: Aitao Yang, Min Li, Yao Ding, Leyuan Fang, Yaoming Cai, Yujie He

    Abstract: Efficient extraction of spectral sequences and geospatial information has always been a hot topic in hyperspectral image classification. In terms of spectral sequence feature capture, RNN and Transformer have become mainstream classification frameworks due to their long-range feature capture capabilities. In terms of spatial information aggregation, CNN enhances the receptive field to retain integ…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 13 pages, 10 figures

  8. POPCat: Propagation of particles for complex annotation tasks

    Authors: Adam Srebrnjak Yang, Dheeraj Khanna, John S. Zelek

    Abstract: Novel dataset creation for all multi-object tracking, crowd-counting, and industrial-based videos is arduous and time-consuming when faced with a unique class that densely populates a video sequence. We propose a time efficient method called POPCat that exploits the multi-target and temporal features of video data to produce a semi-supervised pipeline for segmentation or box-based video annotation…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figures, Accepted in "Conference on Robots and Vision 2024"

  9. arXiv:2406.05892  [pdf, other]

    cs.CR cs.LG cs.SE

    Security Vulnerability Detection with Multitask Self-Instructed Fine-Tuning of Large Language Models

    Authors: Aidan Z. H. Yang, Haoye Tian, He Ye, Ruben Martins, Claire Le Goues

    Abstract: Software security vulnerabilities allow attackers to perform malicious activities to disrupt software operations. Recent Transformer-based language models have significantly advanced vulnerability detection, surpassing the capabilities of static analysis based deep learning models. However, language models trained solely on code tokens do not capture either the explanation of vulnerability type or…

    Submitted 9 June, 2024; originally announced June 2024.

  10. arXiv:2406.04876  [pdf, other]

    cs.CL

    HateDebias: On the Diversity and Variability of Hate Speech Debiasing

    Authors: Nankai Lin, Hongyan Wu, Zhengming Chen, Zijian Li, Lianxi Wang, Shengyi Jiang, Dong Zhou, Aimin Yang

    Abstract: Hate speech on social media is ubiquitous but urgently controlled. Without detecting and mitigating the biases brought by hate speech, different types of ethical problems. While a number of datasets have been proposed to address the problem of hate speech detection, these datasets seldom consider the diversity and variability of bias, making it far from real-world scenarios. To fill this gap, we p…

    Submitted 7 June, 2024; originally announced June 2024.

  11. arXiv:2406.02128  [pdf, other]

    cs.LG cs.AI cs.CL

    Iteration Head: A Mechanistic Study of Chain-of-Thought

    Authors: Vivien Cabannes, Charles Arnal, Wassim Bouaziz, Alice Yang, Francois Charton, Julia Kempe

    Abstract: Chain-of-Thought (CoT) reasoning is known to improve Large Language Models both empirically and in terms of theoretical approximation power. However, our understanding of the inner workings and conditions of apparition of CoT capabilities remains limited. This paper helps fill this gap by demonstrating how CoT reasoning emerges in transformers in a controlled and interpretable setting. In particul…

    Submitted 4 June, 2024; originally announced June 2024.

  12. arXiv:2405.15682  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    The Road Less Scheduled

    Authors: Aaron Defazio, Xingyu Alice Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky

    Abstract: Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly out-performed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from c…

    Submitted 7 August, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  13. arXiv:2405.14394  [pdf, other]

    cs.CL cs.AI

    Instruction Tuning With Loss Over Instructions

    Authors: Zhengyan Shi, Adam X. Yang, Bin Wu, Laurence Aitchison, Emine Yilmaz, Aldo Lipani

    Abstract: Instruction tuning plays a crucial role in shaping the outputs of language models (LMs) to desired styles. In this work, we propose a simple yet effective method, Instruction Modelling (IM), which trains LMs by applying a loss function to the instruction and prompt part rather than solely to the output part. Through experiments across 21 diverse benchmarks, we show that, in many scenarios, IM can…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Code is available at https://github.com/ZhengxiangShi/InstructionModelling

  14. arXiv:2405.13907  [pdf, other]

    cs.CL cs.AI

    Just rephrase it! Uncertainty estimation in closed-source language models via multiple rephrased queries

    Authors: Adam Yang, Chen Chen, Konstantinos Pitas

    Abstract: State-of-the-art large language models are sometimes distributed as open-source software but are also increasingly provided as a closed-source service. These closed-source large-language models typically see the widest usage by the public, however, they often do not provide an estimate of their uncertainty when responding to queries. As even the best models are prone to "hallucinating" false info…

    Submitted 16 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  15. arXiv:2405.11703  [pdf, other]

    cs.LG

    QComp: A QSAR-Based Data Completion Framework for Drug Discovery

    Authors: Bingjia Yang, Yunsie Chung, Archer Y. Yang, Bo Yuan, Xiang Yu

    Abstract: In drug discovery, in vitro and in vivo experiments reveal biochemical activities related to the efficacy and toxicity of compounds. The experimental data accumulate into massive, ever-evolving, and sparse datasets. Quantitative Structure-Activity Relationship (QSAR) models, which predict biochemical activities using only the structural information of compounds, face challenges in integrating the…

    Submitted 19 May, 2024; originally announced May 2024.

  16. arXiv:2405.04029  [pdf, other]

    cs.CR

    Enabling Privacy-Preserving and Publicly Auditable Federated Learning

    Authors: Huang Zeng, Anjia Yang, Jian Weng, Min-Rong Chen, Fengjun Xiao, Yi Liu, Ye Yao

    Abstract: Federated learning (FL) has attracted widespread attention because it supports the joint training of models by multiple participants without moving private dataset. However, there are still many security issues in FL that deserve discussion. In this paper, we consider three major issues: 1) how to ensure that the training process can be publicly audited by any third party; 2) how to avoid the infl…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: ICC 2024 - 2024 IEEE International Conference on Communications Conference Program

    ACM Class: C.2.2; C.2.4; E.3

  17. arXiv:2405.02630  [pdf, other]

    quant-ph cs.DC cs.SE

    cuTN-QSVM: cuTensorNet-accelerated Quantum Support Vector Machine with cuQuantum SDK

    Authors: Kuan-Cheng Chen, Tai-Yue Li, Yun-Yuan Wang, Simon See, Chun-Chieh Wang, Robert Wille, Nan-Yow Chen, An-Cheng Yang, Chun-Yu Lin

    Abstract: This paper investigates the application of Quantum Support Vector Machines (QSVMs) with an emphasis on the computational advancements enabled by NVIDIA's cuQuantum SDK, especially leveraging the cuTensorNet library. We present a simulation workflow that substantially diminishes computational overhead, as evidenced by our experiments, from exponential to quadratic cost. While state vector simulatio…

    Submitted 8 May, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

    Comments: 10 pages, 14 figures

  18. arXiv:2404.18852  [pdf, other]

    cs.PL cs.SE

    VERT: Verified Equivalent Rust Transpilation with Large Language Models as Few-Shot Learners

    Authors: Aidan Z. H. Yang, Yoshiki Takashima, Brandon Paulsen, Josiah Dodds, Daniel Kroening

    Abstract: Rust is a programming language that combines memory safety and low-level control, providing C-like performance while guaranteeing the absence of undefined behaviors by default. Rust's growing popularity has prompted research on safe and correct transpiling of existing code-bases to Rust. Existing work falls into two categories: rule-based and large language model (LLM)-based. While rule-based appr…

    Submitted 25 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  19. arXiv:2404.15236  [pdf, other]

    cs.SE

    Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models

    Authors: Aidan Z. H. Yang, Sophia Kolak, Vincent J. Hellendoorn, Ruben Martins, Claire Le Goues

    Abstract: Language models have improved by orders of magnitude with the recent emergence of Transformer-based Large Language Models (LLMs). LLMs have demonstrated their ability to generate natural code that is highly similar to code written by professional developers. One intermediate value an LLM can emit is entropy, which measures the naturalness of a token of code. We hypothesize that entropy can be used…

    Submitted 23 April, 2024; originally announced April 2024.

  20. arXiv:2404.13229  [pdf]

    cs.HC

    Preserving History through Augmented Reality

    Authors: Annie Yang

    Abstract: Extended reality can weave together the fabric of the past, present, and future. A two-day design hackathon was held to bring the community together through a love for history and a common goal to use technology for good. Through interviewing an influential community elder, Emile Pitre, and referencing his book Revolution to Evolution, my team developed an augmented reality artifact to tell his st…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Presented at CHI 2024 arXiv:2404.05889

    Report number: ARSJ/2024/11

  21. arXiv:2404.08687  [pdf, other]

    cs.IR cs.AI

    A Survey of Reasoning for Substitution Relationships: Definitions, Methods, and Directions

    Authors: Anxin Yang, Zhijuan Du, Tao Sun

    Abstract: Substitute relationships are fundamental to people's daily lives across various domains. This study aims to comprehend and predict substitute relationships among products in diverse fields, extensively analyzing the application of machine learning algorithms, natural language processing, and other technologies. By comparing model methodologies across different domains, such as defining substitutes…

    Submitted 9 April, 2024; originally announced April 2024.

  22. arXiv:2404.07600  [pdf, other]

    cs.CV

    Implicit and Explicit Language Guidance for Diffusion-based Visual Perception

    Authors: Hefeng Wang, Jiale Cao, Jin Xie, Aiping Yang, Yanwei Pang

    Abstract: Text-to-image diffusion models have shown powerful ability on conditional image synthesis. With large-scale vision-language pre-training, diffusion models are able to generate high-quality images with rich texture and reasonable structure under different text prompts. However, it is an open problem to adapt the pre-trained diffusion model for visual perception. In this paper, we propose an implici…

    Submitted 15 August, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE TMM

  23. arXiv:2404.04393  [pdf, other]

    cs.LO cs.CL cs.FL cs.LG

    Counting Like Transformers: Compiling Temporal Counting Logic Into Softmax Transformers

    Authors: Andy Yang, David Chiang

    Abstract: Deriving formal bounds on the expressivity of transformers, as well as studying transformers that are constructed to implement known algorithms, are both effective methods for better understanding the computational power of transformers. Towards both ends, we introduce the temporal counting logic $\textbf{K}_\text{t}$[#] alongside the RASP variant $\textbf{C-RASP}$. We show they are equivalent to…

    Submitted 5 April, 2024; originally announced April 2024.

  24. arXiv:2404.00504  [pdf, other]

    cs.CV

    NYC-Indoor-VPR: A Long-Term Indoor Visual Place Recognition Dataset with Semi-Automatic Annotation

    Authors: Diwei Sheng, Anbang Yang, John-Ross Rizzo, Chen Feng

    Abstract: Visual Place Recognition (VPR) in indoor environments is beneficial to humans and robots for better localization and navigation. It is challenging due to appearance changes at various frequencies, and difficulties of obtaining ground truth metric trajectories for training and evaluation. This paper introduces the NYC-Indoor-VPR dataset, a unique and rich collection of over 36,000 images compiled f…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 7 pages, 7 figures, published in 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)

  25. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  26. Multi-modal Attribute Prompting for Vision-Language Models

    Authors: Xin Liu, Jiamin Wu, Wenfei Yang, Xu Zhou, Tianzhu Zhang

    Abstract: Pre-trained Vision-Language Models (VLMs), like CLIP, exhibit strong generalization ability to downstream tasks but struggle in few-shot scenarios. Existing prompting techniques primarily focus on global text and image representations, yet overlooking multi-modal attribute characteristics. This limitation hinders the model's ability to perceive fine-grained visual details and restricts its general…

    Submitted 11 July, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

    Comments: Accepted for Publication in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

  27. arXiv:2402.17801  [pdf, other]

    econ.TH cs.AI

    Generative AI and Copyright: A Dynamic Perspective

    Authors: S. Alex Yang, Angela Huyue Zhang

    Abstract: The rapid advancement of generative AI is poised to disrupt the creative industry. Amidst the immense excitement for this new technology, its future development and applications in the creative industry hinge crucially upon two copyright issues: 1) the compensation to creators whose content has been used to train generative AI models (the fair use standard); and 2) the eligibility of AI-generated…

    Submitted 27 February, 2024; originally announced February 2024.

  28. Mapping the Landscape of Independent Food Delivery Platforms in the United States

    Authors: Yuhan Liu, Amna Liaqat, Owen Xingjian Zhang, Mariana Consuelo Fernández Espinosa, Ankhitha Manjunatha, Alexander Yang, Orestis Papakyriakopoulos, Andrés Monroy-Hernández

    Abstract: Beyond the well-known giants like Uber Eats and DoorDash, there are hundreds of independent food delivery platforms in the United States. However, little is known about the sociotechnical landscape of these "indie" platforms. In this paper, we analyzed these platforms to understand why they were created, how they operate, and what technologies they use. We collected data on 495 indie platforms a…

    Submitted 25 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: To appear in CSCW 2024

  29. arXiv:2402.13210  [pdf, other]

    cs.LG

    Bayesian Reward Models for LLM Alignment

    Authors: Adam X. Yang, Maxime Robeyns, Thomas Coste, Zhengyan Shi, Jun Wang, Haitham Bou-Ammar, Laurence Aitchison

    Abstract: To ensure that large language model (LLM) responses are helpful and non-toxic, a reward model trained on human preference data is usually used. LLM responses with high rewards are then selected through best-of-$n$ (BoN) sampling or the LLM is further optimized to produce responses with high rewards through reinforcement learning from human feedback (RLHF). However, these processes are susceptible…

    Submitted 2 July, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  30. arXiv:2402.12782  [pdf, other]

    cs.SE cs.AI

    Advancing GenAI Assisted Programming--A Comparative Study on Prompt Efficiency and Code Quality Between GPT-4 and GLM-4

    Authors: Angus Yang, Zehan Li, Jie Li

    Abstract: This study aims to explore the best practices for utilizing GenAI as a programming tool, through a comparative analysis between GPT-4 and GLM-4. By evaluating prompting strategies at different levels of complexity, we identify that simplest and straightforward prompting strategy yields best code generation results. Additionally, adding a CoT-like preliminary confirmation step would further increas…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 18 pages, 4 figures, 3 tables

    ACM Class: D.2.3

  31. arXiv:2402.01031  [pdf]

    eess.IV cs.CV

    MRAnnotator: A Multi-Anatomy Deep Learning Model for MRI Segmentation

    Authors: Alexander Zhou, Zelong Liu, Andrew Tieu, Nikhil Patel, Sean Sun, Anthony Yang, Peter Choi, Valentin Fauveau, George Soultanidis, Mingqian Huang, Amish Doshi, Zahi A. Fayad, Timothy Deyer, Xueyan Mei

    Abstract: Purpose: To develop a deep learning model for multi-anatomy and many-class segmentation of diverse anatomic structures on MRI imaging. Materials and Methods: In this retrospective study, two datasets were curated and annotated for model development and evaluation. An internal dataset of 1022 MRI sequences from various clinical sites within a health system and an external dataset of 264 MRI sequenc…

    Submitted 1 February, 2024; originally announced February 2024.

  32. arXiv:2401.03571  [pdf, other]

    q-bio.BM cs.LG

    α-HMM: A Graphical Model for RNA Folding

    Authors: Sixiang Zhang, Aaron J. Yang, Liming Cai

    Abstract: RNA secondary structure is modeled with the novel arbitrary-order hidden Markov model (α-HMM). The α-HMM extends over the traditional HMM with capability to model stochastic events that may be influenced by historically distant ones, making it suitable to account for long-range canonical base pairings between nucleotides, which constitute the RNA secondary structure. Unlike previous heavy-weigh…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: 14 pages, 5 figures, 1 table

  33. arXiv:2312.15136  [pdf, other]

    physics.comp-ph cs.AI cs.CV

    Towards End-to-End Structure Solutions from Information-Compromised Diffraction Data via Generative Deep Learning

    Authors: Gabe Guo, Judah Goldfeder, Ling Lan, Aniv Ray, Albert Hanming Yang, Boyuan Chen, Simon JL Billinge, Hod Lipson

    Abstract: The revolution in materials in the past century was built on a knowledge of the atomic arrangements and the structure-property relationship. The sine qua non for obtaining quantitative structural information is single crystal crystallography. However, increasingly we need to solve structures in cases where the information content in our input signal is significantly degraded, for example, due to o…

    Submitted 22 December, 2023; originally announced December 2023.

  34. arXiv:2312.13454  [pdf, other]

    cs.LG stat.ME

    MixEHR-SurG: a joint proportional hazard and guided topic model for inferring mortality-associated topics from electronic health records

    Authors: Yixuan Li, Archer Y. Yang, Ariane Marelli, Yue Li

    Abstract: Survival models can help medical practitioners to evaluate the prognostic importance of clinical variables to patient outcomes such as mortality or hospital readmission and subsequently design personalized treatment regimes. Electronic Health Records (EHRs) hold the promise for large-scale survival analysis based on systematically recorded clinical features for each patient. However, existing surv…

    Submitted 16 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: 45 pages total, 19 pages main text, 6 main figures, 10 supplementary figures

  35. Celestial Machine Learning: Discovering the Planarity, Heliocentricity, and Orbital Equation of Mars with AI Feynman

    Authors: Zi-Yu Khoo, Gokul Rajiv, Abel Yang, Jonathan Sze Choong Low, Stéphane Bressan

    Abstract: Can a machine or algorithm discover or learn the elliptical orbit of Mars from astronomical sightings alone? Johannes Kepler required two paradigm shifts to discover his First Law regarding the elliptical orbit of Mars. Firstly, a shift from the geocentric to the heliocentric frame of reference. Secondly, the reduction of the orbit of Mars from a three- to a two-dimensional space. We extend AI Fey…

    Submitted 19 December, 2023; originally announced December 2023.

  36. Celestial Machine Learning: From Data to Mars and Beyond with AI Feynman

    Authors: Zi-Yu Khoo, Abel Yang, Jonathan Sze Choong Low, Stéphane Bressan

    Abstract: Can a machine or algorithm discover or learn Kepler's first law from astronomical sightings alone? We emulate Johannes Kepler's discovery of the equation of the orbit of Mars with the Rudolphine tables using AI Feynman, a physics-inspired tool for symbolic regression.

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: v1: long version; v2: accepted as a short paper

  37. arXiv:2312.05953  [pdf]

    eess.IV cs.CV cs.LG

    RadImageGAN -- A Multi-modal Dataset-Scale Generative AI for Medical Imaging

    Authors: Zelong Liu, Alexander Zhou, Arnold Yang, Alara Yilmaz, Maxwell Yoo, Mikey Sullivan, Catherine Zhang, James Grant, Daiqing Li, Zahi A. Fayad, Sean Huver, Timothy Deyer, Xueyan Mei

    Abstract: Deep learning in medical imaging often requires large-scale, high-quality data or initiation with suitably pre-trained weights. However, medical datasets are limited by data availability, domain-specific knowledge, and privacy concerns, and the creation of large and diverse radiologic databases like RadImageNet is highly resource-intensive. To address these limitations, we introduce RadImageGAN, t…

    Submitted 10 December, 2023; originally announced December 2023.

  38. arXiv:2312.05491  [pdf, other]

    cs.CL cs.AI

    Using Captum to Explain Generative Language Models

    Authors: Vivek Miglani, Aobo Yang, Aram H. Markosyan, Diego Garcia-Olano, Narine Kokhlikyan

    Abstract: Captum is a comprehensive library for model explainability in PyTorch, offering a range of methods from the interpretability literature to enhance users' understanding of PyTorch models. In this paper, we introduce new features in Captum that are specifically designed to analyze the behavior of generative language models. We provide an overview of the available functionalities and example applicat…

    Submitted 9 December, 2023; originally announced December 2023.

    ACM Class: I.2.7

  39. arXiv:2311.02007  [pdf, other]

    cs.CV cs.RO

    Towards Unsupervised Object Detection From LiDAR Point Clouds

    Authors: Lunjun Zhang, Anqi Joyce Yang, Yuwen Xiong, Sergio Casas, Bin Yang, Mengye Ren, Raquel Urtasun

    Abstract: In this paper, we study the problem of unsupervised object detection from 3D point clouds in self-driving scenes. We present a simple yet effective method that exploits (i) point clustering in near-range areas where the point clouds are dense, (ii) temporal consistency to filter out noisy unsupervised detections, (iii) translation equivariance of CNNs to extend the auto-labels to long range, and (…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: CVPR 2023

  40. arXiv:2311.01447  [pdf, other]

    cs.CV cs.LG cs.RO

    CADSim: Robust and Scalable in-the-wild 3D Reconstruction for Controllable Sensor Simulation

    Authors: Jingkang Wang, Sivabalan Manivasagam, Yun Chen, Ze Yang, Ioan Andrei Bârsan, Anqi Joyce Yang, Wei-Chiu Ma, Raquel Urtasun

    Abstract: Realistic simulation is key to enabling safe and scalable development of self-driving vehicles. A core component is simulating the sensors so that the entire autonomy system can be tested in simulation. Sensor simulation involves modeling traffic participants, such as vehicles, with high quality appearance and articulated geometry, and rendering them in real time. The self-driving industry has t…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: CoRL 2022. Project page: https://waabi.ai/cadsim/

  41. arXiv:2311.01444  [pdf, other]

    cs.CV cs.RO

    LabelFormer: Object Trajectory Refinement for Offboard Perception from LiDAR Point Clouds

    Authors: Anqi Joyce Yang, Sergio Casas, Nikita Dvornik, Sean Segal, Yuwen Xiong, Jordan Sir Kwang Hu, Carter Fang, Raquel Urtasun

    Abstract: A major bottleneck to scaling-up training of self-driving perception systems are the human annotations required for supervision. A promising alternative is to leverage "auto-labelling" offboard perception models that are trained to automatically generate annotations from raw LiDAR point clouds at a fraction of the cost. Auto-labels are most commonly generated via a two-stage approach -- first obje…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: 20 pages, 8 figures, 7 tables

    Journal ref: CoRL 2023

  42. arXiv:2310.13897  [pdf, other]

    cs.FL cs.LG cs.LO

    Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages

    Authors: Andy Yang, David Chiang, Dana Angluin

    Abstract: The expressive power of transformers over inputs of unbounded size can be studied through their ability to recognize classes of formal languages. We consider transformer encoders with hard attention (in which all attention is focused on exactly one position) and strict future masking (in which each position only attends to positions strictly to its left), and prove that they are equivalent to line…

    Submitted 22 May, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

  43. arXiv:2310.06906  [pdf, other]

    cs.CV

    Distillation Improves Visual Place Recognition for Low-Quality Queries

    Authors: Anbang Yang, Yao Wang, John-Ross Rizzo, Chen Feng

    Abstract: The shift to online computing for real-time visual localization often requires streaming query images/videos to a server for visual place recognition (VPR), where fast video transmission may result in reduced resolution or increased quantization. This compromises the quality of global image descriptors, leading to decreased VPR performance. To improve the low recall rate for low-quality query imag…

    Submitted 10 October, 2023; originally announced October 2023.

  44. arXiv:2310.05666  [pdf, other]

    cs.CV

    Anchor-Intermediate Detector: Decoupling and Coupling Bounding Boxes for Accurate Object Detection

    Authors: Yilong Lv, Min Li, Yujie He, Shaopeng Li, Zhuzhen He, Aitao Yang

    Abstract: Anchor-based detectors have been continuously developed for object detection. However, the individual anchor box makes it difficult to predict the boundary's offset accurately. Instead of taking each bounding box as a closed individual, we consider using multiple boxes together to get prediction boxes. To this end, this paper proposes the Box Decouple-Couple (BDC) strategy in the inference…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: Submitted 29 September, 2023; originally announced October 2023. Accepted by ICCV2023

  45. arXiv:2310.03866  [pdf, other]

    cs.DB cs.SE

    On Repairing Natural Language to SQL Queries

    Authors: Aidan Z. H. Yang, Ricardo Brancas, Pedro Esteves, Sofia Aparicio, Joao Pedro Nadkarni, Miguel Terra-Neves, Vasco Manquinho, Ruben Martins

    Abstract: Data analysts use SQL queries to access and manipulate data on their databases. However, these queries are often challenging to write, and small mistakes can lead to unexpected data output. Recent work has explored several ways to automatically synthesize queries based on a user-provided specification. One promising technique called text-to-SQL consists of the user providing a natural language des…

    Submitted 5 October, 2023; originally announced October 2023.

  46. arXiv:2310.01726  [pdf, other]

    cs.SE cs.LG

    Large Language Models for Test-Free Fault Localization

    Authors: Aidan Z. H. Yang, Ruben Martins, Claire Le Goues, Vincent J. Hellendoorn

    Abstract: Fault Localization (FL) aims to automatically localize buggy lines of code, a key first step in many manual and automatic debugging tasks. Previous FL techniques assume the provision of input tests, and often require extensive program analysis, program instrumentation, or data preprocessing. Prior work on deep learning for APR struggles to learn from small datasets and produces limited results on…

    Submitted 2 October, 2023; originally announced October 2023.

  47. arXiv:2309.16609  [pdf, other]

    cs.CL

    Qwen Technical Report

    Authors: Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, et al. (23 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Q…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 59 pages, 5 figures

  48. arXiv:2309.13952  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    VidChapters-7M: Video Chapters at Scale

    Authors: Antoine Yang, Arsha Nagrani, Ivan Laptev, Josef Sivic, Cordelia Schmid

    Abstract: Segmenting long videos into chapters enables users to quickly navigate to the information of their interest. This important topic has been understudied due to the lack of publicly released datasets. To address this issue, we present VidChapters-7M, a dataset of 817K user-chaptered videos including 7M chapters in total. VidChapters-7M is automatically created from videos online in a scalable manner…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted at NeurIPS 2023 Track on Datasets and Benchmarks; Project Webpage: https://antoyang.github.io/vidchapters.html ; 31 pages; 8 figures

  49. arXiv:2309.13570  [pdf, other]

    cs.CV

    Robust 6DoF Pose Estimation Against Depth Noise and a Comprehensive Evaluation on a Mobile Dataset

    Authors: Zixun Huang, Keling Yao, Seth Z. Zhao, Chuanyu Pan, Chenfeng Xu, Kathy Zhuang, Tianjian Xu, Weiyu Feng, Allen Y. Yang

    Abstract: Robust 6DoF pose estimation with mobile devices is the foundation for applications in robotics, augmented reality, and digital twin localization. In this paper, we extensively investigate the robustness of existing RGBD-based 6DoF pose estimation methods against varying levels of depth sensor noise. We highlight that existing 6DoF pose estimation methods suffer significant performance discrepancie…

    Submitted 17 June, 2024; v1 submitted 24 September, 2023; originally announced September 2023.

  50. arXiv:2309.10305  [pdf, other]

    cs.CL

    Baichuan 2: Open Large-scale Language Models

    Authors: Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, JunTao Dai, Kun Fang, et al. (30 additional authors not shown)

    Abstract: Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of lar…

    Submitted 20 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Baichuan 2 technical report. Github: https://github.com/baichuan-inc/Baichuan2